[clug-talk] RAID resync failure
Dan Graham
grahadan at gmail.com
Mon Jan 5 20:00:59 PST 2009
Hi Ian,
"Copy from my backup? But, it's RAID! I don't need a backup!"
LOLs :-)
RAID set up properly with a spare can provide hot failover (when a
disk crashes within the active array it is marked faulty and the spare
is made active). The primary function of RAID is uptime.
RAID is never a substitute for backup unless you have more than one ;-)
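
For anyone following along, here is a minimal sketch in Python of the
XOR parity idea behind RAID 5. It is not how mdadm implements it; the
block contents and the names (xor_blocks, data_a, data_b) are made up
for illustration, and it models a single stripe on a hypothetical
3-disk array. It shows why one failed disk is survivable and why that
still isn't a backup.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 3-disk RAID 5: two data blocks plus parity.
data_a = b"junk"
data_b = b"file"
parity = xor_blocks([data_a, data_b])

# Lose the disk holding data_a: XOR the survivors to get the block back.
rebuilt_a = xor_blocks([data_b, parity])

# Lose a second disk before the rebuild finishes and only one block is
# left, which is not enough to recover anything. Hence the backup.
```

The same arithmetic is what a resync has to do for every stripe, which
is why a rebuild needs to read all the surviving disks end to end.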
I'm happy it worked out well for you, Ian.
All the best, Dan
On Mon, Jan 5, 2009 at 7:44 PM, Ian Bruseker <ian.bruseker at gmail.com> wrote:
> Dan,
>
> Copy from my backup? But, it's RAID! I don't need a backup!
>
> ;-)
>
> You appear to be right on the money, which sort of annoys me. :-)
> The whole point of RAID (particularly RAID 5) is it's supposed to
> bring a level of reliability to the system. If a disk fails, data isn't
> lost. Bringing the whole system to a hard locked crashing death is
> hardly "reliable", unless you count the fact that it did reliably lock
> up solid at exactly the same point of the resync each time. I suppose
> technically I didn't lose any data, because I was able to copy files
> off the array during that time from boot up until it hit the magical
> 40.1% (it restarted the sync from zero each time that happened), and
> could do it over and over after each restart as often as it took me to
> make sure everything was safely backed up (not that I cared - I never
> fully trusted the thing anyway, so I'd used it as my junk space,
> things I've downloaded, saved somewhere else, backed up to CD/DVD sort
> of space, so nothing of value would have been lost anyway even if I
> had no backup of it). There were only two files I couldn't copy off,
> which I guess must have been sitting on the bad place on the bad disk,
> because trying to copy them caused the system to lock up instantly
> during the copy.
>
> But back to you being right. First, I bought a new drive, completely
> removed all the partitions from all the drives and started from
> scratch. Of course I didn't get the bad drive on the first try, so I
> put the whole array together with one existing drive replaced by the
> new one, and watched it die promptly at 40.1% of the resync. But I
> got it on the second try and watched it happily rebuild all the way to
> 100%. So, clearly it's a drive. To put your idea of spare drives to
> the test, I rebuilt the array again, with 3 active and one spare
> drive, thus including the bad drive in the setup. And wouldn't you
> know, it rebuilt the array, and flagged the bad drive as faulty in the
> process rather than just falling over dead. How nice. Actually it
> flagged two as spare, one (the bad one) as "faulty spare", and left
> only one disk active in the RAID 5 array, which makes no sense at all,
> but at least it proves out that it could find the faulty drive given
> the chance. It even logged a ton of error messages to
> /var/log/messages rather than just locking up with no feedback.
>
> So, there's the lesson for the day, I guess. When running a RAID 5
> with software RAID, put a spare drive in the setup to catch such an
> event as a failed disk. I wouldn't have thought it was necessary, but
> in this case it seems it is.
>
> Thanks for the guidance, Dan. You are a guru. :-)
>
> Ian
>
> 2009/1/3 Dan Graham <grahadan at gmail.com>
>>
>> Hi Ian,
>>
>> I have seen this happen when you create an mdadm RAID5 array without a
>> hot spare drive (4th disk). When a drive in the array fails with only
>> 3 disks it cannot rebuild itself without the hot spare. You may be
>> able to add an additional disk to the array and then try rebuilding it
>> but it will take far less time to create an entirely new array and
>> copy your backup data to it.
>>
>> All the best, Dan
>>
>
--
One thing you can be sure of. If you throw a loaded gun in a monkey
cage, something bad is going to happen.