On Friday 16 June 2017 10:11:20 LÉVAI Dániel wrote:
> Karel Gardas @ 2017-06-15T09:07:39 +0200:
> > On Thu, Jun 15, 2017 at 7:04 AM, LEVAI Daniel <l...@ecentrum.hu> wrote:
> [...]
> 
> > > Strangest thing is, if I boot with the 'bad' (=failing) drive as
> > > part of the array, softraid brings the volume online (albeit
> > > degraded) and I can even decrypt/mount the volume and use it (only
> > > one drive being bad in the array of RAID5).  If I remove/replace
> > > said failing drive, I'm not getting a degraded volume, just the
> > > error about the missing chunk and that it refuses to bring it
> > > online.
> 
> [...]
> [...]
> 
> > So I see you probably have two possibilities:
> > 
> > 1) IMHO safer. If you do have enough SATA ports, then attach both
> > your failing drive and your new drive to the system. Boot. OpenBSD
> > should detect and attach the RAID5 volume in a degraded state and then
> > you will be able to perform your rebuild (if your failing drive is not
> > offline, you can use bioctl to offline it)
> > or
> 
> Thanks Karel, this indeed did the trick. I'm still baffled, however, that
> the whole purpose of the RAID setup was diminished by a missing disk 8-\

This is certainly not expected behaviour - I've only skimmed parts of this 
thread, but softraid will attempt to bring a degraded array online (which it 
seemed to be doing):

softraid0: not all chunks were provided; attempting to bring volume 1 online
softraid0: trying to bring up sd7 degraded
softraid0: sd7 is offline, will not be brought online

For some reason it was unable to bring it up in a degraded state (possible 
causes include multiple missing disks in a RAID 5 array, different metadata 
versions, etc.) - unfortunately the logging does not explain why this is the 
case, and we may not be able to reproduce the situation now.

For future travellers, using dd to capture the start of each chunk partition 
(which contains the softraid metadata) would allow this to be analysed further.
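
Something along these lines should do it (device names are just for 
illustration - substitute your own chunk partitions; the metadata sits at the 
start of each chunk, so a MB or so per partition should comfortably cover it):

  # dd if=/dev/rsd2a of=sd2a-meta.img bs=512 count=2048
  # dd if=/dev/rsd3a of=sd3a-meta.img bs=512 count=2048
  # dd if=/dev/rsd4a of=sd4a-meta.img bs=512 count=2048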

> You in fact gave the advice at such a lucky time that I was about to
> return the disk for a warranty replacement -- had I done that, I would
> not have been able to repair the array. So thanks again, and I guess
> you'll have a beer on me when you're around Budapest ;)

Just to clarify, you're saying that when you plugged all of the original disks 
back in, the array came up again correctly? And if so, was this at boot time?

> (Just a side note: to attach the new disk, I had to remove one of the
> system disks that are in a RAID1 setup, also with softraid. Softraid
> however had no problem bringing up *that* RAID1 volume in a degraded
> state with the missing disk...)

Right - that is how it should behave.

> > 2) less safe (read: completely untested and unverified by reading the
> > code on my side). Use bioctl -c 5 -l <your drives including a new one>
> > <etc> to attach the RAID5 array including the new drive. Please do
> > *NOT* force this. See if bioctl complains, for example about missing
> > metadata, or if it automatically detects the new drive and starts a
> > rebuild.
> > 
> > Generally speaking I'd use (1) since I used this in the past and had
> > no issue with it.
> 
> Now this was more interesting. I tried, e.g., (re)creating the RAID5 array
> with only the remaining three (out of four) disks, with:
> # bioctl -c 5 -l /dev/sd2a,/dev/sd3a,/dev/sd4a softraid0
> 
> The result was a firmly reproducible kernel panic and a ddb console.
> I tried with 6.1 and 6.0 (and 5.8 :) ), just for kicks, but it seems
> this is an unsupported feature(tm) :).

I can reproduce this and will investigate - for future reference, reporting 
this as a bug and providing the trace would be helpful :)
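
For what it's worth, if you hit the panic again, transcribing roughly the 
following from the ddb prompt and including it in a sendbug(1) report should 
give us something to work with:

  ddb> show panic
  ddb> trace
  ddb> ps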
 
> When I specified the remaining three disks plus the new/clean one,
> softraid complained that 'not all chunks are of the native metadata',
> whatever this means.

Right - basically it is saying that the fourth chunk does not contain softraid 
metadata and is not part of the volume.

> But for some reason I liked this idea better, 'cause I wouldn't have to
> keep the failing disk connected.

As per earlier, you should be able to bring it up in a degraded state, then 
rebuild onto an additional chunk.
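
As a rough sketch (device names here are purely illustrative - assuming the 
volume attached as sd7, the failing chunk is sd5a and the replacement 
partition is sd6a, already set to fstype RAID in its disklabel):

  # bioctl -O /dev/sd5a sd7
  # bioctl -R /dev/sd6a sd7

The first forces the failing chunk offline (if it is not already), the second 
rebuilds onto the new chunk.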

> Anyway, it's all sync'd now, and the rebuild speed was quite good -- around
> 100MB/s -- so it basically finished overnight.

Good, glad to hear.
