Re: Rebuilding a degraded RAID5 softraid array

LÉVAI Dániel Tue, 20 Jun 2017 01:23:03 -0700

Joel Sing @ 2017-06-19T18:14:30 +0200:
> On Friday 16 June 2017 10:11:20 LÉVAI Dániel wrote:
> > Karel Gardas @ 2017-06-15T09:07:39 +0200:
> > > On Thu, Jun 15, 2017 at 7:04 AM, LEVAI Daniel <l...@ecentrum.hu> wrote:
> > [...]
> > 
> > > > Strangest thing is, if I boot with the 'bad' (=failing) drive as
> > > > part of the array, softraid brings the volume online (albeit
> > > > degraded) and I can even decrypt/mount the volume and use it (only
> > > > one drive being bad in the array of RAID5).  If I remove/replace
> > > > said failing drive, I'm not getting a degraded volume, just the
> > > > error about the missing chunk and that it refuses to bring it
> > > > online.
> > 
> > [...]
> > [...]
> > 
> > > So I see you do have two possibilities probably:
> > > 
> > > 1) IMHO more safe. If you do have enough SATA ports, then attach both
> > > your failing drive and your new drive to the system. Boot. OpenBSD
> > > should detect and attach RAID5 in degraded state and then you will be
> > > able to perform your rebuild (if your failing drive is not offline,
> > > you can use bioctl to offline it)
> > > or
> > 
> > Thanks Karel, this indeed did the trick. I'm still baffled however, that
> > the whole purpose of the RAID setup was diminished by a missing disk 8-\
> 
> This is certainly not expected behaviour - I've only skimmed/picked over parts
> of this thread; however, softraid will attempt to bring a degraded array online
> (which it seemed to be doing):
> 
> softraid0: not all chunks were provided; attempting to bring volume 1 online
> softraid0: trying to bring up sd7 degraded
> softraid0: sd7 is offline, will not be brought online
> 
> For some reason it was unable to bring it up in a degraded state (for example,
> multiple missing disks in a RAID 5 array, different metadata versions, etc.).
> Obviously the logging does not explain why this is the case, and we may not be
> able to reproduce the situation now.
> For future travellers: using dd to capture the start of each partition (which
> contains the softraid metadata) would allow this to be analysed further.


Hm, would it help to extract the info now from the three old disks and
the new one -- just to see if there's any anomaly? How many bytes should
one extract in this case?
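A minimal sketch of the dd capture Joel describes, assuming the softraid metadata lies within the first 1 MiB of each chunk. Softraid keeps it near the start of the partition, but 1 MiB here is a generous guess, not the exact metadata size. The helper name dump_chunk_head is hypothetical. It only reads from the device, and 64 KiB blocks keep the reads aligned on raw devices (e.g. /dev/rsd2a) with either 512-byte or 4K sectors.

```shell
#!/bin/sh
# Sketch: save the start of a softraid chunk so its metadata can be
# inspected later. ASSUMPTION: the metadata lies within the first 1 MiB
# of the partition (softraid keeps it near the start of each chunk).

# dump_chunk_head DEVICE OUTFILE  (hypothetical helper name)
dump_chunk_head() {
	dev=$1
	out=$2
	# 16 blocks x 65536 bytes = 1 MiB; reads only, never writes to $dev.
	dd if="$dev" of="$out" bs=65536 count=16 2>/dev/null
}
```

For example, `for p in sd2a sd3a sd4a sd5a; do dump_chunk_head /dev/r$p $p.head; done`, and then compare the resulting files with `hexdump -C`.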

Wouldn't differing metadata also hinder the assembly of the array
with four disks in this case (the fourth being the failed one; but also
see my answer below for your question about this)?

> > You in fact gave the advice at just the right time: I was about to
> > return the disk for a warranty replacement, and had I done that, I would
> > not have been able to repair the array. So thanks again, and I guess
> > you'll have a beer on me when you're around Budapest ;)
> 
> Just to clarify, you're saying that when you plugged all of the original disks
> back in, the array came up again correctly? And if this is correct, was this at
> boot time?

Yes, when I plugged the 'broken' disk back in, the array came up in a
degraded state during boot.

The order of events was as follows:
First, one of the disks went offline, and the array became degraded.
After that, through numerous reboots, it always came back degraded with
the failing disk Offline, but from the very first reboot after the
failure, softraid could no longer read e.g. the size of the failed disk.
When I ran `bioctl softraid0` it showed something like this
(sorry, this is not the actual output, I'm just trying to remember it):

softraid0 1 Degraded    9001777889280 sd8     RAID5
          0 Online      3000592678912 1:0.0   noencl <sd2a>
          1 Online      3000592678912 1:1.0   noencl <sd3a>
          2 Online      3000592678912 1:2.0   noencl <sd4a>
          3 Offline                 0 1:3.0   noencl <sd5a>

Softraid could, however, still read e.g. the serial number of the failed
disk.
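With output laid out like the table above, a small awk filter can pick out the chunks that are not Online. This is only a sketch: it assumes the column layout shown (which is itself reconstructed from memory), and failed_chunks and raid5_capacity are hypothetical helper names. The second helper is the rough arithmetic for why a single Offline chunk is survivable: a RAID5 volume of n chunks exposes about (n - 1) chunks' worth of space, because one chunk's worth goes to parity.

```shell
#!/bin/sh
# Sketch: report softraid chunks whose status is not "Online".
# Reads `bioctl softraid0` output on stdin. ASSUMPTION: chunk lines look
# like "<n> <status> <size> <chan:target> <encl> <<device>>", as above.
failed_chunks() {
	awk '
	# chunk lines begin with a numeric chunk id; volume lines do not
	$1 ~ /^[0-9]+$/ && $2 != "Online" {
		dev = $NF
		gsub(/[<>]/, "", dev)   # "<sd5a>" -> "sd5a"
		print $1, $2, dev
	}'
}

# Rough usable size of an n-chunk RAID5 volume: one chunk's worth of
# space holds parity. Space softraid reserves per chunk is ignored here.
raid5_capacity() {
	n=$1
	chunk_bytes=$2
	echo $(( (n - 1) * chunk_bytes ))
}
```

Fed the table above, `failed_chunks` prints `3 Offline sd5a`. `raid5_capacity 4 3000592678912` gives 9001778036736, slightly more than the volume size shown, since each chunk also gives up some space at its start for softraid's own use.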

> > (Just a side note: to attach the new disk, I had to remove one of the
> > system disks that are in a RAID1 setup, also with softraid. Softraid
> > however had no problem bringing up *that* RAID1 volume in a degraded
> > state with the missing disk...)
> 
> Right - that is how it should behave.

This happened to me once before with that same RAID1, and the
replacement and rebuild were error-free then, just like now, except
that this time it was an 'artificial' failure.

> > > 2) less safe (read: completely untested and unverified by reading the
> > > code on my side). Use bioctl -c 5 -l <your drives including a new one>
> > > <etc> to attach the RAID5 array including the new drive. Please do
> > > *NOT* force this. See if bioctl complains, for example about missing
> > > metadata, or if it automatically detects the new drive and starts a rebuild.
> > > 
> > > Generally speaking I'd use (1) since I used this in the past and had
> > > no issue with it.
> > 
> > Now this was more interesting. I tried e.g. (re)creating the RAID5 array
> > with only the remaining three (out of four) disks, with:
> > # bioctl -c 5 -l /dev/sd2a,/dev/sd3a,/dev/sd4a softraid0
> > 
> > The result was a reliably reproducible kernel panic and a ddb console.
> > I tried with 6.1 and 6.0 (and 5.8 :) ), just for kicks, but it seems
> > this is not a supported feature(tm) :).
> 
> I can reproduce this and will investigate - for future reference, reporting 
> this as a bug and providing the trace would be helpful :)
>  
> > When I specified the remaining three disks plus the new/clean one,
> > softraid complained that 'not all chunks are of the native metadata',
> > whatever this means.
> 
> Right - basically it is saying that the fourth chunk does not contain softraid
> metadata and is not part of the volume.
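That suggests a quick check of whether a partition carries softraid metadata at all. This is a sketch resting on loudly stated assumptions, to be verified against sys/dev/softraidvar.h for your release: that the metadata starts SR_META_OFFSET (16) sectors of 512 bytes into the chunk, that its first field is the 64-bit SR_MAGIC value 0x4d4152436372616d, and that it was written by a little-endian host (amd64, i386). Under those assumptions the bytes on disk are 6d 61 72 63 43 52 41 4d ("marcCRAM"). The name has_softraid_magic is hypothetical.

```shell
#!/bin/sh
# Sketch: does DEVICE carry softraid metadata?
# ASSUMPTIONS (check sys/dev/softraidvar.h): 512-byte sectors, metadata
# starting 16 sectors (8192 bytes) into the chunk, beginning with the
# 64-bit SR_MAGIC 0x4d4152436372616d written by a little-endian host.
SR_MAGIC_LE=6d6172634352414d

# has_softraid_magic DEVICE  (hypothetical helper name)
# Returns 0 if the magic is found, non-zero otherwise.
has_softraid_magic() {
	# Read one whole sector (raw devices need sector-sized reads),
	# then keep only its first 8 bytes as a plain hex string.
	magic=$(dd if="$1" bs=512 skip=16 count=1 2>/dev/null |
		od -A n -t x1 -N 8 | tr -d ' \n')
	[ "$magic" = "$SR_MAGIC_LE" ]
}
```

For a clean replacement disk, `has_softraid_magic /dev/rsd5a` would be expected to fail, which matches what the 'not all chunks are of the native metadata' message says.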

Got it! So the rebuild (bioctl -R) makes the new disk the new
member/chunk of this array, and there's basically no point in trying to
'recreate' the array with -c.
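Putting Karel's option 1 and that conclusion together, the sequence might look like this dry-run sketch. It only prints the bioctl commands unless RUN=1 is set in the environment. Assumptions: the degraded volume (sd8 here) is already attached, the new disk already has an 'a' partition of type RAID at least as large as the existing chunks, and `bioctl -O` is only needed if the failing chunk is still marked Online. The device name /dev/sd6a and the helpers run and rebuild_chunk are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch of replacing a failed softraid RAID5 chunk with bioctl.
# ASSUMPTIONS: the degraded volume is already attached, and the new disk
# has an "a" partition of type RAID at least as large as the old chunks.

# run CMD...: print CMD, or execute it when RUN=1 (hypothetical helper)
run() {
	if [ "${RUN:-0}" = 1 ]; then
		"$@"
	else
		printf '%s\n' "$*"
	fi
}

# rebuild_chunk VOLUME NEWCHUNK [FAILINGCHUNK]  (hypothetical helper)
rebuild_chunk() {
	vol=$1
	newchunk=$2
	oldchunk=${3:-}
	run bioctl "$vol"                    # current state: expect Degraded
	if [ -n "$oldchunk" ]; then
		run bioctl -O "$oldchunk" "$vol" # mark a still-Online bad chunk offline
	fi
	run bioctl -R "$newchunk" "$vol"     # rebuild onto the new chunk
	run bioctl "$vol"                    # should now report the rebuild
}
```

In this thread's case, where sd5a was already Offline, `rebuild_chunk sd8 /dev/sd6a` prints the three bioctl commands; passing the failing chunk as a third argument adds the `-O` step first.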


Daniel

-- 
LÉVAI Dániel
PGP key ID = 0x83B63A8F
Key fingerprint = DBEC C66B A47A DFA2 792D  650C C69B BE4C 83B6 3A8F
