Re: softraid, bioctl -c 1C failed array question

Stefan Sperling Sat, 25 Jan 2025 14:13:03 -0800

On Fri, Jan 24, 2025 at 02:53:06PM -0500, James Boyle wrote:
> Hello,
> 
> I was hoping to get a little help with bioctl and the 1C raid mode after a 
> drive failure.  The most recent error message I'm getting when trying to 
> start the array in a degraded mode is:
> # bioctl -c 1C -l /dev/sd0a softraid0
> softraid0: RAID 1C requires two or more chunks
> 
> Previously, the array had two identical Toshiba 16TB drives as sd0 and 
> sd1.  The array used partitions sd0a and sd1a.  One of those drives, sd1, 
> failed before Christmas.  I was able to run the degraded array without 
> issue.  After replacing the failed drive, I kicked off a rebuild using 
> bioctl -R.  The array came back to the optimal "Online" state.  Just a few 
> days ago, the second drive of the original pair failed.  I was able to 
> again start the array with only one working drive (sd0 is the failed 
> drive, sd1 is the new drive, sd2 & sd3 are part of another array):
> 
> # for X in sd{0,1,2,3,4,5,6} ; do bioctl -v ${X} ; done
> sd0: <ATA, TOSHIBA MG08ACA1, 0102>, serial 71H0A3SWFVGG
> sd1: <ATA, TOSHIBA MG08ACA1, 0103>, serial 44M0A008FVGG
> sd2: <ATA, WDC WD2000F9YZ-0, 01.0>, serial WD-WMC160D3WKSS
> sd3: <ATA, TOSHIBA HDWE150, FP2A>, serial 38EBK7BTF57D
> Volume      Status               Size Device  
> softraid0 0 Online      1999861775872 sd4     RAID1C 
>           0 Online      1999861775872 0:0.0   noencl <sd2a>
>                                                      'unknown serial'
>           1 Online      1999861775872 0:1.0   noencl <sd3a>
>                                                      'unknown serial'
> Volume      Status               Size Device  
> softraid0 1 Degraded   16000895729664 sd5     RAID1C 
>           0 Offline    16000895729664 1:0.0   noencl <sd0a>
>                                                      'unknown serial'
>           1 Online     16000895729664 1:1.0   noencl <sd1a>
>                                                      'unknown serial'
> 
> After that I shut the system down and removed the failed drive.  When the 
> system started again, what was previously sd1 had been initialized as sd0.  
> The other (boot/system) array started fine.  I was unable to start the 
> degraded array.  I got the error messages:
> 
> softraid0: trying to bring up sd5 degraded
> softraid0: trying to bring up sd5 degraded
> softraid0: sd5 is offline, will not be brought online
> softraid0: trying to bring up sd5 degraded
> softraid0: trying to bring up sd5 degraded
> softraid0: sd5 is offline, will not be brought online
> softraid0: RAID 1C requires two or more chunks
> softraid0: RAID 1C requires two or more chunks
> 
> At one point I put the failed drive back in to see if it could start.  I'm 
> afraid that may have been the wrong thing to do.


Before you removed the above sd0 drive, the state of the working drive
(then sd1) was "Online".

What is the current state of this working drive? Is it still Online now?
It doesn't sound like it is. Maybe it is now also in a degraded state, for
example due to a transient write error?
If it is still in Online state then the above errors look like a bug.

You will not be able to use bioctl to see the current state while the
volume isn't assembled. But there is the SR_DEBUG kernel option. A kernel
compiled with this option enabled should eventually print the state into
dmesg on a line which contains "scm_status".

The chunk (disk) state values stored in scm_status are defined in sys/dev/biovar.h:

#define BIOC_SDONLINE           0x00
#define BIOC_SDONLINE_S         "Online"
etc.
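When reading SR_DEBUG output, a small helper along these lines could translate a raw scm_status number into the name bioctl would print. This is only a sketch: the constant values and strings below are copied from my reading of the BIOC_SD* definitions in biovar.h and should be checked against the header in your own source tree, and chunk_status_str() is a hypothetical helper, not part of softraid.

```c
/*
 * Chunk (disk) states, mirroring the BIOC_SD* constants in
 * sys/dev/biovar.h. Assumed values: verify against your tree.
 */
#define BIOC_SDONLINE    0x00
#define BIOC_SDOFFLINE   0x01
#define BIOC_SDFAILED    0x02
#define BIOC_SDREBUILD   0x03
#define BIOC_SDHOTSPARE  0x04
#define BIOC_SDUNUSED    0x05
#define BIOC_SDSCRUB     0x06
#define BIOC_SDINVALID   0xff

/* Hypothetical helper: map a raw scm_status value to its bioctl name. */
const char *
chunk_status_str(unsigned int status)
{
	switch (status) {
	case BIOC_SDONLINE:	return "Online";
	case BIOC_SDOFFLINE:	return "Offline";
	case BIOC_SDFAILED:	return "Failed";
	case BIOC_SDREBUILD:	return "Rebuild";
	case BIOC_SDHOTSPARE:	return "Hot spare";
	case BIOC_SDUNUSED:	return "Unused";
	case BIOC_SDSCRUB:	return "Scrubbing";
	case BIOC_SDINVALID:	return "Invalid";
	default:		return "Unknown";	/* not a BIOC_SD* value */
	}
}
```

A value of 0x00 for the surviving chunk would mean it is still Online, which would point towards a bug rather than a second failure.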

The on-disk metadata structures can be found in sys/dev/softraidvar.h.
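As a first sanity check on the surviving chunk, one could test whether softraid metadata is still present where it is expected. The sketch below assumes the values SR_META_OFFSET = 16 blocks of 512 bytes and SR_MAGIC = 0x4d4152436372616d as I remember them from softraidvar.h (please verify), and assumes the metadata was written by a little-endian host such as amd64, since softraid stores it in host byte order. has_softraid_magic() is a hypothetical name; running it against a dd(1) copy of the partition rather than the raw device is the safer choice.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed values from sys/dev/softraidvar.h; verify against your tree. */
#define DEV_BSIZE	512
#define SR_META_OFFSET	16			/* in DEV_BSIZE blocks */
#define SR_MAGIC	0x4d4152436372616dULL	/* first field of sr_metadata */

/*
 * Hypothetical helper: returns 1 if the chunk read through fp has the
 * softraid magic at SR_META_OFFSET, 0 if not, -1 on error/short read.
 * Assumes the metadata was written by a little-endian host.
 */
int
has_softraid_magic(FILE *fp)
{
	unsigned char buf[8];
	uint64_t magic = 0;
	int i;

	if (fseek(fp, (long)SR_META_OFFSET * DEV_BSIZE, SEEK_SET) != 0)
		return -1;
	if (fread(buf, 1, sizeof(buf), fp) != sizeof(buf))
		return -1;
	/* Assemble the 8 little-endian bytes into a 64-bit value. */
	for (i = 7; i >= 0; i--)
		magic = (magic << 8) | buf[i];
	return magic == SR_MAGIC;
}
```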

> Is there a way to troubleshoot and restart the array with just the single 
> working drive as a degraded array again?

You'll need at least one chunk in Online state to perform a rebuild and
rescue the array. Otherwise, it seems the only officially supported way
out would be to create a fresh volume and restore the data from backup.

If your working drive is really still working, it should be possible
to extract the data somehow using raw disk reads to obtain an image of
the filesystem without the softraid metadata headers, and mounting that
image on a vnd(4) device with vnconfig(8) and then copying the files out
to a new array. I've never had to try that myself yet, fortunately.
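The header-skipping step could look roughly like the sketch below. It assumes SR_DATA_OFFSET works out to 528 blocks of 512 bytes (16 skipped + 64 metadata + 448 boot area, per my reading of softraidvar.h; verify before use), and copy_softraid_data() is a hypothetical helper. Note that RAID 1C volumes are encrypted, so for this particular array the bytes after the header would be ciphertext rather than a directly mountable filesystem image; the sketch only shows how the softraid header is skipped.

```c
#include <stdio.h>

/* Assumed layout from sys/dev/softraidvar.h; verify against your tree. */
#define DEV_BSIZE	512
#define SR_DATA_OFFSET	528	/* blocks: 16 + 64 + 320 + 128 */

/*
 * Hypothetical helper: copy everything after the softraid header
 * from in to out. Returns the number of bytes copied, -1 on error.
 */
long long
copy_softraid_data(FILE *in, FILE *out)
{
	char buf[64 * 1024];
	size_t n;
	long long total = 0;

	if (fseek(in, (long)SR_DATA_OFFSET * DEV_BSIZE, SEEK_SET) != 0)
		return -1;
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n)
			return -1;
		total += (long long)n;
	}
	return ferror(in) ? -1 : total;
}
```

In practice dd(1) with its skip= operand achieves the same thing without writing any code; the C version just makes the offset arithmetic explicit.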
