On Fri, Jan 24, 2025 at 02:53:06PM -0500, James Boyle wrote:
> Hello,
>
> I was hoping to get a little help with bioctl and the 1C raid mode after a
> drive failure. The most recent error message I'm getting when trying to
> start the array in a degraded mode is:
> # bioctl -c 1C -l /dev/sd0a softraid0
> softraid0: RAID 1C requires two or more chunks
>
> Previously, the array had two identical Toshiba 16TB drives as sd0 and
> sd1. The array used partitions sd0a and sd1a. One of those drives, sd1,
> failed before Christmas. I was able to run the degraded array without
> issue. After replacing the failed drive, I kicked off a rebuild using
> bioctl -R. The array came back to the optimal "Online" state. Just a few
> days ago, the second drive of the original pair failed. I was able to
> again start the array with only one working drive (sd0 is the failed
> drive, sd1 is the new drive, sd2 & sd3 are part of another array):
>
> # for X in sd{0,1,2,3,4,5,6} ; do bioctl -v ${X} ; done
> sd0: <ATA, TOSHIBA MG08ACA1, 0102>, serial 71H0A3SWFVGG
> sd1: <ATA, TOSHIBA MG08ACA1, 0103>, serial 44M0A008FVGG
> sd2: <ATA, WDC WD2000F9YZ-0, 01.0>, serial WD-WMC160D3WKSS
> sd3: <ATA, TOSHIBA HDWE150, FP2A>, serial 38EBK7BTF57D
> Volume      Status               Size Device
> softraid0 0 Online      1999861775872 sd4     RAID1C
>           0 Online      1999861775872 0:0.0   noencl <sd2a>
>                                               'unknown serial'
>           1 Online      1999861775872 0:1.0   noencl <sd3a>
>                                               'unknown serial'
> Volume      Status               Size Device
> softraid0 1 Degraded   16000895729664 sd5     RAID1C
>           0 Offline    16000895729664 1:0.0   noencl <sd0a>
>                                               'unknown serial'
>           1 Online     16000895729664 1:1.0   noencl <sd1a>
>                                               'unknown serial'
>
> After that I shut the system down, removed the failed drive. When the
> system started again, what was previously sd1 had been initialized as sd0.
> The other (boot/system) array started fine. I was unable to start the
> degraded array. I got the error messages:
>
> softraid0: trying to bring up sd5 degraded
> softraid0: trying to bring up sd5 degraded
> softraid0: sd5 is offline, will not be brought online
> softraid0: trying to bring up sd5 degraded
> softraid0: trying to bring up sd5 degraded
> softraid0: sd5 is offline, will not be brought online
> softraid0: RAID 1C requires two or more chunks
> softraid0: RAID 1C requires two or more chunks
>
> At one point I put the failed drive back in to see if it could start. I'm
> afraid that may have been the wrong thing to do.

Before you removed the above sd0 drive, the state of the working drive
(then sd1) was "Online". What is the current state of this working drive?
Is it still Online now? It doesn't sound like it is. Maybe it's now also
in a degraded state, for example due to a transient write error?
If it is still in Online state then the above errors look like a bug.

You will not be able to use bioctl to see the current state while the
volume isn't assembled. But there is the SR_DEBUG kernel option. A kernel
compiled with this option enabled should eventually print the state into
dmesg on a line which contains "scm_status".
The volume state values are defined in sys/dev/biovar.h:
#define BIOC_SDONLINE		0x00
#define BIOC_SDONLINE_S		"Online"
etc.
The on-disk metadata structures can be found in sys/dev/softraidvar.h.

> Is there a way to troubleshoot and restart the array with just the single
> working drive as a degraded array again?

You'll need at least one chunk in Online state to perform a rebuild and
rescue the array. Otherwise, it seems the only officially supported way
out would be to create a fresh volume and restore the data from backup.

If your working drive is really still working, it should be possible to
extract the data somehow using raw disk reads to obtain an image of the
filesystem without the softraid metadata headers, and mounting that image
on a vnd(4) device with vnconfig(8) and then copying the files out to a
new array. I've never had to try that myself yet, fortunately.
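
For what it's worth, the "raw disk reads" step could look roughly like the
untested sketch below. It only shows how to skip the softraid header area
with dd. The function name extract_chunk_data is made up for illustration,
and the offset of 528 sectors is my assumption of what SR_DATA_OFFSET works
out to; please verify it against sys/dev/softraidvar.h for your release.
Also keep in mind that 1C is RAID 1 plus CRYPTO, so what comes out of the
data area this way would presumably still be encrypted, and the image needs
as much free space as the chunk itself.

```shell
#!/bin/sh
# Sketch only: copy the data area of a softraid chunk into an image file.
# ASSUMPTION: data starts 528 sectors (of 512 bytes) into the chunk.
# Check SR_DATA_OFFSET in sys/dev/softraidvar.h before trusting this value.
SR_DATA_OFFSET_SECTORS=${SR_DATA_OFFSET_SECTORS:-528}

# extract_chunk_data is a hypothetical helper name, not an existing tool.
extract_chunk_data() {
	src=$1	# raw chunk partition, e.g. /dev/rsd0a (a plain file works too)
	dst=$2	# output image file
	# bs=512 keeps the skip count in sectors; this is simple but slow.
	# noerror,sync continues past read errors and pads short blocks.
	dd if="$src" of="$dst" bs=512 skip="$SR_DATA_OFFSET_SECTORS" \
	    conv=noerror,sync
}
```

Something like "extract_chunk_data /dev/rsd0a /big/disk/chunk.img" would
then produce the image, which could be attached with "vnconfig vnd0
/big/disk/chunk.img" and inspected with "disklabel vnd0" before trying a
read-only mount. The device and path names here are examples only.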