Hi again,

I tried to enable the debug messages by going to 
/usr/src/sys/arch/amd64/conf, copying GENERIC.MP to 
GENERIC.MP.BIODEBUG, and making this change:
--- GENERIC.MP  Wed Feb  5 23:54:35 2025
+++ GENERIC.MP.BIODEBUG Wed Feb  5 15:38:10 2025
@@ -6,4 +6,6 @@
 #option        MP_LOCKDEBUG
 #option        WITNESS
 
+option  SR_DEBUG
+
 cpu*           at mainbus?

Then, I recompiled the kernel and rebooted.  I didn't see any debug 
messages related to softraid, even though my system partitions are using 
a RAID 1C device.
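
One thing worth ruling out at this point is that the machine booted the 
old kernel rather than the new one.  A minimal sketch of that check, 
assuming the first line of "sysctl -n kern.version" carries the config 
name in parentheses (roughly "OpenBSD X.Y (CONFIGNAME) #N: date").  The 
helper name kernel_config_name is made up for illustration:

```shell
# Hypothetical helper: print the text inside the first pair of
# parentheses on the first line of a kern.version-style string.
kernel_config_name() {
    printf '%s\n' "$1" | sed -n '1s/^[^(]*(\([^)]*\)).*$/\1/p'
}

# Usage on the running system (assumes sysctl exposes kern.version):
#   kernel_config_name "$(sysctl -n kern.version)"
# A kernel built from the config above should report GENERIC.MP.BIODEBUG.
```

If this prints plain GENERIC.MP, the option never made it into the 
running kernel, whatever the softraid source does with it.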

# bioctl -vi softraid0
Volume      Status               Size Device  
softraid0 0 Online      1999861775872 sd5     RAID1C 
          0 Online      1999861775872 0:0.0   noencl <sd2a>
                                                     'unknown serial'
          1 Online      1999861775872 0:1.0   noencl <sd3a>
                                                     'unknown serial'

So, I checked out the source in /usr/src/sys/dev.  It seems that several 
of the RAID disciplines have ifdef statements to handle SR_DEBUG, but not 
the RAID 1C discipline:

# grep SR_DEBUG softraid*c |sort |uniq 
softraid.c:#endif /* SR_DEBUG */
softraid.c:#ifdef SR_DEBUG
softraid_crypto.c:#endif        /* SR_DEBUG */
softraid_crypto.c:#ifdef SR_DEBUG0
softraid_raid1.c:#ifdef SR_DEBUG
softraid_raid5.c:#ifdef SR_DEBUG
# grep DEBUG softraid_raid1c.c
# 

So, I'm thinking that I set the option correctly, but perhaps the 
debugging isn't available for RAID 1C?
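
To see at a glance which discipline files never mention the symbol, 
grep's -L flag lists the files with no matching line.  A small sketch; 
the wrapper name files_without_macro is only for illustration.  A file 
that lacks the literal string can still produce debug output through 
macros defined in a shared header, so this shows only where the symbol 
is spelled out, not where debugging is or isn't available:

```shell
# Hypothetical wrapper: list the files that contain no line matching
# the given macro name.
files_without_macro() {
    macro=$1
    shift
    grep -L -e "$macro" -- "$@"
}

# Usage, from /usr/src/sys/dev:
#   files_without_macro SR_DEBUG softraid*.c
```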

Also, I got hold of another drive that I can copy the entire original 
16TB partition to and run tests against.  Is there a procedure where I 
could copy out the correct byte range to my new drive with dd and try to 
mount it using the simple CRYPTO discipline (bioctl -c C instead of bioctl 
-c 1C)?
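
The copy step described here would look roughly like the sketch below.  
It assumes the data region starts a fixed number of 512-byte sectors 
into the chunk partition, after the softraid metadata.  That offset is 
left as a parameter because it is not confirmed here and would have to 
come from sys/dev/softraidvar.h.  The function name and the paths in 
the usage comment are hypothetical, and for a crypto volume the result 
is still encrypted data, so this covers only the copy, not decryption 
or mounting:

```shell
# Hypothetical helper: copy everything after the first OFFSET sectors
# of SRC into DST, counting in 512-byte sectors.
copy_data_region() {
    src=$1
    dst=$2
    offset_sectors=$3   # assumption: metadata size in 512-byte sectors
    # bs=512 keeps the skip arithmetic simple but is slow on large disks.
    dd if="$src" of="$dst" bs=512 skip="$offset_sectors" 2>/dev/null
}

# Usage (paths and OFFSET are placeholders, not verified values):
#   copy_data_region /dev/rsd0a /path/to/data.img "$OFFSET"
```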

Thank you,
--James

On Sat, 25 Jan 2025, Stefan Sperling wrote:

> Date: Sat, 25 Jan 2025 23:12:01 +0100
> From: Stefan Sperling <s...@stsp.name>
> To: James Boyle <jbo...@canonic.net>
> Cc: misc@openbsd.org
> Subject: Re: softraid, bioctl -c 1C failed array question
> 
> On Fri, Jan 24, 2025 at 02:53:06PM -0500, James Boyle wrote:
> > Hello,
> > 
> > I was hoping to get a little help with bioctl and the 1C raid mode after a 
> > drive failure.  The most recent error message I'm getting when trying to 
> > start the array in a degraded mode is:
> > # bioctl -c 1C -l /dev/sd0a softraid0
> > softraid0: RAID 1C requires two or more chunks
> > 
> > Previously, the array had two identical Toshiba 16TB drives as sd0 and 
> > sd1.  The array used partitions sd0a and sd1a.  One of those drives, sd1, 
> > failed before Christmas.  I was able to run the degraded array without 
> > issue.  After replacing the failed drive, I kicked off a rebuild using 
> > bioctl -R.  The array came back to the optimal "Online" state.  Just a few 
> > days ago, the second drive of the original pair failed.  I was able to 
> > again start the array with only one working drive (sd0 is the failed 
> > drive, sd1 is the new drive, sd2 & sd3 are part of another array):
> > 
> > # for X in sd{0,1,2,3,4,5,6} ; do bioctl -v ${X} ; done
> > sd0: <ATA, TOSHIBA MG08ACA1, 0102>, serial 71H0A3SWFVGG
> > sd1: <ATA, TOSHIBA MG08ACA1, 0103>, serial 44M0A008FVGG
> > sd2: <ATA, WDC WD2000F9YZ-0, 01.0>, serial WD-WMC160D3WKSS
> > sd3: <ATA, TOSHIBA HDWE150, FP2A>, serial 38EBK7BTF57D
> > Volume      Status               Size Device  
> > softraid0 0 Online      1999861775872 sd4     RAID1C 
> >           0 Online      1999861775872 0:0.0   noencl <sd2a>
> >                                                      'unknown serial'
> >           1 Online      1999861775872 0:1.0   noencl <sd3a>
> >                                                      'unknown serial'
> > Volume      Status               Size Device  
> > softraid0 1 Degraded   16000895729664 sd5     RAID1C 
> >           0 Offline    16000895729664 1:0.0   noencl <sd0a>
> >                                                      'unknown serial'
> >           1 Online     16000895729664 1:1.0   noencl <sd1a>
> >                                                      'unknown serial'
> > 
> > After that I shut the system down and removed the failed drive.  When the 
> > system started again, what was previously sd1 had been initialized as sd0.  
> > The other (boot/system) array started fine.  I was unable to start the 
> > degraded array.  I got the error messages:
> > 
> > softraid0: trying to bring up sd5 degraded
> > softraid0: trying to bring up sd5 degraded
> > softraid0: sd5 is offline, will not be brought online
> > softraid0: trying to bring up sd5 degraded
> > softraid0: trying to bring up sd5 degraded
> > softraid0: sd5 is offline, will not be brought online
> > softraid0: RAID 1C requires two or more chunks
> > softraid0: RAID 1C requires two or more chunks
> > 
> > At one point I put the failed drive back in to see if it could start.  I'm 
> > afraid that may have been the wrong thing to do.
> 
> Before you removed the above sd0 drive, the state of the working drive
> (then sd1) was "Online".
> 
> What is the current state of this working drive? Is it still Online now?
> It doesn't sound like it is. Maybe it's now also in degraded state, for
> example due to a transient write error?
> If it is still in Online state then the above errors look like a bug.
> 
> You will not be able to use bioctl to see the current state while the
> volume isn't assembled. But there is the SR_DEBUG kernel option. A kernel
> compiled with this option enabled should eventually print the state into
> dmesg on a line which contains "scm_status".
> 
> The volume state values are defined in sys/dev/biovar.h: 
> 
> #define BIOC_SDONLINE         0x00
> #define BIOC_SDONLINE_S               "Online"
> etc.
> 
> The on-disk meta data structures can be found in sys/dev/softraidvar.h.
> 
> > Is there a way to troubleshoot and restart the array with just the single 
> > working drive as a degraded array again?
> 
> You'll need at least one chunk in Online state to perform a rebuild and
> rescue the array. Otherwise, it seems the only officially supported way
> out would be to create a fresh volume and restore the data from backup.
> 
> If your working drive is really still working, it should be possible
> to extract the data somehow using raw disk reads to obtain an image of
> the filesystem without the softraid meta data headers, and mounting that
> image on a vnd(4) device with vnconfig(8) and then copying the files out
> to a new array. I've never had to try that myself yet, fortunately.
> 
