On 10/18/22 09:35, se...@0x0000.su wrote:
I have a RAID1 volume (one of two on this PC) with 2 disks.
# disklabel sd5
# /dev/rsd5c:
type: SCSI
disk: SCSI disk
label: SR RAID 1
duid: 7a03a84165b3d165
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 243201
total sectors: 3907028640
boundstart: 0
boundend: 3907028640
drivedata: 0
16 partitions:
#                size           offset  fstype [fsize bsize   cpg]
  a:       3907028608                0  4.2BSD   8192 65536 52270 # /home/vmail
  c:       3907028640                0  unused
Recently I got an error in dmesg
mail# dmesg | grep retry
sd5: retrying read on block 767483392
(This happened during a copy operation.)
The system then marked the volume as degraded:
mail# bioctl sd5
Volume      Status               Size Device
softraid0 1 Degraded    2000398663680 sd5     RAID1
          0 Online      2000398663680 1:0.0   noencl <sd2a>
          1 Offline     2000398663680 1:1.0   noencl <sd3a>
I tried to reread this sector (and a few around it) with dd to confirm
that the sector is unreadable:
mail# dd if=/dev/rsd3c of=/dev/null bs=512 count=16 skip=767483384
16+0 records in
16+0 records out
8192 bytes transferred in 0.025 secs (316536 bytes/sec)
mail# dd if=/dev/rsd5c of=/dev/null bs=512 count=16 skip=767483384
16+0 records in
16+0 records out
8192 bytes transferred in 0.050 secs (161303 bytes/sec)
but the error did not appear.
Is there any way to check whether a sector is bad (preferably on the fly)?
If this is not a disk error (I'm going to replace the cables just in case),
should I just bring the disk back online with
bioctl -R /dev/sd3a sd5
?
You made some assumptions about the math that the disk uses vs. the math
dd uses, and I'm not sure I agree with them. I'd suggest doing a dd read
of the entire disk (rsd3c), rather than trying to read just the one
sector. Remember, there's an offset between the sector numbers of sd5 (the
softraid volume) and those of sd2 & sd3, where sd5 lives. So I'd kinda expect your
sd3 check to pass because you missed the bad spot, and I'd expect your
sd5 check to pass because the bad drive is locked out of the array and
no longer a problem.
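
To make the offset math concrete, here is a rough sketch, not a verified recipe. It assumes the block number in dmesg is a 512-byte sector relative to the start of sd5, that the softraid data area starts 528 sectors into the member partition (an assumption about the softraid on-disk layout, check it for your release), and that sd3a starts at sector 64 of sd3 (hypothetical, the thread does not show `disklabel sd3`). The helper name raw_sector is made up for this example.

```shell
#!/bin/sh
# raw_sector VOLUME_BLOCK PART_OFFSET DATA_OFFSET
# Translate a block number on the softraid volume (sd5) into a sector
# number on the raw member disk (rsd3c). All values are 512-byte sectors.
raw_sector() {
    echo $(( $1 + $2 + $3 ))
}

# Hypothetical values: 64 = start of sd3a taken from `disklabel sd3`,
# 528 = assumed softraid data offset inside the member partition.
# target=$(raw_sector 767483392 64 528)
#
# Re-read 16 sectors around the computed spot on the member disk:
# dd if=/dev/rsd3c of=/dev/null bs=512 count=16 skip=$(( target - 8 ))
#
# Or skip the arithmetic and read the whole member disk, watching dmesg:
# dd if=/dev/rsd3c of=/dev/null bs=1m
```

The whole-disk read is slower but does not depend on any of the offsets being right, which is why it is the safer check.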
IF you are a cheap ******* or the machine is in another country, you might
want to try dd'ing zeros and 0xff's over the entire disk before putting it
back in the array. That sometimes makes the drive discover a bad spot,
lock it out, and replace it with a spare sector. I've had some success with
this process, actually, though it's a bad idea. :)
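
A minimal sketch of that overwrite pass, assuming the failed member is sd3 and that nothing on it needs to survive (this destroys everything on the target, including the softraid metadata). The function name fill_pattern and its arguments are made up for this example; the first dd bounds how much data is produced, tr turns the zero bytes into the wanted fill byte, and the second dd writes the result out.

```shell
#!/bin/sh
# fill_pattern TARGET OCTAL_BYTE BLOCK_SIZE COUNT
# Write COUNT blocks of BLOCK_SIZE bytes, each byte equal to OCTAL_BYTE
# (e.g. 000 or 377), to TARGET. DESTRUCTIVE when TARGET is a disk device.
fill_pattern() {
    target=$1; octal=$2; bs=$3; count=$4
    dd if=/dev/zero bs="$bs" count="$count" 2>/dev/null |
        tr '\000' "\\$octal" |
        dd of="$target" bs="$bs" 2>/dev/null
}

# Example against the raw member disk (sector count is hypothetical, take
# "total sectors" from `disklabel sd3`):
# fill_pattern /dev/rsd3c 000 512 "$total_sectors"   # pass 1: zeros
# fill_pattern /dev/rsd3c 377 512 "$total_sectors"   # pass 2: 0xff
# A larger block size with a matching smaller count will be much faster.
```

Watch dmesg while it runs. If write errors keep showing up, the disk should be replaced rather than rebuilt with bioctl -R.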
Nick.