Re: [OpenIndiana-discuss] Huge ZFS root pool slowdown - diagnose root cause?

jason matthews Tue, 11 Dec 2018 09:28:25 -0800


This is your offending device:


$ pfexec smartctl -a -d sat,12 /dev/rdsk/c2t0d0s0 | grep Raw_Read
  1 Raw_Read_Error_Rate     0x000b   094   094   016    Pre-fail  Always       -       1376259

Try removing this disk.

The boot manager is in your BIOS. It currently points to one of your rpool disks. Go into the boot manager, pick the other disk, and see how it boots then. You can either set this up as a one-time boot or change the setting so it is persistent.


Life should be better with the sick disk removed.


j.

On 12/11/18 8:16 AM, Lou Picciano wrote:


I have now (finally) managed to get perhaps the key bit of reporting from 
smartctl. Does this seem adequately diagnostic?
(I am fully satisfied to replace the drive; I just want to be sure I’ve run to 
ground any potential root causes.)

$ pfexec smartctl -a -d sat,12 /dev/rdsk/c2t0d0s0 | grep Raw_Read
   1 Raw_Read_Error_Rate     0x000b   094   094   016    Pre-fail  Always       -       1376259
$ pfexec smartctl -a -d sat,12 /dev/rdsk/c2t1d0s0 | grep Raw_Read
   1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0

Above seems consistent with all the read errors I see at boot.
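
For comparing the two mirror halves side by side, the raw value can be pulled out of the smartctl attribute table with a small filter. This is only a sketch: the function name raw_read_errors is made up for illustration, and it assumes the standard smartctl attribute layout in which RAW_VALUE is the tenth whitespace-separated column. How the raw number should be interpreted varies by drive vendor, so treat a large difference between otherwise identical disks as a hint rather than proof.

```shell
#!/bin/sh
# Print the RAW_VALUE column of SMART attribute 1 (Raw_Read_Error_Rate)
# from `smartctl -a` or `smartctl -A` output supplied on stdin.
# Assumes the usual 10-column attribute table layout.
raw_read_errors() {
    awk '$1 == 1 && $2 == "Raw_Read_Error_Rate" { print $10 }'
}

# Hypothetical usage on the two rpool disks from this thread:
#   for d in c2t0d0s0 c2t1d0s0; do
#       printf '%s ' "$d"
#       pfexec smartctl -a -d sat,12 /dev/rdsk/$d | raw_read_errors
#   done
```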

What happens if you go into the boot manager and manually select a boot disk? 
If the problem is with a single drive, then the other drive should boot 
normally, right? Try booting from both drives, selecting each one manually.

That’s also interesting. With the hundreds of read errors at boot up, the boot 
manager is never even (visibly) presented. I guess I could try this again from 
a boot from USB image...

you can speed up the scrub with:

echo zfs_scrub_delay/W0x0 |mdb -kw

echo zfs_scan_min_time_ms/W0x0 |mdb -kw
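
Since mdb -kw writes straight into the running kernel, it may be worth noting the current values first so they can be put back later (the change does not survive a reboot in any case). A minimal sketch, assuming these tunables are 32-bit integers readable with mdb's /D format; the helper name show_tunable is invented here:

```shell
#!/bin/sh
# Print the current value of a kernel integer tunable in decimal.
# Read-only: uses `mdb -k` without the -w flag.
# Needs sufficient privilege to open the live kernel.
show_tunable() {
    echo "$1/D" | mdb -k
}

# Hypothetical usage, before applying the /W writes:
#   show_tunable zfs_scrub_delay
#   show_tunable zfs_scan_min_time_ms
```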

Good commands for reference. I was unaware of these! But, even with scrub 
canceled for the moment, am still seeing virtually continuous drive controller 
traffic.

You also wanted to see:
$ iostat -nMxC 5
                     extended device statistics
     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
     0.0  962.3    0.0   11.3 15.7  0.2   16.3    0.2   5  23 c2
     0.0  398.4    0.0    4.3  7.1  0.1   17.9    0.2  83   6 c2t0d0
     0.0  415.2    0.0    4.2  8.6  0.1   20.6    0.2  87   9 c2t1d0
     0.0   40.2    0.0    0.7  0.0  0.0    0.0    0.4   0   2 c2t2d0
     0.0   40.4    0.0    0.7  0.0  0.0    0.0    1.1   0   4 c2t3d0
     0.0   34.4    0.0    0.7  0.0  0.0    0.0    0.3   0   1 c2t4d0
     0.0   33.6    0.0    0.7  0.0  0.0    0.0    0.3   0   1 c2t5d0

Again, I assume the symmetry in findings between t0 and t1 is due to their 
mirrored status… But it doesn’t seem to help in differentiating the offending 
device. (For comparison, t2-t5 are the data pool.) There is essentially zero 
‘user’ activity on either data or root pools...
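
To pick out the devices with a high %w from a longer iostat run, the table can be filtered with awk. This is a rough sketch that assumes the 11-column iostat -xn layout shown above; the function name busy_wait_devices and the default threshold of 50 are arbitrary choices. As noted, it will flag both halves of a mirror together, so it separates the root pool from the data pool but does not by itself single out the failing disk.

```shell
#!/bin/sh
# Read `iostat -xn`-style rows on stdin and print "device %w" for every
# row whose %w (column 9) is at or above the given threshold.
# Header lines are skipped because their ninth field is not a number.
busy_wait_devices() {
    awk -v limit="${1:-50}" \
        'NF == 11 && $9 ~ /^[0-9]+$/ && $9 + 0 >= limit { print $11, $9 }'
}

# Hypothetical usage:
#   iostat -nMxC 5 2 | busy_wait_devices 50
```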
_______________________________________________
openindiana-discuss mailing list
openindiana-discuss@openindiana.org
https://openindiana.org/mailman/listinfo/openindiana-discuss

