ok, see below...

On Oct 23, 2009, at 8:14 PM, Adam Cheal wrote:

Here is example of the pool config we use:

# zpool status
 pool: pool002
state: ONLINE
scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52 2009
config:

       NAME         STATE     READ WRITE CKSUM
       pool002      ONLINE       0     0     0
         raidz2     ONLINE       0     0     0
           c9t18d0  ONLINE       0     0     0
           c9t17d0  ONLINE       0     0     0
           c9t55d0  ONLINE       0     0     0
           c9t13d0  ONLINE       0     0     0
           c9t15d0  ONLINE       0     0     0
           c9t16d0  ONLINE       0     0     0
           c9t11d0  ONLINE       0     0     0
           c9t12d0  ONLINE       0     0     0
           c9t14d0  ONLINE       0     0     0
           c9t9d0   ONLINE       0     0     0
           c9t8d0   ONLINE       0     0     0
           c9t10d0  ONLINE       0     0     0
           c9t29d0  ONLINE       0     0     0
           c9t28d0  ONLINE       0     0     0
           c9t27d0  ONLINE       0     0     0
           c9t23d0  ONLINE       0     0     0
           c9t25d0  ONLINE       0     0     0
           c9t26d0  ONLINE       0     0     0
           c9t21d0  ONLINE       0     0     0
           c9t22d0  ONLINE       0     0     0
           c9t24d0  ONLINE       0     0     0
           c9t19d0  ONLINE       0     0     0
         raidz2     ONLINE       0     0     0
           c9t30d0  ONLINE       0     0     0
           c9t31d0  ONLINE       0     0     0
           c9t32d0  ONLINE       0     0     0
           c9t33d0  ONLINE       0     0     0
           c9t34d0  ONLINE       0     0     0
           c9t35d0  ONLINE       0     0     0
           c9t36d0  ONLINE       0     0     0
           c9t37d0  ONLINE       0     0     0
           c9t38d0  ONLINE       0     0     0
           c9t39d0  ONLINE       0     0     0
           c9t40d0  ONLINE       0     0     0
           c9t41d0  ONLINE       0     0     0
           c9t42d0  ONLINE       0     0     0
           c9t44d0  ONLINE       0     0     0
           c9t45d0  ONLINE       0     0     0
           c9t46d0  ONLINE       0     0     0
           c9t47d0  ONLINE       0     0     0
           c9t48d0  ONLINE       0     0     0
           c9t49d0  ONLINE       0     0     0
           c9t50d0  ONLINE       0     0     0
           c9t51d0  ONLINE       0     0     0
           c9t52d0  ONLINE       0     0     0
       cache
         c8t2d0     ONLINE       0     0     0
         c8t3d0     ONLINE       0     0     0
       spares
         c9t20d0    AVAIL
         c9t43d0    AVAIL

errors: No known data errors

 pool: rpool
state: ONLINE
scrub: none requested
config:

       NAME          STATE     READ WRITE CKSUM
       rpool         ONLINE       0     0     0
         mirror      ONLINE       0     0     0
           c8t0d0s0  ONLINE       0     0     0
           c8t1d0s0  ONLINE       0     0     0

errors: No known data errors

...and here is a snapshot of the system using "iostat -indexC 5" during a scrub of "pool002" (c8 is onboard AHCI controller, c9 is LSI SAS 3801E):

                    extended device statistics       ---- errors ---
    r/s    w/s     kr/s  kw/s wait  actv wsvc_t asvc_t  %w   %b s/w h/w trn tot device
    0.0    0.0      0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8
    0.0    0.0      0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t0d0
    0.0    0.0      0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t1d0
    0.0    0.0      0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t2d0
    0.0    0.0      0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t3d0
 8738.7    0.0 555346.1   0.0  0.1 345.0    0.0   39.5   0 3875   0   1   1   2 c9

You see 345 entries in the controller's active queue. If the controller
rolls over at 511 active entries, that would explain why it soon begins
to have difficulty.

Meanwhile, it is providing 8,738 IOPS and 555 MB/sec, which is quite
respectable.
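A quick back-of-the-envelope check of the c9 controller line, sketched in
Python (the 511-entry limit is the hypothesis above, not a documented LSI
figure):

```python
# Aggregate numbers from the c9 line of the iostat output.
r_per_s = 8738.7            # read IOPS on controller c9
kr_per_s = 555346.1         # read throughput in KB/s
actv = 345.0                # commands sitting in the active queue
queue_limit = 511           # hypothesized queue rollover point

print(f"throughput : {kr_per_s / 1000:.0f} MB/s")    # ~555 MB/s
print(f"avg I/O    : {kr_per_s / r_per_s:.1f} KB")   # ~63.6 KB per read
print(f"headroom   : {queue_limit - actv:.0f} commands before rollover")
```

So the controller is already two-thirds of the way to the hypothesized
limit during a steady-state scrub; a burst would close the remaining gap.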

194.8 0.0 11936.9 0.0 0.0 7.9 0.0 40.3 0 87 0 0 0 0 c9t8d0

These disks are doing almost 200 read IOPS, but are not 100% busy.
The average I/O size is around 66 KB, which is not bad -- lots of little
I/Os would be worse -- but at only 11.9 MB/s, you are nowhere near the
media bandwidth. The average service time is 40.3 milliseconds, which is
not great, but may reflect contention on the channel.
So there is more capacity to accept I/O commands, but...
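The per-disk arithmetic, sketched with one representative disk line
(c9t12d0 from the output below; other disks are within a few percent):

```python
# One disk's line from the iostat output below (c9t12d0).
r_per_s = 194.4        # read IOPS
kr_per_s = 12902.3     # read KB/s
asvc_t_ms = 40.1       # average service time, ms

avg_io_kb = kr_per_s / r_per_s     # average read size
throughput_mb = kr_per_s / 1000    # per-disk read bandwidth

print(f"avg I/O size : {avg_io_kb:.1f} KB")        # ~66 KB
print(f"throughput   : {throughput_mb:.1f} MB/s")  # ~12.9 MB/s, far from media speed
```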

194.6 0.0 12927.9 0.0 0.0 7.6 0.0 38.9 0 86 0 0 0 0 c9t9d0
194.6 0.0 12622.6 0.0 0.0 8.1 0.0 41.7 0 90 0 0 0 0 c9t10d0
201.6 0.0 13350.9 0.0 0.0 8.0 0.0 39.5 0 90 0 0 0 0 c9t11d0
194.4 0.0 12902.3 0.0 0.0 7.8 0.0 40.1 0 88 0 0 0 0 c9t12d0
194.6 0.0 12902.3 0.0 0.0 7.7 0.0 39.3 0 88 0 0 0 0 c9t13d0
195.4 0.0 12479.0 0.0 0.0 8.5 0.0 43.4 0 92 0 0 0 0 c9t14d0
197.6 0.0 13107.4 0.0 0.0 8.1 0.0 41.0 0 92 0 0 0 0 c9t15d0
198.8 0.0 12918.1 0.0 0.0 8.2 0.0 41.4 0 92 0 0 0 0 c9t16d0
201.0 0.0 13350.3 0.0 0.0 8.1 0.0 40.4 0 91 0 0 0 0 c9t17d0
201.2 0.0 13325.0 0.0 0.0 7.8 0.0 38.5 0 88 0 0 0 0 c9t18d0
200.6 0.0 13021.5 0.0 0.0 8.2 0.0 40.7 0 91 0 0 0 0 c9t19d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t20d0
196.6 0.0 12991.9 0.0 0.0 7.6 0.0 38.8 0 85 0 0 0 0 c9t21d0
196.4 0.0 11499.3 0.0 0.0 8.0 0.0 40.5 0 89 0 0 0 0 c9t22d0
197.6 0.0 13030.3 0.0 0.0 8.0 0.0 40.3 0 90 0 0 0 0 c9t23d0
198.4 0.0 11535.8 0.0 0.0 7.8 0.0 39.3 0 87 0 0 0 0 c9t24d0
202.2 0.0 13096.3 0.0 0.0 7.9 0.0 39.3 0 89 0 0 0 0 c9t25d0
193.6 0.0 12457.4 0.0 0.0 8.3 0.0 42.8 0 90 0 0 0 0 c9t26d0
194.0 0.0 12799.9 0.0 0.0 8.2 0.0 42.1 0 91 0 0 0 0 c9t27d0
193.0 0.0 12748.8 0.0 0.0 7.9 0.0 41.0 0 88 0 0 0 0 c9t28d0
194.6 0.0 12863.9 0.0 0.0 7.9 0.0 40.6 0 89 0 0 0 0 c9t29d0
199.8 0.0 12849.1 0.0 0.0 7.8 0.0 39.0 0 87 0 0 0 0 c9t30d0
205.0 0.0 13631.9 0.0 0.0 7.8 0.0 38.2 0 88 0 0 0 0 c9t31d0
204.0 0.0 11674.3 0.0 0.0 7.9 0.0 38.6 0 88 0 0 0 0 c9t32d0
204.2 0.0 11339.9 0.0 0.0 8.1 0.0 39.7 0 89 0 0 0 0 c9t33d0
204.8 0.0 11569.7 0.0 0.0 7.7 0.0 37.7 0 86 0 0 0 0 c9t34d0
205.2 0.0 11268.7 0.0 0.0 7.9 0.0 38.6 0 88 0 0 0 0 c9t35d0
198.4 0.0 12814.9 0.0 0.0 7.8 0.0 39.5 0 88 0 0 0 0 c9t36d0
200.4 0.0 13222.3 0.0 0.0 7.9 0.0 39.2 0 88 0 0 0 0 c9t37d0
200.2 0.0 12324.5 0.0 0.0 7.4 0.0 37.1 0 85 0 0 0 0 c9t38d0
203.0 0.0 11928.8 0.0 0.0 7.7 0.0 37.7 0 88 0 0 0 0 c9t39d0
196.2 0.0 12966.3 0.0 0.0 7.5 0.0 38.0 0 84 0 0 0 0 c9t40d0
195.2 0.0 11544.8 0.0 0.0 7.9 0.0 40.5 0 89 0 0 0 0 c9t41d0
199.2 0.0 12601.8 0.0 0.0 7.8 0.0 38.9 0 88 0 0 0 0 c9t42d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c9t43d0
194.4 0.0 12940.7 0.0 0.0 7.6 0.0 39.2 0 86 0 0 0 0 c9t44d0
198.2 0.0 13120.6 0.0 0.0 7.5 0.0 38.1 0 86 0 0 0 0 c9t45d0
201.2 0.0 11713.6 0.0 0.0 7.8 0.0 39.0 0 89 0 0 0 0 c9t46d0
197.8 0.0 13196.7 0.0 0.0 7.4 0.0 37.4 0 85 0 0 0 0 c9t47d0
197.4 0.0 13094.3 0.0 0.0 7.6 0.0 38.6 0 87 0 0 0 0 c9t48d0
195.8 0.0 13017.5 0.0 0.0 7.5 0.0 38.4 0 85 0 1 1 2 c9t49d0
205.0 0.0 11384.4 0.0 0.0 8.0 0.0 39.0 0 89 0 0 0 0 c9t50d0
200.6 0.0 13286.6 0.0 0.0 7.5 0.0 37.2 0 85 0 0 0 0 c9t51d0
200.6 0.0 12931.6 0.0 0.0 7.9 0.0 39.5 0 89 0 0 0 0 c9t52d0
196.6 0.0 13055.9 0.0 0.0 7.5 0.0 38.3 0 87 0 0 0 0 c9t55d0

I had to abort the scrub shortly after this or we would start seeing the timeouts.

yep. If you set the queue depth to 7, does it complete without timeouts?
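One way to try that (an assumption on my part: that the "queue depth"
here is ZFS's per-vdev limit, controlled by the zfs_vdev_max_pending
tunable on OpenSolaris builds of this era -- verify the tunable name on
your release) is via /etc/system:

```
* Cap ZFS per-vdev queue depth at 7 outstanding I/Os
* (zfs_vdev_max_pending; the default on contemporary builds was 35)
set zfs:zfs_vdev_max_pending = 7
```

While experimenting, the same variable can also be changed on a live
system with mdb -kw, which avoids a reboot between trials.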
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
