ok, see below...
On Oct 23, 2009, at 8:14 PM, Adam Cheal wrote:
Here is an example of the pool config we use:
# zpool status
  pool: pool002
 state: ONLINE
 scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52 2009
config:

        NAME         STATE     READ WRITE CKSUM
        pool002      ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c9t18d0  ONLINE       0     0     0
            c9t17d0  ONLINE       0     0     0
            c9t55d0  ONLINE       0     0     0
            c9t13d0  ONLINE       0     0     0
            c9t15d0  ONLINE       0     0     0
            c9t16d0  ONLINE       0     0     0
            c9t11d0  ONLINE       0     0     0
            c9t12d0  ONLINE       0     0     0
            c9t14d0  ONLINE       0     0     0
            c9t9d0   ONLINE       0     0     0
            c9t8d0   ONLINE       0     0     0
            c9t10d0  ONLINE       0     0     0
            c9t29d0  ONLINE       0     0     0
            c9t28d0  ONLINE       0     0     0
            c9t27d0  ONLINE       0     0     0
            c9t23d0  ONLINE       0     0     0
            c9t25d0  ONLINE       0     0     0
            c9t26d0  ONLINE       0     0     0
            c9t21d0  ONLINE       0     0     0
            c9t22d0  ONLINE       0     0     0
            c9t24d0  ONLINE       0     0     0
            c9t19d0  ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c9t30d0  ONLINE       0     0     0
            c9t31d0  ONLINE       0     0     0
            c9t32d0  ONLINE       0     0     0
            c9t33d0  ONLINE       0     0     0
            c9t34d0  ONLINE       0     0     0
            c9t35d0  ONLINE       0     0     0
            c9t36d0  ONLINE       0     0     0
            c9t37d0  ONLINE       0     0     0
            c9t38d0  ONLINE       0     0     0
            c9t39d0  ONLINE       0     0     0
            c9t40d0  ONLINE       0     0     0
            c9t41d0  ONLINE       0     0     0
            c9t42d0  ONLINE       0     0     0
            c9t44d0  ONLINE       0     0     0
            c9t45d0  ONLINE       0     0     0
            c9t46d0  ONLINE       0     0     0
            c9t47d0  ONLINE       0     0     0
            c9t48d0  ONLINE       0     0     0
            c9t49d0  ONLINE       0     0     0
            c9t50d0  ONLINE       0     0     0
            c9t51d0  ONLINE       0     0     0
            c9t52d0  ONLINE       0     0     0
        cache
          c8t2d0     ONLINE       0     0     0
          c8t3d0     ONLINE       0     0     0
        spares
          c9t20d0    AVAIL
          c9t43d0    AVAIL

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c8t0d0s0  ONLINE       0     0     0
            c8t1d0s0  ONLINE       0     0     0

errors: No known data errors
...and here is a snapshot of the system taken with "iostat -indexC 5"
during a scrub of "pool002" (c8 is the onboard AHCI controller, c9 is
the LSI SAS 3801E):
                     extended device statistics       ---- errors ----
    r/s   w/s      kr/s  kw/s wait   actv wsvc_t asvc_t  %w   %b s/w h/w trn tot device
    0.0   0.0       0.0   0.0  0.0    0.0    0.0    0.0   0    0   0   0   0   0 c8
    0.0   0.0       0.0   0.0  0.0    0.0    0.0    0.0   0    0   0   0   0   0 c8t0d0
    0.0   0.0       0.0   0.0  0.0    0.0    0.0    0.0   0    0   0   0   0   0 c8t1d0
    0.0   0.0       0.0   0.0  0.0    0.0    0.0    0.0   0    0   0   0   0   0 c8t2d0
    0.0   0.0       0.0   0.0  0.0    0.0    0.0    0.0   0    0   0   0   0   0 c8t3d0
 8738.7   0.0  555346.1   0.0  0.1  345.0    0.0   39.5   0 3875   0   1   1   2 c9
You see 345 entries in the active queue. If the controller rolls over at
511 active entries, that would explain why it soon begins to have
difficulty.
Meanwhile, it is delivering 8,738 IOPS and 555 MB/sec, which is quite
respectable.
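As a back-of-the-envelope check (a rough sketch, assuming the 44 active
data disks in the two raidz2 vdevs share the controller queue evenly):

    # average read size across the controller: kr/s divided by r/s
    echo "scale=1; 555346.1 / 8738.7" | bc   # ~63.5 KB per I/O
    # 345 queued commands spread across 44 busy disks
    echo "scale=1; 345 / 44" | bc            # ~7.8 commands per disk

which agrees nicely with the per-disk actv of roughly 8 in the rows below.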
  194.8   0.0   11936.9   0.0  0.0    7.9    0.0   40.3   0   87   0   0   0   0 c9t8d0
These disks are doing almost 200 read IOPS, but are not 100% busy.
Average I/O size is about 61 KB, which is not bad (lots of little I/Os
would be worse), but at only 11.9 MB/s you are nowhere near the media
bandwidth. Average service time is 40.3 milliseconds, which is not
great, but may reflect contention in the channel.
So there is more capacity to accept I/O commands, but...
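(For the curious, a quick replay of the c9t8d0 row above through bc,
just to show where those figures come from:

    echo "scale=1; 11936.9 / 194.8" | bc   # ~61 KB average read size
    echo "scale=1; 11936.9 / 1000" | bc    # ~11.9 MB/s from this disk
)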
  194.6   0.0   12927.9   0.0  0.0    7.6    0.0   38.9   0   86   0   0   0   0 c9t9d0
  194.6   0.0   12622.6   0.0  0.0    8.1    0.0   41.7   0   90   0   0   0   0 c9t10d0
  201.6   0.0   13350.9   0.0  0.0    8.0    0.0   39.5   0   90   0   0   0   0 c9t11d0
  194.4   0.0   12902.3   0.0  0.0    7.8    0.0   40.1   0   88   0   0   0   0 c9t12d0
  194.6   0.0   12902.3   0.0  0.0    7.7    0.0   39.3   0   88   0   0   0   0 c9t13d0
  195.4   0.0   12479.0   0.0  0.0    8.5    0.0   43.4   0   92   0   0   0   0 c9t14d0
  197.6   0.0   13107.4   0.0  0.0    8.1    0.0   41.0   0   92   0   0   0   0 c9t15d0
  198.8   0.0   12918.1   0.0  0.0    8.2    0.0   41.4   0   92   0   0   0   0 c9t16d0
  201.0   0.0   13350.3   0.0  0.0    8.1    0.0   40.4   0   91   0   0   0   0 c9t17d0
  201.2   0.0   13325.0   0.0  0.0    7.8    0.0   38.5   0   88   0   0   0   0 c9t18d0
  200.6   0.0   13021.5   0.0  0.0    8.2    0.0   40.7   0   91   0   0   0   0 c9t19d0
    0.0   0.0       0.0   0.0  0.0    0.0    0.0    0.0   0    0   0   0   0   0 c9t20d0
  196.6   0.0   12991.9   0.0  0.0    7.6    0.0   38.8   0   85   0   0   0   0 c9t21d0
  196.4   0.0   11499.3   0.0  0.0    8.0    0.0   40.5   0   89   0   0   0   0 c9t22d0
  197.6   0.0   13030.3   0.0  0.0    8.0    0.0   40.3   0   90   0   0   0   0 c9t23d0
  198.4   0.0   11535.8   0.0  0.0    7.8    0.0   39.3   0   87   0   0   0   0 c9t24d0
  202.2   0.0   13096.3   0.0  0.0    7.9    0.0   39.3   0   89   0   0   0   0 c9t25d0
  193.6   0.0   12457.4   0.0  0.0    8.3    0.0   42.8   0   90   0   0   0   0 c9t26d0
  194.0   0.0   12799.9   0.0  0.0    8.2    0.0   42.1   0   91   0   0   0   0 c9t27d0
  193.0   0.0   12748.8   0.0  0.0    7.9    0.0   41.0   0   88   0   0   0   0 c9t28d0
  194.6   0.0   12863.9   0.0  0.0    7.9    0.0   40.6   0   89   0   0   0   0 c9t29d0
  199.8   0.0   12849.1   0.0  0.0    7.8    0.0   39.0   0   87   0   0   0   0 c9t30d0
  205.0   0.0   13631.9   0.0  0.0    7.8    0.0   38.2   0   88   0   0   0   0 c9t31d0
  204.0   0.0   11674.3   0.0  0.0    7.9    0.0   38.6   0   88   0   0   0   0 c9t32d0
  204.2   0.0   11339.9   0.0  0.0    8.1    0.0   39.7   0   89   0   0   0   0 c9t33d0
  204.8   0.0   11569.7   0.0  0.0    7.7    0.0   37.7   0   86   0   0   0   0 c9t34d0
  205.2   0.0   11268.7   0.0  0.0    7.9    0.0   38.6   0   88   0   0   0   0 c9t35d0
  198.4   0.0   12814.9   0.0  0.0    7.8    0.0   39.5   0   88   0   0   0   0 c9t36d0
  200.4   0.0   13222.3   0.0  0.0    7.9    0.0   39.2   0   88   0   0   0   0 c9t37d0
  200.2   0.0   12324.5   0.0  0.0    7.4    0.0   37.1   0   85   0   0   0   0 c9t38d0
  203.0   0.0   11928.8   0.0  0.0    7.7    0.0   37.7   0   88   0   0   0   0 c9t39d0
  196.2   0.0   12966.3   0.0  0.0    7.5    0.0   38.0   0   84   0   0   0   0 c9t40d0
  195.2   0.0   11544.8   0.0  0.0    7.9    0.0   40.5   0   89   0   0   0   0 c9t41d0
  199.2   0.0   12601.8   0.0  0.0    7.8    0.0   38.9   0   88   0   0   0   0 c9t42d0
    0.0   0.0       0.0   0.0  0.0    0.0    0.0    0.0   0    0   0   0   0   0 c9t43d0
  194.4   0.0   12940.7   0.0  0.0    7.6    0.0   39.2   0   86   0   0   0   0 c9t44d0
  198.2   0.0   13120.6   0.0  0.0    7.5    0.0   38.1   0   86   0   0   0   0 c9t45d0
  201.2   0.0   11713.6   0.0  0.0    7.8    0.0   39.0   0   89   0   0   0   0 c9t46d0
  197.8   0.0   13196.7   0.0  0.0    7.4    0.0   37.4   0   85   0   0   0   0 c9t47d0
  197.4   0.0   13094.3   0.0  0.0    7.6    0.0   38.6   0   87   0   0   0   0 c9t48d0
  195.8   0.0   13017.5   0.0  0.0    7.5    0.0   38.4   0   85   0   1   1   2 c9t49d0
  205.0   0.0   11384.4   0.0  0.0    8.0    0.0   39.0   0   89   0   0   0   0 c9t50d0
  200.6   0.0   13286.6   0.0  0.0    7.5    0.0   37.2   0   85   0   0   0   0 c9t51d0
  200.6   0.0   12931.6   0.0  0.0    7.9    0.0   39.5   0   89   0   0   0   0 c9t52d0
  196.6   0.0   13055.9   0.0  0.0    7.5    0.0   38.3   0   87   0   0   0   0 c9t55d0
I had to abort the scrub shortly after this, or we would have started
seeing the timeouts.
yep. If you set the queue depth to 7, does it complete without
timeouts?
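The knob for that is presumably zfs_vdev_max_pending; a minimal sketch,
assuming you want to try it on the live kernel first:

    # cap the per-vdev queue depth at 7 immediately (0t = decimal)
    echo zfs_vdev_max_pending/W0t7 | mdb -kw

    # or make it persistent by adding this line to /etc/system:
    #   set zfs:zfs_vdev_max_pending = 7

The mdb -kw write takes effect right away; the /etc/system setting only
applies at the next boot.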
-- richard
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss