How do you estimate the needed queue depth when you have, say, 64 to 128 disks sitting behind an LSI controller? And is it a bad idea to run with a queue depth of 1?
Yours,
Markus Kovero

________________________________________
From: zfs-discuss-boun...@opensolaris.org [zfs-discuss-boun...@opensolaris.org] on behalf of Richard Elling [richard.ell...@gmail.com]
Sent: 24 October 2009 7:36
To: Adam Cheal
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SNV_125 MPT warning in logfile

ok, see below...

On Oct 23, 2009, at 8:14 PM, Adam Cheal wrote:

> Here is an example of the pool config we use:
>
> # zpool status
>   pool: pool002
>  state: ONLINE
>  scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         pool002      ONLINE       0     0     0
>           raidz2     ONLINE       0     0     0
>             c9t18d0  ONLINE       0     0     0
>             c9t17d0  ONLINE       0     0     0
>             c9t55d0  ONLINE       0     0     0
>             c9t13d0  ONLINE       0     0     0
>             c9t15d0  ONLINE       0     0     0
>             c9t16d0  ONLINE       0     0     0
>             c9t11d0  ONLINE       0     0     0
>             c9t12d0  ONLINE       0     0     0
>             c9t14d0  ONLINE       0     0     0
>             c9t9d0   ONLINE       0     0     0
>             c9t8d0   ONLINE       0     0     0
>             c9t10d0  ONLINE       0     0     0
>             c9t29d0  ONLINE       0     0     0
>             c9t28d0  ONLINE       0     0     0
>             c9t27d0  ONLINE       0     0     0
>             c9t23d0  ONLINE       0     0     0
>             c9t25d0  ONLINE       0     0     0
>             c9t26d0  ONLINE       0     0     0
>             c9t21d0  ONLINE       0     0     0
>             c9t22d0  ONLINE       0     0     0
>             c9t24d0  ONLINE       0     0     0
>             c9t19d0  ONLINE       0     0     0
>           raidz2     ONLINE       0     0     0
>             c9t30d0  ONLINE       0     0     0
>             c9t31d0  ONLINE       0     0     0
>             c9t32d0  ONLINE       0     0     0
>             c9t33d0  ONLINE       0     0     0
>             c9t34d0  ONLINE       0     0     0
>             c9t35d0  ONLINE       0     0     0
>             c9t36d0  ONLINE       0     0     0
>             c9t37d0  ONLINE       0     0     0
>             c9t38d0  ONLINE       0     0     0
>             c9t39d0  ONLINE       0     0     0
>             c9t40d0  ONLINE       0     0     0
>             c9t41d0  ONLINE       0     0     0
>             c9t42d0  ONLINE       0     0     0
>             c9t44d0  ONLINE       0     0     0
>             c9t45d0  ONLINE       0     0     0
>             c9t46d0  ONLINE       0     0     0
>             c9t47d0  ONLINE       0     0     0
>             c9t48d0  ONLINE       0     0     0
>             c9t49d0  ONLINE       0     0     0
>             c9t50d0  ONLINE       0     0     0
>             c9t51d0  ONLINE       0     0     0
>             c9t52d0  ONLINE       0     0     0
>         cache
>           c8t2d0     ONLINE       0     0     0
>           c8t3d0     ONLINE       0     0     0
>         spares
>           c9t20d0    AVAIL
>           c9t43d0    AVAIL
>
> errors: No known data errors
>
>   pool: rpool
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         rpool         ONLINE       0     0     0
>           mirror      ONLINE       0     0     0
>             c8t0d0s0  ONLINE       0     0     0
>             c8t1d0s0  ONLINE       0     0     0
>
> errors: No known data errors
>
> ...and here is a snapshot of the system using "iostat -indexC 5"
> during a scrub of "pool002" (c8 is onboard AHCI controller, c9 is
> LSI SAS 3801E):
>
>                             extended device statistics       ---- errors ---
>     r/s    w/s     kr/s   kw/s wait  actv wsvc_t asvc_t  %w   %b s/w h/w trn tot device
>     0.0    0.0      0.0    0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8
>     0.0    0.0      0.0    0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t0d0
>     0.0    0.0      0.0    0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t1d0
>     0.0    0.0      0.0    0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t2d0
>     0.0    0.0      0.0    0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t3d0
>  8738.7    0.0 555346.1    0.0  0.1 345.0    0.0   39.5   0 3875   0   1   1   2 c9

You see 345 entries in the active queue. If the controller rolls over at
511 active entries, then that would explain why it soon begins to have
difficulty (see the sizing sketch below). Meanwhile, it is providing 8,738
IOPS and 555 MB/sec, which is quite respectable.

>   194.8    0.0  11936.9    0.0  0.0   7.9    0.0   40.3   0   87   0   0   0   0 c9t8d0

These disks are doing almost 200 read IOPS, but are not 100% busy. The
average I/O size is 66 KB, which is not bad (lots of little I/Os could be
worse), but at only 11.9 MB/s you are not near the media bandwidth. The
average service time is 40.3 milliseconds, which is not super, but may
reflect contention in the channel. So there is more capacity to accept I/O
commands, but...
>   194.6    0.0  12927.9    0.0  0.0   7.6    0.0   38.9   0   86   0   0   0   0 c9t9d0
>   194.6    0.0  12622.6    0.0  0.0   8.1    0.0   41.7   0   90   0   0   0   0 c9t10d0
>   201.6    0.0  13350.9    0.0  0.0   8.0    0.0   39.5   0   90   0   0   0   0 c9t11d0
>   194.4    0.0  12902.3    0.0  0.0   7.8    0.0   40.1   0   88   0   0   0   0 c9t12d0
>   194.6    0.0  12902.3    0.0  0.0   7.7    0.0   39.3   0   88   0   0   0   0 c9t13d0
>   195.4    0.0  12479.0    0.0  0.0   8.5    0.0   43.4   0   92   0   0   0   0 c9t14d0
>   197.6    0.0  13107.4    0.0  0.0   8.1    0.0   41.0   0   92   0   0   0   0 c9t15d0
>   198.8    0.0  12918.1    0.0  0.0   8.2    0.0   41.4   0   92   0   0   0   0 c9t16d0
>   201.0    0.0  13350.3    0.0  0.0   8.1    0.0   40.4   0   91   0   0   0   0 c9t17d0
>   201.2    0.0  13325.0    0.0  0.0   7.8    0.0   38.5   0   88   0   0   0   0 c9t18d0
>   200.6    0.0  13021.5    0.0  0.0   8.2    0.0   40.7   0   91   0   0   0   0 c9t19d0
>     0.0    0.0      0.0    0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t20d0
>   196.6    0.0  12991.9    0.0  0.0   7.6    0.0   38.8   0   85   0   0   0   0 c9t21d0
>   196.4    0.0  11499.3    0.0  0.0   8.0    0.0   40.5   0   89   0   0   0   0 c9t22d0
>   197.6    0.0  13030.3    0.0  0.0   8.0    0.0   40.3   0   90   0   0   0   0 c9t23d0
>   198.4    0.0  11535.8    0.0  0.0   7.8    0.0   39.3   0   87   0   0   0   0 c9t24d0
>   202.2    0.0  13096.3    0.0  0.0   7.9    0.0   39.3   0   89   0   0   0   0 c9t25d0
>   193.6    0.0  12457.4    0.0  0.0   8.3    0.0   42.8   0   90   0   0   0   0 c9t26d0
>   194.0    0.0  12799.9    0.0  0.0   8.2    0.0   42.1   0   91   0   0   0   0 c9t27d0
>   193.0    0.0  12748.8    0.0  0.0   7.9    0.0   41.0   0   88   0   0   0   0 c9t28d0
>   194.6    0.0  12863.9    0.0  0.0   7.9    0.0   40.6   0   89   0   0   0   0 c9t29d0
>   199.8    0.0  12849.1    0.0  0.0   7.8    0.0   39.0   0   87   0   0   0   0 c9t30d0
>   205.0    0.0  13631.9    0.0  0.0   7.8    0.0   38.2   0   88   0   0   0   0 c9t31d0
>   204.0    0.0  11674.3    0.0  0.0   7.9    0.0   38.6   0   88   0   0   0   0 c9t32d0
>   204.2    0.0  11339.9    0.0  0.0   8.1    0.0   39.7   0   89   0   0   0   0 c9t33d0
>   204.8    0.0  11569.7    0.0  0.0   7.7    0.0   37.7   0   86   0   0   0   0 c9t34d0
>   205.2    0.0  11268.7    0.0  0.0   7.9    0.0   38.6   0   88   0   0   0   0 c9t35d0
>   198.4    0.0  12814.9    0.0  0.0   7.8    0.0   39.5   0   88   0   0   0   0 c9t36d0
>   200.4    0.0  13222.3    0.0  0.0   7.9    0.0   39.2   0   88   0   0   0   0 c9t37d0
>   200.2    0.0  12324.5    0.0  0.0   7.4    0.0   37.1   0   85   0   0   0   0 c9t38d0
>   203.0    0.0  11928.8    0.0  0.0   7.7    0.0   37.7   0   88   0   0   0   0 c9t39d0
>   196.2    0.0  12966.3    0.0  0.0   7.5    0.0   38.0   0   84   0   0   0   0 c9t40d0
>   195.2    0.0  11544.8    0.0  0.0   7.9    0.0   40.5   0   89   0   0   0   0 c9t41d0
>   199.2    0.0  12601.8    0.0  0.0   7.8    0.0   38.9   0   88   0   0   0   0 c9t42d0
>     0.0    0.0      0.0    0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t43d0
>   194.4    0.0  12940.7    0.0  0.0   7.6    0.0   39.2   0   86   0   0   0   0 c9t44d0
>   198.2    0.0  13120.6    0.0  0.0   7.5    0.0   38.1   0   86   0   0   0   0 c9t45d0
>   201.2    0.0  11713.6    0.0  0.0   7.8    0.0   39.0   0   89   0   0   0   0 c9t46d0
>   197.8    0.0  13196.7    0.0  0.0   7.4    0.0   37.4   0   85   0   0   0   0 c9t47d0
>   197.4    0.0  13094.3    0.0  0.0   7.6    0.0   38.6   0   87   0   0   0   0 c9t48d0
>   195.8    0.0  13017.5    0.0  0.0   7.5    0.0   38.4   0   85   0   1   1   2 c9t49d0
>   205.0    0.0  11384.4    0.0  0.0   8.0    0.0   39.0   0   89   0   0   0   0 c9t50d0
>   200.6    0.0  13286.6    0.0  0.0   7.5    0.0   37.2   0   85   0   0   0   0 c9t51d0
>   200.6    0.0  12931.6    0.0  0.0   7.9    0.0   39.5   0   89   0   0   0   0 c9t52d0
>   196.6    0.0  13055.9    0.0  0.0   7.5    0.0   38.3   0   87   0   0   0   0 c9t55d0
>
> I had to abort the scrub shortly after this or we would start seeing
> the timeouts.

yep. If you set the queue depth to 7, does it complete without timeouts?
 -- richard
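To put the original sizing question in concrete numbers: if the HBA really does fall over somewhere around 511 outstanding commands (an assumption drawn from the discussion above, not a published spec), then a rough per-disk budget is simply that ceiling divided by the number of disks that can be busy at once. A back-of-the-envelope sketch in shell:

  # Rough per-disk queue depth budget, assuming (hypothetically) that the
  # controller saturates near 511 outstanding commands in total.
  HBA_LIMIT=511
  for DISKS in 64 128; do
      echo "$DISKS disks -> per-disk queue depth <= $((HBA_LIMIT / DISKS))"
  done
  # prints:
  #   64 disks -> per-disk queue depth <= 7
  #   128 disks -> per-disk queue depth <= 3

That lines up with the depth-of-7 suggestion above. As for a queue depth of 1: it cannot overflow the controller, but it gives up most of the roughly 8 outstanding I/Os per disk visible in the iostat output, so a scrub would likely run far slower than the 555 MB/sec shown here.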
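Richard does not say which knob he means by "queue depth"; one plausible reading on OpenSolaris builds of this vintage is ZFS's per-vdev I/O limit, the zfs_vdev_max_pending tunable. Treat the following as an assumption about the intended knob, not as part of his reply:

  # Lower ZFS's per-vdev queue depth to 7 on the running kernel
  # (applies to new I/Os immediately, reverts at reboot):
  echo zfs_vdev_max_pending/W0t7 | mdb -kw

  # Read back the current value in decimal:
  echo zfs_vdev_max_pending/D | mdb -k

  # Make the change persistent across reboots:
  echo 'set zfs:zfs_vdev_max_pending = 7' >> /etc/system

On this system's 44 busy data disks, a limit of 7 caps the controller-wide total at roughly 44 x 7 = 308 outstanding commands, comfortably under the assumed 511-command ceiling.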