Comments below…

On Dec 18, 2011, at 6:53 AM, Jan-Aage Frydenbø-Bruvoll wrote:
> Dear List,
>
> I have a storage server running OpenIndiana with a number of storage
> pools on it. All the pools' disks come off the same controller, and
> all pools are backed by SSD-based L2ARC and ZIL. Performance is
> excellent on all pools but one, and I am struggling greatly to figure
> out what is wrong.
>
> A very basic test shows the following - pretty much typical
> performance at the moment:
>
> root@stor:/# for a in pool1 pool2 pool3; do dd if=/dev/zero of=$a/file
> bs=1M count=10; done
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 0.00772965 s, 1.4 GB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 0.00996472 s, 1.1 GB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 71.8995 s, 146 kB/s

Enable compression and they should all go fast :-) (first sketch below)

But seriously, you could be getting tripped up by the allocator. There
are several different allocator algorithms, and they all begin to
thrash at high pool utilization; some handle that case better than
others. One troubleshooting tip is to observe the utilization of the
metaslabs (second sketch below):

    zdb -m pool3

If there are metaslabs that are more than about 96% full, then look
more closely at the allocator algorithms.

> The zpool status of the affected pool is:
>
> root@stor:/# zpool status pool3
>   pool: pool3
>  state: ONLINE
>   scan: resilvered 222G in 24h2m with 0 errors on Wed Dec 14 15:20:11 2011
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         pool3          ONLINE       0     0     0
>           c1t0d0       ONLINE       0     0     0
>           c1t1d0       ONLINE       0     0     0
>           c1t2d0       ONLINE       0     0     0
>           c1t3d0       ONLINE       0     0     0
>           c1t4d0       ONLINE       0     0     0
>           c1t5d0       ONLINE       0     0     0
>           c1t6d0       ONLINE       0     0     0
>           c1t7d0       ONLINE       0     0     0
>           c1t8d0       ONLINE       0     0     0
>           c1t9d0       ONLINE       0     0     0
>           c1t10d0      ONLINE       0     0     0
>           mirror-12    ONLINE       0     0     0
>             c1t26d0    ONLINE       0     0     0
>             c1t27d0    ONLINE       0     0     0
>           mirror-13    ONLINE       0     0     0
>             c1t28d0    ONLINE       0     0     0
>             c1t29d0    ONLINE       0     0     0
>           mirror-14    ONLINE       0     0     0
>             c1t34d0    ONLINE       0     0     0
>             c1t35d0    ONLINE       0     0     0
>         logs
>           mirror-11    ONLINE       0     0     0
>             c2t2d0p8   ONLINE       0     0     0
>             c2t3d0p8   ONLINE       0     0     0
>         cache
>           c2t2d0p12    ONLINE       0     0     0
>           c2t3d0p12    ONLINE       0     0     0
>
> errors: No known data errors
>
> Ditto for the disk controller - MegaCli reports zero errors, be that
> on the controller itself, on this pool's disks or on any of the other
> attached disks.
>
> I am pretty sure I am dealing with a disk-based problem here, i.e. a
> flaky disk that is "just" slow without exhibiting any actual data
> errors, holding the rest of the pool back, but I am at a loss as to
> how to pinpoint what is going on.

"iostat -x" shows the average service time of each disk. If one disk or
set of disks is markedly slower while also busy, that should be clearly
visible in the iostat output. Personally, I often use something like
"iostat -zxCn 10" for 10-second samples (third sketch below).
 -- richard

> Would anybody on the list be able to give me any pointers as to how I
> can dig up more detailed information about the pool's/hardware's
> performance?
>
> Thank you in advance for your kind assistance.
>
> Best regards
> Jan

-- 
ZFS and performance consulting
http://www.RichardElling.com
LISA '11, Boston, MA, December 4-9
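Three rough sketches, appended for illustration.

First, the compression quip. dd from /dev/zero stops being a meaningful
benchmark once compression is on, because runs of zeroes compress away
before they ever reach the allocator or the disks. A minimal
demonstration, assuming a hypothetical, disposable dataset name
(pool3/scratch):

    # Zeroes compress to (nearly) nothing, so this dd finishes fast
    # without exercising -- or fixing -- the slow pool.
    # pool3/scratch is an assumed scratch dataset name.
    zfs create -o compression=on pool3/scratch
    dd if=/dev/zero of=/pool3/scratch/file bs=1M count=10
    zfs get compressratio pool3/scratch
    zfs destroy pool3/scratch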
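Second, the metaslab check. To avoid reading zdb's per-metaslab output
by hand, a filter along these lines can surface the fullest metaslabs.
It is a sketch under two assumptions: the circa-2011 "zdb -m" line
format, where each metaslab prints one line whose last field is its
free space, and a GNU sort for the -h (human-readable sizes) flag, e.g.
/usr/gnu/bin/sort on OpenIndiana. Check both before trusting the output:

    # List metaslabs with the least free space first; the ones at the
    # top are the candidates for the "> 96% full" condition.
    # ASSUMED line format:  metaslab N   offset X   spacemap Y   free Z
    zdb -m pool3 | awk '$1 == "metaslab" { print $NF, $0 }' \
        | sort -h | head -20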
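Third, the iostat suggestion. This sketch flags busy disks with high
average service time; it assumes the illumos "iostat -xn" column order
(asvc_t in field 8, %b in field 10, device name last) and a 50 ms
threshold chosen arbitrarily. The signal is one spindle standing well
apart from equally busy peers, not the absolute number:

    # Six 10-second samples; print any device that is at least 20% busy
    # yet averaging over 50 ms per I/O (both thresholds are guesses).
    iostat -xn 10 6 | awk '$NF ~ /^c[0-9]/ && $10 > 20 && $8 > 50 {
        printf "%s: asvc_t=%s ms, %%b=%s\n", $NF, $8, $10
    }'

A disk that shows up here on every sample, while its pool-mates do not,
is the "slow but not failing" suspect Jan describes.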