comments below...

Pascal Vandeputte wrote:
> Thanks for all the replies!
>
> Some output from "iostat -x 1" while doing a dd of /dev/zero to a file on a
> raidz of c1t0d0s3, c1t1d0 and c1t2d0 using bs=1048576:
>
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0  104.0    0.0  13312.0   4.0  32.0  346.0 100 100
> sd1       0.0  104.0    0.0  13312.0   3.0  32.0  336.4 100 100
> sd2       0.0  104.0    0.0  13312.0   3.0  32.0  336.4 100 100
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0  104.0    0.0  13311.5   4.0  32.0  346.0 100 100
> sd1       0.0  106.0    0.0  13567.5   3.0  32.0  330.1 100 100
> sd2       0.0  106.0    0.0  13567.5   3.0  32.0  330.1 100 100
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0  135.0    0.0  12619.3   2.6  25.9  211.3  66 100
> sd1       0.0  107.0    0.0   8714.6   1.1  16.3  163.3  38  66
> sd2       0.0  101.0    0.0   8077.0   1.0  14.5  153.5  32  61
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       1.0   13.0    8.0     14.5   1.7   0.2  139.9  29  22
> sd1       0.0    6.0    0.0      4.0   0.0   0.0    0.9   0   0
> sd2       0.0    6.0    0.0      4.0   0.0   0.0    0.9   0   0
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0   77.0    0.0   9537.9  19.7   0.6  264.5  63  63
> sd1       0.0  122.0    0.0  13833.2   1.7  19.6  174.5  58  63
> sd2       0.0  136.0    0.0  15497.6   1.7  19.6  156.8  59  63
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0  106.0    0.0  13567.8  34.0   1.0  330.1 100 100
> sd1       0.0  103.0    0.0  13183.8   3.0  32.0  339.7 100 100
> sd2       0.0   97.0    0.0  12415.8   3.0  32.0  360.7 100 100
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0  104.0    0.0  13311.7  34.0   1.0  336.4 100 100
> sd1       0.0   83.0    0.0  10623.8   3.0  32.0  421.6 100 100
> sd2       0.0   76.0    0.0   9727.8   3.0  32.0  460.4 100 100
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0  104.0    0.0  13312.7  34.0   1.0  336.4 100 100
> sd1       0.0  104.0    0.0  13312.7   3.0  32.0  336.4 100 100
> sd2       0.0  105.0    0.0  13440.7   3.0  32.0  333.2 100 100
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0  104.0    0.0  13311.9  34.0   1.0  336.4 100 100
> sd1       0.0  106.0    0.0  13567.9   3.0  32.0  330.1 100 100
> sd2       0.0  105.0    0.0  13439.9   3.0  32.0  333.2 100 100
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0  106.0    0.0  13567.6  34.0   1.0  330.1 100 100
> sd1       0.0  106.0    0.0  13567.6   3.0  32.0  330.1 100 100
> sd2       0.0  104.0    0.0  13311.6   3.0  32.0  336.4 100 100
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0  120.0    0.0  14086.7  17.0  18.0  291.6 100 100
> sd1       0.0  104.0    0.0  13311.7   7.8  27.1  336.4 100 100
> sd2       0.0  107.0    0.0  13695.7   7.3  27.7  327.0 100 100
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0  103.0    0.0  13185.0   3.0  32.0  339.7 100 100
> sd1       0.0  104.0    0.0  13313.0   3.0  32.0  336.4 100 100
> sd2       0.0  104.0    0.0  13313.0   3.0  32.0  336.4 100 100
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0  115.0    0.0  12824.4   3.0  32.0  304.3 100 100
> sd1       0.0  131.0    0.0  14360.3   3.0  32.0  267.1 100 100
> sd2       0.0  125.0    0.0  14104.8   3.0  32.0  279.9 100 100
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0   99.0    0.0  12672.9   3.0  32.0  353.4 100 100
> sd1       0.0   82.0    0.0  10496.8   3.0  32.0  426.7 100 100
> sd2       0.0   95.0    0.0  12160.9   3.0  32.0  368.3 100 100
>                  extended device statistics
> device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd0       0.0  104.0    0.0  13311.7   3.0  32.0  336.4 100 100
> sd1       0.0  103.0    0.0  13183.7   3.0  32.0  339.7 100 100
> sd2       0.0  105.0    0.0  13439.7   3.0  32.0  333.2 100 100
>
> Similar output when running "iostat -xn 1":
>
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    0.0  103.0    0.0  13184.3   4.0  32.0   38.7  310.7 100 100 c1t0d0
>    0.0  104.0    0.0  13312.3   3.0  32.0   28.7  307.7 100 100 c1t1d0
>    0.0  104.0    0.0  13312.3   3.0  32.0   28.7  307.7 100 100 c1t2d0
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    0.0  106.0    0.0  13567.9   4.0  32.0   37.6  301.9 100 100 c1t0d0
>    0.0  123.0    0.0  13592.9   2.9  31.9   23.4  259.2  96 100 c1t1d0
>    0.0  122.0    0.0  13467.4   2.7  31.3   22.1  256.3  90 100 c1t2d0
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    1.0   91.0    8.0   6986.7   2.2  12.7   23.8  137.8  45  79 c1t0d0
>    0.0   47.0    0.0   3057.1   0.0   3.0    0.0   63.9   0  24 c1t1d0
>    0.0   42.0    0.0   2545.1   0.0   1.8    0.0   43.7   0  19 c1t2d0
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    0.0   36.7    0.0   2747.0   1.4   0.1   38.4    1.7  14   6 c1t0d0
>    0.0   42.7    0.0   4326.3   0.0   1.2    0.6   28.6   1   6 c1t1d0
>    0.0   44.7    0.0   4707.1   0.0   1.3    0.6   28.1   1   6 c1t2d0
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    0.0   99.7    0.0  12760.7  33.3   1.0  334.4   10.0 100 100 c1t0d0
>    0.0  128.9    0.0  15215.3   3.0  32.0   23.2  248.2 100 100 c1t1d0
>    0.0  141.0    0.0  16504.8   3.0  32.0   21.2  227.0 100 100 c1t2d0
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    0.0  104.0    0.0  13313.1  34.0   1.0  326.8    9.6 100 100 c1t0d0
>    0.0   80.0    0.0  10240.9   3.0  32.0   37.4  400.0 100 100 c1t1d0
>    0.0   68.0    0.0   8704.7   3.0  32.0   44.0  470.5 100 100 c1t2d0
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    0.0  104.0    0.0  13311.6  34.0   1.0  326.8    9.6 100 100 c1t0d0
>    0.0  106.0    0.0  13567.6   3.0  32.0   28.2  301.9 100 100 c1t1d0
>    0.0  105.0    0.0  13439.6   3.0  32.0   28.5  304.8 100 100 c1t2d0
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    0.0  104.0    0.0  13312.5  34.0   1.0  326.8    9.6 100 100 c1t0d0
>    0.0  104.0    0.0  13312.5   3.0  32.0   28.7  307.7 100 100 c1t1d0
>    0.0  106.0    0.0  13568.5   3.0  32.0   28.2  301.9 100 100 c1t2d0
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    0.0  104.0    0.0  13311.8  34.0   1.0  326.8    9.6 100 100 c1t0d0
>    0.0  106.0    0.0  13567.8   3.0  32.0   28.2  301.9 100 100 c1t1d0
>    0.0  104.0    0.0  13311.8   3.0  32.0   28.7  307.7 100 100 c1t2d0
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    0.0  106.0    0.0  13567.7  34.0   1.0  320.7    9.4 100 100 c1t0d0
>    0.0  106.0    0.0  13567.7   3.0  32.0   28.2  301.9 100 100 c1t1d0
>    0.0  104.0    0.0  13311.7   3.0  32.0   28.7  307.7 100 100 c1t2d0
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    0.0  120.0    0.0  14087.1   4.0  31.0   33.0  258.6 100 100 c1t0d0
>    0.0  104.0    0.0  13312.1   7.8  27.1   75.5  260.9 100 100 c1t1d0
>    0.0  107.0    0.0  13696.1   7.3  27.7   68.4  258.5 100 100 c1t2d0
>
> I mostly get readings like the first two ones.
> Another run, half an hour later, most often shows this instead:
>
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    0.0  102.0    0.0  13054.5  34.0   1.0  333.3    9.8 100 100 c1t0d0
>    0.0  111.0    0.0  14206.4  34.0   1.0  306.2    9.0 100 100 c1t1d0
>    0.0  106.0    0.0  13503.9   3.0  32.0   28.2  301.9 100 100 c1t2d0
>
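For anyone who wants to check the arithmetic implied by the steady-state rows above (104 w/s, 13312 kw/s, wait+actv = 3+32), here is a small awk sketch. The Little's-law reading of the wait/actv columns (response time = queue length / arrival rate) is my assumption about how iostat's svc_t is composed; the input numbers are taken from the repeated rows:

```shell
# Derive per-disk figures from the steady-state iostat rows above.
# Inputs: 104 w/s, 13312 kw/s, wait = 3, actv = 32.
awk 'BEGIN {
    w = 104.0; kw = 13312.0; wait = 3.0; actv = 32.0
    printf "I/O size:   %.0f KB/iop\n",  kw / w
    printf "throughput: %.1f MB/s\n",    kw / 1024
    # Little s law: response time = queue length / arrival rate
    printf "svc_t:      %.1f ms\n",      (wait + actv) / w * 1000
    printf "disk time:  %.2f ms/iop\n",  1000 / w
}'
```

The computed 336.5 ms matches the svc_t column to within rounding, and the 9.62 ms/iop figure is the per-I/O disk service time the analysis rests on.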
The average service time for all of your disks is about 9-10 ms. The
difference you see here (9.8 for c1t0d0 vs. 301.9 for c1t2d0) is due to the
queue depth at the disk (actv). By default, ZFS will try to queue 35 I/Os to
each vdev, which is why you see 34 + 1 (wait + actv), 3 + 32, and so on. The
takeaway is that ZFS is sending plenty of work to the disks; it is just a
matter of how quickly the disks finish it.

An average of 10 ms per write on a 7,200 rpm disk is more than I would
expect for this workload. Doing the simple math: 10 ms = ~100 w/s at
128 kBytes/iop, or about 12.8 MBytes/s. More interestingly, sequential
writes should not require seeks, yet the 10 ms response time implies
seeking. Even in the NCQ case (actv > 1), we see the same disk performance,
~10 ms/iop. I would look more closely at the hardware and firmware for
clues.
 -- richard

> It's all a little fishy, and kw/s doesn't differ much between the drives
> (but this could be explained as drive(s) with longer wait queues holding
> back the others, I guess?).
>
> According to Jeff's script, read speed seems to differ slightly as well
> (I repeated it 3 times and always got the same result):
>
> # ./jeff.sh
> c1t0d0 100 MB/sec
> c1t1d0 112 MB/sec
> c1t2d0 112 MB/sec
>
> I have tested dd write speed on the root partition (c1t0d0s0) and I get
> 27 MB/s there (which I find quite low as well; these people get 87 MB/s
> average write speed: http://techreport.com/articles.x/13440/13 ). I'll
> double-check using Linux what sequential write speeds I can get out of a
> single drive on this system.
> So 100 MB/s is indeed out of the question on a raidz with 3 drives, but I
> would still expect 50 MB/s to be technically possible (at least on nearly
> empty disks).
>
> For fun I tried a mirror of c1t1d0 and c1t2d0, so the OS disk is not
> involved and write caching should work.
> I still get the same write speed of 13 MB/s per drive:
>
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    0.0    0.0    0.0      0.0   0.0   0.0    0.0    0.0   0   0 c1t0d0
>    0.0  104.0    0.0  13313.0   3.0  32.0   28.8  307.7 100 100 c1t1d0
>    0.0  104.0    0.0  13313.0   3.0  32.0   28.8  307.7 100 100 c1t2d0
>
> And if I do the same with c1t0d0s3 and c1t2d0:
>
>                  extended device statistics
>    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    0.0  104.0    0.0  13311.7   3.0  32.0   28.8  307.7 100 100 c1t0d0
>    0.0    0.0    0.0      0.0   0.0   0.0    0.0    0.0   0   0 c1t1d0
>    0.0  106.0    0.0  13567.7   3.0  32.0   28.3  301.9 100 100 c1t2d0
>
> Hmm, doesn't look like one drive holding back another; all of them seem
> to be equally slow at writing.
>
> This is the current partition table of the boot drive:
>
> partition> print
> Current partition table (original):
> Total disk cylinders available: 45597 + 2 (reserved cylinders)
>
> Part      Tag    Flag     Cylinders        Size            Blocks
>   0       root    wm       1 -    45      705.98MB    (45/0/0)       1445850
>   1       swap    wu      46 -    78      517.72MB    (33/0/0)       1060290
>   2     backup    wm       0 - 45596      698.58GB    (45597/0/0) 1465031610
>   3 unassigned    wm      79 - 45596      697.37GB    (45518/0/0) 1462493340
>   4 unassigned    wm       0               0          (0/0/0)             0
>   5 unassigned    wm       0               0          (0/0/0)             0
>   6 unassigned    wm       0               0          (0/0/0)             0
>   7 unassigned    wm       0               0          (0/0/0)             0
>   8       boot    wu       0 -     0       15.69MB    (1/0/0)          32130
>   9 unassigned    wm       0               0          (0/0/0)             0
>
> Note that I have included a small 512 MB slice for swap space. The ZFS
> Best Practices Guide recommends against swap on the same disk as ZFS
> storage, but being new to Solaris I don't know if it would run fine
> without any swap space at all. I've got 2 GB of memory and don't intend
> to run anything other than ZFS, Samba and NFS.
>
> Apr 18, 2008 12:48 AM, milek wrote:
>> Also try to lower number of outstanding IOs per device from default 35
>> in zfs to something much slower.
>
> Thanks a lot for the suggestion. But how do I do that?
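On that last question: the knob milek is presumably referring to is the zfs_vdev_max_pending tunable (its default of 35 matches the number quoted). The name and the two mechanisms below are from memory of the Solaris ZFS tuning documentation of this era, so treat this as a sketch and verify against your release before using it:

```shell
# Lowering ZFS's per-vdev queue depth (default 35). Sketch only --
# confirm the tunable name (zfs_vdev_max_pending) for your release.

# Persistently, via /etc/system (takes effect at the next boot):
#     set zfs:zfs_vdev_max_pending = 10

# Or on the live kernel with mdb as root (0t10 is decimal 10):
#     echo 'zfs_vdev_max_pending/W0t10' | mdb -kw
```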
> I've found some information on pending I/Os at http://blogs.sun.com/roch/ ,
> but neither the Best Practices Guide, the ZFS Administration Guide, nor
> Google gives much information when I search for pending/outstanding or
> 35, etc.
>
> Could the ahci driver be suspect? Maybe I can change my BIOS SATA support
> to legacy IDE, reinstall and see if anything interesting occurs?
>
> Finally, while recreating the ZFS pool, I got a "raidz contains devices
> of different sizes" warning (I must have forgotten about using -f to
> force creation the first time). I do hope this is safe, right? How does
> ZFS handle block devices of different sizes? I get the same amount of
> blocks when doing df on a mirror of (whole disk 2 & 3) versus a mirror of
> (the large slice of disk 1 & whole disk 3)... :-| Which bothers me a lot
> now! I could always try to install Solaris on a compactflash card in an
> IDE-to-CF adapter.
>
> Many thanks for your help,
>
> Pascal
>
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss