Hmm, scratch that. Maybe. At first I missed the point that your writes to a filesystem dataset are fast. Perhaps the filesystem path is indeed (better) cached, i.e. *maybe* zvol writes are synchronous while filesystem writes can be cached and thus asynchronous? Try playing around with the relevant dataset attributes...
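Something like this is where I'd start poking (just a sketch - the dataset names are the ones from my own test below, and I'm not sure the sync/logbias properties even exist in older builds such as my snv_114):

# compare the basic write-path attributes of the zvol and the filesystem dataset
zfs get compression,checksum,volblocksize pond/test
zfs get compression,checksum,recordsize pond/tmpnocompress

# if your build has these properties, they are the obvious knobs to experiment with
# (sync=disabled drops synchronous write semantics - use on test data only!):
# zfs set logbias=throughput pond/test
# zfs set sync=disabled pond/test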
I'm running a test on my system (a snv_114 Thumper, 16GB RAM, also used for other purposes); the CPU is mostly idle right now (2.5-3.2% kernel time, that's about it). It seems my results are not unlike yours, which is not cool, because I wanted to play with COMSTAR iSCSI - and now I'm not sure it will perform well ;)

I'm dd'ing 30GB to an uncompressed test zvol with the same 64KB block size (maybe that's too small?), and zpool iostat goes like this: about a hundred IOPS at 7MB/s for a minute, then a burst of 100-170MB/s and 20-25K IOPS for a second:

pond        5.79T  4.41T      0    106      0  7.09M
pond        5.79T  4.41T      0  1.93K      0  20.7M
pond        5.79T  4.41T      0  13.3K      0   106M
pond        5.79T  4.41T      0    116      0  7.76M
pond        5.79T  4.41T      0    108      0  7.23M
pond        5.79T  4.41T      0    107      0  7.16M
pond        5.79T  4.41T      0    107      0  7.16M

or

pond        5.79T  4.41T      0    117      0  7.83M
pond        5.79T  4.41T      0  5.61K      0  49.7M
pond        5.79T  4.41T      0  19.0K    504   149M
pond        5.79T  4.41T      0    104      0  6.96M

Weird indeed. It had written 10GB (according to "zfs get usedbydataset pond/test") in roughly 30 minutes, at which point I killed it.

Now, writing to an uncompressed filesystem dataset (although still very far from what's trumpeted as Thumper performance) yields quite different numbers:

pond        5.80T  4.40T      1  3.64K   1022   457M
pond        5.80T  4.40T      0    866    967  75.7M
pond        5.80T  4.40T      0  4.65K      0   586M
pond        5.80T  4.40T      6    802  33.4K  69.2M
pond        5.80T  4.40T     29  2.44K  1.10M   301M
pond        5.80T  4.40T     32    691   735K  25.0M
pond        5.80T  4.40T     56  1.59K  2.29M   184M
pond        5.80T  4.40T    150    768  4.61M  10.5M
pond        5.80T  4.40T      2      0  25.5K      0
pond        5.80T  4.40T      0  2.75K      0   341M
pond        5.80T  4.40T      7  3.96K   339K   497M
pond        5.80T  4.39T     85    740  3.57M  59.0M
pond        5.80T  4.39T     67      0  2.22M      0
pond        5.80T  4.39T      9  4.67K   292K   581M
pond        5.80T  4.39T      4  1.07K   126K   137M
pond        5.80T  4.39T     27    333   338K  9.15M
pond        5.80T  4.39T      5      0  28.0K  3.99K
pond        5.82T  4.37T      1  5.42K  1.67K   677M
pond        5.83T  4.37T      3  1.69K  8.36K   173M
pond        5.83T  4.37T      2      0  5.49K      0
pond        5.83T  4.37T      0  6.32K      0   790M
pond        5.83T  4.37T      2    290  7.95K  27.8M
pond        5.83T  4.37T      0  9.64K  1.23K  1.18G

The numbers are jumpy (maybe due to fragmentation, other processes, etc.), but there are frequent spikes in excess of 500MB/s. The whole test took comparatively little time:

# time dd if=/dev/zero of=/pond/tmpnocompress/test30g bs=65536 count=500000
500000+0 records in
500000+0 records out

real    1m27.657s
user    0m0.302s
sys     0m46.976s

# du -hs /pond/tmpnocompress/test30g
  30G   /pond/tmpnocompress/test30g

Some details about the pool: it lives on a Sun X4500 with 48 250GB SATA drives. It was created as a 9x5 set (9 stripes made of 5-disk raidz1 vdevs) spread across the different controllers, with this command:

# zpool create -f pond \
    raidz1 c0t0d0 c1t0d0 c4t0d0 c6t0d0 c7t0d0 \
    raidz1 c0t1d0 c1t2d0 c4t3d0 c6t5d0 c7t6d0 \
    raidz1 c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0 \
    raidz1 c0t2d0 c4t2d0 c5t2d0 c6t2d0 c7t2d0 \
    raidz1 c0t3d0 c1t3d0 c5t3d0 c6t3d0 c7t3d0 \
    raidz1 c0t4d0 c1t4d0 c4t4d0 c6t4d0 c7t4d0 \
    raidz1 c0t5d0 c1t5d0 c4t5d0 c5t5d0 c7t5d0 \
    raidz1 c0t6d0 c1t6d0 c4t6d0 c5t6d0 c6t6d0 \
    raidz1 c1t7d0 c4t7d0 c5t7d0 c6t7d0 c7t7d0 \
    spare c0t7d0

Alas, while there were many blog posts, I couldn't find a definitive answer last year as to which Thumper layout is optimal for performance and/or reliability (given the 6 controllers of 8 disks each, with 2 disks on one controller reserved for booting). As a result we spread each raidz1 across 5 different controllers, so that losing a single controller should, on average, have minimal impact in terms of data loss. Since the system layout is not symmetrical, some controllers are more important than others (say, the one with the boot disks).
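If anyone wants to sanity-check such a spread on their own box, a quick-and-dirty count of pool disks per controller can be pulled out of "zpool status" (just a sketch, assuming the cNtNd0 device naming shown above; the spare gets counted too):

# count how many pool disks sit on each controller
zpool status pond | awk '/c[0-9]t[0-9]d0/ { split($1, a, "t"); cnt[a[1]]++ }
                         END { for (c in cnt) print c, cnt[c] }'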
//Jim