Hi,
We're doing some benchmarking at a customer site (using IOzone), and for some specific small-block random tests, performance of their X4500 is very poor (~1.2 MB/s aggregate throughput for a 5+1 RAIDZ). Specifically, the test is the IOzone multithreaded throughput test with an 8GB file size and an 8KB record size, with the server physmem'd down to 2GB. I noticed a couple of peculiar anomalies while investigating the slow results. I am wondering whether Sun has any best practices, tips for optimizing small-block random I/O on ZFS, or other documents that might explain what we're seeing and guide us on how to most effectively deploy ZFS in an environment with heavy small-block random I/O.

The first anomaly: Brendan Gregg's CacheKit Perl script fcachestat shows the segmap cache is hardly used. Occasionally during the IOzone random read benchmark, while the disks are pulling about 20 MB/s in aggregate, the segmap cache gets 100% hits for 1-3 attempts *every 10 seconds*; all other samples show 0% hits on zero attempts. I don't know the kernel I/O path as well as I'd like, but I tried to watch ZFS fetch a file/offset block from disk by DTracing fbt::zfs_getpage (assuming it was the ZFS equivalent of ufs_getpage) and got no hits there either. In other words, it's as if ZFS isn't using the segmap cache at all.

Secondly, DTrace scripts show the IOzone application is reading 8KB blocks, but by the time the physical I/O happens, each request has ballooned into a 26KB read on every disk. In other words, a single 8KB application read generates 156KB of actual disk reads.

We tried changing the ZFS recordsize from 128KB down to 8KB (recreating the zpool and ZFS file system, and setting recordsize before creating the file), and that made the performance even worse, which has thrown us for a loop.

I appreciate any assistance or direction you might be able to provide!

Thanks!
Marcel
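
P.S. In case it helps with diagnosis, here is roughly what we are running; the thread count, dataset name, and file paths below are placeholders rather than the exact values on the customer system.

The IOzone throughput run (sequential write to lay the files down, then random read/write, 8GB per file, 8KB records) looks something like:

    iozone -t 2 -s 8g -r 8k -i 0 -i 2 -F /tank/iozone/f1 /tank/iozone/f2

The DTrace checks were along these lines (simplified one-liners rather than the full scripts). The first counts calls to zfs_getpage and, for comparison, zfs_read, on the assumption that zfs_read is the normal VOP read entry point on this build:

    dtrace -n 'fbt::zfs_getpage:entry,fbt::zfs_read:entry { @[probefunc] = count(); }'

The second compares the read sizes issued by IOzone with the sizes of the physical I/Os that actually reach the disks (it lumps all physical I/O together, reads and writes):

    dtrace -n 'syscall::read:entry /execname == "iozone"/ { @["app read bytes"] = quantize(arg2); }
               io:::start { @["physical I/O bytes"] = quantize(args[0]->b_bcount); }'

The recordsize change was made before the test files were created (it only affects files written after the property is set), e.g.:

    zfs set recordsize=8k tank/iozone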
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss