On Fri, Jun 15, 2012 at 12:56 PM, Timothy Coalson <tsc...@mst.edu> wrote:
> Thanks for the suggestions. I think it would also depend on whether
> the nfs server has tried to write asynchronously to the pool in the
> meantime, which I am unsure how to test, other than making the txgs
> extremely frequent and watching the load on the log devices.

I didn't want to reboot the main file server to test this, so I used zilstat on the backup nfs server (which has nearly identical hardware and configuration, but doesn't have SSDs for a separate ZIL) to see if I could estimate the difference it would make, and the story got stranger: it wrote far less data to the ZIL for the same copy operation (single 8GB file):

$ sudo ./zilstat -M -l 20 -p backuppool txg
waiting for txg commit...
    txg  N-MB  N-MB/s  N-Max-Rate  B-MB  B-MB/s  B-Max-Rate    ops  <=4kB  4-32kB  >=32kB
2833307     1       0           1     1       0           1     15      0       0      15
2833308     0       0           0     0       0           0      0      0       0       0
2833309     1       0           1     1       0           1      8      0       0       8
2833310     0       0           0     0       0           0      4      0       0       4
2833311     1       0           0     1       0           0      9      0       0       9
2833312     0       0           0     0       0           0      0      0       0       0
2833313     2       0           2     2       0           2     21      0       0      21
2833314     7       1           7     8       1           8     63      0       0      63
2833315     1       0           1     2       0           2     18      0       0      18
2833316     0       0           0     0       0           0      5      0       0       5

A small sample from the server with SSD log devices doing the same operation:

$ sudo ./zilstat -M -l 20 -p mainpool txg
waiting for txg commit...
    txg  N-MB  N-MB/s  N-Max-Rate  B-MB  B-MB/s  B-Max-Rate    ops  <=4kB  4-32kB  >=32kB
2808483   989     197         593  1967     393        1180  15010      0       0   15010
2808484   599      99         208  1134     189         393   8653      0       0    8653
2808485     0       0           0     0       0           0      0      0       0       0
2808486   137      27         126   255      51         235   1953      0       0    1953
2808487   460      92         460   859     171         859   6555      0       0    6555
2808488   530      75         530  1031     147        1031   7871      0       0    7871

Setting logbias=throughput makes the server with the SSD log devices act the same as the server without them, as far as I can tell, which I somewhat expected. However, I did not expect use of separate log devices to change how often ZIL ops are performed, other than to raise the upper limit if the device can service more IOPS. Additionally, nfssvrtop showed a lower value for Com_t when not using the separate log device (2.1s with logbias=latency, 0.24s with throughput).

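For anyone who wants to compare the two samples numerically, here is a minimal sketch in Python that totals the N-MB and ops columns from the zilstat output quoted above. It is not part of zilstat: the names parse_zilstat and summarize are made up for illustration, and it assumes the 11-column layout that "-M ... txg" printed in these runs.

```python
# Sketch only: parse_zilstat/summarize are illustrative names, not zilstat APIs.
# Assumes the 11-column per-txg layout shown in the samples in this message.

COLUMNS = ["txg", "N-MB", "N-MB/s", "N-Max-Rate", "B-MB", "B-MB/s",
           "B-Max-Rate", "ops", "<=4kB", "4-32kB", ">=32kB"]

# Sample from the server without separate log devices (backuppool).
BACKUP = """\
waiting for txg commit...
txg N-MB N-MB/s N-Max-Rate B-MB B-MB/s B-Max-Rate ops <=4kB 4-32kB >=32kB
2833307 1 0 1 1 0 1 15 0 0 15
2833308 0 0 0 0 0 0 0 0 0 0
2833309 1 0 1 1 0 1 8 0 0 8
2833310 0 0 0 0 0 0 4 0 0 4
2833311 1 0 0 1 0 0 9 0 0 9
2833312 0 0 0 0 0 0 0 0 0 0
2833313 2 0 2 2 0 2 21 0 0 21
2833314 7 1 7 8 1 8 63 0 0 63
2833315 1 0 1 2 0 2 18 0 0 18
2833316 0 0 0 0 0 0 5 0 0 5
"""

# Sample from the server with SSD log devices (mainpool).
MAIN = """\
waiting for txg commit...
txg N-MB N-MB/s N-Max-Rate B-MB B-MB/s B-Max-Rate ops <=4kB 4-32kB >=32kB
2808483 989 197 593 1967 393 1180 15010 0 0 15010
2808484 599 99 208 1134 189 393 8653 0 0 8653
2808485 0 0 0 0 0 0 0 0 0 0
2808486 137 27 126 255 51 235 1953 0 0 1953
2808487 460 92 460 859 171 859 6555 0 0 6555
2808488 530 75 530 1031 147 1031 7871 0 0 7871
"""


def parse_zilstat(text):
    """Return one dict per data row; banner and header lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has exactly 11 all-numeric fields.
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def summarize(rows):
    """Total the N-MB and ops columns over the sampled txgs."""
    return {
        "txgs": len(rows),
        "n_mb": sum(r["N-MB"] for r in rows),
        "ops": sum(r["ops"] for r in rows),
    }


for name, sample in (("backuppool", BACKUP), ("mainpool", MAIN)):
    print(name, summarize(parse_zilstat(sample)))
```

The two samples cover different numbers of txgs (10 versus 6), so the totals are only a rough comparison, and with -M the MB columns are already rounded to whole megabytes per txg.
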
Copying a folder with small files and subdirectories pushes the server to ~400 ZIL ops per txg with logbias=throughput, so device performance should not be what limits it to ~15 ops per txg when copying a large file without a separate log device. I am thinking of transplanting one of the SSDs temporarily for testing, but I would be interested to know the cause of this behavior. I don't know why more asynchronous writes seem to make it into txgs without being caught by an nfs commit when a separate log device isn't used.

Tim
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss