> Yeah, it has a BBU, and it is charged and on..
> Very odd behavior, I'm stumped.
I advise double-checking the RAID volume settings and ensuring that the policy really is such that the write cache is used. This may also be a function of kernel driver settings, depending on which RAID controller/kernel version you have (for example, make sure that an fsync() doesn't result in asking the RAID controller to flush its caches regardless of BBU state).

In any case, these stats:

Device:     rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
cciss/c0d0    0.00 908.50  0.00 110.50    0.00 8152.00    73.77     0.57   5.20   4.93  54.50

...are highly inconsistent with write-back caching, under the assumption that the writes are indeed the sequential writes to the commit log. Write counts in the ~100/second ballpark, with no reads, an average request size of 74 sectors, an average service time of 4.93 ms, and a utilization of 54% *really really* sounds like the I/O is going all the way down to the platter. Either that, or the RAID firmware/driver is not doing its job properly.

I've attached a small script (note: UGLY hack since I just whipped it up, not a proper tool, but it does the job for this) that you can run to test it:

./writelat.py /path/to/file_to_write_to   # warning: file will be truncated/destroyed if it exists

If you run this on an idle system and you're truly doing write-back caching, you should see numbers BELOW 1 (i.e., sub-millisecond times), though you can probably ignore the first sample. The script writes one 8 kb chunk of data, fsyncs, writes another 8 kb, fsyncs, and so on. I predict you'll see numbers in the 3-7 millisecond range.

Sample output for me with a plain old SATA drive (and ZFS on FreeBSD) follows:

33.8141918182
8.32605361938
8.44812393188
8.44788551331
8.40210914612
8.4490776062
8.38303565979
8.57901573181
8.20922851562
8.21614265442
10.0581645966
8.37683677673
8.50605964661
8.376121521
9.86790657043
8.43715667725
8.45789909363
8.40520858765
8.4171295166
8.46195220947
8.41498374939
8.46099853516
8.44287872314
8.43000411987
8.455991745

-- 
/ Peter Schuller
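[Editor's note: as a quick cross-check of the quoted iostat figures, here is a throwaway sketch, not part of the original mail. It assumes iostat's standard 512-byte sectors; the variable names simply mirror the quoted columns.]

```python
# Sanity-check the quoted iostat numbers (sector = 512 bytes, iostat's unit).
# Values copied from the cciss/c0d0 line above.
w_per_s = 110.5      # w/s: write requests per second
wsec_per_s = 8152.0  # wsec/s: sectors written per second
avgrq_sz = 73.77     # avgrq-sz: average request size, in sectors
svctm_ms = 4.93      # svctm: average service time per request, ms
util_pct = 54.50     # %util as reported

bytes_per_write = avgrq_sz * 512        # ~37 KB per request
mb_per_s = wsec_per_s * 512 / 1e6       # ~4.2 MB/s of sequential writes
implied_util = w_per_s * svctm_ms / 10  # busy ms per second -> percent

print(f"{bytes_per_write / 1024:.1f} KiB/req, "
      f"{mb_per_s:.1f} MB/s, util {implied_util:.1f}%")
```

The columns are internally consistent (w/s times svctm reproduces %util, and w/s times avgrq-sz reproduces wsec/s), which supports the reading above: ~5 ms of service time per ~37 KiB sequential write is platter latency, not cache latency.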
#!/usr/bin/env python
# ugly hack, reader beware

import os
import sys
import time

COUNT = 25
BLOCK = ''.join('x' for _ in xrange(8*1024))

f = file(sys.argv[1], 'w')
for _ in xrange(COUNT):
    start_time = time.time()
    f.write(BLOCK)
    f.flush()
    os.fsync(f.fileno())
    stop_time = time.time()
    # uncomment to test re-writing the same block repeatedly
    # f.seek(0)
    print (stop_time - start_time) * 1000
f.close()
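[Editor's note: the attached script is Python 2. For readers on modern systems, the same test can be sketched in Python 3; this is a port added for convenience, not the original attachment, and it destroys the target file just like the original.]

```python
#!/usr/bin/env python3
"""fsync latency test: write 8 KiB, fsync, repeat, reporting ms per round."""
import os
import sys
import time


def write_latencies(path, count=25, block_size=8 * 1024):
    """Write `count` blocks of `block_size` bytes to `path`, calling
    fsync() after each, and return the per-round latency in milliseconds.

    WARNING: truncates/destroys `path` if it already exists.
    """
    block = b"x" * block_size
    times = []
    with open(path, "wb") as f:
        for _ in range(count):
            start = time.time()
            f.write(block)
            f.flush()
            os.fsync(f.fileno())
            # f.seek(0)  # uncomment to rewrite the same block repeatedly
            times.append((time.time() - start) * 1000)
    return times


if __name__ == "__main__" and len(sys.argv) > 1:
    for ms in write_latencies(sys.argv[1]):
        print(ms)
```

Usage is the same as the original: `./writelat.py /path/to/file_to_write_to`. Sub-millisecond numbers on an idle system indicate write-back caching is actually in effect; several milliseconds per round means each fsync is waiting on the platter.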