On Jul 28, 2011, at 9:52 PM, Jonathan Ellis wrote:

> This is not advisable in general, since non-mmap'd I/O is substantially
> slower.
I see this claim again and again here, but it is actually close to 10 years since I last saw mmap'd I/O give any substantial performance benefit in any real-life use I have needed. We have done a lot of testing of this with Cassandra as well, and I see nothing conclusive. We have run just as many tests where normal I/O was faster than mmap, and the differences may very well be within statistical variance, given the complexity and the number of factors involved in something like a distributed Cassandra cluster working at quorum.

mmap made a difference in 2000, when memory throughput was still measured in hundreds of megabytes/sec and CPU caches were a few kilobytes. Today you get megabytes of CPU cache with 100 GB/sec of bandwidth, and even main memory bandwidth is in the tens of GB/sec. I/O buffers, on the other hand, are generally quite small, and copying an I/O buffer from kernel to user space inside a cache with 100 GB/sec of bandwidth is really a non-issue at the I/O throughput Cassandra generates.

By 2005 or so, CPUs had already reached the point where I saw mmap perform worse than regular I/O in a large number of use cases. Hard to say exactly why, but I remember a FreeBSD core developer speculating back then that the extra MMU work involved in some I/O loads may actually be slower than a cache-internal memcpy of tiny I/O buffers (they are pretty small, after all). I don't have a theory of my own here; I just know that regular I/O was typically faster than mmap, especially for large numbers of small I/O operations, which would back up that theory.

So I wonder how people came to this conclusion, because under no real-life use case with Cassandra have I been able to reproduce anything resembling a significant difference, and we have been benchmarking on nodes with SSD setups that can churn out 1 GB/sec+ read speeds. That is far more I/O throughput than most people have at hand, and still I cannot get mmap to outperform regular I/O. (I have put a rough sketch of the kind of comparison I mean in a P.S. below.) I do, although subjectively, feel that things just seem to work better with regular I/O for us: we currently have very nice and stable heap sizes regardless of I/O load, and the system is easier to operate because we can actually monitor how much memory the darned thing uses.

My recommendation? Stay away from mmap (disk_access_mode: standard in cassandra.yaml). I would love to understand how people arrived at the opposite conclusion, however, and to find out why we seem to see such different results!

> The OP is correct that it is best to disable swap entirely, and
> second-best to enable JNA for mlockall.

Be a bit careful with removing swap completely. Linux is not always happy when it gets short on memory.

Terje
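
P.S. For anyone who wants to try the comparison themselves: below is a
minimal sketch of the kind of test I mean (not our actual benchmark; the
class name, buffer size and iteration count are placeholders I made up for
illustration). It times small positional reads through FileChannel.read()
against the same reads through a MappedByteBuffer. Beware that the first
pass warms the page cache for the second, so alternate the order, run it
several times, and point it at a file larger than RAM if you want numbers
that mean anything.

    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.util.Random;

    public class MmapVsRead {
        static final int BUF = 4096;     // small buffer, like a typical sstable read
        static final int ITERS = 200000; // placeholder iteration count

        public static void main(String[] args) throws Exception {
            RandomAccessFile raf = new RandomAccessFile(args[0], "r");
            FileChannel ch = raf.getChannel();
            // FileChannel.map() is limited to 2GB per mapping, so clamp for this sketch
            int len = (int) Math.min(ch.size(), Integer.MAX_VALUE) - BUF;
            Random rnd = new Random(42);

            // regular I/O: the kernel copies every buffer into user space
            ByteBuffer buf = ByteBuffer.allocateDirect(BUF);
            long t0 = System.nanoTime();
            for (int i = 0; i < ITERS; i++) {
                buf.clear();
                ch.read(buf, rnd.nextInt(len)); // positional read
            }
            System.out.println("read(): " + (System.nanoTime() - t0) / 1000000 + " ms");

            // mmap'd I/O: no explicit copy, but page faults and MMU work instead
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, len + BUF);
            byte[] dst = new byte[BUF];
            rnd = new Random(42); // replay the same offsets as above
            t0 = System.nanoTime();
            for (int i = 0; i < ITERS; i++) {
                map.position(rnd.nextInt(len));
                map.get(dst, 0, BUF);
            }
            System.out.println("mmap:   " + (System.nanoTime() - t0) / 1000000 + " ms");
            raf.close();
        }
    }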
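
P.P.S. On the mlockall point: roughly what "enable JNA for mlockall" gets
you is the sketch below - the process asks the kernel to pin its pages in
RAM so the heap cannot be swapped out. This is a stripped-down illustration
of the idea, not Cassandra's actual CLibrary code; the constants are the
Linux values, and the call only succeeds with a raised "ulimit -l" (or as
root).

    import com.sun.jna.Native;

    public class Mlock {
        // bind the native function below straight against libc via JNA
        static { Native.register("c"); }

        static final int MCL_CURRENT = 1; // Linux value: lock pages mapped now
        static final int MCL_FUTURE  = 2; // Linux value: lock pages mapped later

        private static native int mlockall(int flags);

        public static void main(String[] args) {
            int rc = mlockall(MCL_CURRENT | MCL_FUTURE);
            System.out.println(rc == 0 ? "memory locked"
                                       : "mlockall failed - check ulimit -l");
        }
    }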