If you're actually hitting disk for most or even many of your reads then mmap doesn't matter, since the extra copy to a Java buffer is negligible compared to the I/O itself (even on SSDs).
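
To make the point concrete, the copy being debated is the one read() does from the kernel page cache into a Java-side buffer; a mapped buffer skips it by exposing the page cache directly. A minimal sketch of the two paths (the file path and buffer size are placeholders, not anything Cassandra-specific):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class ReadPathComparison {
        public static void main(String[] args) throws IOException {
            String path = args.length > 0 ? args[0] : "/tmp/data.db"; // placeholder file
            RandomAccessFile file = new RandomAccessFile(path, "r");
            FileChannel channel = file.getChannel();
            try {
                // Standard I/O: read() copies bytes from the kernel page cache
                // into this Java-allocated buffer -- the "extra copy" in question.
                ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
                int copied = channel.read(buffer, 0);

                // mmap'd I/O: the MappedByteBuffer is backed directly by the page
                // cache, so reads avoid that copy but pay for MMU/page-table work
                // (and page faults) on first touch instead.
                long length = Math.min(channel.size(), 64 * 1024);
                MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, length);
                byte firstByte = length > 0 ? mapped.get(0) : 0;

                System.out.println("read() copied " + copied + " bytes, mmap first byte = " + firstByte);
            } finally {
                channel.close();
                file.close();
            }
        }
    }
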
On Jul 28, 2011 9:04 AM, "Terje Marthinussen" <tmarthinus...@gmail.com>
wrote:
>
> On Jul 28, 2011, at 9:52 PM, Jonathan Ellis wrote:
>
>> This is not advisable in general, since non-mmap'd I/O is substantially
slower.
>
> I see this claim again and again here, but it is actually close to 10 years since I last saw mmap'd I/O deliver any substantial performance benefit in any real-life use case I have worked on.
>
> We have also done a lot of testing of this with cassandra and I don't see anything conclusive. We have done just as many tests where normal I/O has been faster than mmap, and the differences may very well be within statistical variance given the complexity and number of factors involved in something like a distributed cassandra cluster working at quorum.
>
> mmap made a difference in 2000, when memory throughput was still measured in hundreds of megabytes/sec and CPU caches were a few kilobytes, but today you have megabytes of CPU cache with 100GB/sec of bandwidth, and even memory bandwidth is in the tens of GB/sec.
>
> However, I/O buffers are generally quite small, and copying an I/O buffer from kernel to user space inside a cache with 100GB/sec of bandwidth is really a non-issue given the I/O throughput cassandra generates.
>
> By 2005 or so, CPUs had already reached a point where I saw mmap perform worse than regular I/O in a large number of use cases.
>
> Hard to say exactly why, but I saw one theory from a FreeBSD core developer speculating back then that the extra MMU work involved in some I/O loads may actually be slower than the in-cache memcpy of tiny I/O buffers (they are pretty small, after all).
>
> I don't have a personal theory here. I just know that, especially with large numbers of small I/O operations, regular I/O was typically faster than mmap, which could back up that theory.
>
> So I wonder how people came to this conclusion, as I have not, under any real-life use case with cassandra, been able to reproduce anything resembling a significant difference, and we have been benchmarking on nodes with SSD setups which can churn out 1GB/sec+ read speeds.
>
> That is way more I/O throughput than most people have at hand, and I still cannot get mmap to give me better performance.
>
> I do, although subjectively, feel that things just seem to work better with regular I/O for us. We currently have very nice and stable heap sizes regardless of I/O load, and we have an easier system to operate since we can actually monitor how much memory the darned thing uses.
>
> My recommendation? Stay away from mmap.
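> 
> For anyone who wants to try it, that just means switching the access mode in cassandra.yaml; assuming the option name from the 0.7/0.8-era config, something like:
> 
>     # use plain buffered I/O for data files instead of mmap
>     disk_access_mode: standard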
>
> I would love to understand how people got to this conclusion, however, and to find out why we seem to see such different results!
>
>> The OP is correct that it is best to disable swap entirely, and
>> second-best to enable JNA for mlockall.
>
> Be a bit careful with removing swap completely. Linux is not always happy
when it gets short on memory.
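> 
> On the JNA point: all it does is ask the kernel to pin the JVM's pages so they can never be swapped out. A simplified sketch of how mlockall can be called through JNA direct mapping on Linux (the constants are the Linux MCL_* values; this is an illustration, not Cassandra's actual code):
> 
>     import com.sun.jna.LastErrorException;
>     import com.sun.jna.Native;
> 
>     public class MemoryLock {
>         // Linux values of MCL_CURRENT and MCL_FUTURE
>         private static final int MCL_CURRENT = 1;
>         private static final int MCL_FUTURE = 2;
> 
>         static {
>             // Bind this class's native methods directly to libc
>             Native.register("c");
>         }
> 
>         private static native int mlockall(int flags) throws LastErrorException;
> 
>         public static void main(String[] args) {
>             try {
>                 // Lock all current and future pages of this JVM into RAM
>                 mlockall(MCL_CURRENT | MCL_FUTURE);
>                 System.out.println("mlockall succeeded");
>             } catch (LastErrorException e) {
>                 // Usually fails without CAP_IPC_LOCK or a raised memlock rlimit
>                 System.err.println("mlockall failed, errno=" + e.getErrorCode());
>             }
>         }
>     }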
>
> Terje