I found that in a long-running random-read test on a large dataset, the
performance with mmap is very bad.
See the attached chart in
https://issues.apache.org/jira/browse/CASSANDRA-1214.
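
For anyone who wants to compare the two access modes on their own data, here
is a minimal Java sketch of the same random read done once through a memory
mapping and once through a plain positional read. It is not Cassandra's code;
the class and method names (ReadModes, readMapped, readStandard) are made up
for illustration, and it maps a fresh region on every call for simplicity,
which a real test would avoid by mapping the file once.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only (not part of Cassandra).
final class ReadModes {

    // Read `len` bytes at `offset` through a read-only memory mapping.
    // Page faults on the mapping are served by the OS page cache.
    static byte[] readMapped(Path file, long offset, int len) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, offset, len);
            byte[] out = new byte[len];
            buf.get(out);
            return out;
        }
    }

    // Read the same range with explicit positional reads (standard I/O):
    // each read is a syscall that copies from the buffer cache into `buf`.
    static byte[] readStandard(Path file, long offset, int len) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(len);
            long pos = offset;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);
                if (n < 0) {
                    throw new IOException("unexpected end of file");
                }
                pos += n;
            }
            return buf.array();
        }
    }
}
```

Timing many calls of each method at random offsets, over a file much larger
than RAM, would give a rough picture of the difference on a given machine.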

On Fri, Jul 16, 2010 at 12:41 AM, Peter Schuller <
peter.schul...@infidyne.com> wrote:

> > Can someone please explain the mmap issue?
> > mmap is the default for all storage files on 64-bit machines.
> > According to this case,
> > https://issues.apache.org/jira/browse/CASSANDRA-1214
> > it might not be a good thing.
> > Is it right to say that you should use mmap only if your MAX expected
> > data is smaller than the MIN free RAM that could be in your system?
>
> Not really. That is, the intent of mmap is to let the OS dynamically
> choose what gets swapped in and out. The practical problem is that the
> OS will often tend to swap too much. I got the impression jbellis
> wasn't convinced, but my anecdotal experience is that this is a much
> larger problem for mmap()ed data than for regular buffer-cached data.
> My assumption has been that, in the case of the buffer cache, the
> kernel has direct knowledge that the data is cache only, while
> mmap()ed data competes directly with regular application memory
> (I haven't actually checked the source; I suppose I should).
>
> One thing you can do is decrease swappiness (assuming Linux; check out
> /proc/sys/vm/swappiness) and see if it helps. But in general, you
> don't have, to my knowledge, good direct control over swapping
> policies.
>
> As noted in the thread, the best bet would probably be to make the JVM
> use mlock()/mlockall() to guarantee that the JVM doesn't swap anything
> out, and then let the OS do its thing with any remaining data.
>
> That said, certainly if the total amount of data is less than the
> minimum free after JVM heap, you're very much less likely to see
> swapping. But it's not the intent that you should only use mmap()
> under such circumstances.
>
> Also, personally I'm interested in hearing what kind of performance
> impacts people have *actually* seen with standard I/O; especially if
> Cassandra is configured to cache a significant amount of data in
> RAM itself. I'm a bit skeptical about claims of extreme performance
> differences, in spite of syscalls being expensive.
>
> --
> / Peter Schuller
>
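
To check the swappiness setting Peter mentions before experimenting with it,
something like the following Java sketch could be used. It only reads the
value; the class name Swappiness and its read method are hypothetical, and the
path is the Linux one given above, so this assumes a Linux host.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only.
final class Swappiness {

    // Linux-specific location of the setting (from the message above).
    static final Path DEFAULT_PATH = Paths.get("/proc/sys/vm/swappiness");

    // Parse the single integer stored in the given file.
    static int read(Path path) throws IOException {
        String text = new String(Files.readAllBytes(path), StandardCharsets.US_ASCII);
        return Integer.parseInt(text.trim());
    }

    public static void main(String[] args) throws IOException {
        System.out.println("vm.swappiness = " + read(DEFAULT_PATH));
    }
}
```

Lowering the value requires root, for example by writing a smaller number to
the same file, and whether it helps would have to be measured per workload.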
