I'd like to add one caveat to Weijun's statement. I agree with everything, except when your access pattern doesn't look like a random sampling of data across all your sstables. If it turns out that, at any given time, you're doing many repeated hits to a smaller subset of keys, then using mmap should be OK even if your live sstables are much larger than available memory. The key is to have enough memory available (pre-mmap) that there are few page-in operations relative to client read requests.
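To make the caveat concrete, here is a rough simulation sketch. It is not a measurement of Cassandra or of the Linux page cache; it just models the cache as a simple LRU over fixed-size pages, and every number in it (total pages, cache size, size of the hot set, share of reads that go to the hot set) is a made-up assumption. It compares the fraction of reads that need a page-in under a uniformly random access pattern against a pattern where most reads hit a small hot subset.

```python
import random
from collections import OrderedDict


def page_in_rate(accesses, cache_pages):
    """Return the fraction of reads that miss an LRU cache of `cache_pages` pages."""
    cache = OrderedDict()
    misses = 0
    total = 0
    for page in accesses:
        total += 1
        if page in cache:
            cache.move_to_end(page)        # mark page as most recently used
        else:
            misses += 1                    # this read needs a page-in
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return misses / total if total else 0.0


def simulate(total_pages=100_000, cache_pages=10_000, reads=200_000,
             hot_pages=5_000, hot_fraction=0.9, seed=42):
    """Compare uniform vs. skewed access. All parameters are hypothetical."""
    # Uniform: every read picks a page at random from the whole data set.
    rng = random.Random(seed)
    uniform = (rng.randrange(total_pages) for _ in range(reads))
    uniform_rate = page_in_rate(uniform, cache_pages)

    # Skewed: `hot_fraction` of reads go to a small hot subset of pages,
    # the rest are spread over the whole data set.
    rng = random.Random(seed)

    def skewed():
        for _ in range(reads):
            if rng.random() < hot_fraction:
                yield rng.randrange(hot_pages)
            else:
                yield rng.randrange(total_pages)

    skewed_rate = page_in_rate(skewed(), cache_pages)
    return uniform_rate, skewed_rate


if __name__ == "__main__":
    uniform_rate, skewed_rate = simulate()
    print(f"uniform access page-in rate: {uniform_rate:.2f}")
    print(f"skewed access page-in rate:  {skewed_rate:.2f}")
```

With the data set ten times larger than the cache in both cases, the uniform pattern should page in on most reads, while the skewed pattern should page in on only a small fraction of them, since the hot set fits in the cache. The real kernel's eviction policy is more involved than plain LRU, so treat this only as an illustration of the trend.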
Also, I suppose that if you don't have a lot of repeat hits per key, mmap probably doesn't buy you much either, unless your rows are very skinny and lots of them fit in a page. As far as I can tell, Linux lazily pages in data that has been mmap-ed. (Apologies for describing mmap inaccurately earlier in the thread.)

Kyusik Chung

On May 6, 2010, at 11:05 AM, Weijun Li wrote:

> I just used Linux "Top" to see the number of virtual memory used by JVM. When
> you turned on mmap, this number is equal to the size of your live sstables.
> And if you turn off mmap the VIRT will be close to the xmx of your jvm.
>
> Anyway, for mmap, in order for you to access the data in the buffer or
> virtual address, OS has to read/page in the data to a block of physical
> memory and assign your virtual address to that physical memory block. So if
> you use random partitioner you'll most likely force Linux to page in/out all
> the time. In this case, disabling mmap and let Cassandra to use random file
> access seems to make more sense. mmap should be used when you have enough ram
> for OS to cache most or all of your data files.
>
> -Weijun
>
> On Thu, May 6, 2010 at 10:49 AM, Vick Khera <vi...@khera.org> wrote:
> On Thu, May 6, 2010 at 1:06 PM, Weijun Li <weiju...@gmail.com> wrote:
> > In this case using mmap will cause Cassandra to use sometimes > 100G virtual
> > memory which is much more than the physical ram, since we are using random
> > partitioner the OS will be busy doing swap.
>
> mmap uses the virtual address space to reference bits on the disk; it
> does *NOT* use physical or virtual memory to copy that data other than
> perhaps any disk buffer cache from reading the file (which you would
> have anyhow). Your memory usage tools will report high memory usage
> because they tell you how much virtual address space you have
> allocated.