I'd like to add one caveat to Weijun's statement.  I agree with everything,
except when your access pattern doesn't look like a random sampling of data
across all your sstables.  If it turns out that at any given time you're doing
many repeated hits to a small subset of keys, then using mmap should be OK even
if your live sstables are much larger than available memory.  The key is to
have enough memory available (pre-mmap) so that there are few page-in
operations relative to client read requests.
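
To make the "page-ins relative to read requests" ratio concrete, here is a
rough simulation sketch.  It assumes the OS page cache behaves like a simple
LRU (the real Linux policy is more involved), and every size below is a
made-up illustration value, not a measurement from any cluster.

```python
import random
from collections import OrderedDict


def page_in_ratio(accesses, cache_pages):
    """Return page-ins per read for a sequence of page numbers.

    Models the page cache as a plain LRU holding `cache_pages` pages.
    This is a simplification of what the kernel really does.
    """
    cache = OrderedDict()
    page_ins = 0
    reads = 0
    for page in accesses:
        reads += 1
        if page in cache:
            cache.move_to_end(page)        # hit: mark as most recently used
        else:
            page_ins += 1                  # miss: page must be read from disk
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return page_ins / reads if reads else 0.0


rng = random.Random(0)   # fixed seed so the run is repeatable
TOTAL_PAGES = 100_000    # hypothetical: live sstable data, in pages
CACHE_PAGES = 10_000     # hypothetical: memory left over for the page cache
HOT_PAGES = 5_000        # hypothetical: pages covering the "hot" keys
READS = 200_000

# Reads spread uniformly over all sstable pages.
uniform = [rng.randrange(TOTAL_PAGES) for _ in range(READS)]
# Reads concentrated on a hot subset that fits in the cache.
hot = [rng.randrange(HOT_PAGES) for _ in range(READS)]

uniform_ratio = page_in_ratio(uniform, CACHE_PAGES)
hot_ratio = page_in_ratio(hot, CACHE_PAGES)
print(f"uniform access: {uniform_ratio:.3f} page-ins per read")
print(f"hot subset:     {hot_ratio:.3f} page-ins per read")
```

With these numbers the data is ten times larger than the cache in both cases,
but only the uniform pattern pays for a page-in on most reads.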

Also, I suppose if you don't have a lot of repeat hits per key, mmap probably
doesn't buy you a ton either, unless your rows are very skinny and lots of them
fit in a page.  As far as I can tell, Linux lazily pages in data that's been
mmap-ed.
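
As a small illustration of mapping versus reading, the sketch below maps a
whole file and then touches only a few bytes near the end.  The file contents
and sizes are invented for the example; the mapping covers the full file
length (which is what tools report as virtual size), while how much the kernel
actually reads from disk is up to the OS and is not measured here.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Map the whole file read-only and return (mapped size, requested bytes)."""
    with open(path, "rb") as f:
        # Length 0 means "map the entire file".
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing touches only the pages backing this byte range.
            return len(m), m[offset:offset + length]


PADDING = 8 * 1024 * 1024  # hypothetical: 8 MiB of filler before the "row"

# Build a throwaway file standing in for an sstable data file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * PADDING)
    tmp.write(b"row-data")
    path = tmp.name

mapped_size, data = read_via_mmap(path, PADDING, 8)
os.remove(path)

print(mapped_size, data)
```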

(apologies for describing mmap inaccurately earlier in the thread)

Kyusik Chung

On May 6, 2010, at 11:05 AM, Weijun Li wrote:

> I just used Linux "top" to see the amount of virtual memory used by the JVM.
> When you turn on mmap, this number is equal to the size of your live sstables.
> And if you turn off mmap, VIRT will be close to the Xmx of your JVM.
> 
> Anyway, for mmap, in order for you to access the data in the buffer or
> virtual address, the OS has to read/page in the data to a block of physical
> memory and assign your virtual address to that physical memory block. So if
> you use the random partitioner you'll most likely force Linux to page in/out
> all the time. In this case, disabling mmap and letting Cassandra use random
> file access seems to make more sense. mmap should be used when you have
> enough RAM for the OS to cache most or all of your data files.
> 
> -Weijun
> 
> On Thu, May 6, 2010 at 10:49 AM, Vick Khera <vi...@khera.org> wrote:
> On Thu, May 6, 2010 at 1:06 PM, Weijun Li <weiju...@gmail.com> wrote:
> > In this case using mmap will cause Cassandra to use sometimes > 100G virtual
> > memory which is much more than the physical ram, since we are using random
> > partitioner the OS will be busy doing swap.
> 
> mmap uses the virtual address space to reference bits on the disk; it
> does *NOT* use physical or virtual memory to copy that data other than
> perhaps any disk buffer cache from reading the file (which you would
> have anyhow).  Your memory usage tools will report high memory usage
> because they tell you how much virtual address space you have
> allocated.
> 
