> The read rate that I have been seeing is about 3MB/sec, and that is reading 
> the raw bytes... using string serializer the rate is even lower, about 
> 2.2MB/sec. 
Can we break this down a bit:

Is this a single client?
How many columns is it asking for?
What sort of query are you sending, slice or named columns?
From the client side, how long is a single read taking?
What is the write workload like? It sounds like it's write once, read many.

Use nodetool cfstats to see the read latency on a single node (see
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/). Is there much
difference between this and the latency from the client's perspective?
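
As a rough sanity check when comparing those two latencies, the figures quoted
in this thread can be turned into an implied time per volume. This is only a
sketch under stated assumptions: a single client issuing one read at a time, a
2.5 KB midpoint for the 2-3 KB column size, and 1 MB = 1024 KB. The helper name
per_row_latency_ms is made up for illustration.

```python
# Back-of-the-envelope numbers taken from the thread. The 2.5 KB midpoint,
# 1 MB = 1024 KB, and one-read-at-a-time are assumptions for this sketch.
AVG_COLUMN_KB = 2.5        # thread says 2-3 KB per page
COLUMNS_PER_ROW = 250      # thread says about 250 pages per volume


def per_row_latency_ms(throughput_mb_per_sec: float) -> float:
    """Time implied by a sustained read rate for one average row, in ms."""
    row_kb = AVG_COLUMN_KB * COLUMNS_PER_ROW
    rows_per_sec = throughput_mb_per_sec * 1024 / row_kb
    return 1000.0 / rows_per_sec


if __name__ == "__main__":
    for label, rate in (("raw bytes", 3.0), ("string serializer", 2.2)):
        print(f"{label}: {per_row_latency_ms(rate):.0f} ms per volume")
```

With these assumptions, 3MB/sec works out to roughly 200 ms per volume. If
cfstats reports a much lower read latency on the node, the time is probably
being spent between the node and the client rather than in the local read.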

> Using JNA may help, but a blog article seems to say it only increases 13%,
> which is not very significant when the base performance is in single-digit
> MBs.
There are other reasons to have JNA installed: more efficient snapshots and 
advising the OS when file operations should not be cached.

>  Our environment is virtualized, and the disks are actually SAN through fiber 
> channels, so I don't know if that has impact on performance as well.  
Memory speed > network speed. Reads served from the OS page cache avoid the
trip to the SAN; reads that miss it do not.

Aaron Morton
Freelance Developer

On 17/05/2012, at 12:35 AM, Yiming Sun wrote:

> Thanks Aaron.  The reason I raised the question about memory requirements is 
> because we are seeing some very low performance on Cassandra reads.
> We are using cassandra as the backend for an IR repository, and granted the 
> size of each column is very small (OCRed text).  Each row represents a book 
> volume, and the columns of the row represent pages of the volume.  The 
> average size of a column text is 2-3KB, and each row has about 250 columns 
> (varies quite a bit from one volume to another).
> The read rate that I have been seeing is about 3MB/sec, and that is reading 
> the raw bytes... using string serializer the rate is even lower, about 
> 2.2MB/sec.   To retrieve each volume, a slice query is used via Hector that 
> specifies the row key (the volume), and a list of column keys (pages), and 
> the consistency level is set to ONE.  So I am a bit lost in trying to figure 
> out how to increase the performance.  Using JNA may help, but a blog article 
> seems to say it only increases 13%, which is not very significant when the 
> base performance is in single-digit MBs.
> Do you have any suggestions?
> Oh, another thing is you mentioned memory mapped files.  Our environment is 
> virtualized, and the disks are actually SAN through fiber channels, so I 
> don't know if that has impact on performance as well.  Would greatly 
> appreciate any help.  Thanks.
> -- Y.
> On Wed, May 16, 2012 at 5:48 AM, aaron morton <aa...@thelastpickle.com> wrote:
> The JVM will not swap out if you have JNA.jar in the path or you have 
> disabled swap on the machine (the simplest thing to do). 
> Cassandra uses memory mapped file access. If you have 16GB of ram, 8 will go 
> to the JVM and the rest can be used by the os to cache files. (Plus the off 
> heap stuff)
> Cheers
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> On 16/05/2012, at 11:12 AM, Yiming Sun wrote:
>> Thanks Tyler... so my understanding is, even if Cassandra doesn't do 
>> off-heap caching, by having a large-enough memory, it minimizes the chance 
>> of swapping the Java heap to disk.  Is that correct?
>> -- Y.
>> On Tue, May 15, 2012 at 6:26 PM, Tyler Hobbs <ty...@datastax.com> wrote:
>> On Tue, May 15, 2012 at 3:19 PM, Yiming Sun <yiming....@gmail.com> wrote:
>> Hello,
>> I was reading the Apache Cassandra 1.0 Documentation PDF dated May 10, 2012, 
>> and had some questions on what the recommended memory size is.
>> Below is the snippet from the PDF.  Bullet 1 suggests to have 16-32GB of 
>> RAM, yet Bullet 2 suggests to limit Java heap size to no more than 8GB.  My 
>> understanding is that Cassandra is implemented purely in Java, so all memory 
>> it sees and uses is the JVM Heap.
>> The main way that additional RAM helps is through the OS page cache, which 
>> will store hot portions of SSTables in memory. Additionally, Cassandra can 
>> now do off-heap caching.
>>  So can someone help me understand the discrepancy between 16-32GB of RAM 
>> and 8GB of heap?  Thanks.
>> == snippet ==
>> Memory
>> The more memory a Cassandra node has, the better read performance. More RAM
>> allows for larger cache sizes and reduces disk I/O for reads. More RAM also
>> allows memory tables (memtables) to hold more recently written data. Larger
>> memtables lead to a fewer number of SSTables being flushed to disk and fewer
>> files to scan during a read. The ideal amount of RAM depends on the
>> anticipated size of your hot data.
>> • For dedicated hardware, a minimum of 8GB of RAM is needed. DataStax
>> recommends 16GB - 32GB.
>> • Java heap space should be set to a maximum of 8GB or half of your total
>> RAM, whichever is lower. (A greater heap size has more intense garbage
>> collection periods.)
>> • For a virtual environment use a minimum of 4GB, such as Amazon EC2 Large
>> instances. For production clusters with a healthy amount of traffic, 8GB is
>> more common.
>> -- 
>> Tyler Hobbs
>> DataStax
