> The read rate that I have been seeing is about 3MB/sec, and that is reading
> the raw bytes... using string serializer the rate is even lower, about
> 2.2MB/sec.

Can we break this down a bit:
Is this a single client?
How many columns is it asking for?
What sort of query are you sending, slice or named columns?
From the client side, how long is a single read taking?
What is the write workload like? It sounds like it's write once, read many.

Use nodetool cfstats to see what the read latency is on a single node. (see http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/) Is there much difference between this and the latency from the client perspective?

> Using JNA may help, but a blog article seems to say it only increase 13%,
> which is not very significant when the base performance is in single-digit
> MBs.

There are other reasons to have JNA installed: more efficient snapshots and advising the OS when file operations should not be cached.

> Our environment is virtualized, and the disks are actually SAN through fiber
> channels, so I don't know if that has impact on performance as well.

memory speed > network speed

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/05/2012, at 12:35 AM, Yiming Sun wrote:

> Thanks Aaron. The reason I raised the question about memory requirements is
> because we are seeing some very low performance on cassandra read.
>
> We are using cassandra as the backend for an IR repository, and granted the
> size of each column is very small (OCRed text). Each row represents a book
> volume, and the columns of the row represent pages of the volume. The
> average size of a column text is 2-3KB, and each row has about 250 columns
> (varies quite a bit from one volume to another).
>
> The read rate that I have been seeing is about 3MB/sec, and that is reading
> the raw bytes... using string serializer the rate is even lower, about
> 2.2MB/sec. To retrieve each volume, a slice query is used via Hector that
> specifies the row key (the volume), and a list of column keys (pages), and
> the consistency level is set to ONE.
> So I am a bit lost in trying to figure
> out how to increase the performance. Using JNA may help, but a blog article
> seems to say it only increase 13%, which is not very significant when the
> base performance is in single-digit MBs.
>
> Do you have any suggestions?
>
> Oh, another thing is you mentioned memory mapped files. Our environment is
> virtualized, and the disks are actually SAN through fiber channels, so I
> don't know if that has impact on performance as well. Would greatly
> appreciate any help. Thanks.
>
> -- Y.
>
> On Wed, May 16, 2012 at 5:48 AM, aaron morton <aa...@thelastpickle.com> wrote:
> The JVM will not swap out if you have JNA.jar in the path or you have
> disabled swap on the machine (the simplest thing to do).
>
> Cassandra uses memory mapped file access. If you have 16GB of ram, 8 will go
> to the JVM and the rest can be used by the os to cache files. (Plus the off
> heap stuff)
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16/05/2012, at 11:12 AM, Yiming Sun wrote:
>
>> Thanks Tyler... so my understanding is, even if Cassandra doesn't do
>> off-heap caching, by having a large-enough memory, it minimize the chance of
>> swapping the java heap to a disk. Is that correct?
>>
>> -- Y.
>>
>> On Tue, May 15, 2012 at 6:26 PM, Tyler Hobbs <ty...@datastax.com> wrote:
>> On Tue, May 15, 2012 at 3:19 PM, Yiming Sun <yiming....@gmail.com> wrote:
>> Hello,
>>
>> I was reading the Apache Cassandra 1.0 Documentation PDF dated May 10, 2012,
>> and had some questions on what the recommended memory size is.
>>
>> Below is the snippet from the PDF. Bullet 1 suggests to have 16-32GB of
>> RAM, yet Bullet 2 suggests to limit Java heap size to no more than 8GB. My
>> understanding is that Cassandra is implemented purely in Java, so all memory
>> it sees and uses is the JVM Heap.
>>
>> The main way that additional RAM helps is through the OS page cache, which
>> will store hot portions of SSTables in memory. Additionally, Cassandra can
>> now do off-heap caching.
>>
>>
>> So can someone help me understand the discrepancy between 16-32GB of RAM
>> and 8GB of heap? Thanks.
>>
>> == snippet ==
>> Memory
>> The more memory a Cassandra node has, the better read performance. More RAM
>> allows for larger cache sizes and reduces disk I/O for reads. More RAM also
>> allows memory tables (memtables) to hold more recently written data. Larger
>> memtables lead to a fewer number of SSTables being flushed to disk and fewer
>> files to scan during a read. The ideal amount of RAM depends on the
>> anticipated size of your hot data.
>>
>> • For dedicated hardware, a minimum of than 8GB of RAM is needed. DataStax
>> recommends 16GB - 32GB.
>>
>> • Java heap space should be set to a maximum of 8GB or half of your total
>> RAM, whichever is lower. (A greater heap size has more intense garbage
>> collection periods.)
>>
>> • For a virtual environment use a minimum of 4GB, such as Amazon EC2 Large
>> instances. For production clusters with a healthy amount of traffic, 8GB is
>> more common.
>>
>>
>> --
>> Tyler Hobbs
>> DataStax
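
For reference, the sizing rule from the quoted documentation snippet (heap capped at 8GB or half of RAM, whichever is lower) and the read figures reported in this thread can be worked through as simple arithmetic. The following is only a rough sketch: the function names are made up for illustration, the page cache figure ignores off-heap memory and other processes, and the 2.5KB column size is an assumed midpoint of the 2-3KB range mentioned above.

```python
GB = 1024 ** 3


def max_heap_bytes(total_ram_bytes: int) -> int:
    """Heap cap per the quoted snippet: 8GB or half of RAM, whichever is lower."""
    return min(8 * GB, total_ram_bytes // 2)


def page_cache_bytes(total_ram_bytes: int) -> int:
    """RAM left over for the OS page cache (simplification: heap is the only consumer)."""
    return total_ram_bytes - max_heap_bytes(total_ram_bytes)


def volumes_per_second(rate_mb_per_s: float, columns_per_row: int, column_kb: float) -> float:
    """Rows (book volumes) read per second at a given byte rate."""
    row_mb = columns_per_row * column_kb / 1024  # size of one full row in MB
    return rate_mb_per_s / row_mb


if __name__ == "__main__":
    for ram_gb in (4, 8, 16, 32):
        ram = ram_gb * GB
        print(
            f"{ram_gb}GB RAM: heap cap {max_heap_bytes(ram) // GB}GB, "
            f"page cache {page_cache_bytes(ram) // GB}GB"
        )
    # Assumed midpoint of 2.5KB per column, 250 columns per row, 3MB/sec.
    print(f"{volumes_per_second(3.0, 250, 2.5):.1f} volumes/sec")
```

With 16GB of RAM this gives an 8GB heap and 8GB for the page cache, matching the split described earlier in the thread. At 3MB/sec, a 250-column row of roughly 2.5KB columns works out to about five volumes per second, which is one figure to compare against the per-node read latency reported by nodetool cfstats.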