Will do, Oleg. Again, thanks for the information. -- Y.
On Wed, May 16, 2012 at 4:44 PM, Oleg Dulin <oleg.du...@gmail.com> wrote:
> Please do keep us posted. We have a somewhat similar Cassandra
> utilization pattern, and I would like to know what your solution is...
>
> On 2012-05-16 20:38:37 +0000, Yiming Sun said:
>> Thanks, Oleg. Another caveat on our side is that we have a very large
>> data space (imagine picking 100 items out of 3 million; the chance of
>> two items coming from the same bin is pretty low). We will experiment
>> with the row cache and hope it helps rather than hurts (the tuning
>> guide says the row cache can be detrimental in some circumstances).
>>
>> -- Y.
>>
>> On Wed, May 16, 2012 at 4:25 PM, Oleg Dulin <oleg.du...@gmail.com> wrote:
>>> Indeed. This is how we are trying to solve this problem.
>>>
>>> Our application has a built-in cache that resembles a supercolumn or
>>> standard-column data structure, with an API that resembles a
>>> combination of a Pelops selector and mutator. You could do something
>>> like that for Hector.
>>>
>>> The cache is bounded and uses LRU to purge unused items and keep
>>> memory usage steady.
>>>
>>> It is not perfect and we still have bugs, but it cuts out about 90%
>>> of our Cassandra reads.
>>>
>>> On 2012-05-16 20:07:11 +0000, Mike Peters said:
>>>> Hi Yiming,
>>>>
>>>> Cassandra is optimized for write-heavy environments.
>>>>
>>>> If you have a read-heavy application, you shouldn't be running your
>>>> reads through Cassandra.
>>>>
>>>> On the bright side, Cassandra read throughput will remain consistent
>>>> regardless of your volume. But you are going to have to "wrap" your
>>>> reads with memcache (or redis), so that the bulk of your reads can
>>>> be served from memory.
>>>>
>>>> Thanks,
>>>> Mike Peters
>>>>
>>>> On 5/16/2012 3:59 PM, Yiming Sun wrote:
>>>>> Hello,
>>>>>
>>>>> I asked this question as a follow-up under a different thread, so I
>>>>> figured I should ask here instead in case the other one gets
>>>>> buried; besides, I now have a little more information.
>>>>>
>>>>> "We find the lack of performance disturbing," as we are only able
>>>>> to get about 3-4 MB/sec of read performance out of Cassandra.
>>>>>
>>>>> We are using Cassandra as the backend for an IR repository of
>>>>> digital texts. It is a read-mostly repository with occasional
>>>>> writes. Each row represents a book volume, and each column of a
>>>>> row represents a page of that volume. Granted, the data items are
>>>>> small: the average size of a column's text is 2-3 KB, and each row
>>>>> has about 250 columns (this varies quite a bit from one volume to
>>>>> another).
>>>>>
>>>>> Currently we are running a 3-node cluster, soon to be upgraded to
>>>>> a 6-node setup. Each node is a VM with 4 cores and 16 GB of
>>>>> memory. All VMs use a SAN for disk storage.
>>>>>
>>>>> To retrieve a volume, a slice query is issued via Hector that
>>>>> specifies the row key (the volume) and a list of column keys (the
>>>>> pages), with the consistency level set to ONE. It is typical to
>>>>> retrieve multiple volumes per request.
>>>>>
>>>>> The read rate I have been seeing is about 3-4 MB/sec, and that is
>>>>> reading raw bytes; using a string serializer the rate is even
>>>>> lower, about 2.2 MB/sec.
>>>>>
>>>>> The server log shows GC ParNew pauses frequently exceeding 200 ms,
>>>>> often in the range of 4-5 seconds, but nowhere near 15 seconds
>>>>> (which would indicate that the JVM heap is being swapped out).
>>>>>
>>>>> We have not yet added JNA. From a blog post, it seems JNA can
>>>>> increase performance by about 13%, and we are hoping for something
>>>>> more like 1300% (3-4 MB/sec is just disturbingly low).
>>>>> We are also hesitant to disable swap entirely, since one of the
>>>>> nodes runs a couple of other services.
>>>>>
>>>>> Do you have any suggestions on how we might boost the performance?
>>>>> Thanks!
>>>>>
>>>>> -- Y.
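
For reference, here is roughly what the read path described above looks
like with Hector's multiget slice query, which fetches several volumes in
one round trip. This is only a sketch assuming Hector 1.x; the cluster,
keyspace, column family, and volume/page names ("Repository", "Volumes",
"vol00001", "page0001", ...) are placeholders, not the actual schema from
the thread.

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.cassandra.serializers.BytesArraySerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.beans.Row;
    import me.prettyprint.hector.api.beans.Rows;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.MultigetSliceQuery;
    import me.prettyprint.hector.api.query.QueryResult;

    public class VolumeReader {
        public static void main(String[] args) {
            Cluster cluster =
                HFactory.getOrCreateCluster("repo-cluster", "cass-host:9160");

            // Read at consistency level ONE, as described in the thread.
            ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
            ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
            Keyspace ksp = HFactory.createKeyspace("Repository", cluster, ccl);

            // One round trip for several volumes, instead of one slice
            // query per volume.
            MultigetSliceQuery<String, String, byte[]> query =
                HFactory.createMultigetSliceQuery(ksp, StringSerializer.get(),
                    StringSerializer.get(), BytesArraySerializer.get());
            query.setColumnFamily("Volumes");
            query.setKeys("vol00001", "vol00002");         // row keys = volumes
            query.setColumnNames("page0001", "page0002");  // columns = pages

            QueryResult<Rows<String, String, byte[]>> result = query.execute();
            for (Row<String, String, byte[]> row : result.get()) {
                for (HColumn<String, byte[]> col :
                        row.getColumnSlice().getColumns()) {
                    byte[] pageBytes = col.getValue();  // raw page text bytes
                    // ... hand pageBytes to the application layer ...
                }
            }
        }
    }

Since the thread mentions retrieving multiple volumes per request,
batching them into a single multiget like this is one way to cut
per-request round trips.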
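
A minimal sketch of the kind of bounded, LRU-purged application cache
Oleg describes, built on LinkedHashMap's access-order mode. His actual
cache (supercolumn-shaped entries behind a Pelops-style selector/mutator
API) is of course more elaborate; this class is only an illustration.

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Bounded LRU map: the eldest (least recently accessed) entry is
    // evicted once the cache exceeds maxEntries, keeping memory steady.
    public class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        public LruCache(int maxEntries) {
            super(16, 0.75f, true);  // true = iterate in access order (LRU)
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }

        // LinkedHashMap is not thread-safe; share one instance through a
        // synchronized wrapper (or guard it with your own locking).
        public static <K, V> Map<K, V> create(int maxEntries) {
            return Collections.synchronizedMap(new LruCache<K, V>(maxEntries));
        }
    }

A read would check a cache like this first and fall through to a Hector
query only on a miss, which is how such a layer can absorb the bulk of
Cassandra reads.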
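
Similarly, a sketch of the read-through "wrap" Mike suggests, here with
the spymemcached client. The key scheme and the readPageFromCassandra
helper are invented for the example; redis with a Java client would look
much the same.

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    public class PageReadThrough {
        private final MemcachedClient memcached;

        public PageReadThrough(String host) throws java.io.IOException {
            this.memcached =
                new MemcachedClient(new InetSocketAddress(host, 11211));
        }

        public byte[] getPage(String volumeKey, String pageName) {
            String cacheKey = volumeKey + ":" + pageName;
            byte[] page = (byte[]) memcached.get(cacheKey);  // memory first
            if (page == null) {                              // miss: Cassandra
                page = readPageFromCassandra(volumeKey, pageName);
                memcached.set(cacheKey, 3600, page);         // keep for an hour
            }
            return page;
        }

        private byte[] readPageFromCassandra(String volumeKey, String pageName) {
            // ... Hector slice query, as sketched above ...
            return new byte[0];
        }
    }

Given Yiming's point about the large data space, the cache hit rate is
worth measuring before committing to this: a shared memory cache only
pays for itself if requests actually repeat.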
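
As for the row-cache experiment, the knob depends on the release, so
treat this as a from-memory sketch to verify against your version: on
1.0.x the row cache is sized per column family from cassandra-cli, while
on 1.1 a per-CF caching attribute selects what is cached and
row_cache_size_in_mb in cassandra.yaml sizes the cache globally
("Volumes" is again a placeholder name):

    update column family Volumes with rows_cached = 20000;
    update column family Volumes with caching = 'rows_only';

The first form is the 1.0.x syntax and the second the 1.1 syntax. Either
way, watching the row-cache hit rate (nodetool cfstats per column family
on 1.0.x; nodetool info for the global caches on 1.1) should show quickly
whether the sparse access pattern makes the cache help or hurt, which was
exactly Yiming's worry.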