I too am seeing very slow performance while testing the worst-case scenario of 1 
key leading to 1 supercolumn with 1 column beneath it.

Key -> SuperColumn -> 1 Column (of ~ 500 bytes)

Drive utilization is 80-90% and I'm only dealing with 50-70 million rows 
(with NO swapping).  So far, I've found nothing that helps, including increasing 
the key cache from 200k to 500k keys.  I'm guessing the hashing prevents better 
cache performance.
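
As a rough sanity check on why the larger key cache may not help: if keys are read uniformly at random (an assumption on my part, not something I have measured), the cache can serve at most about cached_keys / total_rows of the reads. The helper name below is made up for illustration; the row and cache counts are the ones quoted above.

```python
def expected_hit_rate(cached_keys: int, total_rows: int) -> float:
    """Approximate upper bound on key cache hit rate.

    Assumes every row is equally likely to be read (uniform random
    access), which is the worst case for any cache.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return min(1.0, cached_keys / total_rows)


# Row counts and cache sizes taken from the figures in this message.
for rows in (50_000_000, 70_000_000):
    for cached in (200_000, 500_000):
        rate = expected_hit_rate(cached, rows)
        print(f"{rows:>11,} rows, {cached:>7,} cached keys: {rate:.2%}")
```

Under that assumption even the 500k cache covers about 1% of a 50 million row data set, so nearly every read still has to go to disk.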

Read performance is definitely not 3 IOs per read, based on the utilization of my 
drives.  I'm not sure the previous e-mails ever settled how to calculate how many 
IOs are being done for each read.  I've been testing with clusters of 1, 2, 3, or 
4 machines, and so far all I'm seeing with multiple machines is lower performance 
in a cluster than on a single node.  I keep assuming that at some number of nodes 
the performance will begin to pick up.  Three of my nodes are running with 8GB 
(6GB Java heap), and one has 4GB (3GB Java heap).  The machine with the smallest 
memory footprint is the fastest performer on inserts, but definitely not the 
fastest on reads.
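
One crude way to estimate IOs per read, sketched under assumptions: divide the disk read operations per second (for example the r/s column of `iostat -x`, summed over the data drives) by the client reads per second completed over the same interval. The function name and the sample numbers are placeholders, not measurements from my cluster, and compaction or other background reads will inflate the result.

```python
def ios_per_read(disk_read_iops: float, client_reads_per_sec: float) -> float:
    """Estimate average disk read operations per client read.

    disk_read_iops:       read ops/sec observed on the data drives
    client_reads_per_sec: reads/sec completed by the test client
    Both must be sampled over the same time window.
    """
    if client_reads_per_sec <= 0:
        raise ValueError("client_reads_per_sec must be positive")
    return disk_read_iops / client_reads_per_sec


# Placeholder values for illustration only.
estimate = ios_per_read(disk_read_iops=900.0, client_reads_per_sec=150.0)
print(f"approx. {estimate:.1f} disk IOs per read")
```

Running the test with no writes and no compaction in progress should give the cleanest number to compare against the expected 3 IOs.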

I suspect the read path relies heavily on the assumption that you want to 
get many closely related columns at once, because lookup by key appears to be 
incredibly slow.

From: yangfeng [mailto:yea...@gmail.com]
Sent: Tuesday, April 20, 2010 7:59 AM
To: user@cassandra.apache.org; d...@cassandra.apache.org
Subject: How to increase cassandra's performance in read?

I get 10 column families by key, and one column family has 30 columns.
I use multigetSlice once to get the 10 column families, but the performance is very poor.
Does anyone have other ideas for increasing the performance?
