I too am seeing very slow performance while testing the worst-case scenario of one key leading to one supercolumn with a single column beneath it:
Key -> SuperColumn -> 1 Column (of ~500 bytes)

Drive utilization is 80-90%, and I'm only dealing with 50-70 million rows (with NO swapping). So far I've found nothing that helps, including increasing the key cache from 200k to 500k keys; I'm guessing the hashing prevents better cache performance.

Read performance is definitely not 3 IOs per read, based on the utilization factors on my drives. I'm not sure it was ever settled in the previous e-mails how to calculate how many IOs are being done for each read.

I've been testing with clusters of 1, 2, 3, or 4 machines, and so far all I'm seeing with multiple machines is lower performance in a cluster than on a single machine. I keep assuming that at some number of nodes the performance will begin to pick up.

Three of my nodes are running with 8GB (6GB Java heap), and one has 4GB (3GB Java heap). The machine with the smallest memory footprint is the fastest performer on inserts, but definitely not the fastest on reads.

I suspect the read path relies heavily on the assumption that you want to get many closely related columns, because lookup by key appears to be incredibly slow.

From: yangfeng [mailto:yea...@gmail.com]
Sent: Tuesday, April 20, 2010 7:59 AM
To: user@cassandra.apache.org; d...@cassandra.apache.org
Subject: How to increase cassandra's performance in read?

I get 10 column families by key, and one column family has 30 columns. I use a single multigetSlice call to get the 10 column families, but the performance is very poor. Does anyone have other ideas for improving the performance?
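
On the open question in this thread of how many IOs each read costs: one rough way to estimate it is to divide the read operations per second the drive reports (for example the r/s column of iostat -x) by the reads per second the test client completes over the same interval. The sketch below assumes those two numbers are measured separately; the function name and the sample figures are hypothetical, not measurements from any of the clusters described here.

```python
def ios_per_read(disk_reads_per_sec: float, client_reads_per_sec: float) -> float:
    """Estimate the average number of disk read operations per client read.

    disk_reads_per_sec:   read ops/sec reported for the data drive
                          (assumed to come from a tool such as iostat).
    client_reads_per_sec: key lookups/sec completed by the test client
                          over the same interval.
    """
    if client_reads_per_sec <= 0:
        raise ValueError("client_reads_per_sec must be positive")
    return disk_reads_per_sec / client_reads_per_sec


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    estimate = ios_per_read(disk_reads_per_sec=300.0, client_reads_per_sec=100.0)
    print(f"approx. {estimate:.1f} IOs per read")
```

This only gives an average, and it will overstate the cost per read if compaction or other background activity is hitting the same drive during the measurement.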