If I were rebuilding my power after spending the first thousand years of the 
Third Age as a shapeless evil, I would cast my Eye of Fire in the direction of 
the filthy little multi_gets. 

A node can fail to respond to a query within rpc_timeout for two reasons: either 
the command never ran, or it started but did not complete. The former is much 
more likely. If that is happening you will see large pending counts and dropped 
message counts in nodetool tpstats, as well as log entries about dropped 
messages.
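To make the "command never ran" case concrete, here is a small sketch (not Cassandra code) that models reads queueing behind a fixed-size read stage: requests that wait in the queue longer than rpc_timeout are dropped before they ever execute. The timeout, service time, and arrival pattern are illustrative assumptions; only the 32-thread default matches Cassandra.

```python
READ_THREADS = 32    # Cassandra's default read stage size
RPC_TIMEOUT = 10.0   # illustrative value, arbitrary time units
SERVICE_TIME = 5.0   # assumed time to serve one read

def simulate_read_stage(n_requests, threads=READ_THREADS,
                        service_time=SERVICE_TIME, rpc_timeout=RPC_TIMEOUT):
    """Return (completed, dropped) for n_requests arriving at once.

    Requests run `threads` at a time; a request still queued when its
    wait exceeds rpc_timeout is dropped, mirroring the 'command never
    ran' case that shows up as dropped messages in tpstats.
    """
    completed = dropped = 0
    for i in range(n_requests):
        wait = (i // threads) * service_time  # time spent queued
        if wait > rpc_timeout:
            dropped += 1
        else:
            completed += 1
    return completed, dropped
```

With these toy numbers, 100 simultaneous reads leave the last few queued past the timeout, so they are dropped without ever starting.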

When you send a multi_get, each row you request becomes a message in the read 
thread pool. If you request 100 rows you put 100 messages in the pool, which by 
default has 32 threads. If some clients send large multi_gets (or batch 
mutations) they can overload nodes and starve other clients. 
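One mitigation is to cap multi_get size on the client: issue several small requests instead of one large one, so a single client never floods the read stage. A minimal sketch, assuming a hypothetical `client.multiget` call standing in for your Thrift/driver method, and an assumed chunk size of 16:

```python
def chunked(keys, max_rows=16):
    """Yield key batches no larger than max_rows."""
    for i in range(0, len(keys), max_rows):
        yield keys[i:i + max_rows]

def multiget_in_chunks(client, column_family, keys, max_rows=16):
    """Issue several small multi_gets instead of one large one."""
    results = {}
    for batch in chunked(list(keys), max_rows):
        # client.multiget is a stand-in for the real driver call,
        # not an actual Cassandra client API
        results.update(client.multiget(column_family, batch))
    return results
```

Smaller batches also spread the row-read messages across more scheduling opportunities, so other clients' reads can interleave between your requests.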

For background, here are some metrics for selecting from 10 million columns: 
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

Hope that helps. 


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/05/2012, at 7:14 AM, Luís Ferreira wrote:

> Hi, 
> 
> I'm doing get_slice on huge rows (3 million columns) and even though I am 
> doing it iteratively I keep getting TimeoutExceptions. I've tried changing 
> the number of columns fetched, but it did not work. 
> 
> I have a 5-machine cluster, each with 4GB of RAM, of which 3GB are dedicated 
> to Cassandra's heap, but still they all consume all of the memory and get 
> huge IO wait due to the amount of reads.
> 
> I am running tests with 100 clients, all performing multiple operations, 
> mostly get_slice, get and multi_get, but the timeouts only occur in get_slice.
> 
> Does this have anything to do with cassandra's ability (or lack thereof) to 
> keep the rows in memory? Or am I doing anything wrong? Any tips?
> 
> Compliments,
> Luís Ferreira
> 
> 
> 
> 
