Re: Timeout Exception in get_slice

aaron morton Wed, 09 May 2012 03:03:59 -0700

How big are the multi_get batches?

How do the wide row get_slice calls behave when the multi_gets are not running?


Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 9/05/2012, at 1:47 AM, Luís Ferreira wrote:

> Maybe one of the problems is that I am reading both the columns in a row and 
> the rows themselves in batches, using the count attribute in the SliceRange 
> and advancing the start column (or, for rows, the start key in the KeyRange). 
> According to your blog post, using a start key to read through millions of 
> rows/columns has a lot of latency, but how else can I read an entire row that 
> does not fit into memory?
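
The paging approach described above (fetch count columns, then restart the slice from the last column seen) can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes the slice start is inclusive, as with the Thrift SliceRange, so every page after the first repeats one column that has to be skipped. The names fake_get_slice and iter_columns are hypothetical, and fake_get_slice merely stands in for a real get_slice call so the example runs without a cluster.

```python
from bisect import bisect_left


def fake_get_slice(columns, start, count):
    """Hypothetical stand-in for get_slice on one row.

    `columns` is a sorted list of column names. Returns up to `count`
    names that are >= `start` (start is inclusive; "" means row start).
    """
    i = bisect_left(columns, start) if start else 0
    return columns[i:i + count]


def iter_columns(fetch, page_size=1000):
    """Yield every column of a wide row, one page at a time.

    `fetch(start, count)` is assumed to behave like fake_get_slice above.
    Only one page is held in memory at any moment.
    """
    if page_size < 2:
        # With an inclusive start, a page of 1 would only ever return
        # the column we already have, so no progress could be made.
        raise ValueError("page_size must be at least 2")
    start = ""
    first_page = True
    while True:
        raw = fetch(start, page_size)
        # Pages after the first begin with the last column already seen.
        page = raw if first_page else raw[1:]
        for column in page:
            yield column
        if len(raw) < page_size:
            return  # a short page means the end of the row was reached
        start = raw[-1]
        first_page = False
```

Each call still asks the node for page_size columns, so a smaller page_size lowers the cost of each request at the price of more round trips.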
> 
> I'll have to run some tests again and check the tpstats. Still, do you think 
> that adding more machines to the cluster will help a lot? I ask because 
> I started with a 3 node cluster and have scaled to a 5 node cluster with 
> little improvement... 
> 
> Thanks anyway.
> 
> On May 8, 2012, at 9:54 AM, aaron morton wrote:
> 
>> If I was rebuilding my power after spending the first thousand years of the 
>> Third Age as a shapeless evil I would cast my Eye of Fire in the direction 
>> of the filthy little multi_gets. 
>> 
>> A node can fail to respond to a query with rpc_timeout for two reasons: 
>> either the command did not run or the command started but did not complete. 
>> The former is much more likely. If it is happening, you will see large 
>> pending counts and dropped messages in nodetool tpstats, and you will also 
>> see log entries about dropped messages.
>> 
>> When you send a multi_get, each row you request becomes a message in the 
>> read thread pool. If you request 100 rows you will put 100 messages in the 
>> pool, which by default has 32 threads. If some clients are sending large 
>> multi_gets (or batch mutations) you can overload nodes and starve other 
>> clients. 
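
If each requested row becomes one message in the read thread pool, one way to limit how much a single client request can queue is to split a large key list into several smaller multi_get calls. The Python sketch below is an assumption about how a client might do that, not something taken from a particular client library: chunked, multiget_in_batches and the default batch size of 10 are all hypothetical, and the multiget argument stands for whatever call your client exposes that maps a list of keys to a dict of results.

```python
def chunked(keys, batch_size):
    """Yield consecutive slices of `keys`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(keys), batch_size):
        yield keys[i:i + batch_size]


def multiget_in_batches(multiget, keys, batch_size=10):
    """Fetch `keys` with several small multi_get calls instead of one big one.

    `multiget(batch)` is a caller-supplied function (hypothetical here)
    that takes a list of keys and returns a dict of key -> result.
    Batches are sent one after another, so this client never has more
    than `batch_size` rows requested at the same time.
    """
    results = {}
    for batch in chunked(list(keys), batch_size):
        results.update(multiget(batch))
    return results
```

This trades more round trips for a smaller burst per request. The batch size that works depends on the cluster and would need to be measured, for example by watching the pending counts in nodetool tpstats.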
>> 
>> For background, there are some metrics here for selecting from 10 million columns: 
>> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
>> 
>> Hope that helps. 
>> 
>> 
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 6/05/2012, at 7:14 AM, Luís Ferreira wrote:
>> 
>>> Hi, 
>>> 
>>> I'm doing get_slice on huge rows (3 million columns) and even though I am 
>>> doing it iteratively I keep getting TimeoutExceptions. I've tried to change 
>>> the number of columns fetched but it did not work. 
>>> 
>>> I have a 5 machine cluster, each with 4GB of RAM, of which 3GB are dedicated 
>>> to Cassandra's heap, but still they all consume all of the memory and get 
>>> huge IO wait due to the amount of reads.
>>> 
>>> I am running tests with 100 clients, all performing multiple operations 
>>> (mostly get_slice, get and multi_get), but the timeouts only occur in the 
>>> get_slice.
>>> 
>>> Does this have anything to do with Cassandra's ability (or lack thereof) to 
>>> keep the rows in memory? Or am I doing anything wrong? Any tips?
>>> 
>>> Regards,
>>> Luís Ferreira
>>> 
>>> 
>>> 
>>> 
>> 
> 
> Regards,
> Luís Ferreira
> 
> 
>
