The multi get batches range from 100 to 200 keys. The tests I'm running need to do get_slices and then multigets on those results, so I can't turn either of them off.
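Since each key in a multiget becomes one message in the read pool, batches of 100-200 can fill the pool (32 threads by default) from a single request. One mitigation is to split the key list client-side. This is a minimal pure-Python sketch, not any client's actual API: `fetch_rows` is a hypothetical stand-in for whatever multiget call your Thrift client exposes.

```python
# Sketch: cap multiget fan-out by splitting a large key list into
# smaller batches, so a single request never floods a node's read
# thread pool (32 threads by default).
# `fetch_rows` is a hypothetical stand-in for the client's multiget
# call; it takes a list of keys and returns {key: row}.

def chunked(seq, size):
    """Yield successive slices of seq, each at most `size` long."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def multiget_in_batches(fetch_rows, keys, batch_size=32):
    """Issue several small multigets instead of one 100-200 key request.

    Keeping batch_size at or below the read pool size leaves threads
    free to serve other clients' queries between batches.
    """
    results = {}
    for batch in chunked(keys, batch_size):
        results.update(fetch_rows(batch))
    return results
```

The trade-off is more round trips per logical multiget, in exchange for shorter queue residency per request and less starvation of concurrent readers.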
I was only setting 16 threads for reading, but I'll boost it up to 32 and see what happens.

On May 9, 2012, at 11:03 AM, aaron morton wrote:

> How big are the multi get batches?
>
> How do the wide row get_slice calls behave when the multi gets are not running?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9/05/2012, at 1:47 AM, Luís Ferreira wrote:
>
>> Maybe one of the problems is that I am reading both the columns in a row and the rows themselves in batches, using the count attribute in the SliceRange (and its counterpart in the KeyRange for rows) and advancing the start column or start key. According to your blog post, using a start key to read millions of rows/columns has a lot of latency, but how else can I read an entire row that does not fit into memory?
>>
>> I'll have to run some tests again and check the tpstats. Still, do you think that adding more machines to the cluster will help a lot? I ask because I started with a 3-node cluster and have scaled to a 5-node cluster with little improvement...
>>
>> Thanks anyway.
>>
>> On May 8, 2012, at 9:54 AM, aaron morton wrote:
>>
>>> If I was rebuilding my power after spending the first thousand years of the Third Age as a shapeless evil, I would cast my Eye of Fire in the direction of the filthy little multi_gets.
>>>
>>> A node can fail to respond to a query with rpc_timeout for two reasons: either the command did not run, or the command started but did not complete. The former is much more likely. If it is happening, you will see large pending counts and dropped messages in nodetool tpstats, and you will also see log entries about dropped messages.
>>>
>>> When you send a multi_get, each row you request becomes a message in the read thread pool. If you request 100 rows, you will put 100 messages in the pool, which by default has 32 threads.
>>> If some clients are sending large multi gets (or batch mutations), you can overload nodes and starve other clients.
>>>
>>> For background, here are some metrics for selecting from 10 million columns:
>>> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
>>>
>>> Hope that helps.
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 6/05/2012, at 7:14 AM, Luís Ferreira wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm doing get_slice on huge rows (3 million columns) and even though I am doing it iteratively, I keep getting TimeoutExceptions. I've tried changing the number of columns fetched, but it did not help.
>>>>
>>>> I have a 5-machine cluster, each node with 4GB of RAM, of which 3GB are dedicated to Cassandra's heap, but the nodes still consume all of the memory and get huge IO wait due to the amount of reads.
>>>>
>>>> I am running tests with 100 clients, all performing multiple operations, mostly get_slice, get and multi_get, but the timeouts only occur in the get_slice.
>>>>
>>>> Does this have anything to do with Cassandra's ability (or lack thereof) to keep the rows in memory? Or am I doing something wrong? Any tips?
>>>>
>>>> Cumprimentos,
>>>> Luís Ferreira
>>
>> Cumprimentos,
>> Luís Ferreira

Cumprimentos,
Luís Ferreira
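The iterative get_slice pagination discussed in the thread (a count limit plus a moving start column) can be sketched in pure Python. This is not a real client API: `slice_columns` is a hypothetical stand-in for the client's get_slice call, assumed to return columns in order starting at `start` (inclusive), at most `count` of them.

```python
# Sketch of wide-row pagination: read a row of millions of columns in
# pages of `count`, restarting each slice at the last column seen, so
# the whole row never has to fit in memory at once.
# `slice_columns(start, count)` is a hypothetical stand-in for
# get_slice; "" as start means "begin at the first column", and the
# start column is returned inclusively, so each page after the first
# overlaps the previous one by exactly one column.

def iterate_row(slice_columns, count=100):
    """Yield every column of a wide row, one page at a time."""
    start = ""                      # begin at the first column
    while True:
        page = slice_columns(start, count)
        if not page:
            return
        if start:
            page = page[1:]         # drop the overlapping start column
            if not page:
                return              # only the start column came back: done
        for col in page:
            yield col
        start = page[-1]            # resume from the last column seen
```

The inclusive-start overlap is why each page effectively advances by count - 1 columns; dropping the duplicate client-side keeps the yielded stream free of repeats.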