It is better to fetch a sensible amount per request. Moving a few MB at a time is fine (see thrift_framed_transport_size_in_mb in cassandra.yaml).
Long-running queries can reduce overall query throughput, and they churn memory on both the server and the client. Run some tests on your data and see how long it takes to iterate over all the columns using different slice sizes. More is not always better.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 8/03/2012, at 11:56 AM, Kevin wrote:

> When dealing with large SliceRanges, is it better to read all the results into
> memory (by setting “count” to the largest value possible), or is it better to
> divide the query into smaller SliceRange queries? Large in this case being on
> the order of millions of rows.
>
> There’s a footnote concerning SliceRanges on the main Apache Cassandra
> project site that reads:
>
> “…Thrift will materialize the whole result into memory before returning it to
> the client, so be aware that you may be better served by iterating through
> slices by passing the last value of one call in as the start of the next
> instead of increasing count arbitrarily large.”
>
> …but it doesn’t delve into the reasons why going about things that way is
> better.
>
> Can someone shed some light on this? And would the same logic apply to large
> KeyRanges?
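The iteration pattern the footnote describes can be sketched in Python. This is a minimal illustration, not real client code: `fetch_slice` is a hypothetical stand-in for a Thrift get_slice call and just reads from an in-memory dict of columns, but the paging logic (inclusive start, passing the last name seen back in as the next start) mirrors how you would drive the real API.

```python
# Sketch of slice iteration: fetch one bounded slice at a time, passing
# the last column name of each page in as the start of the next.
# `fetch_slice` is a hypothetical stand-in for a Thrift get_slice call;
# here it reads from an in-memory dict of columns.

def fetch_slice(row, start, count):
    """Return up to `count` (name, value) pairs with name >= start,
    in column-name order (Thrift slice starts are inclusive)."""
    names = sorted(n for n in row if n >= start)
    return [(n, row[n]) for n in names[:count]]

def iterate_columns(row, page_size=100):
    """Yield every column in the row, one bounded slice at a time."""
    start = ""          # empty start means "beginning of the row"
    first_page = True
    while True:
        # After the first page, fetch one extra column, because the
        # inclusive start re-reads the last column we already yielded.
        count = page_size if first_page else page_size + 1
        page = fetch_slice(row, start, count)
        if not first_page:
            page = page[1:]   # drop the re-read start column
        if not page:
            break
        for name, value in page:
            yield name, value
        start = page[-1][0]   # resume from the last column seen
        first_page = False
```

With this shape, client memory is bounded by `page_size` no matter how wide the row is, which is the point of the footnote. The same idea applies to large KeyRanges: pass the last key of one page as the start key of the next instead of requesting millions of rows in one call.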