> In our tests, we found a significant performance difference between various 
> configurations, and we are studying a policy to optimize it. Our doubt is that, 
> if the need to issue multiple requests is caused only by a fixable 
> implementation detail, this study would be pointless.
If you provide your numbers we can see if you are getting the expected results. 

There are some limiting factors. Using the Thrift API the max message size is 
15 MB. And each row you ask for becomes (roughly) RF tasks in the thread pools 
on the replicas. When you ask for 1,000 rows at RF 3 that creates (roughly) 
3,000 tasks on the replicas. If you have other clients trying to do reads at 
the same time this can cause delays to their reads. 
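
For what it's worth, here is a minimal sketch of that kind of splitting with 
pycassa (the keyspace, column family, chunk size and helper name are made-up 
placeholders, not something from this thread):

import pycassa

pool = pycassa.ConnectionPool('ks')               # hypothetical keyspace
column_family = pycassa.ColumnFamily(pool, 'cf')  # hypothetical column family

def chunked_multiget(keys, chunk_size=100):
    """Issue several small multigets instead of one huge one."""
    results = {}
    for i in range(0, len(keys), chunk_size):
        chunk = keys[i:i + chunk_size]
        # Keeping each request small keeps it well under the Thrift message
        # limit and creates fewer simultaneous tasks on the replicas.
        results.update(column_family.multiget(chunk, column_count=1000))
    return results

(pycassa's buffer_size argument does something similar internally, but chunking 
explicitly makes the request size easy to tune.)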

Like everything in computing, more is not always better. Run some tests with 
multigets of different sizes and see where the gains in overall throughput 
begin to decline. 
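
Something quick and dirty like this is normally enough to see where the curve 
flattens (a sketch again, reusing the hypothetical chunked_multiget() above 
with a made-up key list):

import time

all_keys = [str(i) for i in range(10000)]   # stand-in for your real row keys

for size in (10, 50, 100, 500, 1000):
    start = time.time()
    chunked_multiget(all_keys, chunk_size=size)
    elapsed = time.time() - start
    print("chunk_size=%d: %.1f rows/sec" % (size, len(all_keys) / elapsed))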

Also consider using a newer client with token aware balancing and async 
networking. Again though, if you try to read everything at once you are going 
to have a bad day.
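
For example, with the DataStax python-driver (CQL over the native protocol), 
something along these lines gives you token aware routing plus async reads. A 
sketch only: the contact point, keyspace, table, column and keys below are 
placeholders, not from your setup.

from cassandra.cluster import Cluster
from cassandra.policies import TokenAwarePolicy, RoundRobinPolicy

cluster = Cluster(
    ['127.0.0.1'],
    load_balancing_policy=TokenAwarePolicy(RoundRobinPolicy()),
)
session = cluster.connect('ks')          # hypothetical keyspace

some_keys = ['row1', 'row2', 'row3']     # stand-in row keys

# Fire a bounded number of async requests rather than one giant read,
# then wait for them all to come back.
futures = [session.execute_async("SELECT * FROM cf WHERE id = %s", (key,))
           for key in some_keys]
rows = [f.result() for f in futures]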

Cheers
  
-----------------
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/07/2013, at 8:24 PM, cesare cugnasco <cesare.cugna...@gmail.com> wrote:

> Hi Rob,
> Of course, we could issue multiple requests, but then we would have to consider 
> the optimal way to split the query into smaller ones. Moreover, we would have 
> to choose how many sub-queries to run in parallel.
> In our tests, we found a significant performance difference between various 
> configurations, and we are studying a policy to optimize it. Our doubt is that, 
> if the need to issue multiple requests is caused only by a fixable 
> implementation detail, this study would be pointless.
> 
> Has anyone done a similar analysis?
> 
> 
> 2013/7/16 Robert Coli <rc...@eventbrite.com>
> 
> On Tue, Jul 16, 2013 at 4:46 AM, cesare cugnasco <cesare.cugna...@gmail.com> 
> wrote:
> We are working on porting some life science applications to Cassandra, but 
> we have to deal with its limits on handling huge queries. Our queries are 
> usually multiget_slice ones: many rows with many columns each.
> 
> You are not getting much "win" by increasing request size in Cassandra, and 
> you expose yourself to the kind of "lose" you have experienced.
> 
> Is there some reason you cannot just issue multiple requests?
> 
> =Rob 
> 
