On Tue, Jan 18, 2011 at 4:29 PM, Shu Zhang <szh...@mediosystems.com> wrote:
> Well, I don't think what I'm describing is complicated semantics. I think 
> I've described general batch operation design and something that is 
> symmetrical to the batch_mutate method already in the Cassandra API. You are 
> right, I can solve the problem with further denormalization, and the approach 
> of making individual gets in parallel as described by Brandon will work too. 
> I'll be doing one of these for now. But I think neither is as efficient, and 
> I guess I'm still not sure why the multiget is designed the way it is.
>
> The problem with denormalization is you gotta make multiple row writes in 
> place of one, adding load to the server, adding required physical space and 
> losing atomicity on write operations. I know writes are cheap in cassandra, 
> and you can catch failed writes and retry so these problems are not major, 
> but it still seems clear that having a batch-get that works appropriately is 
> at least a little better...
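(The fan-out cost described above can be sketched roughly as follows; the client class and its `insert` method are hypothetical stand-ins for a real Cassandra client, and the point is simply that one logical write becomes several physical row writes:)

```python
class FakeClient:
    """Records writes so the fan-out is visible; not a real Cassandra client."""
    def __init__(self):
        self.writes = []

    def insert(self, row_key, columns):
        self.writes.append((row_key, columns))


def denormalised_write(client, user_id, event):
    # The same event is written under every row a reader might query from,
    # so a per-key predicate is never needed at read time.
    client.insert("events_by_user:" + user_id, {event["id"]: event["payload"]})
    client.insert("events_by_day:" + event["day"], {event["id"]: event["payload"]})
    client.insert("events_all", {event["id"]: event["payload"]})


client = FakeClient()
denormalised_write(client, "u42", {"id": "e1", "day": "2011-01-18", "payload": "..."})
print(len(client.writes))  # one logical write -> 3 physical writes
```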
> ________________________________________
> From: Aaron Morton [aa...@thelastpickle.com]
> Sent: Tuesday, January 18, 2011 12:55 PM
> To: user@cassandra.apache.org
> Subject: Re: please help with multiget
>
> I think the general approach is to denormalise data to remove the need for 
> complicated semantics when reading.
>
> Aaron
>
> On 19/01/2011, at 7:57 AM, Shu Zhang <szh...@mediosystems.com> wrote:
>
>> Well, maybe making a batch-get is not any more efficient on the server side 
>> but without it, you can get bottlenecked on client-server connections and 
>> client resources. If the number of requests you want to batch is on the 
>> order of connections in your pool, then yes, making gets in parallel is as 
>> good or maybe better. But what if you want to batch thousands of requests?
>>
>> The server I can scale out, I would want to get my requests there without 
>> needing to wait for connections on my client to free up.
>>
>> I just don't really understand the reasoning for designing multiget_slice the 
>> way it is. I still think if you're gonna have a batch-get request 
>> (multiget_slice), you should be able to add to the batch a reasonable number 
>> of ANY corresponding non-batch get requests. And you can't do that... Plus, 
>> it's not symmetrical to the batch-mutate. Is there a good reason for that?
>> ________________________________________
>> From: Brandon Williams [dri...@gmail.com]
>> Sent: Monday, January 17, 2011 5:09 PM
>> To: user@cassandra.apache.org
>> Cc: hector-us...@googlegroups.com
>> Subject: Re: please help with multiget
>>
>> On Mon, Jan 17, 2011 at 6:53 PM, Shu Zhang 
>> <szh...@mediosystems.com<mailto:szh...@mediosystems.com>> wrote:
>> Here's the method declaration for quick reference:
>> map<string,list<ColumnOrSuperColumn>> multiget_slice(string keyspace, 
>> list<string> keys, ColumnParent column_parent, SlicePredicate predicate, 
>> ConsistencyLevel consistency_level)
>>
>> It looks like you must have the same SlicePredicate for every key in your 
>> batch retrieval, so what are you supposed to do when you need to retrieve 
>> different columns for different keys?
>>
>> Issue multiple gets in parallel yourself.  Keep in mind that multiget is not 
>> an optimization, in fact, it can work against you when one key exceeds the 
>> rpc timeout, because you get nothing back.
>>
>> -Brandon
>

multiget_slice is very useful IMHO. In my testing, the roundtrip time
for 1000 get requests all being acked individually is much higher than
the roundtrip time for 200 multiget_slice requests batching 5 keys at a
time. Anyone who needs that type of access is in good shape.
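The grouping above can be sketched like this; the chunking arithmetic is the
whole point, and `multiget` here would be a stand-in for the real Thrift
multiget_slice call (whose exact signature varies by Cassandra version):

```python
def chunk(keys, size):
    """Split a key list into batches of at most `size` keys."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]

keys = ["key%d" % i for i in range(1000)]
batches = chunk(keys, 5)
# 1000 individually-acked gets collapse into 200 round trips.
print(len(batches))  # 200
```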

I was also theorizing that a CF using RowCache with a very, very high
read rate would benefit from "pooling" a bunch of reads together with
multiget.

I do agree that the first time I looked at the multiget_slice
signature, I realized I could not do many of the things I was expecting
from a multi-get.
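For the per-key-predicate case Brandon describes, a client-side thread pool
is the usual workaround; a rough sketch, where `get_slice_for` is a
hypothetical wrapper around a real client's single-key get_slice:

```python
from concurrent.futures import ThreadPoolExecutor

def get_slice_for(key, predicate):
    # Hypothetical stand-in for a real single-key get_slice call;
    # it just echoes its arguments so the sketch runs.
    return (key, predicate)

# Each key carries its own predicate, which multiget_slice cannot express.
requests = {"row1": ["colA"], "row2": ["colB", "colC"], "row3": ["colD"]}

# Issue the gets in parallel and collect results per key; a failed or
# slow key only costs that one future, not the whole batch.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {k: pool.submit(get_slice_for, k, p) for k, p in requests.items()}
    results = {k: f.result() for k, f in futures.items()}

print(sorted(results))  # ['row1', 'row2', 'row3']
```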
