Re: cost estimate about some Cassandra patchs

aaron morton Tue, 07 May 2013 01:22:21 -0700

> Use case = rows with rowkey like (folder id, file id)
> And operations read/write multiple rows with same folder id => so, it could 
> make sense to have a partitioner putting rows with same "folder id" on the 
> same replicas.
The entire row key is what we use to make the token, which is used both to 
locate the replicas and to place the row within the node. I don't see that 
changing.
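
To make the point concrete, here is a rough sketch of key -> token -> replicas. 
It is a toy model, not Cassandra's code: the 4-node ring, the "folder:file" key 
encoding and the helper names are all made up for illustration, and MD5 is used 
only as a stand-in hash. The last function shows what a hypothetical partitioner 
hashing only the folder id would do.

```python
import hashlib
from bisect import bisect_left


def token(key):
    """Hash a whole key (bytes) to an integer token. MD5 is a stand-in hash."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def replicas(tok, ring, rf):
    """Take rf nodes clockwise, starting at the first node token >= tok."""
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, tok) % len(ring)  # wrap past the last node
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


# Hypothetical 4-node ring with evenly spaced tokens over the 128-bit range.
RING = sorted((i * (2**128 // 4), "node%d" % i) for i in range(4))


def row_key(folder_id, file_id):
    """Assumed composite key encoding, for illustration only."""
    return ("%s:%s" % (folder_id, file_id)).encode()


def placement_full_key(folder_id, file_id, rf=2):
    """Token from the entire row key: files of one folder may be scattered."""
    return replicas(token(row_key(folder_id, file_id)), RING, rf)


def placement_folder_only(folder_id, file_id, rf=2):
    """Hypothetical custom partitioner: token from the folder id alone,
    so every file in a folder maps to the same replicas."""
    return replicas(token(folder_id.encode()), RING, rf)
```

With the full key, two files in the same folder get unrelated tokens and so 
may or may not share replicas; with the folder-only variant they always do.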


Have you done any performance testing to see if this is a problem?

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/05/2013, at 5:27 AM, DE VITO Dominique <dominique.dev...@thalesgroup.com> 
wrote:

> > From: aaron morton [mailto:aa...@thelastpickle.com]
> > Sent: Sunday, 28 April 2013 22:54
> > To: user@cassandra.apache.org
> > Subject: Re: cost estimate about some Cassandra patchs
> > 
> > > Does anyone know enough of the inner workings of Cassandra to tell me how 
> > > much work is needed to patch Cassandra to enable such communication 
> > > vectorization/batching?
> > 
>  
> > Assuming you mean "have the coordinator send multiple row read/write 
> > requests in a single message to replicas"
> > 
> > Pretty sure this has been raised as a ticket before but I cannot find one 
> > now. 
> > 
> > It would be a significant change and I'm not sure how big the benefit is. 
> > To send the messages, the coordinator places them in a queue, so there is 
> > little delay in sending. It then waits on them asynchronously. So there may 
> > be some saving on networking, but from the coordinator's point of view I 
> > think the impact is minimal.
> > 
> > What is your use case?
>  
> Use case = rows with rowkey like (folder id, file id)
> And operations read/write multiple rows with same folder id => so, it could 
> make sense to have a partitioner putting rows with same "folder id" on the 
> same replicas.
>  
> But so far, Cassandra is not able to exploit this locality, because the 
> batching ends at the coordinator node.
>  
> Hence my question about the cost estimate for patching Cassandra.
>  
> The closest JIRA entries (perhaps exactly matching my need?) I have found 
> so far are:
>  
> CASSANDRA-166: Support batch inserts for more than one key at once
> https://issues.apache.org/jira/browse/CASSANDRA-166
> => "WON'T FIX" status
>  
> CASSANDRA-5034: Refactor to introduce Mutation Container in write path
> https://issues.apache.org/jira/browse/CASSANDRA-5034
> => I am not sure whether it is related to my topic
>  
> Thanks.
>  
> Dominique
>  
>  
>  
> > 
> > Cheers
> > 
> > 
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Consultant
> > New Zealand
> > 
> > @aaronmorton
> > http://www.thelastpickle.com
>  
> On 27/04/2013, at 4:04 AM, DE VITO Dominique 
> <dominique.dev...@thalesgroup.com> wrote:
> 
> 
> Hi,
>  
> We have created a new partitioner that groups some rows with **different** 
> row keys on the same replicas.
>  
> But neither batch_mutate nor multiget_slice is able to take advantage of 
> this partitioner-defined placement to vectorize/batch communications 
> between the coordinator and the replicas.
>  
> Does anyone know enough of the inner workings of Cassandra to tell me how 
> much work is needed to patch Cassandra to enable such communication 
> vectorization/batching?
>  
> Thanks.
>  
> Regards,
> Dominique
>  
>
