RE: cost estimate about some Cassandra patchs

DE VITO Dominique Tue, 07 May 2013 02:58:25 -0700

> -----Original Message-----
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: Tuesday, 7 May 2013 10:22
> To: user@cassandra.apache.org
> Subject: Re: cost estimate about some Cassandra patchs
>
> > Use case = rows with rowkey like (folder id, file id)
> > And operations read/write multiple rows with same folder id => so, it could 
> > make sense to have a partitioner putting rows with same "folder id" on the 
> > same replicas.
> The entire row key is the thing we use to make the token, which is used both 
> to locate the replicas and to place the row in the node. I don't see that 
> changing.

Well, we can't do that, because of secondary indexes on rows.
Only C* v2 will allow the row design you mention together with secondary 
indexes.
So the row design you mention is a no-go for us with C* 1.1 or 1.2.

> Have you done any performance testing to see if this is a problem?

Unfortunately, we only have a few pieces in place today for performance 
testing; we are just beginning. Still, I am investigating whether alternative 
designs are (at least) possible, because if no alternative design is easy to 
develop, there is no need to compare performance.

The lesson I learnt here is that, if I were to restart our project from the 
beginning, I would start a more extensive performance testing effort alongside 
the business project development. It's a kind of must-have with a NoSQL 
database.

So far, the only tests we have run with our FolderPartitioner were on a 
one-machine cluster.
As expected, because this FolderPartitioner does more work, CPU usage is a bit 
higher (~10%), and memory and network consumption are the same as with RP 
(RandomPartitioner). But I get strange results for I/O (average hard drive), 
for example in a write-only test: I don't know why I/O consumption would be 
much higher with our FolderPartitioner than with RP. So I am questioning my 
measurement methods and my understanding of C*.
Well, using such a FolderPartitioner still has quite a long way to go...
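
For illustration only, here is a minimal Python sketch of the placement idea 
behind such a folder-based partitioner. It is not the actual FolderPartitioner 
and not Cassandra's partitioner API: the "folder:file" key format, the 
separator and all function names are assumptions made for the example. It only 
shows that hashing the folder id alone gives every row of a folder the same 
token, while hashing the entire row key scatters them.

```python
import hashlib
from collections import defaultdict


def md5_token(data: bytes) -> int:
    """Turn bytes into an integer token by hashing them with MD5."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def full_key_token(row_key: str) -> int:
    """Token derived from the entire row key."""
    return md5_token(row_key.encode("utf-8"))


def folder_token(row_key: str, sep: str = ":") -> int:
    """Token derived from the folder id only (hypothetical key layout
    'folder_id:file_id'), so all rows of a folder share one token."""
    folder_id = row_key.split(sep, 1)[0]
    return md5_token(folder_id.encode("utf-8"))


def group_by_token(row_keys, token_fn):
    """Group row keys by the token that token_fn assigns to them."""
    groups = defaultdict(list)
    for key in row_keys:
        groups[token_fn(key)].append(key)
    return dict(groups)


keys = ["folderA:file1", "folderA:file2", "folderB:file1"]
by_folder = group_by_token(keys, folder_token)    # one group per folder
by_full = group_by_token(keys, full_key_token)    # one group per row key
```

With the folder-based token, the two "folderA" rows fall into the same group, 
which is the locality a batched coordinator-to-replica message could exploit.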

Regards.
Dominique

> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com

On 7/05/2013, at 5:27 AM, DE VITO Dominique <dominique.dev...@thalesgroup.com> 
wrote:

> > From: aaron morton [mailto:aa...@thelastpickle.com] 
> > Sent: Sunday, 28 April 2013 22:54
> > To: user@cassandra.apache.org
> > Subject: Re: cost estimate about some Cassandra patchs
> > 
> > > Does anyone know enough of the inner working of Cassandra to tell me how 
> > > much work is needed to patch Cassandra to enable such communication 
> > > vectorization/batch ?
> > 
>  
> > Assuming you mean "have the coordinator send multiple row read/write 
> > requests in a single message to replicas"
> > 
> > Pretty sure this has been raised as a ticket before but I cannot find one 
> > now. 
> > 
> > It would be a significant change and I'm not sure how big the benefit is. 
> > To send the messages, the coordinator places them in a queue, so there is 
> > little delay in sending. Then it waits on them asynchronously. So there 
> > may be some saving on networking, but from the coordinator's point of view 
> > I think the impact is minimal. 
> > 
> > What is your use case?
>  
> Use case = rows with rowkey like (folder id, file id)
> And operations read/write multiple rows with same folder id => so, it could 
> make sense to have a partitioner putting rows with same "folder id" on the 
> same replicas.
>  
> But so far, Cassandra is not able to exploit this locality, as the batch 
> effect ends at the coordinator node.
>  
> Hence my question about the cost estimate for patching Cassandra.
>  
> The closest JIRA entries (or the ones exactly matching my need?) I have 
> found so far are:
>  
> CASSANDRA-166: Support batch inserts for more than one key at once
> https://issues.apache.org/jira/browse/CASSANDRA-166
> => "WON'T FIX" status
>  
> CASSANDRA-5034: Refactor to introduce Mutation Container in write path
> https://issues.apache.org/jira/browse/CASSANDRA-5034
> => I am not very sure if it's related to my topic
>  
> Thanks.
>  
> Dominique
>  
>  
>  
> > 
> > Cheers
> > 
> > 
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Consultant
> > New Zealand
> > 
> > @aaronmorton
> > http://www.thelastpickle.com
>  
> On 27/04/2013, at 4:04 AM, DE VITO Dominique 
> <dominique.dev...@thalesgroup.com> wrote:
> 
> 
> Hi,
>  
> We have created a new partitioner that groups some rows with **different** 
> row keys on the same replicas.
>  
> But neither batch_mutate nor multiget_slice is able to take advantage of 
> this partitioner-defined placement to vectorize/batch communications between 
> the coordinator and the replicas.
>  
> Does anyone know enough of the inner working of Cassandra to tell me how much 
> work is needed to patch Cassandra to enable such communication 
> vectorization/batch ?
>  
> Thanks.
>  
> Regards,
> Dominique
>  
>
