> -----Original Message-----
> From: aaron morton [mailto:aa...@thelastpickle.com]
> Sent: Tuesday, 7 May 2013 10:22
> To: user@cassandra.apache.org
> Subject: Re: cost estimate about some Cassandra patchs
>
> > Use case = rows with rowkey like (folder id, file id)
> > And operations read/write multiple rows with same folder id => so, it could
> > make sense to have a partitioner putting rows with same "folder id" on the
> > same replicas.
>
> The entire row key is the thing we use to make the token used to both locate the
> replicas and place the row in the node. I don't see that changing.

Well, we can't do that, because of secondary indexes on rows. Only C* v2 will allow the row design you mention together with secondary indexes. So that row design is a no-go for us with C* 1.1 or 1.2.

> Have you done any performance testing to see if this is a problem?

Unfortunately, today we only have some of the pieces needed for performance testing. We are just beginning. Still, I am investigating whether alternative designs are (at least) possible, because if no alternative design is easy to develop, then there is no need to compare performance.

The lesson I learnt here is that, if I were to restart our project from the beginning, I would start a more extensive performance testing project alongside the business project development. It's a kind of must-have for a NoSQL database.

So, the only tests we have done so far with our FolderPartitioner are on a single-machine cluster. As expected, due to the extra work done by this FolderPartitioner, CPU usage is a bit higher (~10%), and memory and network consumption are the same as with RP, but I have strange results for I/O (average hard drive), for example for a write-only test. I don't know why I/O consumption could be much higher with our FolderPartitioner than with the RP. So, I am questioning my measurement methods and my understanding of C*.

Well, there is still quite a long way to go before we can use such a FolderPartitioner...

Regards.
Dominique

> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com

On 7/05/2013, at 5:27 AM, DE VITO Dominique <dominique.dev...@thalesgroup.com> wrote:

> > From: aaron morton [mailto:aa...@thelastpickle.com]
> > Sent: Sunday, 28 April 2013 22:54
> > To: user@cassandra.apache.org
> > Subject: Re: cost estimate about some Cassandra patchs
> >
> > > Does anyone know enough of the inner working of Cassandra to tell me how
> > > much work is needed to patch Cassandra to enable such communication
> > > vectorization/batch?
> >
> > Assuming you mean "have the coordinator send multiple row read/write
> > requests in a single message to replicas"
> >
> > Pretty sure this has been raised as a ticket before but I cannot find one
> > now.
> >
> > It would be a significant change and I'm not sure how big the benefit is.
> > To send the messages the coordinator places them in a queue, so there is
> > little delay sending. Then it waits on them async. So there may be some
> > saving on networking but from the coordinator's point of view I think the
> > impact is minimal.
> >
> > What is your use case?
>
> Use case = rows with rowkey like (folder id, file id)
> And operations read/write multiple rows with same folder id => so, it could
> make sense to have a partitioner putting rows with same "folder id" on the
> same replicas.
>
> But so far, Cassandra is not able to exploit this locality as batch effect
> ends at the coordinator node.
>
> So, my question about the cost estimate for patching Cassandra.
>
> The closest (or exactly corresponding to my need?) JIRA entries I have found
> so far are:
>
> CASSANDRA-166: Support batch inserts for more than one key at once
> https://issues.apache.org/jira/browse/CASSANDRA-166
> => "WON'T FIX" status
>
> CASSANDRA-5034: Refactor to introduce Mutation Container in write path
> https://issues.apache.org/jira/browse/CASSANDRA-5034
> => I am not very sure if it's related to my topic
>
> Thanks.
>
> Dominique
>
> > Cheers
> >
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Consultant
> > New Zealand
> >
> > @aaronmorton
> > http://www.thelastpickle.com
>
> On 27/04/2013, at 4:04 AM, DE VITO Dominique
> <dominique.dev...@thalesgroup.com> wrote:
>
> Hi,
>
> We have created a new partitioner that groups some rows with **different** row
> keys on the same replicas.
>
> But neither the batch_mutate nor the multiget_slice is able to take
> advantage of this partitioner-defined placement to vectorize/batch
> communications between the coordinator and the replicas.
>
> Does anyone know enough of the inner working of Cassandra to tell me how much
> work is needed to patch Cassandra to enable such communication
> vectorization/batch?
>
> Thanks.
>
> Regards,
> Dominique
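
The placement idea discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cassandra code and not the FolderPartitioner mentioned above: `whole_key_token`, `folder_token` and `group_by_token` are hypothetical names, the `:` key separator is invented for the example, and the MD5 hash merely echoes the RandomPartitioner (RP) without reproducing its exact token arithmetic.

```python
import hashlib
from collections import defaultdict


def md5_token(data: bytes) -> int:
    """Turn bytes into an integer token via MD5 (illustrative, not RP's exact math)."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def whole_key_token(folder_id: str, file_id: str) -> int:
    """Token derived from the entire row key: every row gets its own token."""
    # Assumption: the composite key is serialized as "folder:file".
    return md5_token(f"{folder_id}:{file_id}".encode("utf-8"))


def folder_token(folder_id: str, file_id: str) -> int:
    """Hypothetical folder-aware token: only the folder id is hashed,
    so all rows of one folder share a token (and hence the same replicas)."""
    return md5_token(folder_id.encode("utf-8"))


def group_by_token(keys, token_fn):
    """Group (folder_id, file_id) keys by the token that token_fn assigns them."""
    groups = defaultdict(list)
    for folder_id, file_id in keys:
        groups[token_fn(folder_id, file_id)].append((folder_id, file_id))
    return dict(groups)


keys = [("folderA", "f1"), ("folderA", "f2"), ("folderB", "f1")]
by_folder = group_by_token(keys, folder_token)
by_whole_key = group_by_token(keys, whole_key_token)

print(len(by_folder), "groups with the folder-based token")
print(len(by_whole_key), "groups with the whole-key token")
```

With the folder-based token, the three sample keys fall into two groups, so a coordinator could in principle address each group as a single batch. With the whole-key token each row gets its own token, which corresponds to the placement Aaron describes.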