On Mon, 2012-10-01 at 10:45 +0200, Clement Honore wrote:
> We plan to use manual indexing too (with native C* indexing for other
> cases).
> So, for one index, we will get plenty of FKs, and a MultiGet call to fetch
> all the associated entities would, with RP, spread across the whole cluster.
> As we don't know the cluster size yet, and as it's expected to grow at an
> unknown rate, we are now thinking about alternatives for scalability.
>
> But, to tell the truth, we have not done performance tests so far.
> Still, as the choice of a partitioner is the first C* cornerstone, we are
> already thinking about a new partitioner.
> We are planning "random vs custom partitioner" tests => hence my questions
> about first creating another one.
>
> AFAIS, your partitioner (the higher bits of the hash from hashing the
> category, and the lower bits of the hash from hashing the document id) will
> put all the docs of a category on (on average) one node. Quite interesting,
> thanks!
> I could add such a partitioner to my test suite.
>
> But why not just hash the "category" part of the row key?
> With such a partitioner, as said before, many rows on *one* node are going
> to have the same hash value.
> - if it hurts Cassandra behavior/performance => I am curious to know why.
> In that case, I see your partitioner, so far, as the best answer to my
> wishes!
> - if it does NOT hurt Cassandra behavior/performance => then it sounds like
> an optimal partitioner for our needs.
>
> Any idea about Cassandra's behavior with such a (category-only) hash
> partitioner?
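(To make the idea from my last mail concrete - category hash in the high
bits, document-id hash in the low bits - here's a rough, untested sketch of
how such a token could be computed. It's standalone code, not a real
IPartitioner implementation; the class and helper names are just made up for
illustration.)

  import java.math.BigInteger;
  import java.nio.charset.StandardCharsets;
  import java.security.MessageDigest;

  // Rough, untested sketch of the "category hash in the high bits, doc-id
  // hash in the low bits" token - NOT a full IPartitioner implementation;
  // names are made up for illustration.
  public class CompositeTokenSketch {

      // RandomPartitioner tokens live in [0, 2^127), i.e. 127 usable bits.
      private static final int TOKEN_BITS = 127;
      private static final int CATEGORY_BITS = 64;                    // picks the region of the ring
      private static final int DOC_BITS = TOKEN_BITS - CATEGORY_BITS; // spreads docs inside that region

      // MD5 the input and keep only the requested number of low-order bits.
      private static BigInteger md5Bits(String input, int bits) throws Exception {
          MessageDigest md = MessageDigest.getInstance("MD5");
          BigInteger hash = new BigInteger(1, md.digest(input.getBytes(StandardCharsets.UTF_8)));
          return hash.mod(BigInteger.ONE.shiftLeft(bits));
      }

      // Token = category hash in the high bits, doc-id hash in the low bits.
      // All docs of one category fall in one contiguous token range, so they
      // sit next to each other on the ring, but each doc still gets its own
      // (almost certainly distinct) token.
      public static BigInteger token(String category, String docId) throws Exception {
          BigInteger high = md5Bits(category, CATEGORY_BITS);
          BigInteger low  = md5Bits(docId, DOC_BITS);
          return high.shiftLeft(DOC_BITS).or(low);
      }

      public static void main(String[] args) throws Exception {
          System.out.println(token("cat-A", "doc-1")); // same high bits as the next one...
          System.out.println(token("cat-A", "doc-2")); // ...different low bits
          System.out.println(token("cat-B", "doc-1")); // lands in a different part of the ring
      }
  }

With a category-only hash, by contrast, every doc in a category would get
exactly the same token - which is the collision situation I'm unsure about
below.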
I honestly don't know the code well enough - but I have always assumed
(perhaps incorrectly) that the whole SSTable / Memtable system is sorted on
the hash value rather than the raw key, so that range queries are efficient -
so if all items on a node have the same hash you would get awful performance
for (at least) reading specific rows from disk. I could be wrong in my
assumptions.

Certainly having lots of hash collisions is unusual behaviour - I don't
imagine the timing behaviour has been tested closely against that situation.

If you haven't yet tested it, then I'm not sure why you assume that accesses
from a single machine would be faster than from documents spread around the
ring - Ethernet is fast, and if you're going to have to do disk seeks to get
any of this data then you can run the seeks in parallel across a large
number of spindles by spreading the load around the cluster.

It also adds extra load onto the machines handling popular categories -
assuming the number of categories is significantly smaller than the number
of documents, that could make a major difference to latency.

Tim

>
> Regards,
> Clément
>
> 2012/9/28 Tim Wintle <timwin...@gmail.com>
>
> > On Fri, 2012-09-28 at 18:20 +0200, Clement Honore wrote:
> > > Hi,
> > >
> > > I have hierarchical data.
> > > I'm storing them in a CF with a row key somewhat like (category, doc id),
> > > and plenty of columns for a doc definition.
> > >
> > > I have hierarchical data traversal too.
> > > The user just chooses one category, and then interacts with docs
> > > belonging only to this category.
> > >
> > > 1) If I use RandomPartitioner, all docs could be spread across all nodes
> > > in the cluster => bad performance.
> > >
> > > 2) Using RandomPartitioner, an alternative design could be
> > > rowkey=category and column name=(doc id, prop name).
> > > I don't want it because I need fixed column names for indexing purposes,
> > > and the "category" is quite a lonnnng string.
> > >
> > > 3) Then, I want to define a new partitioner for my rowkey (category, doc
> > > id), doing MD5 only for the "category" part.
> > >
> > > The question is: with such a partitioner, many rows on *one* node are
> > > going to have the same MD5 value, as a result of this new partitioner.
> >
> > If you do decide that having rows on the same node is what you want,
> > then you could take the higher bits of the hash from hashing the
> > category, and the lower bits of the hash from hashing the document id.
> >
> > That would mean documents in a category would be close to each other in
> > the ring - while being unlikely to share the same hash.
> >
> > However, if you're doing this then all reads/writes to the category are
> > going to be to a single machine. That's not going to spread the load
> > across the cluster very well, as I assume a few categories are going to
> > be far more popular than others.
> >
> > Have you tested that you actually get bad performance from
> > RandomPartitioner?
> >
> > Tim
> >
> >