Hey, we are considering upgrading from 1.2 to 2.0. Why don't you consider 2.0 ready for production yet, Robert? Have you written about this somewhere already?
A bit off-topic in this discussion, but it would be interesting to know; your posts are generally very enlightening.

Cheers,

On Thu, May 29, 2014 at 8:51 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Thu, May 15, 2014 at 6:10 AM, Vegard Berget <p...@fantasista.no> wrote:
>
>> I know this has been discussed before, and I know there are limitations
>> to how many rows one partition key can handle in practice. But I am not
>> sure whether the number of rows or the total data size is the deciding
>> factor.
>
> Both. In terms of data size, partitions over a few hundred megabytes begin
> to see diminishing returns in some cases. Partitions over 64 megabytes are
> compacted on disk, which should give you a rough sense of what Cassandra
> considers a "large" partition.
>
>> Should we add another partition key to avoid 1,000,000 rows in the same
>> thrift row (which is how I understand it is actually stored)? Or is
>> 1,000,000 rows okay?
>
> Depending on row size and access patterns, 1M rows is not extremely large.
> There are, however, some row sizes and operations where this order of
> magnitude of columns might be slow.
>
>> Other considerations, for example compaction strategy, and whether we
>> should upgrade to 2.0 because of this (we will upgrade anyway, but if it
>> is recommended we will continue to use 2.0 in development and upgrade the
>> production environment sooner)?
>
> You should not upgrade to 2.0 in order to address this concern. You should
> upgrade to 2.0 when it is stable enough to run in production, which IMO it
> is not yet. YMMV.
>
>> I have done some testing, inserting a million rows and selecting them
>> all, counting them, and selecting individual rows (with both clientid and
>> id), and it seems fine, but I want to ask to be sure that I am on the
>> right track.
>
> If the access patterns you are using perform the way you would like with
> data of representative size, that sounds reasonable to me?
>
> If you are able to select all million rows within a reasonable percentage
> of the relevant timeout, I presume they cannot be too huge in terms of
> data size! :D
>
> =Rob

-- 
Paulo Motta
Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200
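P.S. On the "add another partition key" point in the quoted thread: in practice that usually means adding a bucket column to the partition key so that no single partition accumulates the full million rows per client. A minimal CQL sketch, assuming a hypothetical events_bucketed table (clientid and id are the columns mentioned in the thread; bucket and payload are illustrative):

    CREATE TABLE events_bucketed (
        clientid text,
        bucket   int,        -- e.g. hash(id) % 16, or a day/month bucket
        id       timeuuid,
        payload  text,
        PRIMARY KEY ((clientid, bucket), id)  -- composite partition key caps partition size
    );

    -- A point read still only needs clientid + bucket + id; reading all
    -- rows for a client fans out across the buckets.
    SELECT * FROM events_bucketed WHERE clientid = ? AND bucket = ? AND id = ?;

The trade-off is that "all rows for a client" becomes several partition reads instead of one, so the bucket count is worth sizing against your actual access patterns.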