On Thu, May 15, 2014 at 6:10 AM, Vegard Berget <p...@fantasista.no> wrote:
> I know this has been discussed before, and I know there are limitations to
> how many rows one partition key in practice can handle. But I am not sure
> if number of rows or total data is the deciding factor.

Both. In terms of data size, partitions larger than a few hundred megabytes
start to see diminishing returns in some cases. Partitions over 64 megabytes
(the default in_memory_compaction_limit_in_mb) are compacted on disk rather
than in memory, which should give you a rough sense of what Cassandra
considers a "large" partition.

> Should we add another partition key to avoid 1 000 000 rows in the same
> thrift-row (which is how I understand it is actually stored)? Or is 1 000
> 000 rows okay?

Depending on row size and access patterns, a million rows is not extremely
large. There are, however, some row sizes and operations where this order of
magnitude of columns might be slow. (A rough CQL sketch of what adding
another partition key component could look like is below the sig.)

> Other considerations, for example compaction strategy and if we should do
> an upgrade to 2.0 because of this (we will upgrade anyway, but if it is
> recommended we will continue to use 2.0 in development and upgrade the
> production environment sooner)

You should not upgrade to 2.0 in order to address this concern. You should
upgrade to 2.0 when it is stable enough to run in production, which IMO it
is not yet. YMMV.

> I have done some testing, inserting a million rows and selecting them all,
> counting them and selecting individual rows (with both clientid and id) and
> it seems fine, but I want to ask to be sure that I am on the right track.

If the access patterns you are using perform the way you would like with
representative-size data, that sounds reasonable to me. If you are able to
select all million rows within a reasonable percentage of the relevant
timeout, I presume they cannot be too huge in terms of data size! :D

=Rob
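
Purely as an illustration of the "add another partition key" option: a
minimal CQL sketch, assuming a simple table keyed by clientid and id as in
the thread. The table names (events, events_bucketed), the payload and
bucket columns, and the bucketing scheme are hypothetical, not from the
original poster's schema.

    -- Hypothetical starting point: one partition per clientid, one row per
    -- id, so all of a client's ~1M rows land in the same partition.
    CREATE TABLE events (
        clientid text,
        id       timeuuid,
        payload  text,
        PRIMARY KEY (clientid, id)
    );

    -- One way to cap partition size: add a bucket to the partition key
    -- (e.g. derived from a time window, or a hash of id computed client
    -- side). A client's rows are spread across several smaller partitions.
    CREATE TABLE events_bucketed (
        clientid text,
        bucket   int,
        id       timeuuid,
        payload  text,
        PRIMARY KEY ((clientid, bucket), id)
    );

    -- Point reads still hit a single partition, as long as the client can
    -- recompute the bucket from the id:
    SELECT * FROM events_bucketed
     WHERE clientid = 'acme' AND bucket = 3
       AND id = 00000000-0000-1000-8000-000000000000;

    -- Reading "all rows for a client" now fans out across buckets, either
    -- in a client-side loop or with IN on the bucket component:
    SELECT * FROM events_bucketed
     WHERE clientid = 'acme' AND bucket IN (0, 1, 2, 3);

The tradeoff is the usual one: smaller partitions at the cost of extra
queries (or an IN) whenever you need a client's full data set.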