> The only drawback for ultra wide rows I can see is point 1). But if I use
> leveled compaction with a sufficiently large value for "sstable_size_in_mb"
> (let's say 200Mb), will my read performance be impacted as the row grows?
For this use case, you would want to use SizeTieredCompaction and play
around with the configuration a bit to keep a small number of large
SSTables. Specifically: keep min_threshold and max_threshold really low,
set bucket_low and bucket_high closer together (maybe even both to 1.0),
and maybe use a larger min_sstable_size. YMMV though - per Rob's
suggestion, take the time to run some tests tweaking these options (see
the first sketch at the end of this mail).

> Of course, splitting a wide row into several rows using a bucketing
> technique is one solution, but it forces us to keep track of the bucket
> number and it's not convenient. We have one process (jvm) that inserts
> data and another process (jvm) that reads data. Using bucketing, we need
> to synchronize the bucket number between the 2 processes.

This could be as simple as adding year and month to the primary key (in
the form 'yyyymm'). Alternatively, you could add this to the partition key
in the table definition (see the second sketch below). Either way, it then
becomes pretty easy to re-generate these values based on the query
parameters.
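For concreteness, here is a rough sketch of those compaction settings.
The keyspace and table names are made up; the sub-properties are the
standard STCS options (min_sstable_size is in bytes), but the specific
values are only a starting point - test before settling on anything:

    ALTER TABLE my_ks.my_wide_table
      WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'min_threshold': 2,            -- compact as soon as 2 similar-sized SSTables exist
        'max_threshold': 4,            -- never merge more than 4 at once
        'bucket_low': 1.0,             -- bucket_low == bucket_high means only
        'bucket_high': 1.0,            --   near-identically sized SSTables share a bucket
        'min_sstable_size': 209715200  -- ~200MB; everything smaller falls into one bucket
      };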
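And a minimal sketch of the second option - a 'yyyymm' bucket baked into
the partition key (all the names here are invented for illustration):

    CREATE TABLE my_ks.events (
        sensor_id text,
        bucket    int,        -- 'yyyymm', e.g. 201406; both the writer and
                              --   reader JVMs derive it from the timestamp,
                              --   so no coordination is needed between them
        event_ts  timestamp,
        payload   blob,
        PRIMARY KEY ((sensor_id, bucket), event_ts)
    );

Since the bucket is a pure function of the date, both processes compute
it independently from the query parameters and always agree.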
-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com