No, what I meant by infinite partitions is not auto sub-partitioning, even on
the server side. Ideally Cassandra should be able to support unlimited
partition size and make compaction, repair and streaming of such partitions
manageable:
- compaction: find a way to iterate super efficiently through the whole
  partition and merge-sort all sstables containing data for the same
  partition.
- repair: find another approach than Merkle trees, because their resolution
  is not granular enough. Ideally repair resolution should be at the
  clustering level, or every xxx clustering values.
- streaming: same idea as repair; in case of error/disconnection the stream
  should be resumed from the latest clustering-level checkpoint, or at least
  we should checkpoint every xxx clustering values.
- partition index: find a way to index huge partitions efficiently. Right now
  a huge partition has a dramatic impact on the partition index. The work of
  Michael Kjellman on Birch indices is going in the right direction
  (CASSANDRA-9754).

About tombstones, there is a recent research paper about DottedDB and an
attempt to implement deletes without using tombstones:
http://haslab.uminho.pt/tome/files/dotteddb_srds.pdf

On Fri, Aug 24, 2018 at 12:38 AM, Rahul Singh <rahul.xavier.si...@gmail.com>
wrote:

> Agreed. One of the ideas I had on partition size is to automatically
> synthetically shard based on some basic patterns seen in the data.
>
> It could be implemented as a tool that would create a new table with an
> additional part of the key that is an automatically created shard, or it
> would use an existing key and then migrate the data.
>
> The internal automatic shard would adjust as needed and keep
> “subpartitions” or “rowsets” but return the full partition given some
> special CQL.
>
> This is done today at the data access layer and in the data model design,
> but it’s pretty much a step-by-step process that could be done
> algorithmically.
>
> Regarding tombstones - maybe we have another thread dedicated to cleaning
> tombstones, separate from compaction. Depending on the amount of
> tombstones and a threshold, it would be dedicated to deletion.
> It may be an edge case, but people face issues with tombstones all the
> time because they don’t know better.
>
> Rahul
> On Aug 23, 2018, 11:50 AM -0500, DuyHai Doan <doanduy...@gmail.com>, wrote:
>
> As I used to tell some people, the day we make:
>
> 1. partition size unlimited, or at least huge partitions easily manageable
> (compaction, repair, streaming, partition index file)
> 2. tombstones a non-issue
>
> that day, Cassandra will dominate any other IoT technology out there.
>
> Until then ...
>
> On Thu, Aug 23, 2018 at 4:54 PM, Rahul Singh
> <rahul.xavier.si...@gmail.com> wrote:
>
>> Good analysis of how the different key structures affect use cases and
>> performance. I think you could extend this article with a potential
>> evaluation of FiloDB, which specifically tries to solve the OLAP issue
>> with arbitrary queries.
>>
>> Another option is leveraging Elassandra (index in Elasticsearch
>> collocated with C*) or DataStax (index in Solr collocated with C*).
>>
>> I personally haven’t used SnappyData, but that’s another Spark-based DB
>> that could be leveraged for performant real-time queries on the OLTP side.
>>
>> Rahul
>> On Aug 23, 2018, 2:48 AM -0500, Affan Syed <as...@an10.io>, wrote:
>>
>> Hi,
>>
>> we wrote a blog about some of the results that engineers from AN10
>> shared earlier.
>>
>> I am sharing it here for greater comments and discussion.
>>
>> http://www.an10.io/technology/cassandra-and-iot-queries-are-they-a-good-match/
>>
>> Thank you.
>>
>> - Affan
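To make the "merge-sort all sstables containing data for the same partition"
idea concrete: it boils down to a k-way merge of sorted runs. Here is a
minimal sketch, not Cassandra's actual compaction code; the tuple layout
(clustering_key, timestamp, value) is a made-up stand-in for real cells, with
last-write-wins on timestamp:

```python
import heapq

def merge_partition(sstable_runs):
    """K-way merge of several sorted runs of (clustering_key, timestamp,
    value) cells for one partition, keeping only the newest cell per
    clustering key. Each run is assumed sorted by clustering key."""
    merged = []
    # heapq.merge streams the runs lazily, so memory stays proportional to
    # the number of runs, not the partition size.
    for key, ts, value in heapq.merge(*sstable_runs):
        if merged and merged[-1][0] == key:
            # Same clustering key seen before: keep the newer timestamp.
            if ts > merged[-1][1]:
                merged[-1] = (key, ts, value)
        else:
            merged.append((key, ts, value))
    return merged
```

Because the heap only ever holds one head element per run, this is the kind
of iteration that stays cheap even when the partition itself is huge.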
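Rahul's synthetic-sharding idea (an automatically created shard as an extra
partition-key component, with the full partition reassembled on read) can be
sketched as below. All names here (NUM_SHARDS, shard_for, fetch_shard) are
hypothetical, and a real tool would hide this behind CQL rather than
application code:

```python
import zlib

# Hypothetical schema: PRIMARY KEY ((sensor_id, shard), event_time)
NUM_SHARDS = 16  # fixed for illustration; a real tool would adapt this

def shard_for(clustering_value: str) -> int:
    """Deterministically map a clustering value to a synthetic shard, so
    one logical partition is spread over NUM_SHARDS physical partitions."""
    return zlib.crc32(clustering_value.encode()) % NUM_SHARDS

def read_full_partition(fetch_shard, logical_key):
    """Fan out over every shard and merge the results, emulating
    'return the full partition given some special CQL'.
    fetch_shard(logical_key, shard) is a hypothetical accessor
    returning the rows stored under that physical partition."""
    rows = []
    for shard in range(NUM_SHARDS):
        rows.extend(fetch_shard(logical_key, shard))
    return sorted(rows)
```

The trade-off is the classic one: writes scatter over NUM_SHARDS partitions
(each comfortably small), while a full logical read costs NUM_SHARDS queries
plus a client- or coordinator-side merge.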