No, what I meant by infinite partitions is not automatic sub-partitioning,
even server-side. Ideally, Cassandra should be able to support unbounded
partition sizes and make compaction, repair and streaming of such
partitions manageable:

- compaction: find a way to iterate very efficiently through the whole
partition and merge-sort all SSTables containing data for the same
partition.

- repair: find an alternative to Merkle trees, because their resolution is
not granular enough. Ideally, repair resolution should be at the clustering
level, or at every xxx clustering values.

- streaming: same idea as repair. In case of error/disconnection, the
stream should resume from the latest clustering-level checkpoint, or at
least we should checkpoint every xxx clustering values.

- partition index: find a way to index huge partitions efficiently. Right
now, huge partitions have a dramatic impact on the partition index. Michael
Kjellman's work on Birch indices is going in the right direction
(CASSANDRA-9754).
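To make the compaction point concrete, here is a minimal sketch (not
Cassandra's actual code; the tuple layout and function name are mine) of
what a lazy merge-sort over several SSTable runs for one partition could
look like, reconciling duplicate clustering keys by highest timestamp:

```python
import heapq

def compact_partition(sstables):
    """Merge-sort several SSTable runs for one partition.

    Each run is a list of (clustering_key, timestamp, value) tuples,
    already sorted by clustering_key, as SSTables are on disk.
    For duplicate clustering keys, the cell with the highest timestamp
    wins, mirroring last-write-wins reconciliation. The runs are merged
    lazily, so memory stays proportional to the number of runs, not to
    the partition size.
    """
    merged = heapq.merge(*sstables, key=lambda cell: cell[0])
    current_key, winner = object(), None  # sentinel never equals a real key
    for key, ts, value in merged:
        if key != current_key:
            if winner is not None:
                yield winner
            current_key, winner = key, (key, ts, value)
        elif ts > winner[1]:
            winner = (key, ts, value)
    if winner is not None:
        yield winner
```

For example, merging [("a", 1, "v1"), ("c", 5, "v3")] with
[("a", 2, "v2"), ("b", 3, "vb")] keeps the newer cell for "a" and streams
the result in clustering order, one cell at a time.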

About tombstones, there is a recent research paper on DottedDB and an
attempt to implement deletes without tombstones:
http://haslab.uminho.pt/tome/files/dotteddb_srds.pdf
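The core trick in that paper can be caricatured in a few lines: tag each
write with a unique per-node "dot" and keep a node-wide clock of every dot
the node has seen, so that a missing key whose dot is covered by the clock
is provably deleted, with no tombstone record kept. A toy sketch (class and
method names are mine, and the real protocol uses a compact node logical
clock rather than a plain set; see the paper for the full design):

```python
class DottedStore:
    """Toy sketch of the DottedDB idea: deletes without tombstones.

    Every write is tagged with a unique dot (node_id, counter). The node
    keeps a node-wide causal context of all dots it has ever seen. During
    anti-entropy, if a peer advertises a dot we have already seen but the
    key is absent locally, the absence itself means "deleted": no
    tombstone record is needed.
    """
    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0   # local dot counter
        self.seen = set()  # node-wide causal context (all dots seen)
        self.data = {}     # key -> (dot, value)

    def put(self, key, value):
        self.counter += 1
        dot = (self.node_id, self.counter)
        self.seen.add(dot)
        self.data[key] = (dot, value)
        return dot

    def delete(self, key):
        # Remove the key outright: its dot stays in `seen`, so the
        # delete remains provable without storing a tombstone.
        self.data.pop(key, None)

    def missing_means_deleted(self, key, dot):
        # A peer asks: you once saw (key, dot); is it deleted?
        # Absent locally + dot already seen => known deleted.
        return key not in self.data and dot in self.seen
```

A deleted key is distinguishable from a never-seen key: an unknown dot just
means the write has not arrived yet, so it must still be replicated rather
than treated as deleted.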



On Fri, Aug 24, 2018 at 12:38 AM, Rahul Singh <rahul.xavier.si...@gmail.com>
wrote:

> Agreed. One of the ideas I had on partition size is to automatically
> create synthetic shards based on some basic patterns seen in the data.
>
> It could be implemented as a tool that would create a new table with an
> additional key component that is an automatically created shard, or it
> would use an existing key and then migrate the data.
>
> The internal automatic shard would adjust as needed and keep
> “subpartitions” or “rowsets”, but return the full partition given some
> special CQL.
>
> This is done today at the data access layer and in the data model design,
> but it’s pretty much a step-by-step process that could be done
> algorithmically.
>
> Regarding tombstones: maybe we could have another thread dedicated to
> cleaning tombstones, separate from compaction. Depending on the number of
> tombstones and a threshold, it would be dedicated to deletion. It may be
> an edge case, but people face issues with tombstones all the time because
> they don’t know better.
>
> Rahul
> On Aug 23, 2018, 11:50 AM -0500, DuyHai Doan <doanduy...@gmail.com>,
> wrote:
>
> As I used to tell some people, the day we make:
>
> 1. partition size unlimited, or at least huge partitions easily manageable
> (compaction, repair, streaming, the partition index file)
> 2. tombstones a non-issue
>
> that day, Cassandra will dominate any other IoT technology out there
>
> Until then ...
>
> On Thu, Aug 23, 2018 at 4:54 PM, Rahul Singh <rahul.xavier.si...@gmail.com
> > wrote:
>
>> Good analysis of how the different key structures affect use cases and
>> performance. I think you could extend this article with an evaluation of
>> FiloDB, which specifically tries to solve the OLAP issue with arbitrary
>> queries.
>>
>> Another option is leveraging Elassandra (indexes in Elasticsearch
>> collocated with C*) or DataStax (indexes in Solr collocated with C*).
>>
>> I personally haven’t used SnappyData, but that’s another Spark-based DB
>> that could be leveraged for performant real-time queries on the OLTP side.
>>
>> Rahul
>> On Aug 23, 2018, 2:48 AM -0500, Affan Syed <as...@an10.io>, wrote:
>>
>> Hi,
>>
>> We wrote a blog post about some of the results that engineers from AN10
>> shared earlier.
>>
>> I am sharing it here for greater comments and discussions.
>>
>> http://www.an10.io/technology/cassandra-and-iot-queries-are-they-a-good-match/
>>
>>
>> Thank you.
>>
>>
>>
>> - Affan
>>
>>
>
