"2) so what is optimal limit in terms of data size?" --> Usual recommendations for Cassandra 2.1 are:
a. max 100MB per partition size
b. or up to 10 000 000 physical columns (cells) per partition (including clustering columns etc.)

Recently, with the work of Robert Stupp (CASSANDRA-11206) and also with the huge enhancement from Michael Kjellman (CASSANDRA-9754), it will be easier to handle huge partitions in memory, especially with a reduced memory footprint with regard to the JVM heap. However, as long as we don't have repair and streaming processes that can be "resumed" in the middle of a partition, the operational pains will still be there. Same for compaction.

On Sat, Oct 15, 2016 at 12:00 PM, Kant Kodali <k...@peernova.com> wrote:

> 1) It will be great if someone can confirm that there is no limit
> 2) so what is optimal limit in terms of data size?
>
> Finally, thanks a lot for pointing out all the operational issues!
>
> On Sat, Oct 15, 2016 at 2:39 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> "But is there still 2B columns limit on the Cassandra code?"
>>
>> --> I remember one of the committers saying that this 2B-column
>> limitation comes from the Thrift era, where you were limited to a max of
>> 2B columns returned to the client for each request. It also applied to
>> the max size of each "page" of data.
>>
>> Since the introduction of the binary protocol and the paging feature,
>> this limitation does not make sense anymore.
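[Editor's note: the sizing guidance above (roughly 100MB or 10 million cells per partition) can be turned into a back-of-the-envelope check. The sketch below is illustrative only; the threshold constants and the example field sizes are assumptions, not limits enforced by Cassandra.]

```python
# Rough check of an estimated partition against the rules of thumb quoted
# above. All numbers here are illustrative guidelines, not hard limits.

MAX_PARTITION_BYTES = 100 * 1024 ** 2   # ~100MB per partition (guideline)
MAX_PARTITION_CELLS = 10_000_000        # ~10M physical cells (guideline)

def partition_within_guidelines(rows_per_partition: int,
                                cells_per_row: int,
                                avg_cell_bytes: int) -> bool:
    """True if the estimated partition stays under both rules of thumb."""
    total_cells = rows_per_partition * cells_per_row
    total_bytes = total_cells * avg_cell_bytes
    return (total_cells <= MAX_PARTITION_CELLS
            and total_bytes <= MAX_PARTITION_BYTES)

# Example: 100K rows x 10 cells x 100 bytes = 1M cells, ~95MB -> just fits.
print(partition_within_guidelines(100_000, 10, 100))
```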
>>
>> By the way, if your partition is too wide, you'll face other operational
>> issues way before reaching the 2B-column limit:
>>
>> - compaction taking a looooong time --> heap pressure --> long GC pauses
>>   --> nodes flapping
>> - repair & over-streaming: a repair session failing in the middle
>>   forces you to re-send the whole big partition --> the receiving node
>>   has a bunch of duplicate data --> pressure on compaction
>> - bootstrapping of new nodes: a failure to stream a partition in the
>>   middle will force you to re-send the whole partition from the beginning
>>   again --> the receiving node has a bunch of duplicate data --> pressure
>>   on compaction
>>
>> On Sat, Oct 15, 2016 at 9:15 AM, Kant Kodali <k...@peernova.com> wrote:
>>
>>> Compacting 10 sstables, each of them having a 15GB partition, in what
>>> duration?
>>>
>>> On Fri, Oct 14, 2016 at 11:45 PM, Matope Ono <matope....@gmail.com> wrote:
>>>
>>>> Please forget that part of my sentence.
>>>> To be more correct, maybe I should have said "He could compact 10
>>>> sstables, each of them having a 15GB partition".
>>>> What I wanted to say is that we can store many more rows (and columns)
>>>> in a partition than before 3.6.
>>>>
>>>> 2016-10-15 15:34 GMT+09:00 Kant Kodali <k...@peernova.com>:
>>>>
>>>>> "Robert said he could treat safely 10 15GB partitions at his
>>>>> presentation" -- this sounds like there is a row limit too, not
>>>>> only columns??
>>>>>
>>>>> If I am reading this correctly, 10 15GB partitions means 10 partitions
>>>>> (like 10 row keys, that's too small) with each partition of size 15GB
>>>>> (that's like 15 million columns where each column can have data of size
>>>>> 1KB).
>>>>>
>>>>> On Fri, Oct 14, 2016 at 11:30 PM, Kant Kodali <k...@peernova.com> wrote:
>>>>>
>>>>>> "Robert said he could treat safely 10 15GB partitions at his
>>>>>> presentation" -- this sounds like there is a row limit too, not
>>>>>> only columns??
>>>>>>
>>>>>> If I am reading this correctly, 10 15GB partitions means 10
>>>>>> partitions (like 10 row keys, that's too small) with each partition of
>>>>>> size 15GB (that's like 10 million columns where each column can have
>>>>>> data of size 1KB).
>>>>>>
>>>>>> On Fri, Oct 14, 2016 at 9:54 PM, Matope Ono <matope....@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks to CASSANDRA-11206, I think we can have much larger
>>>>>>> partitions than before 3.6.
>>>>>>> (Robert said he could treat safely 10 15GB partitions at his
>>>>>>> presentation: https://www.youtube.com/watch?v=N3mGxgnUiRY)
>>>>>>>
>>>>>>> But is there still a 2B-column limit in the Cassandra code?
>>>>>>> If so, out of curiosity, I'd like to know where the bottleneck is.
>>>>>>> Could anyone let me know about it?
>>>>>>>
>>>>>>> Thanks, Yasuharu.
>>>>>>>
>>>>>>> 2016-10-13 1:11 GMT+09:00 Edward Capriolo <edlinuxg...@gmail.com>:
>>>>>>>
>>>>>>>> The "2 billion column limit" is press-clipping "puffery". This
>>>>>>>> statement seemingly became popular because of a highly trafficked
>>>>>>>> story in which a tech reporter embellished a statement to make a
>>>>>>>> splashy article.
>>>>>>>>
>>>>>>>> The effect is something like this:
>>>>>>>> http://www.healthnewsreview.org/2012/08/iced-tea-kidney-stones-and-the-study-that-never-existed/
>>>>>>>>
>>>>>>>> Iced tea does not cause kidney stones! Cassandra does not store
>>>>>>>> rows with 2 billion columns! It is just not true.
>>>>>>>>
>>>>>>>> On Wed, Oct 12, 2016 at 4:57 AM, Kant Kodali <k...@peernova.com> wrote:
>>>>>>>>
>>>>>>>>> Well, 1) I have not sent it to the postgresql mailing lists, and
>>>>>>>>> 2) I thought this was an open-ended question, as it can involve
>>>>>>>>> ideas from everywhere, including the Cassandra java driver mailing
>>>>>>>>> lists, so sorry if that bothered you for some reason.
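[Editor's note: the back-of-the-envelope arithmetic in the two messages above is easy to verify. With 1KB per column, a 15GB partition holds roughly 15 million columns; the "10 million" figure in the earlier message appears to be the slip that the re-sent message corrects.]

```python
# Verify the thread's arithmetic: how many 1KB columns fit in a 15GB
# (treated here as 15GiB) partition?
GiB = 1024 ** 3
KiB = 1024

columns = (15 * GiB) // KiB
print(columns)  # -> 15728640, i.e. roughly 15 million columns
```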
>>>>>>>>>
>>>>>>>>> On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Also, I'm not sure, but I don't think it's "cool" to write to
>>>>>>>>>> multiple lists in the same message (based on the postgresql mailing
>>>>>>>>>> lists rules).
>>>>>>>>>> For example, I'm not subscribed to those, and now the messages are
>>>>>>>>>> separated.
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> There are some issues working with larger partitions.
>>>>>>>>>>> HBase doesn't do what you say! You also have to be careful on
>>>>>>>>>>> HBase not to create large rows! But since they are globally
>>>>>>>>>>> sorted, you can easily sort between them and create small rows.
>>>>>>>>>>>
>>>>>>>>>>> In my opinion, the Cassandra people are wrong in that they say
>>>>>>>>>>> "globally sorted is the devil!" while fb/google/etc actually use
>>>>>>>>>>> globally-sorted most of the time! You have to be careful though
>>>>>>>>>>> (just like with random partitioning).
>>>>>>>>>>>
>>>>>>>>>>> Can you tell us what rowkey1, page1, col(x) actually are? Maybe
>>>>>>>>>>> there is a way.
>>>>>>>>>>> The most "recent" -- does that mean there's a timestamp in there?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <k...@peernova.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> I understand Cassandra can have a maximum of 2B rows per
>>>>>>>>>>>> partition, but in practice some people seem to suggest the magic
>>>>>>>>>>>> number is 100K. Why not create another partition/rowkey
>>>>>>>>>>>> automatically (whenever we reach a safe limit that we consider
>>>>>>>>>>>> efficient) with an auto-increment bigint as a suffix appended to
>>>>>>>>>>>> the new rowkey,
so that the
>>>>>>>>>>>> driver can return the new rowkey, indicating that there is a new
>>>>>>>>>>>> partition, and so on... Now I understand this would involve
>>>>>>>>>>>> allowing partial row-key searches, which Cassandra currently
>>>>>>>>>>>> doesn't do (but I believe HBase does), and thinking about token
>>>>>>>>>>>> ranges and potentially many other things...
>>>>>>>>>>>>
>>>>>>>>>>>> My current problem is this:
>>>>>>>>>>>>
>>>>>>>>>>>> I have a row key followed by a bunch of columns (this is not
>>>>>>>>>>>> time-series data), and these columns can grow to any number, so
>>>>>>>>>>>> since I have a 100K limit (or whatever the number is -- say some
>>>>>>>>>>>> limit) I want to break the partition into levels/pages:
>>>>>>>>>>>>
>>>>>>>>>>>> rowkey1, page1 -> col1, col2, col3...
>>>>>>>>>>>> rowkey1, page2 -> col1, col2, col3...
>>>>>>>>>>>>
>>>>>>>>>>>> Now say my Cassandra db is populated with data, my application
>>>>>>>>>>>> just got booted up, and I want the most recent value of a certain
>>>>>>>>>>>> partition, but I don't know which page it belongs to. How do I
>>>>>>>>>>>> solve this in the most efficient way possible in Cassandra today?
>>>>>>>>>>>> I understand I can create MVs or other tables that can hold some
>>>>>>>>>>>> auxiliary data, such as the number of pages per partition, and so
>>>>>>>>>>>> on... but that involves the maintenance cost of that other table,
>>>>>>>>>>>> which I cannot really afford because I already have MVs and
>>>>>>>>>>>> secondary indexes for other good reasons. So it would be great if
>>>>>>>>>>>> someone could explain the best way possible as of today with
>>>>>>>>>>>> Cassandra. By best way I mean: is it possible with one request?
>>>>>>>>>>>> If yes, then how?
>>>>>>>>>>>> If not, then what is the next best way to solve this?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> kant
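[Editor's note: one common way to implement the "pages" idea asked about above is to derive the page number deterministically from a monotonically increasing sequence number, so the newest page is computable rather than stored in an auxiliary table. The sketch below is a minimal illustration of that pattern, not an official Cassandra or driver feature; the `PAGE_SIZE` value, the assumption of a per-rowkey sequence number, and all helper names are hypothetical.]

```python
# Sketch of deterministic bucketing for the "rowkey1, page1 -> cols" scheme
# discussed in the thread. Assumption: each write to a logical row carries a
# monotonically increasing sequence number (seq), so the page is derivable
# from seq instead of being tracked in a separate table.

PAGE_SIZE = 100_000  # illustrative "safe" column count per partition

def page_for(seq: int) -> int:
    """Which page (partition bucket) a given sequence number lands in."""
    return seq // PAGE_SIZE

def partition_key(rowkey: str, seq: int) -> tuple:
    """Composite partition key: (logical rowkey, derived page number)."""
    return (rowkey, page_for(seq))

def newest_page(max_seq: int) -> int:
    """After a cold start, the latest page follows from the max seq alone,
    with no auxiliary page-count table to maintain."""
    return page_for(max_seq)

# Example: after 250,000 writes, the latest data lives in page 2.
print(partition_key("rowkey1", 250_000))  # -> ('rowkey1', 2)
```

Under this scheme, reading "the most recent value" needs only the current maximum sequence number (which can be carried on the write path); a single read against partition `(rowkey, newest_page(max_seq))`, with clustering order descending, then answers the question in one request.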