"2) so what is optimal limit in terms of data size?" --> Usual recommendations for Cassandra 2.1 are:
a. max 100MB per partition size
b. or up to 10 000 000 physical columns (cells) per partition (including clustering columns etc.)

Recently, with the work of Robert Stupp (CASSANDRA-11206) and also with the huge enhancement from Michael Kjellman (CASSANDRA-9754), it will be easier to handle huge partitions in memory, especially with a reduced memory footprint with regard to the JVM heap. However, as long as we don't have repair and streaming processes that can be "resumed" in the middle of a partition, the operational pains will still be there. Same for compaction.

On Sat, Oct 15, 2016 at 12:00 PM, Kant Kodali <k...@peernova.com> wrote:

> 1) It will be great if someone can confirm that there is no limit
> 2) so what is optimal limit in terms of data size?
>
> Finally, thanks a lot for pointing out all the operational issues!
>
> On Sat, Oct 15, 2016 at 2:39 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> "But is there still 2B columns limit on the Cassandra code?"
>>
>> --> I remember one of the committers saying that this 2B-column
>> limitation comes from the Thrift era, where you were limited to a max of
>> 2B columns returned to the client for each request. It also applied to
>> the max size of each "page" of data.
>>
>> Since the introduction of the binary protocol and the paging feature,
>> this limitation does not make sense anymore.
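[Editor's note: the sizing guidance above (roughly 100MB or 10 million cells per partition) can be turned into a back-of-the-envelope check. The sketch below is illustrative only; the threshold constants and the example field sizes are assumptions, not limits enforced by Cassandra.]

```python
# Rough check of an estimated partition against the rules of thumb quoted
# above. All numbers here are illustrative guidelines, not hard limits.

MAX_PARTITION_BYTES = 100 * 1024 ** 2   # ~100MB per partition (guideline)
MAX_PARTITION_CELLS = 10_000_000        # ~10M physical cells (guideline)

def partition_within_guidelines(rows_per_partition: int,
                                cells_per_row: int,
                                avg_cell_bytes: int) -> bool:
    """True if the estimated partition stays under both rules of thumb."""
    total_cells = rows_per_partition * cells_per_row
    total_bytes = total_cells * avg_cell_bytes
    return (total_cells <= MAX_PARTITION_CELLS
            and total_bytes <= MAX_PARTITION_BYTES)

# Example: 100K rows x 10 cells x 100 bytes = 1M cells, ~95MB -> just fits.
print(partition_within_guidelines(100_000, 10, 100))
```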
>>
>> By the way, if your partition is too wide, you'll face other operational
>> issues way before reaching the 2B-column limit:
>>
>> - compaction taking a looooong time --> heap pressure --> long GC pauses
>>   --> nodes flapping
>> - repair & over-streaming: a repair session failing in the middle
>>   forces you to re-send the whole big partition --> the receiving node
>>   has a bunch of duplicate data --> pressure on compaction
>> - bootstrapping of new nodes: a failure to stream a partition in the
>>   middle will force you to re-send the whole partition from the beginning
>>   again --> the receiving node has a bunch of duplicate data --> pressure
>>   on compaction
>>
>> On Sat, Oct 15, 2016 at 9:15 AM, Kant Kodali <k...@peernova.com> wrote:
>>
>>> Compacting 10 sstables, each of them having a 15GB partition, in what
>>> duration?
>>>
>>> On Fri, Oct 14, 2016 at 11:45 PM, Matope Ono <matope....@gmail.com> wrote:
>>>
>>>> Please forget that part of my sentence.
>>>> To be more correct, maybe I should have said "He could compact 10
>>>> sstables, each of them having a 15GB partition".
>>>> What I wanted to say is that we can store many more rows (and columns)
>>>> in a partition than before 3.6.
>>>>
>>>> 2016-10-15 15:34 GMT+09:00 Kant Kodali <k...@peernova.com>:
>>>>
>>>>> "Robert said he could treat safely 10 15GB partitions at his
>>>>> presentation" -- this sounds like there is a row limit too, not
>>>>> only columns??
>>>>>
>>>>> If I am reading this correctly, 10 15GB partitions means 10 partitions
>>>>> (like 10 row keys, that's too small) with each partition of size 15GB
>>>>> (that's like 15 million columns where each column can have data of size
>>>>> 1KB).
>>>>>
>>>>> On Fri, Oct 14, 2016 at 11:30 PM, Kant Kodali <k...@peernova.com> wrote:
>>>>>
>>>>>> "Robert said he could treat safely 10 15GB partitions at his
>>>>>> presentation" -- this sounds like there is a row limit too, not
>>>>>> only columns??
>>>>>>
>>>>>> If I am reading this correctly, 10 15GB partitions means 10
>>>>>> partitions (like 10 row keys, that's too small) with each partition of
>>>>>> size 15GB (that's like 10 million columns where each column can have
>>>>>> data of size 1KB).
>>>>>>
>>>>>> On Fri, Oct 14, 2016 at 9:54 PM, Matope Ono <matope....@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks to CASSANDRA-11206, I think we can have much larger
>>>>>>> partitions than before 3.6.
>>>>>>> (Robert said he could treat safely 10 15GB partitions at his
>>>>>>> presentation: https://www.youtube.com/watch?v=N3mGxgnUiRY)
>>>>>>>
>>>>>>> But is there still a 2B-column limit in the Cassandra code?
>>>>>>> If so, out of curiosity, I'd like to know where the bottleneck is.
>>>>>>> Could anyone let me know about it?
>>>>>>>
>>>>>>> Thanks, Yasuharu.
>>>>>>>
>>>>>>> 2016-10-13 1:11 GMT+09:00 Edward Capriolo <edlinuxg...@gmail.com>:
>>>>>>>
>>>>>>>> The "2 billion column limit" is press-clipping "puffery". This
>>>>>>>> statement seemingly became popular because of a highly trafficked
>>>>>>>> story in which a tech reporter embellished a statement to make a
>>>>>>>> splashy article.
>>>>>>>>
>>>>>>>> The effect is something like this:
>>>>>>>> http://www.healthnewsreview.org/2012/08/iced-tea-kidney-stones-and-the-study-that-never-existed/
>>>>>>>>
>>>>>>>> Iced tea does not cause kidney stones! Cassandra does not store
>>>>>>>> rows with 2 billion columns! It is just not true.
>>>>>>>>
>>>>>>>> On Wed, Oct 12, 2016 at 4:57 AM, Kant Kodali <k...@peernova.com> wrote:
>>>>>>>>
>>>>>>>>> Well, 1) I have not sent it to the postgresql mailing lists, and
>>>>>>>>> 2) I thought this was an open-ended question, as it can involve
>>>>>>>>> ideas from everywhere, including the Cassandra java driver mailing
>>>>>>>>> lists, so sorry if that bothered you for some reason.
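[Editor's note: the back-of-the-envelope arithmetic in the two messages above is easy to verify. With 1KB per column, a 15GB partition holds roughly 15 million columns; the "10 million" figure in the earlier message appears to be the slip that the re-sent message corrects.]

```python
# Verify the thread's arithmetic: how many 1KB columns fit in a 15GB
# (treated here as 15GiB) partition?
GiB = 1024 ** 3
KiB = 1024

columns = (15 * GiB) // KiB
print(columns)  # -> 15728640, i.e. roughly 15 million columns
```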
>>>>>>>>>
>>>>>>>>> On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Also, I'm not sure, but I don't think it's "cool" to write to
>>>>>>>>>> multiple lists in the same message (based on the postgresql mailing
>>>>>>>>>> lists rules).
>>>>>>>>>> For example, I'm not subscribed to those, and now the messages are
>>>>>>>>>> separated.
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> There are some issues working with larger partitions.
>>>>>>>>>>> HBase doesn't do what you say! You also have to be careful on
>>>>>>>>>>> HBase not to create large rows! But since they are globally
>>>>>>>>>>> sorted, you can easily sort between them and create small rows.
>>>>>>>>>>>
>>>>>>>>>>> In my opinion, the Cassandra people are wrong in that they say
>>>>>>>>>>> "globally sorted is the devil!" while fb/google/etc actually use
>>>>>>>>>>> globally-sorted most of the time! You have to be careful though
>>>>>>>>>>> (just like with random partitioning).
>>>>>>>>>>>
>>>>>>>>>>> Can you tell us what rowkey1, page1, col(x) actually are? Maybe
>>>>>>>>>>> there is a way.
>>>>>>>>>>> The most "recent" -- does that mean there's a timestamp in there?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <k...@peernova.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> I understand Cassandra can have a maximum of 2B rows per
>>>>>>>>>>>> partition, but in practice some people seem to suggest the magic
>>>>>>>>>>>> number is 100K. Why not create another partition/rowkey
>>>>>>>>>>>> automatically (whenever we reach a safe limit that we consider
>>>>>>>>>>>> efficient) with an auto-increment bigint as a suffix appended to
>>>>>>>>>>>> the new rowkey,
so that the
>>>>>>>>>>>> driver can return the new rowkey, indicating that there is a new
>>>>>>>>>>>> partition, and so on... Now I understand this would involve
>>>>>>>>>>>> allowing partial row-key searches, which Cassandra currently
>>>>>>>>>>>> doesn't do (but I believe HBase does), and thinking about token
>>>>>>>>>>>> ranges and potentially many other things...
>>>>>>>>>>>>
>>>>>>>>>>>> My current problem is this:
>>>>>>>>>>>>
>>>>>>>>>>>> I have a row key followed by a bunch of columns (this is not
>>>>>>>>>>>> time-series data), and these columns can grow to any number, so
>>>>>>>>>>>> since I have a 100K limit (or whatever the number is -- say some
>>>>>>>>>>>> limit) I want to break the partition into levels/pages:
>>>>>>>>>>>>
>>>>>>>>>>>> rowkey1, page1 -> col1, col2, col3...
>>>>>>>>>>>> rowkey1, page2 -> col1, col2, col3...
>>>>>>>>>>>>
>>>>>>>>>>>> Now say my Cassandra db is populated with data, my application
>>>>>>>>>>>> just got booted up, and I want the most recent value of a certain
>>>>>>>>>>>> partition, but I don't know which page it belongs to. How do I
>>>>>>>>>>>> solve this in the most efficient way possible in Cassandra today?
>>>>>>>>>>>> I understand I can create MVs or other tables that can hold some
>>>>>>>>>>>> auxiliary data, such as the number of pages per partition, and so
>>>>>>>>>>>> on... but that involves the maintenance cost of that other table,
>>>>>>>>>>>> which I cannot really afford because I already have MVs and
>>>>>>>>>>>> secondary indexes for other good reasons. So it would be great if
>>>>>>>>>>>> someone could explain the best way possible as of today with
>>>>>>>>>>>> Cassandra. By best way I mean: is it possible with one request?
>>>>>>>>>>>> If yes, then how?
>>>>>>>>>>>> If not, then what is the next best way to solve this?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> kant
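[Editor's note: one common way to implement the "pages" idea asked about above is to derive the page number deterministically from a monotonically increasing sequence number, so the newest page is computable rather than stored in an auxiliary table. The sketch below is a minimal illustration of that pattern, not an official Cassandra or driver feature; the `PAGE_SIZE` value, the assumption of a per-rowkey sequence number, and all helper names are hypothetical.]

```python
# Sketch of deterministic bucketing for the "rowkey1, page1 -> cols" scheme
# discussed in the thread. Assumption: each write to a logical row carries a
# monotonically increasing sequence number (seq), so the page is derivable
# from seq instead of being tracked in a separate table.

PAGE_SIZE = 100_000  # illustrative "safe" column count per partition

def page_for(seq: int) -> int:
    """Which page (partition bucket) a given sequence number lands in."""
    return seq // PAGE_SIZE

def partition_key(rowkey: str, seq: int) -> tuple:
    """Composite partition key: (logical rowkey, derived page number)."""
    return (rowkey, page_for(seq))

def newest_page(max_seq: int) -> int:
    """After a cold start, the latest page follows from the max seq alone,
    with no auxiliary page-count table to maintain."""
    return page_for(max_seq)

# Example: after 250,000 writes, the latest data lives in page 2.
print(partition_key("rowkey1", 250_000))  # -> ('rowkey1', 2)
```

Under this scheme, reading "the most recent value" needs only the current maximum sequence number (which can be carried on the write path); a single read against partition `(rowkey, newest_page(max_seq))`, with clustering order descending, then answers the question in one request.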