Vincent: currently, big partitions will give you performance problems over time, even if you're using paging and slicing by clustering keys. Please read the JIRAs that Alex linked to; they provide in-depth explanations as to why, from some of the best Cassandra operators in the world :)
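For what it's worth, one common way to keep partitions bounded is to add a time bucket to the partition key, so that each partition only ever holds one bucket's worth of rows. Below is a minimal sketch, assuming a time-series style table. The names `sensor_id`, `bucket_for`, `partition_key` and `buckets_between`, and the one-day bucket width, are hypothetical and not taken from this thread:

```python
from datetime import datetime, timedelta, timezone

# Assumed bucket width; pick it so that one bucket stays comfortably small.
BUCKET = timedelta(days=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_for(ts: datetime) -> int:
    """Number of whole buckets between the epoch and a timezone-aware timestamp."""
    return (ts - EPOCH) // BUCKET


def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Hypothetical composite partition key: (sensor_id, bucket)."""
    return (sensor_id, bucket_for(ts))


def buckets_between(start: datetime, end: datetime) -> list:
    """All buckets a time-range query has to visit, one partition per bucket."""
    return list(range(bucket_for(start), bucket_for(end) + 1))


if __name__ == "__main__":
    start = datetime(2016, 10, 27, 16, 13, tzinfo=timezone.utc)
    end = datetime(2016, 10, 28, 9, 50, tzinfo=timezone.utc)
    print(partition_key("sensor-1", start))
    print(buckets_between(start, end))
```

The trade-off is that a range query spanning several buckets becomes several smaller queries, one per partition, instead of one slice of a single large partition.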
On Fri, Oct 28, 2016 at 9:50 AM Vincent Rischmann <m...@vrischmann.me> wrote:

> Well I only asked that because I wanted to make sure that we're not doing
> it wrong, because that's actually how we query stuff, we always provide a
> cluster key or a range of cluster keys.
>
> But yes, I understand that compactions may suffer and/or there may be
> hidden bottlenecks because of big partitions, so it's definitely good to
> know, and I'll definitely work on reducing partition sizes.
>
> On Fri, Oct 28, 2016, at 06:32 PM, Edward Capriolo wrote:
>
> On Fri, Oct 28, 2016 at 11:21 AM, Vincent Rischmann <m...@vrischmann.me>
> wrote:
>
> Doesn't paging help with this? Also if we select a range via the cluster
> key we're never really selecting the full partition. Or is that wrong?
>
> On Fri, Oct 28, 2016, at 05:00 PM, Edward Capriolo wrote:
>
> Big partitions are an anti-pattern; here is why:
>
> First, Cassandra is not an analytic datastore. Sure it has some UDFs and
> aggregate UDFs, but the true purpose of the data store is to satisfy point
> reads. Operations have strict timeouts:
>
> # How long the coordinator should wait for read operations to complete
> read_request_timeout_in_ms: 5000
>
> # How long the coordinator should wait for seq or index scans to complete
> range_request_timeout_in_ms: 10000
>
> This means you need to be able to satisfy the operation in 5 seconds.
> That is not only the "think time" for 1 server: if you are doing a
> quorum, the operation has to complete and compare on 2 or more servers.
> Beyond these cutoffs are thread pools which fill up and start dropping
> requests once full.
>
> Something has to give, either functionality or physics. Particularly the
> physics of aggregating an ever-growing data set across N replicas in less
> than 5 seconds. How many 2 ms point reads will be blocked by 50 ms queries,
> etc.?
>
> I do not think the technical limitations of big partitions on disk are the
> only hurdle to climb here.
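A rough back-of-the-envelope sketch of the arithmetic behind the quorum and "2 ms point reads blocked by 50 ms queries" remarks above. The replication factor of 3 and the read stage size of 32 threads are assumptions for illustration (neither is stated in the thread), and the model ignores queueing, GC pauses and network time:

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must answer a QUORUM read: a strict majority."""
    return replication_factor // 2 + 1


def point_reads_displaced(slow_query_ms: float, point_read_ms: float) -> float:
    """Point reads one read-stage thread could have served while it was
    busy with a single slow query."""
    return slow_query_ms / point_read_ms


def stage_saturation_rate(threads: int, query_ms: float) -> float:
    """Queries per second at which `threads` workers are fully busy,
    if every query takes `query_ms` milliseconds."""
    return threads * 1000.0 / query_ms


if __name__ == "__main__":
    rf = 3             # assumed replication factor
    read_threads = 32  # assumed read stage size

    print(quorum(rf))                               # 2
    print(point_reads_displaced(50, 2))             # 25.0
    print(stage_saturation_rate(read_threads, 2))   # 16000.0
    print(stage_saturation_rate(read_threads, 50))  # 640.0
```

Under these assumptions, a node that could absorb about 16000 point reads per second saturates its read stage at about 640 queries per second once every query takes 50 ms, well before any 5 second timeout comes into play.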
>
> On Fri, Oct 28, 2016 at 10:39 AM, Alexander Dejanovski <a...@thelastpickle.com> wrote:
>
> Hi Eric,
>
> that would be https://issues.apache.org/jira/browse/CASSANDRA-9754 by
> Michael Kjellman and https://issues.apache.org/jira/browse/CASSANDRA-11206 by
> Robert Stupp.
> If you haven't seen it yet, Robert's summit talk on big partitions is
> totally worth it:
> Video: https://www.youtube.com/watch?v=N3mGxgnUiRY
> Slides: http://www.slideshare.net/DataStax/myths-of-big-partitions-robert-stupp-datastax-cassandra-summit-2016
>
> Cheers,
>
> On Fri, Oct 28, 2016 at 4:09 PM Eric Evans <john.eric.ev...@gmail.com> wrote:
>
> On Thu, Oct 27, 2016 at 4:13 PM, Alexander Dejanovski
> <a...@thelastpickle.com> wrote:
> > A few patches are pushing the limits of partition sizes so we may soon be
> > more comfortable with big partitions.
>
> You don't happen to have Jira links to these handy, do you?
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>
> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> "Doesn't paging help with this? Also if we select a range via the
> cluster key we're never really selecting the full partition. Or is that
> wrong?"
>
> What I am suggesting is that the data store has had this practical
> limitation on the size of a partition since inception. As a result, the
> common use case is not to use it in such a way. For example, the compaction
> manager may not be optimized for this case, queries running across large
> partitions may cause more contention or lots of young gen garbage, queries
> running across large partitions may occupy the slots of the read stage, etc.
>
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201602.mbox/%3CCAJjpQyTS2eaCcRBVa=zmm-hcbx5nf4ovc1enw+sffgwvngo...@mail.gmail.com%3E
>
> I think there are possibly some more "little details" to discover. Not in a
> bad way. I just do not think you can hand-wave it away, as if a specific
> thing someone is working on now, or paging, solves it. If it was that easy
> it would be solved by now :)