You could store a "num_buckets" value with each client's account and raise it as that client's usage grows.
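A rough sketch of what I mean, in Python. The function names (bucket_for, partition_keys) are made up for illustration, and both the (customer_id, bucket) composite partition key and the CRC32-of-UUID hash are assumptions on my part, not something taken from your schema:

```python
import uuid
import zlib


def bucket_for(object_id, num_buckets):
    """Pick a bucket in [0, num_buckets) for an object UUID.

    CRC32 over the UUID bytes is an assumed choice; any hash that is
    stable across processes and machines would do.
    """
    return zlib.crc32(object_id.bytes) % num_buckets


def partition_keys(customer_id, num_buckets):
    """List every (customer_id, bucket) partition key to read for a customer."""
    return [(customer_id, bucket) for bucket in range(num_buckets)]


# Hypothetical usage: num_buckets is looked up from the customer's account.
small = partition_keys("small-customer", 1)   # one partition to read
big = partition_keys("big-customer", 50)      # fifty partitions to read
write_bucket = bucket_for(uuid.uuid4(), 50)   # bucket for a new row
```

A small customer keeps num_buckets = 1, so a full read touches one partition, while a big one might have 50. One caveat: raising num_buckets changes which bucket an existing object id hashes to, so lookups by object id would have to allow for rows written under the old value. Reading all buckets from 0 to num_buckets - 1 still finds them, as long as the value only ever increases.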

On Tue, Jan 5, 2016 at 2:17 PM Jim Ancona <j...@anconafamily.com> wrote:

> On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin <
> clintlmar...@coolfiretechnologies.com> wrote:
>
>> What sort of data is your clustering key composed of? That might help
>> some in determining a way to achieve what you're looking for.
>>
> Just a UUID that acts as an object identifier.
>
>>
>> Clint
>> On Jan 5, 2016 2:28 PM, "Jim Ancona" <j...@anconafamily.com> wrote:
>>
>>> Hi Nate,
>>>
>>> Yes, I've been thinking about treating customers as either small or big,
>>> where "small" ones have a single partition and big ones have 50 (or
>>> whatever number I need to keep sizes reasonable). There's still the problem
>>> of how to handle a small customer who becomes too big, but that will happen
>>> much less frequently than a customer filling a partition.
>>>
>>> Jim
>>>
>>> On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall <n...@thelastpickle.com>
>>> wrote:
>>>
>>>>
>>>>> In this case, 99% of my data could fit in a single 50 MB partition.
>>>>> But if I use the standard approach, I have to split my partitions into 50
>>>>> pieces to accommodate the largest data. That means that to query the 700
>>>>> rows for my median case, I have to read 50 partitions instead of one.
>>>>>
>>>>> If you try to deal with this by starting a new partition when an old
>>>>> one fills up, you have a nasty distributed consensus problem, along with
>>>>> read-before-write. Cassandra LWT wasn't available the last time I dealt
>>>>> with this, but might help with the consensus part today. But there are
>>>>> still some nasty corner cases.
>>>>>
>>>>> I have some thoughts on other ways to solve this, but they all have
>>>>> drawbacks. So I thought I'd ask here and hope that someone has a better
>>>>> approach.
>>>>>
>>>>>
>>>> Hi Jim - good to see you around again.
>>>>
>>>> If you can segment this upstream by customer/account/whatever, handling
>>>> the outliers as an entirely different code path (potentially different
>>>> cluster as the workload will be quite different at that point and have
>>>> different tuning requirements) would be your best bet. Then a
>>>> read-before-write makes sense given it is happening on such a small number
>>>> of API queries.
>>>>
>>>>
>>>> --
>>>> -----------------
>>>> Nate McCall
>>>> Austin, TX
>>>> @zznate
>>>>
>>>> Co-Founder & Sr. Technical Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>