On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin <clintlmar...@coolfiretechnologies.com> wrote:
> What sort of data is your clustering key composed of? That might help some
> in determining a way to achieve what you're looking for.

Just a UUID that acts as an object identifier.

> Clint
>
> On Jan 5, 2016 2:28 PM, "Jim Ancona" <j...@anconafamily.com> wrote:
>
>> Hi Nate,
>>
>> Yes, I've been thinking about treating customers as either small or big,
>> where "small" ones have a single partition and big ones have 50 (or
>> whatever number I need to keep sizes reasonable). There's still the
>> problem of how to handle a small customer who becomes too big, but that
>> will happen much less frequently than a customer filling a partition.
>>
>> Jim
>>
>> On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall <n...@thelastpickle.com>
>> wrote:
>>
>>>> In this case, 99% of my data could fit in a single 50 MB partition.
>>>> But if I use the standard approach, I have to split my partitions into
>>>> 50 pieces to accommodate the largest data. That means that to query
>>>> the 700 rows for my median case, I have to read 50 partitions instead
>>>> of one.
>>>>
>>>> If you try to deal with this by starting a new partition when an old
>>>> one fills up, you have a nasty distributed consensus problem, along
>>>> with read-before-write. Cassandra LWT wasn't available the last time I
>>>> dealt with this, but might help with the consensus part today. But
>>>> there are still some nasty corner cases.
>>>>
>>>> I have some thoughts on other ways to solve this, but they all have
>>>> drawbacks. So I thought I'd ask here and hope that someone has a
>>>> better approach.
>>>
>>> Hi Jim - good to see you around again.
>>>
>>> If you can segment this upstream by customer/account/whatever, handling
>>> the outliers as an entirely different code path (potentially a
>>> different cluster, as the workload will be quite different at that
>>> point and have different tuning requirements) would be your best bet.
>>> Then a read-before-write makes sense, given it is happening on such a
>>> small number of API queries.
>>>
>>> --
>>> -----------------
>>> Nate McCall
>>> Austin, TX
>>> @zznate
>>>
>>> Co-Founder & Sr. Technical Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
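
[Editor's note: the bucketing scheme the thread is discussing can be sketched as below. This is an illustration, not code from any poster; the `BUCKETS_PER_CUSTOMER` mapping, the function names, and the choice of 50 buckets for "big" customers are assumptions drawn from the numbers mentioned above.]

```python
import uuid

# Hypothetical per-customer bucket counts: "small" customers keep a single
# partition; "big" ones are split into 50, per the sizes discussed above.
BUCKETS_PER_CUSTOMER = {"acme-small": 1, "bigcorp": 50}

def bucket_for(customer: str, object_id: uuid.UUID) -> int:
    """Deterministically map an object's UUID to one of the customer's
    buckets. Because the bucket is derived from the identifier itself,
    any writer or reader can recompute it with no coordination and no
    read-before-write."""
    n = BUCKETS_PER_CUSTOMER[customer]
    return object_id.int % n

# A small customer's rows all land in bucket 0, so a query reads one
# partition. A big customer's rows spread across 50 buckets, so a query
# for "all rows" must fan out to every bucket:
def buckets_to_query(customer: str) -> list[int]:
    return list(range(BUCKETS_PER_CUSTOMER[customer]))

oid = uuid.uuid4()
assert bucket_for("acme-small", oid) == 0
assert 0 <= bucket_for("bigcorp", oid) < 50
assert len(buckets_to_query("bigcorp")) == 50
```

In CQL terms this corresponds to a partition key of the form `((customer_id, bucket), object_id)`. The hard case raised in the thread (a small customer outgrowing its single partition) is not solved by this sketch: changing a customer's bucket count changes where existing rows hash, so it requires either migrating data or consulting a per-customer metadata table, which is where the LWT-based consensus mentioned above would come in.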