Hi Nate, Yes, I've been thinking about treating customers as either small or big, where "small" ones have a single partition and big ones have 50 (or whatever number I need to keep sizes reasonable). There's still the problem of how to handle a small customer who becomes too big, but that will happen much less frequently than a customer filling a partition.
Jim On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall <n...@thelastpickle.com> wrote: > >> In this case, 99% of my data could fit in a single 50 MB partition. But >> if I use the standard approach, I have to split my partitions into 50 >> pieces to accommodate the largest data. That means that to query the 700 >> rows for my median case, I have to read 50 partitions instead of one. >> >> If you try to deal with this by starting a new partition when an old one >> fills up, you have a nasty distributed consensus problem, along with >> read-before-write. Cassandra LWT wasn't available the last time I dealt >> with this, but might help with the consensus part today. But there are >> still some nasty corner cases. >> >> I have some thoughts on other ways to solve this, but they all have >> drawbacks. So I thought I'd ask here and hope that someone has a better >> approach. >> >> > Hi Jim - good to see you around again. > > If you can segment this upstream by customer/account/whatever, handling > the outliers as an entirely different code path (potentially different > cluster as the workload will be quite different at that point and have > different tuning requirements) would be your best bet. Then a > read-before-write makes sense given it is happening on such a small number > of API queries. > > > -- > ----------------- > Nate McCall > Austin, TX > @zznate > > Co-Founder & Sr. Technical Consultant > Apache Cassandra Consulting > http://www.thelastpickle.com >