> In this case, 99% of my data could fit in a single 50 MB partition. But if
> I use the standard approach, I have to split my partitions into 50 pieces
> to accommodate the largest data. That means that to query the 700 rows for
> my median case, I have to read 50 partitions instead of one.
>
> If you try to deal with this by starting a new partition when an old one
> fills up, you have a nasty distributed consensus problem, along with
> read-before-write. Cassandra LWT wasn't available the last time I dealt
> with this, but might help with the consensus part today. But there are
> still some nasty corner cases.
>
> I have some thoughts on other ways to solve this, but they all have
> drawbacks. So I thought I'd ask here and hope that someone has a better
> approach.
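(A minimal sketch of the fixed-bucket scheme Jim describes, using the DataStax Java driver. The events table, its columns, and the schema are illustrative assumptions based on the numbers in the thread, not code from it.)

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.Row;

    import java.util.ArrayList;
    import java.util.List;

    public class BucketedRead {
        // Assumed schema: PRIMARY KEY ((customer_id, bucket), ...), with one
        // bucket count for everyone, sized so the largest customer fits.
        private static final int NUM_BUCKETS = 50;

        // Reading one customer means touching every bucket, so the median
        // case (~700 rows that would fit in a single partition) still pays
        // for NUM_BUCKETS partition reads; this is the complaint above.
        public static List<Row> readAll(CqlSession session, String customerId) {
            List<Row> rows = new ArrayList<>();
            for (int bucket = 0; bucket < NUM_BUCKETS; bucket++) {
                session.execute(
                        "SELECT * FROM events WHERE customer_id = ? AND bucket = ?",
                        customerId, bucket)
                       .forEach(rows::add);
            }
            return rows;
        }
    }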
Hi Jim - good to see you around again.

If you can segment this upstream by customer/account/whatever, handling the outliers as an entirely different code path (potentially a different cluster, as the workload will be quite different at that point and have different tuning requirements) would be your best bet. Then a read-before-write makes sense, given that it is happening on such a small number of API queries.

--
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
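(A minimal sketch of the segmentation Nate suggests, assuming the outlier customers are identified upstream; here that is a plain in-memory set, though in practice it might be a config table or service. The table names, and the idea of keeping the two populations in separate tables, are hypothetical.)

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.Row;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    public class SegmentedReader {
        private static final int NUM_BUCKETS = 50;

        private final CqlSession session;
        private final Set<String> outliers; // decided upstream of this code

        public SegmentedReader(CqlSession session, Set<String> outliers) {
            this.session = session;
            this.outliers = outliers;
        }

        public List<Row> read(String customerId) {
            List<Row> rows = new ArrayList<>();
            if (outliers.contains(customerId)) {
                // Outlier path: bucketed table, potentially served by a
                // separate, differently tuned cluster via its own session.
                for (int bucket = 0; bucket < NUM_BUCKETS; bucket++) {
                    session.execute(
                            "SELECT * FROM events_large WHERE customer_id = ? AND bucket = ?",
                            customerId, bucket)
                           .forEach(rows::add);
                }
            } else {
                // Common path (~99% of customers): one partition, one read.
                session.execute(
                        "SELECT * FROM events_small WHERE customer_id = ?",
                        customerId)
                       .forEach(rows::add);
            }
            return rows;
        }
    }

Confining the fan-out (and any read-before-write or LWT bookkeeping) to the outlier path keeps that cost limited to the small number of API queries Nate mentions.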