Hi, I haven't done anything similar myself; however, I have some comments:
On Mon, Mar 2, 2015 at 8:47 PM, Clint Kelly <clint.ke...@gmail.com> wrote:

> The downside of this approach is that we can no longer do a simple
> continuous scan to get all of the events for a given user.

Sure, but would you really do that in real time anyway? :) If you have
billions of events, a full scan isn't going to scale regardless. Also, if
you have, say, 100,000 events per bucket, the latency introduced by batching
across buckets should be manageable.

> Some users may log lots and lots of interactions every day, while others
> may interact with our application infrequently,

That's another reason to split them up into buckets: it keeps the cluster's
partitions more manageable and homogeneous in size.

> so I'd like a quick way to get the most recent interaction for a given
> user.

For this you could have a second table that stores the last_time_bucket for
each user. Upon every event write, you would simply also update the
last_time_bucket. You could even keep an index of all time buckets per user
if you want. (There is a rough sketch of both tables at the end of this
mail.)

> Has anyone used different approaches for this problem?
>
> The only thing I can think of is to use the second table schema described
> above, but switch to an order-preserving hashing function, and then
> manually hash the "id" field. This is essentially what we would do in
> HBase.

As you might already know, order-preserving hashing is _not_ considered best
practice in the Cassandra world, since it tends to create hot spots and
requires manual rebalancing of the ring.

Cheers,
Jens

--
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook <https://www.facebook.com/#!/tink.se> Linkedin
<http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
Twitter <https://twitter.com/tink>
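
P.S. In case it is helpful, here is a rough, untested CQL sketch of the
bucketing and last_time_bucket ideas above. All table and column names are
made up for illustration:

    -- Events partitioned by (user, time bucket) so that no single
    -- partition grows without bound. The bucket could, for example, be
    -- the day encoded as 'YYYYMMDD'.
    CREATE TABLE events_by_bucket (
        user_id uuid,
        time_bucket text,
        event_time timeuuid,
        payload text,
        PRIMARY KEY ((user_id, time_bucket), event_time)
    ) WITH CLUSTERING ORDER BY (event_time DESC);

    -- One row per user pointing at the most recent bucket, so finding
    -- the latest interaction is two single-partition reads instead of
    -- a scan.
    CREATE TABLE last_time_bucket_by_user (
        user_id uuid PRIMARY KEY,
        last_time_bucket text
    );

    -- On every event write, also upsert the pointer (shown as a
    -- prepared statement with bind markers):
    UPDATE last_time_bucket_by_user SET last_time_bucket = ? WHERE user_id = ?;

    -- Reading the most recent events for a user is then:
    -- 1. SELECT last_time_bucket FROM last_time_bucket_by_user
    --        WHERE user_id = ?;
    -- 2. SELECT * FROM events_by_bucket
    --        WHERE user_id = ? AND time_bucket = ?
    --        LIMIT 10;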