Thanks for the feedback everyone! Redis `zincrby` and `zrangebyscore` are indeed what we use today.
Caching the resulting 'sorted sets' in redis is exactly what I plan to do. There will be tens of thousands of these sorted sets, each generally with <10k items (with maybe a few exceptions going a bit over that). The reason to periodically calculate the set and store it in cassandra is to avoid having the client do that work, when the client only really cares about the top 100 or so items at any given time. Being truly "real time" is not critical for us, but it is a selling point to be as up to date as possible.

I'd like to understand the performance issue of frequently updating these sets. I understand that every time I 'regenerate' the sorted set, any rows that change will create a tombstone. For example, if "item_1" is in first place and "item_2" is in second place, and they switch on the next update, that would be two tombstones. Do you think this will be a big enough problem that it is worth doing the sorting work client side, on demand, and just eating the performance hit there? My thought was to make a tradeoff by using more cassandra disk space (i.e. pre-calculating all sets) in exchange for faster reads when requests that need this data actually come in.

From: Benjamin Roth <benjamin.r...@jaumo.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Saturday, January 14, 2017 at 1:25 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: implementing a 'sorted set' on top of cassandra

Mike mentioned "increment" in his initial post. That made me think of a case with increments and fetching a top list by a counter, like
https://redis.io/commands/zincrby
https://redis.io/commands/zrangebyscore

1. Cassandra is absolutely not made to sort by a counter (or a non-counter numeric incrementing value) but it is made to store counters.
In this case a partition could be seen as a set.
2. I thought of CS for persistence and, depending on app requirements like real-time behavior and set size, still using redis as a read cache.

2017-01-14 18:45 GMT+01:00 Jonathan Haddad <j...@jonhaddad.com>:

Sorted sets don't have a requirement of incrementing / decrementing. They're commonly used for things like leaderboards where the values are arbitrary. In Redis they are implemented with 2 data structures for efficient lookups of either key or value. No getting around that as far as I know.

In Cassandra they would require using the score as a clustering column in order to select top N scores (and paginate). That means a tombstone whenever the value for a key in the set changes. In sets with high rates of change that means a lot of tombstones and thus terrible performance.

On Sat, Jan 14, 2017 at 9:40 AM DuyHai Doan <doanduy...@gmail.com> wrote:

Sorting on an "incremented" numeric value has always been a nightmare to do properly in C*.

Either use the Counter type, but then no sorting is possible since a counter cannot be used as the type of a clustering column (which is what allows sorting).

Or use a simple numeric type on a clustering column, but then incrementing the value *concurrently* and *safely* is prohibitive (SELECT to fetch the current value + UPDATE ... IF value = <old_value>) + retry.

On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth <benjamin.r...@jaumo.com> wrote:

Whether your proposed solution is crazy depends on your needs :) It sounds like you can live with not-realtime data, so it is ok to cache it. Why preproduce the results if you only need 5% of them? Why not use redis as a cache with expiring sorted sets that are filled on demand from cassandra partitions with counters? That way redis has much less to do and can scale much better. And you are not limited to keeping all data in RAM, as cache data is volatile and can be evicted on demand.
Whether this is effective also depends on the size of your sets. CS won't be able to sort them by score for you, so you will have to load the complete set into redis for caching and / or do the sorting in your app on demand. This certainly won't work out well for sets with millions of entries.

2017-01-13 23:14 GMT+01:00 Mike Torra <mto...@demandware.com>:

We currently use redis to store sorted sets that we increment many, many times more than we read. For example, only about 5% of these sets are ever read. We are getting to the point where redis is becoming difficult to scale (currently at >20 nodes). We've started using cassandra for other things, and now we are experimenting to see if having a similar 'sorted set' data structure is feasible in cassandra.

My approach so far is:
1. Use a counter CF to store the values I want to sort by
2. Periodically read in all key/values in the counter CF and sort in the client application (~every five minutes or so)
3. Write back to a different CF with the ordered keys I care about

Does this seem crazy? Is there a simpler way to do this in cassandra?

--
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
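
For reference, the client-side part of the three-step approach from the original post (read the counters, sort, write back only the top entries) can be sketched roughly as below. This is only an illustration under stated assumptions, not anyone's actual implementation: the counter values are modeled as a plain dict rather than read from a counter CF, the function names `top_n` and `rows_to_delete` are made up for this example, and the write-back table is assumed to use the score as part of the clustering key, as described earlier in the thread.

```python
import heapq


def top_n(counters, n=100):
    """Return the n highest-scoring (item, score) pairs, best first.

    `counters` stands in for the key/values read from the counter CF.
    Ties are broken by item name so the result is deterministic.
    """
    return heapq.nlargest(n, counters.items(), key=lambda kv: (kv[1], kv[0]))


def rows_to_delete(old_rows, new_rows):
    """Rows from the previous snapshot that are absent from the new one.

    Assumption: the score is part of the clustering key, so a changed
    score means the old row must be deleted (one tombstone each).
    """
    return sorted(set(old_rows) - set(new_rows))


# Hypothetical counter values at two consecutive refreshes.
previous = {"item_1": 50, "item_2": 40, "item_3": 10}
current = {"item_1": 50, "item_2": 55, "item_3": 10}

old_top = top_n(previous, n=2)
new_top = top_n(current, n=2)
stale = rows_to_delete(old_top, new_top)

print(new_top)
print(stale)
```

Under this assumed layout, only rows whose score changed (or that dropped out of the top N) need a delete, so the tombstone count per refresh tracks how many of the stored entries changed, not the size of the whole set.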