Hello,

We have noticed a CPU usage spike after several minutes of consistent load when querying:

- a single column of set<uuid> type (same partition key)
- relatively frequently (a couple of hundred times per second; for comparison, we already do an order of magnitude more reads with much bigger payloads)
- with the elements in the set having a very short TTL (single-digit seconds) and several inserts per second
- gc_grace_seconds set to 0 (which should remove hints and should prevent tombstones)
- reads and writes using LOCAL_QUORUM consistency
- a replication factor of 3 (on a 4-node setup)
I am struggling to figure out where the high CPU usage comes from (and therefore how to resolve it) and am hoping someone can spot what we are doing wrong. I'd expect the data to stay in memory on the cluster and read times to remain constant.

The use case is rate limiting. We limit a user (for example) to 20 requests per 5 seconds and use Cassandra's TTL to enforce it across all live servers. When a request comes in we run:

    SELECT tokens FROM recent_request_token_bucket WHERE usagekey = 'some user id'

If the set currently holds fewer than 20 tokens we execute:

    UPDATE recent_request_token_bucket USING TTL 5 SET tokens = tokens + { Guid.NewGuid() } WHERE usagekey = 'some user id'

Otherwise we reject the request.

The table definition is:

    CREATE TABLE recent_request_token_bucket (
        usagekey text,
        tokens set<uuid>,
        PRIMARY KEY (usagekey)
    ) WITH compaction = {'min_threshold': 2, 'class': 'SizeTieredCompactionStrategy', 'max_threshold': 32}
      AND compression = {'sstable_compression': 'SnappyCompressor'}
      AND gc_grace_seconds = 0;

I have replicated the problem with the following load:

- 200 reads per second
- 3 inserts per second

This starts off with CPU load at ~10% and an average response time (as reported by my console app) of 1-2 ms. After 5 minutes the CPU load creeps up to ~20% and the average response time to 2-4 ms. After 10 minutes the CPU load is over 50% and average response times start to hit 10 ms. After 15 minutes the CPU load is near 100% and response times over 100 ms become normal.

Interestingly, when I abort the application, wait several minutes and then restart it, the response times and CPU load on the server remain terrible. It's as if I have poisoned that partition key permanently. This also survives flushes of the memtable.

I'd expect a constant response time for our use case, as there should never be more than 20-odd GUIDs in the set. But it appears that Cassandra keeps the tombstones in memory?

We are running 2.1.20.

I'd appreciate any pointers!

Cheers,
Tom
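P.S. In case it helps to see the exact flow, here is a minimal sketch of what our client does on every request. Our real app is a C# console application; this is written in Python with the DataStax cassandra-driver purely for illustration, and the contact point, keyspace name and the allow_request helper are placeholders of mine, not our production names:

    import uuid

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster

    LIMIT = 20  # max requests per 5-second window

    cluster = Cluster(['cassandra-node1'])   # placeholder contact point
    session = cluster.connect('ratelimit')   # placeholder keyspace

    read_stmt = session.prepare(
        "SELECT tokens FROM recent_request_token_bucket WHERE usagekey = ?")
    write_stmt = session.prepare(
        "UPDATE recent_request_token_bucket USING TTL 5 "
        "SET tokens = tokens + ? WHERE usagekey = ?")
    read_stmt.consistency_level = ConsistencyLevel.LOCAL_QUORUM
    write_stmt.consistency_level = ConsistencyLevel.LOCAL_QUORUM

    def allow_request(usage_key):
        # Read the current token set for this user (a single partition key).
        row = next(iter(session.execute(read_stmt, [usage_key])), None)
        tokens = row.tokens if row is not None and row.tokens else set()
        if len(tokens) >= LIMIT:
            return False  # over the limit: reject the request
        # Under the limit: add a fresh GUID that expires after 5 seconds.
        session.execute(write_stmt, [{uuid.uuid4()}, usage_key])
        return True

The load test that reproduces the problem is simply this function called ~200 times per second for the same usage_key, with roughly 3 of those calls per second falling under the limit and performing the UPDATE.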