Yes, use the time-bucketing approach and choose a bucket size (included in the partition key) that is granular enough to keep partitions to about 100 MB. (Unbounded partitions WILL destroy your cluster.) If your queries *need* to retrieve all of a user's activity over a certain period, then, yes, multiple queries may be required. Partition-key queries against small partitions are very fast and can be issued asynchronously. That is the right way to use Cassandra for time-series data.
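As a sketch of what that could look like (the table name, a monthly bucket, and the TWCS settings are illustrative assumptions, not a prescription):

```cql
-- Hypothetical time-bucketed variant of user_act_log.
-- month_bucket (e.g. 202107 for July 2021) is part of the partition key,
-- so each partition holds at most one month of one user's activity.
CREATE TABLE user_act_log_bucketed (
    userid       bigint,
    month_bucket int,
    datetime     bigint,
    sno          uuid,
    -- ...other columns...
    PRIMARY KEY ((userid, month_bucket), datetime, sno)
) WITH default_time_to_live = 15552000   -- ~6 months, matching the original TTL
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'DAYS',
                    'compaction_window_size': 30};
```

If monthly buckets still produce partitions near 1 GB for the most active users, shrink the bucket (weekly or daily) until the largest partition stays around the 100 MB mark.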
Sean Durity – Staff Systems Engineer, Cassandra

From: manish khandelwal <manishkhandelwa...@gmail.com>
Sent: Monday, July 19, 2021 11:58 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Storing user activity logs

I concur with Elliott's view. The only way you can reduce partition size is by tweaking your partition key. Here, with user_id as the partition key, partition size depends on the activity of the user; for a super-active user it can become large in no time. After changing the key, migration of old data to the new table will also be required — please keep that in mind.

Regards,
Manish

On Tue, Jul 20, 2021 at 2:54 AM Elliott Sims <elli...@backblaze.com> wrote:

Your partition key determines your partition size. Reducing retention sounds like it would help some in your case, but really you'd have to split it up somehow. If it fits your query pattern, you could potentially have a compound key of userid+datetime, or some other time-based split. You could also just split each user's rows into subsets with some sort of indirect mapping, though that can get messy pretty fast.

On Mon, Jul 19, 2021 at 9:01 AM MyWorld <timeplus.1...@gmail.com> wrote:

Hi all,

We are currently storing our user activity log in Cassandra with the architecture below:

CREATE TABLE user_act_log (
    userid   bigint,
    datetime bigint,
    sno      uuid,
    ...some more columns
)

with partition key userid, clustering key (datetime, sno), and a TTL of 6 months.

Over time our table has grown to around 500 GB, and we notice from the table histograms that our max partition size has also grown to a tremendous size (nearly 1 GB).

So, please help me out: what would be the right architecture for this use case? I am currently thinking of changing the compaction strategy from size-tiered to time-window with a 30-day window, but will this improve the partition size? Should we use some other database for such a use case?
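To make the multi-query pattern from the advice above concrete: if the table were bucketed by month (say, a hypothetical user_act_log_bucketed keyed on the compound partition key (userid, month_bucket)), retrieving a user's activity for a date range becomes one small partition-key query per bucket, which the client driver can issue asynchronously and merge:

```cql
-- One query per (userid, month_bucket) covering the requested range;
-- each hits exactly one (bounded) partition, so each is fast.
SELECT * FROM user_act_log_bucketed
 WHERE userid = 42 AND month_bucket = 202106;

SELECT * FROM user_act_log_bucketed
 WHERE userid = 42 AND month_bucket = 202107;
```

The number of queries equals the number of buckets the range spans, so the bucket size is a trade-off between partition size and read fan-out.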