Yes, use the time-bucketing approach and choose a bucket size (included in the partition key) that is granular enough to keep partitions to about 100 MB. (Unbounded partitions WILL destroy your cluster.) If your queries *need* to retrieve all of a user's activity over a certain period, then, yes, multiple queries may be required. Partition-key queries against small partitions are very fast and can be issued asynchronously. That is the right way to use Cassandra for time-series data.
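As a sketch of what that could look like (the table name, a monthly bucket, and the TWCS settings are illustrative assumptions, not a prescription):

```cql
-- Hypothetical time-bucketed variant of user_act_log.
-- month_bucket (e.g. 202107 for July 2021) is part of the partition key,
-- so each partition holds at most one month of one user's activity.
CREATE TABLE user_act_log_bucketed (
    userid       bigint,
    month_bucket int,
    datetime     bigint,
    sno          uuid,
    -- ...other columns...
    PRIMARY KEY ((userid, month_bucket), datetime, sno)
) WITH default_time_to_live = 15552000   -- ~6 months, matching the original TTL
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'DAYS',
                    'compaction_window_size': 30};
```

If monthly buckets still produce partitions near 1 GB for the most active users, shrink the bucket (weekly or daily) until the largest partition stays around the 100 MB mark.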
Sean Durity – Staff Systems Engineer, Cassandra

From: manish khandelwal <manishkhandelwa...@gmail.com>
Sent: Monday, July 19, 2021 11:58 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Storing user activity logs

I concur with Elliott's view. The only way you can reduce partition size is by tweaking your partition key. Here, with user_id as the partition key, partition size depends on the activity of the user; for a super-active user it can become large in no time. After changing the key, migration of old data to the new table will also be required — please keep that in mind.

Regards,
Manish

On Tue, Jul 20, 2021 at 2:54 AM Elliott Sims <elli...@backblaze.com> wrote:

Your partition key determines your partition size. Reducing retention sounds like it would help some in your case, but really you'd have to split it up somehow. If it fits your query pattern, you could potentially have a compound key of userid+datetime, or some other time-based split. You could also just split each user's rows into subsets with some sort of indirect mapping, though that can get messy pretty fast.

On Mon, Jul 19, 2021 at 9:01 AM MyWorld <timeplus.1...@gmail.com> wrote:

Hi all,

We are currently storing our user activity log in Cassandra with the architecture below:

CREATE TABLE user_act_log (
    userid   bigint,
    datetime bigint,
    sno      uuid,
    ...some more columns
)

with partition key userid, clustering key (datetime, sno), and a TTL of 6 months.

Over time our table has grown to around 500 GB, and we notice from the table histograms that our max partition size has also grown to a tremendous size (nearly 1 GB).

So, please help me out: what would be the right architecture for this use case? I am currently thinking of changing the compaction strategy from size-tiered to time-window with a 30-day window, but will this improve the partition size? Should we use some other database for such a use case?
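To make the multi-query pattern from the advice above concrete: if the table were bucketed by month (say, a hypothetical user_act_log_bucketed keyed on the compound partition key (userid, month_bucket)), retrieving a user's activity for a date range becomes one small partition-key query per bucket, which the client driver can issue asynchronously and merge:

```cql
-- One query per (userid, month_bucket) covering the requested range;
-- each hits exactly one (bounded) partition, so each is fast.
SELECT * FROM user_act_log_bucketed
 WHERE userid = 42 AND month_bucket = 202106;

SELECT * FROM user_act_log_bucketed
 WHERE userid = 42 AND month_bucket = 202107;
```

The number of queries equals the number of buckets the range spans, so the bucket size is a trade-off between partition size and read fan-out.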