Thanks Elliott, Sean, and Manish for the quick responses.

On 2021/07/20 13:31:17 "Durity, Sean R" wrote:
> Yes, use the time-bucketing approach and choose a bucket size (included in
> the partition key) that is granular enough to keep partitions to about
> 100 MB in size. (Unbounded partitions WILL destroy your cluster.) If your
> queries *need* to retrieve all user activity over a certain period, then,
> yes, multiple queries may be required. Partition key queries (of small
> partitions) are very fast and can be done asynchronously. That is the right
> way to use Cassandra for a time series of data.
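>
> A minimal sketch of what that could look like (the table name, the month
> bucket column, and the bucket granularity are assumptions, not the
> original schema):
>
> CREATE TABLE user_act_log_bucketed (
>     userid   bigint,
>     month    int,      -- time bucket, e.g. 202107; pick a granularity that keeps partitions ~100 MB
>     datetime bigint,
>     sno      uuid,
>     PRIMARY KEY ((userid, month), datetime, sno)
> ) WITH default_time_to_live = 15552000;  -- ~6 months, matching the original TTL
>
> -- One fast partition-key query per bucket; a driver can issue these in parallel:
> SELECT * FROM user_act_log_bucketed WHERE userid = 42 AND month = 202106;
> SELECT * FROM user_act_log_bucketed WHERE userid = 42 AND month = 202107;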
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
> From: manish khandelwal <ma...@gmail.com>
> Sent: Monday, July 19, 2021 11:58 PM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL] Re: Storing user activity logs
>
> I concur with Elliott's view. The only way you can reduce partition size is
> by tweaking your partition key. Here, with user_id as the partition key,
> partition size depends on the activity of the user. For a super-active user
> it can become large in no time. After changing the key, migration of old
> data to the new table will also be required, so please keep that in mind.
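>
> One possible migration path, sketched with cqlsh COPY (the file names and
> the user_act_log_bucketed table from the earlier sketch are assumptions;
> the month bucket has to be derived from datetime in an offline step, and
> any ETL tool that reshuffles rows the same way would also work):
>
> -- Export the old table to CSV:
> COPY user_act_log (userid, datetime, sno) TO 'activity.csv';
> -- Derive the month bucket from datetime offline, then load the new table:
> COPY user_act_log_bucketed (userid, month, datetime, sno) FROM 'activity_bucketed.csv';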
>
> Regards
> Manish
>
> On Tue, Jul 20, 2021 at 2:54 AM Elliott Sims <el...@backblaze.com> wrote:
> Your partition key determines your partition size. Reducing retention
> sounds like it would help some in your case, but really you'd have to split
> it up somehow. If it fits your query pattern, you could potentially have a
> compound key of userid+datetime, or some other time-based split. You could
> also just split each user's rows into subsets with some sort of indirect
> mapping, though that can get messy pretty fast.
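>
> A rough sketch of the indirect-mapping idea (table and column names are
> assumptions): keep a small per-user directory of sub-partition ids, and
> append new activity to the newest one so each sub-partition stays bounded.
>
> CREATE TABLE user_act_shards (
>     userid bigint PRIMARY KEY,
>     shards list<uuid>            -- sub-partition ids for this user, newest last
> );
>
> CREATE TABLE user_act_log_sharded (
>     shard    uuid,               -- one bounded sub-partition per shard id
>     datetime bigint,
>     sno      uuid,
>     PRIMARY KEY (shard, datetime, sno)
> );
>
> -- Read path: fetch the shard list, then query each shard partition:
> SELECT shards FROM user_act_shards WHERE userid = 42;
> SELECT * FROM user_act_log_sharded WHERE shard = ?;  -- bind one shard id per query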
>
> On Mon, Jul 19, 2021 at 9:01 AM MyWorld <ti...@gmail.com> wrote:
> Hi all,
>
> We are currently storing our user activity log in Cassandra with the
> architecture below.
>
> CREATE TABLE user_act_log (
>     userid   bigint,
>     datetime bigint,
>     sno      uuid,
>     -- ...some more columns
>     PRIMARY KEY (userid, datetime, sno)  -- partition key: userid; clustering: datetime, sno
> ) WITH default_time_to_live = 15552000;  -- 6-month TTL (could equally be set per write)
>
> Over time our table data has grown to around 500 GB, and we notice from the
> table histograms that our max partition size has also grown to a tremendous
> size (nearly 1 GB).
>
> So, please help me out: what would be the right architecture for this use
> case?
>
> I am currently thinking of changing the compaction strategy from
> size-tiered to time-window with a 30-day window. But will this improve the
> partition size?
>
> Should we use some other database for such a use case?
>
