Hi Elliott,
We thought of adding the month and a mapping key to the partition key to make
it bimonthly.
So our new partition key would be userid + date (yyyymm) + mapping key (01
for days 01-15 and 02 for days 16 onward).
However, there could be a user who has done only 10 activities in the past 6
months, so I would need to do 12 reads to find all of his activity logs,
while another user might have done an enormous number of activities within a
single month.
I don't think more granular partitioning would help, as the number of read
queries would increase. And what if the activity log for some users still
grows to 500 MB even after bimonthly partitioning?
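For illustration, here is a minimal Python sketch of the bucketing described above (the function names and the tuple representation of the compound key are hypothetical, not part of our schema):

```python
from datetime import date

def partition_key(userid: int, when: date) -> tuple:
    """Build the proposed compound partition key:
    (userid, yyyymm, half-month bucket)."""
    yyyymm = when.year * 100 + when.month
    # Bucket 1 covers days 1-15, bucket 2 the rest of the month.
    bucket = 1 if when.day <= 15 else 2
    return (userid, yyyymm, bucket)

def partitions_for_range(userid: int, start: date, end: date) -> list:
    """Enumerate every partition key a 'last N months' query must read;
    a 6-month window touches 12 partitions per user."""
    keys = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        for bucket in (1, 2):
            keys.append((userid, y * 100 + m, bucket))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return keys
```

This makes the read-amplification trade-off concrete: the sparse user with 10 activities still pays for 12 partition reads over 6 months, while the heavy user's data is only split 12 ways.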



On 2021/07/19 21:23:24 Elliott Sims wrote:
> Your partition key determines your partition size.  Reducing retention
> sounds like it would help some in your case, but really you'd have to split
> it up somehow.  If it fits your query pattern, you could potentially have a
> compound key of userid+datetime, or some other time-based split.  You could
> also just split each user's rows into subsets with some sort of indirect
> mapping, though that can get messy pretty fast.
>
> On Mon, Jul 19, 2021 at 9:01 AM MyWorld <ti...@gmail.com> wrote:
>
> > Hi all,
> >
> > We are currently storing our user activity log in Cassandra with below
> > architecture.
> >
> > CREATE TABLE user_act_log (
> >   userid bigint,
> >   datetime bigint,
> >   sno uuid,
> >   ....some more columns,
> >   PRIMARY KEY (userid, datetime, sno)
> > );
> > Partition key: userid
> > Clustering keys: datetime, sno
> > And a TTL of 6 months
> >
> > With time, our table data has grown to around 500 GB, and we notice from
> > the table histogram that our max partition size has also grown to a
> > tremendous size (nearly 1 GB).
> >
> > So please help me out: what should be the right architecture for this use
> > case?
> >
> > I am currently thinking of changing the compaction strategy from
> > size-tiered to time-window with a 30-day window. But will this improve
> > the partition size?
> >
> > Should we use any other db for such use case?
> >
