Hi Elliott,

We thought of adding a month and a mapping key to the partition key to make it bimonthly. Our new partition key would be userid + date (yyyymm) + mapping key (01 for days 01-15 and 02 for days 16 onward). However, a user might have done only 10 activities in the past 6 months, so I would need 12 reads to find all his activity logs, while another user might log an enormous number of activities within a single month. I don't think more granular partitioning would help, since the number of read queries would only increase. And what if, even after bimonthly partitioning, the activity log for some users grows to 500 MB?
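To make the read fan-out concrete, here is a minimal sketch of the proposed bucketing, assuming a compound key of (userid, yyyymm, half). The function and field names are illustrative, not from the actual schema:

```python
from datetime import date

def bucket_key(userid: int, d: date) -> tuple:
    """Compound partition key for one activity timestamp:
    half is '01' for days 1-15 and '02' for days 16 onward."""
    half = "01" if d.day <= 15 else "02"
    return (userid, d.strftime("%Y%m"), half)

def buckets_for_range(userid: int, start: date, end: date) -> list:
    """Enumerate every partition that must be queried to cover
    [start, end]. A 6-month window always yields 12 partitions,
    even for a user with only a handful of rows -- the fixed
    read fan-out described above."""
    keys = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        for half in ("01", "02"):
            keys.append((userid, f"{y}{m:02d}", half))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return keys
```

For example, scanning January through June 2021 for one user enumerates 12 partitions regardless of how few rows that user actually has.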
On 2021/07/19 21:23:24 Elliott Sims wrote:
> Your partition key determines your partition size. Reducing retention
> sounds like it would help some in your case, but really you'd have to split
> it up somehow. If it fits your query pattern, you could potentially have a
> compound key of userid+datetime, or some other time-based split. You could
> also just split each user's rows into subsets with some sort of indirect
> mapping, though that can get messy pretty fast.
>
> On Mon, Jul 19, 2021 at 9:01 AM MyWorld <ti...@gmail.com> wrote:
>
> > Hi all,
> >
> > We are currently storing our user activity log in Cassandra with the below
> > architecture:
> >
> > Create table user_act_log(
> > Userid bigint,
> > Datetime bigint,
> > Sno UUID,
> > ....some more columns)
> > With partition key - userid
> > Clustering key - datetime, sno
> > And a TTL of 6 months
> >
> > With time our table data has grown to around 500 GB, and we notice from
> > the table histogram that our max partition size has also grown to a
> > tremendous size (nearly 1 GB).
> >
> > So, please help me out: what would be the right architecture for this
> > use case?
> >
> > I am currently thinking of changing the compaction strategy from
> > size-tiered to time-window with a 30-day window. But will this improve
> > the partition size?
> >
> > Should we use another db for such a use case?