Re: Storing user activity logs

2021-07-19 Thread manish khandelwal
I concur with Elliott's view. The only way you can reduce partition size is by tweaking your partition key. Here, with user_id as the partition key, partition size depends on the activity of the user; for a superactive user it can become large in no time. After changing the key, migration of old data to the new

RE: Re: Re: High mutation stage in multi dc deployment

2021-07-19 Thread MyWorld
Hi Arvinder, It's a separate cluster. Here the max partition size is 32 MB. On 2021/07/19 23:57:27 Arvinder Dhillon wrote: > Is this the same cluster with 1G partition size?

RE: Re: Storing user activity logs

2021-07-19 Thread MyWorld
Hi Elliott, We thought of adding the month and a mapping key to the partition key to make it bimonthly. So our new partition key would be userid + month (yyyymm) + mapping key (01 for days 01-15 and 02 for days 16-31). However, there could be a user who has done only 10 activities in the past 6 months. So I need
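The bimonthly scheme described above could be sketched in CQL roughly as follows. This is an illustration, not the poster's actual DDL: the table and column names, the yyyymm encoding, and the 6-month TTL value are assumptions; the bucket values would be computed client-side at write time.

```sql
CREATE TABLE user_act_log_v2 (
    userid   bigint,
    month    int,     -- yyyymm, e.g. 202107
    half     int,     -- 1 for days 01-15, 2 for days 16-31
    datetime bigint,
    sno      uuid,
    -- ... remaining activity columns ...
    PRIMARY KEY ((userid, month, half), datetime, sno)
) WITH default_time_to_live = 15552000;  -- ~6 months in seconds
```

The trade-off this thread is circling: reads of a user's full 6-month history now have to fan out over up to 12 (month, half) buckets, either with parallel per-bucket queries or an IN clause on the partition key, which is the cost of bounding partition size for superactive users.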

Re: Re: High mutation stage in multi dc deployment

2021-07-19 Thread Arvinder Dhillon
Is this the same cluster with the 1 GB partition size? -Arvinder On Mon, Jul 19, 2021, 4:51 PM MyWorld wrote: > Hi daemeon, > We have already tuned the TCP settings to improve the bandwidth.

RE: Re: High mutation stage in multi dc deployment

2021-07-19 Thread MyWorld
Hi daemeon, We have already tuned the TCP settings to improve the bandwidth. Earlier we had a lot of hint and mutation message drops, which went away after tuning TCP. Moreover, we are writing with CL LOCAL_QUORUM on the US side, so the ack is taken from the local DC. I am still concerned about what could be the reason for the increase

RE: Re: High mutation stage in multi dc deployment

2021-07-19 Thread MyWorld
Hi Patrick, Currently we are using Apache Cassandra version 3.11.6. We are performing writes with CL LOCAL_QUORUM in the US-side DC. We have 4-5 tables such as supplier_details, supplier_prod_details, and supplier_rating. We also have a materialized view attached to the rating table. For the batching part, I need to check with

Re: Storing user activity logs

2021-07-19 Thread Elliott Sims
Your partition key determines your partition size. Reducing retention sounds like it would help some in your case, but really you'd have to split it up somehow. If it fits your query pattern, you could potentially have a compound key of userid+datetime, or some other time-based split. You could

Re: High mutation stage in multi dc deployment

2021-07-19 Thread Patrick McFadin
Hi Ashish, Can you give us some of the details? Specifically, some indication of the Cassandra version, data model, consistency levels used, and how you are bulk loading. Is this a batch by any chance? Patrick On Mon, Jul 19, 2021 at 10:38 AM MyWorld wrote: > Hi all,

Re: High mutation stage in multi dc deployment

2021-07-19 Thread daemeon reiydelle
You may want to think about the latency impacts of a cluster that has one node "far away". This is such a basic design flaw that you need to do some basic learning, and some basic understanding of networking and latency. On Mon, Jul 19, 2021 at 10:38 AM MyWorld wrote: > Hi all,

RE: Re: R/W timeouts VS number of tables in keyspace

2021-07-19 Thread MyWorld
Yes, it seems like a GC issue. And since the default timeouts for writes and reads are 2000 ms and 5000 ms, you might be experiencing timeout issues. But you need to check the reason behind this GC pause. Heavy partitions or high tombstone counts could be one reason; check your table stats for the same. As suggested, you
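The table-stats check suggested above is typically done with nodetool; an illustrative sketch (the keyspace and table names here are placeholders, not from the thread):

```
# Per-table stats, including "Compacted partition maximum bytes"
# and "Average tombstones per slice (last five minutes)"
nodetool tablestats my_keyspace.user_act_log

# Percentile distribution of partition sizes and cell counts
nodetool tablehistograms my_keyspace user_act_log
```

Large values in "Compacted partition maximum bytes" point at the oversized-partition problem discussed in the other thread; high tombstone counts per slice point at TTL/delete churn as the GC-pressure source.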

High mutation stage in multi dc deployment

2021-07-19 Thread MyWorld
Hi all, Currently we have a cluster with 2 DCs of 3 nodes each. One DC is in GCP-US while the other is in GCP-India. Just to add here, the configuration of every node across both DCs is the same: CPU 6 cores, RAM 32 GB, heap 8 GB. We do all our writes on the US data center. While performing a bulk write on GCP-US, we obser

Re: R/W timeouts VS number of tables in keyspace

2021-07-19 Thread Yakir Gibraltar
In order to tune GC, you need gc.log or JVM metrics; you can check them on https://gceasy.io/ and see the results before and after the change. On Mon, Jul 19, 2021 at 7:21 PM Luca Rondanini wrote: > Thanks Yakir,

Re: R/W timeouts VS number of tables in keyspace

2021-07-19 Thread Luca Rondanini
Thanks Yakir, I already experience slow repairs and startups, but I'd like to stabilize the system before jumping into refactoring (columns are not a problem, max 10 cols per table). Do you believe a GC problem is causing the timeouts and crashes? I'll give it a try and update this post. Th

Re: R/W timeouts VS number of tables in keyspace

2021-07-19 Thread Yakir Gibraltar
I recommend rethinking this design; it is hard to maintain, with slow startup and repair. About GC, try replacing CMS with G1; see doc: https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/operations/opsTuningGcAbout.html BTW, many columns may also affect performance; see doc: https://
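On Cassandra 3.11, the CMS-to-G1 switch suggested above is made in conf/jvm.options; a minimal sketch (the pause target is a starting point to tune against gc.log, not a recommendation from the thread):

```
## Comment out the CMS settings section, e.g.:
#-XX:+UseParNewGC
#-XX:+UseConcMarkSweepGC

## Uncomment/add the G1 settings:
-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=300
```

With an 8 GB heap like the one described elsewhere in this digest, G1 mainly trades slightly lower throughput for more predictable pause times, which is usually the right trade when GC pauses are tripping read/write timeouts.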

Storing user activity logs

2021-07-19 Thread MyWorld
Hi all, We are currently storing our user activity log in Cassandra with the below architecture: CREATE TABLE user_act_log (userid bigint, datetime bigint, sno uuid, some more columns), with partition key userid, clustering key (datetime, sno), and a TTL of 6 months. Over time our table data have gr
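The schema described above, reconstructed as CQL for readability. This is a sketch of what the message describes, not the poster's exact DDL; the "additional columns" are elided as in the original, and the TTL value assumes 6 months expressed in seconds:

```sql
CREATE TABLE user_act_log (
    userid   bigint,
    datetime bigint,
    sno      uuid,
    -- ... some more columns ...
    PRIMARY KEY ((userid), datetime, sno)
) WITH default_time_to_live = 15552000;  -- ~180 days
```

Because userid alone is the partition key, every activity a user logs within the TTL window lands in a single partition, which is exactly why the replies in this thread focus on adding a time bucket to the key.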

R/W timeouts VS number of tables in keyspace

2021-07-19 Thread Luca Rondanini
Hi all, I have a keyspace with almost 900 tables. Lately I started receiving lots of w/r timeouts (e.g. com.datastax.driver.core.exceptions.Read/WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_ONE (1 replica was required but only 0 acknowledged the write)). *I'm ev