Following up, I've found that we tend to encounter one of three types of
exceptions/commitlog corruptions:
1. org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
   Mutation checksum failure at ... in CommitLog-5-1531150627243.log
       at org.apache.cassandra.db.commitlog.CommitL
Thanks. I guess some earlier thread got truncated.
I have already applied Erick's recommendations, and that seems to have
reduced the RAM consumption by around 50%.
Regarding cheap memory and hardware, we are already running 96GB boxes and
getting multiple larger ones might be a little diffic
A CMS heap that is too large will cause long GC pauses. You could try
reducing the heap on one node to see whether that helps, or switch to G1
if that is the easier route.
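
For illustration only, here is a rough sketch of how the two suggestions
above might look in conf/jvm.options on a 3.11 node. The 8G heap size is an
assumed test value, not a figure from this thread. The G1 flags are standard
HotSpot options that the stock jvm.options ships commented out.

```
# Option 1: keep CMS, try a smaller heap on one node.
# 8G is an assumed value for the experiment; adjust to your workload.
-Xms8G
-Xmx8G

# Option 2: switch to G1 instead. Comment out the CMS flags
# (-XX:+UseParNewGC, -XX:+UseConcMarkSweepGC and related CMS tuning)
# before enabling these.
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
```

Only one of the two options would be applied at a time, and the node needs
a restart for the change to take effect.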
Thanks,
Jim
On Tue, Aug 3, 2021 at 3:33 AM manish khandelwal <
manishkhandelwa...@gmail.com> wrote:
> Long GC (1 seconds /2 seconds) pauses seen during repair on the
> coordinator. Runn
I think Erick's post at https://community.datastax.com/questions/6947/
explains it very clearly.
We hit the same issue only on a huge table during an upgrade, and we
changed it back once the upgrade was done.
My understanding is that which option to choose depends on your use case.
If chasing high performance on a big table, t
Long GC pauses (1 to 2 seconds) are seen on the coordinator during repair.
We are running a full repair with the partition range option. The GC
collector is CMS and the heap is 14G. The cluster is 7+7. The Cassandra
version is 3.11.2. There is not much traffic while the repair is running.
What could be the probable cause of long GC pause