Hi,

I'm trying to solve an issue we are seeing on our production cluster, in
various LCS tables, with Cassandra 3.5. I believe it is related
to CASSANDRA-11373 (however CASSANDRA-11373 is marked as resolved in 3.5).

I'm not sure what causes it, but eventually `getCandidatesFor` in
LeveledManifest.java (
https://github.com/apache/cassandra/blob/cassandra-3.5/src/java/org/apache/cassandra/db/compaction/LeveledManifest.java#L553)
gets stuck in an infinite loop on L0 (and eventually goes into a GC
thrashing loop). Investigating further, it looks like `getCandidatesFor`
returns a mix of sstables, some of which don't exist on disk (same symptom
as 11373).

What can I do here? For smaller tables we switched to STCS, but we have a
very large LCS time-series table (~300GB per node) that would take ages to
switch over, and running it on STCS would really hurt our performance
(offline scrub doesn't work either).

Would it suffice to add a check here that ensures the file exists and
evicts the sstable if it doesn't? Or is there any way I could force a
rewrite of the table's metadata?
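
To make the first idea concrete, here is a rough sketch of the kind of check
I have in mind. It is not Cassandra code: `CandidateFilter` and
`splitByExistence` are hypothetical names, and plain `java.nio.file.Path`
values stand in for the data files of the sstables that `getCandidatesFor`
returns. How the missing entries would actually be evicted from the manifest
is left open, since I don't know the right way to do that.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of Cassandra. Paths stand in for the
// on-disk data files of the candidate sstables.
final class CandidateFilter {

    /** Candidates split into files that exist on disk and files that do not. */
    static final class Result {
        final List<Path> present = new ArrayList<>();
        final List<Path> missing = new ArrayList<>();
    }

    /** Checks each candidate against the filesystem and sorts it into one of the two lists. */
    static Result splitByExistence(Collection<Path> candidates) {
        Result result = new Result();
        for (Path candidate : candidates) {
            if (Files.exists(candidate)) {
                result.present.add(candidate);
            } else {
                // Assumption: these are the entries that would need to be
                // evicted from the manifest before compaction proceeds.
                result.missing.add(candidate);
            }
        }
        return result;
    }
}
```

The intent would be to compact only `present` and drop `missing` from the
level, so the loop on L0 can make progress instead of retrying the same
candidates.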

Nimi
