Hi all,

We feed a Cassandra v3.7 DB from a streaming application. From time to time
we need to update a boolean column across an entire table based on some
logic. For this we wrote a Spark job, but it always crashed after a while
with a tombstone threshold error ("Scanned over 100001 tombstones during
query..").

After compacting the table and setting spark.cassandra.input.split.size_in_mb
to a lower value, the job became really slow, but after a few retries it was
able to finish. During these runs we noticed exceptions like:
java.io.IOException: Failed to write statements to xxx. The latest
exception was: Cassandra failure during write query at consistency
LOCAL_QUORUM (2 responses were required but only 1 replica responded,
2 failed)

I guess this happened because we put too much load on the cluster.
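
For reference, the split size is set in the job config; the
spark.cassandra.output.* options next to it are the connector's write-side
throttles, which should address exactly this kind of overload (the values
are illustrative, not what we actually run):

conf.set("spark.cassandra.input.split.size_in_mb", "32")      // smaller read splits
conf.set("spark.cassandra.output.throughput_mb_per_sec", "5") // cap write MB/s per core
conf.set("spark.cassandra.output.concurrent.writes", "2")     // fewer in-flight batches
conf.set("spark.cassandra.output.batch.size.rows", "50")      // smaller unlogged batches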

We're thinking about taking a snapshot of the table, reading and
manipulating the sstables with a MapReduce job, and bulk loading the result
back into Cassandra, but in that case we most probably can't work on the
same table without affecting the consumers accessing it.
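
If we go down that road, the surrounding mechanics would be standard
tooling, roughly like this (hosts and paths are placeholders):

# 1. Take a snapshot of the source keyspace
nodetool snapshot -t pre_update my_keyspace

# 2. Rewrite the snapshot sstables with the MR job into
#    /path/to/output/my_keyspace/my_table

# 3. Stream the rewritten sstables into the cluster
sstableloader -d cassandra-host1,cassandra-host2 \
  /path/to/output/my_keyspace/my_table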

What is the recommended way to do such a live-update on an entire table?

Thanks
Peter
