> I don't know enough about the code level implementation to comment on the
> validity of the fix. My main issue is that we use a lot of TTL columns and
> in many cases all columns have a TTL that is less than gc_grace. The
> problem arises when the columns are gc-able and are compacted away on one
> node but not on all replicas, the periodic repair process ends up copying
> all the garbage columns & rows back to all other replicas. It consumes a
> lot of repair resources and makes rows stick around for much longer than
> they really should which consumes even more cluster resources.

You can set gc_grace to 0 if you never manually delete any of them. You only need tombstones if you do manual deletes.
Otherwise the two tickets should improve your situation.
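
For the gc_grace suggestion, a minimal sketch of what the change could look like, assuming a version with CQL support, where the setting is exposed as gc_grace_seconds. The keyspace and table names (my_keyspace, events) are placeholders, not anything from this thread:

```cql
-- Hypothetical keyspace/table names; substitute your own.
-- Only appropriate if rows in this table are never deleted manually
-- and expire purely through TTLs, as described above.
ALTER TABLE my_keyspace.events WITH gc_grace_seconds = 0;
```

If any manual deletes do happen on that table, leave gc_grace at a value longer than your repair interval, since the tombstones need to survive until every replica has seen them.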