Hey all,

For legacy reasons we're running Cassandra 2.0.10 with RF=1; we're moving away from this setup as soon as possible. In the meantime, while adding a node we recently hit a "Stream failed" error (http://pastie.org/9725846). Cassandra restarted and apparently began streaming again from scratch, without first removing the failed stream's data.
With bootstrapping and the initial compactions finished, that node now appears to hold duplicate data: almost exactly 2x the expected disk usage. CQL returns correct results, but we depend on being able to read the SSTable files directly (which is also why we run RF=1). Would anyone have suggestions on a good way to resolve this?

Thanks,
Alain
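P.S. In case it helps clarify the symptom: my understanding is that a CQL read reconciles copies of a row by write timestamp, so the duplicates collapse at query time, while a direct scan over the SSTable files sees every copy. A toy sketch of that (made-up keys and timestamps, not our actual data):

```python
# Two "sstables" holding (key, value, write_timestamp) rows; the second is
# the re-streamed copy left behind by the restarted bootstrap (hypothetical).
sstable_a = [("k1", "v1", 100), ("k2", "v2", 100)]
sstable_b = [("k1", "v1", 100), ("k2", "v2", 100)]  # duplicate from the retry

# A CQL-style read reconciles per key, keeping the newest timestamp,
# so the duplicates collapse and results look correct:
merged = {}
for key, value, ts in sstable_a + sstable_b:
    if key not in merged or ts > merged[key][1]:
        merged[key] = (value, ts)
print(len(merged))  # 2 distinct rows seen via CQL

# A direct scan over the files sees every copy, hence ~2x on disk:
print(len(sstable_a) + len(sstable_b))  # 4 rows read from the raw files
```

That matches what we observe: correct query results, doubled raw data.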