Hello,

We are running into an unusual situation, and I'm wondering whether anyone has any insight into it. We've been running a Cassandra cluster for some time, with compression enabled on one column family in which text documents are stored. That column family is configured to use the SnappyCompressor with a 64k chunk length.
We recently discovered that Cassandra was reporting a compression ratio of 0. I took a snapshot of the data and started a Cassandra node in isolation to investigate. Running nodetool scrub or nodetool upgradesstables had little impact on the amount of data stored. I then disabled compression and ran nodetool upgradesstables on the column family. Again, there was no impact on the stored data size. I then re-enabled compression and ran nodetool upgradesstables on the column family. This resulted in a 60% reduction in the stored data size, with Cassandra reporting a compression ratio of about 0.38.

Any idea what is going on here? Obviously I can go through this process in production to enable compression, but does anyone know what is currently happening and why new data does not appear to be compressed?

Any insights are appreciated.

Thanks,
-Mike
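
P.S. As a sanity check on the numbers above, here is a small Python sketch of how I am reading the reported ratio. It assumes the ratio is compressed size divided by uncompressed size, and that a reported value of 0 means no compressed data was found; both are my assumptions rather than anything I have confirmed, and the function name is only illustrative.

```python
from typing import Optional


def size_reduction(compression_ratio: float) -> Optional[float]:
    """Return the fraction of space saved for a reported compression ratio.

    Assumption: ratio = compressed size / uncompressed size.
    Assumption: a ratio of 0 (or less) means nothing was compressed,
    so no saving can be derived and None is returned.
    """
    if compression_ratio <= 0:
        return None
    return 1.0 - compression_ratio


if __name__ == "__main__":
    for reported in (0.0, 0.38):
        saved = size_reduction(reported)
        if saved is None:
            print(f"ratio {reported}: no compressed data reported")
        else:
            print(f"ratio {reported}: about {saved:.0%} smaller on disk")
```

Under those assumptions, a ratio of 0.38 corresponds to data roughly 62% smaller on disk, which lines up with the roughly 60% reduction I observed after re-enabling compression.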