Hello,

We are running into an unusual situation and I'm wondering if anyone has any
insight into it.  We've been running a Cassandra cluster for some time, with
compression enabled on one column family that stores text documents.
Compression on that column family uses the SnappyCompressor with a 64k chunk
length.
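
For context, here is a rough sketch of what I understand the 64k chunk length
to mean: the data is split into fixed 64 KiB chunks and each chunk is compressed
independently, so a read only has to decompress the chunk it needs. This is not
Cassandra's code. Python's zlib stands in for Snappy (an assumption, since
Snappy is not in the standard library), and the function names are
hypothetical.

```python
import zlib

# Mirrors the 64k chunk length; zlib is a stand-in for Snappy here.
CHUNK_LENGTH = 64 * 1024


def compress_in_chunks(data: bytes, chunk_length: int = CHUNK_LENGTH):
    """Compress data in fixed-size chunks, each one independently.

    Returns the compressed chunks plus the offset at which each chunk
    would start in a concatenated output file (a simple chunk index).
    """
    chunks = []
    offsets = []
    position = 0
    for start in range(0, len(data), chunk_length):
        compressed = zlib.compress(data[start:start + chunk_length])
        offsets.append(position)
        chunks.append(compressed)
        position += len(compressed)
    return chunks, offsets


def read_chunk(chunks, index):
    """Decompress a single chunk; other chunks are left untouched."""
    return zlib.decompress(chunks[index])
```

For example, `compress_in_chunks(b"abc " * 50000)` yields four chunks (three
full 64 KiB chunks plus a short tail), and comparing the summed compressed
lengths against the input length gives a compressed-to-uncompressed ratio.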

We recently discovered that Cassandra was reporting a compression ratio of
0.  I took a snapshot of the data and started a Cassandra node in isolation to
investigate.

Running nodetool scrub or nodetool upgradesstables had little impact on the
amount of data being stored.

I then disabled compression and ran nodetool upgradesstables on the column
family.  Again, there was no impact on the size of the stored data.

I then re-enabled compression and ran nodetool upgradesstables on the column
family.  This resulted in a 60% reduction in the size of the stored data, and
Cassandra reported a compression ratio of about .38.
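
As a sanity check on those numbers, assuming Cassandra's ratio is compressed
size divided by uncompressed size (my understanding, not something I have
verified in the source), a ratio of about .38 corresponds to roughly a 62%
reduction, which lines up with the ~60% I measured. Under that definition a
ratio of 0 would not describe any real compressed data. The helper names below
are hypothetical.

```python
def compression_ratio(compressed_bytes: int, uncompressed_bytes: int) -> float:
    """Compressed size divided by uncompressed size (assumed definition)."""
    if uncompressed_bytes <= 0:
        raise ValueError("uncompressed size must be positive")
    return compressed_bytes / uncompressed_bytes


def size_reduction(ratio: float) -> float:
    """Fraction of space saved for a given compressed/uncompressed ratio."""
    return 1.0 - ratio


# Example: 38 units on disk for every 100 units of raw data.
ratio = compression_ratio(38, 100)
print(f"ratio={ratio:.2f}, reduction={size_reduction(ratio):.0%}")
```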

Any idea what is going on here?  Obviously I can go through this process in
production to enable compression; however, does anyone know what is currently
happening and why new data does not appear to be compressed?

Any insights are appreciated,
Thanks,
-Mike
