Thanks Sylvain. So C* compression is block based and has nothing to do with 
the format of the rows.
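
For anyone who wants to try it, as Sylvain suggests, here is a rough sketch of the idea in Python. It is not Cassandra code: it uses zlib from the standard library as a stand-in for whatever compressor the column family is configured with, and the block size, the row layouts and the helper names (block_ratio, static_rows, wide_row, opaque_blob) are all made up for illustration. It compresses each fixed-size block on its own and compares a "static" layout, a single wide row with unique column names, and random bytes standing in for a PNG.

```python
import os
import zlib

# Assumed block size for this sketch; not read from any Cassandra config.
BLOCK_SIZE = 64 * 1024


def block_ratio(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Compress each fixed-size block independently.

    Returns compressed size / original size (lower is better).
    """
    compressed = 0
    for start in range(0, len(data), block_size):
        compressed += len(zlib.compress(data[start:start + block_size]))
    return compressed / len(data)


def static_rows(n: int) -> bytes:
    # Hypothetical "static" layout: every row repeats the same column names.
    return b"".join(
        b"row%d:username=user%d;email=user%d@example.com\n" % (i, i, i)
        for i in range(n)
    )


def wide_row(n: int) -> bytes:
    # Hypothetical wide row: every column name is unique, but the names
    # and values still share a lot of structure within a block.
    return b"".join(b"event:2013-03-18:%08d=clicked;" % i for i in range(n))


def opaque_blob(size: int) -> bytes:
    # Random bytes as a stand-in for already-compressed data such as a PNG.
    return os.urandom(size)


if __name__ == "__main__":
    samples = [
        ("static rows", static_rows(5000)),
        ("one wide row", wide_row(5000)),
        ("opaque blob", opaque_blob(200_000)),
    ]
    for name, data in samples:
        print(f"{name}: ratio {block_ratio(data):.2f}")
```

The compressor only ever sees the bytes in a block, so what matters is how repetitive those bytes are, not whether they came from many narrow rows or one wide one. Real numbers will depend on the actual data and compressor, so this is only a way to build intuition before testing on a real cluster.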

On Mar 19, 2013, at 1:31 AM, Sylvain Lebresne <> wrote:

> That's just describing what compression is about. Compression (not just in 
> C*, but in general) is based on recognizing repeated patterns.
> So yes, in that sense, static column families are more likely to yield a 
> better compression ratio because repeated patterns are more likely to occur 
> in the compressed blocks. But:
> 1) it doesn't necessarily mean that wide column families won't have a good 
> compression ratio per se.
> 2) you can absolutely have a crappy compression ratio with a static column 
> family. Just create a column family where each row has 1 column 'image' that 
> contains a png.
> And to come back to your initial question, I highly doubt disk level 
> compression would be much of a workaround because again, that's more about 
> how compression works than how Cassandra uses it.
> At the end of the day, I really think the best choice is to try it and decide 
> for yourself if it does more good than harm or the converse.
> --
> Sylvain  
> On Tue, Mar 19, 2013 at 3:58 AM, Drew Kutcharian <> wrote:
> Edward/Sylvain,
> I also came across this post on DataStax's blog:
>> When to use compression
>> Compression is best suited for ColumnFamilies where there are many rows, 
>> with each row having the same columns, or at least many columns in common. 
>> For example, a ColumnFamily containing user data such as username, email, 
>> etc., would be a good candidate for compression. The more similar the data 
>> across rows, the greater the compression ratio will be, and the larger the 
>> gain in read performance.
>> Compression is not as good a fit for ColumnFamilies where each row has a 
>> different set of columns, or where there are just a few very wide rows. 
>> Dynamic column families such as this will not yield good compression ratios.
> @Sylvain, does this still apply on more recent versions of C*?
> -- Drew
> On Mar 18, 2013, at 7:16 PM, Edward Capriolo <> wrote:
>> I feel this has come up before. I believe the compression is block based, so 
>> just because no two column names are the same does not mean the compression 
>> will not be effective. Possibly in their case the compression was not 
>> effective.
>> On Mon, Mar 18, 2013 at 9:08 PM, Drew Kutcharian <> wrote:
>> That's what I originally thought but the OOYALA presentation from C*2012 got 
>> me confused. Do you guys know what's going on here?
>> The video: 
>> The slides: Slide 22 @ 
>> -- Drew
>> On Mar 18, 2013, at 6:14 AM, Edward Capriolo <> wrote:
>>> IMHO it is probably more efficient for wide rows. When you decompress 8k 
>>> blocks to get at a 200-byte row you create overhead, particularly in the 
>>> young gen.
>>> On Monday, March 18, 2013, Sylvain Lebresne <> wrote:
>>> > The way compression is implemented, it is oblivious to the CF being 
>>> > wide-row or narrow-row. There is nothing intrinsically less efficient in 
>>> > the compression for wide-rows.
>>> > --
>>> > Sylvain
>>> >
>>> > On Fri, Mar 15, 2013 at 11:53 PM, Drew Kutcharian <> wrote:
>>> >>
>>> >> Hey Guys,
>>> >>
>>> >> I remember reading somewhere that C* compression is not very effective 
>>> >> when most of the CFs are in wide-row format and some folks turn the 
>>> >> compression off and use disk level compression as a workaround. 
>>> >> Considering that wide rows with composites are "first class citizens" in 
>>> >> CQL3, is this still the case? Have there been any improvements on this?
>>> >>
>>> >> Thanks,
>>> >>
>>> >> Drew
>>> >
