Re: Cassandra Compression and Wide Rows

Drew Kutcharian Mon, 18 Mar 2013 19:59:17 -0700

Edward/Sylvain,

I also came across this post on DataStax's blog:


> When to use compression
> Compression is best suited for ColumnFamilies where there are many rows, with 
> each row having the same columns, or at least many columns in common. For 
> example, a ColumnFamily containing user data such as username, email, etc., 
> would be a good candidate for compression. The more similar the data across 
> rows, the greater the compression ratio will be, and the larger the gain in 
> read performance.
> Compression is not as good a fit for ColumnFamilies where each row has a 
> different set of columns, or where there are just a few very wide rows. 
> Dynamic column families such as this will not yield good compression ratios.

http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression

@Sylvain, does this still apply on more recent versions of C*?
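
For what it's worth, Edward's point below (that compression is block based, so unique column names alone need not hurt the ratio) can be sanity-checked outside of Cassandra. The following is only a rough sketch under stated assumptions: zlib stands in for whatever compressor the cluster actually uses, the 64 KB chunk size and the column layout are made up for illustration, and none of the function names come from Cassandra itself.

```python
import os
import zlib

# Assumed chunk size for illustration only; not read from any Cassandra config.
CHUNK_SIZE = 64 * 1024


def build_wide_row(num_columns):
    """Serialize one hypothetical wide row in which every column name is unique."""
    parts = []
    for i in range(num_columns):
        name = "metric.%08d" % i                            # unique column name
        value = "value=%d;host=web01;unit=ms" % (i % 100)   # similar-looking values
        parts.append(name + ":" + value + "\n")
    return "".join(parts).encode("utf-8")


def compress_in_chunks(data, chunk_size=CHUNK_SIZE):
    """Compress each fixed-size chunk independently, ignoring row boundaries."""
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def compression_ratio(data, chunk_size=CHUNK_SIZE):
    """Return compressed size divided by original size (lower is better)."""
    compressed = compress_in_chunks(data, chunk_size)
    return sum(len(c) for c in compressed) / len(data)


wide_row = build_wide_row(50000)

# Wide row with all-unique column names but repetitive structure.
ratio = compression_ratio(wide_row)

# Random bytes of the same length, as a baseline that should not compress.
random_ratio = compression_ratio(os.urandom(len(wide_row)))

print("wide row ratio: %.2f, random ratio: %.2f" % (ratio, random_ratio))
```

In this toy setup the compressor only ever sees a chunk of bytes, so what matters is how repetitive the bytes inside each chunk are, not whether the data came from one wide row or many narrow ones. It says nothing about the read-side cost of decompressing a whole chunk to reach one small row, which is a separate concern.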


-- Drew



On Mar 18, 2013, at 7:16 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> I feel this has come up before. I believe the compression is block based, so 
> just because no two column names are the same does not mean the compression 
> will not be effective. Possibly in their case the compression was not 
> effective.
> 
> On Mon, Mar 18, 2013 at 9:08 PM, Drew Kutcharian <d...@venarc.com> wrote:
> That's what I originally thought but the Ooyala presentation from C*2012 got 
> me confused. Do you guys know what's going on here?
> 
> The video: 
> http://www.youtube.com/watch?v=r2nGBUuvVmc&feature=player_detailpage#t=790s
> The slides: Slide 22 @ 
> http://www.datastax.com/wp-content/uploads/2012/08/C2012-Hastur-NoahGibbs.pdf
> 
> -- Drew
> 
> 
> On Mar 18, 2013, at 6:14 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> 
>> 
>> IMHO it is probably more efficient for wide rows. When you decompress 8k blocks 
>> to get at a 200-byte row you create overhead, particularly in the young gen.
>> On Monday, March 18, 2013, Sylvain Lebresne <sylv...@datastax.com> wrote:
>> > The way compression is implemented, it is oblivious to the CF being 
>> > wide-row or narrow-row. There is nothing intrinsically less efficient in 
>> > the compression for wide-rows.
>> > --
>> > Sylvain
>> >
>> > On Fri, Mar 15, 2013 at 11:53 PM, Drew Kutcharian <d...@venarc.com> wrote:
>> >>
>> >> Hey Guys,
>> >>
>> >> I remember reading somewhere that C* compression is not very effective 
>> >> when most of the CFs are in wide-row format and some folks turn the 
>> >> compression off and use disk level compression as a workaround. 
>> >> Considering that wide rows with composites are "first class citizens" in 
>> >> CQL3, is this still the case? Have there been any improvements on this?
>> >>
>> >> Thanks,
>> >>
>> >> Drew
>> >
> 
>
