Re: Cassandra Compression and Wide Rows

aaron morton Wed, 20 Mar 2013 01:49:35 -0700

Yes. 
The block size is specified as part of the compression options for the CF / 
Table.
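
As a rough illustration of why block-based compression does not care about row shape, here is a minimal Python sketch. It is not Cassandra code: zlib stands in for whichever compressor the table is configured with, and CHUNK_SIZE, chunked_ratio and the three sample data sets are hypothetical names made up for this example. Each fixed-size block is compressed independently, and the resulting ratio depends only on how repetitive the bytes inside each block are.

```python
import os
import zlib

# Hypothetical block size in bytes; in Cassandra this comes from the
# table's compression options, not from a constant like this one.
CHUNK_SIZE = 64 * 1024


def chunked_ratio(data: bytes, chunk_size: int = CHUNK_SIZE) -> float:
    """Compress data in independent fixed-size chunks.

    Returns compressed size divided by original size (lower is better).
    """
    if not data:
        raise ValueError("data must be non-empty")
    compressed = 0
    for start in range(0, len(data), chunk_size):
        compressed += len(zlib.compress(data[start:start + chunk_size]))
    return compressed / len(data)


# Narrow ("static") rows: many rows that share the same column names.
static_rows = b"".join(
    b"username=user%d;email=user%d@example.com;" % (i, i) for i in range(20000)
)

# One wide row: every column name is distinct, but they are similarly shaped.
wide_row = b"".join(
    b"2013-03-18T00:00:%05d=%d;" % (i, i % 100) for i in range(20000)
)

# Stand-in for an already-compressed value such as a PNG: random bytes.
blob = os.urandom(len(static_rows))

if __name__ == "__main__":
    for name, data in [("static", static_rows), ("wide", wide_row), ("blob", blob)]:
        print(name, round(chunked_ratio(data), 3))
```

On synthetic data like this, both the static and the wide layouts should compress well because their blocks contain repeated byte patterns, while the random blob should not, whatever the row layout. Real ratios depend on the actual data, so it is still worth measuring on your own tables.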


Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/03/2013, at 5:31 AM, Drew Kutcharian <d...@venarc.com> wrote:

> Thanks Sylvain. So C* compression is block based and has nothing to do with 
> the format of the rows.
> 
> On Mar 19, 2013, at 1:31 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
> 
>> That's just describing what compression is about. Compression (not in C*, in 
>> general) is based on recognizing repeated patterns.
>> 
>> So yes, in that sense, static column families are more likely to yield a 
>> better compression ratio because they are more likely to have repeated 
>> patterns in the compressed blocks. But:
>> 1) it doesn't necessarily mean that wide column families won't have a good 
>> compression ratio per se.
>> 2) you can absolutely have a crappy compression ratio with a static column 
>> family. Just create a column family where each row has 1 column 'image' that 
>> contains a png.
>> 
>> And to come back to your initial question, I highly doubt disk level 
>> compression would be much of a workaround because again, that's more about 
>> how compression works than how Cassandra uses it.
>> 
>> At the end of the day, I really think the best choice is to try it and 
>> decide for yourself if it does more good than harm or the converse.
>> 
>> --
>> Sylvain  
>> 
>> 
>> On Tue, Mar 19, 2013 at 3:58 AM, Drew Kutcharian <d...@venarc.com> wrote:
>> Edward/Sylvain,
>> 
>> I also came across this post on DataStax's blog:
>> 
>>> When to use compression
>>> Compression is best suited for ColumnFamilies where there are many rows, 
>>> with each row having the same columns, or at least many columns in common. 
>>> For example, a ColumnFamily containing user data such as username, email, 
>>> etc., would be a good candidate for compression. The more similar the data 
>>> across rows, the greater the compression ratio will be, and the larger the 
>>> gain in read performance.
>>> Compression is not as good a fit for ColumnFamilies where each row has a 
>>> different set of columns, or where there are just a few very wide rows. 
>>> Dynamic column families such as this will not yield good compression ratios.
>> 
>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
>> 
>> @Sylvain, does this still apply on more recent versions of C*?
>> 
>> 
>> -- Drew
>> 
>> 
>> 
>> On Mar 18, 2013, at 7:16 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>> 
>>> I feel this has come up before. I believe the compression is block based, 
>>> so just because no two column names are the same does not mean the 
>>> compression will not be effective. Possibly in their case the compression 
>>> was not effective.
>>> 
>>> On Mon, Mar 18, 2013 at 9:08 PM, Drew Kutcharian <d...@venarc.com> wrote:
>>> That's what I originally thought but the OOYALA presentation from C*2012 
>>> got me confused. Do you guys know what's going on here?
>>> 
>>> The video: 
>>> http://www.youtube.com/watch?v=r2nGBUuvVmc&feature=player_detailpage#t=790s
>>> The slides: Slide 22 @ 
>>> http://www.datastax.com/wp-content/uploads/2012/08/C2012-Hastur-NoahGibbs.pdf
>>> 
>>> -- Drew
>>> 
>>> 
>>> On Mar 18, 2013, at 6:14 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>> 
>>>> 
>>>> Imho it is probably more efficient for wide. When you decompress 8k blocks 
>>>> to get at a 200 byte row you create overhead, particularly in the young gen.
>>>> On Monday, March 18, 2013, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>> > The way compression is implemented, it is oblivious to the CF being 
>>>> > wide-row or narrow-row. There is nothing intrinsically less efficient in 
>>>> > the compression for wide-rows.
>>>> > --
>>>> > Sylvain
>>>> >
>>>> > On Fri, Mar 15, 2013 at 11:53 PM, Drew Kutcharian <d...@venarc.com> 
>>>> > wrote:
>>>> >>
>>>> >> Hey Guys,
>>>> >>
>>>> >> I remember reading somewhere that C* compression is not very effective 
>>>> >> when most of the CFs are in wide-row format and some folks turn the 
>>>> >> compression off and use disk level compression as a workaround. 
>>>> >> Considering that wide rows with composites are "first class citizens" 
>>>> >> in CQL3, is this still the case? Have there been any improvements on 
>>>> >> this?
>>>> >>
>>>> >> Thanks,
>>>> >>
>>>> >> Drew
>>>> >
>>> 
>>> 
>> 
>> 
>
