Re: Client-side compression, cassandra or both?

graham sanderson Mon, 03 Nov 2014 10:40:12 -0800

I wouldn’t do both.
Unless a little server CPU or disk space is an issue, I wouldn’t bother to 
compress yourself. You’d have to measure the disk space difference, but I 
imagine it is probably not significant: as you say, C* has more context, and 
hopefully most compressors handle a repeated “0, ” well. Compression across 
the wire is good, of course (client-side CPU is a wash, and the server CPU 
cost was already mentioned).
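
For the "measure it" part, a rough sketch of how one might compare the two 
approaches offline. Everything here is an assumption for illustration: the 
document shape, the 5% non-zero ratio, and gzip over concatenated values as a 
stand-in for block compression (C* defaults to LZ4 and compresses SSTable 
chunks, so real numbers will differ).

```python
import gzip
import json
import random


def make_sample(rng, length=1300, nonzero_ratio=0.05):
    """Build a hypothetical ~4KB JSON document holding a mostly-zero int array."""
    values = [rng.randint(1, 999) if rng.random() < nonzero_ratio else 0
              for _ in range(length)]
    doc = {"id": rng.randint(1, 10**6), "values": values}
    return json.dumps(doc).encode("utf-8")


rng = random.Random(42)  # fixed seed so repeated runs are comparable
docs = [make_sample(rng) for _ in range(64)]

raw_total = sum(len(d) for d in docs)

# Client-side style: every value is compressed on its own.
per_value_total = sum(len(gzip.compress(d)) for d in docs)

# Crude stand-in for block compression: many values compressed together,
# so the compressor can share context across them.
block_total = len(gzip.compress(b"".join(docs)))

print("raw bytes:             ", raw_total)
print("gzip per value (total):", per_value_total)
print("gzip over one block:   ", block_total)
```

This only says something about ratios on synthetic data; the CPU side still 
needs measuring on your own cluster with your real payloads.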


On a side note, perhaps your object model should address the redundancy, though 
of course this is perhaps equivalent to the complexity of doing client side 
compression, IDK.
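
As one illustration of what addressing the redundancy in the object model 
could look like, here is a minimal sketch that stores only the non-zero 
entries of the array. The field names "n" and "nz" and the helper functions 
are made up for this example; whether it beats plain compression for your 
data is something you would have to check.

```python
import json


def to_sparse(values):
    """Keep only non-zero entries as {index: value}, plus the original length."""
    return {"n": len(values),
            "nz": {str(i): v for i, v in enumerate(values) if v != 0}}


def from_sparse(doc):
    """Rebuild the dense list from the sparse form."""
    values = [0] * doc["n"]
    for i, v in doc["nz"].items():
        values[int(i)] = v
    return values


dense = [0] * 1000
dense[3], dense[500], dense[999] = 7, 42, 1

dense_json = json.dumps(dense)
sparse_json = json.dumps(to_sparse(dense))

# Round trip through JSON to confirm nothing is lost.
restored = from_sparse(json.loads(sparse_json))
print(len(dense_json), "bytes dense ->", len(sparse_json), "bytes sparse")
```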

We do have one table where we keep compressed blobs, but that is because those 
are natural from an application perspective, and so we just turn off C* table 
compression for those (there isn’t much other data there).

Note, I haven’t been tracking it recently, but certainly in the past the 
compression code path in C* had to do more data copies, though this is 
unlikely to be significant unless your case is special. I believe this has 
been or will be improved in 2.1 or 3.

> On Nov 3, 2014, at 9:40 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
> 
> Hello Robin
> 
>  You have many options for compression in C*:
> 
> 1) Serialize to bytes instead of JSON, which saves a lot of space compared 
> with String encoding. Of course, the data will be opaque and not human-readable.
> 
> 2) Activate client-node data compression. In this case, do not forget to ship 
> the LZ4 or Snappy dependency on the client side.
> 
> On the server-side, data compression is active by default using LZ4 when 
> you're creating a new table so there is pretty much nothing to do.
> 
>  It's up to you to decide whether the compression-ratio difference between 
> Gzip and LZ4 justifies compressing client-side rather than relying on C* 
> compression.
> 
> 
> Regards
> 
> 
> On Mon, Nov 3, 2014 at 3:51 PM, Robin Verlangen <ro...@us2.nl 
> <mailto:ro...@us2.nl>> wrote:
> Hi there,
> 
> We're working on a project which is going to store a lot of JSON objects in 
> Cassandra. A large piece of this (90%) consists of an array of integers, of 
> which in a lot of cases there are a bunch of zeroes. 
> 
> The average JSON is 4KB in size, and once gzipped (default compression) it is 
> just under 100 bytes.
> 
> My question is, should we compress client-side (literally converting JSON 
> string to compressed gzip bytes), let Cassandra do the work, or do both?
> 
> From my point of view I think Cassandra would be better, as it could compress 
> beyond a single value, using large blocks within a row / SSTable.
> 
> Thank you in advance for your help.
> 
> Best regards, 
> 
> Robin Verlangen
> Chief Data Architect
> 
> W http://www.robinverlangen.nl <http://www.robinverlangen.nl/>
> E ro...@us2.nl <mailto:ro...@us2.nl>
