I wouldn’t do both. Unless a little server CPU or disk space is an issue, I wouldn’t bother compressing the data yourself. You’d have to measure the disk-space difference, but I imagine it is not significant: as you say, C* has more context, and hopefully most compressors handle a repeated “0, ” well. Compression across the wire is good, of course (client-side CPU is a wash, and the server CPU cost is the one already mentioned).
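If you want to put rough numbers on the "you'd have to measure it" part before touching a cluster, something like the sketch below may help. It is only an approximation under stated assumptions: the records are synthetic (about 4 KB of JSON, roughly 90% zeros, generated by a hypothetical `make_record` helper), and `zlib` over several concatenated values stands in for C* block compression, because LZ4 is not in the Python standard library.

```python
import gzip
import json
import random
import zlib


def make_record(rng, n=1300, zero_fraction=0.9):
    """Hypothetical record: a JSON object that is mostly an array of zeros."""
    values = [
        0 if rng.random() < zero_fraction else rng.randint(1, 1000)
        for _ in range(n)
    ]
    doc = {"id": rng.randint(1, 10**6), "values": values}
    return json.dumps(doc).encode("utf-8")


def measure(num_records=16, seed=42):
    """Return total byte counts for raw, per-value gzip, and one zlib block."""
    rng = random.Random(seed)
    records = [make_record(rng) for _ in range(num_records)]

    raw = sum(len(r) for r in records)

    # Client-side approach: every value is gzipped on its own before the write.
    per_value_gzip = sum(len(gzip.compress(r)) for r in records)

    # Stand-in for server-side compression (assumption: zlib instead of LZ4):
    # neighbouring values are compressed together as one block.
    block_zlib = len(zlib.compress(b"".join(records)))

    return {
        "raw": raw,
        "per_value_gzip": per_value_gzip,
        "block_zlib": block_zlib,
    }


if __name__ == "__main__":
    for name, size in measure().items():
        print(f"{name:>15}: {size} bytes")
```

LZ4 generally compresses less tightly than zlib, so treat the block figure as optimistic, and rerun this with a sample of your real objects rather than the synthetic ones.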
On a side note, perhaps your object model should address the redundancy, though that may be about as complex as doing client-side compression; I don't know. We do have one table where we keep compressed blobs, but only because those are natural from an application perspective, so we simply turn off C* table compression for that table (there isn’t much other data in it).

Note: I haven’t been tracking it recently, but in the past the compression code path on the C* side had to do more data copies. That is unlikely to be significant unless your case is special. I believe this has been, or will be, improved in 2.1 or 3.

> On Nov 3, 2014, at 9:40 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> Hello Robin
>
> You have many options for compression in C*:
>
> 1) Serialize to bytes instead of JSON, to save a lot of space lost to
> String encoding. Of course, the data will be opaque and not human readable.
>
> 2) Activate client-node data compression. In this case, do not forget to
> ship the LZ4 or SNAPPY dependency on the client side.
>
> On the server side, data compression is active by default, using LZ4, when
> you create a new table, so there is pretty much nothing to do.
>
> It's up to you to consider whether, given the compression-ratio difference
> between Gzip and LZ4, it is worth relying on C* compression.
>
> Regards
>
> On Mon, Nov 3, 2014 at 3:51 PM, Robin Verlangen <ro...@us2.nl> wrote:
> Hi there,
>
> We're working on a project which is going to store a lot of JSON objects
> in Cassandra. A large piece of this (90%) consists of an array of integers,
> which in a lot of cases contains a bunch of zeroes.
>
> The average JSON is 4KB in size, and once gzipped (default compression)
> just under 100 bytes.
>
> My question is, should we compress client-side (literally converting the
> JSON string to compressed gzip bytes), let Cassandra do the work, or do both?
>
> From my point of view I think Cassandra would be better, as it could
> compress beyond a single value, using large blocks within a row / SSTable.
>
> Thank you in advance for your help.
>
> Best regards,
>
> Robin Verlangen
> Chief Data Architect
>
> W http://www.robinverlangen.nl
> E ro...@us2.nl