Hi Igniters! Ilya, I'm glad to see one more person who is interested in the compression feature in Ignite.
I looked through the pull request and want to share the following thoughts:

Using a custom algorithm in this way is very dangerous: you store the serialized data separately from the dictionary, so there are many points at which we may lose data: rebalancing, serialization errors, node restarts and so on.

I'd suggest the following ways to improve reliability:
- use well-known algorithms (e.g. zstd, deflate, lzma, gzip), which allow us to decompress data in any situation
- store the dictionary inside the page with the data

Also, we have had a lot of discussions [1] [2] about compression at the BinaryObject and BinaryMarshaller level, and Vladimir Ozerov was strictly against compression at this level. If something has changed since then, you may look through [1] [2] [3]. I've done a lot of research on comparing algorithms; it may be useful for you.

[1] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-td10099.html
[2] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-td20679.html
[3] https://issues.apache.org/jira/browse/IGNITE-3592
[4] https://issues.apache.org/jira/browse/IGNITE-5226
[5] https://github.com/daradurvs/ignite-compression

On Sat, Aug 25, 2018 at 2:51 AM Denis Magda <dma...@apache.org> wrote:

> > Currently, the dictionary for decompression is only stored on heap. After restart there's compressed data in the PDS, but there's no dictionary :)
>
> Basically, it means that I've lost my data, right? How about persisting data to disk.
>
> Overall, we need Vladimir Ozerov to check the contribution. He was the one who sponsored the IEP and knows the area best.
>
> --
> Denis
>
> On Fri, Aug 24, 2018 at 4:31 AM Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:
>
> > Hello!
> >
> > It is somewhat a part of IEP-20, since I have updated it with this particular direction.
> >
> > Regards,
> >
> > --
> > Ilya Kasnacheev
> >
> > 2018-08-24 2:56 GMT+03:00 Denis Magda <dma...@apache.org>:
> >
> > > Hi Ilya,
> > >
> > > Sounds terrific! Is this part of the following Ignite enhancement proposal?
> > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-20%3A+Data+Compression+in+Ignite
> > >
> > > --
> > > Denis
> > >
> > > On Thu, Aug 23, 2018 at 5:17 AM Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:
> > >
> > > > Hello!
> > > >
> > > > My plan was to add a compression section to cache configuration, where you can enable compression, enable key compression (which has heavier performance implications), adjust dictionary gathering settings, and in the future possibly choose between algorithms. In fact I'm not sure, since my assumption is that you can always just use latest&greatest, but maybe we can have e.g. very fast and not very strong vs. slower but stronger one.
> > > >
> > > > I'm not sure yet if we should share dictionary between all caches vs. having separate dictionary for every cache.
> > > >
> > > > With regards to data format, of course there will be room for further extension.
> > > >
> > > > Regards,
> > > >
> > > > --
> > > > Ilya Kasnacheev
> > > >
> > > > 2018-08-23 15:13 GMT+03:00 Sergey Kozlov <skoz...@gridgain.com>:
> > > >
> > > > > Hi Ilya
> > > > >
> > > > > Is there a plan to introduce it as an option of Ignite configuration? In that case, instead of a boolean type I suggest using an enum, reserving the ability to extend the set of compression algorithms in the future.
> > > > >
> > > > > On Thu, Aug 23, 2018 at 1:09 PM, Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:
> > > > >
> > > > > > Hello!
> > > > > >
> > > > > > I want to share with the developer community my compression prototype.
> > > > > >
> > > > > > Long story short, it compresses BinaryObject's byte[] as they are written to Durable Memory page, operating on a pre-built dictionary. Typical compression ratio is 0.4 (meaning 2.5x compression) using custom LZW+Huffman. Metadata, indexes and primitive values are unaffected entirely.
> > > > > >
> > > > > > This is akin to DB2's table-level compression[1] but independently invented.
> > > > > >
> > > > > > On Yardstick tests performance hit is -6% with PDS and up to -25% (in throughput) with In-Memory loads. It also means you can fit ~twice as much data into the same IM cluster, or have higher ram/disk ratio with PDS cluster, saving on hardware or decreasing latency.
> > > > > >
> > > > > > The code is available as PR 4295[2] (set IGNITE_ENABLE_COMPRESSION=true to activate). Note that it will not presently survive a PDS node restart.
> > > > > > The impact is very small, the patch should be applicable to most 2.x releases.
> > > > > >
> > > > > > Sure there's a long way before this prototype can have hope of being included, but first I would like to hear input from fellow igniters.
> > > > > >
> > > > > > See also IEP-20[3].
> > > > > >
> > > > > > 1. https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0052331.html
> > > > > > 2. https://github.com/apache/ignite/pull/4295
> > > > > > 3. https://cwiki.apache.org/confluence/display/IGNITE/IEP-20%3A+Data+Compression+in+Ignite
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > --
> > > > > > Ilya Kasnacheev
> > > > >
> > > > > --
> > > > > Sergey Kozlov
> > > > > GridGain Systems
> > > > > www.gridgain.com

--
Best Regards, Vyacheslav D.
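
P.S. To make the "store the dictionary inside the page with the data" suggestion more concrete, here is a minimal sketch. It is not the algorithm from PR 4295 (that one is a custom LZW+Huffman) and not Ignite's page format: it swaps in the JDK's deflate with a preset dictionary as a stand-in for a well-known algorithm, and the class name DictPage, the methods pack/unpack and the byte layout are all hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical "page" layout: [dictLen:int][dict bytes][zlib/deflate stream]. */
final class DictPage {
    /** Compresses raw with a preset dictionary and stores the dictionary next to the data. */
    static byte[] pack(byte[] dict, byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setDictionary(dict);
        deflater.setInput(raw);
        deflater.finish();

        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            compressed.write(buf, 0, n);
        }
        deflater.end();

        byte[] body = compressed.toByteArray();
        return ByteBuffer.allocate(4 + dict.length + body.length)
            .putInt(dict.length)
            .put(dict)
            .put(body)
            .array();
    }

    /** Restores the data from the page alone; no dictionary kept on heap is needed. */
    static byte[] unpack(byte[] page) {
        ByteBuffer in = ByteBuffer.wrap(page);
        byte[] dict = new byte[in.getInt()];
        in.get(dict);

        Inflater inflater = new Inflater();
        inflater.setInput(page, in.position(), in.remaining());

        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                raw.write(buf, 0, n);
                if (inflater.needsDictionary()) {
                    // The stream header says a preset dictionary was used: supply the stored one.
                    inflater.setDictionary(dict);
                } else if (n == 0 && !inflater.finished() && inflater.needsInput()) {
                    throw new IllegalStateException("Truncated page");
                }
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt page", e);
        } finally {
            inflater.end();
        }
        return raw.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up dictionary and payload, only to exercise the round trip.
        byte[] dict = "BinaryObject typeId fieldName ".getBytes(StandardCharsets.UTF_8);
        byte[] raw = "BinaryObject typeId=42 fieldName=city".getBytes(StandardCharsets.UTF_8);
        byte[] page = pack(dict, raw);
        System.out.println(Arrays.equals(raw, unpack(page)));
    }
}
```

The point of the sketch is only that unpack() needs nothing but the page itself, so a node restart or rebalancing cannot separate the data from its dictionary. The obvious cost is that the dictionary is repeated in every page, so in practice it would have to be small or shared per larger unit.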