Vladimir, thank you for the detailed explanation. I think I've understood the main idea of the storage compression you described.
I'll join the new discussion after studying the material you provided and completing the varint optimization [1].

[1] https://issues.apache.org/jira/browse/IGNITE-5097

2017-08-02 15:43 GMT+03:00 Alexey Kuznetsov <akuznet...@apache.org>:

> Vova,
>
> Finally we are back to my initial idea - to look at how "big databases
> compress" data :)
>
> Just to remind how IBM DB2 does this [1].
>
> [1] http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/
>
> On Tue, Aug 1, 2017 at 4:15 PM, Vladimir Ozerov <voze...@gridgain.com>
> wrote:
>
> > Vyacheslav,
> >
> > This is not about my needs, but about the product :-) BinaryObject is a
> > central entity used for both data transfer and data storage. This is both
> > good and bad at the same time.
> >
> > The good thing is that as we optimize the binary protocol, we improve both
> > network and storage performance at the same time. We have at least 3
> > things which will be included into the product soon: varint encoding [1],
> > optimized string encoding [2] and null-field optimization [3]. The bad
> > thing is that the binary object format is not well suited for data storage
> > optimizations, including compression. For example, one good compression
> > technique is to organize data in column-store format, or to introduce a
> > shared "dictionary" with unique values on cache level. In both cases N
> > equal values are not stored N times. Instead, we store one value and N
> > references to it, or so. This way 2x-10x compression is possible depending
> > on workload type. Binary object protocol with some compression on top of
> > it cannot give such improvement, because it will compress data in
> > individual objects, instead of compressing the whole cache data in a
> > single context.
> >
> > That said, I propose to give up adding compression to BinaryObject. This
> > is a dead end. Instead, we should:
> > 1) Optimize the protocol itself to be more compact, as described in the
> > aforementioned Ignite tickets
> > 2) Start a new discussion about storage compression
> >
> > You can read papers of other vendors to get a better understanding of
> > possible compression options. E.g. Oracle has a lot of compression
> > techniques, including heat maps, background compression, per-block
> > compression, data dictionaries, etc. [4].
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-5097
> > [2] https://issues.apache.org/jira/browse/IGNITE-5655
> > [3] https://issues.apache.org/jira/browse/IGNITE-3939
> > [4] http://www.oracle.com/technetwork/database/options/compression/advanced-compression-wp-12c-1896128.pdf
> >
> > Vladimir.
>
> --
> Alexey Kuznetsov

--
Best Regards, Vyacheslav D.
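
P.S. To check my understanding of the shared "dictionary" idea from Vladimir's message (store one value and N references to it), here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names and the sample data are hypothetical, and nothing here reflects Ignite's actual storage format or API.

```python
def dictionary_encode(values):
    """Store each distinct value once; replace every occurrence with an index."""
    dictionary = []   # distinct values, in first-seen order
    index_of = {}     # value -> its position in `dictionary`
    refs = []         # one small integer reference per original value
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        refs.append(index_of[value])
    return dictionary, refs


def dictionary_decode(dictionary, refs):
    """Rebuild the original sequence from the dictionary and the references."""
    return [dictionary[i] for i in refs]


# Hypothetical column of field values taken from many cache entries.
cities = ["Moscow", "London", "Moscow", "Moscow", "London", "Paris"]
dictionary, refs = dictionary_encode(cities)
restored = dictionary_decode(dictionary, refs)
```

The saving comes from the dictionary being shared across all entries of the cache, which is why compressing each object on its own cannot achieve the same effect.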