Vova,

Finally we are back to my initial idea: to look at how "big databases" compress data :)
Just as a reminder, here is how IBM DB2 does this [1].

[1] http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/

On Tue, Aug 1, 2017 at 4:15 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:

> Vyacheslav,
>
> This is not about my needs, but about the product :-) BinaryObject is a
> central entity used for both data transfer and data storage. This is both
> good and bad at the same time.
>
> Good thing is that as we optimize binary protocol, we improve both network
> and storage performance at the same time. We have at least 3 things which
> will be included into the product soon: varint encoding [1], optimized
> string encoding [2] and null-field optimization [3]. Bad thing is that
> binary object format is not well suited for data storage optimizations,
> including compression. For example, one good compression technique is to
> organize data in column-store format, or to introduce shared "dictionary"
> with unique values on cache level. In both cases N equal values are not
> stored N times. Instead, we store one value and N references to it, or so.
> This way 2x-10x compression is possible depending on workload type. Binary
> object protocol with some compression on top of it cannot give such
> improvement, because it will compress data in individual objects, instead
> of compressing the whole cache data in a single context.
>
> That said, I propose to give up adding compression to BinaryObject. This is
> a dead end. Instead, we should:
> 1) Optimize protocol itself to be more compact, as described in
> aforementioned Ignite tickets
> 2) Start new discussion about storage compression
>
> You can read papers of other vendors to get better understanding on
> possible compression options. E.g. Oracle has a lot of compression
> techniques, including heat maps, background compression, per-block
> compression, data dictionaries, etc. [4].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-5097
> [2] https://issues.apache.org/jira/browse/IGNITE-5655
> [3] https://issues.apache.org/jira/browse/IGNITE-3939
> [4] http://www.oracle.com/technetwork/database/options/compression/advanced-compression-wp-12c-1896128.pdf
>
> Vladimir.

--
Alexey Kuznetsov
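
A minimal sketch of the shared "dictionary" idea from the quoted message: N equal values are stored once, plus N small references. This is only an illustration in plain Java. The class DictionaryColumn and its methods are hypothetical names, not part of Ignite or of the BinaryObject format, and a real cache-level dictionary would also have to deal with concurrency, eviction and persistence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dictionary-encoded column; not an Ignite API. */
final class DictionaryColumn {
    /** Maps each distinct value to its code (index into 'values'). */
    private final Map<String, Integer> codes = new HashMap<>();

    /** Distinct values, each stored exactly once. */
    private final List<String> values = new ArrayList<>();

    /** One reference (dictionary code) per row. */
    private final List<Integer> refs = new ArrayList<>();

    /** Appends a row, storing the value itself only if it is new. */
    void add(String value) {
        Integer code = codes.get(value);
        if (code == null) {
            code = values.size();
            codes.put(value, code);
            values.add(value);
        }
        refs.add(code);
    }

    /** Resolves the reference of the given row back to its value. */
    String get(int row) {
        return values.get(refs.get(row));
    }

    int rowCount() {
        return refs.size();
    }

    int distinctCount() {
        return values.size();
    }

    public static void main(String[] args) {
        DictionaryColumn city = new DictionaryColumn();
        for (String s : new String[] {"Moscow", "London", "Moscow", "Moscow"}) {
            city.add(s);
        }
        // Four rows, but only two strings are actually stored.
        System.out.println("rows=" + city.rowCount() + " distinct=" + city.distinctCount());
    }
}
```

The saving depends on how often values repeat across the whole cache, which is why a compressor that only sees one object at a time cannot exploit it.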