Hi, should I close the initial ticket [1] as "Won't Fix" and add a link to the new discussion about storage compression [2] in the comments?
[1] https://issues.apache.org/jira/browse/IGNITE-3592
[2] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-td20679.html

2017-08-09 23:05 GMT+03:00 Vyacheslav Daradur <daradu...@gmail.com>:

> Vladimir, thank you for the detailed explanation.
>
> I think I've understood the main idea of the described storage compression.
>
> I'll join the new discussion after studying the given material and
> completing the varint optimization [1].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-5097
>
> 2017-08-02 15:43 GMT+03:00 Alexey Kuznetsov <akuznet...@apache.org>:
>
>> Vova,
>>
>> Finally we are back to my initial idea - to look at how "big databases
>> compress" data :)
>>
>> Just a reminder of how IBM DB2 does this [1].
>>
>> [1] http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/
>>
>> On Tue, Aug 1, 2017 at 4:15 PM, Vladimir Ozerov <voze...@gridgain.com>
>> wrote:
>>
>> > Vyacheslav,
>> >
>> > This is not about my needs, but about the product :-) BinaryObject is a
>> > central entity used for both data transfer and data storage. This is
>> > both good and bad at the same time.
>> >
>> > The good thing is that as we optimize the binary protocol, we improve
>> > both network and storage performance at the same time. We have at least
>> > 3 things which will be included in the product soon: varint encoding [1],
>> > optimized string encoding [2] and null-field optimization [3]. The bad
>> > thing is that the binary object format is not well suited for data
>> > storage optimizations, including compression. For example, one good
>> > compression technique is to organize data in column-store format, or to
>> > introduce a shared "dictionary" of unique values at the cache level. In
>> > both cases N equal values are not stored N times. Instead, we store one
>> > value and N references to it, or so. This way 2x-10x compression is
>> > possible depending on workload type. The binary object protocol with
>> > some compression on top of it cannot give such an improvement, because
>> > it will compress data in individual objects instead of compressing the
>> > whole cache data in a single context.
>> >
>> > That said, I propose to give up on adding compression to BinaryObject.
>> > This is a dead end. Instead, we should:
>> > 1) Optimize the protocol itself to be more compact, as described in the
>> > aforementioned Ignite tickets
>> > 2) Start a new discussion about storage compression
>> >
>> > You can read papers from other vendors to get a better understanding of
>> > possible compression options. E.g. Oracle has a lot of compression
>> > techniques, including heat maps, background compression, per-block
>> > compression, data dictionaries, etc. [4].
>> >
>> > [1] https://issues.apache.org/jira/browse/IGNITE-5097
>> > [2] https://issues.apache.org/jira/browse/IGNITE-5655
>> > [3] https://issues.apache.org/jira/browse/IGNITE-3939
>> > [4] http://www.oracle.com/technetwork/database/options/compression/advanced-compression-wp-12c-1896128.pdf
>> >
>> > Vladimir.
>>
>> --
>> Alexey Kuznetsov
>
> --
> Best Regards, Vyacheslav D.

--
Best Regards, Vyacheslav D.
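
P.S. For anyone reading the archive later, here is a rough sketch of the shared "dictionary" idea from Vladimir's message: each distinct value is stored once and every occurrence becomes a small integer reference. This is only an illustration in plain Python; the function names `dictionary_encode` and `dictionary_decode` and the sample data are made up for this example and are not part of Ignite or of any proposed design.

```python
def dictionary_encode(values):
    """Store each distinct value once and return (dictionary, refs).

    `dictionary` holds the distinct values in first-seen order;
    `refs` holds one integer index into `dictionary` per input value.
    """
    dictionary = []   # distinct values, stored once
    index_of = {}     # value -> its position in `dictionary`
    refs = []         # N references instead of N copies of the value
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        refs.append(index_of[value])
    return dictionary, refs


def dictionary_decode(dictionary, refs):
    """Rebuild the original sequence from the dictionary and references."""
    return [dictionary[i] for i in refs]


# Hypothetical column of one field taken across many cache entries.
column = ["Moscow", "London", "Moscow", "Moscow", "London"]
dictionary, refs = dictionary_encode(column)
restored = dictionary_decode(dictionary, refs)
```

The saving comes from sharing one dictionary across the whole cache (or column), which is exactly what compressing each BinaryObject in isolation cannot do. How much it helps depends on how often values repeat.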