FYI, I created an issue for Ignite 2.0: https://issues.apache.org/jira/browse/IGNITE-3592
Thanks!

On Wed, Jul 27, 2016 at 2:36 PM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:

> Nikita,
>
> I agree with Andrey, HANA is a bad comparison to Ignite in this respect. I
> did not find any evidence on the internet that their row store is very
> efficient with compression. It was always about the column store.
>
> Alexey,
>
> As for DB2, can you check what exactly, when, and how it compresses, and
> whether it gives any decent results, before suggesting it as an example to
> follow? I don't think it is a good idea to repeat every bad idea after
> other products.
>
> And even if there are good results in DB2, will all of this be applicable
> to Ignite? PostgreSQL, for example, provides TOAST compression, and that
> can be useful when used in a smart way, but it is a very different
> architecture from what we have.
>
> All in all, I agree that maybe we should provide some kind of pluggable
> compression SPI support, but do not expect much from it; usually it will
> just be useless.
>
> Sergi
>
> 2016-07-27 10:16 GMT+03:00 Sebastien DIAZ <sebastien.d...@gmail.com>:
>
> > Hi
> >
> > I add Redis as a sample of a memory compression strategy:
> >
> > http://labs.octivi.com/how-we-cut-down-memory-usage-by-82/
> >
> > http://redis.io/topics/memory-optimization
> >
> > Regards
> >
> > S DIAZ
> >
> > 2016-07-27 8:24 GMT+02:00 Alexey Kuznetsov <akuznet...@gridgain.com>:
> >
> > > Nikita,
> > >
> > > That was my intention: "we may need to provide a better facility to
> > > inject user's logic here..."
> > >
> > > Andrey,
> > > About compression, once again - DB2 is a row-based DB and they can
> > > compress :)
> > >
> > > On Wed, Jul 27, 2016 at 12:56 PM, Nikita Ivanov <nivano...@gmail.com>
> > > wrote:
> > >
> > > > Very good points indeed. I get the compression-in-Ignite question
> > > > quite often, and the HANA reference is a typical lead-in.
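[Editor's note: the pluggable compression SPI that Sergi mentions above might, as a rough sketch, be a pair of byte-array hooks applied around value (de)serialization. `CompressionSpi` and the GZIP implementation below are hypothetical names invented for illustration, not an actual Ignite API:]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical SPI: the user plugs in compress/decompress hooks that
// would be applied to serialized cache values.
interface CompressionSpi {
    byte[] compress(byte[] data) throws IOException;
    byte[] decompress(byte[] data) throws IOException;
}

// One possible user-supplied implementation, backed by java.util.zip.
class GzipCompressionSpi implements CompressionSpi {
    @Override public byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    @Override public byte[] decompress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            for (int n; (n = gz.read(buf)) > 0; )
                bos.write(buf, 0, n);
        }
        return bos.toByteArray();
    }
}
```

[Whether such hooks pay off depends entirely on the data, which is consistent with Sergi's caveat that a generic SPI will often be useless.]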
> > > > My personal opinion is still that in Ignite *specifically*,
> > > > compression is best left to the end user. But we may need to provide
> > > > a better facility to inject the user's logic here...
> > > >
> > > > --
> > > > Nikita Ivanov
> > > >
> > > > On Tue, Jul 26, 2016 at 9:53 PM, Andrey Kornev
> > > > <andrewkor...@hotmail.com> wrote:
> > > >
> > > > > Dictionary compression requires some knowledge about the data
> > > > > being compressed. For example, for numeric types a range of
> > > > > values must be known so that the dictionary can be generated. For
> > > > > strings, the number of unique values in the column is the key
> > > > > piece of input into the dictionary generation.
> > > > >
> > > > > SAP HANA is a column-based database system: it stores the fields
> > > > > of the data tuple individually, using the best compression for
> > > > > the given data type and the particular set of values. HANA has
> > > > > been specifically built as a general-purpose database, rather
> > > > > than as an afterthought layer on top of an already existing
> > > > > distributed cache.
> > > > >
> > > > > On the other hand, Ignite is a distributed cache implementation
> > > > > (a pretty good one!) that in general requires no schema and
> > > > > stores its data in a row-based fashion. Its current design
> > > > > doesn't lend itself readily to the kind of optimizations HANA
> > > > > provides out of the box.
> > > > >
> > > > > For the curious types among us, the implementation details of
> > > > > HANA are well documented in "In-Memory Data Management" by Hasso
> > > > > Plattner & Alexander Zeier.
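[Editor's note: the dictionary encoding Andrey describes can be shown with a toy example (illustrative only, not HANA's or Ignite's code): collect the distinct values of a column, assign each a small integer code, and store the codes instead of the original strings. The dictionary has to be built from the data first, which is exactly the prior knowledge Andrey points out.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy dictionary encoder for one string column. Low-cardinality columns
// compress well because each repeated string shrinks to a small code.
class StringDictionary {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    // Assigns the next free code the first time a value is seen.
    int encode(String s) {
        return codes.computeIfAbsent(s, k -> {
            values.add(k);
            return values.size() - 1;
        });
    }

    // Restores the original string from its code.
    String decode(int code) {
        return values.get(code);
    }

    // Number of distinct values - the cardinality that drives the win.
    int size() {
        return values.size();
    }
}
```

[Storing a 4-byte (or smaller) code per row instead of the string is where the space saving comes from; the dictionary itself is shared across all rows.]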
> > > > > Cheers
> > > > > Andrey
> > > > >
> > > > > _____________________________
> > > > > From: Alexey Kuznetsov <akuznet...@gridgain.com>
> > > > > Sent: Tuesday, July 26, 2016 5:36 AM
> > > > > Subject: Re: Data compression in Ignite 2.0
> > > > > To: <dev@ignite.apache.org>
> > > > >
> > > > > Sergey Kozlov wrote:
> > > > > >> For approach 1: Putting a large object into a partitioned
> > > > > >> cache will force an update of the dictionary placed in a
> > > > > >> replicated cache. It may be a time-expensive operation.
> > > > > The dictionary will be built only once. And we could control what
> > > > > should be put into the dictionary; for example, we could check
> > > > > the min and max size and decide whether to put a value into the
> > > > > dictionary or not.
> > > > >
> > > > > >> Approaches 2-3 make sense for rare cases, as Sergi commented.
> > > > > But it is better to at least have the possibility to plug in user
> > > > > code for compression than not to have it at all.
> > > > >
> > > > > >> Also I see a danger of OOM if we've got a high compression
> > > > > >> level and try to restore the original value in memory.
> > > > > We could easily get OOM with many other operations right now,
> > > > > without compression. I think it is not an issue; we could add a
> > > > > NOTE to the documentation about this possibility.
> > > > >
> > > > > Andrey Kornev wrote:
> > > > > >> ... in general I think compression is a great idea. The
> > > > > >> cleanest way to achieve that would be to just make it possible
> > > > > >> to chain the marshallers...
> > > > > I think it is also a good idea. And it looks like it could be
> > > > > used for compression with some sort of ZIP algorithm, but how do
> > > > > we deal with compression by dictionary substitution? We need to
> > > > > build the dictionary first. Any ideas?
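[Editor's note: the marshaller chaining that Andrey and Alexey discuss could be sketched as a decorator that compresses whatever bytes an inner marshaller produces. The `ByteMarshaller` interface here is a simplification invented for this example; Ignite's real `Marshaller` API has a different shape:]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Simplified marshaller contract, invented for this sketch.
interface ByteMarshaller {
    byte[] marshal(Object obj) throws Exception;
    <T> T unmarshal(byte[] bytes) throws Exception;
}

// Plain JDK serialization as the inner link of the chain.
class JdkMarshaller implements ByteMarshaller {
    @Override public byte[] marshal(Object obj) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    @Override public <T> T unmarshal(byte[] bytes) throws Exception {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        }
    }
}

// Decorator that DEFLATE-compresses the inner marshaller's output.
class CompressingMarshaller implements ByteMarshaller {
    private final ByteMarshaller delegate;

    CompressingMarshaller(ByteMarshaller delegate) {
        this.delegate = delegate;
    }

    @Override public byte[] marshal(Object obj) throws Exception {
        byte[] raw = delegate.marshal(obj);
        Deflater def = new Deflater();
        def.setInput(raw);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!def.finished())
            out.write(buf, 0, def.deflate(buf));
        def.end();
        return out.toByteArray();
    }

    @Override public <T> T unmarshal(byte[] bytes) throws Exception {
        Inflater inf = new Inflater();
        inf.setInput(bytes);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inf.finished())
            out.write(buf, 0, inf.inflate(buf));
        inf.end();
        return delegate.unmarshal(out.toByteArray());
    }
}
```

[Chaining handles generic ZIP-style compression cleanly; dictionary substitution fits this pattern less well, since the dictionary must be agreed on cluster-wide before any value can be encoded, which is exactly the open question in the message above.]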
> > > > > Nikita Ivanov wrote:
> > > > > >> SAP HANA does the compression by 1) compressing SQL parameters
> > > > > >> before execution...
> > > > > Looks interesting, but my initial point was about compression of
> > > > > cache data, not SQL queries. My idea was to make compression
> > > > > transparent to the SQL engine when it looks up the data.
> > > > >
> > > > > But the idea of compressing SQL query results looks very
> > > > > interesting, because it is a known fact that the SQL engine can
> > > > > consume quite a lot of heap for storing result sets. I think this
> > > > > should be discussed in a separate thread.
> > > > >
> > > > > Just for your information, in my first message I mentioned that
> > > > > DB2 has compression by dictionary and, according to them, it is
> > > > > possible to compress typical data by 50-80%. I have some
> > > > > experience with DB2 and can confirm this.
> > > > >
> > > > > --
> > > > > Alexey Kuznetsov
> > >
> > > --
> > > Alexey Kuznetsov

--
Alexey Kuznetsov
GridGain Systems
www.gridgain.com