FYI, I created an issue for Ignite 2.0: https://issues.apache.org/jira/browse/IGNITE-3592
Thanks!

On Wed, Jul 27, 2016 at 2:36 PM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:

> Nikita,
>
> I agree with Andrey, HANA is a bad comparison to Ignite in this respect. I
> did not find any evidence on the internet that their row store is very
> efficient with compression. It was always about the column store.
>
> Alexey,
>
> As for DB2, can you check what exactly, when, and how it compresses, and
> whether it gives any decent results, before suggesting it as an example to
> follow? I don't think it is a good idea to repeat every bad idea after
> other products.
>
> And even if there are good results in DB2, will all of this be applicable
> to Ignite? PostgreSQL, for example, provides TOAST compression, and that
> can be useful when used in a smart way, but it is a very different
> architecture from what we have.
>
> All in all, I agree that maybe we should provide some kind of pluggable
> compression SPI support, but do not expect much from it; usually it will
> just be useless.
>
> Sergi
>
> 2016-07-27 10:16 GMT+03:00 Sebastien DIAZ <sebastien.d...@gmail.com>:
>
> > Hi
> >
> > I add Redis as a sample of a memory compression strategy:
> >
> > http://labs.octivi.com/how-we-cut-down-memory-usage-by-82/
> >
> > http://redis.io/topics/memory-optimization
> >
> > Regards
> >
> > S DIAZ
> >
> > 2016-07-27 8:24 GMT+02:00 Alexey Kuznetsov <akuznet...@gridgain.com>:
> >
> > > Nikita,
> > >
> > > That was my intention: "we may need to provide a better facility to
> > > inject user's logic here..."
> > >
> > > Andrey,
> > > About compression, once again - DB2 is a row-based DB and they can
> > > compress :)
> > >
> > > On Wed, Jul 27, 2016 at 12:56 PM, Nikita Ivanov <nivano...@gmail.com>
> > > wrote:
> > >
> > > > Very good points indeed. I get the compression-in-Ignite question
> > > > quite often, and the HANA reference is a typical lead-in.
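[Editor's note: the pluggable compression SPI that Sergi mentions above might, as a rough sketch, be a pair of byte-array hooks applied around value (de)serialization. `CompressionSpi` and the GZIP implementation below are hypothetical names invented for illustration, not an actual Ignite API:]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical SPI: the user plugs in compress/decompress hooks that
// would be applied to serialized cache values.
interface CompressionSpi {
    byte[] compress(byte[] data) throws IOException;
    byte[] decompress(byte[] data) throws IOException;
}

// One possible user-supplied implementation, backed by java.util.zip.
class GzipCompressionSpi implements CompressionSpi {
    @Override public byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    @Override public byte[] decompress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            for (int n; (n = gz.read(buf)) > 0; )
                bos.write(buf, 0, n);
        }
        return bos.toByteArray();
    }
}
```

[Whether such hooks pay off depends entirely on the data, which is consistent with Sergi's caveat that a generic SPI will often be useless.]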
> > > > My personal opinion is still that in Ignite *specifically*,
> > > > compression is best left to the end user. But we may need to provide
> > > > a better facility to inject the user's logic here...
> > > >
> > > > --
> > > > Nikita Ivanov
> > > >
> > > > On Tue, Jul 26, 2016 at 9:53 PM, Andrey Kornev
> > > > <andrewkor...@hotmail.com> wrote:
> > > >
> > > > > Dictionary compression requires some knowledge about the data
> > > > > being compressed. For example, for numeric types a range of
> > > > > values must be known so that the dictionary can be generated. For
> > > > > strings, the number of unique values in the column is the key
> > > > > piece of input into the dictionary generation.
> > > > >
> > > > > SAP HANA is a column-based database system: it stores the fields
> > > > > of the data tuple individually, using the best compression for
> > > > > the given data type and the particular set of values. HANA has
> > > > > been specifically built as a general-purpose database, rather
> > > > > than as an afterthought layer on top of an already existing
> > > > > distributed cache.
> > > > >
> > > > > On the other hand, Ignite is a distributed cache implementation
> > > > > (a pretty good one!) that in general requires no schema and
> > > > > stores its data in a row-based fashion. Its current design
> > > > > doesn't lend itself readily to the kind of optimizations HANA
> > > > > provides out of the box.
> > > > >
> > > > > For the curious types among us, the implementation details of
> > > > > HANA are well documented in "In-Memory Data Management" by Hasso
> > > > > Plattner & Alexander Zeier.
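[Editor's note: the dictionary encoding Andrey describes can be shown with a toy example (illustrative only, not HANA's or Ignite's code): collect the distinct values of a column, assign each a small integer code, and store the codes instead of the original strings. The dictionary has to be built from the data first, which is exactly the prior knowledge Andrey points out.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy dictionary encoder for one string column. Low-cardinality columns
// compress well because each repeated string shrinks to a small code.
class StringDictionary {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    // Assigns the next free code the first time a value is seen.
    int encode(String s) {
        return codes.computeIfAbsent(s, k -> {
            values.add(k);
            return values.size() - 1;
        });
    }

    // Restores the original string from its code.
    String decode(int code) {
        return values.get(code);
    }

    // Number of distinct values - the cardinality that drives the win.
    int size() {
        return values.size();
    }
}
```

[Storing a 4-byte (or smaller) code per row instead of the string is where the space saving comes from; the dictionary itself is shared across all rows.]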
> > > > > Cheers
> > > > > Andrey
> > > > >
> > > > > _____________________________
> > > > > From: Alexey Kuznetsov <akuznet...@gridgain.com>
> > > > > Sent: Tuesday, July 26, 2016 5:36 AM
> > > > > Subject: Re: Data compression in Ignite 2.0
> > > > > To: <dev@ignite.apache.org>
> > > > >
> > > > > Sergey Kozlov wrote:
> > > > > >> For approach 1: Putting a large object into a partitioned
> > > > > >> cache will force an update of the dictionary placed in a
> > > > > >> replicated cache. It may be a time-expensive operation.
> > > > > The dictionary will be built only once. And we could control what
> > > > > should be put into the dictionary; for example, we could check
> > > > > the min and max size and decide whether to put a value into the
> > > > > dictionary or not.
> > > > >
> > > > > >> Approaches 2-3 make sense for rare cases, as Sergi commented.
> > > > > But it is better to at least have the possibility to plug in user
> > > > > code for compression than not to have it at all.
> > > > >
> > > > > >> Also I see a danger of OOM if we've got a high compression
> > > > > >> level and try to restore the original value in memory.
> > > > > We could easily get OOM with many other operations right now,
> > > > > without compression. I think it is not an issue; we could add a
> > > > > NOTE to the documentation about this possibility.
> > > > >
> > > > > Andrey Kornev wrote:
> > > > > >> ... in general I think compression is a great idea. The
> > > > > >> cleanest way to achieve that would be to just make it possible
> > > > > >> to chain the marshallers...
> > > > > I think it is also a good idea. And it looks like it could be
> > > > > used for compression with some sort of ZIP algorithm, but how do
> > > > > we deal with compression by dictionary substitution? We need to
> > > > > build the dictionary first. Any ideas?
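[Editor's note: the marshaller chaining that Andrey and Alexey discuss could be sketched as a decorator that compresses whatever bytes an inner marshaller produces. The `ByteMarshaller` interface here is a simplification invented for this example; Ignite's real `Marshaller` API has a different shape:]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Simplified marshaller contract, invented for this sketch.
interface ByteMarshaller {
    byte[] marshal(Object obj) throws Exception;
    <T> T unmarshal(byte[] bytes) throws Exception;
}

// Plain JDK serialization as the inner link of the chain.
class JdkMarshaller implements ByteMarshaller {
    @Override public byte[] marshal(Object obj) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    @Override public <T> T unmarshal(byte[] bytes) throws Exception {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        }
    }
}

// Decorator that DEFLATE-compresses the inner marshaller's output.
class CompressingMarshaller implements ByteMarshaller {
    private final ByteMarshaller delegate;

    CompressingMarshaller(ByteMarshaller delegate) {
        this.delegate = delegate;
    }

    @Override public byte[] marshal(Object obj) throws Exception {
        byte[] raw = delegate.marshal(obj);
        Deflater def = new Deflater();
        def.setInput(raw);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!def.finished())
            out.write(buf, 0, def.deflate(buf));
        def.end();
        return out.toByteArray();
    }

    @Override public <T> T unmarshal(byte[] bytes) throws Exception {
        Inflater inf = new Inflater();
        inf.setInput(bytes);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inf.finished())
            out.write(buf, 0, inf.inflate(buf));
        inf.end();
        return delegate.unmarshal(out.toByteArray());
    }
}
```

[Chaining handles generic ZIP-style compression cleanly; dictionary substitution fits this pattern less well, since the dictionary must be agreed on cluster-wide before any value can be encoded, which is exactly the open question in the message above.]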
> > > > > Nikita Ivanov wrote:
> > > > > >> SAP HANA does the compression by 1) compressing SQL parameters
> > > > > >> before execution...
> > > > > Looks interesting, but my initial point was about compression of
> > > > > cache data, not SQL queries. My idea was to make compression
> > > > > transparent to the SQL engine when it looks up the data.
> > > > >
> > > > > But the idea of compressing SQL query results looks very
> > > > > interesting, because it is a known fact that the SQL engine can
> > > > > consume quite a lot of heap for storing result sets. I think this
> > > > > should be discussed in a separate thread.
> > > > >
> > > > > Just for your information, in my first message I mentioned that
> > > > > DB2 has compression by dictionary and, according to them, it is
> > > > > possible to compress typical data by 50-80%. I have some
> > > > > experience with DB2 and can confirm this.
> > > > >
> > > > > --
> > > > > Alexey Kuznetsov
> > >
> > > --
> > > Alexey Kuznetsov

--
Alexey Kuznetsov
GridGain Systems
www.gridgain.com