Hi

I'm adding Redis as an example of a memory compression strategy:

http://labs.octivi.com/how-we-cut-down-memory-usage-by-82/

http://redis.io/topics/memory-optimization
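
For reference, the trick in the first link is to group many small keys into fixed-size Redis hashes so each bucket stays in Redis's compact ziplist encoding. A minimal sketch of the key-splitting scheme in plain Java (the Map below just stands in for Redis; the class name and bucket width are my own illustration, not taken from the articles):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the hash-bucketing trick from the linked article: instead of
 * one top-level key per entry ("user:123456" -> value), entries are grouped
 * into small hashes ("user:1234" -> {field "56": value}) so each bucket can
 * stay in Redis's compact ziplist encoding. Purely illustrative.
 */
public class BucketedKeys {
    // Stand-in for Redis: bucket key -> (field -> value).
    private final Map<String, Map<String, String>> store = new HashMap<>();

    // Splitting off the last 2 digits gives buckets of up to 100 fields,
    // small enough to stay under the default ziplist thresholds.
    // Ids are assumed to be longer than BUCKET_DIGITS characters.
    private static final int BUCKET_DIGITS = 2;

    public void put(String id, String value) {
        String bucket = id.substring(0, id.length() - BUCKET_DIGITS);
        String field = id.substring(id.length() - BUCKET_DIGITS);
        store.computeIfAbsent(bucket, k -> new HashMap<>()).put(field, value);
    }

    public String get(String id) {
        String bucket = id.substring(0, id.length() - BUCKET_DIGITS);
        String field = id.substring(id.length() - BUCKET_DIGITS);
        Map<String, String> h = store.get(bucket);
        return h == null ? null : h.get(field);
    }
}
```

With a real Redis client the bucket would map to HSET/HGET calls; the saving comes from Redis storing many small hashes far more compactly than the same number of top-level keys.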

Regards

S DIAZ



2016-07-27 8:24 GMT+02:00 Alexey Kuznetsov <akuznet...@gridgain.com>:

> Nikita,
>
> That was my intention: "we may need to provide a better facility to inject
> user's logic here..."
>
> Andrey,
> About compression, once again - DB2 is a row-based DB and they can compress
> :)
>
> On Wed, Jul 27, 2016 at 12:56 PM, Nikita Ivanov <nivano...@gmail.com>
> wrote:
>
> > Very good points indeed. I get the compression in Ignite question quite
> > often and the Hana reference is a typical lead-in.
> >
> > My personal opinion is still that in Ignite *specifically* the
> > compression is best left to the end-user. But we may need to provide a
> > better facility to inject user's logic here...
> >
> > --
> > Nikita Ivanov
> >
> >
> > On Tue, Jul 26, 2016 at 9:53 PM, Andrey Kornev <andrewkor...@hotmail.com>
> > wrote:
> >
> > > Dictionary compression requires some knowledge about the data being
> > > compressed. For example, for numeric types a range of values must be
> > > known so that the dictionary can be generated. For strings, the number
> > > of unique values of the column is the key piece of input into the
> > > dictionary generation.
> > >
> > > SAP HANA is a column-based database system: it stores the fields of the
> > > data tuple individually, using the best compression for the given data
> > > type and the particular set of values. HANA has been specifically built
> > > as a general-purpose database, rather than as an afterthought layer on
> > > top of an already existing distributed cache.
> > >
> > > On the other hand, Ignite is a distributed cache implementation (a
> > > pretty good one!) that in general requires no schema and stores its
> > > data in a row-based fashion. Its current design doesn't lend itself
> > > readily to the kind of optimizations HANA provides out of the box.
> > >
> > > For the curious types among us, the implementation details of HANA are
> > > well documented in "In-Memory Data Management" by Hasso Plattner &
> > > Alexander Zeier.
> > >
> > > Cheers
> > > Andrey
> > > _____________________________
> > > From: Alexey Kuznetsov <akuznet...@gridgain.com>
> > > Sent: Tuesday, July 26, 2016 5:36 AM
> > > Subject: Re: Data compression in Ignite 2.0
> > > To: <dev@ignite.apache.org>
> > >
> > >
> > > Sergey Kozlov wrote:
> > > >> For approach 1: putting a large object into a partitioned cache
> > > will force an update of the dictionary placed in a replicated cache.
> > > It may be a time-expensive operation.
> > > The dictionary will be built only once. And we could control what
> > > should be put into the dictionary; for example, we could check min and
> > > max size and decide whether to put a value into the dictionary or not.
> > >
> > > >> Approaches 2-3 make sense only for rare cases, as Sergi commented.
> > > But it is better to at least have the possibility to plug in user code
> > > for compression than not to have it at all.
> > >
> > > >> Also I see a danger of OOM if we've got a high compression level and
> > > try to restore the original value in memory.
> > > We could easily get OOM with many other operations right now, without
> > > compression. I think it is not an issue; we could add a NOTE to the
> > > documentation about this possibility.
> > >
> > > Andrey Kornev wrote:
> > > >> ... in general I think compression is a great idea. The cleanest
> > > way to achieve that would be to just make it possible to chain the
> > > marshallers...
> > > I think it is also a good idea. And it looks like it could be used for
> > > compression with some sort of ZIP algorithm, but how do we deal with
> > > compression by dictionary substitution?
> > > We need to build the dictionary first. Any ideas?
> > >
> > > Nikita Ivanov wrote:
> > > >> SAP Hana does the compression by 1) compressing SQL parameters
> > > before execution...
> > > Looks interesting, but my initial point was about compression of cache
> > > data, not SQL queries.
> > > My idea was to make compression transparent to the SQL engine when it
> > > looks up data.
> > >
> > > But the idea of compressing SQL query results looks very interesting,
> > > because it is a known fact that the SQL engine can consume quite a lot
> > > of heap for storing result sets.
> > > I think this should be discussed in a separate thread.
> > >
> > > Just for your information: in my first message I mentioned that DB2 has
> > > compression by dictionary, and according to them it is possible to
> > > compress typical data by 50-80%.
> > > I have some experience with DB2 and can confirm this.
> > >
> > > --
> > > Alexey Kuznetsov
> >
>
>
> --
> Alexey Kuznetsov
>
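
As a postscript on Alexey's open question above about dictionary substitution ("We need to build the dictionary first. Any ideas?"): one common shape is to grow the dictionary lazily, on first sight of each distinct value. A rough sketch in plain Java, purely illustrative and not an Ignite API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Minimal sketch of dictionary-substitution encoding for a low-cardinality
 * string column: each distinct value is replaced by a small integer code,
 * and the dictionary grows on first sight of a new value. All names here
 * are illustrative.
 */
public class StringDictionary {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    /** Returns the code for a value, adding it to the dictionary if new. */
    public int encode(String value) {
        return codes.computeIfAbsent(value, v -> {
            values.add(v);
            return values.size() - 1;
        });
    }

    /** Restores the original value from its code. */
    public String decode(int code) {
        return values.get(code);
    }

    /** Number of distinct values seen so far. */
    public int size() {
        return values.size();
    }
}
```

The hard part Alexey raises remains open: in a distributed cache the dictionary itself would have to be shared (for example via a replicated cache, as proposed earlier in the thread) and the codes kept stable across nodes.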
