Nikita,

I agree with Andrey, HANA is a bad comparison for Ignite in this respect. I
did not find any evidence on the internet that their row store is very
efficient with compression; it was always about the column store.

Alexey,

As for DB2, can you check what exactly it compresses, when and how, and
whether it gives any decent results, before suggesting it as an example to
follow? I don't think it is a good idea to repeat every bad idea from other
products.

And even if DB2 shows good results, will any of this be applicable to
Ignite? PostgreSQL, for example, provides TOAST compression, which can be
useful when used in a smart way, but that is a very different architecture
from what we have.

All in all, I agree that maybe we should provide some kind of pluggable
compression SPI, but do not expect much from it: in most cases it will be
just useless.
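
If we do add it, the SPI surface can stay tiny. A rough sketch (the names
below are a strawman, nothing final) that operates on already-marshalled
bytes, so it stays agnostic of the data model:

    public interface CompressionSpi {
        // Called after marshalling, before the bytes are stored.
        byte[] compress(byte[] marshalled);

        // Called after loading, before unmarshalling.
        byte[] decompress(byte[] compressed);
    }

A no-op implementation would be the default; users who actually know their
data can plug in whatever pays off for them.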

Sergi



2016-07-27 10:16 GMT+03:00 Sebastien DIAZ <sebastien.d...@gmail.com>:

> Hi
>
> I would add Redis as an example of a memory compression strategy:
>
> http://labs.octivi.com/how-we-cut-down-memory-usage-by-82/
>
> http://redis.io/topics/memory-optimization
>
> Regards
>
> S DIAZ
>
>
>
> 2016-07-27 8:24 GMT+02:00 Alexey Kuznetsov <akuznet...@gridgain.com>:
>
> > Nikita,
> >
> > That was my intention: "we may need to provide a better facility to
> > inject user's logic here..."
> >
> > Andrey,
> > About compression, once again: DB2 is a row-based DB and it can
> > compress :)
> >
> > On Wed, Jul 27, 2016 at 12:56 PM, Nikita Ivanov <nivano...@gmail.com>
> > wrote:
> >
> > > Very good points indeed. I get the compression-in-Ignite question
> > > quite often, and the HANA reference is a typical lead-in.
> > >
> > > My personal opinion is still that in Ignite *specifically*
> > > compression is best left to the end user. But we may need to provide a
> > > better facility to inject the user's logic here...
> > >
> > > --
> > > Nikita Ivanov
> > >
> > >
> > > On Tue, Jul 26, 2016 at 9:53 PM, Andrey Kornev
> > > <andrewkor...@hotmail.com> wrote:
> > >
> > > > Dictionary compression requires some knowledge about the data being
> > > > compressed. For example, for numeric types the range of values must
> > > > be known so that the dictionary can be generated. For strings, the
> > > > number of unique values in the column is the key piece of input into
> > > > the dictionary generation.
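> > > >
> > > > To illustrate (a hypothetical sketch, not HANA's actual code): for a
> > > > string column, dictionary encoding boils down to assigning each
> > > > distinct value a small integer code and storing the codes instead:
> > > >
> > > >     import java.util.HashMap;
> > > >     import java.util.Map;
> > > >
> > > >     class StringDictionaryEncoder {
> > > >         private final Map<String, Integer> dict = new HashMap<>();
> > > >
> > > >         // Each distinct value gets the next integer code; the column
> > > >         // is then stored as an int array instead of strings.
> > > >         int[] encode(String[] column) {
> > > >             int[] codes = new int[column.length];
> > > >             for (int i = 0; i < column.length; i++) {
> > > >                 Integer code = dict.get(column[i]);
> > > >                 if (code == null) {
> > > >                     code = dict.size();
> > > >                     dict.put(column[i], code);
> > > >                 }
> > > >                 codes[i] = code;
> > > >             }
> > > >             return codes;
> > > >         }
> > > >     }
> > > >
> > > > The fewer distinct values, the narrower the codes can be, which is
> > > > exactly why the unique-value count is the key input.
> > > >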
> > > > SAP HANA is a column-based database system: it stores the fields of
> > > > the data tuple individually, using the best compression for the given
> > > > data type and the particular set of values. HANA has been
> > > > specifically built as a general-purpose database, rather than as an
> > > > afterthought layer on top of an already existing distributed cache.
> > > > On the other hand, Ignite is a distributed cache implementation (a
> > > > pretty good one!) that in general requires no schema and stores its
> > > > data in a row-based fashion. Its current design doesn't lend itself
> > > > readily to the kind of optimizations HANA provides out of the box.
> > > > For the curious types among us, the implementation details of HANA
> > > > are well documented in "In-Memory Data Management" by Hasso Plattner
> > > > and Alexander Zeier.
> > > > Cheers
> > > > Andrey
> > > > _____________________________
> > > > From: Alexey Kuznetsov <akuznet...@gridgain.com>
> > > > Sent: Tuesday, July 26, 2016 5:36 AM
> > > > Subject: Re: Data compression in Ignite 2.0
> > > > To: <dev@ignite.apache.org>
> > > >
> > > >
> > > > Sergey Kozlov wrote:
> > > > >> For approach 1: Putting a large object into a partitioned cache
> > > > will force an update of the dictionary placed in a replicated cache.
> > > > It may be a time-expensive operation.
> > > > The dictionary will be built only once. And we could control what
> > > > should be put into the dictionary; for example, we could check min
> > > > and max sizes and decide whether to put a value into the dictionary
> > > > or not.
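> > > >
> > > > Something along these lines (just a sketch; the thresholds are
> > > > made-up placeholders, not proposed values):
> > > >
> > > >     class DictionaryFilter {
> > > >         // Made-up placeholder thresholds, not proposed values.
> > > >         private static final int MIN_SIZE = 64;        // too small to pay off
> > > >         private static final int MAX_SIZE = 64 * 1024; // too big to replicate
> > > >
> > > >         boolean shouldAddToDictionary(byte[] marshalledVal) {
> > > >             return marshalledVal.length >= MIN_SIZE
> > > >                 && marshalledVal.length <= MAX_SIZE;
> > > >         }
> > > >     }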
> > > >
> > > > >> Approaches 2-3 make sense only for rare cases, as Sergi
> > > > commented.
> > > > But it is better to at least have a possibility to plug in user code
> > > > for compression than not to have it at all.
> > > >
> > > > >> Also I see a danger of OOM if we've got a high compression ratio
> > > > and try to restore the original value in memory.
> > > > We could easily get OOM with many other operations right now, even
> > > > without compression. I think it is not an issue; we could add a NOTE
> > > > to the documentation about such a possibility.
> > > >
> > > > Andrey Kornev wrote:
> > > > >> ... in general I think compression is a great idea. The cleanest
> > > > way to achieve that would be to just make it possible to chain the
> > > > marshallers...
> > > > I think it is also a good idea. It looks like it could be used for
> > > > compression with some sort of ZIP algorithm, but how do we deal with
> > > > compression by dictionary substitution? We need to build the
> > > > dictionary first. Any ideas?
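> > > >
> > > > For the ZIP part the chaining itself is simple. A rough sketch with
> > > > a hypothetical one-method marshaller interface (the real Ignite
> > > > Marshaller API is wider than this):
> > > >
> > > >     import java.io.ByteArrayOutputStream;
> > > >     import java.io.IOException;
> > > >     import java.util.zip.GZIPOutputStream;
> > > >
> > > >     interface ByteMarshaller {
> > > >         byte[] marshal(Object obj) throws IOException;
> > > >     }
> > > >
> > > >     // Wraps any other marshaller and GZIPs its output.
> > > >     class GzipMarshaller implements ByteMarshaller {
> > > >         private final ByteMarshaller delegate;
> > > >
> > > >         GzipMarshaller(ByteMarshaller delegate) { this.delegate = delegate; }
> > > >
> > > >         @Override public byte[] marshal(Object obj) throws IOException {
> > > >             ByteArrayOutputStream bos = new ByteArrayOutputStream();
> > > >             try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
> > > >                 gz.write(delegate.marshal(obj));
> > > >             }
> > > >             return bos.toByteArray();
> > > >         }
> > > >     }
> > > >
> > > > Dictionary substitution doesn't fit this pattern as cleanly, since
> > > > the dictionary is shared state that has to exist before the
> > > > marshaller runs.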
> > > >
> > > > Nikita Ivanov wrote:
> > > > >> SAP Hana does the compression by 1) compressing SQL parameters
> > > > before execution...
> > > > Looks interesting, but my initial point was about compression of
> > > > cache data, not SQL queries.
> > > > My idea was to make compression transparent to the SQL engine when it
> > > > looks up data.
> > > >
> > > > But the idea of compressing SQL query results looks very
> > > > interesting, because it is a known fact that the SQL engine can
> > > > consume quite a lot of heap for storing result sets.
> > > > I think this should be discussed in a separate thread.
> > > >
> > > > Just for your information: in my first message I mentioned that DB2
> > > > has dictionary compression, and according to them it is possible to
> > > > compress typical data by 50-80%.
> > > > I have some experience with DB2 and can confirm this.
> > > >
> > > > --
> > > > Alexey Kuznetsov
> > >
> >
> >
> > --
> > Alexey Kuznetsov
> >
>
