Very good points indeed. I get the compression-in-Ignite question quite
often, and the HANA reference is a typical lead-in.

My personal opinion is still that in Ignite *specifically* compression is
best left to the end user. But we may need to provide a better facility for
injecting the user's own logic here...
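
For illustration only, a minimal sketch of what such a pluggable hook could
look like (BytesCompressor and GzipCompressor are hypothetical names, not an
existing Ignite API; the GZIP streams are from the JDK):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    /** Hypothetical user-pluggable compression hook. */
    public interface BytesCompressor {
        byte[] compress(byte[] data) throws IOException;
        byte[] decompress(byte[] data) throws IOException;
    }

    /** Sample implementation backed by the JDK's GZIP streams. */
    class GzipCompressor implements BytesCompressor {
        @Override public byte[] compress(byte[] data) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(data);
            }
            return bos.toByteArray();
        }

        @Override public byte[] decompress(byte[] data) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
                byte[] buf = new byte[4096];
                for (int n; (n = gz.read(buf)) != -1; )
                    bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        }
    }

The idea would be that Ignite calls compress() on the serialized value
before storing it and decompress() on read, with the actual codec supplied
by the user.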

--
Nikita Ivanov


On Tue, Jul 26, 2016 at 9:53 PM, Andrey Kornev <andrewkor...@hotmail.com>
wrote:

> Dictionary compression requires some knowledge about data being
> compressed. For example, for numeric types a range of values must be known
> so that the dictionary can be generated. For strings, the number of unique
> values of the column is the key piece of input into the dictionary
> generation.
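>
> To make this concrete, here is a rough, purely illustrative sketch of
> dictionary encoding for a low-cardinality string column (all names are
> invented for the example):
>
>     import java.util.ArrayList;
>     import java.util.HashMap;
>     import java.util.List;
>     import java.util.Map;
>
>     /** Illustrative dictionary encoder for a low-cardinality string column. */
>     public class StringDictionary {
>         private final Map<String, Integer> codes = new HashMap<>();
>         private final List<String> values = new ArrayList<>();
>
>         /** Returns the small integer code for a value, assigning one if new. */
>         public int encode(String value) {
>             Integer code = codes.get(value);
>             if (code == null) {
>                 code = values.size();
>                 codes.put(value, code);
>                 values.add(value);
>             }
>             return code;
>         }
>
>         /** Restores the original value from its code. */
>         public String decode(int code) {
>             return values.get(code);
>         }
>     }
>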
> SAP HANA is a column-based database system: it stores the fields of the
> data tuple individually, using the best compression for the given data type
> and the particular set of values. HANA was specifically built as a
> general-purpose database, rather than as an afterthought layer on top of an
> already existing distributed cache.
> On the other hand, Ignite is a distributed cache implementation (a pretty
> good one!) that in general requires no schema and stores its data in a
> row-based fashion. Its current design doesn't lend itself readily to the
> kind of optimizations HANA provides out of the box.
> For the curious among us, the implementation details of HANA are well
> documented in "In-Memory Data Management" by Hasso Plattner & Alexander
> Zeier.
> Cheers
> Andrey
> _____________________________
> From: Alexey Kuznetsov <akuznet...@gridgain.com>
> Sent: Tuesday, July 26, 2016 5:36 AM
> Subject: Re: Data compression in Ignite 2.0
> To: <dev@ignite.apache.org>
>
>
> Sergey Kozlov wrote:
> >> For approach 1: putting a large object into a partitioned cache will
> force an update of the dictionary placed in a replicated cache. It may be
> a time-expensive operation.
> The dictionary will be built only once. And we could control what should
> be put into the dictionary; for example, we could check the min and max
> size and decide whether or not to put the value into the dictionary.
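>
> A rough illustration of that size check (the method name and thresholds
> are invented for the example):
>
>     // Decide whether a serialized value should go through the dictionary;
>     // the min/max thresholds would come from cache configuration.
>     static boolean shouldDictionaryEncode(byte[] serializedVal, int minSize, int maxSize) {
>         return serializedVal.length >= minSize && serializedVal.length <= maxSize;
>     }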
>
> >> Approaches 2-3 make sense only for rare cases, as Sergi commented.
> But it is better to at least have the possibility to plug in user code for
> compression than not to have it at all.
>
> >> Also I see a danger of OOM if we've got a high compression level and
> try to restore the original value in memory.
> We can easily get OOM with many other operations right now, even without
> compression, so I think it is not an issue; we could add a NOTE to the
> documentation about this possibility.
>
> Andrey Kornev wrote:
> >> ... in general I think compression is a great idea. The cleanest way to
> achieve that would be to just make it possible to chain the marshallers...
> I think it is a good idea too. And it looks like it could be used for
> compression with some sort of ZIP algorithm, but how do we deal with
> compression by dictionary substitution?
> We need to build the dictionary first. Any ideas?
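>
> To make the "chained marshallers" idea concrete, a sketch (the interface
> below is deliberately simplified and is NOT the exact Ignite Marshaller
> contract):
>
>     import java.io.IOException;
>
>     /** Simplified marshaller contract, for illustration only. */
>     public interface SimpleMarshaller {
>         byte[] marshal(Object obj) throws IOException;
>         <T> T unmarshal(byte[] bytes) throws IOException;
>     }
>
>     /** Decorator that compresses whatever the delegate marshaller produces. */
>     class CompressingMarshaller implements SimpleMarshaller {
>         private final SimpleMarshaller delegate;
>         private final BytesCompressor compressor; // e.g. the GZIP sketch earlier in this thread
>
>         CompressingMarshaller(SimpleMarshaller delegate, BytesCompressor compressor) {
>             this.delegate = delegate;
>             this.compressor = compressor;
>         }
>
>         @Override public byte[] marshal(Object obj) throws IOException {
>             return compressor.compress(delegate.marshal(obj));
>         }
>
>         @Override public <T> T unmarshal(byte[] bytes) throws IOException {
>             return delegate.unmarshal(compressor.decompress(bytes));
>         }
>     }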
>
> Nikita Ivanov wrote:
> >> SAP Hana does the compression by 1) compressing SQL parameters before
> execution...
> Looks interesting, but my initial point was about compression of cache
> data, not SQL queries.
> My idea was to make compression transparent to the SQL engine when it
> looks up data.
>
> But the idea of compressing SQL query results looks very interesting,
> because it is a known fact that the SQL engine can consume quite a lot of
> heap for storing result sets.
> I think this should be discussed in a separate thread.
>
> Just for your information: in my first message I mentioned that DB2 has
> dictionary compression, and according to them it is possible to compress
> typical data by 50-80%.
> I have some experience with DB2 and can confirm this.
>
> --
> Alexey Kuznetsov
>