I'm guessing the suggestion here is to use the compressed form directly for 
WHERE clause evaluation. If that's the case, I think there are a couple of 
issues (see the sketch after this list):

1) the LIKE predicate, which needs the decompressed value for pattern matching;

2) predicates other than equality (<, >, etc.), which would only work if the 
encoding were order-preserving.
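
To make the distinction concrete, here is a minimal sketch (plain Java, not 
Ignite code; the dictionary maps and names are purely illustrative) of why 
equality survives dictionary compression while LIKE and range predicates 
force a decode per row:

import java.util.HashMap;
import java.util.Map;

public class CompressedPredicates {
    // Shared dictionary: value -> code and code -> value (illustrative only).
    static final Map<String, Integer> CODES = new HashMap<>();
    static final Map<Integer, String> VALUES = new HashMap<>();

    // Equality: encode the parameter once, compare codes. No decompression.
    static boolean eq(int storedCode, String param) {
        Integer code = CODES.get(param);
        return code != null && code == storedCode;
    }

    // LIKE: codes carry no substring information, so every row must be
    // decoded before the pattern can be tested.
    static boolean like(int storedCode, String sqlPattern) {
        String value = VALUES.get(storedCode);
        String regex = sqlPattern.replace("%", ".*").replace("_", ".");
        return value != null && value.matches(regex);
    }

    // Range (<, >, etc.): valid on codes only if the encoding is
    // order-preserving, which general-purpose compression is not.
    static boolean lessThan(int storedCode, String param) {
        String value = VALUES.get(storedCode);
        return value != null && value.compareTo(param) < 0;
    }
}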


But since Ignite isn't just about SQL queries (surprisingly, some people still 
use it just as a distributed cache!), in general I think compression is a 
great idea. The cleanest way to achieve it would be to make it possible to 
chain marshallers. It is already possible to do this without any Ignite code 
changes, but unfortunately it would force people to use the non-public 
BinaryMarshaller class directly (as the first element of the chain).
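
For illustration, a chained compressing marshaller could look roughly like 
this. This is a sketch against a simplified single-pair contract; the real 
org.apache.ignite.marshaller.Marshaller interface is wider, and 
BinaryMarshaller (the intended first element of the chain) is non-public:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

interface SimpleMarshaller {
    byte[] marshal(Object obj) throws IOException;
    Object unmarshal(byte[] bytes) throws IOException;
}

// Wraps any other marshaller and compresses its output with GZIP.
class CompressingMarshaller implements SimpleMarshaller {
    private final SimpleMarshaller delegate; // e.g. the binary marshaller

    CompressingMarshaller(SimpleMarshaller delegate) {
        this.delegate = delegate;
    }

    @Override public byte[] marshal(Object obj) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buf)) {
            gzip.write(delegate.marshal(obj));
        }
        return buf.toByteArray();
    }

    @Override public Object unmarshal(byte[] bytes) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            for (int n; (n = gzip.read(chunk)) > 0; )
                buf.write(chunk, 0, n);
            return delegate.unmarshal(buf.toByteArray());
        }
    }
}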


Cheers

Andrey

________________________________
From: Dmitriy Setrakyan <dsetrak...@apache.org>
Sent: Monday, July 25, 2016 1:53 PM
To: dev@ignite.apache.org
Subject: Re: Data compression in Ignite 2.0

Nikita, this sounds like a pretty elegant approach.

Does anyone in the community see a problem with this design?

On Mon, Jul 25, 2016 at 4:49 PM, Nikita Ivanov <nivano...@gmail.com> wrote:

> SAP Hana does the compression by 1) compressing SQL parameters before
> execution, and 2) storing only compressed data in memory. This way all SQL
> queries work as normal with zero modifications or performance overhead.
> Only results of the query can be (optionally) decompressed back before
> returning to the user.
>
> --
> Nikita Ivanov
>
>
> On Mon, Jul 25, 2016 at 1:40 PM, Sergey Kozlov <skoz...@gridgain.com>
> wrote:
>
> > Hi
> >
> > For approach 1: putting a large object into a partitioned cache will force
> > an update of the dictionary placed in the replicated cache. It seems this
> > may be a time-expensive operation.
> > Approaches 2-3 make sense only for rare cases, as Sergi commented.
> > Also, I see a danger of OOM if we've got a high compression ratio and try
> > to restore the original value in memory.
> >
> > On Mon, Jul 25, 2016 at 10:39 AM, Alexey Kuznetsov <akuznet...@gridgain.com> wrote:
> >
> > > Sergi,
> > >
> > > Of course it will introduce some slowdown, but with compression more
> > > data could be stored in memory and will not be evicted to disk. In the
> > > case of compression by dictionary substitution it will be only one more
> > > lookup, and it should be fast.
> > >
> > > In general, we could provide only an API for compression out of the box,
> > > and users that really need some sort of compression will implement it
> > > themselves. This will not require much effort, I think.
> > >
> > >
> > >
> > > On Mon, Jul 25, 2016 at 2:18 PM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:
> > >
> > > > This will make sense only for rare cases when you have very large
> > > > objects stored, which can be effectively compressed. And even then it
> > > > will introduce a slowdown on all operations, which often will not be
> > > > acceptable. I guess only a few users will find this feature useful,
> > > > thus I think it is not worth the effort.
> > > >
> > > > Sergi
> > > >
> > > >
> > > >
> > > > 2016-07-25 9:28 GMT+03:00 Alexey Kuznetsov <akuznet...@gridgain.com>:
> > > >
> > > > > Hi, All!
> > > > >
> > > > > I would like to propose one more feature for Ignite 2.0.
> > > > >
> > > > > Data compression for data in binary format.
> > > > >
> > > > > The binary format stores each field as field name + field data, so we
> > > > > already have a descriptor. How about adding one more byte to the
> > > > > binary data descriptor (a dispatch sketch follows the list):
> > > > >
> > > > > *Compressed*:
> > > > >  0 - Data stored as is (no compression).
> > > > >  1 - Data compressed by dictionary (something like DB2 row compression
> > > > > [1], but for all binary types). We could have a system or user-defined
> > > > > replicated cache for such a dictionary and a *cache.compact()* method
> > > > > that will scan the cache, build the dictionary, and compact the data.
> > > > >  2 - Data compressed by Java's built-in ZIP.
> > > > >  3 - Data compressed by some custom user-defined algorithm.
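
For illustration, a reader dispatching on that descriptor byte might look 
like the sketch below (plain Java, not Ignite API; the Decompressor hook and 
helper names are hypothetical):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

interface Decompressor {
    byte[] decompress(byte[] data);
}

class BinaryFieldReader {
    // Dispatch on the proposed one-byte "compressed" descriptor (values 0-3).
    static byte[] decode(byte flag, byte[] data, Decompressor dict, Decompressor custom)
        throws IOException {
        switch (flag) {
            case 0: return data;                     // stored as is
            case 1: return dict.decompress(data);    // dictionary substitution
            case 2: return unzip(data);              // Java built-in compression
            case 3: return custom.decompress(data);  // user-defined algorithm
            default: throw new IllegalArgumentException("Unknown compression flag: " + flag);
        }
    }

    private static byte[] unzip(byte[] data) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            for (int n; (n = in.read(chunk)) > 0; )
                out.write(chunk, 0, n);
            return out.toByteArray();
        }
    }
}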
> > > > >
> > > > > Of course, it is possible to compress data in the current Ignite 1.x,
> > > > > but in that case the compressed data cannot be accessed from the SQL
> > > > > engine. If we implement support for compression at the Ignite core
> > > > > level, the SQL engine will be able to detect that data is compressed
> > > > > and handle it properly.
> > > > >
> > > > > What do you think?
> > > > > If the community considers this feature useful, I will create an
> > > > > issue in JIRA.
> > > > >
> > > > > [1]
> > > > > http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/

> > > > >
> > > > > --
> > > > > Alexey Kuznetsov
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Alexey Kuznetsov
> > >
> >
> >
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com
> >
>
