Re: Compression prototype

2018-09-05 Thread Ilya Kasnacheev
Hello! Of course, this setting will be configurable. Regards, -- Ilya Kasnacheev Wed, Sep 5, 2018 at 3:21, Dmitriy Setrakyan : > In my view, dictionary of 1024 bytes is not going to be nearly enough. > > On Tue, Sep 4, 2018 at 8:06 AM, Ilya Kasnacheev > > wrote: > > > Hello! > > > > In cas

Re: Compression prototype

2018-09-04 Thread Dmitriy Setrakyan
In my view, dictionary of 1024 bytes is not going to be nearly enough. On Tue, Sep 4, 2018 at 8:06 AM, Ilya Kasnacheev wrote: > Hello! > > In case of Apache Ignite, most of savings is due to BinaryObject format, > which encodes types and fields with byte sequences. Any enum/string flags > will a

Re: Compression prototype

2018-09-04 Thread Ilya Kasnacheev
Hello! In case of Apache Ignite, most of savings is due to BinaryObject format, which encodes types and fields with byte sequences. Any enum/string flags will also get in dictionary. And then as it processes a record it fills up its individual dictionary. But, in one cache, most if not all entrie

Re: Compression prototype

2018-09-04 Thread Dmitriy Setrakyan
On Tue, Sep 4, 2018 at 2:55 AM, Ilya Kasnacheev wrote: > Hello! > > Each node has a local dictionary (per node currently, per cache planned). > Dictionary is never shared between nodes. As data patterns shift, > dictionary rotation is also planned. > > With Zstd, the best dictionary size seems to

Re: Compression prototype

2018-09-04 Thread Ilya Kasnacheev
Hello! Each node has a local dictionary (per node currently, per cache planned). Dictionary is never shared between nodes. As data patterns shift, dictionary rotation is also planned. With Zstd, the best dictionary size seems to be 1024 bytes. I imagine it is enough to store common BinaryObject b

Re: Compression prototype

2018-09-04 Thread Dmitriy Setrakyan
On Tue, Sep 4, 2018 at 1:16 AM, Ilya Kasnacheev wrote: > Hello! > > The compression is per-binary-object, but dictionary is external, shared > between multiple (millions of) entries and stored alongside compressed > data. > I was under a different impression. If the dictionary is for the whole d

Re: Compression prototype

2018-09-04 Thread Ilya Kasnacheev
Hello! The compression is per-binary-object, but dictionary is external, shared between multiple (millions of) entries and stored alongside compressed data. Regards, -- Ilya Kasnacheev Tue, Sep 4, 2018 at 2:40, Dmitriy Setrakyan : > Hi Ilya, > > This is very useful. Is the compression going
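The design described above (each object compressed on its own, against one external dictionary shared by many entries) can be illustrated with a minimal sketch. It is not the prototype's code: it swaps in the JDK's `java.util.zip.Deflater` preset-dictionary support in place of Zstd or the custom algorithm, and the class name `SharedDictionaryCodec` and its methods are hypothetical. It also assumes the original length is stored next to each compressed entry.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: per-object compression against one external, shared dictionary. */
public class SharedDictionaryCodec {
    /** Shared dictionary; it lives outside of every compressed entry. */
    private final byte[] dictionary;

    public SharedDictionaryCodec(byte[] dictionary) {
        this.dictionary = dictionary.clone();
    }

    /** Compresses a single serialized object using the shared dictionary. */
    public byte[] compress(byte[] object) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setDictionary(dictionary);
            deflater.setInput(object);
            deflater.finish();

            byte[] buf = new byte[object.length + 64];
            int len = 0;
            while (!deflater.finished()) {
                if (len == buf.length)
                    buf = Arrays.copyOf(buf, buf.length * 2); // grow when full
                len += deflater.deflate(buf, len, buf.length - len);
            }
            return Arrays.copyOf(buf, len);
        } finally {
            deflater.end();
        }
    }

    /** Decompresses one entry; fails if the dictionary does not match the one used to compress. */
    public byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int len = 0;
            while (len < out.length) {
                int n = inflater.inflate(out, len, out.length - len);
                if (n == 0) {
                    if (inflater.needsDictionary())
                        inflater.setDictionary(dictionary); // supply the external dictionary on demand
                    else if (inflater.finished() || inflater.needsInput())
                        break; // stream ended early or is truncated
                }
                len += n;
            }
            if (len != originalLength)
                throw new IllegalStateException("Unexpected decompressed length: " + len);
            return out;
        } catch (DataFormatException | IllegalArgumentException e) {
            throw new IllegalStateException("Corrupt entry or wrong dictionary", e);
        } finally {
            inflater.end();
        }
    }
}
```

In this sketch a compressed entry cannot be read back without the exact dictionary bytes it was written with, which is the same dependency raised elsewhere in this thread about restarts.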

Re: Compression prototype

2018-09-03 Thread Dmitriy Setrakyan
Hi Ilya, This is very useful. Is the compression going to be per-page, in which case the dictionary is going to be kept inside of a page? Or do you have some other design in mind? D. On Mon, Sep 3, 2018 at 10:36 AM, Ilya Kasnacheev wrote: > Hello again! > > I've been running various compressio

Re: Compression prototype

2018-09-03 Thread Ilya Kasnacheev
Hello again! I've been running various compression parameters through cod dataset. It looks like the best compression level in terms of speed is either 1 or 2. The default for Zstd seems to be 3 which would almost always perform worse. For best performance a dictionary of 1024 is optimal, for bet

Re: Compression prototype

2018-08-31 Thread Ilya Kasnacheev
Just as I have started praising Zstd, it began to show JVM crashes in native code in train dict :( I guess it has limits to train buffer, after which erroneous behaviour is exhibited. Maybe we will need to submit a pull request:) Regards, -- Ilya Kasnacheev Fri, Aug 31, 2018 at 11:56, Ilya K

Re: Compression prototype

2018-08-31 Thread Ilya Kasnacheev
Hello! I am testing Zstd with dictionary, and it looks very very promising. I'm confident I can choose settings where it is faster than my own algo while bringing better compression ratio, on "cod" dataset. So I am happily retiring my code and switching to Zstd. Would probably mean that we will s

Re: Compression prototype

2018-08-28 Thread Ilya Kasnacheev
Hello! Yes, we can tinker with BinaryObject format, which is currently clearly excessive. But the best part with compression, it will automatically remove this redundancy for us, for free. Even if we had hairy XML as binary object format, it will still compress roughly to the same number of bytes

Re: Compression prototype

2018-08-28 Thread Vyacheslav Daradur
I have another suggestion which may help us reduce object size dramatically: implementing some kind of SQL schema. For now, the BinaryObject format is excessive: each serialized object stores the offset of every serialized field even if the offset can be easily calculated. If we move this metadata

Re: Compression prototype

2018-08-27 Thread Vyacheslav Daradur
According to my benchmarks, the zstd compression algorithm [1] looks very interesting: it has a high compression ratio with quite good speed. AFAIK it supports external dictionaries, but I'm not sure about using it with "on the fly building" dictionaries. Anyway, have a look at (it has ASF 2.0 friendly

Re: Compression prototype

2018-08-27 Thread Ilya Kasnacheev
Hello Vyacheslav! Unfortunately I have not found any efficient algorithms that will allow me to use external dictionary as a pre-processed data structure. If plain gzip is used without dictionary, the compression is around 0.7, as opposed to 0.4 that I will get with custom implementation, AFAIR th

Re: Compression prototype

2018-08-27 Thread Vyacheslav Daradur
Hi Igniters! Ilya, I'm glad to see one more person who is interested in the compression feature in Ignite. I looked through the pull request and want to share the following thoughts: It's very dangerous using a custom algorithm in this way - you store serialized data separate from a dictionary and t

Re: Compression prototype

2018-08-24 Thread Denis Magda
> > Currently, the dictionary for decompression is only stored on heap. After > restart there's compressed data in the PDS, but there's no dictionary :) Basically, it means that I've lost my data, right? How about persisting data to disk. Overall, we need Vladimir Ozerov to check the contributio

Re: Compression prototype

2018-08-24 Thread Ilya Kasnacheev
Hello! It is somewhat a part of IEP-20, since I have updated it with this particular direction. Regards, -- Ilya Kasnacheev 2018-08-24 2:56 GMT+03:00 Denis Magda : > Hi Ilya, > > Sounds terrific! Is this part of the following Ignite enhancement proposal? > https://cwiki.apache.org/confluence/

Re: Compression prototype

2018-08-23 Thread Denis Magda
Hi Ilya, Sounds terrific! Is this part of the following Ignite enhancement proposal? https://cwiki.apache.org/confluence/display/IGNITE/IEP-20%3A+Data+Compression+in+Ignite -- Denis On Thu, Aug 23, 2018 at 5:17 AM Ilya Kasnacheev wrote: > Hello! > > My plan was to add a compression section to

Re: Compression prototype

2018-08-23 Thread Ilya Kasnacheev
Hello! My plan was to add a compression section to cache configuration, where you can enable compression, enable key compression (which has heavier performance implications), adjust dictionary gathering settings, and in the future possibly choose between algorithms. In fact I'm not sure, since my a

Re: Compression prototype

2018-08-23 Thread Dmitriy Pavlov
Ok, thanks. IMO we need to store the dictionary in Durable memory before merging into master. Thu, Aug 23, 2018 at 15:12, Ilya Kasnacheev : > Hello! > > Currently, the dictionary for decompression is only stored on heap. After > restart there's compressed data in the PDS, but there's no dictiona

Re: Compression prototype

2018-08-23 Thread Sergey Kozlov
Hi Ilya, is there a plan to introduce it as an Ignite configuration option? In that case, instead of a boolean type, I suggest using an enum to reserve the ability to add compression algorithms in the future. On Thu, Aug 23, 2018 at 1:09 PM, Ilya Kasnacheev wrote: > Hello! > > I want to share with
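As a rough illustration of the enum suggestion, a configuration holder might look like the sketch below. Every name in it (`CompressionSettings`, `CompressionAlgorithm`, the setters) is hypothetical and does not exist in Apache Ignite. The fields mirror options mentioned in this thread: the algorithm choice, key compression, and a configurable dictionary size, with 1024 bytes used as the default only because that figure is reported in the thread.

```java
/** Hypothetical sketch of a compression configuration section; not an Apache Ignite API. */
public class CompressionSettings {
    /** An enum rather than a boolean, so more algorithms can be added later. */
    public enum CompressionAlgorithm { NONE, DICTIONARY, ZSTD }

    private CompressionAlgorithm algorithm = CompressionAlgorithm.NONE;
    private boolean compressKeys = false;  // key compression is described as the costlier option
    private int dictionarySize = 1024;     // bytes; assumed default, taken from the thread

    public CompressionSettings setAlgorithm(CompressionAlgorithm algorithm) {
        if (algorithm == null)
            throw new IllegalArgumentException("algorithm must not be null");
        this.algorithm = algorithm;
        return this;
    }

    public CompressionSettings setCompressKeys(boolean compressKeys) {
        this.compressKeys = compressKeys;
        return this;
    }

    public CompressionSettings setDictionarySize(int dictionarySize) {
        if (dictionarySize <= 0)
            throw new IllegalArgumentException("dictionarySize must be positive");
        this.dictionarySize = dictionarySize;
        return this;
    }

    public CompressionAlgorithm getAlgorithm() { return algorithm; }

    public boolean isCompressKeys() { return compressKeys; }

    public int getDictionarySize() { return dictionarySize; }

    /** Replaces the plain on/off flag: compression is on whenever an algorithm is selected. */
    public boolean isEnabled() { return algorithm != CompressionAlgorithm.NONE; }
}
```

With this shape, the on/off question is answered by `isEnabled()`, and adding another algorithm later only means adding an enum constant.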

Re: Compression prototype

2018-08-23 Thread Ilya Kasnacheev
Hello! Currently, the dictionary for decompression is only stored on heap. After restart there's compressed data in the PDS, but there's no dictionary :) Regards, -- Ilya Kasnacheev 2018-08-23 14:58 GMT+03:00 Dmitriy Pavlov : > Hi Ilya, > > Thank you for sharing this here. I believe this cont

Re: Compression prototype

2018-08-23 Thread Dmitriy Pavlov
Hi Ilya, Thank you for sharing this here. I believe this contribution will be accepted by the Community. Moreover, it shows such a remarkable performance boost. I'm pretty sure this patch will be reviewed by Ignite Native Persistence experts soon. What do you mean by can't survive PDS node restart?