Hello! The compression is per-binary-object, but the dictionary is external: it is shared between multiple (millions of) entries and stored alongside the compressed data.
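For concreteness, here is a minimal sketch of that scheme, assuming the zstd-jni bindings (com.github.luben:zstd-jni). The class and its names are my own illustration, not the actual code in the pull request: one dictionary is trained once, stored next to the data, and reused for every binary object.

// Illustrative sketch only (not the PR code); assumes zstd-jni.
import com.github.luben.zstd.Zstd;
import com.github.luben.zstd.ZstdDictCompress;
import com.github.luben.zstd.ZstdDictDecompress;

public class SharedDictCodec {
    private final ZstdDictCompress compressDict;
    private final ZstdDictDecompress decompressDict;

    // 'dict' is the external, shared dictionary; 'level' is the zstd
    // compression level (1 or 2 per the benchmarks quoted below).
    public SharedDictCodec(byte[] dict, int level) {
        compressDict = new ZstdDictCompress(dict, level);
        decompressDict = new ZstdDictDecompress(dict);
    }

    // Compress a single marshalled binary object.
    public byte[] compress(byte[] binaryObject) {
        return Zstd.compress(binaryObject, compressDict);
    }

    // Decompress one object; the original length must be known,
    // e.g. stored with the entry.
    public byte[] decompress(byte[] compressed, int originalLength) {
        return Zstd.decompress(compressed, decompressDict, originalLength);
    }
}

Since the dictionary objects are immutable once built, a single codec instance can be shared across all entries.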
Regards,
--
Ilya Kasnacheev

Tue, Sep 4, 2018 at 2:40, Dmitriy Setrakyan <dsetrak...@apache.org>:

> Hi Ilya,
>
> This is very useful. Is the compression going to be per-page, in which
> case the dictionary is going to be kept inside of a page? Or do you have
> some other design in mind?
>
> D.
>
> On Mon, Sep 3, 2018 at 10:36 AM, Ilya Kasnacheev
> <ilya.kasnach...@gmail.com> wrote:
>
> > Hello again!
> >
> > I've been running various compression parameters through the cod dataset.
> >
> > It looks like the best compression level in terms of speed is either 1
> > or 2. The default for Zstd seems to be 3, which would almost always
> > perform worse. For best performance a dictionary of 1024 bytes is
> > optimal; for better compression one might choose larger dictionaries.
> > 6k looks good, but I will also run a few benchmarks on larger dicts.
> > Unfortunately, Zstd crashes if the sample size is set to more than 16k
> > entries (I guess I should probe the max buffer size where problems
> > begin).
> >
> > I'm attaching two charts which show what we've got. Compression rate is
> > the fraction of the original record size. Time to run is the wall clock
> > time of the test run. Reasonable compression will increase the run time
> > twofold (of a program that only does text record parsing -> creates
> > objects -> binarylizes them -> compresses -> decompresses). Notation:
> > s{number of binary objects used to train}-d{dictionary length in
> > bytes}-l{compression level}.
> > <http://apache-ignite-developers.2346864.n4.nabble.com/file/t374/chart1.png>
> > The second chart is basically a zoom-in on the first.
> > <http://apache-ignite-developers.2346864.n4.nabble.com/file/t374/chart2.png>
> >
> > I think that in addition to dictionary compression we should have
> > dictionary-less compression. On typical data of small records it shows
> > a compression rate of 0.8 ~ 0.65, but I can imagine that with larger
> > unstructured records it can be as good as dict-based and much less of a
> > hassle dictionary-processing-wise. WDYT?
> >
> > Sorry for the fine print. I hope my charts will be visible.
> >
> > You can see the updated code as a pull request:
> > https://github.com/apache/ignite/pull/4673
> >
> > Regards,
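For reference, the s{samples}-d{dictionary bytes}-l{level} runs quoted above could be reproduced roughly as follows with zstd-jni's ZstdDictTrainer. This is a hypothetical sketch under assumed names and sizes, not the code from the pull request:

// Illustrative sketch only; assumes zstd-jni's ZstdDictTrainer.
import java.util.List;
import com.github.luben.zstd.ZstdDictTrainer;

public class DictTraining {
    // Train a dictionary of 'dictSize' bytes (the d{...} parameter)
    // from a list of marshalled binary objects (the s{...} parameter).
    public static byte[] train(List<byte[]> samples, int dictSize) {
        int totalSampleBytes = samples.stream().mapToInt(s -> s.length).sum();
        // First argument sizes the sample buffer; keeping the sample count
        // below the ~16k-entry threshold sidesteps the crash noted above.
        ZstdDictTrainer trainer = new ZstdDictTrainer(totalSampleBytes, dictSize);
        for (byte[] sample : samples)
            trainer.addSample(sample);
        return trainer.trainSamples(); // e.g. dictSize = 1024 for best speed
    }
}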