Hello!

The compression is per-binary-object, but the dictionary is external, shared
between multiple (millions of) entries, and stored alongside the compressed
data.
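
To sketch the idea in code (a minimal illustration, not the actual patch,
assuming the zstd-jni bindings; the raw dictionary bytes below merely stand
in for a trained dictionary that would be stored alongside the data):

    import com.github.luben.zstd.Zstd;
    import com.github.luben.zstd.ZstdDictCompress;
    import com.github.luben.zstd.ZstdDictDecompress;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class SharedDictExample {
        public static void main(String[] args) {
            // One external dictionary, shared by all entries. Zstd accepts
            // raw content as a dictionary, so these bytes stand in for a
            // real trained dictionary.
            byte[] dict = "common,prefixes,and,field,names,seen,in,records"
                .getBytes(StandardCharsets.UTF_8);

            ZstdDictCompress cdict = new ZstdDictCompress(dict, 2); // level 2
            ZstdDictDecompress ddict = new ZstdDictDecompress(dict);

            // Each binary object is (de)compressed individually against
            // the shared dictionary.
            byte[] obj = "common,prefixes,record,42".getBytes(StandardCharsets.UTF_8);
            byte[] packed = Zstd.compress(obj, cdict);
            byte[] unpacked = Zstd.decompress(packed, ddict, obj.length);

            System.out.println(Arrays.equals(obj, unpacked)
                + ", " + packed.length + " bytes from " + obj.length);
        }
    }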

Regards,
-- 
Ilya Kasnacheev


Tue, Sep 4, 2018 at 2:40, Dmitriy Setrakyan <dsetrak...@apache.org>:

> Hi Ilya,
>
> This is very useful. Is the compression going to be per-page, in which case
> the dictionary is going to be kept inside of a page? Or do you have some
> other design in mind?
>
> D.
>
> On Mon, Sep 3, 2018 at 10:36 AM, Ilya Kasnacheev
> <ilya.kasnach...@gmail.com> wrote:
>
> > Hello again!
> >
> > I've been running various compression parameters through the cod dataset.
> >
> > It looks like the best compression level in terms of speed is either 1
> > or 2. The default for Zstd seems to be 3, which almost always performs
> > worse. For best performance a dictionary of 1024 bytes is optimal; for
> > better compression one might choose larger dictionaries. 6k looks good,
> > but I will also run a few benchmarks on larger dicts. Unfortunately,
> > Zstd crashes if the sample size is set to more than 16k entries (I guess
> > I should probe the max buffer size where problems begin).
> >
> > I'm attaching two charts which show what we've got. Compression rate is
> > the compressed size as a fraction of the original record size. Time to
> > run is the wall clock time of the test run. Reasonable compression will
> > increase the run time twofold (for a program that only parses text
> > records -> creates objects -> converts them to binary objects ->
> > compresses -> decompresses). Notation: s{number of binary objects used
> > to train}-d{dictionary length in bytes}-l{compression level}.
> > <http://apache-ignite-developers.2346864.n4.nabble.com/file/t374/chart1.png>
> > The second chart is basically a zoom-in on the first.
> > <http://apache-ignite-developers.2346864.n4.nabble.com/file/t374/chart2.png>
> > I think that in addition to dictionary compression we should also have
> > dictionary-less compression. On typical data of small records it shows a
> > compression rate of 0.8 ~ 0.65, but I can imagine that with larger
> > unstructured records it can be as good as dict-based and much less of a
> > hassle dictionary-processing-wise. WDYT?
> > Sorry for the fine print. I hope my charts will be visible.
> >
> > You can see the updated code as a pull request:
> > https://github.com/apache/ignite/pull/4673
> >
> > Regards,
> >
> >
> >
>
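
For what it's worth, the s{samples}-d{dict}-l{level} sweep quoted above
could be reproduced along these lines (a sketch with made-up sample
records, assuming zstd-jni's ZstdDictTrainer; training can fail if the
samples are too few or too uniform):

    import com.github.luben.zstd.Zstd;
    import com.github.luben.zstd.ZstdDictCompress;
    import com.github.luben.zstd.ZstdDictTrainer;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;

    public class DictSweep {
        public static void main(String[] args) {
            // Made-up "records": a shared shape with varying fields,
            // standing in for real binary objects.
            List<byte[]> samples = new ArrayList<>();
            for (int i = 0; i < 20_000; i++)
                samples.add(("id=" + i + ";city=city" + (i % 100) + ";status=OK")
                    .getBytes(StandardCharsets.UTF_8));

            for (int dictSize : new int[] {1024, 6 * 1024}) {
                // Sample buffer sized to hold all the samples.
                ZstdDictTrainer trainer = new ZstdDictTrainer(2 * 1024 * 1024, dictSize);
                for (byte[] s : samples)
                    trainer.addSample(s);
                byte[] dict = trainer.trainSamples(); // may throw ZstdException

                for (int level = 1; level <= 3; level++) {
                    ZstdDictCompress cdict = new ZstdDictCompress(dict, level);
                    long orig = 0, packed = 0;
                    for (byte[] s : samples) {
                        orig += s.length;
                        packed += Zstd.compress(s, cdict).length;
                    }
                    System.out.printf("s%d-d%d-l%d: rate %.3f%n",
                        samples.size(), dictSize, level, (double) packed / orig);
                }
            }
        }
    }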
