Hello!

Each node has its own local dictionary (currently one per node; one per
cache is planned). The dictionary is never shared between nodes. Dictionary
rotation is also planned, to keep up as data patterns shift.

With Zstd, the best dictionary size seems to be 1024 bytes. I imagine it is
enough to store the common BinaryObject boilerplate, and everything else is
compressed on the fly. The dictionary is built from a sample of 16k records.
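
To illustrate why such a small dictionary can be enough, here is a rough
sketch in Python. It is not the Ignite code: it uses zlib's preset-dictionary
support from the standard library as a stand-in for Zstd, and the record
layout, make_record() and DICTIONARY are all made up for the example. A real
dictionary would be trained on a sample of records rather than copied from one.

```python
import json
import zlib


def make_record(i):
    """Hypothetical record: same field-name boilerplate, different values."""
    return json.dumps({
        "typeName": "org.example.Person",
        "id": i,
        "firstName": f"name{i}",
        "lastName": f"surname{i}",
        "city": f"city{i % 10}",
    }).encode("utf-8")


# Assumption: one representative record stands in for a trained dictionary.
# It only needs to contain the boilerplate shared by all records.
DICTIONARY = make_record(0)


def compress(record, zdict=None):
    """Compress a single record, optionally priming zlib with a dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(record) + comp.flush()


def decompress(blob, zdict=None):
    """Decompress a single record; the same dictionary must be supplied."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    records = [make_record(i) for i in range(1000)]
    plain = sum(len(compress(r)) for r in records)
    primed = sum(len(compress(r, DICTIONARY)) for r in records)
    print("dictionary size:", len(DICTIONARY), "bytes")
    print("compressed total without dictionary:", plain, "bytes")
    print("compressed total with dictionary:", primed, "bytes")
```

Each record is still compressed on its own, so it can be read back
independently, but the shared boilerplate is encoded as references into the
dictionary instead of being repeated in every entry.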

Regards,
-- 
Ilya Kasnacheev


Tue, Sep 4, 2018 at 11:49, Dmitriy Setrakyan <dsetrak...@apache.org>:

> On Tue, Sep 4, 2018 at 1:16 AM, Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> wrote:
>
> > Hello!
> >
> > The compression is per-binary-object, but dictionary is external, shared
> > between multiple (millions of) entries and stored alongside compressed
> > data.
> >
>
> I was under a different impression. If the dictionary is for the whole data
> set, then it will occupy megabytes (if not gigabytes) of data. What happens
> when a new node joins and has no idea about the dictionary? What happens
> when the dictionaries on different nodes get out of sync?
>
> D.
>
