Gents, if I understood the idea correctly, the proposal is to compress pages on eviction and decompress them when they are read back from disk. Is that correct?
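To make sure we are talking about the same flow, here is a rough sketch of what I have in mind. All names below (PageCompressor, PageStoreIo, EvictionAwarePageStore) are made up for illustration only and are not existing Ignite APIs:

```java
import java.nio.ByteBuffer;

// Hypothetical types that only illustrate the
// "compress on eviction, decompress on read from disk" idea.
interface PageCompressor {
    ByteBuffer compress(ByteBuffer page);   // e.g. a dictionary-based codec (LZV/LZ4)
    ByteBuffer decompress(ByteBuffer page);
}

interface PageStoreIo {
    void write(long pageId, ByteBuffer data);
    ByteBuffer read(long pageId);
}

class EvictionAwarePageStore {
    private final PageCompressor compressor;
    private final PageStoreIo io;

    EvictionAwarePageStore(PageCompressor compressor, PageStoreIo io) {
        this.compressor = compressor;
        this.io = io;
    }

    /** Page leaves the in-memory data region: only the compressed form hits disk. */
    void onEvict(long pageId, ByteBuffer page) {
        io.write(pageId, compressor.compress(page));
    }

    /** Page is loaded back into memory: it is hot again, so keep it uncompressed. */
    ByteBuffer onRead(long pageId) {
        return compressor.decompress(io.read(pageId));
    }
}
```

If that is the intent, pages stay uncompressed while they are in memory and we only pay the CPU cost on the eviction/load path.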
On Mon, Mar 26, 2018 at 5:13 PM, Anton Vinogradov <a...@apache.org> wrote:

> +1 to Taras's vision.
>
> Compression on eviction is a good case to store more.
> Pages in memory are always hot in a real system, so compression in memory
> will definitely slow down the system, I think.
>
> Anyway, we can split the issue into "on eviction compression" and
> "in-memory compression".
>
>
> 2018-03-06 12:14 GMT+03:00 Taras Ledkov <tled...@gridgain.com>:
>
> > Hi,
> >
> > I guess page-level compression makes sense on page loading / eviction.
> > In this case we can decrease I/O operations and a performance boost can
> > be reached.
> > What is the goal of in-memory compression? To hold about 2-5x more data
> > in memory with a performance drop?
> >
> > Also, please clarify the case with compression/decompression for hot
> > and cold pages.
> > Is this right for your approach:
> > 1. Hot pages are always kept decompressed in memory because many
> > read/write operations touch them.
> > 2. So we can compress only cold pages.
> >
> > So this approach is suitable when the hot data size << available RAM size.
> >
> > Thoughts?
> >
> >
> > On 05.03.2018 20:18, Vyacheslav Daradur wrote:
> >
> >> Hi Igniters!
> >>
> >> I'd like to take the next step in our data compression discussion [1].
> >>
> >> Most Igniters vote for per-data-page compression.
> >>
> >> I'd like to accumulate the main theses to start implementation:
> >> - a page will be compressed with a dictionary-based approach (e.g. LZV)
> >> - a page will be compressed in batch mode (not on every change)
> >> - page compression should be initiated by an event, for example, the
> >> page's free space dropping below 20%
> >> - the compression process will be under the page write lock
> >>
> >> Vladimir Ozerov has written:
> >>
> >>> What we do not understand yet:
> >>>> 1) Granularity of the compression algorithm.
> >>>> 1.1) It could be per-entry - i.e. we compress the whole entry
> >>>> content, but respect boundaries between entries. E.g.: before -
> >>>> [ENTRY_1][ENTRY_2], after - [COMPRESSED_ENTRY_1][COMPRESSED_ENTRY_2]
> >>>> (as opposed to [COMPRESSED ENTRY_1 and ENTRY_2]).
> >>>> 1.2) Or it could be per-field - i.e. we compress fields, but respect
> >>>> the binary object layout. The first approach is simple,
> >>>> straightforward, and will give an acceptable compression rate, but we
> >>>> will have to compress the whole binary object on every field access,
> >>>> which may ruin our SQL performance. The second approach is more
> >>>> complex, and we are not sure about its compression rate, but as the
> >>>> BinaryObject structure is preserved, we will still have fast
> >>>> constant-time per-field access.
> >>>
> >> I think there are advantages in both approaches, and we will be able to
> >> compare different approaches and algorithms after a prototype
> >> implementation.
> >>
> >> Main approach in brief:
> >> 1) When a page's free space drops below 20%, a compression event will
> >> be triggered
> >> 2) The page will be locked by a write lock
> >> 3) The page will be passed to the page compressor implementation
> >> 4) The page will be replaced by the compressed page
> >>
> >> Whole object or field reading:
> >> 1) If the page is marked as compressed, it will be handled by the
> >> page compressor implementation; otherwise, it will be handled as usual.
> >>
> >> Thoughts?
> >>
> >> Should we create a new IEP and register tickets to start
> >> implementation? This will allow us to watch the feature's progress and
> >> related tasks.
> >>
> >>
> >> [1] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-tc20679.html
> >>
> >>
> >>
> > --
> > Taras Ledkov
> > Mail-To: tled...@gridgain.com
> >
>
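On the per-entry vs. per-field granularity Vladimir described above, a toy illustration of the trade-off. The Codec interface and the byte[] layouts are assumptions made only for this example, not real Ignite code:

```java
import java.util.List;

// Toy illustration of the two compression granularity options.
class GranularityExample {
    interface Codec {
        byte[] compress(byte[] data);
    }

    /**
     * Option 1.1: per-entry. The whole serialized entry is compressed and
     * entry boundaries in the page are preserved, but reading any single
     * field first requires decompressing the entire entry.
     */
    static byte[] compressPerEntry(byte[] entryBytes, Codec codec) {
        return codec.compress(entryBytes);
    }

    /**
     * Option 1.2: per-field. The BinaryObject header and field offsets stay
     * uncompressed and only field payloads are compressed, so constant-time
     * per-field access is kept, likely at the cost of a lower compression rate.
     */
    static byte[][] compressPerField(List<byte[]> fieldPayloads, Codec codec) {
        byte[][] compressedFields = new byte[fieldPayloads.size()][];

        for (int i = 0; i < fieldPayloads.size(); i++)
            compressedFields[i] = codec.compress(fieldPayloads.get(i));

        return compressedFields;
    }
}
```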
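And here is how I read Vyacheslav's "main approach in brief" and the read path, again as a rough sketch. All types (DataPage, PageCodec) and the 20% threshold constant are placeholders for whatever the real PageMemory integration ends up being:

```java
// Rough sketch of the proposed batch per-page compression flow and read path.
// DataPage and PageCodec are hypothetical, not existing Ignite internals.
interface PageCodec {
    byte[] compress(byte[] pageBytes);
    byte[] decompress(byte[] pageBytes);
}

interface DataPage {
    double freeSpaceRatio();          // free space / page size
    byte[] bytes();
    void replaceContent(byte[] data); // swap the page payload in place
    boolean isCompressed();
    void markCompressed(boolean compressed);
    void writeLock();
    void writeUnlock();
}

class CompressingPageHandler {
    /** The "free space drops below 20%" trigger from the proposal. */
    private static final double FREE_SPACE_THRESHOLD = 0.20;

    private final PageCodec codec;

    CompressingPageHandler(PageCodec codec) {
        this.codec = codec;
    }

    /** Step 1: compression is initiated by an event, not on every change. */
    void onFreeSpaceChanged(DataPage page) {
        if (page.freeSpaceRatio() < FREE_SPACE_THRESHOLD)
            compress(page);
    }

    private void compress(DataPage page) {
        page.writeLock();                                     // step 2: under page write lock
        try {
            byte[] compressed = codec.compress(page.bytes()); // step 3: pass to the compressor
            page.replaceContent(compressed);                  // step 4: replace with the compressed page
            page.markCompressed(true);
        }
        finally {
            page.writeUnlock();
        }
    }

    /** Read path: compressed pages go through the codec, the rest are handled as usual. */
    byte[] readPage(DataPage page) {
        return page.isCompressed() ? codec.decompress(page.bytes()) : page.bytes();
    }
}
```

Keeping the trigger event-based rather than per-change matches the "batch mode" point and bounds the write-lock hold time to a single compress call.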