Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

Wes McKinney Thu, 26 Mar 2020 10:26:23 -0700

I'll run a grid of batch sizes (from 1024 to 64K or 128K) and let you
know the read/write times and compression ratios. It shouldn't take too
long.


On Wed, Mar 25, 2020 at 10:37 PM Fan Liya <[email protected]> wrote:
>
> Thanks a lot for sharing the good results.
>
> As investigated by Wes, we have an existing zstd library for Java (zstd-jni)
> [1], and an lz4 library for Java (lz4-java) [2].
> +1 for the 1024 batch size, as it represents an important scenario where the 
> batch fits into the L1 cache (IMO).
>
> Best,
> Liya Fan
>
> [1] https://github.com/luben/zstd-jni
> [2] https://github.com/lz4/lz4-java
>
> On Thu, Mar 26, 2020 at 2:38 AM Micah Kornfield <[email protected]> wrote:
>>
>> If it isn't hard, could you run with batch sizes of 1024 or 2048 records?  I
>> think a question was raised previously about whether there is a benefit
>> for smaller buffer sizes.
>>
>> Thanks,
>> Micah
>>
>>
>> On Wed, Mar 25, 2020 at 8:59 AM Wes McKinney <[email protected]> wrote:
>>
>> > On Tue, Mar 24, 2020 at 9:22 PM Micah Kornfield <[email protected]>
>> > wrote:
>> > >
>> > > >
>> > > > Compression ratios ranging from ~50% with LZ4 and ~75% with ZSTD on
>> > > > the Taxi dataset to ~87% with LZ4 and ~90% with ZSTD on the Fannie Mae
>> > > > dataset. So that's a huge space savings
>> > >
>> > > One more question on this.  What was the average row-batch size used?  I
>> > > see in the proposal some buffers might not be compressed; did you use this
>> > > feature in the test?
>> >
>> > I used 64K row batch size. I haven't implemented the optional
>> > non-compressed buffers (for cases where there is little space savings)
>> > so everything is compressed. I can check different batch sizes if you
>> > like
>> >
>> >
>> > > On Mon, Mar 23, 2020 at 4:40 PM Wes McKinney <[email protected]>
>> > wrote:
>> > >
>> > > > hi folks,
>> > > >
>> > > > Sorry it's taken me a little while to produce supporting benchmarks.
>> > > >
>> > > > * I implemented experimental trivial body buffer compression in
>> > > > https://github.com/apache/arrow/pull/6638
>> > > > * I hooked up the Arrow IPC file format with compression as the new
>> > > > Feather V2 format in
>> > > > https://github.com/apache/arrow/pull/6694#issuecomment-602906476
>> > > >
>> > > > I tested a couple of real-world datasets from a prior blog post
>> > > > https://ursalabs.org/blog/2019-10-columnar-perf/ with ZSTD and LZ4
>> > > > codecs
>> > > >
>> > > > The complete results are here
>> > > > https://github.com/apache/arrow/pull/6694#issuecomment-602906476
>> > > >
>> > > > Summary:
>> > > >
>> > > > * Compression ratios ranging from ~50% with LZ4 and ~75% with ZSTD on
>> > > > the Taxi dataset to ~87% with LZ4 and ~90% with ZSTD on the Fannie Mae
>> > > > dataset. So that's a huge space savings
>> > > > * Single-threaded decompression times exceeding 2-4GByte/s with LZ4
>> > > > and 1.2-3GByte/s with ZSTD
>> > > >
>> > > > I would have to do some more engineering to test throughput changes
>> > > > with Flight, but given these results on slower networking (e.g. 1
>> > > > Gigabit) my guess is that the compression and decompression overhead
>> > > > is little compared with the time savings due to high compression
>> > > > ratios. If people would like to see these numbers to help make a
>> > > > decision I can take a closer look
>> > > >
>> > > > As far as what Micah said about having a limited number of
>> > > > compressors: I would be in favor of having just LZ4 and ZSTD. It seems
>> > > > anecdotally that these outperform Snappy in most real world scenarios
>> > > > and generally have > 1 GB/s decompression performance. Some Linux
>> > > > distributions (Arch at least) have already started adopting ZSTD over
>> > > > LZMA or GZIP [1]
>> > > >
>> > > > - Wes
>> > > >
>> > > > [1]:
>> > > > https://www.archlinux.org/news/now-using-zstandard-instead-of-xz-for-package-compression/
>> > > >
>> > > > On Fri, Mar 6, 2020 at 8:42 AM Fan Liya <[email protected]> wrote:
>> > > > >
>> > > > > Hi Wes,
>> > > > >
>> > > > > Thanks a lot for the additional information.
>> > > > > Looking forward to seeing the good results from your experiments.
>> > > > >
>> > > > > Best,
>> > > > > Liya Fan
>> > > > >
>> > > > > On Thu, Mar 5, 2020 at 11:42 PM Wes McKinney <[email protected]>
>> > > > wrote:
>> > > > >
>> > > > > > I see, thank you.
>> > > > > >
>> > > > > > For such a scenario, implementations would need to define a
>> > > > > > "UserDefinedCodec" interface to enable codecs to be registered from
>> > > > > > third party code, similar to what is done for extension types [1]
>> > > > > >
>> > > > > > I'll update this thread when I get my experimental C++ patch up so
>> > > > > > you can see what I'm thinking, at least for the built-in codecs we
>> > > > > > have like ZSTD.
>> > > > > >
>> > > > > > [1]:
>> > > > > > https://github.com/apache/arrow/blob/apache-arrow-0.16.0/docs/source/format/Columnar.rst#extension-types
>> > > > > >
>> > > > > > On Thu, Mar 5, 2020 at 7:56 AM Fan Liya <[email protected]>
>> > wrote:
>> > > > > > >
>> > > > > > > Hi Wes,
>> > > > > > >
>> > > > > > > Thanks a lot for your further clarification.
>> > > > > > >
>> > > > > > > Some of my preliminary thoughts:
>> > > > > > >
>> > > > > > > 1. We assign a unique GUID to each pair of compression/decompression
>> > > > > > > strategies. The GUID is stored as part of the Message.custom_metadata.
>> > > > > > > When receiving the GUID, the receiver knows which decompression
>> > > > > > > strategy to use.
>> > > > > > >
>> > > > > > > 2. We serialize the decompression strategy, and store it into the
>> > > > > > > Message.custom_metadata. The receiver can decompress data after
>> > > > > > > deserializing the strategy.
>> > > > > > >
>> > > > > > > Method 1 is generally used in static strategy scenarios while
>> > > > > > > method 2 is generally used in dynamic strategy scenarios.
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Liya Fan
>> > > > > > >
>> > > > > > > On Wed, Mar 4, 2020 at 11:39 PM Wes McKinney <
>> > [email protected]>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > Okay, I guess my question is how the receiver is going to be able to
>> > > > > > > > determine how to "rehydrate" the record batch buffers:
>> > > > > > > >
>> > > > > > > > What I've proposed amounts to the following:
>> > > > > > > >
>> > > > > > > > * UNCOMPRESSED: the current behavior
>> > > > > > > > * ZSTD/LZ4/...: each buffer is compressed and written with an int64
>> > > > > > > > length prefix
>> > > > > > > >
>> > > > > > > > (I'm close to putting up a PR implementing an experimental version of
>> > > > > > > > this that uses Message.custom_metadata to transmit the codec, so this
>> > > > > > > > will make the implementation details more concrete)
>> > > > > > > >
>> > > > > > > > So in the USER_DEFINED case, how will the library know how to obtain
>> > > > > > > > the uncompressed buffer? Is some additional metadata structure
>> > > > > > > > required to provide instructions?
>> > > > > > > >
>> > > > > > > > On Wed, Mar 4, 2020 at 8:05 AM Fan Liya <[email protected]>
>> > > > wrote:
>> > > > > > > > >
>> > > > > > > > > Hi Wes,
>> > > > > > > > >
>> > > > > > > > > I am thinking of adding an option named "USER_DEFINED" (or something
>> > > > > > > > > similar) to enum CompressionType in your proposal.
>> > > > > > > > > IMO, this option should be used primarily in Flight.
>> > > > > > > > >
>> > > > > > > > > Best,
>> > > > > > > > > Liya Fan
>> > > > > > > > >
>> > > > > > > > > On Wed, Mar 4, 2020 at 11:12 AM Wes McKinney <
>> > > > [email protected]>
>> > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > On Tue, Mar 3, 2020, 8:11 PM Fan Liya <
>> > [email protected]>
>> > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Sure. I agree with you that we should not overdo this.
>> > > > > > > > > > > I am wondering if we should provide an option to allow users to
>> > > > > > > > > > > plug in their customized compression strategies.
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Can you provide a patch showing changes to Message.fbs (or Schema.fbs)
>> > > > > > > > > > that make this idea more concrete?
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > > Best,
>> > > > > > > > > > > Liya Fan
>> > > > > > > > > > >
>> > > > > > > > > > > On Tue, Mar 3, 2020 at 9:47 PM Wes McKinney <
>> > > > [email protected]
>> > > > > > >
>> > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > On Tue, Mar 3, 2020, 7:36 AM Fan Liya <
>> > > > [email protected]>
>> > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > I am so glad to see this discussion, and I am willing to provide help
>> > > > > > > > > > > > > from the Java side.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > In the proposal, I see the support for basic compression strategies
>> > > > > > > > > > > > > (e.g. gzip, snappy).
>> > > > > > > > > > > > > IMO, applying a single basic strategy is not likely to achieve performance
>> > > > > > > > > > > > > improvement for most scenarios.
>> > > > > > > > > > > > > The optimal compression strategy is often obtained by composing basic
>> > > > > > > > > > > > > strategies and tuning parameters.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > I hope we can support such highly customized compression strategies.
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > I think anything much beyond trivial one-shot buffer level compression is
>> > > > > > > > > > > > probably out of the question for addition to the current "RecordBatch"
>> > > > > > > > > > > > Flatbuffers type, because the additional metadata would add undesirable
>> > > > > > > > > > > > bloat (which I would be against). If people have other ideas it would be
>> > > > > > > > > > > > great to see exactly what you are thinking as far as changes to the
>> > > > > > > > > > > > protocol files.
>> > > > > > > > > > > >
>> > > > > > > > > > > > I'll try to assemble some examples to show the before/after results of
>> > > > > > > > > > > > applying the simple strategy.
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > Liya Fan
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Tue, Mar 3, 2020 at 8:15 PM Antoine Pitrou <
>> > > > > > > > [email protected]>
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > If we want to use an HTTP header, it would be more of an Accept-Encoding
>> > > > > > > > > > > > > > header, no?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > In any case, we would have to put non-standard values there (e.g. lz4),
>> > > > > > > > > > > > > > so I'm not sure how desirable it is to repurpose HTTP headers for that,
>> > > > > > > > > > > > > > rather than add some dedicated field to the Flight messages.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Regards
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Antoine.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On 03/03/2020 at 12:52, David Li wrote:
>> > > > > > > > > > > > > > > gRPC supports headers so for Flight, we could send essentially an Accept
>> > > > > > > > > > > > > > > header and perhaps a Content-Type header.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > David
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Mon, Mar 2, 2020, 23:15 Micah Kornfield <
>> > > > > > > > > > [email protected]>
>> > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >> Hi Wes,
>> > > > > > > > > > > > > > >> A few thoughts on this.  In general, I think it is a good idea.  But before
>> > > > > > > > > > > > > > >> proceeding, I think the following points are worth discussing:
>> > > > > > > > > > > > > > >> 1.  Does this actually improve throughput/latency for Flight? (I think you
>> > > > > > > > > > > > > > >> mentioned you would follow up with benchmarks).
>> > > > > > > > > > > > > > >> 2.  I think we should limit the number of supported compression schemes to
>> > > > > > > > > > > > > > >> only 1 or 2.  I think the criteria for selection should be speed and native
>> > > > > > > > > > > > > > >> implementations available across the widest possible languages.  As far as
>> > > > > > > > > > > > > > >> I can tell zstd only has bindings in Java via JNI, but my understanding is
>> > > > > > > > > > > > > > >> it is probably the right type of compression for our use-cases.  So I think
>> > > > > > > > > > > > > > >> zstd + potentially 1 more.
>> > > > > > > > > > > > > > >> 3.  Commitment from someone on the Java side to implement this.
>> > > > > > > > > > > > > > >> 4.  This doesn't need to be coupled with this change per se, but for
>> > > > > > > > > > > > > > >> something like Flight it would be good to have a standard mechanism for
>> > > > > > > > > > > > > > >> negotiating server/client capabilities (e.g. client doesn't support
>> > > > > > > > > > > > > > >> compression or only supports a subset).
>> > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > >> Thanks,
>> > > > > > > > > > > > > > >> Micah
>> > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > >> On Sun, Mar 1, 2020 at 1:24 PM Wes McKinney <
>> > > > > > > > > > [email protected]>
>> > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > >>> On Sun, Mar 1, 2020 at 3:14 PM Antoine Pitrou <
>> > > > > > > > > > > [email protected]>
>> > > > > > > > > > > > > > >> wrote:
>> > > > > > > > > > > > > > >>>>
>> > > > > > > > > > > > > > >>>>
>> > > > > > > > > > > > > > >>>> On 01/03/2020 at 22:01, Wes McKinney wrote:
>> > > > > > > > > > > > > > >>>>> In the context of a "next version of the Feather format" ARROW-5510
>> > > > > > > > > > > > > > >>>>> (which is consumed only by Python and R at the moment), I have been
>> > > > > > > > > > > > > > >>>>> looking at compressing buffers using fast compressors like ZSTD when
>> > > > > > > > > > > > > > >>>>> writing the RecordBatch bodies. This could be handled privately as an
>> > > > > > > > > > > > > > >>>>> implementation detail of the Feather file, but since ZSTD compression
>> > > > > > > > > > > > > > >>>>> could improve throughput in Flight, for example, I thought I would
>> > > > > > > > > > > > > > >>>>> bring it up for discussion.
>> > > > > > > > > > > > > > >>>>>
>> > > > > > > > > > > > > > >>>>> I can see two simple compression strategies:
>> > > > > > > > > > > > > > >>>>>
>> > > > > > > > > > > > > > >>>>> * Compress the entire message body in one-shot, writing the result out
>> > > > > > > > > > > > > > >>>>> with an 8-byte int64 prefix indicating the uncompressed size
>> > > > > > > > > > > > > > >>>>> * Compress each non-zero-length constituent Buffer prior to writing to
>> > > > > > > > > > > > > > >>>>> the body (and using the same uncompressed-length-prefix when writing
>> > > > > > > > > > > > > > >>>>> the compressed buffer)
>> > > > > > > > > > > > > > >>>>>
>> > > > > > > > > > > > > > >>>>> The latter strategy is preferable for scenarios where we may project
>> > > > > > > > > > > > > > >>>>> out only a few fields from a larger record batch (such as reading from
>> > > > > > > > > > > > > > >>>>> a memory-mapped file).
>> > > > > > > > > > > > > > >>>>
>> > > > > > > > > > > > > > >>>> Agreed.  It may also allow using different compression strategies for
>> > > > > > > > > > > > > > >>>> different kinds of buffers (for example a bytestream splitting strategy
>> > > > > > > > > > > > > > >>>> for floats and doubles, or a delta encoding strategy for integers).
>> > > > > > > > > > > > > > >>>
>> > > > > > > > > > > > > > >>> If we wanted to allow for different compression to apply to different
>> > > > > > > > > > > > > > >>> buffers, I think we will need a new Message type because this would
>> > > > > > > > > > > > > > >>> inflate metadata sizes in a way that is not likely to be acceptable
>> > > > > > > > > > > > > > >>> for the current uncompressed use case.
>> > > > > > > > > > > > > > >>>
>> > > > > > > > > > > > > > >>> Here is my strawman proposal
>> > > > > > > > > > > > > > >>>
>> > > > > > > > > > > > > > >>> https://github.com/apache/arrow/compare/master...wesm:compression-strawman
>> > > > > > > > > > > > > > >>>
>> > > > > > > > > > > > > > >>>>> Implementation could be accomplished by one of the following methods:
>> > > > > > > > > > > > > > >>>>>
>> > > > > > > > > > > > > > >>>>> * Setting a field in Message.custom_metadata
>> > > > > > > > > > > > > > >>>>> * Adding a new field to Message
>> > > > > > > > > > > > > > >>>>
>> > > > > > > > > > > > > > >>>> I think it has to be a new field in Message.  Making it an ignorable
>> > > > > > > > > > > > > > >>>> metadata field means non-supporting receivers will decode and interpret
>> > > > > > > > > > > > > > >>>> the data wrongly.
>> > > > > > > > > > > > > > >>>>
>> > > > > > > > > > > > > > >>>> Regards
>> > > > > > > > > > > > > > >>>>
>> > > > > > > > > > > > > > >>>> Antoine.
>> > > > > > > > > > > > > > >>>
>> > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > >
>> >
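
A minimal sketch of the per-buffer strategy described in the thread above:
each non-zero-length buffer is compressed on its own and written with an
8-byte int64 prefix holding the uncompressed length. The names
compress_buffer and decompress_buffer are hypothetical, zlib from the Python
standard library stands in for ZSTD or LZ4, and the little-endian byte order
of the prefix is an assumption. None of this is taken from the Arrow patches
linked in the thread.

```python
import struct
import zlib

PREFIX = struct.Struct("<q")  # assumed: signed 64-bit, little-endian


def compress_buffer(data: bytes) -> bytes:
    """Frame one buffer as [int64 uncompressed length][compressed bytes]."""
    if len(data) == 0:
        # Zero-length buffers are written as-is, with no prefix.
        return b""
    # zlib is a stand-in codec here, not one of the codecs proposed above.
    return PREFIX.pack(len(data)) + zlib.compress(data)


def decompress_buffer(framed: bytes) -> bytes:
    """Recover the original bytes from a buffer framed by compress_buffer."""
    if len(framed) == 0:
        return b""
    (uncompressed_len,) = PREFIX.unpack_from(framed, 0)
    data = zlib.decompress(framed[PREFIX.size:])
    if len(data) != uncompressed_len:
        raise ValueError("length prefix does not match decompressed size")
    return data


# Each buffer of a record batch body is framed independently.
buffers = [b"\x00" * 4096, b"", b"abc" * 1000]
body = [compress_buffer(b) for b in buffers]
restored = [decompress_buffer(f) for f in body]
```

Because every buffer carries its own prefix, a reader can decompress only the
buffers belonging to the fields it projects and skip the rest.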
