It seems there is reasonable consensus in the PR. If there are no further comments, I'll start a vote on this within the next several days.
On Mon, Apr 6, 2020 at 10:55 PM Wes McKinney <wesmck...@gmail.com> wrote: > > I updated the Format proposal again, please have a look > > https://github.com/apache/arrow/pull/6707 > > On Wed, Apr 1, 2020 at 10:15 AM Wes McKinney <wesmck...@gmail.com> wrote: > > > > For uncompressed, memory mapping is disabled, so all of the bytes are > > being read into RAM. I wanted to show that even when your IO pipe is > > very fast (in the case with an NVMe SSD like I have, > 1GB/s for read > > from disk) that you can still load faster with compressed files. > > > > Here were the prior Read results with > > > > * Single threaded decompression > > * Memory mapping enabled > > > > https://ibb.co/4ZncdF8 > > > > You can see for larger chunksizes, because the IPC reconstruction > > overhead is about 60 microseconds per batch, that read time is very > > low (10s of milliseconds). > > > > On Wed, Apr 1, 2020 at 10:10 AM Antoine Pitrou <anto...@python.org> wrote: > > > > > > > > > The read times are still with memory mapping for the uncompressed case? > > > If so, impressive! > > > > > > Regards > > > > > > Antoine. > > > > > > > > > Le 01/04/2020 à 16:44, Wes McKinney a écrit : > > > > Several pieces of work got done in the last few days: > > > > > > > > * Changing from LZ4 raw to LZ4 frame format (what is recommended for > > > > interoperability) > > > > * Parallelizing both compression and decompression at the field level > > > > > > > > Here are the results (using 8 threads on an 8-core laptop). I disabled > > > > the "memory map" feature so that in the uncompressed case all of the > > > > data must be read off disk into memory. 
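[The field-level parallelism mentioned above (compressing/decompressing each field's buffers on separate cores) can be sketched with a stdlib thread pool. This is a hypothetical illustration, not the Arrow C++ implementation: zlib stands in for LZ4/ZSTD, and the function names are made up.]

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_buffers_parallel(buffers, level=1, max_workers=8):
    # Each field buffer is compressed independently, so the work fans out
    # across cores. zlib releases the GIL on sizable inputs, as do the
    # LZ4/ZSTD bindings, so threads give real parallelism here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda b: zlib.compress(b, level), buffers))

def decompress_buffers_parallel(compressed, max_workers=8):
    # Decompression parallelizes the same way, one task per buffer.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(zlib.decompress, compressed))
```

Note that, as described above, disk I/O and (de)compression still run serially in this shape; pipelining the two stages is the further improvement mentioned.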
This helps illustrate the > > > > compression/IO tradeoff to wall clock load times > > > > > > > > File size (only LZ4 may be different): https://ibb.co/CP3VQkp > > > > Read time: https://ibb.co/vz9JZMx > > > > Write time: https://ibb.co/H7bb68T > > > > > > > > In summary, now with multicore compression and decompression, > > > > LZ4-compressed files are faster both to read and write even on a very > > > > fast SSD, as are ZSTD-compressed files with a low ZSTD compression > > > > level. I didn't notice a major difference between LZ4 raw and LZ4 > > > > frame formats. The reads and writes could be made faster still by > > > > pipelining / making concurrent the disk read/write and > > > > compression/decompression steps -- the current implementation performs > > > > these tasks serially. We can improve this in the near future > > > > > > > > I'll update the Format proposal this week so we can move toward > > > > something we can vote on. I would recommend that we await > > > > implementations and integration tests for this before releasing this > > > > as stable, in line with prior discussions about adding stuff to the > > > > IPC protocol > > > > > > > > On Thu, Mar 26, 2020 at 4:57 PM Wes McKinney <wesmck...@gmail.com> > > > > wrote: > > > >> > > > >> Here are the results: > > > >> > > > >> File size: https://ibb.co/71sBsg3 > > > >> Read time: https://ibb.co/4ZncdF8 > > > >> Write time: https://ibb.co/xhNkRS2 > > > >> > > > >> Code: > > > >> https://github.com/wesm/notebooks/blob/master/20190919file_benchmarks/FeatherCompression.ipynb > > > >> (based on https://github.com/apache/arrow/pull/6694) > > > >> > > > >> High level summary: > > > >> > > > >> * Chunksize 1024 vs 64K has relatively limited impact on file sizes > > > >> > > > >> * Wall clock read time is impacted by chunksize, maybe 30-40% > > > >> difference between 1K row chunks versus 16K row chunks. 
One notable > > > >> thing is that you can see clearly the overhead associated with IPC > > > >> reconstruction even when the data is memory mapped. For example, in > > > >> the Fannie Mae dataset there are 21,661 batches (each batch has 31 > > > >> fields) when the chunksize is 1024. So a read time of 1.3 seconds > > > >> indicates ~60 microseconds of overhead for each record batch. When you > > > >> consider the amount of business logic involved with reconstructing a > > > >> record batch, 60 microseconds is pretty good. This also shows that > > > >> every microsecond counts and we need to be carefully tracking > > > >> microperformance in this critical operation. > > > >> > > > >> * Small chunksize results in higher write times for "expensive" codecs > > > >> like ZSTD with a high compression ratio. For "cheap" codecs like LZ4 > > > >> it doesn't make as much of a difference > > > >> > > > >> * Note that LZ4 compressor results in faster wall clock time to disk > > > >> presumably because the compression speed is faster than my SSD's write > > > >> speed > > > >> > > > >> Implementation notes: > > > >> * There is no parallelization or pipelining of reads or writes. For > > > >> example, on write, all of the buffers are compressed with a single > > > >> thread and then compression stops until the write to disk completes. > > > >> On read, buffers are decompressed serially > > > >> > > > >> > > > >> On Thu, Mar 26, 2020 at 12:24 PM Wes McKinney <wesmck...@gmail.com> > > > >> wrote: > > > >>> > > > >>> I'll run a grid of batch sizes (from 1024 to 64K or 128K) and let you > > > >>> know the read/write times and compression ratios. Shouldn't take too > > > >>> long > > > >>> > > > >>> On Wed, Mar 25, 2020 at 10:37 PM Fan Liya <liya.fa...@gmail.com> > > > >>> wrote: > > > >>>> > > > >>>> Thanks a lot for sharing the good results. 
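[The ~60 microsecond per-batch overhead quoted above follows directly from the benchmark numbers and is easy to check:]

```python
# Fannie Mae dataset, chunksize 1024: 21,661 record batches read in ~1.3 s
# with memory mapping enabled, so nearly all of the wall clock time is
# IPC reconstruction overhead rather than I/O.
n_batches = 21_661
read_time_s = 1.3
overhead_us_per_batch = read_time_s / n_batches * 1e6
print(f"~{overhead_us_per_batch:.0f} us per record batch")  # ~60 us
```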
> > > >>>> > > > >>>> As investigated by Wes, we have existing zstd library for Java > > > >>>> (zstd-jni) [1], and lz4 library for Java (lz4-java) [2]. > > > >>>> +1 for the 1024 batch size, as it represents an important scenario > > > >>>> where the batch fits into the L1 cache (IMO). > > > >>>> > > > >>>> Best, > > > >>>> Liya Fan > > > >>>> > > > >>>> [1] https://github.com/luben/zstd-jni > > > >>>> [2] https://github.com/lz4/lz4-java > > > >>>> > > > >>>> On Thu, Mar 26, 2020 at 2:38 AM Micah Kornfield > > > >>>> <emkornfi...@gmail.com> wrote: > > > >>>>> > > > >>>>> If it isn't hard could you run with batch sizes of 1024 or 2048 > > > >>>>> records? I > > > >>>>> think there was a question previously raised if there was benefit > > > >>>>> for > > > >>>>> smaller sizes buffers. > > > >>>>> > > > >>>>> Thanks, > > > >>>>> Micah > > > >>>>> > > > >>>>> > > > >>>>> On Wed, Mar 25, 2020 at 8:59 AM Wes McKinney <wesmck...@gmail.com> > > > >>>>> wrote: > > > >>>>> > > > >>>>>> On Tue, Mar 24, 2020 at 9:22 PM Micah Kornfield > > > >>>>>> <emkornfi...@gmail.com> > > > >>>>>> wrote: > > > >>>>>>> > > > >>>>>>>> > > > >>>>>>>> Compression ratios ranging from ~50% with LZ4 and ~75% with ZSTD > > > >>>>>>>> on > > > >>>>>>>> the Taxi dataset to ~87% with LZ4 and ~90% with ZSTD on the > > > >>>>>>>> Fannie Mae > > > >>>>>>>> dataset. So that's a huge space savings > > > >>>>>>> > > > >>>>>>> One more question on this. What was the average row-batch size > > > >>>>>>> used? I > > > >>>>>>> see in the proposal some buffers might not be compressed, did you > > > >>>>>>> this > > > >>>>>>> feature in the test? > > > >>>>>> > > > >>>>>> I used 64K row batch size. I haven't implemented the optional > > > >>>>>> non-compressed buffers (for cases where there is little space > > > >>>>>> savings) > > > >>>>>> so everything is compressed. 
I can check different batch sizes if > > > >>>>>> you > > > >>>>>> like > > > >>>>>> > > > >>>>>> > > > >>>>>>> On Mon, Mar 23, 2020 at 4:40 PM Wes McKinney <wesmck...@gmail.com> > > > >>>>>> wrote: > > > >>>>>>> > > > >>>>>>>> hi folks, > > > >>>>>>>> > > > >>>>>>>> Sorry it's taken me a little while to produce supporting > > > >>>>>>>> benchmarks. > > > >>>>>>>> > > > >>>>>>>> * I implemented experimental trivial body buffer compression in > > > >>>>>>>> https://github.com/apache/arrow/pull/6638 > > > >>>>>>>> * I hooked up the Arrow IPC file format with compression as the > > > >>>>>>>> new > > > >>>>>>>> Feather V2 format in > > > >>>>>>>> https://github.com/apache/arrow/pull/6694#issuecomment-602906476 > > > >>>>>>>> > > > >>>>>>>> I tested a couple of real-world datasets from a prior blog post > > > >>>>>>>> https://ursalabs.org/blog/2019-10-columnar-perf/ with ZSTD and > > > >>>>>>>> LZ4 > > > >>>>>>>> codecs > > > >>>>>>>> > > > >>>>>>>> The complete results are here > > > >>>>>>>> https://github.com/apache/arrow/pull/6694#issuecomment-602906476 > > > >>>>>>>> > > > >>>>>>>> Summary: > > > >>>>>>>> > > > >>>>>>>> * Compression ratios ranging from ~50% with LZ4 and ~75% with > > > >>>>>>>> ZSTD on > > > >>>>>>>> the Taxi dataset to ~87% with LZ4 and ~90% with ZSTD on the > > > >>>>>>>> Fannie Mae > > > >>>>>>>> dataset. So that's a huge space savings > > > >>>>>>>> * Single-threaded decompression times exceeding 2-4GByte/s with > > > >>>>>>>> LZ4 > > > >>>>>>>> and 1.2-3GByte/s with ZSTD > > > >>>>>>>> > > > >>>>>>>> I would have to do some more engineering to test throughput > > > >>>>>>>> changes > > > >>>>>>>> with Flight, but given these results on slower networking (e.g. 1 > > > >>>>>>>> Gigabit) my guess is that the compression and decompression > > > >>>>>>>> overhead > > > >>>>>>>> is little compared with the time savings due to high compression > > > >>>>>>>> ratios. 
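[The decompression throughput figures quoted above (2-4 GB/s for LZ4, 1.2-3 GB/s for ZSTD, single-threaded) measure uncompressed output bytes per second. A hypothetical harness for that measurement, with stdlib zlib standing in for the real codecs (zlib will be far slower than either):]

```python
import time
import zlib

def decompression_throughput_gbps(buf, compress, decompress):
    # Compress once up front, then time only the decompression step and
    # report GB/s of *uncompressed* output, which is the figure above.
    compressed = compress(buf)
    start = time.perf_counter()
    out = decompress(compressed)
    elapsed = time.perf_counter() - start
    assert out == buf  # sanity-check the roundtrip
    return len(buf) / elapsed / 1e9
```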
If people would like to see these numbers to help make a > > > >>>>>>>> decision I can take a closer look > > > >>>>>>>> > > > >>>>>>>> As far as what Micah said about having a limited number of > > > >>>>>>>> compressors: I would be in favor of having just LZ4 and ZSTD. It > > > >>>>>>>> seems > > > >>>>>>>> anecdotally that these outperform Snappy in most real world > > > >>>>>>>> scenarios > > > >>>>>>>> and generally have > 1 GB/s decompression performance. Some Linux > > > >>>>>>>> distributions (Arch at least) have already started adopting ZSTD > > > >>>>>>>> over > > > >>>>>>>> LZMA or GZIP [1] > > > >>>>>>>> > > > >>>>>>>> - Wes > > > >>>>>>>> > > > >>>>>>>> [1]: > > > >>>>>>>> > > > >>>>>> https://www.archlinux.org/news/now-using-zstandard-instead-of-xz-for-package-compression/ > > > >>>>>>>> > > > >>>>>>>> On Fri, Mar 6, 2020 at 8:42 AM Fan Liya <liya.fa...@gmail.com> > > > >>>>>>>> wrote: > > > >>>>>>>>> > > > >>>>>>>>> Hi Wes, > > > >>>>>>>>> > > > >>>>>>>>> Thanks a lot for the additional information. > > > >>>>>>>>> Looking forward to see the good results from your experiments. > > > >>>>>>>>> > > > >>>>>>>>> Best, > > > >>>>>>>>> Liya Fan > > > >>>>>>>>> > > > >>>>>>>>> On Thu, Mar 5, 2020 at 11:42 PM Wes McKinney > > > >>>>>>>>> <wesmck...@gmail.com> > > > >>>>>>>> wrote: > > > >>>>>>>>> > > > >>>>>>>>>> I see, thank you. > > > >>>>>>>>>> > > > >>>>>>>>>> For such a scenario, implementations would need to define a > > > >>>>>>>>>> "UserDefinedCodec" interface to enable codecs to be registered > > > >>>>>>>>>> from > > > >>>>>>>>>> third party code, similar to what is done for extension types > > > >>>>>>>>>> [1] > > > >>>>>>>>>> > > > >>>>>>>>>> I'll update this thread when I get my experimental C++ patch > > > >>>>>>>>>> up to > > > >>>>>> see > > > >>>>>>>>>> what I'm thinking at least for the built-in codecs we have like > > > >>>>>> ZSTD. 
> > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>> > > > >>>>>> https://github.com/apache/arrow/blob/apache-arrow-0.16.0/docs/source/format/Columnar.rst#extension-types > > > >>>>>>>>>> > > > >>>>>>>>>> On Thu, Mar 5, 2020 at 7:56 AM Fan Liya <liya.fa...@gmail.com> > > > >>>>>> wrote: > > > >>>>>>>>>>> > > > >>>>>>>>>>> Hi Wes, > > > >>>>>>>>>>> > > > >>>>>>>>>>> Thanks a lot for your further clarification. > > > >>>>>>>>>>> > > > >>>>>>>>>>> Some of my prelimiary thoughts: > > > >>>>>>>>>>> > > > >>>>>>>>>>> 1. We assign a unique GUID to each pair of > > > >>>>>> compression/decompression > > > >>>>>>>>>>> strategies. The GUID is stored as part of the > > > >>>>>>>> Message.custom_metadata. > > > >>>>>>>>>> When > > > >>>>>>>>>>> receiving the GUID, the receiver knows which decompression > > > >>>>>> strategy > > > >>>>>>>> to > > > >>>>>>>>>> use. > > > >>>>>>>>>>> > > > >>>>>>>>>>> 2. We serialize the decompression strategy, and store it into > > > >>>>>>>>>>> the > > > >>>>>>>>>>> Message.custom_metadata. The receiver can decompress data > > > >>>>>>>>>>> after > > > >>>>>>>>>>> deserializing the strategy. > > > >>>>>>>>>>> > > > >>>>>>>>>>> Method 1 is generally used in static strategy scenarios while > > > >>>>>> method > > > >>>>>>>> 2 is > > > >>>>>>>>>>> generally used in dynamic strategy scenarios. 
> > > >>>>>>>>>>> > > > >>>>>>>>>>> Best, > > > >>>>>>>>>>> Liya Fan > > > >>>>>>>>>>> > > > >>>>>>>>>>> On Wed, Mar 4, 2020 at 11:39 PM Wes McKinney < > > > >>>>>> wesmck...@gmail.com> > > > >>>>>>>>>> wrote: > > > >>>>>>>>>>> > > > >>>>>>>>>>>> Okay, I guess my question is how the receiver is going to be > > > >>>>>> able > > > >>>>>>>> to > > > >>>>>>>>>>>> determine how to "rehydrate" the record batch buffers: > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> What I've proposed amounts to the following: > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> * UNCOMPRESSED: the current behavior > > > >>>>>>>>>>>> * ZSTD/LZ4/...: each buffer is compressed and written with an > > > >>>>>> int64 > > > >>>>>>>>>>>> length prefix > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> (I'm close to putting up a PR implementing an experimental > > > >>>>>> version > > > >>>>>>>> of > > > >>>>>>>>>>>> this that uses Message.custom_metadata to transmit the codec, > > > >>>>>> so > > > >>>>>>>> this > > > >>>>>>>>>>>> will make the implementation details more concrete) > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> So in the USER_DEFINED case, how will the library know how to > > > >>>>>>>> obtain > > > >>>>>>>>>>>> the uncompressed buffer? Is some additional metadata > > > >>>>>>>>>>>> structure > > > >>>>>>>>>>>> required to provide instructions? > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> On Wed, Mar 4, 2020 at 8:05 AM Fan Liya > > > >>>>>>>>>>>> <liya.fa...@gmail.com> > > > >>>>>>>> wrote: > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Hi Wes, > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> I am thinking of adding an option named "USER_DEFINED" (or > > > >>>>>>>> something > > > >>>>>>>>>>>>> similar) to enum CompressionType in your proposal. > > > >>>>>>>>>>>>> IMO, this option should be used primarily in Flight. 
> > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Best, > > > >>>>>>>>>>>>> Liya Fan > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> On Wed, Mar 4, 2020 at 11:12 AM Wes McKinney < > > > >>>>>>>> wesmck...@gmail.com> > > > >>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>>> On Tue, Mar 3, 2020, 8:11 PM Fan Liya < > > > >>>>>> liya.fa...@gmail.com> > > > >>>>>>>>>> wrote: > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Sure. I agree with you that we should not overdo this. > > > >>>>>>>>>>>>>>> I am wondering if we should provide an option to allow > > > >>>>>> users > > > >>>>>>>> to > > > >>>>>>>>>>>> plugin > > > >>>>>>>>>>>>>>> their customized compression strategies. > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> Can you provide a patch showing changes to Message.fbs (or > > > >>>>>>>>>> Schema.fbs) > > > >>>>>>>>>>>> that > > > >>>>>>>>>>>>>> make this idea more concrete? > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Best, > > > >>>>>>>>>>>>>>> Liya Fan > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> On Tue, Mar 3, 2020 at 9:47 PM Wes McKinney < > > > >>>>>>>> wesmck...@gmail.com > > > >>>>>>>>>>> > > > >>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> On Tue, Mar 3, 2020, 7:36 AM Fan Liya < > > > >>>>>>>> liya.fa...@gmail.com> > > > >>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> I am so glad to see this discussion, and I am > > > >>>>>> willing to > > > >>>>>>>>>> provide > > > >>>>>>>>>>>> help > > > >>>>>>>>>>>>>>>> from > > > >>>>>>>>>>>>>>>>> the Java side. > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> In the proposal, I see the support for basic > > > >>>>>> compression > > > >>>>>>>>>>>> strategies > > > >>>>>>>>>>>>>>>>> (e.g.gzip, snappy). > > > >>>>>>>>>>>>>>>>> IMO, applying a single basic strategy is not likely > > > >>>>>> to > > > >>>>>>>>>> achieve > > > >>>>>>>>>>>>>>>> performance > > > >>>>>>>>>>>>>>>>> improvement for most scenarios. 
> > > >>>>>>>>>>>>>>>>> The optimal compression strategy is often obtained by > > > >>>>>>>>>> composing > > > >>>>>>>>>>>> basic > > > >>>>>>>>>>>>>>>>> strategies and tuning parameters. > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> I hope we can support such highly customized > > > >>>>>> compression > > > >>>>>>>>>>>> strategies. > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> I think very much beyond trivial one-shot buffer level > > > >>>>>>>>>> compression > > > >>>>>>>>>>>> is > > > >>>>>>>>>>>>>>>> probably out of the question for addition to the > > > >>>>>> current > > > >>>>>>>>>>>> "RecordBatch" > > > >>>>>>>>>>>>>>>> Flatbuffers type, because the additional metadata > > > >>>>>> would add > > > >>>>>>>>>>>> undesirable > > > >>>>>>>>>>>>>>>> bloat (which I would be against). If people have other > > > >>>>>>>> ideas it > > > >>>>>>>>>>>> would > > > >>>>>>>>>>>>>> be > > > >>>>>>>>>>>>>>>> great to see exactly what you are thinking as far as > > > >>>>>>>> changes > > > >>>>>>>>>> to the > > > >>>>>>>>>>>>>>>> protocol files. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> I'll try to assemble some examples to show the > > > >>>>>> before/after > > > >>>>>>>>>>>> results of > > > >>>>>>>>>>>>>>>> applying the simple strategy. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> Best, > > > >>>>>>>>>>>>>>>>> Liya Fan > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> On Tue, Mar 3, 2020 at 8:15 PM Antoine Pitrou < > > > >>>>>>>>>>>> anto...@python.org> > > > >>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> If we want to use a HTTP header, it would be more > > > >>>>>> of a > > > >>>>>>>>>>>>>>> Accept-Encoding > > > >>>>>>>>>>>>>>>>>> header, no? 
> > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> In any case, we would have to put non-standard > > > >>>>>> values > > > >>>>>>>> there > > > >>>>>>>>>>>> (e.g. > > > >>>>>>>>>>>>>>> lz4), > > > >>>>>>>>>>>>>>>>>> so I'm not sure how desirable it is to repurpose > > > >>>>>> HTTP > > > >>>>>>>>>> headers > > > >>>>>>>>>>>> for > > > >>>>>>>>>>>>>>> that, > > > >>>>>>>>>>>>>>>>>> rather than add some dedicated field to the Flight > > > >>>>>>>>>> messages. > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Regards > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Antoine. > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Le 03/03/2020 à 12:52, David Li a écrit : > > > >>>>>>>>>>>>>>>>>>> gRPC supports headers so for Flight, we could > > > >>>>>> send > > > >>>>>>>>>>>> essentially an > > > >>>>>>>>>>>>>>>>> Accept > > > >>>>>>>>>>>>>>>>>>> header and perhaps a Content-Type header. > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> David > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> On Mon, Mar 2, 2020, 23:15 Micah Kornfield < > > > >>>>>>>>>>>>>> emkornfi...@gmail.com> > > > >>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> Hi Wes, > > > >>>>>>>>>>>>>>>>>>>> A few thoughts on this. In general, I think it > > > >>>>>> is a > > > >>>>>>>>>> good > > > >>>>>>>>>>>> idea. > > > >>>>>>>>>>>>>>> But > > > >>>>>>>>>>>>>>>>>> before > > > >>>>>>>>>>>>>>>>>>>> proceeding, I think the following points are > > > >>>>>> worth > > > >>>>>>>>>>>> discussing: > > > >>>>>>>>>>>>>>>>>>>> 1. Does this actually improve > > > >>>>>> throughput/latency > > > >>>>>>>> for > > > >>>>>>>>>>>> Flight? (I > > > >>>>>>>>>>>>>>>> think > > > >>>>>>>>>>>>>>>>>> you > > > >>>>>>>>>>>>>>>>>>>> mentioned you would follow-up with benchmarks). > > > >>>>>>>>>>>>>>>>>>>> 2. 
I think we should limit the number of > > > >>>>>> supported > > > >>>>>>>>>>>> compression > > > >>>>>>>>>>>>>>>>> schemes > > > >>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>> only 1 or 2. I think the criteria for selection > > > >>>>>>>> speed > > > >>>>>>>>>> and > > > >>>>>>>>>>>>>> native > > > >>>>>>>>>>>>>>>>>>>> implementations available across the widest > > > >>>>>> possible > > > >>>>>>>>>>>> languages. > > > >>>>>>>>>>>>>>> As > > > >>>>>>>>>>>>>>>>> far > > > >>>>>>>>>>>>>>>>>> as > > > >>>>>>>>>>>>>>>>>>>> i can tell zstd only have bindings in java via > > > >>>>>> JNI, > > > >>>>>>>> but > > > >>>>>>>>>> my > > > >>>>>>>>>>>>>>>>>> understanding is > > > >>>>>>>>>>>>>>>>>>>> it is probably the type of compression for our > > > >>>>>>>>>> use-cases. > > > >>>>>>>>>>>> So I > > > >>>>>>>>>>>>>>>> think > > > >>>>>>>>>>>>>>>>>>>> zstd + potentially 1 more. > > > >>>>>>>>>>>>>>>>>>>> 3. Commitment from someone on the Java side to > > > >>>>>>>>>> implement > > > >>>>>>>>>>>> this. > > > >>>>>>>>>>>>>>>>>>>> 4. This doesn't need to be coupled with this > > > >>>>>> change > > > >>>>>>>>>> per-se > > > >>>>>>>>>>>> but > > > >>>>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>>>>> something like flight it would be good to have a > > > >>>>>>>>>> standard > > > >>>>>>>>>>>>>>> mechanism > > > >>>>>>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>>>>> negotiating server/client capabilities (e.g. > > > >>>>>> client > > > >>>>>>>>>> doesn't > > > >>>>>>>>>>>>>>> support > > > >>>>>>>>>>>>>>>>>>>> compression or only supports a subset). 
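[The capability negotiation suggested in point 4 could be as simple as intersecting the client's advertised codecs with the server's and falling back to uncompressed when nothing overlaps. A hypothetical sketch, not a Flight API:]

```python
def negotiate_codec(client_accepts, server_supports, preference=("zstd", "lz4")):
    # Pick the first mutually supported codec in server preference order;
    # a client that advertises nothing usable gets uncompressed data.
    client = set(client_accepts)
    for codec in preference:
        if codec in client and codec in server_supports:
            return codec
    return "uncompressed"
```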
> > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> Thanks, > > > >>>>>>>>>>>>>>>>>>>> Micah > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> On Sun, Mar 1, 2020 at 1:24 PM Wes McKinney < > > > >>>>>>>>>>>>>> wesmck...@gmail.com> > > > >>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> On Sun, Mar 1, 2020 at 3:14 PM Antoine Pitrou < > > > >>>>>>>>>>>>>>> anto...@python.org> > > > >>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> Le 01/03/2020 à 22:01, Wes McKinney a écrit : > > > >>>>>>>>>>>>>>>>>>>>>>> In the context of a "next version of the > > > >>>>>> Feather > > > >>>>>>>>>> format" > > > >>>>>>>>>>>>>>>> ARROW-5510 > > > >>>>>>>>>>>>>>>>>>>>>>> (which is consumed only by Python and R at > > > >>>>>> the > > > >>>>>>>>>> moment), I > > > >>>>>>>>>>>>>> have > > > >>>>>>>>>>>>>>>> been > > > >>>>>>>>>>>>>>>>>>>>>>> looking at compressing buffers using fast > > > >>>>>>>> compressors > > > >>>>>>>>>>>> like > > > >>>>>>>>>>>>>> ZSTD > > > >>>>>>>>>>>>>>>>> when > > > >>>>>>>>>>>>>>>>>>>>>>> writing the RecordBatch bodies. This could be > > > >>>>>>>> handled > > > >>>>>>>>>>>>>> privately > > > >>>>>>>>>>>>>>>> as > > > >>>>>>>>>>>>>>>>> an > > > >>>>>>>>>>>>>>>>>>>>>>> implementation detail of the Feather file, > > > >>>>>> but > > > >>>>>>>> since > > > >>>>>>>>>> ZSTD > > > >>>>>>>>>>>>>>>>> compression > > > >>>>>>>>>>>>>>>>>>>>>>> could improve throughput in Flight, for > > > >>>>>> example, > > > >>>>>>>> I > > > >>>>>>>>>>>> thought I > > > >>>>>>>>>>>>>>>> would > > > >>>>>>>>>>>>>>>>>>>>>>> bring it up for discussion. 
> > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> I can see two simple compression strategies: > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> * Compress the entire message body in > > > >>>>>> one-shot, > > > >>>>>>>>>> writing > > > >>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>> result > > > >>>>>>>>>>>>>>>>>>>> out > > > >>>>>>>>>>>>>>>>>>>>>>> with an 8-byte int64 prefix indicating the > > > >>>>>>>>>> uncompressed > > > >>>>>>>>>>>> size > > > >>>>>>>>>>>>>>>>>>>>>>> * Compress each non-zero-length constituent > > > >>>>>>>> Buffer > > > >>>>>>>>>> prior > > > >>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>> writing > > > >>>>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>>>>> the body (and using the same > > > >>>>>>>>>> uncompressed-length-prefix > > > >>>>>>>>>>>> when > > > >>>>>>>>>>>>>>>>> writing > > > >>>>>>>>>>>>>>>>>>>>>>> the compressed buffer) > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> The latter strategy is preferable for > > > >>>>>> scenarios > > > >>>>>>>>>> where we > > > >>>>>>>>>>>> may > > > >>>>>>>>>>>>>>>>> project > > > >>>>>>>>>>>>>>>>>>>>>>> out only a few fields from a larger record > > > >>>>>> batch > > > >>>>>>>>>> (such as > > > >>>>>>>>>>>>>>> reading > > > >>>>>>>>>>>>>>>>>>>> from > > > >>>>>>>>>>>>>>>>>>>>>>> a memory-mapped file). > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> Agreed. It may also allow using different > > > >>>>>>>> compression > > > >>>>>>>>>>>>>>> strategies > > > >>>>>>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>>>>>>> different kinds of buffers (for example a > > > >>>>>>>> bytestream > > > >>>>>>>>>>>> splitting > > > >>>>>>>>>>>>>>>>>> strategy > > > >>>>>>>>>>>>>>>>>>>>>> for floats and doubles, or a delta encoding > > > >>>>>>>> strategy > > > >>>>>>>>>> for > > > >>>>>>>>>>>>>>>> integers). 
> > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> If we wanted to allow for different > > > >>>>>> compression to > > > >>>>>>>>>> apply to > > > >>>>>>>>>>>>>>>> different > > > >>>>>>>>>>>>>>>>>>>>> buffers, I think we will need a new Message > > > >>>>>> type > > > >>>>>>>>>> because > > > >>>>>>>>>>>> this > > > >>>>>>>>>>>>>>> would > > > >>>>>>>>>>>>>>>>>>>>> inflate metadata sizes in a way that is not > > > >>>>>> likely > > > >>>>>>>> to > > > >>>>>>>>>> be > > > >>>>>>>>>>>>>>> acceptable > > > >>>>>>>>>>>>>>>>>>>>> for the current uncompressed use case. > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> Here is my strawman proposal > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>> > > > >>>>>> https://github.com/apache/arrow/compare/master...wesm:compression-strawman > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> Implementation could be accomplished by one > > > >>>>>> of > > > >>>>>>>> the > > > >>>>>>>>>>>> following > > > >>>>>>>>>>>>>>>>> methods: > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> * Setting a field in Message.custom_metadata > > > >>>>>>>>>>>>>>>>>>>>>>> * Adding a new field to Message > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> I think it has to be a new field in Message. > > > >>>>>>>> Making > > > >>>>>>>>>> it an > > > >>>>>>>>>>>>>>>> ignorable > > > >>>>>>>>>>>>>>>>>>>>>> metadata field means non-supporting receivers > > > >>>>>> will > > > >>>>>>>>>> decode > > > >>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>> interpret > > > >>>>>>>>>>>>>>>>>>>>>> the data wrongly. > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> Regards > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> Antoine. 
> > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>> > > > >>>>>>