It seems there is reasonable consensus in the PR. If there are no further comments, I'll start a vote on this within the next several days.
On Mon, Apr 6, 2020 at 10:55 PM Wes McKinney <wesmck...@gmail.com> wrote: > > I updated the Format proposal again, please have a look > > https://github.com/apache/arrow/pull/6707 > > On Wed, Apr 1, 2020 at 10:15 AM Wes McKinney <wesmck...@gmail.com> wrote: > > > > For uncompressed, memory mapping is disabled, so all of the bytes are > > being read into RAM. I wanted to show that even when your IO pipe is > > very fast (in the case with an NVMe SSD like I have, > 1GB/s for read > > from disk) that you can still load faster with compressed files. > > > > Here were the prior Read results with > > > > * Single threaded decompression > > * Memory mapping enabled > > > > https://ibb.co/4ZncdF8 > > > > You can see for larger chunksizes, because the IPC reconstruction > > overhead is about 60 microseconds per batch, that read time is very > > low (10s of milliseconds). > > > > On Wed, Apr 1, 2020 at 10:10 AM Antoine Pitrou <anto...@python.org> wrote: > > > > > > > > > The read times are still with memory mapping for the uncompressed case? > > > If so, impressive! > > > > > > Regards > > > > > > Antoine. > > > > > > > > > Le 01/04/2020 à 16:44, Wes McKinney a écrit : > > > > Several pieces of work got done in the last few days: > > > > > > > > * Changing from LZ4 raw to LZ4 frame format (what is recommended for > > > > interoperability) > > > > * Parallelizing both compression and decompression at the field level > > > > > > > > Here are the results (using 8 threads on an 8-core laptop). I disabled > > > > the "memory map" feature so that in the uncompressed case all of the > > > > data must be read off disk into memory. 
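[The field-level parallelism mentioned above (compressing/decompressing each field's buffers on separate cores) can be sketched with a stdlib thread pool. This is a hypothetical illustration, not the Arrow C++ implementation: zlib stands in for LZ4/ZSTD, and the function names are made up.]

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_buffers_parallel(buffers, level=1, max_workers=8):
    # Each field buffer is compressed independently, so the work fans out
    # across cores. zlib releases the GIL on sizable inputs, as do the
    # LZ4/ZSTD bindings, so threads give real parallelism here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda b: zlib.compress(b, level), buffers))

def decompress_buffers_parallel(compressed, max_workers=8):
    # Decompression parallelizes the same way, one task per buffer.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(zlib.decompress, compressed))
```

Note that, as described above, disk I/O and (de)compression still run serially in this shape; pipelining the two stages is the further improvement mentioned.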
This helps illustrate the > > > > compression/IO tradeoff to wall clock load times > > > > > > > > File size (only LZ4 may be different): https://ibb.co/CP3VQkp > > > > Read time: https://ibb.co/vz9JZMx > > > > Write time: https://ibb.co/H7bb68T > > > > > > > > In summary, now with multicore compression and decompression, > > > > LZ4-compressed files are faster both to read and write even on a very > > > > fast SSD, as are ZSTD-compressed files with a low ZSTD compression > > > > level. I didn't notice a major difference between LZ4 raw and LZ4 > > > > frame formats. The reads and writes could be made faster still by > > > > pipelining / making concurrent the disk read/write and > > > > compression/decompression steps -- the current implementation performs > > > > these tasks serially. We can improve this in the near future > > > > > > > > I'll update the Format proposal this week so we can move toward > > > > something we can vote on. I would recommend that we await > > > > implementations and integration tests for this before releasing this > > > > as stable, in line with prior discussions about adding stuff to the > > > > IPC protocol > > > > > > > > On Thu, Mar 26, 2020 at 4:57 PM Wes McKinney <wesmck...@gmail.com> > > > > wrote: > > > >> > > > >> Here are the results: > > > >> > > > >> File size: https://ibb.co/71sBsg3 > > > >> Read time: https://ibb.co/4ZncdF8 > > > >> Write time: https://ibb.co/xhNkRS2 > > > >> > > > >> Code: > > > >> https://github.com/wesm/notebooks/blob/master/20190919file_benchmarks/FeatherCompression.ipynb > > > >> (based on https://github.com/apache/arrow/pull/6694) > > > >> > > > >> High level summary: > > > >> > > > >> * Chunksize 1024 vs 64K has relatively limited impact on file sizes > > > >> > > > >> * Wall clock read time is impacted by chunksize, maybe 30-40% > > > >> difference between 1K row chunks versus 16K row chunks. 
One notable > > > >> thing is that you can see clearly the overhead associated with IPC > > > >> reconstruction even when the data is memory mapped. For example, in > > > >> the Fannie Mae dataset there are 21,661 batches (each batch has 31 > > > >> fields) when the chunksize is 1024. So a read time of 1.3 seconds > > > >> indicates ~60 microseconds of overhead for each record batch. When you > > > >> consider the amount of business logic involved with reconstructing a > > > >> record batch, 60 microseconds is pretty good. This also shows that > > > >> every microsecond counts and we need to be carefully tracking > > > >> microperformance in this critical operation. > > > >> > > > >> * Small chunksize results in higher write times for "expensive" codecs > > > >> like ZSTD with a high compression ratio. For "cheap" codecs like LZ4 > > > >> it doesn't make as much of a difference > > > >> > > > >> * Note that LZ4 compressor results in faster wall clock time to disk > > > >> presumably because the compression speed is faster than my SSD's write > > > >> speed > > > >> > > > >> Implementation notes: > > > >> * There is no parallelization or pipelining of reads or writes. For > > > >> example, on write, all of the buffers are compressed with a single > > > >> thread and then compression stops until the write to disk completes. > > > >> On read, buffers are decompressed serially > > > >> > > > >> > > > >> On Thu, Mar 26, 2020 at 12:24 PM Wes McKinney <wesmck...@gmail.com> > > > >> wrote: > > > >>> > > > >>> I'll run a grid of batch sizes (from 1024 to 64K or 128K) and let you > > > >>> know the read/write times and compression ratios. Shouldn't take too > > > >>> long > > > >>> > > > >>> On Wed, Mar 25, 2020 at 10:37 PM Fan Liya <liya.fa...@gmail.com> > > > >>> wrote: > > > >>>> > > > >>>> Thanks a lot for sharing the good results. 
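[The ~60 microsecond per-batch overhead quoted above follows directly from the benchmark numbers and is easy to check:]

```python
# Fannie Mae dataset, chunksize 1024: 21,661 record batches read in ~1.3 s
# with memory mapping enabled, so nearly all of the wall clock time is
# IPC reconstruction overhead rather than I/O.
n_batches = 21_661
read_time_s = 1.3
overhead_us_per_batch = read_time_s / n_batches * 1e6
print(f"~{overhead_us_per_batch:.0f} us per record batch")  # ~60 us
```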
> > > >>>> > > > >>>> As investigated by Wes, we have existing zstd library for Java > > > >>>> (zstd-jni) [1], and lz4 library for Java (lz4-java) [2]. > > > >>>> +1 for the 1024 batch size, as it represents an important scenario > > > >>>> where the batch fits into the L1 cache (IMO). > > > >>>> > > > >>>> Best, > > > >>>> Liya Fan > > > >>>> > > > >>>> [1] https://github.com/luben/zstd-jni > > > >>>> [2] https://github.com/lz4/lz4-java > > > >>>> > > > >>>> On Thu, Mar 26, 2020 at 2:38 AM Micah Kornfield > > > >>>> <emkornfi...@gmail.com> wrote: > > > >>>>> > > > >>>>> If it isn't hard could you run with batch sizes of 1024 or 2048 > > > >>>>> records? I > > > >>>>> think there was a question previously raised if there was benefit > > > >>>>> for > > > >>>>> smaller sizes buffers. > > > >>>>> > > > >>>>> Thanks, > > > >>>>> Micah > > > >>>>> > > > >>>>> > > > >>>>> On Wed, Mar 25, 2020 at 8:59 AM Wes McKinney <wesmck...@gmail.com> > > > >>>>> wrote: > > > >>>>> > > > >>>>>> On Tue, Mar 24, 2020 at 9:22 PM Micah Kornfield > > > >>>>>> <emkornfi...@gmail.com> > > > >>>>>> wrote: > > > >>>>>>> > > > >>>>>>>> > > > >>>>>>>> Compression ratios ranging from ~50% with LZ4 and ~75% with ZSTD > > > >>>>>>>> on > > > >>>>>>>> the Taxi dataset to ~87% with LZ4 and ~90% with ZSTD on the > > > >>>>>>>> Fannie Mae > > > >>>>>>>> dataset. So that's a huge space savings > > > >>>>>>> > > > >>>>>>> One more question on this. What was the average row-batch size > > > >>>>>>> used? I > > > >>>>>>> see in the proposal some buffers might not be compressed, did you > > > >>>>>>> this > > > >>>>>>> feature in the test? > > > >>>>>> > > > >>>>>> I used 64K row batch size. I haven't implemented the optional > > > >>>>>> non-compressed buffers (for cases where there is little space > > > >>>>>> savings) > > > >>>>>> so everything is compressed. 
I can check different batch sizes if > > > >>>>>> you > > > >>>>>> like > > > >>>>>> > > > >>>>>> > > > >>>>>>> On Mon, Mar 23, 2020 at 4:40 PM Wes McKinney <wesmck...@gmail.com> > > > >>>>>> wrote: > > > >>>>>>> > > > >>>>>>>> hi folks, > > > >>>>>>>> > > > >>>>>>>> Sorry it's taken me a little while to produce supporting > > > >>>>>>>> benchmarks. > > > >>>>>>>> > > > >>>>>>>> * I implemented experimental trivial body buffer compression in > > > >>>>>>>> https://github.com/apache/arrow/pull/6638 > > > >>>>>>>> * I hooked up the Arrow IPC file format with compression as the > > > >>>>>>>> new > > > >>>>>>>> Feather V2 format in > > > >>>>>>>> https://github.com/apache/arrow/pull/6694#issuecomment-602906476 > > > >>>>>>>> > > > >>>>>>>> I tested a couple of real-world datasets from a prior blog post > > > >>>>>>>> https://ursalabs.org/blog/2019-10-columnar-perf/ with ZSTD and > > > >>>>>>>> LZ4 > > > >>>>>>>> codecs > > > >>>>>>>> > > > >>>>>>>> The complete results are here > > > >>>>>>>> https://github.com/apache/arrow/pull/6694#issuecomment-602906476 > > > >>>>>>>> > > > >>>>>>>> Summary: > > > >>>>>>>> > > > >>>>>>>> * Compression ratios ranging from ~50% with LZ4 and ~75% with > > > >>>>>>>> ZSTD on > > > >>>>>>>> the Taxi dataset to ~87% with LZ4 and ~90% with ZSTD on the > > > >>>>>>>> Fannie Mae > > > >>>>>>>> dataset. So that's a huge space savings > > > >>>>>>>> * Single-threaded decompression times exceeding 2-4GByte/s with > > > >>>>>>>> LZ4 > > > >>>>>>>> and 1.2-3GByte/s with ZSTD > > > >>>>>>>> > > > >>>>>>>> I would have to do some more engineering to test throughput > > > >>>>>>>> changes > > > >>>>>>>> with Flight, but given these results on slower networking (e.g. 1 > > > >>>>>>>> Gigabit) my guess is that the compression and decompression > > > >>>>>>>> overhead > > > >>>>>>>> is little compared with the time savings due to high compression > > > >>>>>>>> ratios. 
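[The decompression throughput figures quoted above (2-4 GB/s for LZ4, 1.2-3 GB/s for ZSTD, single-threaded) measure uncompressed output bytes per second. A hypothetical harness for that measurement, with stdlib zlib standing in for the real codecs (zlib will be far slower than either):]

```python
import time
import zlib

def decompression_throughput_gbps(buf, compress, decompress):
    # Compress once up front, then time only the decompression step and
    # report GB/s of *uncompressed* output, which is the figure above.
    compressed = compress(buf)
    start = time.perf_counter()
    out = decompress(compressed)
    elapsed = time.perf_counter() - start
    assert out == buf  # sanity-check the roundtrip
    return len(buf) / elapsed / 1e9
```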
If people would like to see these numbers to help make a > > > >>>>>>>> decision I can take a closer look > > > >>>>>>>> > > > >>>>>>>> As far as what Micah said about having a limited number of > > > >>>>>>>> compressors: I would be in favor of having just LZ4 and ZSTD. It > > > >>>>>>>> seems > > > >>>>>>>> anecdotally that these outperform Snappy in most real world > > > >>>>>>>> scenarios > > > >>>>>>>> and generally have > 1 GB/s decompression performance. Some Linux > > > >>>>>>>> distributions (Arch at least) have already started adopting ZSTD > > > >>>>>>>> over > > > >>>>>>>> LZMA or GZIP [1] > > > >>>>>>>> > > > >>>>>>>> - Wes > > > >>>>>>>> > > > >>>>>>>> [1]: > > > >>>>>>>> > > > >>>>>> https://www.archlinux.org/news/now-using-zstandard-instead-of-xz-for-package-compression/ > > > >>>>>>>> > > > >>>>>>>> On Fri, Mar 6, 2020 at 8:42 AM Fan Liya <liya.fa...@gmail.com> > > > >>>>>>>> wrote: > > > >>>>>>>>> > > > >>>>>>>>> Hi Wes, > > > >>>>>>>>> > > > >>>>>>>>> Thanks a lot for the additional information. > > > >>>>>>>>> Looking forward to see the good results from your experiments. > > > >>>>>>>>> > > > >>>>>>>>> Best, > > > >>>>>>>>> Liya Fan > > > >>>>>>>>> > > > >>>>>>>>> On Thu, Mar 5, 2020 at 11:42 PM Wes McKinney > > > >>>>>>>>> <wesmck...@gmail.com> > > > >>>>>>>> wrote: > > > >>>>>>>>> > > > >>>>>>>>>> I see, thank you. > > > >>>>>>>>>> > > > >>>>>>>>>> For such a scenario, implementations would need to define a > > > >>>>>>>>>> "UserDefinedCodec" interface to enable codecs to be registered > > > >>>>>>>>>> from > > > >>>>>>>>>> third party code, similar to what is done for extension types > > > >>>>>>>>>> [1] > > > >>>>>>>>>> > > > >>>>>>>>>> I'll update this thread when I get my experimental C++ patch > > > >>>>>>>>>> up to > > > >>>>>> see > > > >>>>>>>>>> what I'm thinking at least for the built-in codecs we have like > > > >>>>>> ZSTD. 
> > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>> > > > >>>>>> https://github.com/apache/arrow/blob/apache-arrow-0.16.0/docs/source/format/Columnar.rst#extension-types > > > >>>>>>>>>> > > > >>>>>>>>>> On Thu, Mar 5, 2020 at 7:56 AM Fan Liya <liya.fa...@gmail.com> > > > >>>>>> wrote: > > > >>>>>>>>>>> > > > >>>>>>>>>>> Hi Wes, > > > >>>>>>>>>>> > > > >>>>>>>>>>> Thanks a lot for your further clarification. > > > >>>>>>>>>>> > > > >>>>>>>>>>> Some of my prelimiary thoughts: > > > >>>>>>>>>>> > > > >>>>>>>>>>> 1. We assign a unique GUID to each pair of > > > >>>>>> compression/decompression > > > >>>>>>>>>>> strategies. The GUID is stored as part of the > > > >>>>>>>> Message.custom_metadata. > > > >>>>>>>>>> When > > > >>>>>>>>>>> receiving the GUID, the receiver knows which decompression > > > >>>>>> strategy > > > >>>>>>>> to > > > >>>>>>>>>> use. > > > >>>>>>>>>>> > > > >>>>>>>>>>> 2. We serialize the decompression strategy, and store it into > > > >>>>>>>>>>> the > > > >>>>>>>>>>> Message.custom_metadata. The receiver can decompress data > > > >>>>>>>>>>> after > > > >>>>>>>>>>> deserializing the strategy. > > > >>>>>>>>>>> > > > >>>>>>>>>>> Method 1 is generally used in static strategy scenarios while > > > >>>>>> method > > > >>>>>>>> 2 is > > > >>>>>>>>>>> generally used in dynamic strategy scenarios. 
> > > >>>>>>>>>>> > > > >>>>>>>>>>> Best, > > > >>>>>>>>>>> Liya Fan > > > >>>>>>>>>>> > > > >>>>>>>>>>> On Wed, Mar 4, 2020 at 11:39 PM Wes McKinney < > > > >>>>>> wesmck...@gmail.com> > > > >>>>>>>>>> wrote: > > > >>>>>>>>>>> > > > >>>>>>>>>>>> Okay, I guess my question is how the receiver is going to be > > > >>>>>> able > > > >>>>>>>> to > > > >>>>>>>>>>>> determine how to "rehydrate" the record batch buffers: > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> What I've proposed amounts to the following: > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> * UNCOMPRESSED: the current behavior > > > >>>>>>>>>>>> * ZSTD/LZ4/...: each buffer is compressed and written with an > > > >>>>>> int64 > > > >>>>>>>>>>>> length prefix > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> (I'm close to putting up a PR implementing an experimental > > > >>>>>> version > > > >>>>>>>> of > > > >>>>>>>>>>>> this that uses Message.custom_metadata to transmit the codec, > > > >>>>>> so > > > >>>>>>>> this > > > >>>>>>>>>>>> will make the implementation details more concrete) > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> So in the USER_DEFINED case, how will the library know how to > > > >>>>>>>> obtain > > > >>>>>>>>>>>> the uncompressed buffer? Is some additional metadata > > > >>>>>>>>>>>> structure > > > >>>>>>>>>>>> required to provide instructions? > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> On Wed, Mar 4, 2020 at 8:05 AM Fan Liya > > > >>>>>>>>>>>> <liya.fa...@gmail.com> > > > >>>>>>>> wrote: > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Hi Wes, > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> I am thinking of adding an option named "USER_DEFINED" (or > > > >>>>>>>> something > > > >>>>>>>>>>>>> similar) to enum CompressionType in your proposal. > > > >>>>>>>>>>>>> IMO, this option should be used primarily in Flight. 
> > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Best, > > > >>>>>>>>>>>>> Liya Fan > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> On Wed, Mar 4, 2020 at 11:12 AM Wes McKinney < > > > >>>>>>>> wesmck...@gmail.com> > > > >>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>>> On Tue, Mar 3, 2020, 8:11 PM Fan Liya < > > > >>>>>> liya.fa...@gmail.com> > > > >>>>>>>>>> wrote: > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Sure. I agree with you that we should not overdo this. > > > >>>>>>>>>>>>>>> I am wondering if we should provide an option to allow > > > >>>>>> users > > > >>>>>>>> to > > > >>>>>>>>>>>> plugin > > > >>>>>>>>>>>>>>> their customized compression strategies. > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> Can you provide a patch showing changes to Message.fbs (or > > > >>>>>>>>>> Schema.fbs) > > > >>>>>>>>>>>> that > > > >>>>>>>>>>>>>> make this idea more concrete? > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Best, > > > >>>>>>>>>>>>>>> Liya Fan > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> On Tue, Mar 3, 2020 at 9:47 PM Wes McKinney < > > > >>>>>>>> wesmck...@gmail.com > > > >>>>>>>>>>> > > > >>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> On Tue, Mar 3, 2020, 7:36 AM Fan Liya < > > > >>>>>>>> liya.fa...@gmail.com> > > > >>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> I am so glad to see this discussion, and I am > > > >>>>>> willing to > > > >>>>>>>>>> provide > > > >>>>>>>>>>>> help > > > >>>>>>>>>>>>>>>> from > > > >>>>>>>>>>>>>>>>> the Java side. > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> In the proposal, I see the support for basic > > > >>>>>> compression > > > >>>>>>>>>>>> strategies > > > >>>>>>>>>>>>>>>>> (e.g.gzip, snappy). > > > >>>>>>>>>>>>>>>>> IMO, applying a single basic strategy is not likely > > > >>>>>> to > > > >>>>>>>>>> achieve > > > >>>>>>>>>>>>>>>> performance > > > >>>>>>>>>>>>>>>>> improvement for most scenarios. 
> > > >>>>>>>>>>>>>>>>> The optimal compression strategy is often obtained by > > > >>>>>>>>>> composing > > > >>>>>>>>>>>> basic > > > >>>>>>>>>>>>>>>>> strategies and tuning parameters. > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> I hope we can support such highly customized > > > >>>>>> compression > > > >>>>>>>>>>>> strategies. > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> I think very much beyond trivial one-shot buffer level > > > >>>>>>>>>> compression > > > >>>>>>>>>>>> is > > > >>>>>>>>>>>>>>>> probably out of the question for addition to the > > > >>>>>> current > > > >>>>>>>>>>>> "RecordBatch" > > > >>>>>>>>>>>>>>>> Flatbuffers type, because the additional metadata > > > >>>>>> would add > > > >>>>>>>>>>>> undesirable > > > >>>>>>>>>>>>>>>> bloat (which I would be against). If people have other > > > >>>>>>>> ideas it > > > >>>>>>>>>>>> would > > > >>>>>>>>>>>>>> be > > > >>>>>>>>>>>>>>>> great to see exactly what you are thinking as far as > > > >>>>>>>> changes > > > >>>>>>>>>> to the > > > >>>>>>>>>>>>>>>> protocol files. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> I'll try to assemble some examples to show the > > > >>>>>> before/after > > > >>>>>>>>>>>> results of > > > >>>>>>>>>>>>>>>> applying the simple strategy. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> Best, > > > >>>>>>>>>>>>>>>>> Liya Fan > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> On Tue, Mar 3, 2020 at 8:15 PM Antoine Pitrou < > > > >>>>>>>>>>>> anto...@python.org> > > > >>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> If we want to use a HTTP header, it would be more > > > >>>>>> of a > > > >>>>>>>>>>>>>>> Accept-Encoding > > > >>>>>>>>>>>>>>>>>> header, no? 
> > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> In any case, we would have to put non-standard > > > >>>>>> values > > > >>>>>>>> there > > > >>>>>>>>>>>> (e.g. > > > >>>>>>>>>>>>>>> lz4), > > > >>>>>>>>>>>>>>>>>> so I'm not sure how desirable it is to repurpose > > > >>>>>> HTTP > > > >>>>>>>>>> headers > > > >>>>>>>>>>>> for > > > >>>>>>>>>>>>>>> that, > > > >>>>>>>>>>>>>>>>>> rather than add some dedicated field to the Flight > > > >>>>>>>>>> messages. > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Regards > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Antoine. > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Le 03/03/2020 à 12:52, David Li a écrit : > > > >>>>>>>>>>>>>>>>>>> gRPC supports headers so for Flight, we could > > > >>>>>> send > > > >>>>>>>>>>>> essentially an > > > >>>>>>>>>>>>>>>>> Accept > > > >>>>>>>>>>>>>>>>>>> header and perhaps a Content-Type header. > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> David > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> On Mon, Mar 2, 2020, 23:15 Micah Kornfield < > > > >>>>>>>>>>>>>> emkornfi...@gmail.com> > > > >>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> Hi Wes, > > > >>>>>>>>>>>>>>>>>>>> A few thoughts on this. In general, I think it > > > >>>>>> is a > > > >>>>>>>>>> good > > > >>>>>>>>>>>> idea. > > > >>>>>>>>>>>>>>> But > > > >>>>>>>>>>>>>>>>>> before > > > >>>>>>>>>>>>>>>>>>>> proceeding, I think the following points are > > > >>>>>> worth > > > >>>>>>>>>>>> discussing: > > > >>>>>>>>>>>>>>>>>>>> 1. Does this actually improve > > > >>>>>> throughput/latency > > > >>>>>>>> for > > > >>>>>>>>>>>> Flight? (I > > > >>>>>>>>>>>>>>>> think > > > >>>>>>>>>>>>>>>>>> you > > > >>>>>>>>>>>>>>>>>>>> mentioned you would follow-up with benchmarks). > > > >>>>>>>>>>>>>>>>>>>> 2. 
I think we should limit the number of > > > >>>>>> supported > > > >>>>>>>>>>>> compression > > > >>>>>>>>>>>>>>>>> schemes > > > >>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>> only 1 or 2. I think the criteria for selection > > > >>>>>>>> speed > > > >>>>>>>>>> and > > > >>>>>>>>>>>>>> native > > > >>>>>>>>>>>>>>>>>>>> implementations available across the widest > > > >>>>>> possible > > > >>>>>>>>>>>> languages. > > > >>>>>>>>>>>>>>> As > > > >>>>>>>>>>>>>>>>> far > > > >>>>>>>>>>>>>>>>>> as > > > >>>>>>>>>>>>>>>>>>>> i can tell zstd only have bindings in java via > > > >>>>>> JNI, > > > >>>>>>>> but > > > >>>>>>>>>> my > > > >>>>>>>>>>>>>>>>>> understanding is > > > >>>>>>>>>>>>>>>>>>>> it is probably the type of compression for our > > > >>>>>>>>>> use-cases. > > > >>>>>>>>>>>> So I > > > >>>>>>>>>>>>>>>> think > > > >>>>>>>>>>>>>>>>>>>> zstd + potentially 1 more. > > > >>>>>>>>>>>>>>>>>>>> 3. Commitment from someone on the Java side to > > > >>>>>>>>>> implement > > > >>>>>>>>>>>> this. > > > >>>>>>>>>>>>>>>>>>>> 4. This doesn't need to be coupled with this > > > >>>>>> change > > > >>>>>>>>>> per-se > > > >>>>>>>>>>>> but > > > >>>>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>>>>> something like flight it would be good to have a > > > >>>>>>>>>> standard > > > >>>>>>>>>>>>>>> mechanism > > > >>>>>>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>>>>> negotiating server/client capabilities (e.g. > > > >>>>>> client > > > >>>>>>>>>> doesn't > > > >>>>>>>>>>>>>>> support > > > >>>>>>>>>>>>>>>>>>>> compression or only supports a subset). 
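[The capability negotiation suggested in point 4 could be as simple as intersecting the client's advertised codecs with the server's and falling back to uncompressed when nothing overlaps. A hypothetical sketch, not a Flight API:]

```python
def negotiate_codec(client_accepts, server_supports, preference=("zstd", "lz4")):
    # Pick the first mutually supported codec in server preference order;
    # a client that advertises nothing usable gets uncompressed data.
    client = set(client_accepts)
    for codec in preference:
        if codec in client and codec in server_supports:
            return codec
    return "uncompressed"
```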
> > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> Thanks, > > > >>>>>>>>>>>>>>>>>>>> Micah > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> On Sun, Mar 1, 2020 at 1:24 PM Wes McKinney < > > > >>>>>>>>>>>>>> wesmck...@gmail.com> > > > >>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> On Sun, Mar 1, 2020 at 3:14 PM Antoine Pitrou < > > > >>>>>>>>>>>>>>> anto...@python.org> > > > >>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> Le 01/03/2020 à 22:01, Wes McKinney a écrit : > > > >>>>>>>>>>>>>>>>>>>>>>> In the context of a "next version of the > > > >>>>>> Feather > > > >>>>>>>>>> format" > > > >>>>>>>>>>>>>>>> ARROW-5510 > > > >>>>>>>>>>>>>>>>>>>>>>> (which is consumed only by Python and R at > > > >>>>>> the > > > >>>>>>>>>> moment), I > > > >>>>>>>>>>>>>> have > > > >>>>>>>>>>>>>>>> been > > > >>>>>>>>>>>>>>>>>>>>>>> looking at compressing buffers using fast > > > >>>>>>>> compressors > > > >>>>>>>>>>>> like > > > >>>>>>>>>>>>>> ZSTD > > > >>>>>>>>>>>>>>>>> when > > > >>>>>>>>>>>>>>>>>>>>>>> writing the RecordBatch bodies. This could be > > > >>>>>>>> handled > > > >>>>>>>>>>>>>> privately > > > >>>>>>>>>>>>>>>> as > > > >>>>>>>>>>>>>>>>> an > > > >>>>>>>>>>>>>>>>>>>>>>> implementation detail of the Feather file, > > > >>>>>> but > > > >>>>>>>> since > > > >>>>>>>>>> ZSTD > > > >>>>>>>>>>>>>>>>> compression > > > >>>>>>>>>>>>>>>>>>>>>>> could improve throughput in Flight, for > > > >>>>>> example, > > > >>>>>>>> I > > > >>>>>>>>>>>> thought I > > > >>>>>>>>>>>>>>>> would > > > >>>>>>>>>>>>>>>>>>>>>>> bring it up for discussion. 
> > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> I can see two simple compression strategies: > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> * Compress the entire message body in > > > >>>>>> one-shot, > > > >>>>>>>>>> writing > > > >>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>> result > > > >>>>>>>>>>>>>>>>>>>> out > > > >>>>>>>>>>>>>>>>>>>>>>> with an 8-byte int64 prefix indicating the > > > >>>>>>>>>> uncompressed > > > >>>>>>>>>>>> size > > > >>>>>>>>>>>>>>>>>>>>>>> * Compress each non-zero-length constituent > > > >>>>>>>> Buffer > > > >>>>>>>>>> prior > > > >>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>> writing > > > >>>>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>>>>> the body (and using the same > > > >>>>>>>>>> uncompressed-length-prefix > > > >>>>>>>>>>>> when > > > >>>>>>>>>>>>>>>>> writing > > > >>>>>>>>>>>>>>>>>>>>>>> the compressed buffer) > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> The latter strategy is preferable for > > > >>>>>> scenarios > > > >>>>>>>>>> where we > > > >>>>>>>>>>>> may > > > >>>>>>>>>>>>>>>>> project > > > >>>>>>>>>>>>>>>>>>>>>>> out only a few fields from a larger record > > > >>>>>> batch > > > >>>>>>>>>> (such as > > > >>>>>>>>>>>>>>> reading > > > >>>>>>>>>>>>>>>>>>>> from > > > >>>>>>>>>>>>>>>>>>>>>>> a memory-mapped file). > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> Agreed. It may also allow using different > > > >>>>>>>> compression > > > >>>>>>>>>>>>>>> strategies > > > >>>>>>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>>>>>>> different kinds of buffers (for example a > > > >>>>>>>> bytestream > > > >>>>>>>>>>>> splitting > > > >>>>>>>>>>>>>>>>>> strategy > > > >>>>>>>>>>>>>>>>>>>>>> for floats and doubles, or a delta encoding > > > >>>>>>>> strategy > > > >>>>>>>>>> for > > > >>>>>>>>>>>>>>>> integers). 
> > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> If we wanted to allow for different > > > >>>>>> compression to > > > >>>>>>>>>> apply to > > > >>>>>>>>>>>>>>>> different > > > >>>>>>>>>>>>>>>>>>>>> buffers, I think we will need a new Message > > > >>>>>> type > > > >>>>>>>>>> because > > > >>>>>>>>>>>> this > > > >>>>>>>>>>>>>>> would > > > >>>>>>>>>>>>>>>>>>>>> inflate metadata sizes in a way that is not > > > >>>>>> likely > > > >>>>>>>> to > > > >>>>>>>>>> be > > > >>>>>>>>>>>>>>> acceptable > > > >>>>>>>>>>>>>>>>>>>>> for the current uncompressed use case. > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> Here is my strawman proposal > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>> > > > >>>>>> https://github.com/apache/arrow/compare/master...wesm:compression-strawman > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> Implementation could be accomplished by one > > > >>>>>> of > > > >>>>>>>> the > > > >>>>>>>>>>>> following > > > >>>>>>>>>>>>>>>>> methods: > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> * Setting a field in Message.custom_metadata > > > >>>>>>>>>>>>>>>>>>>>>>> * Adding a new field to Message > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> I think it has to be a new field in Message. > > > >>>>>>>> Making > > > >>>>>>>>>> it an > > > >>>>>>>>>>>>>>>> ignorable > > > >>>>>>>>>>>>>>>>>>>>>> metadata field means non-supporting receivers > > > >>>>>> will > > > >>>>>>>>>> decode > > > >>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>> interpret > > > >>>>>>>>>>>>>>>>>>>>>> the data wrongly. > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> Regards > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> Antoine. 
> > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>> > > > >>>>>>