BTW I just opened https://issues.apache.org/jira/browse/ARROW-8823
since we don't track "what the uncompressed size would have been"
without compression turned on.

On Sat, May 16, 2020 at 10:19 AM Amol Umbarkar <amolumbar...@gmail.com> wrote:
>
> Thanks Wes. Will do.
>
> On Sat, May 16, 2020 at 7:06 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi Amol,
> >
> > thanks for pointing that out. Such a heuristic (observing compression
> > ratios of stream messages) could be implemented at some point so that
> > compression could be toggled off mid-stream if it doesn't seem to be
> > helping. Feel free to open a JIRA issue about this
> >
> > - Wes
> >
> > On Sat, May 16, 2020 at 6:39 AM Amol Umbarkar <amolumbar...@gmail.com>
> > wrote:
> > >
> > > Hello All,
> > > I was going through dask developer log recently. Dask seems to be
> > > selectively do compression if it is found to be useful. They sort of pick
> > > 10kb of sample upfront to calculate compression and if the results are
> > good
> > > then the whole batch is compressed. This seems to save de-compression
> > > effort on receiver side.
> > >
> > > Please take a look at
> > >
> > https://blog.dask.org/2016/04/14/dask-distributed-optimizing-protocol#problem-3-unwanted-compression
> > >
> > > Thought this could be relevant to arrow batch transfers as well.
> > >
> > > Thanks,
> > > Amol
> > >
> > > On Thu, Apr 23, 2020 at 5:54 AM Wes McKinney <wesmck...@gmail.com>
> > wrote:
> > >
> > > > Hello,
> > > >
> > > > I have proposed adding a simple RecordBatch IPC message body
> > > > compression scheme (using either LZ4 or ZSTD) to the Arrow IPC
> > > > protocol in GitHub PR [1] as discussed on the mailing list [2]. This
> > > > is distinct from separate discussions about adding in-memory encodings
> > > > (like RLE-encoding) to the Arrow columnar format.
> > > >
> > > > This change is not forward compatible so it will not be safe to send
> > > > compressed messages to old libraries, but since we are still pre-1.0.0
> > > > the consensus is that this is acceptable. We may separately consider
> > > > increasing the metadata version for 1.0.0 to require clients to
> > > > upgrade.
> > > >
> > > > Please vote whether to accept the addition. The vote will be open for
> > > > at least 72 hours.
> > > >
> > > > [ ] +1 Accept this addition to the IPC protocol
> > > > [ ] +0
> > > > [ ] -1 Do not accept the changes because...
> > > >
> > > > Here is my vote: +1
> > > >
> > > > Thanks,
> > > > Wes
> > > >
> > > > [1]: https://github.com/apache/arrow/pull/6707
> > > > [2]:
> > > >
> > https://lists.apache.org/thread.html/r58c9d23ad159644fca590d8f841df80d180b11bfb72f949d601d764b%40%3Cdev.arrow.apache.org%3E
> > > >
> >

Reply via email to