Re: Compression in Arrow - Question

Micah Kornfield Sat, 29 Aug 2020 17:35:05 -0700

Hi Mark,
See the most recent previous discussion about alternate encodings [1].
This is something that should be added in the long run, but I'd personally
prefer to start with simpler encodings.


I don't think we should add anything more with regard to
compression/encoding until at least 3 languages support the current
compression methods that are in the specification.  C++ has them implemented
and there is some work in Java, but I think we should have at least one more.

-Micah

[1]
https://lists.apache.org/thread.html/r1d9d707c481c53c13534f7c72d75c7a90dc7b2b9966c6c0772d0e416%40%3Cdev.arrow.apache.org%3E

On Sat, Aug 29, 2020 at 4:04 PM <m...@markfarnan.com> wrote:

>
> I was looking at compression in Arrow and had a couple of questions.
>
> If I've understood compression correctly, it is only used 'in flight',
> in either IPC or Arrow Flight, using block compression, but the data is
> still decoded into RAM at the destination in full array form.  Is this
> correct?
>
>
> Given that Arrow is a columnar format, has any thought been given to an
> option to have the data compressed both in memory and in flight, using some
> of the columnar techniques?
> As I deal primarily with time-series numerical data, I was thinking that
> some of the algorithms from the Gorilla paper [1] for floats and
> timestamps (delta-of-delta) or similar might be appropriate.
>
> The interface functions could still iterate over the data and produce raw
> values, so this is transparent to users of the data, but the data
> blocks/arrays in memory are actually compressed.
>
> With this method, blocks could come out of a database/source, through the
> data service, across the wire (Flight) and land in the consuming
> application's memory without ever being decompressed or processed until
> final use.
>
>
> Crazy thought ?
>
>
> Regards
>
> Mark.
>
>
> [1]: https://www.vldb.org/pvldb/vol8/p1816-teller.pdf
>
>
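
For readers unfamiliar with the delta-of-delta idea mentioned above for
timestamps, here is a minimal sketch in plain Python. It is only an
illustration: the function names dod_encode and dod_decode are made up for
this example, it is not part of Arrow or any Arrow API, and it leaves out the
variable-length bit packing that the Gorilla paper applies on top of the
deltas. It also shows the "iterate and produce raw values" point: the decoder
is a generator, so values are reconstructed one at a time from the encoded
form.

```python
from typing import Iterator, List


def dod_encode(timestamps: List[int]) -> List[int]:
    """Encode integer timestamps as [first, first_delta, dod, dod, ...].

    Hypothetical helper for illustration only; no bit packing is done.
    """
    if not timestamps:
        return []
    out = [timestamps[0]]
    prev = timestamps[0]
    prev_delta = 0  # so the second output element is the plain first delta
    for ts in timestamps[1:]:
        delta = ts - prev
        out.append(delta - prev_delta)  # delta-of-delta
        prev, prev_delta = ts, delta
    return out


def dod_decode(encoded: List[int]) -> Iterator[int]:
    """Lazily yield the original timestamps from dod_encode output."""
    if not encoded:
        return
    value = encoded[0]
    yield value
    delta = 0
    for dod in encoded[1:]:
        delta += dod    # rebuild the running delta
        value += delta  # rebuild the timestamp
        yield value


if __name__ == "__main__":
    raw = [1000, 1060, 1120, 1180, 1241]
    enc = dod_encode(raw)
    print(enc)
    print(list(dod_decode(enc)))
```

For regularly sampled data most of the encoded values are zero or close to
zero, which is what makes a later bit-packing step effective.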
