That said, nothing prevents us from developing programming interfaces
for compressed/encoded data right now. Only when it comes to
transporting such data will we have to decide what to support and what
new metadata structures are required.

For example, we could add run-length encoding (RLE) to C++ in prototype
form and then convert to non-RLE arrays when writing IPC messages.
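
To make the prototype idea concrete, here is a minimal sketch in Python
for brevity. RunLengthArray and its methods are hypothetical names, not
an Arrow API. Each run is stored as a value plus a cumulative run end;
consumers can iterate or index as if the data were plain, and
to_plain() expands it to a non-RLE list, standing in for the conversion
that would happen before writing an IPC message.

```python
import bisect
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass
class RunLengthArray:
    """Hypothetical prototype of an RLE-encoded integer array (not an Arrow API).

    values[i] covers logical positions run_ends[i-1] .. run_ends[i]-1, so
    run_ends holds cumulative exclusive end offsets and run_ends[-1] is the
    logical length.
    """
    values: List[int]
    run_ends: List[int]

    @classmethod
    def encode(cls, data: Sequence[int]) -> "RunLengthArray":
        values: List[int] = []
        run_ends: List[int] = []
        for i, v in enumerate(data):
            if values and values[-1] == v:
                run_ends[-1] = i + 1   # extend the current run
            else:
                values.append(v)       # start a new run
                run_ends.append(i + 1)
        return cls(values, run_ends)

    def __len__(self) -> int:
        return self.run_ends[-1] if self.run_ends else 0

    def __getitem__(self, i: int) -> int:
        # Random access via binary search over the cumulative run ends.
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.values[bisect.bisect_right(self.run_ends, i)]

    def __iter__(self) -> Iterator[int]:
        # Consumers see plain values; the expanded array is never stored.
        start = 0
        for v, end in zip(self.values, self.run_ends):
            for _ in range(end - start):
                yield v
            start = end

    def to_plain(self) -> List[int]:
        """Expand to a plain (non-RLE) list, e.g. before writing an IPC message."""
        return list(self)


if __name__ == "__main__":
    rle = RunLengthArray.encode([7, 7, 7, 0, 0, 5])
    print(rle)             # RunLengthArray(values=[7, 0, 5], run_ends=[3, 5, 6])
    print(rle.to_plain())  # [7, 7, 7, 0, 0, 5]
```

Storing cumulative run ends rather than run lengths is a design choice:
it turns random access into a binary search over the runs, whereas
plain lengths would need a linear scan to locate a position.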

On Sat, Aug 29, 2020 at 7:34 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> Hi Mark,
> See the most recent previous discussion about alternate encodings [1].
> This is something that should be added in the long run, but I'd
> personally prefer to start with simpler encodings.
>
> I don't think we should add anything more with regard to
> compression/encoding until at least three languages support the
> compression methods currently in the specification. C++ has them
> implemented, there is some work in Java, and I think we should have at
> least one more.
>
> -Micah
>
> [1]
> https://lists.apache.org/thread.html/r1d9d707c481c53c13534f7c72d75c7a90dc7b2b9966c6c0772d0e416%40%3Cdev.arrow.apache.org%3E
>
> On Sat, Aug 29, 2020 at 4:04 PM <m...@markfarnan.com> wrote:
>
> >
> > I was looking at compression in Arrow and had a couple of questions.
> >
> > If I've understood correctly, compression is currently only used 'in
> > flight', in either IPC or Arrow Flight, using block compression, and the
> > data is still decoded into RAM at the destination in full array form. Is
> > this correct?
> >
> >
> > Given that Arrow is a columnar format, has any thought been given to an
> > option to keep the data compressed both in memory and in flight, using
> > some of the columnar encoding techniques?
> > As I deal primarily with time-series numerical data, I was thinking that
> > some of the algorithms from the Gorilla paper [1] for floats and
> > timestamps (Delta-of-Delta), or similar, might be appropriate.
> >
> > The interface functions could still iterate over the data and produce raw
> > values, so this would be transparent to users of the data, while the data
> > blocks/arrays in memory are actually compressed.
> >
> > With this method, blocks could come out of a database/source, through the
> > data service, across the wire (Flight) and land in the consuming
> > application's memory without ever being decompressed or processed until
> > final use.
> >
> >
> > Crazy thought?
> >
> >
> > Regards
> >
> > Mark.
> >
> >
> > [1]: https://www.vldb.org/pvldb/vol8/p1816-teller.pdf
> >
> >
