Hi Mark,

See the most recent previous discussion about alternate encodings [1]. This is something that should be added in the long run, but I'd personally prefer to start with simpler encodings.
I don't think we should add anything more with regard to compression/encoding until at least 3 languages support the compression methods currently in the specification. C++ has them implemented, there is some work in Java, and I think we should have at least one more.

-Micah

[1] https://lists.apache.org/thread.html/r1d9d707c481c53c13534f7c72d75c7a90dc7b2b9966c6c0772d0e416%40%3Cdev.arrow.apache.org%3E

On Sat, Aug 29, 2020 at 4:04 PM <m...@markfarnan.com> wrote:
>
> I was looking at compression in Arrow and had a couple of questions.
>
> If I've understood correctly, compression is currently only used 'in flight',
> in either IPC or Arrow Flight, using block compression, but the data is still
> decoded into RAM at the destination in full array form. Is this correct?
>
> Given that Arrow is a columnar format, has any thought been given to an
> option to have the data compressed both in memory and in flight, using some
> of the columnar techniques?
>
> As I deal primarily with time-series numerical data, I was thinking that
> some of the algorithms from the Gorilla paper [1] for floats and
> timestamps (delta-of-delta), or similar, might be appropriate.
>
> The interface functions could still iterate over the data and produce raw
> values, so this is transparent to users of the data, but the data
> blocks/arrays in memory are actually compressed.
>
> With this method, blocks could come out of a database/source, through the
> data service, across the wire (Flight), and land in the consuming
> application's memory without ever being decompressed or processed until
> final use.
>
> Crazy thought?
>
> Regards
>
> Mark.
>
> [1]: https://www.vldb.org/pvldb/vol8/p1816-teller.pdf
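To make the delta-of-delta idea for timestamps concrete, here is a minimal Python sketch, assuming integer timestamps and at least two values. It only shows the transform (first value, first delta, then the differences between consecutive deltas) and a lazy iterator that yields raw values on demand, in the spirit of the transparent iteration described above. The variable-length bit packing that the Gorilla paper applies to the resulting small integers is omitted, and the function names are hypothetical, not part of any Arrow API.

```python
from typing import Iterator, List, Tuple


def encode_delta_of_delta(timestamps: List[int]) -> Tuple[int, int, List[int]]:
    """Encode integer timestamps as (first value, first delta, deltas-of-deltas).

    Hypothetical helper: regularly spaced timestamps produce mostly zeros,
    which a real encoder (e.g. Gorilla) would then bit-pack very compactly.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    first = timestamps[0]
    first_delta = timestamps[1] - timestamps[0]
    dods: List[int] = []
    prev_delta = first_delta
    # Walk consecutive pairs starting from the second value.
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        dods.append(delta - prev_delta)  # store only the change in spacing
        prev_delta = delta
    return first, first_delta, dods


def iter_decoded(first: int, first_delta: int, dods: List[int]) -> Iterator[int]:
    """Lazily yield raw timestamps one at a time without materializing the array."""
    value = first
    yield value
    delta = first_delta
    value += delta
    yield value
    for dod in dods:
        delta += dod   # reconstruct the current delta
        value += delta  # reconstruct the current timestamp
        yield value


ts = [1000, 1060, 1120, 1180, 1241, 1300]
encoded = encode_delta_of_delta(ts)
print(encoded)  # (1000, 60, [0, 0, 1, -2])
print(list(iter_decoded(*encoded)))  # [1000, 1060, 1120, 1180, 1241, 1300]
```

Because the decoder is a generator, a consumer can scan values in order while the stored form stays encoded; random access would need extra structure (for example, periodic checkpoints of the raw value and delta), which this sketch does not attempt.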