Re: Compression?

Jacob Quinn Tue, 15 Sep 2020 17:47:47 -0700

Ah, that's where it was.

Ok, so if I understand correctly, individual buffers are compressed, and in
the Buffer struct, the buffer length is the _compressed_ length? And when
written, the _uncompressed_ length is first written in 8 bytes, then the
compressed buffer?
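
To make the layout I'm describing concrete, here is a minimal sketch. It assumes the 8-byte prefix is a little-endian signed 64-bit integer, and it uses zlib purely as a stand-in codec because LZ4 and ZSTD are not in the Python standard library. The function names are hypothetical, not part of any Arrow API.

```python
import struct
import zlib  # stand-in codec (assumption); the Arrow format itself uses LZ4 or ZSTD


def write_compressed_buffer(raw: bytes) -> bytes:
    """Return the on-the-wire bytes: 8-byte uncompressed length, then compressed data."""
    compressed = zlib.compress(raw)
    # Assumed encoding of the prefix: little-endian signed 64-bit integer.
    prefix = struct.pack("<q", len(raw))
    return prefix + compressed


def read_compressed_buffer(body: bytes) -> bytes:
    """Read the length prefix, decompress the remainder, and check the size."""
    (uncompressed_len,) = struct.unpack_from("<q", body, 0)
    raw = zlib.decompress(body[8:])
    if len(raw) != uncompressed_len:
        raise ValueError("decompressed size does not match the length prefix")
    return raw


original = b"arrow" * 100
wire = write_compressed_buffer(original)
restored = read_compressed_buffer(wire)
```

If this reading is right, a reader can allocate the output buffer up front from the prefix before decompressing.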

What's the general strategy for dealing with compressed buffers? Uncompress
the whole thing when deserializing a compressed buffer? Or is decompressing
delayed until individual elements are accessed? I'm guessing the former
since it doesn't seem like you'd be able to do random-access into a
compressed buffer?
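
A small sketch of the "decompress everything first" strategy I'm guessing at, again with zlib as a stand-in codec and a hypothetical helper name. The point is only that reading element i requires the whole buffer to be decompressed before the fixed-width offset arithmetic works.

```python
import struct
import zlib  # stand-in codec (assumption); not what Arrow actually uses

# Build a buffer of 1000 little-endian int32 values and compress it as one unit.
values = list(range(1000))
raw_buffer = struct.pack("<1000i", *values)
compressed_buffer = zlib.compress(raw_buffer)


def element_at(compressed: bytes, index: int) -> int:
    """Hypothetical helper: fetch one int32 from a compressed buffer."""
    # Byte offsets in the compressed stream do not map to element offsets,
    # so the whole buffer is decompressed before indexing.
    raw = zlib.decompress(compressed)
    return struct.unpack_from("<i", raw, 4 * index)[0]


value = element_at(compressed_buffer, 123)
```

In practice a reader would decompress once at deserialization time and keep the result, rather than decompressing on every access as this helper does.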

-Jacob

On Tue, Sep 15, 2020 at 6:23 PM Wes McKinney <wesmck...@gmail.com> wrote:

> We have protocol-level compression for message body buffers [1][2]
> with LZ4 or ZSTD.
>
> In-memory compression and encoding other than dictionary encoding
> (like RLE) has been discussed multiple times and remains on the
> roadmap for the project.
>
> [1]: https://github.com/apache/arrow/blob/master/format/Message.fbs#L45
>
> On Tue, Sep 15, 2020 at 7:18 PM Jacob Quinn <quinn.jac...@gmail.com>
> wrote:
> >
> > Am I correct in understanding there's nothing in the arrow ipc/file format
> > spec about compression? I thought I had seen something at one point, but
> > looking over the spec website, I don't see anything.
> >
> > -Jacob
>
