Hi Yunhong,

This isn't a Java issue: the Arrow IPC spec only supports per-buffer compression [1]. It does mention other designs as a potential future improvement. If you think vector-level compression might be useful, it would help to sketch a proposal and/or bring some benchmarks.
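For reference, the existing per-buffer compression is already usable from Java via VectorUnloader. A minimal sketch (assuming the arrow-compression module is on the classpath; the exact constructor overloads may vary by version):

import org.apache.arrow.compression.CommonsCompressionFactory;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.compression.CompressionCodec;
import org.apache.arrow.vector.compression.CompressionUtil;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;

public class PerBufferCompressionExample {
  public static void main(String[] args) {
    try (BufferAllocator allocator = new RootAllocator()) {
      IntVector vector = new IntVector("ints", allocator);
      vector.allocateNew(1024);
      for (int i = 0; i < 1024; i++) {
        vector.set(i, i % 10);
      }
      vector.setValueCount(1024);

      // Closing the root closes the vector it wraps.
      try (VectorSchemaRoot root = VectorSchemaRoot.of(vector)) {
        // ZSTD here; LZ4_FRAME is the other codec the IPC spec defines.
        CompressionCodec codec = CommonsCompressionFactory.INSTANCE.createCodec(
            CompressionUtil.CodecType.ZSTD);
        // Each ArrowBuf in the batch (validity, data) is compressed
        // independently -- this is the spec's per-buffer (BUFFER) method.
        VectorUnloader unloader = new VectorUnloader(
            root, /*includeNullCount=*/ true, codec, /*alignBuffers=*/ true);
        try (ArrowRecordBatch batch = unloader.getRecordBatch()) {
          System.out.println("compressed buffers: " + batch.getBuffers().size());
        }
      }
    }
  }
}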
Note that most vectors/arrays will only have a data buffer and maybe a validity buffer, so I'm not sure bundling them together would matter much. Are there more details about the overhead you're seeing, or about your use case? (For concreteness, there's a rough sketch of the per-buffer loop versus a hypothetical vector-level call below the quoted message.)

[1]: https://github.com/apache/arrow/blob/20d8acd89f5ebf87295e08ed10e2f94cb03d57d0/format/Message.fbs#L55-L67

Thanks,
David

On Wed, Feb 19, 2025, at 14:54, yh z wrote:
> Hi, all. Currently in arrow-java, when VectorUnloader compresses an
> ArrowRecordBatch, it compresses each ArrowBuf within a FieldVector
> separately rather than compressing at the FieldVector level. From a
> compression-ratio perspective, larger inputs generally compress better.
> Additionally, calling compress(BufferAllocator allocator, ArrowBuf
> uncompressedBuffer) multiple times may consume more CPU than calling it
> once.
> Therefore, I would like to ask whether there are plans to support
> compression at the FieldVector level, which could improve the compression
> ratio without affecting the ability to read individual columns.
>
> Many thanks,
> Yunhong Zheng
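The sketch mentioned above: the first method mirrors, roughly, the per-buffer compression arrow-java performs during unload (retain/ownership details elided); the second is purely hypothetical -- there is no compressVector API in arrow-java, and the IPC format would need a new compression method (plus recorded buffer lengths) before any reader could decode such a payload:

import java.util.ArrayList;
import java.util.List;

import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.compression.CompressionCodec;

public class CompressionSketch {

  // Roughly what happens today: one compress() call per ArrowBuf
  // (validity, data, offsets, ...), as the IPC BUFFER method requires.
  static List<ArrowBuf> compressPerBuffer(
      BufferAllocator allocator, CompressionCodec codec, FieldVector vector) {
    List<ArrowBuf> compressed = new ArrayList<>();
    for (ArrowBuf buf : vector.getFieldBuffers()) {
      compressed.add(codec.compress(allocator, buf));
    }
    return compressed;
  }

  // Hypothetical vector-level compression (illustrative only): concatenate
  // the vector's buffers and compress once. Readers would additionally need
  // the individual buffer lengths to split the result back apart.
  static ArrowBuf compressVector(
      BufferAllocator allocator, CompressionCodec codec, FieldVector vector) {
    long total = 0;
    for (ArrowBuf buf : vector.getFieldBuffers()) {
      total += buf.readableBytes();
    }
    ArrowBuf concatenated = allocator.buffer(total);
    long offset = 0;
    for (ArrowBuf buf : vector.getFieldBuffers()) {
      long length = buf.readableBytes();
      concatenated.setBytes(offset, buf, buf.readerIndex(), length);
      offset += length;
    }
    concatenated.writerIndex(offset);
    return codec.compress(allocator, concatenated); // single compress() call
  }
}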