Have you tried another Java LZ4 library (I think you mentioned Airlift on a PR)?


On 11/03/2021 at 17:58, Micah Kornfield wrote:
We've found in the process of implementing support for LZ4 decompression
that the fast Java decoder library does not support all the features of the
C++ library (dependent blocks can't be read, and by default that is what
the C++ code emits). The only library we found (Apache Commons) that seems
to support the full specification is unusably slow because it doesn't
directly support off-heap data.

I don't recall seeing a discussion on the merits of using LZ4 Frame vs LZ4
Block compression in the Arrow IPC format, so I'm not sure if there is a
strong rationale for one versus the other.

At this point I think for interoperability we have three options:
1. State in the specification that "independent" blocks must be used for
LZ4_FRAME.
2. Add LZ4_BLOCK to the specification and prefer that over LZ4_FRAME.
3. Provide our own Java implementation (either directly in Arrow or by
providing a patch to another project) that supports dependent blocks.
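
For what it's worth, if we went with option 1, a reader could check the requirement cheaply before handing a buffer to the decoder. The sketch below is only an illustration, not anything that exists in Arrow today: the class and method names are made up, and it relies on my reading of the LZ4 frame format, where the 4-byte little-endian magic number 0x184D2204 is followed by a FLG byte whose bit 5 is the block-independence flag. It does not handle skippable or legacy frames.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Arrow or of any LZ4 library.
final class Lz4FrameCheck {
    // Magic number that starts a standard LZ4 frame (stored little-endian).
    static final int LZ4_FRAME_MAGIC = 0x184D2204;
    // Bit 5 of the FLG byte: 1 = independent blocks, 0 = linked (dependent) blocks.
    static final int BLOCK_INDEPENDENCE_BIT = 0x20;

    private Lz4FrameCheck() {}

    // Returns true if the frame header declares independent blocks.
    // Works on heap and direct (off-heap) buffers; the caller's position is left untouched.
    static boolean usesIndependentBlocks(ByteBuffer frame) {
        if (frame.remaining() < 5) {
            throw new IllegalArgumentException("buffer too short for an LZ4 frame header");
        }
        // duplicate() shares the content but has its own position and byte order.
        ByteBuffer view = frame.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        if (view.getInt() != LZ4_FRAME_MAGIC) {
            throw new IllegalArgumentException("not a standard LZ4 frame");
        }
        int flg = view.get() & 0xFF;
        // The top two bits of FLG carry the format version, which must be 01.
        if ((flg >>> 6) != 1) {
            throw new IllegalArgumentException("unsupported LZ4 frame version");
        }
        return (flg & BLOCK_INDEPENDENCE_BIT) != 0;
    }
}
```

A reader could call usesIndependentBlocks() on each compressed buffer and reject (or route to a slower fallback decoder) any frame where it returns false.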

Any thoughts?

Thanks,
Micah
