Hi Joris -- this isn't a use case that we intend for most users (we expect users to use the LZ4 compression option that is part of the IPC format itself, rather than compression layered on externally), but it would be good to make sure that our LZ4 streams are interoperable across LZ4 implementations. Can you please open a Jira issue?
Thanks

On Thu, Jan 28, 2021 at 11:00 AM Joris Peeters <joris.mg.peet...@gmail.com> wrote:
>
> From Python, I'm dumping an LZ4-compressed arrow stream to a file, using
>
>     with pa.output_stream(path, compression = 'lz4') as fh:
>         writer = pa.RecordBatchStreamWriter(fh, table.schema)
>         writer.write_table(table)
>         writer.close()
>
> I then try reading this file from Java, starting with
>
>     var is = new LZ4FrameInputStream(new FileInputStream(path.toFile()));
>
> using the lz4-java library. That fails, however, with
>
>     java.lang.RuntimeException: Dependent block stream is unsupported (BLOCK_INDEPENDENCE must be set)
>         at net.jpountz.lz4.LZ4FrameOutputStream$FLG.validate(LZ4FrameOutputStream.java:367)
>
> so it looks like pyarrow is doing the compression with dependent blocks,
> which lz4-java does not support.
>
> I suspect I can solve this by doing the lz4 compression myself, using
> Python's lz4 package, and wrapping it around an uncompressed pyarrow output
> stream, but wanted to check if there isn't anything obvious I'm missing.
>
> Best,
> -J