Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Micah Kornfield
We should be extending the archery IPC integration tests for this (ideally with no files checked in). On Thursday, January 28, 2021, Fan Liya wrote: > Hi Joris, > > The Java support for lz4 compression is on-going ( > https://github.com/apache/arrow/pull/8949). > Integration with C++/Python is not fin

Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Fan Liya
Hi Joris, The Java support for lz4 compression is on-going ( https://github.com/apache/arrow/pull/8949). Integration with C++/Python is not finished yet. We would appreciate it if you could share the file to help us with the integration test. Best, Liya Fan On Fri, Jan 29, 2021 at 2:41 AM Antoi

Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Antoine Pitrou
On 28/01/2021 at 19:38, Wes McKinney wrote: > It still seems notable that our generic LZ4-compressed output stream > cannot be read by Java (independent of Arrow and the Arrow IPC > format). That and the custom LZ4 framing used by Parquet-Java... Apparently the Java ecosystem can't implement p

Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Wes McKinney
It still seems notable that our generic LZ4-compressed output stream cannot be read by Java (independent of Arrow and the Arrow IPC format). On Thu, Jan 28, 2021 at 12:30 PM Antoine Pitrou wrote: > > On Thu, 28 Jan 2021 18:19:00 + > Joris Peeters wrote: > > > To be fair, I'm happy to apply i

Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Antoine Pitrou
On Thu, 28 Jan 2021 18:19:00 + Joris Peeters wrote: > To be fair, I'm happy to apply it at IPC level. Just didn't realise that > was a thing. IIUC what Antoine suggests, though, then just (leaving Python > as-is and) changing my Java to > > var is = new FileInputStream(path.toFile()); >

Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Joris Peeters
Aha, OK! Thanks for the help all. I'll keep an eye on the Java side for the IPC compression, but for my current purpose doing full stream compression is totally fine. On Thu, Jan 28, 2021 at 6:22 PM Micah Kornfield wrote: > The application level compression Java support for compression is being

Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Micah Kornfield
Java support for application-level compression is being worked on (I would need to double check whether the PR has been merged), and I don't think it has been integration tested with C++/Python. I would imagine it would run into a similar issue with not being able to decode linked blocks.
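As a rough illustration of the "linked blocks" point above: the LZ4 frame format records in its header whether blocks are independent or linked. The sketch below is not from the thread, and the function name is hypothetical. It assumes the file is a standard LZ4 frame (magic number 0x184D2204 followed by a FLG byte whose bit 5 is the block-independence flag) and only inspects the first five bytes.

```python
import struct

# Magic number that opens a standard LZ4 frame (stored little-endian on disk).
LZ4_FRAME_MAGIC = 0x184D2204


def lz4_frame_blocks_independent(header: bytes) -> bool:
    """Hypothetical helper: report whether an LZ4 frame declares independent blocks.

    `header` must hold at least the first 5 bytes of the file:
    4 bytes of magic number plus the FLG byte of the frame descriptor.
    """
    if len(header) < 5:
        raise ValueError("need at least 5 bytes (magic number + FLG byte)")

    (magic,) = struct.unpack_from("<I", header, 0)
    if magic != LZ4_FRAME_MAGIC:
        raise ValueError("not an LZ4 frame (magic number mismatch)")

    flg = header[4]
    # Bits 7-6 of FLG hold the frame format version, which must be 01.
    if flg >> 6 != 1:
        raise ValueError("unsupported LZ4 frame version")

    # Bit 5 of FLG is the block-independence flag:
    # 1 = independent blocks, 0 = linked blocks.
    return bool(flg & 0x20)
```

Calling it on the first five bytes of a file (for example `lz4_frame_blocks_independent(open(path, "rb").read(5))`) returning False would be consistent with a reader that only handles independent blocks failing on that file.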

Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Joris Peeters
To be fair, I'm happy to apply it at IPC level. Just didn't realise that was a thing. IIUC what Antoine suggests, though, then just (leaving Python as-is and) changing my Java to var is = new FileInputStream(path.toFile()); var reader = new ArrowStreamReader(is, allocator); var schema

Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Micah Kornfield
It might be worth opening an issue with the lz4-java library. It seems like the Java implementation doesn't fully support the LZ4 stream protocol? Antoine, in this case it looks like Joris is applying the compression and decompression at the file level, NOT the IPC level. On Thu, Jan 28, 2021

Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Antoine Pitrou
On 28/01/2021 at 17:59, Joris Peeters wrote: > From Python, I'm dumping an LZ4-compressed arrow stream to a file, using > > with pa.output_stream(path, compression = 'lz4') as fh: > writer = pa.RecordBatchStreamWriter(fh, table.schema) > writer.write_table(table) >

Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Wes McKinney
hi Joris -- this isn't a use case that we intend for most users (we intend for users to instead use the LZ4 compression option that is part of the IPC format itself, rather than something that is layered on externally), but it would be good to make sure that our LZ4 streams are interoperable across

lz4 compressed arrow between Python & Java

2021-01-28 Thread Joris Peeters
From Python, I'm dumping an LZ4-compressed arrow stream to a file, using with pa.output_stream(path, compression = 'lz4') as fh: writer = pa.RecordBatchStreamWriter(fh, table.schema) writer.write_table(table) writer.close() I then try reading this file from Java, star