andygrove commented on PR #1181: URL: https://github.com/apache/datafusion-comet/pull/1181#issuecomment-2552512228
Here are my findings from hacking on this today.

LZ4 provides two compression formats: `LZ4 Block Format` and `LZ4 Frame Format`.

Spark uses the Java library https://github.com/lz4/lz4-java and specifically uses `LZ4BlockOutputStream` which seems to be a proprietary streaming LZ4 format, as noted in the documentation:

```
/**
 * Streaming LZ4 (not compatible with the LZ4 Frame format).
 * This class compresses data into fixed-size blocks of compressed data.
 * This class uses its own format and is not compatible with the LZ4 Frame format.
 * For interoperability with other LZ4 tools, use {@link LZ4FrameOutputStream},
 * which is compatible with the LZ4 Frame format. This class remains for backward compatibility.
 * @see LZ4BlockInputStream
 * @see LZ4FrameOutputStream
 */
public class LZ4BlockOutputStream extends FilterOutputStream {
```
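To make the incompatibility concrete, here is a minimal sketch of how a reader could tell the two stream layouts apart by their leading bytes. It assumes that an LZ4 Frame starts with the magic number `0x184D2204` stored little-endian, and that lz4-java's `LZ4BlockOutputStream` starts each block with the ASCII bytes `LZ4Block`; the latter is my reading of the lz4-java source and should be checked against the version Spark ships. The class name `Lz4FormatSniffer`, the `Format` enum, and the sample byte arrays are hypothetical and only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: classifies a buffer by its leading magic bytes.
final class Lz4FormatSniffer {

    // LZ4 Frame format magic number 0x184D2204, as it appears on the wire (little-endian).
    static final byte[] FRAME_MAGIC = {0x04, 0x22, 0x4D, 0x18};

    // Assumption: magic written by lz4-java's LZ4BlockOutputStream at the start of each block.
    static final byte[] BLOCK_STREAM_MAGIC = "LZ4Block".getBytes(StandardCharsets.US_ASCII);

    enum Format { LZ4_FRAME, LZ4_JAVA_BLOCK_STREAM, UNKNOWN }

    // Returns the format suggested by the first bytes of the buffer.
    static Format sniff(byte[] data) {
        if (startsWith(data, FRAME_MAGIC)) {
            return Format.LZ4_FRAME;
        }
        if (startsWith(data, BLOCK_STREAM_MAGIC)) {
            return Format.LZ4_JAVA_BLOCK_STREAM;
        }
        return Format.UNKNOWN;
    }

    // True if data begins with every byte of prefix, in order.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up sample buffers: only the leading magic bytes are meaningful here.
        byte[] frameLike = {0x04, 0x22, 0x4D, 0x18, 0x60, 0x40};
        byte[] blockLike = "LZ4Block-rest-of-header".getBytes(StandardCharsets.US_ASCII);

        System.out.println(sniff(frameLike));
        System.out.println(sniff(blockLike));
        System.out.println(sniff(new byte[] {1, 2, 3}));
    }
}
```

A check like this only identifies the container. Decoding Spark's shuffle output natively would still require parsing the lz4-java block header and then decompressing each payload with the raw LZ4 Block Format, rather than handing the stream to an LZ4 Frame decoder.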