Hi Micah, Good point. Does unframed LZ4 provide a checksum of the content before compression?
Best Piotr On Fri, 30 Aug 2024 at 23:34, Micah Kornfield <emkornfi...@gmail.com> wrote: > The Iceberg implementation was supposed to be based on aircompressor pure >> Java implementation https://github.com/airlift/aircompressor/pull/142. >> AFAICT, aircompressor started to favor (or be more OK with) native >> implementations (because of Project Panama), so adding LZ4 framed >> compression might be simpler these days. > > > Since this work was never completed, I'd personally be in favor of > deprecating LZ4 framed and using LZ4 withing framing which already has high > quality native java implementation. > > Cheers, > Micah > > On Tue, Aug 27, 2024 at 5:44 AM Piotr Findeisen <piotr.findei...@gmail.com> > wrote: > >> Hi Gabor >> >> Thanks for creating this discussion thread. This is indeed a good topic >> to discuss. >> >> The idea was to have lightweight compression for the footer for cass when >> Puffin files are bigger. >> It is true that the implementation didn't follow the spec yet. >> If we remove this from the Puffin spec, we will probably want to add it >> later. >> >> The Iceberg implementation was supposed to be based on aircompressor pure >> Java implementation https://github.com/airlift/aircompressor/pull/142. >> AFAICT, aircompressor started to favor (or be more OK with) native >> implementations (because of Project Panama), so adding LZ4 framed >> compression might be simpler these days. >> >> I would prefer to spend the effort on completing the compression. >> >> Best >> Piotr >> >> >> >> >> On Tue, 27 Aug 2024 at 14:29, Gabor Kaszab >> <gaborkas...@cloudera.com.invalid> wrote: >> >>> Hi Iceberg Community, >>> >>> I saw in the Puffin spec <https://iceberg.apache.org/puffin-spec> that >>> the footer of the Puffin file or the blobs themselves could be compressed >>> by LZ4. I checked the code >>> <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/puffin/PuffinFormat.java#L110> >>> however, and for me it seems that currently LZ4 is not supported. >>> My first question is do I miss anything here? >>> The second, is if we in fact don't support LZ4, can I remove it from the >>> spec to avoid confusions? (I believe this requires a vote in a separate >>> thread) >>> >>> Thanks, >>> Gabor >>> >>>