Micah, the LZ4 frame format is actually not very complicated and is well documented. If someone has cycles, https://github.com/airlift/aircompressor/pull/142 has most of the work done.
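Piotr's point that the frame format is small can be illustrated with a minimal Java sketch written against the LZ4 Frame Format description. It is not the aircompressor or Iceberg code. The class name `Lz4FrameSketch` and the chosen flags (independent 64 KB blocks, content checksum on, no content size, no dictionary ID) are assumptions for illustration. To stay self-contained, it stores every block uncompressed, which the spec allows by setting the high bit of the block size, and it includes a plain XXH32 implementation for the header and content checksums.

```java
import java.io.ByteArrayOutputStream;

/**
 * Illustrative LZ4 frame writer. Every block is stored uncompressed
 * (high bit of the block size set), so no LZ4 block compressor is needed.
 * A real writer would compress each block instead.
 */
public final class Lz4FrameSketch {
    private static final int MAGIC = 0x184D2204;    // written little-endian: 04 22 4D 18
    private static final int BLOCK_MAX = 64 * 1024; // matches BD block-size code 4

    // XXH32 primes.
    private static final int P1 = 0x9E3779B1, P2 = 0x85EBCA77, P3 = 0xC2B2AE3D,
            P4 = 0x27D4EB2F, P5 = 0x165667B1;

    private Lz4FrameSketch() {}

    /** XXH32, used by the frame format for the header and content checksums. */
    public static int xxh32(byte[] b, int off, int len, int seed) {
        final int end = off + len;
        int p = off;
        int h;
        if (len >= 16) {
            int v1 = seed + P1 + P2, v2 = seed + P2, v3 = seed, v4 = seed - P1;
            for (; p <= end - 16; p += 16) { // consume full 16-byte stripes
                v1 = round(v1, le32(b, p));
                v2 = round(v2, le32(b, p + 4));
                v3 = round(v3, le32(b, p + 8));
                v4 = round(v4, le32(b, p + 12));
            }
            h = Integer.rotateLeft(v1, 1) + Integer.rotateLeft(v2, 7)
                    + Integer.rotateLeft(v3, 12) + Integer.rotateLeft(v4, 18);
        } else {
            h = seed + P5;
        }
        h += len;
        for (; p + 4 <= end; p += 4) { // remaining 4-byte words
            h = Integer.rotateLeft(h + le32(b, p) * P3, 17) * P4;
        }
        for (; p < end; p++) { // remaining single bytes
            h = Integer.rotateLeft(h + (b[p] & 0xFF) * P5, 11) * P1;
        }
        h ^= h >>> 15; // final avalanche
        h *= P2;
        h ^= h >>> 13;
        h *= P3;
        h ^= h >>> 16;
        return h;
    }

    private static int round(int acc, int lane) {
        return Integer.rotateLeft(acc + lane * P2, 13) * P1;
    }

    private static int le32(byte[] b, int i) {
        return (b[i] & 0xFF) | (b[i + 1] & 0xFF) << 8
                | (b[i + 2] & 0xFF) << 16 | (b[i + 3] & 0xFF) << 24;
    }

    private static void putLe32(ByteArrayOutputStream out, int v) {
        out.write(v);
        out.write(v >>> 8);
        out.write(v >>> 16);
        out.write(v >>> 24);
    }

    /** Encodes data as one LZ4 frame made only of stored (uncompressed) blocks. */
    public static byte[] encodeStored(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        putLe32(out, MAGIC);
        // FLG: version 01, independent blocks, no block checksum,
        // no content size, content checksum on, no dictionary ID.
        byte flg = (byte) 0b0110_0100;
        byte bd = (byte) (4 << 4); // block max size code 4 = 64 KB
        byte[] descriptor = {flg, bd};
        out.write(flg);
        out.write(bd);
        // Header checksum: second byte of XXH32(descriptor, seed 0).
        out.write((xxh32(descriptor, 0, descriptor.length, 0) >>> 8) & 0xFF);
        for (int off = 0; off < data.length; off += BLOCK_MAX) {
            int n = Math.min(BLOCK_MAX, data.length - off);
            putLe32(out, n | 0x8000_0000); // high bit marks an uncompressed block
            out.write(data, off, n);
        }
        putLe32(out, 0);                              // EndMark
        putLe32(out, xxh32(data, 0, data.length, 0)); // content checksum
        return out.toByteArray();
    }
}
```

Because stored blocks are part of the spec, standard frame decoders should accept this output; swapping in a real LZ4 block compressor would change only what is written for each block (and clear the high bit of its size).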
It would be better to use an existing frame format than to implement yet another one.

Best
Piotr

On Sun, 13 Oct 2024 at 23:04, Micah Kornfield <emkornfi...@gmail.com> wrote:
> Apologies for the very delayed reply.
>
>> Does unframed LZ4 provide a checksum of the content before compression?
>
> I don't believe so; we would need to add basic minimal metadata like
> checksum/uncompressed length. I think this is still fairly simple compared
> to implementing the block format.
>
> On Sat, Aug 31, 2024 at 12:11 PM Piotr Findeisen <piotr.findei...@gmail.com> wrote:
>
>> Hi Micah,
>>
>> Good point.
>> Does unframed LZ4 provide a checksum of the content before compression?
>>
>> Best
>> Piotr
>>
>> On Fri, 30 Aug 2024 at 23:34, Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>>>> The Iceberg implementation was supposed to be based on
>>>> aircompressor pure Java implementation
>>>> https://github.com/airlift/aircompressor/pull/142.
>>>> AFAICT, aircompressor started to favor (or be more OK with) native
>>>> implementations (because of Project Panama), so adding LZ4 framed
>>>> compression might be simpler these days.
>>>
>>> Since this work was never completed, I'd personally be in favor of
>>> deprecating LZ4 framed and using LZ4 without framing, which already has a
>>> high-quality native Java implementation.
>>>
>>> Cheers,
>>> Micah
>>>
>>> On Tue, Aug 27, 2024 at 5:44 AM Piotr Findeisen <piotr.findei...@gmail.com> wrote:
>>>
>>>> Hi Gabor,
>>>>
>>>> Thanks for creating this discussion thread. This is indeed a good topic
>>>> to discuss.
>>>>
>>>> The idea was to have lightweight compression for the footer for cases
>>>> when Puffin files are bigger.
>>>> It is true that the implementation doesn't follow the spec yet.
>>>> If we remove this from the Puffin spec, we will probably want to add it
>>>> back later.
>>>>
>>>> The Iceberg implementation was supposed to be based on
>>>> aircompressor pure Java implementation
>>>> https://github.com/airlift/aircompressor/pull/142.
>>>> AFAICT, aircompressor started to favor (or be more OK with) native
>>>> implementations (because of Project Panama), so adding LZ4 framed
>>>> compression might be simpler these days.
>>>>
>>>> I would prefer to spend the effort on completing the compression.
>>>>
>>>> Best
>>>> Piotr
>>>>
>>>> On Tue, 27 Aug 2024 at 14:29, Gabor Kaszab <gaborkas...@cloudera.com.invalid> wrote:
>>>>
>>>>> Hi Iceberg Community,
>>>>>
>>>>> I saw in the Puffin spec <https://iceberg.apache.org/puffin-spec>
>>>>> that the footer of the Puffin file or the blobs themselves could be
>>>>> compressed by LZ4. I checked the code
>>>>> <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/puffin/PuffinFormat.java#L110>
>>>>> however, and it seems to me that LZ4 is currently not supported.
>>>>> My first question: am I missing anything here?
>>>>> The second: if we in fact don't support LZ4, can I remove it from
>>>>> the spec to avoid confusion? (I believe this requires a vote in a
>>>>> separate thread.)
>>>>>
>>>>> Thanks,
>>>>> Gabor
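Micah's alternative, raw LZ4 blocks plus minimal metadata, could be sketched as follows. The 8-byte header (little-endian uncompressed length, then a CRC32 of the uncompressed bytes), the class name `RawBlockEnvelope`, and the use of CRC32 rather than XXH32 are all hypothetical choices, not anything in the Puffin spec. The block codec is passed in as a function rather than tied to a particular LZ4 library; the length is stored because a raw LZ4 block does not record its decompressed size, and a block decompressor needs it to size the output.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.function.BiFunction;
import java.util.function.UnaryOperator;
import java.util.zip.CRC32;

/** Hypothetical envelope: [uncompressed length][CRC32 of uncompressed bytes][raw block]. */
public final class RawBlockEnvelope {
    static final int HEADER_BYTES = 8; // two little-endian 32-bit ints

    private RawBlockEnvelope() {}

    /** Compresses with the supplied block codec and prepends length + checksum. */
    public static byte[] wrap(byte[] uncompressed, UnaryOperator<byte[]> compressBlock) {
        byte[] block = compressBlock.apply(uncompressed);
        CRC32 crc = new CRC32();
        crc.update(uncompressed, 0, uncompressed.length);
        return ByteBuffer.allocate(HEADER_BYTES + block.length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(uncompressed.length)
                .putInt((int) crc.getValue())
                .put(block)
                .array();
    }

    /** Decompresses using the stored length, then verifies length and checksum. */
    public static byte[] unwrap(byte[] envelope,
                                BiFunction<byte[], Integer, byte[]> decompressBlock) {
        ByteBuffer buf = ByteBuffer.wrap(envelope).order(ByteOrder.LITTLE_ENDIAN);
        int length = buf.getInt();
        int expectedCrc = buf.getInt();
        byte[] block = Arrays.copyOfRange(envelope, HEADER_BYTES, envelope.length);
        byte[] out = decompressBlock.apply(block, length);
        if (out.length != length) {
            throw new IllegalStateException("length mismatch");
        }
        CRC32 crc = new CRC32();
        crc.update(out, 0, out.length);
        if ((int) crc.getValue() != expectedCrc) {
            throw new IllegalStateException("checksum mismatch");
        }
        return out;
    }
}
```

Even this small envelope is a new on-disk layout that the spec would have to define and every implementation would have to agree on, which is the trade-off Piotr raises against reusing the fields the frame format already standardizes.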