Re: [Discuss] LZ4 compression in Puffin spec

2024-10-13 Thread Micah Kornfield
Apologies for the very delayed reply. Does unframed LZ4 provide a checksum of the content before compression? I don't believe so; we would need to add some minimal metadata, such as a checksum and the uncompressed length. I think this is still fairly simple compared to implementing the block format. O

Re: Spec changes for deletion vectors

2024-10-13 Thread Anton Okolnychyi
Thanks for putting the spec PRs together, Ryan! A bit of context below. The concept of DVs is not external to Iceberg. We have been using Roaring bitmaps (aka DVs) as an in-memory representation for position deletes, which allowed us to support vectorized reads and buffer out-of-order positions i

Re: Spec changes for deletion vectors

2024-10-13 Thread Jean-Baptiste Onofré
Hi, thanks for the PRs! I reviewed Anton's document and will do a pass on the PRs. IMHO, it's important to get feedback from query engines: while deletion vectors are not a problem per se (they are what we use as the internal representation), the use of Puffin files to store them is "impactful" for the