One small point:

> Theoretically we could end up with iceberg implementers who have bugs in
> this part of the code and we wouldn't even know it was an issue till
> someone converted the table to delta.

I guess we could mandate that readers validate all fields here to make sure they are all consistent, even if unused.
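For concreteness, here is a rough sketch in Java of what that mandated validation could look like. The layout (4-byte big-endian length, magic bytes, serialized 64-bit RoaringBitmap, 4-byte big-endian CRC-32) is my reading of the thread below; the class name and the magic placeholder are mine, and exactly what the length and CRC cover is an assumption to check against the spec PRs.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.zip.CRC32;

// Sketch only. Assumed layout (the spec PRs are authoritative):
// [4-byte BE length][magic bytes][serialized 64-bit RoaringBitmap]
// [4-byte BE CRC-32]. All names here are mine.
final class DvBlobValidator {

  // Placeholder value; substitute the magic bytes from the spec PR.
  private static final byte[] MAGIC = {0x00, 0x00, 0x00, 0x00};

  // Returns the bitmap bytes, failing fast if any unused field is wrong.
  static byte[] validatedBitmap(byte[] blob) {
    ByteBuffer buf = ByteBuffer.wrap(blob).order(ByteOrder.BIG_ENDIAN);

    // The length is redundant for Iceberg (Puffin metadata already has
    // the blob size), but checking it means bugs surface here rather
    // than after someone converts the table to Delta. Assumption: it
    // covers the magic bytes plus the bitmap.
    int length = buf.getInt();
    if (length != blob.length - 8) {
      throw new IllegalStateException("bad length field: " + length);
    }

    byte[] magic = new byte[MAGIC.length];
    buf.get(magic);
    if (!Arrays.equals(magic, MAGIC)) {
      throw new IllegalStateException("bad magic bytes");
    }

    byte[] bitmap = new byte[blob.length - 8 - MAGIC.length];
    buf.get(bitmap);

    // Assumption: the CRC-32 covers the magic bytes and the bitmap.
    CRC32 crc = new CRC32();
    crc.update(magic);
    crc.update(bitmap);
    if ((int) crc.getValue() != buf.getInt()) {
      throw new IllegalStateException("CRC-32 mismatch");
    }
    return bitmap;
  }
}

Running something like this on every read is cheap, and it is about the only way the otherwise-unused fields would get exercised before a Delta conversion.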
Separately, I think it might pay to take a step back and restate the desired requirements of this design (in no particular order):

1. The best possible implementation of DVs (limited redundancy, no extraneous fields, CPU efficiency, minimal space, etc.)
2. The ability for Delta Lake readers to read Iceberg DVs
3. The ability for Iceberg readers to read Delta Lake DVs

The current proposal accomplishes 2 and 3 at very low cost, with some cost to 1. I still think 1 is important. Table formats are still going through a very large growth phase, so making suboptimal choices when there are better choices that don't add substantial cost shouldn't be done lightly. Granted, DVs are only going to be a very small part of the cost of any table format.

I think it is worth discussing other options to see if we think there is a better one (if there isn't, then I would propose we continue with the current proposal). Please chime in if I missed one, but off the top of my head these are:

a. Go forward with the current proposal.
b. Create a different format DV that we feel is better, and take no additional steps for compatibility with Delta Lake.
c. Create a different format DV that we feel is better, and allow backwards compatibility by adding "reader" support for Delta Lake DVs in the spec, but not "writer" support.
d. Go forward with the current proposal but use offset and length to trim off the "offset" bytes. (I assume this would break Delta Lake readers, but I think Iceberg readers could still read Delta Lake tables. This option is very close to c but doesn't address all concerns around the DV format.)

Out of these, my slight preference would be option c (add migration capabilities from Delta Lake to Iceberg), followed by option a (the current proposal).

Cheers,
Micah

On Tue, Oct 15, 2024 at 9:32 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> @Scott We would have the ability to read delta vectors regardless of what we pick, since on the Iceberg side we really just need the bitmap and the offset at which it is located within a file; everything else could be in the Iceberg metadata. We don't have any disagreement on this aspect, I think.
>
> The question is whether we would write additional Delta-specific metadata into the vector itself that an Iceberg implementation would not use, so that current Delta readers could read Iceberg delete vectors without a code change or rewriting the vectors. The underlying representation would still be the same between the two formats.
>
> The pros to doing this are that a reverse translation from Iceberg to Delta would be much simpler. Any implementers who already have Delta vector read code can probably mostly reuse it, although our metadata would let them skip to just reading the bitmap.
>
> The cons are that the metadata being written isn't used by Iceberg, so any real tests would require using a Delta reader; anything else would just be synthetic. Theoretically we could end up with Iceberg implementers who have bugs in this part of the code and we wouldn't even know it was an issue till someone converted the table to Delta. Other Iceberg readers would just be ignoring these bytes, so we are essentially adding a requirement and complexity (although not that much) to Iceberg writers for the benefit of current Delta readers.
>
> Delta would probably also have to add new fields to their metadata representations to capture the vector metadata to handle our bitmaps.
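>
> To illustrate the first point: the only Iceberg-side state a reader needs is shaped roughly like the record below (hypothetical names, not from any spec; it assumes the offset and length stored in Iceberg metadata point at the bitmap bytes themselves), and with it a reader never touches the Delta-specific wrapper fields.
>
> import java.io.EOFException;
> import java.io.IOException;
> import java.nio.ByteBuffer;
> import java.nio.channels.FileChannel;
> import java.nio.file.Path;
>
> // Hypothetical sketch: everything besides the bitmap lives in Iceberg
> // metadata, so a reader can seek straight to the serialized bitmap.
> record DeleteVectorRef(Path puffinFile, long offset, int length) {
>
>   byte[] readBitmapBytes() throws IOException {
>     try (FileChannel ch = FileChannel.open(puffinFile)) {
>       ByteBuffer buf = ByteBuffer.allocate(length);
>       while (buf.hasRemaining()) {
>         // Positional read: no need to parse the Puffin footer or any
>         // Delta-compat fields that may precede the bitmap in the blob.
>         if (ch.read(buf, offset + buf.position()) < 0) {
>           throw new EOFException("unexpected end of Puffin file");
>         }
>       }
>       return buf.array();
>     }
>   }
> }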
> On Tue, Oct 15, 2024 at 5:56 PM Scott Cowell <scott.cow...@dremio.com.invalid> wrote:
>
>> From an engine perspective, I think compatibility between Delta and Iceberg on DVs is a great thing to have. The additions for cross-compat seem a minor thing to me, vastly outweighed by a future where Delta tables with DVs are supported in Delta Uniform and can be read by any Iceberg V3-compliant engine.
>>
>> -Scott
>>
>> On Tue, Oct 15, 2024 at 2:06 PM Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
>>
>>> Are there engines/vendors/companies in the community that support both Iceberg and Delta and would benefit from having one blob layout for DVs?
>>>
>>> - Anton
>>>
>>> On Tue, Oct 15, 2024 at 11:10 rdb...@gmail.com <rdb...@gmail.com> wrote:
>>>
>>>> Thanks, Szehon.
>>>>
>>>> To clarify on compatibility: using the same format for the blobs makes it so that existing Delta readers can read and use the DVs written by Iceberg. I'd love for Delta to adopt Puffin, but if we adopt the extra fields they would not need to change how their readers work. That's why I think there is a benefit to using the same format: we avoid fragmentation and make sure data and delete files are compatible.
>>>>
>>>> Ryan
>>>>
>>>> On Tue, Oct 15, 2024 at 10:57 AM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>>
>>>>> This is awesome work by Anton and Ryan. It looks like a ton of effort has gone into the V3 position vector proposal to make it clean and efficient; this has been a long time coming, and I'm truly excited to see the great improvement in storage/perf.
>>>>>
>>>>> With regard to these fields, most of the concerns are already mentioned by other community members in the PRs https://github.com/apache/iceberg/pull/11238 and https://github.com/apache/iceberg/pull/11240, so there is not much to add. The DV itself is the RoaringBitmap 64-bit format, so that's great; the argument for the CRC seems reasonable, and I don't have enough data to be opinionated about the endianness/magic bytes.
>>>>>
>>>>> But I do lean towards the many PR comments saying that the extra length field is unnecessary and would just add confusion. The Iceberg community has made so much effort to trim the spec to the bare minimum for cleanliness and efficiency that this field does not feel like the normal direction of the project. Also, I'm not clear on the plan for old Delta readers: they can't read Puffin anyway, and if Delta adopts Puffin, then new readers could adopt it? Anyway, great work again, and thanks for raising the issue on the dev list!
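>>>>>
>>>>> For reference, the part both formats share is just that 64-bit "portable" RoaringBitmap serialization. A rough sketch with the org.roaringbitmap Java library (assuming its current API; illustrative, not normative):
>>>>>
>>>>> import java.io.ByteArrayOutputStream;
>>>>> import java.io.DataOutputStream;
>>>>> import java.io.IOException;
>>>>> import org.roaringbitmap.longlong.Roaring64NavigableMap;
>>>>>
>>>>> // Serializes deleted row positions in the portable 64-bit layout,
>>>>> // which is the payload the DV blob carries in both formats.
>>>>> class PortableBitmapDemo {
>>>>>   static byte[] serialize(long... deletedPositions) throws IOException {
>>>>>     // Opt in to the cross-language "portable" serialization format.
>>>>>     Roaring64NavigableMap.SERIALIZATION_MODE =
>>>>>         Roaring64NavigableMap.SERIALIZATION_MODE_PORTABLE;
>>>>>     Roaring64NavigableMap bitmap = new Roaring64NavigableMap();
>>>>>     for (long pos : deletedPositions) {
>>>>>       bitmap.addLong(pos);
>>>>>     }
>>>>>     ByteArrayOutputStream out = new ByteArrayOutputStream();
>>>>>     bitmap.serialize(new DataOutputStream(out));
>>>>>     return out.toByteArray();
>>>>>   }
>>>>> }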
>>>>> Thanks,
>>>>> Szehon
>>>>>
>>>>> On Mon, Oct 14, 2024 at 5:14 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>>>>>
>>>>>> > I think it might be worth mentioning the current proposal makes some, mostly minor, design choices to try to be compatible with Delta Lake deletion vectors.
>>>>>>
>>>>>> Yes it does, and thanks for pointing this out, Micah. I think it's important to consider whether compatibility is important to this community. I just replied to Piotr on the PR, but I'll adapt some of that response here to reach the broader community.
>>>>>>
>>>>>> I think there is value in supporting compatibility with older Delta readers, but I acknowledge that this may be my perspective because my employer has a lot of Delta customers that we are going to support now and in the future.
>>>>>>
>>>>>> The main use case for maintaining compatibility with the Delta format is that it's hard to move old jobs to new code in a migration. We see a similar issue in Hive-to-Iceberg migrations, where unknown older readers prevent migration entirely because they are hard to track down and often read files directly from the backing object store. I'd like to avoid the same problem here, where all readers need to be identified and migrated at the same time. Compatibility with the format those readers expect makes it possible to maintain Delta metadata for them temporarily. That increases confidence that things won't randomly break and makes it easier to get people to move forward.
>>>>>>
>>>>>> The second reason for maintaining compatibility is that we want the formats to become more similar. My hope is that we can work across both communities and come up with a common metadata format in a future version -- which explains my interest in smooth migrations. Maintaining compatibility in cases like this builds trust and keeps our options open: if we have compatible data layers, then it's easier to build a compatible metadata layer. I'm hoping that if we make the blob format compatible, we can get the Delta community to start using Puffin for better self-describing delete files.
>>>>>>
>>>>>> Other people may not share those goals, so I think it helps to consider what is being compromised for this compatibility. I don't think it is too much. There are two additional fields:
>>>>>> * A 4-byte length field (that Iceberg doesn't need)
>>>>>> * A 4-byte CRC to validate the contents of the bitmap
>>>>>>
>>>>>> There are also changes to how these would have been added if the Iceberg community were building this independently:
>>>>>> * Our initial version didn't include a CRC at all; now that we think one is useful, compatibility means using a CRC-32 checksum rather than a newer one
>>>>>> * The Delta format uses big endian for its fields (or mixed endian, if you consider that the RoaringBitmap is LE)
>>>>>> * The magic bytes (added to avoid reading the Puffin footer) would have been different
>>>>>>
>>>>>> Overall, I don't think that those changes to what we would have done are unreasonable. It's only 8 extra bytes, and half of them are for a checksum that is a good idea.
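>>>>>>
>>>>>> To make those 8 extra bytes concrete, the write side would look roughly like this. This is a sketch, not the normative definition in the PRs: the magic value is a placeholder, and exactly what the length field and the CRC cover is my shorthand here.
>>>>>>
>>>>>> import java.nio.ByteBuffer;
>>>>>> import java.nio.ByteOrder;
>>>>>> import java.util.zip.CRC32;
>>>>>>
>>>>>> // Wraps a serialized bitmap with the Delta-compat fields discussed
>>>>>> // above: a 4-byte big-endian length, magic bytes, and a CRC-32.
>>>>>> final class DvBlobWriter {
>>>>>>   // Placeholder; use the magic bytes defined in the spec PR.
>>>>>>   private static final byte[] MAGIC = {0x00, 0x00, 0x00, 0x00};
>>>>>>
>>>>>>   static byte[] wrap(byte[] portableBitmap) {
>>>>>>     CRC32 crc = new CRC32();
>>>>>>     crc.update(MAGIC);
>>>>>>     crc.update(portableBitmap);  // assumes the CRC covers magic + bitmap
>>>>>>
>>>>>>     ByteBuffer buf = ByteBuffer
>>>>>>         .allocate(4 + MAGIC.length + portableBitmap.length + 4)
>>>>>>         .order(ByteOrder.BIG_ENDIAN);  // Delta fields are big endian
>>>>>>     buf.putInt(MAGIC.length + portableBitmap.length);  // extra length
>>>>>>     buf.put(MAGIC);
>>>>>>     buf.put(portableBitmap);  // RoaringBitmap bytes are little endian
>>>>>>     buf.putInt((int) crc.getValue());
>>>>>>     return buf.array();
>>>>>>   }
>>>>>> }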
>>>>>> I'm looking forward to what the rest of the community thinks about this. Thanks for reviewing the PR!
>>>>>>
>>>>>> Ryan
>>>>>>
>>>>>> On Sun, Oct 13, 2024 at 10:45 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks for the PRs! I reviewed Anton's document, and I will do a pass on the PRs.
>>>>>>>
>>>>>>> Imho, it's important to get feedback from query engines because, while delete vectors are not a problem per se (they're what we are using as an internal representation), the use of Puffin files to store them is "impactful" for the query engines: some query engines will probably need to implement the Puffin spec (read/write) in a language other than Java, for instance Apache Impala.
>>>>>>>
>>>>>>> I like the proposal; I just hope we won't "surprise" some query engines with extra work :)
>>>>>>>
>>>>>>> Regards,
>>>>>>> JB
>>>>>>>
>>>>>>> On Thu, Oct 10, 2024 at 11:41 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>>>>>>> >
>>>>>>> > Hi everyone,
>>>>>>> >
>>>>>>> > There seems to be broad agreement around Anton's proposal to use deletion vectors in Iceberg v3, so I've opened two PRs that update the spec with the proposed changes. The first, PR #11238, adds a new Puffin blob type, delete-vector-v1, that stores a delete vector. The second, PR #11240, updates the Iceberg table spec.
>>>>>>> >
>>>>>>> > Please take a look and comment!
>>>>>>> >
>>>>>>> > Ryan