> For the conversion from Delta to Iceberg, wouldn't we need to scan all of the Delta Vectors if we choose a different CRC or other endian-ness?
Exactly, we would not be able to expose Delta as Iceberg if we choose a different checksum type or byte order.

> Does Delta mandate that writers also include this information in their metadata files?

If I understand correctly, the checksum is only in the DV file, not in the metadata.

- Anton

On Thu, Oct 17, 2024 at 2:51 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

For the conversion from Delta to Iceberg, wouldn't we need to scan all of the Delta Vectors if we choose a different CRC or other endian-ness? Does Delta mandate that writers also include this information in their metadata files?

On Thu, Oct 17, 2024 at 4:26 PM Anton Okolnychyi <aokolnyc...@gmail.com> wrote:

We would want to have magic bytes + checksum as part of the blob in Iceberg, as discussed in the spec PRs. If we chose something other than CRC and/or used little endian for all parts of the blob, this would break compatibility in either direction and would prevent the use case that Scott was mentioning.

- Anton

On Thu, Oct 17, 2024 at 8:58 AM Bart Samwel <b...@databricks.com.invalid> wrote:

I hope it's OK if I chime in. I'm one of the people responsible for the format for position deletes that is used in Delta Lake, and I've been reading along with the discussion. Given that the main sticking point is whether this compatibility is worth the associated "not pure" spec, I figured that maybe I can mention what the consequences would be for the Delta Lake developers and users, depending on the outcome of this discussion. I can also give some historical background, in case people find that interesting.

(1) Historical background on why the Delta Lake format is the way it is.

The reason that this length field was added on the Delta Lake side is that we didn't have a framing format like Puffin. Like you, we wanted the Deletion Vector files to be parseable by themselves, if only for debugging purposes. If we could go back, then we might have adopted Puffin. Or we would have made the pointers in the metadata point at only the blob + CRC, and kept the length outside of it, in the framing format. But the reality is that right now there are many clients out there that read the current format, and we can't change this anymore. :( The endianness difference is simply an unfortunate historical accident. They are at different layers, and this was the first time we really did anything binary-ish in Delta Lake, so we didn't actually have any consistent baseline to be consistent with. We only noticed the difference once it had "escaped" into the wild, and then it was too late.

Am I super happy with it? No. Is it *terrible*? Well, not terrible enough for us to go back and upgrade the protocol to fix it. It doesn't lead to broken behavior. This is just a historical idiosyncrasy, and the friction caused by protocol changes is much higher than any benefit from a cleaner spec. So basically, we're stuck with it until the next time we do a major overhaul of the protocol.
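To make that framing concrete, here is a rough write-side sketch of the layout as discussed in this thread (4-byte big-endian length, magic bytes, portable 64-bit RoaringBitmap, 4-byte big-endian CRC-32). It assumes the RoaringBitmap Java library's portable 64-bit serialization; the magic value shown and the class/method names are illustrative only, and the spec PRs remain the authoritative reference:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import org.roaringbitmap.longlong.Roaring64NavigableMap;

public class DvFramingSketch {
  // Illustrative magic value from the spec discussion; verify against the final spec.
  private static final byte[] MAGIC = {(byte) 0xD1, (byte) 0xD3, 0x39, 0x64};

  static byte[] frame(Roaring64NavigableMap deletes) throws IOException {
    // Portable 64-bit roaring serialization (internally little-endian).
    ByteArrayOutputStream bitmap = new ByteArrayOutputStream();
    deletes.serializePortable(new DataOutputStream(bitmap));

    ByteArrayOutputStream blob = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(blob); // DataOutput writes big-endian

    // 4-byte BE length of magic + bitmap: the field Iceberg itself does not need.
    out.writeInt(MAGIC.length + bitmap.size());
    out.write(MAGIC);
    bitmap.writeTo(out);

    // 4-byte BE CRC-32 over magic + bitmap (confirm exact coverage against the spec).
    CRC32 crc = new CRC32();
    crc.update(MAGIC);
    crc.update(bitmap.toByteArray());
    out.writeInt((int) crc.getValue());
    return blob.toByteArray();
  }
}
```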
(2) What are the consequences for Delta Lake if this is *not* made compatible?

Well, then we'd have to support this new layout in Delta Lake. This would be a long and relatively painful process.

It would not just be a matter of "retconning" it into the protocol and updating the libraries. There are simply too many connectors out there, owned by different vendors, etc. Until they adopted the change, they would simply error out on these files *at runtime with weird errors*, or potentially even use the invalid values and crash and burn. (Lack of proper input validation is unfortunately a real thing in the wild.)

So instead, what we would do is add this in a new protocol version of Delta Lake. Or actually, it would be a "table feature", since Delta Lake has a-la-carte protocol features. But these features tend to take a long time to fully permeate the connector ecosystem, and people don't actually upgrade their systems very quickly. That means that, realistically, nobody would be able to make use of this for quite a while.

So what would need to happen instead? For now we would have to rewrite the delete files on conversion, only to add this annoying little length field. This would add at least 200 ms of latency to any metadata conversion, if only because of the cloud object storage GET and PUT latency. Furthermore, the conversion latency for a single commit would become dependent on the number of delete files instead of being O(1). And it would take significant development time to actually make this work and to make it scale.

Based on these consequences, you can imagine why I would *really* appreciate it if the community could weigh this aspect as part of their deliberations.

(3) Is Iceberg -> Delta Lake compatibility actually important enough to care about?

From where I'm standing, compatibility is nearly always very important. It's not important for users who have standardized fully on Iceberg, and those are probably the most represented here in the dev community. But in the world that I'm seeing, companies are generally using a mixture of many different systems, and they are suffering because of the inability of systems to operate efficiently on each other's data. Being able to convert easily and efficiently in both directions benefits users. In this case it's about Iceberg and Delta Lake, but IMO this is true as a principle regardless of which systems you're talking about: lower friction for interoperability is very high value because it increases users' choice in the tools that they can use; it allows them to choose the right tool for the job at hand. And it doesn't matter if users are converting from Delta Lake to Iceberg or the other way around; they are in fact all Iceberg users!

Putting it simply: I have heard many users complain that they can't (efficiently) read data from system X in system Y. At the same time, I have never heard a user complain about having inconsistent endianness in their protocols.

On Thu, Oct 17, 2024 at 11:02 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

Hi folks,

As Daniel said, I think we actually have two proposals in one:

1. The first proposal is "improvement of positional delete files", using delete vectors stored in Puffin files. I like this proposal; it makes a lot of sense. I think we have a kind of consensus here (we discussed how to parse Puffin files, etc.; good discussion).
2. Then, based on (1), support for a vector format "compatible" with Delta. This is also interesting. However, do we really need this in spec V3?
Why not focus on the original proposal (improvement of positional deletes) with a simple approach, and evaluate Delta compatibility later? If the compatibility is "easy", I'm not against including it in V3, but users might be disappointed if bringing this in means a tradeoff.

Imho, I would focus on 1 because it would be a great feature for the Iceberg community.

Regards
JB

On Wed, Oct 16, 2024 at 9:16 PM Daniel Weeks <dwe...@apache.org> wrote:

Hey Everyone,

I feel like at this point we've articulated all of the various options and paths forward, but this really just comes down to a matter of whether we want to make a concession here for the purpose of compatibility.

If we were building this with no prior art, I would expect to omit the length and align the endianness, but given there's an opportunity to close the gap with minor inefficiency, it merits real consideration.

This proposal takes into consideration bi-directional compatibility while maintaining backward compatibility. Do we feel this is beneficial to the larger community, or should we discard efforts for compatibility?

-Dan

On Wed, Oct 16, 2024 at 11:01 AM rdb...@gmail.com <rdb...@gmail.com> wrote:

Thanks, Russell, for the clear summary of the pros and cons! I agree there's some risk to Iceberg implementations, but I think that is mitigated somewhat by code reuse. For example, an engine like Trino could simply reuse code for reading Delta bitmaps, so we would get some validation and support more easily.

Micah, I agree with the requirements that you listed, but I would say #2 is not yet a "requirement" for the design. It's a consideration that I think has real value, but it's up to the community whether we want to add some cost to #1 to make #2 happen. I definitely think that #3 is a requirement so that we can convert Delta to Iceberg metadata (as in the iceberg-delta-lake module).

For the set of options, I would collapse a few of them because I think that we would use the same bitmap representation, the portable 64-bit roaring bitmap.

If that's the case (and probably even if we had some other representation), then Delta can always add support for reading Iceberg delete vectors. That means we either go with the current proposal (a), which preserves the ability for existing Delta clients to read, or we go with a different proposal that we think is better, in which case Delta adds support.

I think both options (c) and (d) have the same effect: Delta readers need to change, and that breaks forward compatibility. Specifically:

* I think that option (c) would mean that we set the offset to either the magic bytes or directly to the start of the roaring bitmap, so I think we will almost certainly be able to read Delta DVs. Even if we didn't have a similar bitmap encoding, we would probably end up adding support for reading Delta DVs for iceberg-delta-lake. Then it's a question of whether support for converted files is required -- similar to how we handle missing partition values in data files from Hive tables, which we just updated the spec to clarify.
* Option (d) is still incompatible with existing Delta readers, so there isn't much of a difference between this and (b).

To me, Micah's requirement #2 is a good goal, but it needs to be balanced against the cost. I don't see that cost as too high, and I think avoiding fragmentation across the projects helps us work together more in the future. But again, that may be my goal and not a priority for the broader Iceberg community.

Ryan

On Wed, Oct 16, 2024 at 10:10 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

One small point:

> Theoretically we could end up with iceberg implementers who have bugs in this part of the code and we wouldn't even know it was an issue till someone converted the table to delta.

I guess we could mandate that readers validate all fields here to make sure they are all consistent, even if unused.

Separately, I think it might pay to take a step back and restate the desired requirements of this design (in no particular order):

1. The best possible implementation of DVs (limited redundancy, no extraneous fields, CPU efficiency, minimal space, etc.).
2. The ability for Delta Lake readers to read Iceberg DVs.
3. The ability for Iceberg readers to read Delta Lake DVs.

The current proposal accomplishes 2 and 3 at very low cost, with some cost to 1. I still think 1 is important. Table formats are still going through a very large growth phase, so making suboptimal choices, when there are better choices that don't add substantial cost, shouldn't be done lightly. Granted, DVs are only going to be a very small part of the cost of any table format.

I think it is worth discussing other options to see if we think there is a better one (if there isn't, then I would propose we continue with the current proposal). Please chime in if I missed one, but off the top of my head these are:

a. Go forward with the current proposal.
b. Create a different DV format that we feel is better, and take no additional steps for compatibility with Delta Lake.
c. Create a different DV format that we feel is better, and allow backwards compatibility by adding "reader" support for Delta Lake DVs in the spec, but not "writer" support.
d. Go forward with the current proposal but use offset and length to trim off the "offset" bytes. (I assume this would break Delta Lake readers, but I think Iceberg readers could still read Delta Lake tables.) This option is very close to (c) but doesn't address all concerns around the DV format.

Out of these, my slight preference would be option (c) (add migration capabilities from Delta Lake to Iceberg), followed by option (a) (the current proposal).

Cheers,
Micah

On Tue, Oct 15, 2024 at 9:32 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

@Scott We would have the ability to read Delta vectors regardless of what we pick, since on the Iceberg side we really just need the bitmap and the offset it is located at within a file; everything else could be in the Iceberg metadata. We don't have any disagreement on this aspect, I think.
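As a rough sketch of that reader-side view (assuming the framing discussed in this thread and the RoaringBitmap Java library's portable 64-bit deserialization; the names are illustrative, not a reference implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import org.roaringbitmap.longlong.Roaring64NavigableMap;

public class DvReaderSketch {
  // All an Iceberg reader needs from table metadata: the file path and blob offset.
  static Roaring64NavigableMap read(String path, long offset) throws IOException {
    try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
      file.seek(offset);
      // 4-byte BE length of magic + bitmap; Iceberg metadata already knows the size.
      int dataLength = file.readInt();
      byte[] data = new byte[dataLength];
      file.readFully(data);
      // Skip the 4 magic bytes; a careful reader would validate them
      // and the trailing CRC-32 as well.
      DataInputStream in = new DataInputStream(
          new ByteArrayInputStream(data, 4, data.length - 4));
      Roaring64NavigableMap bitmap = new Roaring64NavigableMap();
      bitmap.deserializePortable(in);
      return bitmap;
    }
  }
}
```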
The question is whether we would write additional Delta-specific metadata into the vector itself that an Iceberg implementation would not use, so that current Delta readers could read Iceberg delete vectors without a code change or rewriting the vectors. The underlying representation would still be the same between the two formats.

The pros of doing this are that a reverse translation of Iceberg to Delta would be much simpler. Any implementers who already have Delta vector read code can probably mostly reuse it, although our metadata would let them skip to just reading the bitmap.

The cons are that the metadata being written isn't used by Iceberg, so any real tests would require using a Delta reader; anything else would just be synthetic. Theoretically we could end up with Iceberg implementers who have bugs in this part of the code, and we wouldn't even know it was an issue till someone converted the table to Delta. Other Iceberg readers would just be ignoring these bytes, so we are essentially adding a requirement and complexity (although not that much) to Iceberg writers for the benefit of current Delta readers. Delta would probably also have to add new fields to their metadata representations to capture the vector metadata to handle our bitmaps.

On Tue, Oct 15, 2024 at 5:56 PM Scott Cowell <scott.cow...@dremio.com.invalid> wrote:

From an engine perspective, I think compatibility between Delta and Iceberg on DVs is a great thing to have. The additions for cross-compat seem a minor thing to me that is vastly outweighed by a future where Delta tables with DVs are supported in Delta Uniform and can be read by any Iceberg V3 compliant engine.

-Scott

On Tue, Oct 15, 2024 at 2:06 PM Anton Okolnychyi <aokolnyc...@gmail.com> wrote:

Are there engines/vendors/companies in the community that support both Iceberg and Delta and would benefit from having one blob layout for DVs?

- Anton

On Tue, Oct 15, 2024 at 11:10 AM rdb...@gmail.com <rdb...@gmail.com> wrote:

Thanks, Szehon.

To clarify on compatibility, using the same format for the blobs makes it so that existing Delta readers can read and use the DVs written by Iceberg. I'd love for Delta to adopt Puffin, but if we adopt the extra fields, they would not need to change how their readers work. That's why I think there is a benefit to using the same format: we avoid unnecessary fragmentation and make sure data and delete files are compatible.

Ryan

On Tue, Oct 15, 2024 at 10:57 AM Szehon Ho <szehon.apa...@gmail.com> wrote:

This is awesome work by Anton and Ryan. It looks like a ton of effort has gone into the V3 position vector proposal to make it clean and efficient, a long time coming, and I'm truly excited to see the great improvement in storage/perf.

With regard to these fields, I think most of the concerns are already mentioned by the other community members in the PRs https://github.com/apache/iceberg/pull/11238 and https://github.com/apache/iceberg/pull/11240, so not much to add.
The DV itself is the RoaringBitmap 64-bit format, so that's great; the argument for CRC seems reasonable; and I don't have enough data to be opinionated about the endianness/magic bytes.

But I do lean towards the many PR comments that the extra length field is unnecessary and would just add confusion. It seemed to me that the Iceberg community has made so much effort to trim the spec to the bare minimum for cleanliness and efficiency, so I do feel the field is not in the normal direction of the project. Also, I'm not clear on the plan for old Delta readers: they can't read Puffin anyway, so if Delta adopts Puffin, then new readers could adopt it? Anyway, great work again, and thanks for raising the issue on the dev list!

Thanks,
Szehon

On Mon, Oct 14, 2024 at 5:14 PM rdb...@gmail.com <rdb...@gmail.com> wrote:

> I think it might be worth mentioning the current proposal makes some, mostly minor, design choices to try to be compatible with Delta Lake deletion vectors.

Yes it does, and thanks for pointing this out, Micah. I think it's important to consider whether compatibility is important to this community. I just replied to Piotr on the PR, but I'll adapt some of that response here to reach the broader community.

I think there is value in supporting compatibility with older Delta readers, but I acknowledge that this may be my perspective because my employer has a lot of Delta customers that we are going to support now and in the future.

The main use case for maintaining compatibility with the Delta format is that it's hard to move old jobs to new code in a migration. We see a similar issue in Hive to Iceberg migrations, where unknown older readers prevent migration entirely because they are hard to track down and often read files directly from the backing object store. I'd like to avoid the same problem here, where all readers need to be identified and migrated at the same time. Compatibility with the format those readers expect makes it possible to maintain Delta metadata for them temporarily. That increases confidence that things won't randomly break and makes it easier to get people to move forward.

The second reason for maintaining compatibility is that we want the formats to become more similar. My hope is that we can work across both communities and come up with a common metadata format in a future version -- which explains my interest in smooth migrations. Maintaining compatibility in cases like this builds trust and keeps our options open: if we have compatible data layers, then it's easier to build a compatible metadata layer. I'm hoping that if we make the blob format compatible, we can get the Delta community to start using Puffin for better self-describing delete files.

Other people may not share those goals, so I think it helps to consider what is being compromised for this compatibility. I don't think it is too much. There are 2 additional fields:

* A 4-byte length field (that Iceberg doesn't need)
* A 4-byte CRC to validate the contents of the bitmap

There are also changes to how these would have been added if the Iceberg community were building this independently:

* Our initial version didn't include a CRC at all, but now that we think it's useful, compatibility means using a CRC-32 checksum rather than a newer one
* The Delta format uses big endian for its fields (or mixed endianness, if you consider that RoaringBitmap is LE)
* The magic bytes (added to avoid reading the Puffin footer) would have been different
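For illustration, validating that trailing checksum is a few lines with the JDK's standard CRC-32 (assuming, as discussed in this thread, that it covers the magic bytes plus the serialized bitmap and is stored big-endian; the exact coverage should be confirmed against the spec PRs):

```java
import java.util.zip.CRC32;

// Sketch: `data` is the magic bytes followed by the serialized bitmap;
// `storedCrc` is the 4-byte big-endian field read after it.
static void validateCrc(byte[] data, int storedCrc) {
  CRC32 crc = new CRC32();
  crc.update(data);
  if ((int) crc.getValue() != storedCrc) {
    throw new IllegalStateException("Deletion vector CRC-32 mismatch");
  }
}
```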
Overall, I don't think that those changes to what we would have done are unreasonable. It's only 8 extra bytes, and half of them are for a checksum that is a good idea.

I'm looking forward to what the rest of the community thinks about this. Thanks for reviewing the PR!

Ryan

On Sun, Oct 13, 2024 at 10:45 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

Hi,

Thanks for the PRs! I reviewed Anton's document; I will do a pass on the PRs.

Imho, it's important to get feedback from query engines because, while delete vectors are not a problem per se (they are what we use as an internal representation), the use of Puffin files to store them is "impactful" for the query engines (some query engines will probably need to implement the Puffin spec (read/write) in a language other than Java, for instance Apache Impala).

I like the proposal, I just hope we won't "surprise" some query engines with extra work :)

Regards
JB

On Thu, Oct 10, 2024 at 11:41 PM rdb...@gmail.com <rdb...@gmail.com> wrote:

Hi everyone,

There seems to be broad agreement around Anton's proposal to use deletion vectors in Iceberg v3, so I've opened two PRs that update the spec with the proposed changes. The first, PR #11238, adds a new Puffin blob type, delete-vector-v1, that stores a delete vector. The second, PR #11240, updates the Iceberg table spec.

Please take a look and comment!

Ryan