I'm +1 for adding DVs (Goal 1) and also +1 for the ability for Iceberg readers to 
read Delta Lake DVs, as the magic bytes and CRC make sense design-wise (Goal 2).

It's nice that there's cross-community collaboration, probably in other areas I'm 
not looking at, but I'm -0.5 on adopting an otherwise unjustifiable (albeit 
small) format change just to be readable by Delta Lake readers (Goal 3). I want 
to register that this is an irregular direction for the project: Iceberg 
choosing a spec just so it can be read by another project.  For example, it can 
be debated, but past discussions about adding Hive export functionality to the 
Iceberg project were not supported.  So I won't block it if the community 
supports it, but I want to raise awareness.

Thanks,
Szehon



> On Oct 21, 2024, at 2:36 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> 
> I agree with everything Russell said.  I think we should move forward with 
> the current format of DVs to favor compatibility.  I'll add that I think the 
> collaboration aspect likely applies to other areas as well outside of 
> Deletion Vectors (e.g. the work that is happening on the Variant type).
> 
> Thanks,
> Micah
> 
> On Mon, Oct 21, 2024 at 1:45 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>> I've thought about this a lot and talked it over with a lot of folks. As 
>> I've noted before, my main concerns are:
>> 
>> A. Setting a precedent that we are delegating design decisions to another 
>> project
>> B. Setting unnecessary requirements that can only really be checked by 
>> integration tests with another system
>> 
>> I think the value of compatibility can override the pain/annoyance of B.
>> 
>> For A, I just want to make clear that I will not go along with any sort of 
>> design concession in the future. I think it's ok that we do it this time, 
>> but if the Delta Deletion Vector format changes in the future, I would really 
>> hope that the community around Delta would make those decisions in 
>> collaboration with the Apache Iceberg community. This is probably going to be 
>> a red line for me: I won't be able to go ahead with future changes that 
>> aren't necessary for the Iceberg project, regardless of their adoption in 
>> other formats.
>> 
>> So with that said, I'm in support of any of the above solutions but I think 
>> just going with full compatibility with Delta (down to storage format 
>> details) is the right choice to try to get the two communities working 
>> together in the future. 
>> 
>> On Sat, Oct 19, 2024 at 4:38 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>>> Thanks for the summary, Szehon!
>>> 
>>> I would add one thing to the "minimum" for each option. Because we want to 
>>> be able to seek directly to the DV for a particular data file, I think it's 
>>> important to start the blob with magic bytes. That way the reader can 
>>> validate that the offset was correct and that the contents of the blob are 
>>> in the expected format. So I'd add magic bytes to options (1) and (2). In 
>>> (2) we would want the magic bytes to match the ones from Delta to be able 
>>> to read DV files written by Delta.
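>>> 
>>> To make that concrete, here is a rough sketch of the read path I have in 
>>> mind. The magic constant is a placeholder (Delta's value, if I read their 
>>> spec right), and the class and field names are illustrative, not spec:
>>> 
>>>   import java.io.IOException;
>>>   import java.io.RandomAccessFile;
>>>   import java.util.Arrays;
>>> 
>>>   class DvSeek {
>>>     // Placeholder magic; whatever the spec PRs settle on.
>>>     private static final byte[] MAGIC = {(byte) 0xD1, (byte) 0xD3, 0x39, 0x64};
>>> 
>>>     static byte[] readBlob(String path, long offset, int length) throws IOException {
>>>       try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
>>>         file.seek(offset);                  // offset comes from delete file metadata
>>>         byte[] blob = new byte[length];
>>>         file.readFully(blob);
>>>         if (!Arrays.equals(Arrays.copyOfRange(blob, 0, 4), MAGIC)) {
>>>           // wrong offset or corrupt file; fail instead of misreading bytes
>>>           throw new IOException("No DV magic bytes at offset " + offset);
>>>         }
>>>         return blob;
>>>       }
>>>     }
>>>   }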
>>> 
>>> Ryan
>>> 
>>> On Thu, Oct 17, 2024 at 8:55 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>> So, based on Micah's original goals, with 2 and 3 swapped:
>>>> 
>>>> 1. The best possible implementation of DVs (limited redundancy, no 
>>>> extraneous fields, CPU efficiency, minimal space, etc).
>>>> 2.  The ability for Iceberg readers to read Delta Lake DVs
>>>> 3.  The ability for Delta Lake readers to read Iceberg DVs
>>>> 
>>>> The minimum for each option is:
>>>> (1) = DV 
>>>> (2) = DV (little-endian) + CRC (big-endian)
>>>> (3) = Len (big-endian) + Magic + DV (little-endian) + CRC (big-endian) 
>>>> 
>>>> Design-wise, the CRC can be useful and the Magic may be useful/OK, but the 
>>>> Len is controversial, as it's a different length than the bounding Puffin 
>>>> length (it is uncompressed, and also a partial length that excludes the 
>>>> CRC).  A big-endian Len/CRC is not ideal.
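>>>> 
>>>> To spell out option (3), a rough sketch of how a reader might walk that 
>>>> blob layout. I am assuming Delta's magic number (1681511377, which lands 
>>>> on disk as bytes D1 D3 39 64) and a CRC-32 computed over magic + DV; both 
>>>> are my reading of the Delta spec, so treat them as unverified:
>>>> 
>>>>   import java.nio.ByteBuffer;
>>>>   import java.nio.ByteOrder;
>>>>   import java.util.zip.CRC32;
>>>> 
>>>>   class DvOption3 {
>>>>     static byte[] readBitmap(ByteBuffer blob) {
>>>>       blob.order(ByteOrder.BIG_ENDIAN);
>>>>       int len = blob.getInt();                    // Len (BE): magic + DV, excludes CRC
>>>>       byte[] data = new byte[len];
>>>>       blob.get(data);                             // Magic (4 bytes) + DV (LE roaring bitmap)
>>>>       long stored = blob.getInt() & 0xFFFFFFFFL;  // CRC (BE), read as unsigned
>>>> 
>>>>       CRC32 crc = new CRC32();
>>>>       crc.update(data);                           // assumed to cover magic + DV
>>>>       if (crc.getValue() != stored) {
>>>>         throw new IllegalStateException("DV checksum mismatch");
>>>>       }
>>>>       byte[] bitmap = new byte[len - 4];          // drop the 4 magic bytes
>>>>       System.arraycopy(data, 4, bitmap, 0, bitmap.length);
>>>>       return bitmap;                              // portable 64-bit RoaringBitmap bytes
>>>>     }
>>>>   }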
>>>> 
>>>> I hope that's right?  It took me some time to go through the thread, so I 
>>>> hope this saves others some time too.
>>>> Thanks
>>>> Szehon
>>>> 
>>>> 
>>>> On Thu, Oct 17, 2024 at 4:16 PM Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
>>>>>> For the conversion from Delta to Iceberg, wouldn't we need to scan all 
>>>>>> of the Delta Vectors if we choose a different CRC or other endian-ness?
>>>>> 
>>>>> Exactly, we would not be able to expose Delta as Iceberg if we choose a 
>>>>> different checksum type or byte order.
>>>>> 
>>>>>> Does delta mandate that writers also include this information in their 
>>>>>> metadata files?
>>>>> 
>>>>> If I understand correctly, the checksum is only in the DV file, not in 
>>>>> the metadata.
>>>>> 
>>>>> - Anton
>>>>> 
>>>>> On Thu, Oct 17, 2024 at 14:51, Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>> For the conversion from Delta to Iceberg, wouldn't we need to scan all 
>>>>>> of the Delta Vectors if we choose a different CRC or other endian-ness? 
>>>>>> Does delta mandate that writers also include this information in their 
>>>>>> metadata files?
>>>>>> 
>>>>>> On Thu, Oct 17, 2024 at 4:26 PM Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
>>>>>>> We would want to have magic bytes + checksum as part of the blob in 
>>>>>>> Iceberg, as discussed in the spec PRs. If we choose something other than 
>>>>>>> CRC and/or use little-endian for all parts of the blob, this would 
>>>>>>> break compatibility in either direction and would prevent the use 
>>>>>>> case that Scott was mentioning.
>>>>>>> 
>>>>>>> - Anton
>>>>>>> 
>>>>>>> On Thu, Oct 17, 2024 at 08:58, Bart Samwel <b...@databricks.com.invalid> wrote:
>>>>>>>> I hope it's OK if I chime in. I'm one of the people responsible for 
>>>>>>>> the format for position deletes that is used in Delta Lake and I've 
>>>>>>>> been reading along with the discussion. Given that the main sticking 
>>>>>>>> point is whether this compatibility is worth the associated "not pure" 
>>>>>>>> spec, I figured that maybe I can mention what the consequences would 
>>>>>>>> be for the Delta Lake developers and users, depending on the outcome 
>>>>>>>> of this discussion. I can also give some historical background, in 
>>>>>>>> case people find that interesting.
>>>>>>>> 
>>>>>>>> (1) Historical background on why the Delta Lake format is the way it 
>>>>>>>> is.
>>>>>>>> 
>>>>>>>> The reason that this length field was added on the Delta Lake side is 
>>>>>>>> because we didn't have a framing format like Puffin. Like you, we 
>>>>>>>> wanted the Deletion Vector files to be parseable by themselves, if 
>>>>>>>> only for debugging purposes. If we could go back, then we might have 
>>>>>>>> adopted Puffin. Or we would have made the pointers in the metadata 
>>>>>>>> point at only the blob + CRC, and kept the length outside of it, in 
>>>>>>>> the framing format. But the reality is that right now there are many 
>>>>>>>> clients out there that read the current format, and we can't change 
>>>>>>>> this anymore. :( The endianness difference is simply an unfortunate 
>>>>>>>> historical accident. The framing fields and the bitmap encoding are at 
>>>>>>>> different layers, and this was the 
>>>>>>>> first time we really did anything binary-ish in Delta Lake, so we 
>>>>>>>> didn't actually have any consistent baseline to be consistent with. We 
>>>>>>>> only noticed the difference once it had "escaped" into the wild, and 
>>>>>>>> then it was too late.
>>>>>>>> 
>>>>>>>> Am I super happy with it? No. Is it terrible? Well, not terrible 
>>>>>>>> enough for us to go back and upgrade the protocol to fix it. It 
>>>>>>>> doesn't lead to broken behavior. This is just a historical 
>>>>>>>> idiosyncrasy, and the friction caused by protocol changes is much 
>>>>>>>> higher than any benefit from a cleaner spec. So basically, we're stuck 
>>>>>>>> with it until the next time we do a major overhaul of the protocol.
>>>>>>>> 
>>>>>>>> (2) What are the consequences for Delta Lake if this is not made 
>>>>>>>> compatible?
>>>>>>>> 
>>>>>>>> Well, then we'd have to support this new layout in Delta Lake. This 
>>>>>>>> would be a long and relatively painful process.
>>>>>>>> 
>>>>>>>> It would not just be a matter of "retconning" it into the protocol and 
>>>>>>>> updating the libraries. There are simply too many connectors out 
>>>>>>>> there, owned by different vendors etc. Until they would adopt the 
>>>>>>>> change, they would simply error out on these files at runtime with 
>>>>>>>> weird errors, or potentially even use the invalid values and crash and 
>>>>>>>> burn. (Lack of proper input validation is unfortunately a real thing 
>>>>>>>> in the wild.)
>>>>>>>> 
>>>>>>>> So instead, what we would do is to add this in a new protocol version 
>>>>>>>> of Delta Lake. Or actually, it would be a "table feature", since Delta 
>>>>>>>> Lake has a-la-carte protocol features. But these features tend to take 
>>>>>>>> a long time to fully permeate the connector ecosystem, and people 
>>>>>>>> don't actually upgrade their systems very quickly. That means that 
>>>>>>>> realistically, nobody would be able to make use of this for quite a 
>>>>>>>> while.
>>>>>>>> 
>>>>>>>> So what would need to happen instead? For now we would have to rewrite 
>>>>>>>> the delete files on conversion, only to add this annoying little 
>>>>>>>> length field. This would add at least 200 ms of latency to any 
>>>>>>>> metadata conversion, if only because of the cloud object storage GET 
>>>>>>>> and PUT latency. Furthermore, the conversion latency for a single 
>>>>>>>> commit would become dependent on the number of delete files instead of 
>>>>>>>> being O(1). And it would take significant development time to actually 
>>>>>>>> make this work and to make this scale.
>>>>>>>> 
>>>>>>>> Based on these consequences, you can imagine why I would really 
>>>>>>>> appreciate it if the community could weigh this aspect as part of 
>>>>>>>> their deliberations.
>>>>>>>> 
>>>>>>>> (3) Is Iceberg -> Delta Lake compatibility actually important enough 
>>>>>>>> to care about?
>>>>>>>> 
>>>>>>>> From where I'm standing, compatibility is nearly always very 
>>>>>>>> important. It's not important for users who have standardized fully on 
>>>>>>>> Iceberg, and those are probably the most represented here in the dev 
>>>>>>>> community. But in the world that I'm seeing, companies are generally 
>>>>>>>> using a mixture of many different systems, and they are suffering 
>>>>>>>> because of the inability of systems to operate efficiently on each 
>>>>>>>> other's data. Being able to convert easily and efficiently in both 
>>>>>>>> directions benefits users. In this case it's about Iceberg and Delta 
>>>>>>>> Lake, but IMO this is true as a principle regardless of which systems 
>>>>>>>> you're talking about -- lower friction for interoperability is very 
>>>>>>>> high value because it increases users' choice in the tools that they 
>>>>>>>> can use -- it allows them to choose the right tool for the job at 
>>>>>>>> hand. And it doesn't matter if users are converting from Delta Lake to 
>>>>>>>> Iceberg or the other way around, they are in fact all Iceberg users!
>>>>>>>> 
>>>>>>>> Putting it simply: I have heard many users complain that they can't 
>>>>>>>> (efficiently) read data from system X in system Y. At the same time, I 
>>>>>>>> have never heard a user complaining about having inconsistent 
>>>>>>>> endianness in their protocols.
>>>>>>>> 
>>>>>>>> On Thu, Oct 17, 2024 at 11:02 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>>>> Hi folks,
>>>>>>>>> 
>>>>>>>>> As Daniel said, I think we actually have two proposals in one:
>>>>>>>>> 1. The first proposal is the "improvement of positional delete files",
>>>>>>>>> using delete vectors stored in Puffin files. I like this proposal; it
>>>>>>>>> makes a lot of sense. I think we have a kind of consensus here (we
>>>>>>>>> discussed how to parse Puffin files, etc.; good discussion).
>>>>>>>>> 2. Then, based on (1), there is support for a vector format
>>>>>>>>> "compatible" with Delta. This is also interesting. However, do we
>>>>>>>>> really need this in Spec V3? Why not focus on the original proposal
>>>>>>>>> (improvement of positional deletes) with a simple approach, and
>>>>>>>>> evaluate Delta compatibility later? If the compatibility is "easy",
>>>>>>>>> I'm not against including it in V3, but users might be disappointed
>>>>>>>>> if bringing it in means a tradeoff.
>>>>>>>>> 
>>>>>>>>> Imho, I would focus on 1, because it would be a great feature for the
>>>>>>>>> Iceberg community.
>>>>>>>>> 
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>> 
>>>>>>>>> On Wed, Oct 16, 2024 at 9:16 PM Daniel Weeks <dwe...@apache.org> wrote:
>>>>>>>>> >
>>>>>>>>> > Hey Everyone,
>>>>>>>>> >
>>>>>>>>> > I feel like at this point we've articulated all of the various 
>>>>>>>>> > options and paths forward, but this really just comes down to a 
>>>>>>>>> > matter of whether we want to make a concession here for the purpose 
>>>>>>>>> > of compatibility.
>>>>>>>>> >
>>>>>>>>> > If we were building this with no prior art, I would expect to omit 
>>>>>>>>> > the length and align the endianness, but given there's an 
>>>>>>>>> > opportunity to close the gap with minor inefficiency, it merits 
>>>>>>>>> > real consideration.
>>>>>>>>> >
>>>>>>>>> > This proposal takes into consideration bi-directional compatibility 
>>>>>>>>> > while maintaining backward compatibility.  Do we feel this is 
>>>>>>>>> > beneficial to the larger community or should we discard efforts for 
>>>>>>>>> > compatibility?
>>>>>>>>> >
>>>>>>>>> > -Dan
>>>>>>>>> >
>>>>>>>>> > On Wed, Oct 16, 2024 at 11:01 AM rdb...@gmail.com <rdb...@gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Thanks, Russell for the clear summary of the pros and cons! I 
>>>>>>>>> >> agree there's some risk to Iceberg implementations, but I think 
>>>>>>>>> >> that is mitigated somewhat by code reuse. For example, an engine 
>>>>>>>>> >> like Trino could simply reuse code for reading Delta bitmaps, so 
>>>>>>>>> >> we would get some validation and support more easily.
>>>>>>>>> >>
>>>>>>>>> >> Micah, I agree with the requirements that you listed, but I would 
>>>>>>>>> >> say #2 is not yet a "requirement" for the design. It's a 
>>>>>>>>> >> consideration that I think has real value, but it's up to the 
>>>>>>>>> >> community whether we want to add some cost to #1 to make #2 
>>>>>>>>> >> happen. I definitely think that #3 is a requirement so that we can 
>>>>>>>>> >> convert Delta to Iceberg metadata (as in the iceberg-delta-lake 
>>>>>>>>> >> module).
>>>>>>>>> >>
>>>>>>>>> >> For the set of options, I would collapse a few of them because I 
>>>>>>>>> >> think that we would use the same bitmap representation: the 
>>>>>>>>> >> portable 64-bit roaring bitmap.
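>>>>>>>>> >>
>>>>>>>>> >> As a concrete aside, producing those bitmap bytes with the Java 
>>>>>>>>> >> RoaringBitmap library might look like the sketch below. I am going 
>>>>>>>>> >> from memory on the portable-serialization API, so treat the method 
>>>>>>>>> >> names as unverified:
>>>>>>>>> >>
>>>>>>>>> >>   import java.io.ByteArrayOutputStream;
>>>>>>>>> >>   import java.io.DataOutputStream;
>>>>>>>>> >>   import java.io.IOException;
>>>>>>>>> >>   import org.roaringbitmap.longlong.Roaring64NavigableMap;
>>>>>>>>> >>
>>>>>>>>> >>   class PortableDv {
>>>>>>>>> >>     static byte[] toPortableBytes(long[] deletedPositions) throws IOException {
>>>>>>>>> >>       Roaring64NavigableMap bitmap = new Roaring64NavigableMap();
>>>>>>>>> >>       for (long pos : deletedPositions) {
>>>>>>>>> >>         bitmap.addLong(pos);    // deleted row position in the data file
>>>>>>>>> >>       }
>>>>>>>>> >>       bitmap.runOptimize();     // compact runs before serializing
>>>>>>>>> >>       ByteArrayOutputStream bytes = new ByteArrayOutputStream();
>>>>>>>>> >>       // The portable format is the cross-language 64-bit spec that
>>>>>>>>> >>       // both Delta and this proposal point at (little-endian wire).
>>>>>>>>> >>       bitmap.serializePortable(new DataOutputStream(bytes));
>>>>>>>>> >>       return bytes.toByteArray();
>>>>>>>>> >>     }
>>>>>>>>> >>   }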
>>>>>>>>> >>
>>>>>>>>> >> If that's the case (and probably even if we had some other 
>>>>>>>>> >> representation), then Delta can always add support for reading 
>>>>>>>>> >> Iceberg delete vectors. That means we either go with the current 
>>>>>>>>> >> proposal (a) that preserves the ability for existing Delta clients 
>>>>>>>>> >> to read, or we go with a different proposal that we think is 
>>>>>>>>> >> better, in which case Delta adds support.
>>>>>>>>> >>
>>>>>>>>> >> I think both options (c) and (d) have the same effect: Delta 
>>>>>>>>> >> readers need to change and that breaks forward compatibility. 
>>>>>>>>> >> Specifically:
>>>>>>>>> >> * I think that Option (c) would mean that we set the offset to 
>>>>>>>>> >> either magic bytes or directly to the start of the roaring bitmap, 
>>>>>>>>> >> so I think we will almost certainly be able to read Delta DVs. 
>>>>>>>>> >> Even if we didn't have a similar bitmap encoding, we would 
>>>>>>>>> >> probably end up adding support for reading Delta DVs for 
>>>>>>>>> >> iceberg-delta-lake. Then it's a question of whether support for 
>>>>>>>>> >> converted files is required -- similar to how we handle missing 
>>>>>>>>> >> partition values in data files from Hive tables, which we just 
>>>>>>>>> >> updated the spec to clarify.
>>>>>>>>> >> * Option (d) is still incompatible with existing Delta readers, so 
>>>>>>>>> >> there isn't much of a difference between this and (b)
>>>>>>>>> >>
>>>>>>>>> >> To me, Micah's requirement #2 is a good goal, but needs to be 
>>>>>>>>> >> balanced against the cost. I don't see that cost as too high, and 
>>>>>>>>> >> I think avoiding fragmentation across the projects helps us work 
>>>>>>>>> >> together more in the future. But again, that may be my goal and 
>>>>>>>>> >> not a priority for the broader Iceberg community.
>>>>>>>>> >>
>>>>>>>>> >> Ryan
>>>>>>>>> >>
>>>>>>>>> >> On Wed, Oct 16, 2024 at 10:10 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> One small point
>>>>>>>>> >>>>
>>>>>>>>> >>>> Theoretically we could end up with Iceberg implementers who have 
>>>>>>>>> >>>> bugs in this part of the code and we wouldn't even know it was 
>>>>>>>>> >>>> an issue till someone converted the table to Delta.
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> I guess we could mandate that readers validate all fields here to 
>>>>>>>>> >>> make sure they are all consistent, even if unused.
>>>>>>>>> >>>
>>>>>>>>> >>> Separately, I think it might pay to take a step back and restate 
>>>>>>>>> >>> desired requirements of this design (in no particular order):
>>>>>>>>> >>> 1. The best possible implementation of DVs (limited redundancy, 
>>>>>>>>> >>> no extraneous fields, CPU efficiency, minimal space, etc).
>>>>>>>>> >>> 2.  The ability for Delta Lake readers to read Iceberg DVs
>>>>>>>>> >>> 3.  The ability for Iceberg readers to read Delta Lake DVs
>>>>>>>>> >>>
>>>>>>>>> >>> The current proposal accomplishes 2 and 3 at very low cost, with 
>>>>>>>>> >>> some cost for 1.  I still think 1 is important.  Table 
>>>>>>>>> >>> formats are still going through a very large growth phase, so 
>>>>>>>>> >>> making suboptimal choices, when there are better choices that 
>>>>>>>>> >>> don't add substantial cost, shouldn't be done lightly.  Granted, 
>>>>>>>>> >>> DVs are only going to be a very small part of the cost of any 
>>>>>>>>> >>> table format.
>>>>>>>>> >>>
>>>>>>>>> >>> I think it is worth discussing other options to see if we think 
>>>>>>>>> >>> there is a better one (if there isn't, then I would propose we 
>>>>>>>>> >>> continue with the current proposal).  Please chime in if I missed 
>>>>>>>>> >>> one, but off the top of my head these are:
>>>>>>>>> >>>
>>>>>>>>> >>> a.  Go forward with the current proposal.
>>>>>>>>> >>> b.  Create a different format DV that we feel is better, and 
>>>>>>>>> >>> take no additional steps for compatibility with Delta Lake.
>>>>>>>>> >>> c.  Create a different format DV that we feel is better, and 
>>>>>>>>> >>> allow backwards compatibility by adding "reader" support for 
>>>>>>>>> >>> Delta Lake DVs in the spec, but not "writer" support.
>>>>>>>>> >>> d.  Go forward with the current proposal but use offset and 
>>>>>>>>> >>> length to trim off the "offset" bytes.  (I assume this would 
>>>>>>>>> >>> break Delta Lake readers, but I think Iceberg readers could still 
>>>>>>>>> >>> read Delta Lake tables. This option is very close to (c) but 
>>>>>>>>> >>> doesn't address all concerns around the DV format.)
>>>>>>>>> >>>
>>>>>>>>> >>> Out of these, my slight preference would be option (c) (add 
>>>>>>>>> >>> migration capabilities from Delta Lake to Iceberg), followed by 
>>>>>>>>> >>> option (a) (the current proposal).
>>>>>>>>> >>>
>>>>>>>>> >>> Cheers,
>>>>>>>>> >>> Micah
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> On Tue, Oct 15, 2024 at 9:32 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>>>>> >>>>
>>>>>>>>> >>>> @Scott We would have the ability to read Delta vectors 
>>>>>>>>> >>>> regardless of what we pick, since on the Iceberg side we really 
>>>>>>>>> >>>> just need the bitmap and the offset it is located at within a 
>>>>>>>>> >>>> file; everything else could be in the Iceberg metadata. We don't 
>>>>>>>>> >>>> have any disagreement on this aspect, I think.
>>>>>>>>> >>>>
>>>>>>>>> >>>> The question is whether we would write additional Delta-specific 
>>>>>>>>> >>>> metadata into the vector itself that an Iceberg implementation 
>>>>>>>>> >>>> would not use, so that current Delta readers could read Iceberg 
>>>>>>>>> >>>> delete vectors without a code change or rewriting the vectors. 
>>>>>>>>> >>>> The underlying representation would still be the same between 
>>>>>>>>> >>>> the two formats.
>>>>>>>>> >>>>
>>>>>>>>> >>>> The pros of doing this are that a reverse translation of Iceberg 
>>>>>>>>> >>>> to Delta would be much simpler.  Any implementers who already 
>>>>>>>>> >>>> have Delta vector read code can probably mostly reuse it, 
>>>>>>>>> >>>> although our metadata would let them skip straight to reading 
>>>>>>>>> >>>> the bitmap.
>>>>>>>>> >>>>
>>>>>>>>> >>>> The cons are that the metadata being written isn't used by 
>>>>>>>>> >>>> Iceberg, so any real tests would require using a Delta reader; 
>>>>>>>>> >>>> anything else would just be synthetic. Theoretically, we could 
>>>>>>>>> >>>> end up with Iceberg implementers who have bugs in this part of 
>>>>>>>>> >>>> the code, and we wouldn't even know it was an issue till someone 
>>>>>>>>> >>>> converted the table to Delta. Other Iceberg readers would just 
>>>>>>>>> >>>> be ignoring these bytes, so we are essentially adding a 
>>>>>>>>> >>>> requirement and complexity (although not that much) to Iceberg 
>>>>>>>>> >>>> writers for the benefit of current Delta readers. Delta would 
>>>>>>>>> >>>> probably also have to add new fields to their metadata 
>>>>>>>>> >>>> representations to capture the vector metadata to handle our 
>>>>>>>>> >>>> bitmaps.
>>>>>>>>> >>>>
>>>>>>>>> >>>> On Tue, Oct 15, 2024 at 5:56 PM Scott Cowell <scott.cow...@dremio.com.invalid> wrote:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> From an engine perspective, I think compatibility between Delta 
>>>>>>>>> >>>>> and Iceberg on DVs is a great thing to have.  The additions for 
>>>>>>>>> >>>>> cross-compat seem a minor thing to me, vastly outweighed 
>>>>>>>>> >>>>> by a future where Delta tables with DVs are supported in Delta 
>>>>>>>>> >>>>> Uniform and can be read by any Iceberg V3-compliant engine.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> -Scott
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> On Tue, Oct 15, 2024 at 2:06 PM Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> Are there engines/vendors/companies in the community that 
>>>>>>>>> >>>>>> support both Iceberg and Delta and would benefit from having 
>>>>>>>>> >>>>>> one blob layout for DVs?
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> - Anton
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> On Tue, Oct 15, 2024 at 11:10, rdb...@gmail.com <rdb...@gmail.com> wrote:
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> Thanks, Szehon.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> To clarify on compatibility: using the same format for the 
>>>>>>>>> >>>>>>> blobs makes it so that existing Delta readers can read and 
>>>>>>>>> >>>>>>> use the DVs written by Iceberg. I'd love for Delta to adopt 
>>>>>>>>> >>>>>>> Puffin, but if we adopt the extra fields, they would not need 
>>>>>>>>> >>>>>>> to change how readers work. That's why I think there is a 
>>>>>>>>> >>>>>>> benefit to using the same format: we avoid unnecessary 
>>>>>>>>> >>>>>>> fragmentation and make sure data and delete files are 
>>>>>>>>> >>>>>>> compatible.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> Ryan
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> On Tue, Oct 15, 2024 at 10:57 AM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> This is awesome work by Anton and Ryan. It looks like a ton 
>>>>>>>>> >>>>>>>> of effort has gone into the V3 position delete vector 
>>>>>>>>> >>>>>>>> proposal to make it clean and efficient; it's been a long 
>>>>>>>>> >>>>>>>> time coming, and I'm truly excited to see the great 
>>>>>>>>> >>>>>>>> improvement in storage/perf.
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> WRT these fields, I think most of the concerns were 
>>>>>>>>> >>>>>>>> already mentioned by the other community members in the PRs 
>>>>>>>>> >>>>>>>> https://github.com/apache/iceberg/pull/11238 and 
>>>>>>>>> >>>>>>>> https://github.com/apache/iceberg/pull/11240, so there is 
>>>>>>>>> >>>>>>>> not much to add.  The DV itself is the 64-bit RoaringBitmap 
>>>>>>>>> >>>>>>>> format, so that's great, the argument for CRC seems 
>>>>>>>>> >>>>>>>> reasonable, and I don't have enough data to be opinionated 
>>>>>>>>> >>>>>>>> about the endianness/magic bytes.
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> But I do lean towards the many PR comments saying that the 
>>>>>>>>> >>>>>>>> extra length field is unnecessary and would just add 
>>>>>>>>> >>>>>>>> confusion.  It seems to me that the Iceberg community has 
>>>>>>>>> >>>>>>>> made so much effort to trim the spec to the bare minimum for 
>>>>>>>>> >>>>>>>> cleanliness and efficiency, so I do feel the field is not in 
>>>>>>>>> >>>>>>>> the normal direction of the project.  Also, I'm not clear on 
>>>>>>>>> >>>>>>>> the plan for old Delta readers: they can't read Puffin 
>>>>>>>>> >>>>>>>> anyway, and if Delta adopts Puffin, then new readers could 
>>>>>>>>> >>>>>>>> adopt the new format?  Anyway, great work again, thanks for 
>>>>>>>>> >>>>>>>> raising the issue on the dev list!
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>> >>>>>>>> Szehon
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> On Mon, Oct 14, 2024 at 5:14 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> > I think it might be worth mentioning the current proposal 
>>>>>>>>> >>>>>>>>> > makes some, mostly minor, design choices to try to be 
>>>>>>>>> >>>>>>>>> > compatible with Delta Lake deletion vectors.
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> Yes it does, and thanks for pointing this out, Micah. I 
>>>>>>>>> >>>>>>>>> think it's important to consider whether compatibility is 
>>>>>>>>> >>>>>>>>> important to this community. I just replied to Piotr on the 
>>>>>>>>> >>>>>>>>> PR, but I'll adapt some of that response here to reach the 
>>>>>>>>> >>>>>>>>> broader community.
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> I think there is value in supporting compatibility with 
>>>>>>>>> >>>>>>>>> older Delta readers, but I acknowledge that this may be my 
>>>>>>>>> >>>>>>>>> perspective because my employer has a lot of Delta 
>>>>>>>>> >>>>>>>>> customers that we are going to support now and in the 
>>>>>>>>> >>>>>>>>> future.
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> The main use case for maintaining compatibility with the 
>>>>>>>>> >>>>>>>>> Delta format is that it's hard to move old jobs to new code 
>>>>>>>>> >>>>>>>>> in a migration. We see a similar issue in Hive to Iceberg 
>>>>>>>>> >>>>>>>>> migrations, where unknown older readers prevent migration 
>>>>>>>>> >>>>>>>>> entirely because they are hard to track down and often read 
>>>>>>>>> >>>>>>>>> files directly from the backing object store. I'd like to 
>>>>>>>>> >>>>>>>>> avoid the same problem here, where all readers need to be 
>>>>>>>>> >>>>>>>>> identified and migrated at the same time. Compatibility 
>>>>>>>>> >>>>>>>>> with the format those readers expect makes it possible to 
>>>>>>>>> >>>>>>>>> maintain Delta metadata for them temporarily. That 
>>>>>>>>> >>>>>>>>> increases confidence that things won't randomly break and 
>>>>>>>>> >>>>>>>>> makes it easier to get people to move forward.
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> The second reason for maintaining compatibility is that we 
>>>>>>>>> >>>>>>>>> want the formats to become more similar. My hope is 
>>>>>>>>> >>>>>>>>> that we can work across both communities and come up with a 
>>>>>>>>> >>>>>>>>> common metadata format in a future version -- which 
>>>>>>>>> >>>>>>>>> explains my interest in smooth migrations. Maintaining 
>>>>>>>>> >>>>>>>>> compatibility in cases like this builds trust and keeps our 
>>>>>>>>> >>>>>>>>> options open: if we have compatible data layers, then it's 
>>>>>>>>> >>>>>>>>> easier to build a compatible metadata layer. I'm hoping 
>>>>>>>>> >>>>>>>>> that if we make the blob format compatible, we can get the 
>>>>>>>>> >>>>>>>>> Delta community to start using Puffin for better 
>>>>>>>>> >>>>>>>>> self-describing delete files.
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> Other people may not share those goals, so I think it helps 
>>>>>>>>> >>>>>>>>> to consider what is being compromised for this 
>>>>>>>>> >>>>>>>>> compatibility. I don't think it is too much. There are 2 
>>>>>>>>> >>>>>>>>> additional fields:
>>>>>>>>> >>>>>>>>> * A 4-byte length field (that Iceberg doesn't need)
>>>>>>>>> >>>>>>>>> * A 4-byte CRC to validate the contents of the bitmap
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> There are also changes to how these would have been added 
>>>>>>>>> >>>>>>>>> if the Iceberg community were building this independently.
>>>>>>>>> >>>>>>>>> * Our initial version didn't include a CRC at all, but now 
>>>>>>>>> >>>>>>>>> that we think it's useful, compatibility means using a 
>>>>>>>>> >>>>>>>>> CRC-32 checksum rather than a newer one
>>>>>>>>> >>>>>>>>> * The Delta format uses big-endian for its fields (or mixed 
>>>>>>>>> >>>>>>>>> endian if you consider that RoaringBitmap is LE)
>>>>>>>>> >>>>>>>>> * The magic bytes (added to avoid reading the Puffin 
>>>>>>>>> >>>>>>>>> footer) would have been different
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> Overall, I don't think that those changes to what we would 
>>>>>>>>> >>>>>>>>> have done are unreasonable. It's only 8 extra bytes, and 
>>>>>>>>> >>>>>>>>> half of them are for a checksum, which is a good idea.
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> I'm looking forward to what the rest of the community 
>>>>>>>>> >>>>>>>>> thinks about this. Thanks for reviewing the PR!
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> Ryan
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> On Sun, Oct 13, 2024 at 10:45 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Hi
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Thanks for the PRs! I reviewed Anton's document; I will 
>>>>>>>>> >>>>>>>>>> do a pass on the PRs.
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Imho, it's important to get feedback from query engines: 
>>>>>>>>> >>>>>>>>>> while delete vectors are not a problem per se (they are 
>>>>>>>>> >>>>>>>>>> what we are using as the internal representation), the use 
>>>>>>>>> >>>>>>>>>> of Puffin files to store them is "impactful" for the query 
>>>>>>>>> >>>>>>>>>> engines (some query engines will probably need to 
>>>>>>>>> >>>>>>>>>> implement the Puffin spec (read/write) in a language other 
>>>>>>>>> >>>>>>>>>> than Java, for instance Apache Impala).
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> I like the proposal; I just hope we won't "surprise" some 
>>>>>>>>> >>>>>>>>>> query engines with extra work :)
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Regards
>>>>>>>>> >>>>>>>>>> JB
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> On Thu, Oct 10, 2024 at 11:41 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>> >
>>>>>>>>> >>>>>>>>>> > Hi everyone,
>>>>>>>>> >>>>>>>>>> >
>>>>>>>>> >>>>>>>>>> > There seems to be broad agreement around Anton's 
>>>>>>>>> >>>>>>>>>> > proposal to use deletion vectors in Iceberg v3, so I've 
>>>>>>>>> >>>>>>>>>> > opened two PRs that update the spec with the proposed 
>>>>>>>>> >>>>>>>>>> > changes. The first, PR #11238, adds a new Puffin blob 
>>>>>>>>> >>>>>>>>>> > type, delete-vector-v1, that stores a delete vector. The 
>>>>>>>>> >>>>>>>>>> > second, PR #11240, updates the Iceberg table spec.
>>>>>>>>> >>>>>>>>>> >
>>>>>>>>> >>>>>>>>>> > Please take a look and comment!
>>>>>>>>> >>>>>>>>>> >
>>>>>>>>> >>>>>>>>>> > Ryan
