Hey folks, I've set up an additional discussion on DVs for Monday. This
topic is fairly complex and a calendar slot has opened up, so I think
it'd be helpful for us to first make sure we're all on the same page
about what the approach Anton proposed earlier in the thread means and
its high-level mechanics. I should also have more to share in the doc on
how the entry structure and change detection could look in this
approach. Then on Thursday we can get into more details and targeted
points of discussion on this topic.

Thanks,
Amogh Jahagirdar

On Tue, Feb 17, 2026 at 9:27 PM Amogh Jahagirdar <[email protected]> wrote:

> Thanks Steven! I've set up some time next Thursday for the community to
> discuss this. We're also looking at how the content entry would look in
> a combined DV with potential column updates for DV changes, and how
> change detection could work in this approach. I should have more to
> share on this by the time of the community discussion next week.
> We should also consider the potential root churn and memory consumption
> stemming from the root entry inflation expected with a combined data
> file + DV entry with possible column updates for certain DV workloads;
> though at least the memory consumption of stats held after planning is
> arguably an implementation problem for certain integrations.
>
> Thanks,
> Amogh Jahagirdar
>
> On Fri, Feb 13, 2026 at 10:58 AM Steven Wu <[email protected]> wrote:
>
>> I wrote up some analysis with back-of-the-envelope calculations about the
>> column update approach for DV colocation. It mainly concerns the 2nd use
>> case: deleting a large number of rows from a small number of files.
>>
>>
>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.gvdulzy486n7
>>
>>
>>
>> On Wed, Feb 4, 2026 at 1:02 AM Péter Váry <[email protected]>
>> wrote:
>>
>>> I fully agree with Anton and Steven that we need benchmarks before
>>> choosing any direction.
>>>
>>> I ran some preliminary column‑stitching benchmarks last summer:
>>>
>>>    - Results are available in the doc:
>>>    
>>> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>>>    - Code is here: https://github.com/apache/iceberg/pull/13306
>>>
>>> I’ve summarized the most relevant results at the end of this email. They
>>> show roughly a 10% slowdown on the read path with column stitching in
>>> similar scenarios when using local SSDs. I expect that in real deployments
>>> the metadata read cost will mostly be driven by blob I/O (assuming no
>>> caching). If blob access becomes the dominant factor in read latency,
>>> multithreaded fetching should be able to absorb the overhead introduced by
>>> column stitching, resulting in latency similar to the single‑file layout
>>> (unless I/O is already the bottleneck).
>>>
>>> We should definitely rerun the benchmarks once we have a clearer
>>> understanding of the intended usage patterns.
>>> Thanks,
>>> Peter
>>>
>>>
>>> The relevant(ish) results are for 100 columns, with 2 families of 50
>>> columns each, read locally:
>>>
>>> Base:
>>> MultiThreadedParquetBenchmark.read  100  0  false  ss  20  3.739 ± 0.096  s/op
>>>
>>> Single-threaded read:
>>> MultiThreadedParquetBenchmark.read  100  2  false  ss  20  4.036 ± 0.082  s/op
>>>
>>> Multi-threaded read:
>>> MultiThreadedParquetBenchmark.read  100  2  true  ss  20  4.063 ± 0.080  s/op
>>>
>>> On Tue, Feb 3, 2026 at 11:27 PM Steven Wu <[email protected]> wrote:
>>>
>>>>
>>>> I agree with Anton in this
>>>> <https://docs.google.com/document/d/1jZy4g6UDi3hdblpkSzDnqgzgATFKFoMaHmt4nNH8M7o/edit?disco=AAAByzDx21w>
>>>> comment thread that we probably need to run benchmarks for a few common
>>>> scenarios to guide this decision. We need to write down detailed plans for
>>>> those scenarios and what we are measuring. Also ideally, we want to measure
>>>> using the V4 metadata structure (like Parquet manifest file, column stats
>>>> structs, adaptive tree). There are PoC PRs available for column stats,
>>>> Parquet manifest, and root manifest. It would probably be tricky to piece
>>>> them together to run the benchmark considering the PoC status. We also need
>>>> the column stitching capability on the read path to test the column file
>>>> approach.
>>>>
>>>> On Tue, Feb 3, 2026 at 1:53 PM Anoop Johnson <[email protected]> wrote:
>>>>
>>>>> I'm in favor of co-located DV metadata with column file override and
>>>>> not doing affiliated/unaffiliated delete manifests. This is conceptually
>>>>> similar to strictly affiliated delete manifests with positional joins, and
>>>>> will halve the number of I/Os when there is no DV column override. It is
>>>>> simpler to implement and will speed up reads.
>>>>>
>>>>> Unaffiliated DV manifests are flexible for writers. They reduce the
>>>>> chance of physical conflicts when there are concurrent large/random
>>>>> deletes that change DVs on different files in the same manifest. But
>>>>> the flexibility comes at a read-time cost. If the number of
>>>>> unaffiliated DVs exceeds a threshold, it could cause driver OOMs or
>>>>> require a distributed join to pair up DVs with data files. With
>>>>> colocated metadata, manifest DVs can reduce the chance of conflicts up
>>>>> to a certain write size.
>>>>>
>>>>> I assume we will still support unaffiliated manifests, but perhaps we
>>>>> can restrict them to just equality deletes.
>>>>>
>>>>> -Anoop
>>>>>
>>>>>
>>>>> On Mon, Feb 2, 2026 at 4:27 PM Anton Okolnychyi <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I added the approach with column files to the doc.
>>>>>>
>>>>>> To sum up, separate data and delete manifests with affinity would
>>>>>> perform somewhat on par with co-located DV metadata (a.k.a. direct
>>>>>> assignment) if we add support for column files when we need to replace
>>>>>> most or all DVs (use case 1). That said, support for direct assignment
>>>>>> with in-line metadata DVs can help us avoid unaffiliated delete
>>>>>> manifests when we need to replace a few DVs (use case 2).
>>>>>>
>>>>>> So the key question is whether we want to allow unaffiliated delete
>>>>>> manifests with DVs... If we don't, then we would likely want to have
>>>>>> co-located DV metadata and must support efficient column updates to
>>>>>> avoid regressing compared to V2 and V3 for large MERGE jobs that
>>>>>> modify a small set of records in most files.
>>>>>>
>>>>>> On Mon, Feb 2, 2026 at 1:20 PM Anton Okolnychyi <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Anoop, correct, if we keep data and delete manifests separate, there
>>>>>>> is a better way to combine the entries and we should NOT rely on the
>>>>>>> referenced data file path. Reconciling by implicit position will
>>>>>>> reduce the size of the DV entry (no need to store the referenced data
>>>>>>> file path) and will improve the planning performance (no
>>>>>>> equals/hashCode on the path).
>>>>>>>
>>>>>>> Steven, I agree. Most notes in the doc pre-date discussions we had
>>>>>>> on column updates. You are right: given that we are gravitating
>>>>>>> towards a native way to handle column updates, it seems logical to
>>>>>>> use the same approach for replacing DVs, since they’re essentially
>>>>>>> column updates. Let me add one more approach to the doc based on what
>>>>>>> Anurag and Peter have so far.
>>>>>>>
>>>>>>> On Sun, Feb 1, 2026 at 8:59 PM Steven Wu <[email protected]> wrote:
>>>>>>>
>>>>>>>> Anton, thanks for raising this. I agree this deserves another look.
>>>>>>>> I added a comment in your doc that we can potentially apply the
>>>>>>>> column update proposal for data file updates to manifest file
>>>>>>>> updates as well, to colocate the data DVs and the data manifest
>>>>>>>> files. Data DVs can be a separate column in the data manifest file,
>>>>>>>> updated separately in a column file. This is the same as the
>>>>>>>> coalesced positional join that Anoop mentioned.
>>>>>>>>
>>>>>>>> On Sun, Feb 1, 2026 at 4:14 PM Anoop Johnson <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thank you for raising this, Anton. I had a similar observation
>>>>>>>>> while prototyping <https://github.com/apache/iceberg/pull/14533>
>>>>>>>>> the adaptive metadata tree. The overhead of doing a path-based hash
>>>>>>>>> join of a data manifest with the affiliated delete manifest is
>>>>>>>>> high: my estimate was that the join adds about 5-10% overhead. The
>>>>>>>>> hash table build/probe alone takes about 5 ms for manifests with
>>>>>>>>> 25K entries. There are engines that can do vectorized hash joins
>>>>>>>>> that can lower this, but the overhead and complexity of a
>>>>>>>>> SIMD-friendly hash join are non-trivial.
>>>>>>>>>
>>>>>>>>> An alternative to relying on the external file feature in Parquet
>>>>>>>>> is to make affiliated manifests order-preserving: i.e., DVs in an
>>>>>>>>> affiliated delete manifest must appear in the same position as the
>>>>>>>>> corresponding data file in the data manifest the delete manifest is
>>>>>>>>> affiliated to. If a data file does not have a DV, the DV manifest
>>>>>>>>> must store a NULL. This would allow us to do positional joins,
>>>>>>>>> which are much faster. If we wanted, we could even have multiple
>>>>>>>>> affiliated DV manifests for a data manifest and the reader would do
>>>>>>>>> a COALESCED positional join (i.e., pick the first non-null value as
>>>>>>>>> the DV). This puts the sorting responsibility on the writers, but
>>>>>>>>> it might be a reasonable tradeoff.
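A minimal sketch (illustrative Python, not Iceberg SDK code; function and manifest names are hypothetical) of the coalesced positional join described above, assuming each affiliated DV manifest is order-preserving with one entry, or NULL, per data-manifest row:

```python
# Hypothetical sketch of an order-preserving layout: each affiliated DV
# manifest lists one entry per data-manifest row (None when a file has
# no DV), so readers can pair DVs with data files by position and
# COALESCE across manifests instead of hash-joining on the file path.

def coalesce_dvs(data_files, dv_manifests):
    """Pick, for each data file, the first non-null DV by position."""
    paired = []
    for i, data_file in enumerate(data_files):
        dv = next(
            (m[i] for m in dv_manifests if m[i] is not None),
            None,  # no DV in any affiliated manifest
        )
        paired.append((data_file, dv))
    return paired

# Newer DV manifests listed first take precedence in the COALESCE.
data_files = ["a.parquet", "b.parquet", "c.parquet"]
newer = [None, "dv-b-v2", None]
older = ["dv-a-v1", "dv-b-v1", None]
assert coalesce_dvs(data_files, [newer, older]) == [
    ("a.parquet", "dv-a-v1"),
    ("b.parquet", "dv-b-v2"),
    ("c.parquet", None),
]
```

Because pairing is by row position, no hashing or path equality is needed, which is what makes this cheaper than the path-based hash join.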
>>>>>>>>>
>>>>>>>>> Also, the options don't necessarily have to be mutually exclusive.
>>>>>>>>> We could still allow affiliated DVs to be "folded" into the data
>>>>>>>>> manifest (e.g. by background optimization jobs or the writer
>>>>>>>>> itself). That might be the optimal choice for read-heavy tables
>>>>>>>>> because it will halve the number of I/Os readers have to make.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Anoop
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jan 30, 2026 at 6:03 PM Anton Okolnychyi <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I had a chance to catch up on some of the V4 discussions. Given
>>>>>>>>>> that we are getting rid of the manifest list and switching to 
>>>>>>>>>> Parquet, I
>>>>>>>>>> wanted to re-evaluate the possibility of direct DV assignment that we
>>>>>>>>>> discarded in V3 to avoid regressions. I have put together my 
>>>>>>>>>> thoughts in a
>>>>>>>>>> doc [1].
>>>>>>>>>>
>>>>>>>>>> TL;DR:
>>>>>>>>>>
>>>>>>>>>> - I think the current V4 proposal that keeps data and delete
>>>>>>>>>> manifests separate but introduces affinity is a solid choice for
>>>>>>>>>> cases when we need to replace DVs in many / most files. I outlined
>>>>>>>>>> an approach with column-split Parquet files, but it doesn't
>>>>>>>>>> improve the performance and takes a dependency on a portion of the
>>>>>>>>>> Parquet spec that is not really implemented.
>>>>>>>>>> - Pushing unaffiliated DVs directly into the root to replace a
>>>>>>>>>> small set of DVs is going to be fast on write but does require
>>>>>>>>>> resolving where those DVs apply at read time. Using inline
>>>>>>>>>> metadata DVs with column-split Parquet files is a little more
>>>>>>>>>> promising in this case as it allows us to avoid unaffiliated DVs.
>>>>>>>>>> That said, it again relies on something Parquet doesn't implement
>>>>>>>>>> right now, requires changing maintenance operations, and yields
>>>>>>>>>> minimal benefits.
>>>>>>>>>>
>>>>>>>>>> All in all, the V4 proposal seems like a strict improvement over
>>>>>>>>>> V3 but I insist that we reconsider usage of the referenced data file 
>>>>>>>>>> path
>>>>>>>>>> when resolving DVs to data files.
>>>>>>>>>>
>>>>>>>>>> [1] -
>>>>>>>>>> https://docs.google.com/document/d/1jZy4g6UDi3hdblpkSzDnqgzgATFKFoMaHmt4nNH8M7o
>>>>>>>>>>
>>>>>>>>>> - Anton
>>>>>>>>>>
>>>>>>>>>> On Sat, Nov 22, 2025 at 1:37 PM Amogh Jahagirdar <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey all,
>>>>>>>>>>>
>>>>>>>>>>> Here is the meeting recording
>>>>>>>>>>> <https://drive.google.com/file/d/1lG9sM-JTwqcIgk7JsAryXXCc1vMnstJs/view?usp=sharing>
>>>>>>>>>>>  and generated meeting summary
>>>>>>>>>>> <https://docs.google.com/document/d/1e50p8TXL2e3CnUwKMOvm8F4s2PeVMiKWHPxhxOW1fIM/edit?usp=sharing>.
>>>>>>>>>>> Thanks all for attending yesterday!
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Nov 20, 2025 at 8:49 AM Amogh Jahagirdar <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey folks,
>>>>>>>>>>>>
>>>>>>>>>>>> I was out for some time, but set up a sync for tomorrow at 9am
>>>>>>>>>>>> PST. For this discussion, I do think it would be great to focus on 
>>>>>>>>>>>> the
>>>>>>>>>>>> manifest DV representation, factoring in analyses on bitmap 
>>>>>>>>>>>> representation
>>>>>>>>>>>> storage footprints, and the entry structure considering how we 
>>>>>>>>>>>> want to
>>>>>>>>>>>> approach change detection. If there are other topics that people 
>>>>>>>>>>>> want to
>>>>>>>>>>>> highlight, please do bring those up as well!
>>>>>>>>>>>>
>>>>>>>>>>>> I also recognize that this is fairly short-notice scheduling, so
>>>>>>>>>>>> please do reach out to me if this time is difficult to work
>>>>>>>>>>>> with; next week is the Thanksgiving holiday here, and since
>>>>>>>>>>>> people will be travelling/out I figured I'd try to schedule
>>>>>>>>>>>> before then.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 17, 2025 at 9:03 AM Amogh Jahagirdar <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey folks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry for the delay, here's the recording link
>>>>>>>>>>>>> <https://drive.google.com/file/d/1YOmPROXjAKYAWAcYxqAFHdADbqELVVf2/view>
>>>>>>>>>>>>>   from
>>>>>>>>>>>>> last week's discussion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Oct 10, 2025 at 9:44 AM Péter Váry <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Same here.
>>>>>>>>>>>>>> Please record if you can.
>>>>>>>>>>>>>> Thanks, Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Oct 10, 2025, 17:39 Fokko Driesprong <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey Amogh,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the write-up. Unfortunately, I won’t be able to
>>>>>>>>>>>>>>> attend. Will it be recorded? Thanks!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>> Fokko
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 7, 2025 at 8:36 PM Amogh Jahagirdar <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've set up time this Friday at 9am PST for another sync on
>>>>>>>>>>>>>>>> single file commits. In terms of what would be great to focus 
>>>>>>>>>>>>>>>> on for the
>>>>>>>>>>>>>>>> discussion:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. Whether or not it makes sense to eliminate the partition
>>>>>>>>>>>>>>>> tuple, and instead represent it via lower/upper bounds. As a
>>>>>>>>>>>>>>>> reminder, one of the goals is to avoid tying a partition spec 
>>>>>>>>>>>>>>>> to a
>>>>>>>>>>>>>>>> manifest; in the root we can have a mix of files spanning 
>>>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>> partition specs, and even in leaf manifests avoiding this 
>>>>>>>>>>>>>>>> coupling can
>>>>>>>>>>>>>>>> enable more desirable clustering of metadata.
>>>>>>>>>>>>>>>> In the vast majority of cases, we could leverage the
>>>>>>>>>>>>>>>> property that a file is effectively partitioned if the 
>>>>>>>>>>>>>>>> lower/upper for a
>>>>>>>>>>>>>>>> given field is equal. The nuance here is with the particular 
>>>>>>>>>>>>>>>> case of
>>>>>>>>>>>>>>>> identity partitioned string/binary columns which can be 
>>>>>>>>>>>>>>>> truncated in stats.
>>>>>>>>>>>>>>>> One approach is to require that writers must not produce 
>>>>>>>>>>>>>>>> truncated stats
>>>>>>>>>>>>>>>> for identity partitioned columns. It's also important to keep 
>>>>>>>>>>>>>>>> in mind that
>>>>>>>>>>>>>>>> all of this is just for the purpose of reconstructing the 
>>>>>>>>>>>>>>>> partition tuple,
>>>>>>>>>>>>>>>> which is only required during equality delete matching. 
>>>>>>>>>>>>>>>> Another area we
>>>>>>>>>>>>>>>> need to cover as part of this is on exact bounds on stats. 
>>>>>>>>>>>>>>>> There are other
>>>>>>>>>>>>>>>> options here as well such as making all new equality deletes 
>>>>>>>>>>>>>>>> in V4 be
>>>>>>>>>>>>>>>> global and instead match based on bounds, or keeping the tuple 
>>>>>>>>>>>>>>>> but each
>>>>>>>>>>>>>>>> tuple is effectively based off a union schema of all partition 
>>>>>>>>>>>>>>>> specs. I am
>>>>>>>>>>>>>>>> adding a separate appendix section outlining the span of 
>>>>>>>>>>>>>>>> options here and
>>>>>>>>>>>>>>>> the different tradeoffs.
>>>>>>>>>>>>>>>> Once we get this more to a conclusive state, I'll move a
>>>>>>>>>>>>>>>> summarized version to the main doc.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. @[email protected] <[email protected]> has
>>>>>>>>>>>>>>>> updated the doc with a section
>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.rrpksmp8zkb#heading=h.qau0y5xkh9mn>
>>>>>>>>>>>>>>>>  on
>>>>>>>>>>>>>>>> how we can do change detection from the root in a variety of 
>>>>>>>>>>>>>>>> write
>>>>>>>>>>>>>>>> scenarios. I've done a review on it, and it covers the cases I 
>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>> expect. It'd be good for folks to take a look and please give 
>>>>>>>>>>>>>>>> feedback
>>>>>>>>>>>>>>>> before we discuss. Thank you Steven for adding that section 
>>>>>>>>>>>>>>>> and all the
>>>>>>>>>>>>>>>> diagrams.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Sep 18, 2025 at 3:19 PM Amogh Jahagirdar <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hey folks just following up from the discussion last
>>>>>>>>>>>>>>>>> Friday with a summary and some next steps:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1.) For the various change detection cases, we concluded
>>>>>>>>>>>>>>>>> it's best just to go through those in an offline manner on 
>>>>>>>>>>>>>>>>> the doc since
>>>>>>>>>>>>>>>>> it's hard to verify all that correctness in a large meeting 
>>>>>>>>>>>>>>>>> setting.
>>>>>>>>>>>>>>>>> 2.) We mostly discussed eliminating the partition tuple.
>>>>>>>>>>>>>>>>> On the original proposal, I was mostly aiming for the
>>>>>>>>>>>>>>>>> ability to reconstruct the tuple from the stats for the
>>>>>>>>>>>>>>>>> purpose of equality delete
>>>>>>>>>>>>>>>>> equality delete
>>>>>>>>>>>>>>>>> matching (a file is partitioned if the lower and upper bounds 
>>>>>>>>>>>>>>>>> are equal);
>>>>>>>>>>>>>>>>> There's some nuance in how we need to handle identity 
>>>>>>>>>>>>>>>>> partition values
>>>>>>>>>>>>>>>>> since for string/binary they cannot be truncated. Another 
>>>>>>>>>>>>>>>>> potential option
>>>>>>>>>>>>>>>>> is to treat all equality deletes as effectively global and 
>>>>>>>>>>>>>>>>> narrow their
>>>>>>>>>>>>>>>>> application based on the stats values. This may require 
>>>>>>>>>>>>>>>>> defining tight
>>>>>>>>>>>>>>>>> bounds. I'm still collecting my thoughts on this one.
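The "treat equality deletes as global and narrow by stats" option could be sketched as follows (illustrative Python; the field names are assumptions, and it presumes tight/exact bounds as noted above):

```python
# Hedged sketch of narrowing a global equality delete by column stats:
# instead of matching deletes to files by partition tuple, the reader
# skips any file whose lower/upper bounds cannot contain the deleted key.

def may_contain(lower, upper, value):
    """True if `value` could appear in a file with these column bounds."""
    return lower <= value <= upper

def files_to_check(files, delete_value):
    """Narrow a global equality delete to files whose bounds overlap it."""
    return [
        f for f in files
        if may_contain(f["lower"], f["upper"], delete_value)
    ]

files = [
    {"path": "f1.parquet", "lower": 0, "upper": 99},
    {"path": "f2.parquet", "lower": 100, "upper": 199},
]
# Deleting rows with id = 150 only requires applying the delete to f2.
assert [f["path"] for f in files_to_check(files, 150)] == ["f2.parquet"]
```

Loose bounds would only make the pruning less effective, not incorrect, which is why tight bounds matter for this option's performance.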
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks folks! Please also let me know if any of the
>>>>>>>>>>>>>>>>> following links are inaccessible for any reason.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Meeting recording link:
>>>>>>>>>>>>>>>>> https://drive.google.com/file/d/1gv8TrR5xzqqNxek7_sTZkpbwQx1M3dhK/view
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Meeting summary:
>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/131N0CDpzZczURxitN0HGS7dTqRxQT_YS9jMECkGGvQU
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Sep 8, 2025 at 3:40 PM Amogh Jahagirdar <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Update: I moved the discussion time to this Friday at 9
>>>>>>>>>>>>>>>>>> am PST since I found out that quite a few folks involved in 
>>>>>>>>>>>>>>>>>> the proposals
>>>>>>>>>>>>>>>>>> will be out next week, and I know some folks will also be
>>>>>>>>>>>>>>>>>> out the week after that.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Amogh J
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Sep 8, 2025 at 8:57 AM Amogh Jahagirdar <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hey folks sorry for the late follow up here,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks @Kevin Liu <[email protected]> for sharing
>>>>>>>>>>>>>>>>>>> the recording link of the previous discussion! I've set up 
>>>>>>>>>>>>>>>>>>> another sync for
>>>>>>>>>>>>>>>>>>> next Tuesday 09/16 at 9am PST. This time I've set it up 
>>>>>>>>>>>>>>>>>>> from my corporate
>>>>>>>>>>>>>>>>>>> email so we can get recordings and transcriptions (and I've 
>>>>>>>>>>>>>>>>>>> made sure to
>>>>>>>>>>>>>>>>>>> keep the meeting invite open so we don't have to manually 
>>>>>>>>>>>>>>>>>>> let people in).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In terms of next steps of areas which I think would be
>>>>>>>>>>>>>>>>>>> good to focus on for establishing consensus:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. How do we model the manifest entry structure so that
>>>>>>>>>>>>>>>>>>> changes to manifest DVs can be obtained easily from the 
>>>>>>>>>>>>>>>>>>> root? There are a
>>>>>>>>>>>>>>>>>>> few options here; the most promising approach is to keep
>>>>>>>>>>>>>>>>>>> an additional DV that encodes, as a diff, the positions
>>>>>>>>>>>>>>>>>>> that have been removed from a leaf manifest.
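A rough sketch of that diff-DV idea, using Python sets as a stand-in for the actual bitmap representation (names are illustrative, not from the proposal):

```python
# Hedged sketch: the root keeps a small "diff" DV holding only the
# positions newly removed from a leaf manifest since its last rewrite,
# so readers can derive what changed from the root alone. Plain sets
# stand in for whatever compressed bitmap format is chosen.

def apply_diff(base_dv: set, diff_dv: set) -> set:
    """Effective deleted positions = base DV plus the diff's positions."""
    return base_dv | diff_dv

def changed_positions(diff_dv: set) -> set:
    """Change detection: the diff alone says what this commit removed."""
    return set(diff_dv)

base = {3, 17, 42}  # positions already deleted in the leaf manifest's DV
diff = {5, 99}      # positions removed by the latest commit
assert apply_diff(base, diff) == {3, 5, 17, 42, 99}
assert changed_positions(diff) == {5, 99}
```

The appeal is that the diff stays proportional to the commit, not to the manifest, so reading the root is enough to see incremental changes.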
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2. Modeling partition transforms via expressions and
>>>>>>>>>>>>>>>>>>> establishing a unified table ID space so that we can 
>>>>>>>>>>>>>>>>>>> simplify how partition
>>>>>>>>>>>>>>>>>>> tuples may be represented via stats and also have a way in 
>>>>>>>>>>>>>>>>>>> the future to
>>>>>>>>>>>>>>>>>>> store stats on any derived column. I have a short
>>>>>>>>>>>>>>>>>>> proposal
>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1oV8dapKVzB4pZy5pKHUCj5j9i2_1p37BJSeT7hyKPpg/edit?tab=t.0>
>>>>>>>>>>>>>>>>>>>  for
>>>>>>>>>>>>>>>>>>> this that probably still needs some tightening up on the 
>>>>>>>>>>>>>>>>>>> expression
>>>>>>>>>>>>>>>>>>> modeling itself (and some prototyping) but the general idea 
>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>> establishing a unified table ID space is covered. All 
>>>>>>>>>>>>>>>>>>> feedback welcome!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Aug 25, 2025 at 1:34 PM Kevin Liu <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks Amogh. Looks like the recording for last week's
>>>>>>>>>>>>>>>>>>>> sync is available on Youtube. Here's the link,
>>>>>>>>>>>>>>>>>>>> https://www.youtube.com/watch?v=uWm-p--8oVQ
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Kevin Liu
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Aug 12, 2025 at 9:10 PM Amogh Jahagirdar <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hey folks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Just following up on this to give the community an
>>>>>>>>>>>>>>>>>>>>> update on where we're at and my proposed next steps.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I've been editing and merging the contents from our
>>>>>>>>>>>>>>>>>>>>> proposal into the proposal
>>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw>
>>>>>>>>>>>>>>>>>>>>>  from
>>>>>>>>>>>>>>>>>>>>> Russell and others. For any future comments on docs, 
>>>>>>>>>>>>>>>>>>>>> please comment on the
>>>>>>>>>>>>>>>>>>>>> linked proposal. I've also marked it on our doc in red 
>>>>>>>>>>>>>>>>>>>>> text so it's clear
>>>>>>>>>>>>>>>>>>>>> to redirect to the other proposal as a source of truth 
>>>>>>>>>>>>>>>>>>>>> for comments.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In terms of next steps,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1. An important design decision point is around inline
>>>>>>>>>>>>>>>>>>>>> manifest DVs, external manifest DVs or enabling both. I'm 
>>>>>>>>>>>>>>>>>>>>> working on
>>>>>>>>>>>>>>>>>>>>> measuring different approaches for representing the 
>>>>>>>>>>>>>>>>>>>>> compressed DV
>>>>>>>>>>>>>>>>>>>>> representation since that will inform how many entries 
>>>>>>>>>>>>>>>>>>>>> can reasonably fit
>>>>>>>>>>>>>>>>>>>>> in a small root manifest; from that we can derive 
>>>>>>>>>>>>>>>>>>>>> implications on different
>>>>>>>>>>>>>>>>>>>>> write patterns and determine the right approach for 
>>>>>>>>>>>>>>>>>>>>> storing these manifest
>>>>>>>>>>>>>>>>>>>>> DVs.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2. Another key point is around determining if/how we
>>>>>>>>>>>>>>>>>>>>> can reasonably enable V4 to represent changes in the root 
>>>>>>>>>>>>>>>>>>>>> manifest so that
>>>>>>>>>>>>>>>>>>>>> readers can effectively just infer file level changes 
>>>>>>>>>>>>>>>>>>>>> from the root.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 3. One of the aspects of the proposal is getting away
>>>>>>>>>>>>>>>>>>>>> from the partition tuple requirement in the root, which
>>>>>>>>>>>>>>>>>>>>> currently forces an association between a partition
>>>>>>>>>>>>>>>>>>>>> spec and a manifest. These aspects can be modeled
>>>>>>>>>>>>>>>>>>>>> essentially as column stats, which gives a lot of
>>>>>>>>>>>>>>>>>>>>> flexibility in how the manifest is organized. There are
>>>>>>>>>>>>>>>>>>>>> important details around
>>>>>>>>>>>>>>>>>>>>> field ID spaces here which tie into how the stats are 
>>>>>>>>>>>>>>>>>>>>> structured. What
>>>>>>>>>>>>>>>>>>>>> we're proposing here is to have a unified expression ID 
>>>>>>>>>>>>>>>>>>>>> space that could
>>>>>>>>>>>>>>>>>>>>> also benefit us for storing things like virtual columns 
>>>>>>>>>>>>>>>>>>>>> down the line. I go
>>>>>>>>>>>>>>>>>>>>> into this in the proposal but I'm working on separating 
>>>>>>>>>>>>>>>>>>>>> the appropriate
>>>>>>>>>>>>>>>>>>>>> parts so that the original proposal can mostly just focus 
>>>>>>>>>>>>>>>>>>>>> on the
>>>>>>>>>>>>>>>>>>>>> organization of the content metadata tree and not how we 
>>>>>>>>>>>>>>>>>>>>> want to solve this
>>>>>>>>>>>>>>>>>>>>> particular ID space problem.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 4. I'm planning on scheduling a recurring community
>>>>>>>>>>>>>>>>>>>>> sync starting next Tuesday at 9am PST, every 2 weeks. If 
>>>>>>>>>>>>>>>>>>>>> I get feedback
>>>>>>>>>>>>>>>>>>>>> from folks that this time will never work, I can 
>>>>>>>>>>>>>>>>>>>>> certainly adjust. For some
>>>>>>>>>>>>>>>>>>>>> reason, I don't have the ability to add to the Iceberg 
>>>>>>>>>>>>>>>>>>>>> Dev calendar, so
>>>>>>>>>>>>>>>>>>>>> I'll figure that out and update the thread when the event 
>>>>>>>>>>>>>>>>>>>>> is scheduled.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 22, 2025 at 11:47 AM Russell Spitzer <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I think this is a great way forward, starting out
>>>>>>>>>>>>>>>>>>>>>> with this much parallel development shows that we have a 
>>>>>>>>>>>>>>>>>>>>>> lot of consensus
>>>>>>>>>>>>>>>>>>>>>> already :)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 22, 2025 at 12:42 PM Amogh Jahagirdar <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hey folks, just following up on this. It looks like
>>>>>>>>>>>>>>>>>>>>>>> our proposal and the proposal that @Russell Spitzer
>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> shared are pretty
>>>>>>>>>>>>>>>>>>>>>>> aligned. I was just chatting with Russell about this, 
>>>>>>>>>>>>>>>>>>>>>>> and we think it'd be
>>>>>>>>>>>>>>>>>>>>>>> best to combine both proposals and have a singular 
>>>>>>>>>>>>>>>>>>>>>>> large effort on this. I
>>>>>>>>>>>>>>>>>>>>>>> can also set up a focused community discussion (similar 
>>>>>>>>>>>>>>>>>>>>>>> to what we're doing
>>>>>>>>>>>>>>>>>>>>>>> on the other V4 proposals) on this starting sometime 
>>>>>>>>>>>>>>>>>>>>>>> next week just to get
>>>>>>>>>>>>>>>>>>>>>>> things moving, if that works for people.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 9:48 PM Amogh Jahagirdar <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hey Russell,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks for sharing the proposal! A few of us (Ryan,
>>>>>>>>>>>>>>>>>>>>>>>> Dan, Anoop and I) have also been working on a proposal 
>>>>>>>>>>>>>>>>>>>>>>>> for an adaptive
>>>>>>>>>>>>>>>>>>>>>>>> metadata tree structure as part of enabling more 
>>>>>>>>>>>>>>>>>>>>>>>> efficient one file
>>>>>>>>>>>>>>>>>>>>>>>> commits. From a read of the summary, it's great to see 
>>>>>>>>>>>>>>>>>>>>>>>> that we're thinking
>>>>>>>>>>>>>>>>>>>>>>>> along the same lines about how to tackle this 
>>>>>>>>>>>>>>>>>>>>>>>> fundamental area!
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Here is our proposal:
>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1q2asTpq471pltOTC6AsTLQIQcgEsh0AvEhRWnCcvZn0
>>>>>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1q2asTpq471pltOTC6AsTLQIQcgEsh0AvEhRWnCcvZn0>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 8:08 PM Russell Spitzer <
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hey y'all!
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> We (Yi Fang, Steven Wu, and myself) wanted to share
>>>>>>>>>>>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>>>>>>> of the thoughts we had on how one-file commits
>>>>>>>>>>>>>>>>>>>>>>>>> could work in Iceberg. This is pretty
>>>>>>>>>>>>>>>>>>>>>>>>> much just a high level overview of the concepts we
>>>>>>>>>>>>>>>>>>>>>>>>> think we need and how Iceberg would behave.
>>>>>>>>>>>>>>>>>>>>>>>>> We haven't gone very far into the actual
>>>>>>>>>>>>>>>>>>>>>>>>> implementation and changes that would need to occur 
>>>>>>>>>>>>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>>>>>>>> SDK to make this happen.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> The high level summary is:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> - Manifest lists are out
>>>>>>>>>>>>>>>>>>>>>>>>> - Root manifests take their place
>>>>>>>>>>>>>>>>>>>>>>>>>   - A root manifest can have data manifests, delete
>>>>>>>>>>>>>>>>>>>>>>>>>     manifests, manifest delete vectors, data delete
>>>>>>>>>>>>>>>>>>>>>>>>>     vectors, and data files
>>>>>>>>>>>>>>>>>>>>>>>>>   - Manifest delete vectors allow for modifying a
>>>>>>>>>>>>>>>>>>>>>>>>>     manifest without deleting it entirely
>>>>>>>>>>>>>>>>>>>>>>>>>   - Data files let you append without writing an
>>>>>>>>>>>>>>>>>>>>>>>>>     intermediary manifest
>>>>>>>>>>>>>>>>>>>>>>>>>   - Having child data and delete manifests lets you
>>>>>>>>>>>>>>>>>>>>>>>>>     still scale
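For illustration only, the entry kinds in that summary might be modeled roughly like this (Python sketch; all names are hypothetical and not from the proposal doc):

```python
# Hypothetical sketch of a root manifest mixing the entry kinds listed
# above: child manifests, manifest delete vectors, data delete vectors,
# and directly embedded data files.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RootEntry:
    kind: str                 # "data_manifest" | "delete_manifest" |
                              # "manifest_dv" | "data_dv" | "data_file"
    path: str
    target: Optional[str] = None  # for DVs: the manifest/file they modify

root = [
    RootEntry("data_manifest", "m1.parquet"),
    # Modify m1.parquet without deleting it entirely:
    RootEntry("manifest_dv", "m1-dv.bin", target="m1.parquet"),
    # Append without writing an intermediary manifest:
    RootEntry("data_file", "new-data.parquet"),
]
assert [e.kind for e in root] == ["data_manifest", "manifest_dv", "data_file"]
```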
>>>>>>>>>>>>>>>>>>>>>>>>> Please take a look if you like,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I'm excited to see what other proposals and ideas
>>>>>>>>>>>>>>>>>>>>>>>>> are floating around the community,
>>>>>>>>>>>>>>>>>>>>>>>>> Russ
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 6:29 PM John Zhuge <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Very excited about the idea!
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 1:17 PM Anoop Johnson <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm very interested in this initiative. Micah
>>>>>>>>>>>>>>>>>>>>>>>>>>> Kornfield and I presented
>>>>>>>>>>>>>>>>>>>>>>>>>>> <https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405>
>>>>>>>>>>>>>>>>>>>>>>>>>>> on high-throughput ingestion for Iceberg tables at 
>>>>>>>>>>>>>>>>>>>>>>>>>>> the 2024 Iceberg Summit,
>>>>>>>>>>>>>>>>>>>>>>>>>>> which leveraged Google infrastructure like Colossus 
>>>>>>>>>>>>>>>>>>>>>>>>>>> for efficient appends.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> This new proposal is particularly exciting
>>>>>>>>>>>>>>>>>>>>>>>>>>> because it offers significant advancements in 
>>>>>>>>>>>>>>>>>>>>>>>>>>> commit latency and metadata
>>>>>>>>>>>>>>>>>>>>>>>>>>> storage footprint. Furthermore, a consistent 
>>>>>>>>>>>>>>>>>>>>>>>>>>> manifest structure promises to
>>>>>>>>>>>>>>>>>>>>>>>>>>> simplify the design and codebase, which is a major 
>>>>>>>>>>>>>>>>>>>>>>>>>>> benefit.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> A related idea I've been exploring is having a
>>>>>>>>>>>>>>>>>>>>>>>>>>> loose affinity between data and delete manifests. 
>>>>>>>>>>>>>>>>>>>>>>>>>>> While the current
>>>>>>>>>>>>>>>>>>>>>>>>>>> separation of data and delete manifests in Iceberg 
>>>>>>>>>>>>>>>>>>>>>>>>>>> is valuable for avoiding
>>>>>>>>>>>>>>>>>>>>>>>>>>> data file rewrites (and stats updates) when deletes 
>>>>>>>>>>>>>>>>>>>>>>>>>>> change, it does
>>>>>>>>>>>>>>>>>>>>>>>>>>> necessitate a join operation during reads. I'd be 
>>>>>>>>>>>>>>>>>>>>>>>>>>> keen to discuss
>>>>>>>>>>>>>>>>>>>>>>>>>>> approaches that could potentially reduce this 
>>>>>>>>>>>>>>>>>>>>>>>>>>> read-side cost while
>>>>>>>>>>>>>>>>>>>>>>>>>>> retaining the benefits of separate manifests.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Anoop
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 13, 2025 at 11:06 AM Jagdeep Sidhu <
>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am new to the Iceberg community but would
>>>>>>>>>>>>>>>>>>>>>>>>>>>> love to participate in these discussions to reduce 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the number of file
>>>>>>>>>>>>>>>>>>>>>>>>>>>> writes, especially for small writes/commits.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Jagdeep
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 5, 2025 at 4:02 PM Anurag
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Mantripragada <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We have been hitting all the metadata problems
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you mentioned, Ryan. I’m on-board to help however 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can to improve this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am interested in this idea and looking
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> forward to collaboration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Huang-Hsiang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 2, 2025, at 10:14 AM, namratha mk <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am interested in contributing to this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> effort.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Namratha
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 1:36 PM Amogh
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for kicking this thread off Ryan, I'm
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interested in helping out here! I've been 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working on a proposal in this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area and it would be great to collaborate with 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different folks and exchange
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas here, since I think a lot of people are 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interested in solving this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Like Russell’s recent note, I’m starting a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread to connect those of us that are 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interested in the idea of changing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg’s metadata in v4 so that in most cases 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> committing a change only
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requires writing one additional metadata file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *Idea: One-file commits*
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The current Iceberg metadata structure
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requires writing at least one manifest and a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> new manifest list to produce a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> new snapshot. The goal of this work is to add
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flexibility by allowing the manifest list layer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to store data and delete files. As a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> result, only one file write would be needed 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> before committing the new
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> snapshot. In addition, this work will also try 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to explore:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    - Avoiding small manifests that must be read
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    in parallel and later compacted (metadata
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    maintenance changes)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    - Extending metadata skipping to use
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    aggregated column ranges that are compatible
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    with geospatial data (manifest metadata)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    - Using soft deletes to avoid rewriting
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    existing manifests (metadata DVs)
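
A rough illustration of the soft-delete idea in the last bullet, using a plain bitmap over manifest entry positions. This is a concept sketch only, not Iceberg's actual delete vector format:

```java
// A "metadata DV" marks positions in an existing manifest as deleted, so a
// commit can drop entries without rewriting the manifest file itself.
import java.util.BitSet;

class MetadataDvSketch {
    public static void main(String[] args) {
        int entriesInManifest = 5;      // entries in an already-written manifest
        BitSet deleteVector = new BitSet(entriesInManifest);
        deleteVector.set(2);            // soft-delete the entry at position 2

        int live = 0;
        for (int pos = 0; pos < entriesInManifest; pos++) {
            if (!deleteVector.get(pos)) {
                live++;                 // readers skip soft-deleted positions
            }
        }
        System.out.println(live); // 4
    }
}
```

The win is that deleting one entry costs a tiny bitmap write instead of rewriting the whole manifest; the old manifest stays immutable and cacheable.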
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you’re interested in these problems,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> please reply!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>> John Zhuge
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
