Hi everyone,

As a quick update on this PR: the current version has now reached full
feature parity with the existing code in the main branch, but with the
added benefit of supporting Delta reader version 3 and writer version 7.
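
For context on what the newer versions involve: starting with reader version 3 and writer version 7, the Delta `protocol` action lists the required table features by name (`readerFeatures` / `writerFeatures`) instead of implying them from the version numbers. A converter therefore has to gate on those names. Below is a minimal sketch of such a pre-flight check. It assumes a read-only converter that only needs to look at the reader side; a real converter may also want to inspect writer features (e.g. generated columns). `DeltaProtocol` and `ProtocolGate` are hypothetical names for illustration, not classes from the PR or from Delta Kernel. The column-mapping rejection mirrors the guard discussed in the thread below.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified view of a Delta table's `protocol` action plus the
// `metaData.configuration` map. This is NOT a Delta Kernel type.
record DeltaProtocol(
    int minReaderVersion,
    Set<String> readerFeatures,          // only populated when minReaderVersion == 3
    Map<String, String> configuration) {}

// Hypothetical pre-flight check, run before any Iceberg metadata is written.
final class ProtocolGate {
  private final Set<String> supportedReaderFeatures;

  ProtocolGate(Set<String> supportedReaderFeatures) {
    this.supportedReaderFeatures = Set.copyOf(supportedReaderFeatures);
  }

  /** Throws UnsupportedOperationException if the table cannot be converted safely. */
  void check(DeltaProtocol protocol) {
    int readerVersion = protocol.minReaderVersion();
    if (readerVersion > 3) {
      throw new UnsupportedOperationException("Unsupported Delta reader version: " + readerVersion);
    }
    // From reader version 3 on, features are listed by name; reject any that
    // are not explicitly allow-listed instead of silently ignoring them.
    if (readerVersion == 3) {
      for (String feature : protocol.readerFeatures()) {
        if (!supportedReaderFeatures.contains(feature)) {
          throw new UnsupportedOperationException("Unsupported Delta reader feature: " + feature);
        }
      }
    }
    // Column mapping also exists at reader version 2 (before table features),
    // so check the table property as well.
    String mode = protocol.configuration().getOrDefault("delta.columnMapping.mode", "none");
    if (!"none".equals(mode)) {
      throw new UnsupportedOperationException(
          "Column mapping mode '" + mode + "' is not supported yet");
    }
  }
}
```

With a small allow-list, a reader-v3 table declaring e.g. `deletionVectors` is rejected up front. Features can be added to the allow-list as support lands in follow-up PRs.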

Since we've hit this baseline milestone, could I please get another round
of reviews on the current state of the code?

Once reviewed, my immediate next steps for the PR will be:

   1. Refactoring to remove the dependency on internal Delta Kernel classes
   (addressing Anoop's feedback).
   2. Adding support for converting Deletion Vectors (DVs).
   3. Implementing incremental conversion.
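
Regarding step 1, Anoop's suggestion (in the thread below) is to read scan-file rows through the public `Row` API, using ordinals defined by the scan file schema, instead of the internal `AddFile` class. Here is a minimal sketch of that pattern: ordinals are resolved by field name once, then every row is read positionally. It assumes the row passed in is already the nested `add` entry. `SimpleRow`, `SimpleSchema`, `DataFileInfo` and `AddFileReader` are hypothetical stand-ins for illustration only. They are not the Delta Kernel API or the PR's code.

```java
import java.util.List;

// Hypothetical stand-in for a row accessor in the "read by ordinal" style.
// This is NOT the Delta Kernel `Row` interface.
interface SimpleRow {
  boolean isNullAt(int ordinal);
  String getString(int ordinal);
  long getLong(int ordinal);
}

// Hypothetical view of the scan file schema: an ordered list of field names.
record SimpleSchema(List<String> fieldNames) {
  int indexOf(String name) {
    int idx = fieldNames.indexOf(name);
    if (idx < 0) {
      throw new IllegalArgumentException("Field not found in scan file schema: " + name);
    }
    return idx;
  }
}

// Only the values the converter needs from an `add` entry.
record DataFileInfo(String path, long size, long modificationTime) {}

final class AddFileReader {
  private final int pathOrdinal;
  private final int sizeOrdinal;
  private final int modTimeOrdinal;

  // Resolve ordinals once from the schema instead of hard-coding positions.
  AddFileReader(SimpleSchema addSchema) {
    this.pathOrdinal = addSchema.indexOf("path");
    this.sizeOrdinal = addSchema.indexOf("size");
    this.modTimeOrdinal = addSchema.indexOf("modificationTime");
  }

  DataFileInfo read(SimpleRow row) {
    if (row.isNullAt(pathOrdinal)) {
      throw new IllegalStateException("add entry without a path");
    }
    return new DataFileInfo(
        row.getString(pathOrdinal), row.getLong(sizeOrdinal), row.getLong(modTimeOrdinal));
  }
}
```

Because ordinals are resolved by name, a reordered schema still reads the right column, and a missing field fails loudly at construction time rather than silently reading the wrong value.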

Thanks in advance for your time and feedback!

Best, Vladislav

On Mon, Mar 2, 2026 at 9:46 PM Vladislav Sidorovich <[email protected]>
wrote:

> Nice to hear, so we are working on it in parallel.
>
> On Mon, Mar 2, 2026 at 8:33 PM Anoop Johnson <[email protected]> wrote:
>
>> > A major challenge with UniForm right now is its limitation regarding
>> Deletion Vectors (DVs). Support for this is critical for many users
>> migrating their workloads.
>>
>> UniForm v1/v2 blocked DVs because Iceberg v1/v2 used a different
>> positional delete representation than Delta Lake. That changed in
>> Iceberg v3, so the upcoming version of UniForm (IcebergCompatV3
>> <https://github.com/delta-io/delta/blob/master/protocol_rfcs/iceberg-compat-v3.md>)
>> will lift this restriction.
>>
>> On Mon, Mar 2, 2026 at 10:48 AM Vladislav Sidorovich via dev <
>> [email protected]> wrote:
>>
>>> Hi Anoop,
>>>
>>> Thanks for the feedback and for raising these important points.
>>>
>>> Regarding the technical feedback on minimizing the use of internal Delta
>>> Kernel classes: I completely agree. Relying on internal APIs like
>>> AddFile introduces an unnecessary maintenance burden. My plan is to
>>> refactor the code (e.g., transitioning to the Row API) once we have
>>> alignment on the core features this PR will support. I will also put
>>> together a list of the gaps I've encountered in the Kernel API (such as
>>> change detection) so we can file those upstream, as you suggested.
>>>
>>> As a quick update on the PR's progress: I’ve recently added support for
>>> UPDATE and DELETE operations, along with expanded test coverage. At
>>> this stage, the PR is roughly at feature parity with the existing tool
>>> (excluding VACUUM) but supports newer Delta versions. As outlined in
>>> the PR description, the next features on the roadmap are:
>>>
>>>    1. VACUUM support
>>>    2. Deletion Vectors (DVs) support
>>>    3. Incremental conversion
>>>
>>>
>>> *Bigger question:* as for whether we should consider sunsetting the
>>> Delta Lake module in favor of Delta UniForm, my experience and
>>> observations suggest there are still compelling reasons to maintain a
>>> native Iceberg-driven conversion tool.
>>>
>>>    - *Feature Limitations:* A major challenge with UniForm right now is
>>>    its limitation regarding Deletion Vectors (DVs). Support for this is
>>>    critical for many users migrating their workloads.
>>>    - *User Preference:* I've observed that teams looking to migrate to
>>>    Iceberg strongly prefer "native" tooling maintained by the technology
>>>    they are migrating *to*, rather than relying on the ecosystem they are
>>>    trying to move *from*. Having an in-house Iceberg tool gives the
>>>    community more control over the migration experience.
>>>
>>> Let me know your thoughts on the above, particularly regarding the
>>> long-term need for a native migration path.
>>>
>>> Best, Vladislav
>>>
>>> On Thu, Feb 26, 2026 at 8:07 PM Anoop Johnson <[email protected]> wrote:
>>>
>>>> Vladislav,
>>>>
>>>> We should minimize the use of internal Delta Kernel classes as much
>>>> as possible. There are no guarantees about the stability of the internal
>>>> APIs, and they would be a maintenance burden on the Iceberg project. For
>>>> instance, instead of using the internal `AddFile` class, use the `Row` API
>>>> with ordinals defined by the scan file schema. I do recognize that there
>>>> are some gaps in the Kernel API (you mentioned change detection): do you
>>>> have a list? It would be worth filing an issue against Delta Kernel; some
>>>> of these, such as providing file changes, might already be on their
>>>> roadmap.
>>>>
>>>> *I have a higher-level question for the community:* should we consider
>>>> sunsetting the Delta Lake module? Delta Lake's UniForm
>>>> <https://docs.delta.io/delta-uniform/> can already generate Iceberg
>>>> metadata: it is incremental and already handles several features, such
>>>> as column mapping. Do we need to duplicate all of that work? Obviously, it
>>>> is better to have less code and fewer components to maintain.
>>>>
>>>> Best,
>>>> Anoop
>>>>
>>>> Disclosure: I also work on Delta as part of my day job.
>>>>
>>>>
>>>> On Wed, Feb 25, 2026 at 1:44 PM Vladislav Sidorovich <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Anoop,
>>>>>
>>>>> Thanks a lot for the initial review.
>>>>>
>>>>> Data correctness guards:
>>>>> 1. I will add support for the Remove action soon; work on the PR is in
>>>>> progress.
>>>>> 2. Sure, let's reject tables with the `column mapping` feature for now,
>>>>> for safety. Later I will try to add support for this feature as well.
>>>>>
>>>>>
>>>>> Yes, the PR depends on the *internal* API of delta-kernel. I do not
>>>>> see a simple way to replace it with the public API. As an option, I can
>>>>> replace these classes with our own in-house classes based on the Delta
>>>>> protocol spec. That would be safe at runtime, but it would be additional
>>>>> code that we need to maintain.
>>>>>
>>>>> What do you think about me continuing to work with the *internal* Delta
>>>>> API for now, and refactoring this logic before merging the PR once we
>>>>> agree on a solution?
>>>>>
>>>>>
>>>>> On Tue, Feb 24, 2026 at 5:29 AM Anoop Johnson <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi, Vladislav -
>>>>>>
>>>>>> I've done an initial review of the PR
>>>>>> <https://github.com/apache/iceberg/pull/15407>. Moving to the Delta
>>>>>> kernel is the right direction, so thank you for doing this. Here's a
>>>>>> summary of my initial feedback (full details are in the PR):
>>>>>>
>>>>>> Data correctness guards:
>>>>>> 1. If we encounter `Remove` actions, the conversion should fail fast
>>>>>> rather than silently skip them. Otherwise, tables with DML will produce
>>>>>> duplicate rows in the Iceberg table.
>>>>>> 2. Tables with column mapping enabled will produce silent data
>>>>>> corruption because the Parquet files will have physical column names that
>>>>>> don't match the logical schema. We should validate this and reject such
>>>>>> tables until column mapping support is added (which can be done as a
>>>>>> separate PR).
>>>>>>
>>>>>> The PR relies heavily on io.delta.kernel.internal.* classes, which
>>>>>> can be fragile. We should consider replacing them with the public kernel
>>>>>> APIs.
>>>>>>
>>>>>> Best,
>>>>>> Anoop
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 23, 2026 at 12:29 AM Vladislav Sidorovich via dev <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Iceberg Community,
>>>>>>>
>>>>>>> I recently opened a PR to update the existing Delta Lake to Iceberg
>>>>>>> migration functionality to support recent Delta Lake table versions
>>>>>>> (reader: 3, writer: 7). I would appreciate it if anyone could take a look
>>>>>>> and share thoughts on the architecture and initial implementation.
>>>>>>>
>>>>>>> *PR Link:* https://github.com/apache/iceberg/pull/15407
>>>>>>>
>>>>>>> The main motivation for sharing this now is to get some early
>>>>>>> feedback from the community on the approach and the initial
>>>>>>> implementation.
>>>>>>>
>>>>>>> To make reviewing easier, this PR doesn't remove or overwrite the
>>>>>>> old logic. Instead, I’ve added a new interface implementation utilizing
>>>>>>> the *Delta Lake Kernel library* (replacing the deprecated Delta Lake
>>>>>>> standalone library). This side-by-side approach allows for easier
>>>>>>> comparison and shouldn't introduce any issues with current usage
>>>>>>> scenarios.
>>>>>>>
>>>>>>>
>>>>>>> *Current PR Scope:*
>>>>>>>
>>>>>>>    - Maintains support for the existing migration interface.
>>>>>>>    - Migrates the underlying engine to the Delta Lake Kernel
>>>>>>>    library.
>>>>>>>    - Contains the basic migration flow.
>>>>>>>    - Successfully converts all data types, table schemas, and
>>>>>>>    partition specs.
>>>>>>>    - Currently supports INSERT operations only (Delta Lake Add
>>>>>>>    action).
>>>>>>>    - *Testing:* Includes unit tests for all supported data types
>>>>>>>    (including complex arrays and structures) and integration tests for
>>>>>>>    insert-only scenarios using Spark 3.5.
>>>>>>>
>>>>>>> *Future Steps (Next PRs):*
>>>>>>>
>>>>>>> Once we align on this foundation, I plan to follow up with:
>>>>>>>
>>>>>>>    - Adding support for UPDATE and DELETE (Delta Lake Remove
>>>>>>>    action).
>>>>>>>    - Supporting all remaining Delta Lake actions.
>>>>>>>    - Handling edge cases for partitions and generated columns.
>>>>>>>    - Adding Schema Evolution support.
>>>>>>>    - Adding Deletion Vector (DV) support.
>>>>>>>    - Enabling Incremental Conversion (from/to specific Delta
>>>>>>>    versions).
>>>>>>>    - Adding all tables from the Delta golden tables for robust
>>>>>>>    testing. *(Note: The current integration test will be updated
>>>>>>>    for newer Delta Lake versions once the old standalone solution is
>>>>>>>    fully deprecated/deleted).*
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Vladislav Sidorovich
>>>>>>>
>>>>>>> Feedback: *go/feedback-for-vladislav
>>>>>>> <https://goto.google.com/feedback-for-vladislav> *
>>>>>>> [image: Google Logo]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>
>
