Thank you, Anurag, for working on this!
Let's focus on the SPIP first.
The schema resolution flow makes sense to me, but I found the differences
between the "Merge-on-Read" and "Copy-on-Write" implementations hard to
follow at first. Could you clarify the purpose of the rules you mention and
how your implementation applies them or is affected by them? I left some
comments in the doc.

Thanks,
Peter

On Thu, Apr 23, 2026 at 8:39 PM Anurag Mantripragada <
[email protected]> wrote:

> Hi everyone,
>
> I would like to start a discussion regarding an enhancement to the DSv2
> API. This proposal allows connectors to declare which columns they need to
> receive during an update, significantly improving performance and reducing
> write amplification. This is particularly beneficial for connectors like
> Iceberg on wide tables, which are increasingly common in AI/ML use cases.
>
> I have included a PR with this SPIP that demonstrates the changes. It has
> been tested on the Iceberg connector and is working well end-to-end.
>
> Huaxian Gao has agreed to serve as the shepherd for this SPIP.
>
> SPARK-56599 <https://issues.apache.org/jira/browse/SPARK-56599>
> SPIP Doc
> <https://docs.google.com/document/d/1-Wiw9U54ESpbLakb9Cn_mO4AviM4nrk4TF7rNhI3JZg/edit?tab=t.0#heading=h.yoitjxhaitk8>
> PR <https://github.com/apache/spark/pull/55518>
>
> Please take a look and provide feedback!
>
> Thanks,
> Anurag Mantripragada
>
