Thank you, Anurag, for working on this! Let's focus on the SPIP first. The schema resolution flow makes sense to me, but I found the differences between the "Merge-on-Read" and "Copy-on-Write" implementations a bit challenging to grasp at first. Could you clarify the purpose of the rules you mention and how your implementation applies or affects them? I left some comments in the doc.
Thanks,
Peter

On Thu, Apr 23, 2026 at 8:39 PM Anurag Mantripragada <[email protected]> wrote:

> Hi everyone,
>
> I would like to start a discussion regarding an enhancement to the DSv2
> API. This proposal allows connectors to declare which columns they need to
> receive during an update, significantly improving performance and reducing
> write amplification. This is particularly beneficial for connectors like
> Iceberg on wide tables, which are increasingly common in AI/ML use cases.
>
> I have included a PR with this SPIP that demonstrates the changes. It has
> been tested on the Iceberg connector and is working well end-to-end.
>
> Huaxian Gao has agreed to serve as the shepherd for this SPIP.
>
> SPARK-56599 <https://issues.apache.org/jira/browse/SPARK-56599>
> SPIP Doc
> <https://docs.google.com/document/d/1-Wiw9U54ESpbLakb9Cn_mO4AviM4nrk4TF7rNhI3JZg/edit?tab=t.0#heading=h.yoitjxhaitk8>
> PR <https://github.com/apache/spark/pull/55518>
>
> Please take a look and provide feedback!
>
> Thanks,
> Anurag Mantripragada
>
