Hi everyone,

I would like to start a discussion about an enhancement to the DSv2 API. This proposal allows connectors to declare which columns they need to receive during an update. This can significantly improve performance and reduce write amplification. It is particularly beneficial for connectors like Iceberg on wide tables, which are increasingly common in AI/ML use cases.
I have included a PR with this SPIP that demonstrates the changes. It has been tested with the Iceberg connector and works well end-to-end. Huaxin Gao has agreed to serve as the shepherd for this SPIP.

SPARK-56599: <https://issues.apache.org/jira/browse/SPARK-56599>
SPIP Doc: <https://docs.google.com/document/d/1-Wiw9U54ESpbLakb9Cn_mO4AviM4nrk4TF7rNhI3JZg/edit?tab=t.0#heading=h.yoitjxhaitk8>
PR: <https://github.com/apache/spark/pull/55518>

Please take a look and provide feedback!

Thanks,
Anurag Mantripragada
