Hi everyone,

I would like to start a discussion about an enhancement to the DSv2
API. This proposal allows connectors to declare which columns they need to
receive during an update, rather than receiving every column. This can
significantly improve performance and reduce write amplification. It is
particularly beneficial for connectors like Iceberg on wide tables, which
are increasingly common in AI/ML use cases.

I have included a PR with this SPIP that demonstrates the changes. It has
been tested end-to-end with the Iceberg connector and works well.

Huaxin Gao has agreed to serve as the shepherd for this SPIP.

JIRA: SPARK-56599 <https://issues.apache.org/jira/browse/SPARK-56599>
SPIP doc:
<https://docs.google.com/document/d/1-Wiw9U54ESpbLakb9Cn_mO4AviM4nrk4TF7rNhI3JZg/edit?tab=t.0#heading=h.yoitjxhaitk8>
PR: <https://github.com/apache/spark/pull/55518>

Please take a look and provide feedback!

Thanks,
Anurag Mantripragada
