[ 
https://issues.apache.org/jira/browse/SPARK-57681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anurag Mantripragada updated SPARK-57681:
-----------------------------------------
    Description: 
SPARK-36680 added the WITH (...) options clause for SELECT ([PR 
#46707|https://github.com/apache/spark/pull/46707]), and SPARK-49098 extended 
it to INSERT (PR [#47591|https://github.com/apache/spark/pull/47591]). However, 
row-level DML commands (DELETE, UPDATE, MERGE) do not yet support this syntax.

A DataSource V2 connector such as Apache Iceberg needs per-statement options on 
these commands to control behavior like copy-on-write vs merge-on-read, 
delete-granularity, target-file-size-bytes, distribution-mode, isolation-level, 
and branch selection. 

This JIRA covers UPDATE only.

Proposed syntax (mirrors the existing SELECT/INSERT precedent):

{{UPDATE table WITH (`key` = 'value') SET ... WHERE ...}}

See the discussion on PR #46707:
 - [https://github.com/apache/spark/pull/46707#issuecomment-2274055363]
 - [https://github.com/apache/spark/pull/46707#issuecomment-2274312254]

  was:
SPARK-36680 added the WITH (...) options clause for SELECT ([PR 
#46707|https://github.com/apache/spark/pull/46707]), and SPARK-49098 extended 
it to INSERT (PR [#47591|https://github.com/apache/spark/pull/47591]). However, 
row-level DML commands (DELETE, UPDATE, MERGE) do not yet support this syntax.

A DataSource V2 connector such as Apache Iceberg needs per-statement options on 
these commands to control behavior like copy-on-write vs merge-on-read, 
delete-granularity, target-file-size-bytes, distribution-mode, isolation-level, 
and branch selection. The DSv2 API already has the hooks 
(RowLevelOperationInfo.options() and LogicalWriteInfo.options()), but they are 
never populated for row-level commands because the rewrite rules hardcode 
CaseInsensitiveStringMap.empty().

This JIRA covers DELETE and UPDATE. MERGE will be handled in a separate 
follow-up.

Proposed syntax (mirrors the existing SELECT/INSERT precedent):

{{DELETE FROM table WITH (`key` = 'value') WHERE ...}}
{{UPDATE table WITH (`key` = 'value') SET ... WHERE ...}}

The options are surfaced as a single map and the connector disambiguates read 
vs write keys internally, consistent with how RowLevelOperationInfo.options() 
is designed and how Iceberg's SparkReadConf/SparkWriteConf already work.
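
A minimal sketch of the single-map idea, assuming a connector that builds separate read-side and write-side views over the same option map. The class names ReadConf and WriteConf, the default values, and the key-lowercasing helper are hypothetical illustrations. They are not Spark or Iceberg APIs; only the option key names are taken from the list above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class RowLevelOptionsSketch {

    /** Lower-cases keys so lookups ignore case (a stand-in for a case-insensitive map). */
    static Map<String, String> normalize(Map<String, String> options) {
        Map<String, String> out = new HashMap<>();
        options.forEach((k, v) -> out.put(k.toLowerCase(Locale.ROOT), v));
        return out;
    }

    /** Hypothetical read-side view: consults only keys that affect scanning. */
    static final class ReadConf {
        private final Map<String, String> options;

        ReadConf(Map<String, String> options) {
            this.options = normalize(options);
        }

        String branch() {
            // "main" is an assumed default for illustration only.
            return options.getOrDefault("branch", "main");
        }
    }

    /** Hypothetical write-side view: consults only keys that affect writing. */
    static final class WriteConf {
        private final Map<String, String> options;

        WriteConf(Map<String, String> options) {
            this.options = normalize(options);
        }

        String distributionMode() {
            // "none" is an assumed default for illustration only.
            return options.getOrDefault("distribution-mode", "none");
        }

        long targetFileSizeBytes() {
            // 512 MiB is an assumed default for illustration only.
            return Long.parseLong(options.getOrDefault("target-file-size-bytes", "536870912"));
        }
    }

    public static void main(String[] args) {
        // One map, as it would arrive from a WITH (...) clause on the statement.
        Map<String, String> options = new HashMap<>();
        options.put("branch", "audit");
        options.put("Distribution-Mode", "hash");

        ReadConf read = new ReadConf(options);
        WriteConf write = new WriteConf(options);

        System.out.println(read.branch());
        System.out.println(write.distributionMode());
        System.out.println(write.targetFileSizeBytes());
    }
}
```

Each view ignores keys it does not recognize, so the parser does not need to know which options are read-side and which are write-side.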

See the discussion on PR #46707:
 - [https://github.com/apache/spark/pull/46707#issuecomment-2274055363]
 - [https://github.com/apache/spark/pull/46707#issuecomment-2274312254]


> Support dynamic table options for UPDATE
> ----------------------------------------
>
>                 Key: SPARK-57681
>                 URL: https://issues.apache.org/jira/browse/SPARK-57681
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 5.0.0
>            Reporter: Anurag Mantripragada
>            Priority: Major
>              Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
