[ https://issues.apache.org/jira/browse/SPARK-56171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yang Jie updated SPARK-56171:
-----------------------------
Summary: V2 file write with partition, dynamic overwrite, and catalog table
support (was: V2 file write with partition and dynamic overwrite support)
> V2 file write with partition, dynamic overwrite, and catalog table support
> --------------------------------------------------------------------------
>
> Key: SPARK-56171
> URL: https://issues.apache.org/jira/browse/SPARK-56171
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Yang Jie
> Priority: Major
> Labels: pull-request-available
>
> Enable `FileWrite` to support partition columns, dynamic partition overwrite,
> and truncate (full overwrite) behind the feature flag
> `spark.sql.sources.v2.file.write.enabled` (default: false). Add a
> `partitionSchema` field to `FileWrite`, separation of partition columns in
> `WriteJobDescription`, `RequiresDistributionAndOrdering` for partition
> sorting, creation of output paths that do not yet exist, truncate logic for
> overwrite mode, and dynamic partition overwrite via `FileCommitProtocol`.
> Change `lazy val description` to `val` so that `prepareWrite` runs before
> `setupJob`. Add `checkNoCollationsInMapKeys` validation and skip the
> `supportsDataType` check for partition columns in
> `FileWrite.validateInputs`. Use the same `jobId` for `FileCommitProtocol` and
> `WriteJobDescription.uuid`. In `DataFrameWriter`, add dynamic partition
> overwrite routing and a V1 fallback for the ErrorIfExists and Ignore modes.
> Update the Write and Table classes of all formats (Parquet, ORC, CSV, JSON,
> Text, Avro).
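
For readers unfamiliar with the two overwrite behaviours named in the description, the following is a minimal sketch in plain Python. It is not Spark code and does not reflect the actual `FileWrite` or `FileCommitProtocol` implementation. The names `Table`, `split_row`, and `overwrite` are hypothetical, the table is modelled as an in-memory dict keyed by partition path, and the `col=value` path layout is a simplified assumption (no escaping or null handling).

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
# Hypothetical stand-in for a table directory:
# partition path (e.g. "year=2024") -> data rows without partition columns.
Table = Dict[str, List[Row]]


def split_row(row: Row, partition_cols: List[str]) -> Tuple[str, Row]:
    """Separate partition columns from data columns and build a col=value path."""
    path = "/".join(f"{c}={row[c]}" for c in partition_cols)
    data = {k: v for k, v in row.items() if k not in partition_cols}
    return path, data


def overwrite(table: Table, rows: List[Row], partition_cols: List[str],
              dynamic: bool) -> Table:
    """Return the table contents after an overwrite, leaving the input untouched.

    dynamic=False models truncate (full overwrite): all existing partitions go.
    dynamic=True models dynamic partition overwrite: only partitions that
    receive new rows are replaced; the others are kept.
    """
    # Sort by partition path so that rows of one partition are contiguous,
    # loosely analogous to the partition sorting mentioned in the description.
    staged: Table = {}
    pairs = sorted((split_row(r, partition_cols) for r in rows),
                   key=lambda pair: pair[0])
    for path, data in pairs:
        staged.setdefault(path, []).append(data)

    # Start from a copy of the existing partitions (dynamic) or from nothing
    # (truncate), then replace every partition that has staged rows.
    result: Table = {p: list(rs) for p, rs in table.items()} if dynamic else {}
    result.update(staged)
    return result


existing: Table = {"year=2023": [{"id": 1}], "year=2024": [{"id": 2}]}
incoming = [{"id": 3, "year": 2024}, {"id": 4, "year": 2025}]

dynamic_result = overwrite(existing, incoming, ["year"], dynamic=True)
truncate_result = overwrite(existing, incoming, ["year"], dynamic=False)
```

In this model, `dynamic_result` still contains the untouched `year=2023` partition, while `truncate_result` contains only the partitions present in `incoming`.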