Alexey Kudinkin created HUDI-4772:
-------------------------------------
Summary: Revisit dropped Partition Columns handling
Key: HUDI-4772
URL: https://issues.apache.org/jira/browse/HUDI-4772
Project: Apache Hudi
Issue Type: Bug
Components: writer-core
Affects Versions: 0.13.0
Reporter: Alexey Kudinkin
Assignee: Alexey Kudinkin
Currently, dropping partition columns (controlled by
"hoodie.datasource.write.drop.partition.columns") is handled in a piecemeal
fashion, which unfortunately may lead to very subtle and hard-to-troubleshoot
issues when the option is used.
For example, in HoodieSparkSqlWriter this currently affects what is
persisted as the writer's schema: when partition columns are dropped from the
data files, we persist the "reduced" schema as the one used by the Writer.
This is invalid, since the Writer was using the full schema; the partition
columns simply weren't persisted in the data files (i.e. they were dropped,
since they're already encoded into the partition path).
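
To illustrate the distinction, here is a minimal sketch in plain Python. It is
not Hudi code: the schema representation (a list of name/type pairs), the
helper data_file_schema, and the sample columns are all hypothetical. Only the
option key is taken from the description above. It shows that the writer's
schema should stay the full schema, while only the data-file schema loses the
partition columns.

```python
# Illustrative model only; none of this is Hudi's actual API.
# A schema is modeled as an ordered list of (column name, type) pairs.
DROP_PARTITION_COLUMNS_KEY = "hoodie.datasource.write.drop.partition.columns"


def data_file_schema(writer_schema, partition_columns, options):
    """Return the schema of the columns physically stored in data files.

    Hypothetical helper: when the drop option is enabled, partition columns
    are removed, since their values are encoded in the partition path.
    """
    drop = str(options.get(DROP_PARTITION_COLUMNS_KEY, "false")).lower() == "true"
    if not drop:
        return list(writer_schema)
    dropped = set(partition_columns)
    return [(name, typ) for name, typ in writer_schema if name not in dropped]


# Hypothetical table with "region" as its partition column.
writer_schema = [("id", "string"), ("ts", "long"), ("region", "string")]
options = {DROP_PARTITION_COLUMNS_KEY: "true"}

file_schema = data_file_schema(writer_schema, ["region"], options)

# The writer's schema is unchanged (full); only the file schema is reduced.
print("writer schema:   ", writer_schema)
print("data file schema:", file_schema)
```

In this model, persisting file_schema as the writer's schema would be the
problem described above: the two schemas differ, and only writer_schema
reflects what the Writer actually used.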
--
This message was sent by Atlassian Jira
(v8.20.10#820010)