navbalaraman commented on issue #6101:
URL: https://github.com/apache/hudi/issues/6101#issuecomment-1216738397
@nsivabalan Thanks for your attention to this issue. Here is the current
status:
- Managed to get the deletes working.
- Tried to delete using the partition column name "_test_partition", but
the Parquet files did not contain that column; they only had
"_hoodie_partition_path": "_test_partition=default".
- So while reading the S3 data into the DataFrame to be deleted, we had to add:
.withColumn("_test_partition", col("_hoodie_partition_path"))
- We had explicitly configured Hudi not to write the additional
"_test_partition" partition column, because when we ran a crawler against
the S3 data, querying the table failed with a "duplicate column" error.
- We also had to set the write operation to delete, as you indicated above:
.options(hudioptions).option(DataSourceWriteOptions.OPERATION.key(),
DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL)
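For completeness, here is a minimal sketch of the delete flow we ended up with. The table path, the id predicate, and the hudioptions map are placeholders standing in for our actual values:

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col

val basePath = "s3://our-bucket/our-table" // placeholder table path

// Read back the records to delete; the stored files only carry
// _hoodie_partition_path, so re-derive the partition column from it.
val toDelete = spark.read.format("hudi")
  .load(basePath)
  .where(col("id").isin(idsToDelete: _*)) // placeholder predicate selecting rows to delete
  .withColumn("_test_partition", col("_hoodie_partition_path"))

toDelete.write.format("hudi")
  .options(hudioptions) // our existing table/write options
  .option(DataSourceWriteOptions.OPERATION.key(),
          DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL)
  .mode(SaveMode.Append)
  .save(basePath)
```

Note that _hoodie_partition_path holds the encoded value ("_test_partition=default"), so the derived column may need trimming if the raw partition value is required.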
Questions:
- Currently the data gets deleted, but the partitions continue to exist. Will
they also be removed if I include the option below?
.option(DataSourceWriteOptions.OPERATION.key(),
DataSourceWriteOptions.DELETE_PARTITION_OPERATION_OPT_VAL)
- Also, any time a schema change happens we get the error below. Is there
anything we can set so that Hudi handles schema changes?
org.apache.hudi.exception.HoodieUpsertException: Failed to delete for commit
time
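To be concrete about the first question, the delete_partition call I have in mind is roughly the following. The partition path value and the hoodie.datasource.write.partitions.to.delete config are my assumptions from the docs, so please correct me if naming the partitions that way is wrong for our version:

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.spark.sql.SaveMode

df.write.format("hudi")
  .options(hudioptions) // our existing write options (placeholder)
  .option(DataSourceWriteOptions.OPERATION.key(),
          DataSourceWriteOptions.DELETE_PARTITION_OPERATION_OPT_VAL)
  // Assumption: the partitions to drop are listed explicitly via this config.
  .option("hoodie.datasource.write.partitions.to.delete", "_test_partition=default")
  .mode(SaveMode.Append)
  .save(basePath) // placeholder table path
```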
Appreciate your inputs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]