navbalaraman commented on issue #6101:
URL: https://github.com/apache/hudi/issues/6101#issuecomment-1216738397
@nsivabalan Thanks for your attention to this issue. Here is the current
status:
- Managed to get the deletes working.
- Tried to delete using the partition column name "_test_partition", but
the Parquet files did not contain that column; they only had
"_hoodie_partition_path": "_test_partition=default".
- So while reading the S3 data into the DataFrame to be deleted, we had to add:
.withColumn("_test_partition", col("_hoodie_partition_path"))
- We had explicitly configured Hudi not to write the additional
"_test_partition" partition column, because when we ran a crawler against
the S3 data, querying the table failed with a "duplicate column" error.
- We also had to set the write operation to delete, as you indicated above:
.options(hudioptions).option(DataSourceWriteOptions.OPERATION.key(),
DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL)
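For completeness, here is a minimal sketch of the delete flow we ended up with. The table path, the id predicate, and the hudioptions map are placeholders standing in for our actual values:

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col

val basePath = "s3://our-bucket/our-table" // placeholder table path

// Read back the records to delete; the stored files only carry
// _hoodie_partition_path, so re-derive the partition column from it.
val toDelete = spark.read.format("hudi")
  .load(basePath)
  .where(col("id").isin(idsToDelete: _*)) // placeholder predicate selecting rows to delete
  .withColumn("_test_partition", col("_hoodie_partition_path"))

toDelete.write.format("hudi")
  .options(hudioptions) // our existing table/write options
  .option(DataSourceWriteOptions.OPERATION.key(),
          DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL)
  .mode(SaveMode.Append)
  .save(basePath)
```

Note that _hoodie_partition_path holds the encoded value ("_test_partition=default"), so the derived column may need trimming if the raw partition value is required.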
Questions:
- Currently the data gets deleted, but the partitions continue to exist. Will
they also be removed if I include the option below?
.option(DataSourceWriteOptions.OPERATION.key(),
DataSourceWriteOptions.DELETE_PARTITION_OPERATION_OPT_VAL)
- Also, any time a schema change happens we get the error below. Is there
anything we can set so that Hudi handles schema changes?
org.apache.hudi.exception.HoodieUpsertException: Failed to delete for commit
time
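To be concrete about the first question, the delete_partition call I have in mind is roughly the following. The partition path value and the hoodie.datasource.write.partitions.to.delete config are my assumptions from the docs, so please correct me if naming the partitions that way is wrong for our version:

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.spark.sql.SaveMode

df.write.format("hudi")
  .options(hudioptions) // our existing write options (placeholder)
  .option(DataSourceWriteOptions.OPERATION.key(),
          DataSourceWriteOptions.DELETE_PARTITION_OPERATION_OPT_VAL)
  // Assumption: the partitions to drop are listed explicitly via this config.
  .option("hoodie.datasource.write.partitions.to.delete", "_test_partition=default")
  .mode(SaveMode.Append)
  .save(basePath) // placeholder table path
```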
Appreciate your inputs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]