[ 
https://issues.apache.org/jira/browse/HUDI-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398626#comment-17398626
 ] 

ASF GitHub Bot commented on HUDI-2307:
--------------------------------------

nsivabalan commented on a change in pull request #3469:
URL: https://github.com/apache/hudi/pull/3469#discussion_r688471966



##########
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -192,7 +193,7 @@ object HoodieSparkSqlWriter {
             }
 
             // Get list of partitions to delete
-            val partitionsToDelete = genericRecords.map(gr => 
keyGenerator.getKey(gr).getPartitionPath).toJavaRDD().distinct().collect()
+            val partitionsToDelete = genericRecords.map(gr => 
gr.get(partitionColumns).toString).toJavaRDD().distinct().collect()

Review comment:
       Not sure we can generalize this. partitionColumns is a comma-separated 
list of partition column field names if ComplexKeyGenerator or 
CustomKeyGenerator is used, so a single gr.get(partitionColumns) lookup would 
not match any one field. 
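
To illustrate the concern, here is a minimal sketch that uses a plain Scala Map as a stand-in for the Avro record. The field names, the values, and the "/" joiner are assumptions made for illustration only; the real path layout is whatever the configured key generator produces.

```scala
object PartitionColumnLookup {
  // Stand-in for a record: field name -> value (hypothetical data).
  val record: Map[String, String] = Map("region" -> "us", "dt" -> "2021-08-12")

  // With a multi-field key generator, the configured value names several fields.
  val partitionColumns: String = "region,dt"

  // Looking up the whole comma-separated string as one field name finds nothing.
  val direct: Option[String] = record.get(partitionColumns)

  // Resolving each field separately works, but joining with "/" is only an
  // assumption here; the key generator decides the actual partition path.
  val joined: String =
    partitionColumns.split(",").map(_.trim).map(record).mkString("/")
}
```

The single lookup comes back empty, which is why reading the partition path straight from the record only works when exactly one partition field is configured.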

##########
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -192,7 +193,7 @@ object HoodieSparkSqlWriter {
             }
 
             // Get list of partitions to delete
-            val partitionsToDelete = genericRecords.map(gr => 
keyGenerator.getKey(gr).getPartitionPath).toJavaRDD().distinct().collect()
+            val partitionsToDelete = genericRecords.map(gr => 
gr.get(partitionColumns).toString).toJavaRDD().distinct().collect()

Review comment:
       I guess we have to add another config here:
   "hoodie.datasource.write.partitions.to.delete" // goes into 
DataSourceWriteOptions
   
   When the user sets this config with the "delete_partition" operation, we can 
parse its value, which should be a comma-separated list of partitions to 
delete. 
   If this config is not set, we go the usual route: fetch the partition 
cols from the keyGen and then use the keyGen to determine the partitions to be 
deleted from the partition path of the records. Basically apache/master as is. 
   
   Do remember that in the first case, we should not be using 
   ```
   HoodieWriterUtils.getPartitionColumns(keyGenerator)
   ```
   to fetch the partition column fields.
   
   Let me know what you think.
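
   A rough sketch of the resolution order described above, kept free of Hudi 
classes so it stands alone. The object name, the `resolve` helper, and the 
by-name fallback parameter are hypothetical; the config key is the one proposed 
in this comment and is not an existing option as far as this thread shows.

```scala
object PartitionsToDelete {
  // Config key proposed in this review comment (an assumption, not a confirmed option).
  val PartitionsToDeleteKey = "hoodie.datasource.write.partitions.to.delete"

  /**
   * Returns the partitions to delete.
   *
   * @param params           writer options as key/value pairs
   * @param fromKeyGenerator fallback that derives partition paths from the
   *                         records via the key generator; passed by name so
   *                         it is only evaluated when the config is absent
   */
  def resolve(params: Map[String, String],
              fromKeyGenerator: => Seq[String]): Seq[String] =
    params.get(PartitionsToDeleteKey).map(_.trim).filter(_.nonEmpty) match {
      // Explicit list: split on commas, trim, drop empties and duplicates.
      case Some(value) =>
        value.split(",").map(_.trim).filter(_.nonEmpty).distinct.toSeq
      // No config: keep the existing behavior and ask the key generator.
      case None =>
        fromKeyGenerator.distinct
    }
}
```

   Because the fallback is passed by name, the key generator (and therefore the 
record key) is never touched when the explicit list is supplied, which is the 
point of the first case.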




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



> When using delete_partition with ds should not rely on the primary key
> ----------------------------------------------------------------------
>
>                 Key: HUDI-2307
>                 URL: https://issues.apache.org/jira/browse/HUDI-2307
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: liujinhui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
>  
> {code:java}
> Caused by: org.apache.hudi.exception.HoodieKeyException: recordKey value: "null" for field: "uuid" cannot be null or empty.
>   at org.apache.hudi.keygen.KeyGenUtils.getRecordKey(KeyGenUtils.java:141)
>   at org.apache.hudi.keygen.SimpleAvroKeyGenerator.getRecordKey(SimpleAvroKeyGenerator.java:50)
>   at org.apache.hudi.keygen.SimpleKeyGenerator.getRecordKey(SimpleKeyGenerator.java:58)
>   at org.apache.hudi.keygen.BaseKeyGenerator.getKey(BaseKeyGenerator.java:62)
>   at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$5.apply(HoodieSparkSqlWriter.scala:195)
>   at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$5.apply(HoodieSparkSqlWriter.scala:195)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
> {code}
>  
> When using delete_partition, we should not rely on the primary key



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
