nsivabalan commented on a change in pull request #4787:
URL: https://github.com/apache/hudi/pull/4787#discussion_r804136818



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##########
@@ -555,6 +555,10 @@ public void refreshTimeline() throws IOException {
       case INSERT_OVERWRITE_TABLE:
         writeStatusRDD = writeClient.insertOverwriteTable(records, instantTime).getWriteStatuses();
         break;
+      case DELETE_PARTITION:
+        List<String> partitions = records.map(record -> record.getPartitionPath()).distinct().collect();

Review comment:
       Yes, that's what I have been trying to convey for a long time :) DeltaStreamer (as the name suggests) is meant for incremental and continual ingestion of data from some source. It runs in cycles of fetch from source -> ingest into Hudi -> repeat. So, in general, I don't see how one would fetch data from a source and then turn around and trigger partition deletes.
   
   But we have a [patch](https://github.com/apache/hudi/pull/4459) for an independent tool, if you are interested; I guess that would help you. Note that it is driven by a spark-submit command as well.
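   
   For reference, if one did go down this path, the `DELETE_PARTITION` branch in the diff above would presumably continue roughly as sketched below. This is only a sketch of the idea, assuming the Spark write client exposes a `deletePartitions(List<String>, String)` call whose result carries the write statuses; it is not the final code from this patch.
   
   ```java
   case DELETE_PARTITION:
     // Derive the distinct set of partition paths carried by the incoming batch of records.
     List<String> partitions = records.map(record -> record.getPartitionPath()).distinct().collect();
     // Issue the partition-level delete and surface the resulting write statuses,
     // so the rest of the sync loop (error handling, commit) stays unchanged.
     writeStatusRDD = writeClient.deletePartitions(partitions, instantTime).getWriteStatuses();
     break;
   ```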



