yihua commented on a change in pull request #4787:
URL: https://github.com/apache/hudi/pull/4787#discussion_r806215809



##########
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##########
@@ -1944,6 +1990,18 @@ public Dataset apply(JavaSparkContext jsc, SparkSession 
sparkSession, Dataset<Ro
     }
   }
 
+  public static class TestSpecificPartitionTransformer implements Transformer {
+
+    @Override
+    public Dataset<Row> apply(JavaSparkContext jsc, SparkSession sparkSession, 
Dataset<Row> rowDataset,
+                              TypedProperties properties) {
+      //List<Row> row0 = rowDataset.collectAsList();

Review comment:
       nit: same here.

##########
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##########
@@ -258,6 +258,20 @@ static void assertRecordCount(long expected, String 
tablePath, SQLContext sqlCon
       assertEquals(expected, recordCount);
     }
 
+    static Map<String, Long> getPartitionRecordCount(String basePath, 
SQLContext sqlContext) {
+      sqlContext.clearCache();
+      //List<Row> rows0 = 
sqlContext.read().format("org.apache.hudi").load(basePath).collectAsList();

Review comment:
       nit: remove unused code?

##########
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##########
@@ -555,6 +555,10 @@ public void refreshTimeline() throws IOException {
       case INSERT_OVERWRITE_TABLE:
         writeStatusRDD = writeClient.insertOverwriteTable(records, 
instantTime).getWriteStatuses();
         break;
+      case DELETE_PARTITION:
+        List<String> partitions = records.map(record -> 
record.getPartitionPath()).distinct().collect();

Review comment:
       The logic here takes the partition list from the input records and does 
nothing to the records.  This workflow looks strange to me.  Shouldn't the 
input be the partition list directly?  It's counter-intuitive for the users to 
provide a list of records for deleting corresponding partitions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to