Raymond Xu created HUDI-3463:
--------------------------------

             Summary: Make user-defined BulkInsertPartitioner fit write path API
                 Key: HUDI-3463
                 URL: https://issues.apache.org/jira/browse/HUDI-3463
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Raymond Xu


This existing logic is problematic because we can't enforce that a user's partitioner returns a JavaRDD, so the cast below can break at runtime with a ClassCastException.


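{code:java}
    // Use the user-defined partitioner if one was supplied; otherwise pick a built-in one for the sort mode.
    BulkInsertPartitioner partitioner = userDefinedBulkInsertPartitioner.isPresent()
        ? userDefinedBulkInsertPartitioner.get()
        : BulkInsertInternalPartitionerFactory.get(config.getBulkInsertSortMode());
    // Unchecked cast: fails at runtime if the partitioner does not return a JavaRDD.
    repartitionedRecords = (JavaRDD<HoodieRecord<T>>) partitioner.repartitionRecords(dedupedRecords, parallelism);
{code}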
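To illustrate the failure mode without Spark or Hudi on the classpath, here is a minimal sketch that models the raw `BulkInsertPartitioner` as a hypothetical `UntypedPartitioner` whose method returns `Object`, with `java.util.List` standing in for `JavaRDD`. All names are illustrative assumptions, not Hudi APIs. A partitioner that returns some other collection type still compiles, and only the cast in the caller catches the mismatch, at runtime:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the raw partitioner seen by the write path:
// the erased return type is Object, so callers must cast.
interface UntypedPartitioner {
  Object repartitionRecords(Object records, int parallelism);
}

public class UncheckedCastDemo {

  // Well-behaved partitioner: returns the List it was given.
  static final UntypedPartitioner LIST_PARTITIONER = (records, parallelism) -> records;

  // User-defined partitioner that returns an array instead of a List.
  // Nothing in the untyped API prevents this.
  static final UntypedPartitioner ARRAY_PARTITIONER =
      (records, parallelism) -> ((List<?>) records).toArray();

  // Mirrors the cast in the write path: it compiles, but the runtime check
  // against List throws ClassCastException for any other return type.
  @SuppressWarnings("unchecked")
  static List<String> repartition(UntypedPartitioner partitioner, List<String> records) {
    return (List<String>) partitioner.repartitionRecords(records, 2);
  }

  public static void main(String[] args) {
    List<String> records = new ArrayList<>(List.of("a", "b", "c"));
    System.out.println(repartition(LIST_PARTITIONER, records)); // prints [a, b, c]
    try {
      repartition(ARRAY_PARTITIONER, records);
    } catch (ClassCastException e) {
      System.out.println("ClassCastException: partitioner did not return a List");
    }
  }
}
{code}

The compiler accepts both partitioners equally; the error only surfaces when the write path executes the cast.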


The factory is currently used only in Spark, so we expect either a JavaRDD or a HoodieData. The API could make this constraint explicit.
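One possible way to make the constraint explicit is to give the partitioner a type parameter for the collection it consumes and returns, so a mismatched return type becomes a compile-time error instead of a runtime one. The sketch below is an assumption, not the Hudi API: `TypedPartitioner`, `DEFAULT_PARTITIONER` and `SORTING_PARTITIONER` are hypothetical names, and `List<String>` stands in for `JavaRDD<HoodieRecord<T>>` or `HoodieData` so it runs without Spark:

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical typed partitioner: I is the engine-specific collection type
// (on Spark, something like JavaRDD<HoodieRecord<T>>; a List here).
// The return type must match the input type, so no cast is needed.
interface TypedPartitioner<I> {
  I repartitionRecords(I records, int outputPartitions);
}

public class TypedPartitionerDemo {

  // Hypothetical stand-in for the factory default: leaves the order unchanged.
  static final TypedPartitioner<List<String>> DEFAULT_PARTITIONER =
      (records, outputPartitions) -> records;

  // A user-defined partitioner that sorts records. Returning anything other
  // than List<String> here would be a compile-time error.
  static final TypedPartitioner<List<String>> SORTING_PARTITIONER =
      (records, outputPartitions) -> {
        List<String> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
      };

  // Same selection logic as the write path, but the type parameter
  // carries the constraint instead of a runtime cast.
  static List<String> bulkInsert(Optional<TypedPartitioner<List<String>>> userDefined,
                                 List<String> records, int parallelism) {
    TypedPartitioner<List<String>> partitioner = userDefined.orElse(DEFAULT_PARTITIONER);
    return partitioner.repartitionRecords(records, parallelism);
  }

  public static void main(String[] args) {
    List<String> records = List.of("c", "a", "b");
    System.out.println(bulkInsert(Optional.of(SORTING_PARTITIONER), records, 2)); // prints [a, b, c]
    System.out.println(bulkInsert(Optional.empty(), records, 2));                 // prints [c, a, b]
  }
}
{code}

The trade-off is that each engine binds its own `I`, so a Spark-only caller and a future engine-agnostic caller (e.g. one working on `HoodieData`) would see different type arguments rather than sharing one untyped signature.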



--
This message was sent by Atlassian Jira
(v8.20.1#820001)