Raymond Xu created HUDI-3463:
--------------------------------
Summary: Make user-defined BulkInsertPartitioner fit write path API
Key: HUDI-3463
URL: https://issues.apache.org/jira/browse/HUDI-3463
Project: Apache Hudi
Issue Type: Improvement
Reporter: Raymond Xu
The existing logic is problematic because we can't enforce that a user's
partitioner returns a JavaRDD, so the cast below can break at runtime.
{code:java}
BulkInsertPartitioner partitioner = userDefinedBulkInsertPartitioner.isPresent()
    ? userDefinedBulkInsertPartitioner.get()
    : BulkInsertInternalPartitionerFactory.get(config.getBulkInsertSortMode());
repartitionedRecords = (JavaRDD<HoodieRecord<T>>)
    partitioner.repartitionRecords(dedupedRecords, parallelism);
{code}
The factory is currently used only in Spark, so we expect a JavaRDD or HoodieData.
The API can be made explicit about this constraint.
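One possible way to make the constraint explicit is to carry the container type as a type parameter on the write path, so the compiler rejects a partitioner that returns anything else and the cast goes away. The following is only a self-contained sketch, not the actual Hudi change: TypedPartitioner, SortingPartitioner, and bulkInsert are hypothetical names, and List<String> stands in for JavaRDD<HoodieRecord<T>> so the example runs without Spark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class PartitionerSketch {

    // Hypothetical stand-in for BulkInsertPartitioner. The type parameter I
    // pins the container type for both the input and the output.
    public interface TypedPartitioner<I> {
        I repartitionRecords(I records, int parallelism);
    }

    // Hypothetical default, standing in for whatever the internal factory returns.
    public static class SortingPartitioner implements TypedPartitioner<List<String>> {
        @Override
        public List<String> repartitionRecords(List<String> records, int parallelism) {
            List<String> sorted = new ArrayList<>(records);
            Collections.sort(sorted);
            return sorted;
        }
    }

    // The write path only accepts partitioners typed to List<String> (the
    // assumed stand-in for JavaRDD), so no cast is needed on the result.
    public static List<String> bulkInsert(
            List<String> dedupedRecords,
            int parallelism,
            Optional<TypedPartitioner<List<String>>> userDefined) {
        TypedPartitioner<List<String>> partitioner =
            userDefined.orElseGet(SortingPartitioner::new);
        return partitioner.repartitionRecords(dedupedRecords, parallelism);
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        records.add("b");
        records.add("a");

        // No user-defined partitioner: falls back to the sorting default.
        System.out.println(bulkInsert(records, 2, Optional.empty()));

        // A user-defined partitioner must return List<String> to compile.
        TypedPartitioner<List<String>> reversed = (recs, p) -> {
            List<String> copy = new ArrayList<>(recs);
            Collections.sort(copy, Collections.reverseOrder());
            return copy;
        };
        System.out.println(bulkInsert(records, 2, Optional.of(reversed)));
    }
}
```

With this shape, a user-defined partitioner typed to a different container would fail at compile time at the call to bulkInsert rather than at the cast.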
--
This message was sent by Atlassian Jira
(v8.20.1#820001)