codope commented on code in PR #11581:
URL: https://github.com/apache/hudi/pull/11581#discussion_r1668779599
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkInsertOverwritePartitioner.java:
##########
@@ -38,8 +39,8 @@ public class SparkInsertOverwritePartitioner extends
UpsertPartitioner {
private static final Logger LOG =
LoggerFactory.getLogger(SparkInsertOverwritePartitioner.class);
public SparkInsertOverwritePartitioner(WorkloadProfile profile,
HoodieEngineContext context, HoodieTable table,
- HoodieWriteConfig config) {
- super(profile, context, table, config);
+ HoodieWriteConfig config,
WriteOperationType operationType) {
Review Comment:
Do we need this additional argument when we already have `HoodieTable` and
`HoodieWriteConfig`?
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java:
##########
@@ -86,16 +87,21 @@ public class UpsertPartitioner<T> extends
SparkHoodiePartitioner<T> {
private HashMap<Integer, BucketInfo> bucketInfoMap;
protected final HoodieWriteConfig config;
+ private final WriteOperationType operationType;
public UpsertPartitioner(WorkloadProfile profile, HoodieEngineContext
context, HoodieTable table,
- HoodieWriteConfig config) {
+ HoodieWriteConfig config, WriteOperationType
operationType) {
super(profile, table);
updateLocationToBucket = new HashMap<>();
partitionPathToInsertBucketInfos = new HashMap<>();
bucketInfoMap = new HashMap<>();
this.config = config;
+ this.operationType = operationType;
assignUpdates(profile);
- assignInserts(profile, context);
+ long totalInserts =
profile.getInputPartitionPathStatMap().values().stream().mapToLong(stat ->
stat.getNumInserts()).sum();
+ if (!WriteOperationType.isPreppedWriteOperation(operationType) ||
totalInserts > 0) { // skip if its prepped write operation. or if totalInserts
= 0.
+ assignInserts(profile, context);
Review Comment:
This is probably because the record size estimation is done only when
assigning inserts, i.e. inside `assignInserts`.
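
To make the guard in the hunk above easier to reason about, here is a minimal
standalone sketch of the condition. `OpType` and `isPrepped` are hypothetical
stand-ins for `WriteOperationType` and `isPreppedWriteOperation`, and the list
of per-partition insert counts stands in for the values summed from
`getInputPartitionPathStatMap()`. It is not the PR's code, only an
illustration of when `assignInserts` would be skipped.

```java
import java.util.List;

public class InsertGuardSketch {
    // Hypothetical stand-in for WriteOperationType; only a few values are modeled.
    enum OpType { UPSERT, INSERT, UPSERT_PREPPED, INSERT_PREPPED }

    // Hypothetical stand-in for WriteOperationType.isPreppedWriteOperation.
    static boolean isPrepped(OpType op) {
        return op == OpType.UPSERT_PREPPED || op == OpType.INSERT_PREPPED;
    }

    // Mirrors the condition in the diff: inserts are assigned unless the
    // operation is prepped AND there are no inserts at all.
    static boolean shouldAssignInserts(OpType op, List<Long> insertsPerPartition) {
        long totalInserts = insertsPerPartition.stream().mapToLong(Long::longValue).sum();
        return !isPrepped(op) || totalInserts > 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldAssignInserts(OpType.UPSERT, List.of(0L, 0L)));
        System.out.println(shouldAssignInserts(OpType.UPSERT_PREPPED, List.of(0L, 0L)));
        System.out.println(shouldAssignInserts(OpType.UPSERT_PREPPED, List.of(0L, 5L)));
    }
}
```

Under these assumptions, the inline comment in the diff reads slightly broader
than the condition: the call is skipped only when the operation is prepped and
`totalInserts` is 0, not when either holds on its own.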