nsivabalan commented on code in PR #13976:
URL: https://github.com/apache/hudi/pull/13976#discussion_r2389729622
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/StreamingMetadataWriteHandler.java:
##########
@@ -47,21 +54,24 @@ public class StreamingMetadataWriteHandler {
* @param table The {@link HoodieTable} instance for data table of interest.
* @param dataTableWriteStatuses The {@link WriteStatus} from data table writes.
* @param instantTime The instant time of interest.
+ * @param enforceCoalesceWithRepartition true when repartition has to be added to dag to coalesce data table write statuses to 1. false otherwise.
Review Comment:
fixed in latest commit.
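The intent of `enforceCoalesceWithRepartition` can be sketched without Spark. This is a hypothetical helper, not code from the PR. It assumes `true` maps to a shuffle-backed reduction to one partition (what `JavaRDD.coalesce(1, true)`, equivalently `repartition(1)`, does in Spark). It also assumes `false` leaves the partitioning untouched, which the javadoc does not state. The shuffle matters because a plain `coalesce(1)` without a shuffle can pull the upstream data table write stage down to a single task, whereas a shuffle boundary keeps the upstream stage at its original parallelism.

```java
/**
 * Hypothetical sketch (not Hudi code): decides how data table write statuses
 * would be downsized before being unioned with metadata table write statuses.
 */
final class RepartitionDecisionSketch {

  /** Target partition count plus whether a shuffle boundary is inserted. */
  record Downsize(int targetPartitions, boolean shuffle) {}

  /**
   * @param currentPartitions partitions of the data table write statuses
   * @param enforceCoalesceWithRepartition mirrors the flag documented in the PR
   */
  static Downsize decide(int currentPartitions, boolean enforceCoalesceWithRepartition) {
    if (currentPartitions < 1) {
      throw new IllegalArgumentException("currentPartitions must be >= 1");
    }
    if (enforceCoalesceWithRepartition) {
      // Models JavaRDD.coalesce(1, true) / repartition(1): the shuffle keeps the
      // upstream data table write stage running at its original parallelism.
      return new Downsize(1, true);
    }
    // Assumption (not stated in the javadoc): without the flag, partitioning is left as-is.
    return new Downsize(currentPartitions, false);
  }

  public static void main(String[] args) {
    System.out.println(decide(1000, true));
    System.out.println(decide(1000, false));
  }
}
```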
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java:
##########
@@ -81,6 +81,15 @@ public final class HoodieMetadataConfig extends HoodieConfig {
+ "in streaming manner rather than two disjoint writes. By default "
+ "streaming writes to metadata table is enabled for SPARK engine for incremental operations and disabled for all other cases.");
+ public static final ConfigProperty<Integer> STREAMING_WRITE_DATATABLE_WRITE_STATUSES_COALESCE_DIVIDENT = ConfigProperty
+ .key(METADATA_PREFIX + ".streaming.write.datatable.write.statuses.coalesce.divident")
+ .defaultValue(5000)
+ .markAdvanced()
+ .sinceVersion("1.1.0")
+ .withDocumentation("When streaming writes to metadata table is enabled via hoodie.metadata.streaming.write.enabled, we had union data table write statuses "
+ + "with metadata table write statuses before triggering the entire write dag. While doing so, we had to downscale the data table tasks using "
+ + "coalesce so that we don't trigger 1000s of no-op tasks(data table writes). The parallelism to use for such coalescing will be determined using this config");
Review Comment:
fixed in latest commit.
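The documentation for `STREAMING_WRITE_DATATABLE_WRITE_STATUSES_COALESCE_DIVIDENT` says the coalesce parallelism is "determined using this config" but does not give the formula. One plausible reading is assumed here and is not confirmed by the PR: the number of data table write status partitions is divided by the configured value (default 5000) and rounded up, with a floor of 1. The class and method names below are hypothetical.

```java
/**
 * Hypothetical sketch (not Hudi's implementation): derives a coalesce
 * parallelism for data table write statuses from a "divident" setting.
 */
final class CoalesceParallelismSketch {

  /** Default value declared for STREAMING_WRITE_DATATABLE_WRITE_STATUSES_COALESCE_DIVIDENT. */
  static final int DEFAULT_DIVIDENT = 5000;

  /** Assumed formula: ceil(numDataTablePartitions / divident), never below 1. */
  static int coalesceParallelism(int numDataTablePartitions, int divident) {
    if (numDataTablePartitions < 0) {
      throw new IllegalArgumentException("numDataTablePartitions must be >= 0");
    }
    if (divident <= 0) {
      throw new IllegalArgumentException("divident must be > 0");
    }
    // Long arithmetic avoids int overflow in the round-up step.
    long rounded = ((long) numDataTablePartitions + divident - 1) / divident;
    return (int) Math.max(1L, rounded);
  }

  public static void main(String[] args) {
    System.out.println(coalesceParallelism(1000, DEFAULT_DIVIDENT));
    System.out.println(coalesceParallelism(12000, DEFAULT_DIVIDENT));
  }
}
```

Under this assumed formula, with the default of 5000, up to 5000 data table partitions collapse to a single coalesced partition (12000 would give 3). That matches the stated goal of not launching thousands of no-op tasks.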