the-other-tim-brown commented on code in PR #13976:
URL: https://github.com/apache/hudi/pull/13976#discussion_r2389670594


##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java:
##########
@@ -81,6 +81,15 @@ public final class HoodieMetadataConfig extends HoodieConfig {
           + "in streaming manner rather than two disjoint writes. By default "
           + "streaming writes to metadata table is enabled for SPARK engine for incremental operations and disabled for all other cases.");
 
+  public static final ConfigProperty<Integer> STREAMING_WRITE_DATATABLE_WRITE_STATUSES_COALESCE_DIVIDENT = ConfigProperty
+      .key(METADATA_PREFIX + ".streaming.write.datatable.write.statuses.coalesce.divident")
+      .defaultValue(5000)
+      .markAdvanced()
+      .sinceVersion("1.1.0")
+      .withDocumentation("When streaming writes to metadata table is enabled via hoodie.metadata.streaming.write.enabled, we had union data table write statuses "
+          + "with metadata table write statuses before triggering the entire write dag. While doing so, we had to downscale the data table tasks using "
+          + "coalesce so that we don't trigger 1000s of no-op tasks(data table writes). The parallelism to use for such coalescing will be determined using this config");

Review Comment:
   ```suggestion
         .withDocumentation("When streaming writes to metadata table is enabled via hoodie.metadata.streaming.write.enabled, the data table write statuses are unioned "
             + "with metadata table write statuses before triggering the entire write dag. The data table write statuses will be coalesced down to the number of write statuses divided by the specified divisor to avoid triggering "
             + "thousands of no-op tasks for the data table writes which have their status cached.");
   ```
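
To make the wording above concrete, here is a minimal sketch of the arithmetic the documentation describes. It is not the PR's code: the class and method names are hypothetical, and the floor division with a lower bound of one partition is an assumption about how the divisor would be applied.

```java
// Hypothetical illustration only; not taken from the Hudi code base.
public final class CoalesceSketch {
    private CoalesceSketch() {}

    /**
     * Computes the target number of partitions for the data table write statuses.
     * Assumption: integer (floor) division by the divisor, never going below 1.
     */
    public static int coalescedParallelism(int numWriteStatusPartitions, int divisor) {
        if (divisor <= 0) {
            throw new IllegalArgumentException("divisor must be positive, got " + divisor);
        }
        return Math.max(1, numWriteStatusPartitions / divisor);
    }

    public static void main(String[] args) {
        // With the default divisor of 5000 from the diff above.
        System.out.println(coalescedParallelism(20_000, 5000)); // prints 4
        System.out.println(coalescedParallelism(300, 5000));    // prints 1
    }
}
```

Under these assumptions, 20,000 data table write statuses with the default divisor of 5000 would be coalesced to 4 partitions, and any count below the divisor would collapse to a single partition.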



##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java:
##########
@@ -81,6 +81,15 @@ public final class HoodieMetadataConfig extends HoodieConfig {
           + "in streaming manner rather than two disjoint writes. By default "
           + "streaming writes to metadata table is enabled for SPARK engine for incremental operations and disabled for all other cases.");
 
+  public static final ConfigProperty<Integer> STREAMING_WRITE_DATATABLE_WRITE_STATUSES_COALESCE_DIVIDENT = ConfigProperty
+      .key(METADATA_PREFIX + ".streaming.write.datatable.write.statuses.coalesce.divident")

Review Comment:
   `divisor` is the proper term here, since this value is what the number of write statuses is divided by.


