nsivabalan commented on a change in pull request #4420: URL: https://github.com/apache/hudi/pull/4420#discussion_r784786957
########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java ########## @@ -143,6 +143,12 @@ .withDocumentation("Determines how to handle updates, deletes to file groups that are under clustering." + " Default strategy just rejects the update"); + public static final ConfigProperty<String> ASYNC_CLUSTERING_SCHEDULE = ConfigProperty + .key("hoodie.clustering.schedule.async") + .defaultValue("false") + .withDocumentation("When set to true, clustering service will be attempted for scheduling after each write. Users have to ensure " + + "they have a separate job to run async clustering(execution) for the one scheduled by this writer"); Review comment: I have fixed the config to `hoodie.clustering.schedule.inline` and var name to SCHEDULE_INLINE_CLUSTERING since scheduling is done INLINE with this config and only execution is async. ########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java ########## @@ -90,6 +90,12 @@ .withDocumentation("When set to true, compaction service is triggered after each write. While being " + " simpler operationally, this adds extra latency on the write path."); + public static final ConfigProperty<String> SCHEDULE_ASYNC_COMPACT = ConfigProperty + .key("hoodie.compact.schedule.async") + .defaultValue("false") Review comment: have responded else where. prefer to keep it as is to be in sync w/ existing inline compaction config. ########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java ########## @@ -177,6 +178,16 @@ .withDocumentation("Determines how to handle updates, deletes to file groups that are under clustering." + " Default strategy just rejects the update"); + public static final ConfigProperty<String> SCHEDULE_INLINE_CLUSTERING = ConfigProperty + .key("hoodie.clustering.schedule.inline") Review comment: responded else where. I wanted to keep it in sync w/ existing inline clustering config. "hoodie.clustering.inline", and the variable is named as INLINE_CLUSTERING. Guess it goes well w/ how the getter method is named (inlineClusteringEnabled). So, prefer to leave it as is. ########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java ########## @@ -90,6 +90,12 @@ .withDocumentation("When set to true, compaction service is triggered after each write. While being " + " simpler operationally, this adds extra latency on the write path."); + public static final ConfigProperty<String> SCHEDULE_ASYNC_COMPACT = ConfigProperty + .key("hoodie.compact.schedule.async") + .defaultValue("false") Review comment: same here. have fixed the naming. ########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java ########## @@ -1116,13 +1135,16 @@ protected boolean scheduleCleaningAtInstant(String instantTime, Option<Map<Strin /** * Executes a clustering plan on a table, serially before or after an insert/upsert action. + * Schedules clustering inline and may be optionally execute. */ - protected Option<String> inlineCluster(Option<Map<String, String>> extraMetadata) { + protected Option<String> inlineScheduleClusterAndOptionallyExecute(Option<Map<String, String>> extraMetadata, boolean executeInline) { Option<String> clusteringInstantOpt = scheduleClustering(extraMetadata); - clusteringInstantOpt.ifPresent(clusteringInstant -> { - // inline cluster should auto commit as the user is never given control - cluster(clusteringInstant, true); - }); + if (executeInline) { + clusteringInstantOpt.ifPresent(clusteringInstant -> { Review comment: same response as above. ########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java ########## @@ -464,27 +464,43 @@ protected void postCommit(HoodieTable<T, I, K, O> table, HoodieCommitMetadata me } protected void runTableServicesInline(HoodieTable<T, I, K, O> table, HoodieCommitMetadata metadata, Option<Map<String, String>> extraMetadata) { - if (config.areAnyTableServicesInline()) { + if (config.areAnyTableServicesInline() || config.scheduleInlineTableServices()) { if (config.isMetadataTableEnabled()) { Review comment: have renamed the methods to be explicit. areAnyTableServicesInline -> areAnyTableServicesExecutedInline scheduleInlineTableServices -> areAnyTableServicesScheduledInline ########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java ########## @@ -464,27 +464,43 @@ protected void postCommit(HoodieTable<T, I, K, O> table, HoodieCommitMetadata me } protected void runTableServicesInline(HoodieTable<T, I, K, O> table, HoodieCommitMetadata metadata, Option<Map<String, String>> extraMetadata) { - if (config.areAnyTableServicesInline()) { + if (config.areAnyTableServicesInline() || config.scheduleInlineTableServices()) { if (config.isMetadataTableEnabled()) { table.getHoodieView().sync(); } // Do an inline compaction if enabled if (config.inlineCompactionEnabled()) { runAnyPendingCompactions(table); metadata.addMetadata(HoodieCompactionConfig.INLINE_COMPACT.key(), "true"); - inlineCompact(extraMetadata); + inlineScheduleCompactAndOptionallyExecute(extraMetadata, !config.scheduleInlineCompaction()); Review comment: just scheduling is only one line. ``` Option<String> compactionInstantTimeOpt = scheduleCompaction(extraMetadata); ``` thats the reason did not find a need to create a new method just for scheduling inline. Infact I did had it that way, on my first attempt. but did not feel the need and later removed it. With new naming, I guess its clear that execution is optional. So, I feel we should be ok here. ########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java ########## @@ -1005,13 +1021,16 @@ protected void rollbackFailedWrites(Map<String, Option<HoodiePendingRollbackInfo /** * Performs a compaction operation on a table, serially before or after an insert/upsert action. + * Scheduling is done inline, and optionally execution as well. */ - protected Option<String> inlineCompact(Option<Map<String, String>> extraMetadata) { + protected Option<String> inlineScheduleCompactAndOptionallyExecute(Option<Map<String, String>> extraMetadata, boolean executeInline) { Option<String> compactionInstantTimeOpt = scheduleCompaction(extraMetadata); Review comment: have responded else where. scheduling is just oneline. does not warrant a new method. after renaming the method, I feel we are ok. ########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java ########## @@ -464,27 +464,43 @@ protected void postCommit(HoodieTable<T, I, K, O> table, HoodieCommitMetadata me } protected void runTableServicesInline(HoodieTable<T, I, K, O> table, HoodieCommitMetadata metadata, Option<Map<String, String>> extraMetadata) { - if (config.areAnyTableServicesInline()) { + if (config.areAnyTableServicesInline() || config.scheduleInlineTableServices()) { if (config.isMetadataTableEnabled()) { table.getHoodieView().sync(); } // Do an inline compaction if enabled if (config.inlineCompactionEnabled()) { runAnyPendingCompactions(table); metadata.addMetadata(HoodieCompactionConfig.INLINE_COMPACT.key(), "true"); - inlineCompact(extraMetadata); + inlineScheduleCompactAndOptionallyExecute(extraMetadata, !config.scheduleInlineCompaction()); } else { metadata.addMetadata(HoodieCompactionConfig.INLINE_COMPACT.key(), "false"); } + // if just inline schedule is enabled + if (!config.inlineCompactionEnabled() && config.scheduleInlineCompaction() + && !table.getActiveTimeline().getWriteTimeline().filterPendingCompactionTimeline().getInstants().findAny().isPresent()) { + // proceed only if there are no pending compactions Review comment: 1. Usually we try to have methods in HoodieActivetimeline that are used by many callers so that we can avoid duplication or errors. For one off cases like this, we keep it away from HoodieActivetimeline. 2. somehow I feel we are over populating HoodieWriteConfig. This is a very specific use-case and probably we can leave it here. condition here is very specific in the sense that, we need to check if inline is not enabled, but schedule inline is enabled. and I don't see this being used anywhere else. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org