Davis-Zhang-Onehouse commented on code in PR #13489:
URL: https://github.com/apache/hudi/pull/13489#discussion_r2180621088
##########
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieEngineContext.java:
##########
@@ -129,4 +129,21 @@ public abstract <I, K, V> List<V> reduceByKey(
   public abstract <I, O> O aggregate(HoodieData<I> data, O zeroValue, Functions.Function2<O, I, O> seqOp, Functions.Function2<O, O, O> combOp);
   public abstract <T> ReaderContextFactory<T> getReaderContextFactory(HoodieTableMetaClient metaClient);
+
+  /**
+   * Groups values by key and applies a function to each group of values.
+   * [1 iterator maps to 1 key] It only guarantees that items returned by the same iterator share the same key.
+   * [exactly once across iterators] An item returned by one iterator will not be returned by any other iterator.
+   * [1 key maps to >= 1 iterators] Items that belong to the same shard can be load-balanced across multiple iterators. It is up to API implementations to decide
+   * the load-balancing pattern and how many iterators to split a shard into.
+   *
+   * @param data The input pairs of <ShardIndex, Item> to process.
+   * @param func Function to apply to each group of items with the same shard.
+   * @param maxShardIndex An upper bound on the ShardIndex values in data. If data contains ShardIndex 1, 2 and 6, any maxShardIndex >= 6 is valid.
+   * @param preservesPartitioning whether to preserve partitioning in the resulting collection.

Review Comment:
   It would be great to spell out the criteria contributors should use when deciding how to set this parameter.
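To make the three guarantees in the javadoc concrete, here is a minimal in-memory sketch. It is not the PR's implementation. The method name `groupByShard`, the `maxItemsPerIterator` chunking knob, and passing (shard, item) pairs to `func` are all hypothetical choices made for illustration. `preservesPartitioning` is left out because a single local list has no partitioning to preserve.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.function.Function;

public class ShardGroupingSketch {

  // Hypothetical local version of the contract in the javadoc; not Hudi's API.
  // groupByShard and maxItemsPerIterator are illustrative names only.
  public static <V, R> List<R> groupByShard(
      List<Map.Entry<Integer, V>> data,
      Function<Iterator<Map.Entry<Integer, V>>, Iterator<R>> func,
      int maxShardIndex,
      int maxItemsPerIterator) {
    if (maxItemsPerIterator <= 0) {
      throw new IllegalArgumentException("maxItemsPerIterator must be positive");
    }
    // One bucket per possible shard index in [0, maxShardIndex].
    List<List<Map.Entry<Integer, V>>> buckets = new ArrayList<>();
    for (int i = 0; i <= maxShardIndex; i++) {
      buckets.add(new ArrayList<>());
    }
    for (Map.Entry<Integer, V> item : data) {
      int shard = item.getKey();
      if (shard < 0 || shard > maxShardIndex) {
        throw new IllegalArgumentException("shard index out of range: " + shard);
      }
      buckets.get(shard).add(item);
    }
    List<R> results = new ArrayList<>();
    for (List<Map.Entry<Integer, V>> bucket : buckets) {
      // Split one shard into fixed-size chunks, each handed to func as its own
      // iterator: an iterator never mixes shards, every item lands in exactly
      // one iterator, and a large shard may be served by several iterators.
      for (int start = 0; start < bucket.size(); start += maxItemsPerIterator) {
        int end = Math.min(start + maxItemsPerIterator, bucket.size());
        func.apply(bucket.subList(start, end).iterator()).forEachRemaining(results::add);
      }
    }
    return results;
  }

  // Shards 1, 2 and 6 with maxShardIndex = 6; each iterator is summarized as one string.
  public static List<String> demo() {
    List<Map.Entry<Integer, String>> data = List.of(
        Map.entry(1, "a"), Map.entry(6, "b"), Map.entry(1, "c"),
        Map.entry(2, "d"), Map.entry(1, "e"));
    Function<Iterator<Map.Entry<Integer, String>>, Iterator<String>> describe = it -> {
      StringJoiner joiner = new StringJoiner(",");
      it.forEachRemaining(e -> joiner.add(e.getKey() + ":" + e.getValue()));
      return List.of(joiner.toString()).iterator();
    };
    return groupByShard(data, describe, 6, 2);
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [1:a,1:c, 1:e, 2:d, 6:b]
  }
}
```

Shard 1 is split across two iterators (`1:a,1:c` and `1:e`), yet no iterator mixes shards. Fixed-size chunking is only one possible load-balancing policy. The javadoc leaves the split strategy to each engine implementation.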
