[ https://issues.apache.org/jira/browse/BEAM-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466169#comment-17466169 ]
Ahmet Altay commented on BEAM-13541:
------------------------------------

In addition to this change, would it make sense to have a flag to configure DEFAULT_IN_MEMORY_ELEMENT_COUNT (https://github.com/apache/beam/blob/release-2.34.0/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/join/CoGbkResult.java#L60), similar to the setWorkerCacheMb flag?

> Use runtime information to improve CoGroupByKey caching
> -------------------------------------------------------
>
>                 Key: BEAM-13541
>                 URL: https://issues.apache.org/jira/browse/BEAM-13541
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-ideas
>            Reporter: Sunil Pedapudi
>            Assignee: Robert Bradshaw
>            Priority: P2
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, CoGroupByKey creates UnionTables that are Flattened. The Flattened
> output is processed by a GroupByKey to produce a CoGbkResult (via
> ConstructCoGbkResultFn).
>
> Given that the performance of CoGBK is greatly impacted by which
> elements are cached in the (finitely sized) in-memory results, it would be
> useful if CoGbkResult could use runtime information to prioritize which
> elements are stored in-memory.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
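
To illustrate the concern in the issue description, here is a minimal plain-Java sketch of a fixed-size in-memory cache filled in arrival order. It is not Beam's CoGbkResult code. The names BoundedCacheSketch, Tagged, cachedPerTag, and the inMemoryLimit parameter are hypothetical; the parameter stands in for a configurable element count such as the flag suggested above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a bounded in-memory cache; not Beam's implementation. */
class BoundedCacheSketch {

  /** A value tagged with the index of the input it came from (stand-in for a union element). */
  static final class Tagged {
    final int tag;
    final String value;

    Tagged(int tag, String value) {
      this.tag = tag;
      this.value = value;
    }
  }

  /**
   * Returns, per tag, how many elements land in the cache when only the first
   * inMemoryLimit elements of the merged stream are kept in memory. In this
   * sketch, anything past the limit would have to be re-read from the source.
   */
  static int[] cachedPerTag(List<Tagged> merged, int numTags, int inMemoryLimit) {
    int[] cached = new int[numTags];
    int taken = 0;
    for (Tagged t : merged) {
      if (taken >= inMemoryLimit) {
        break; // cache is full; remaining elements are not held in memory
      }
      cached[t.tag]++;
      taken++;
    }
    return cached;
  }

  public static void main(String[] args) {
    List<Tagged> merged = new ArrayList<>();
    for (int i = 0; i < 5; i++) {
      merged.add(new Tagged(0, "a" + i)); // five elements from input 0 arrive first
    }
    for (int i = 0; i < 3; i++) {
      merged.add(new Tagged(1, "b" + i)); // three elements from input 1 arrive after
    }
    int[] cached = cachedPerTag(merged, 2, 4);
    // prints "tag 0 cached: 4, tag 1 cached: 0"
    System.out.println("tag 0 cached: " + cached[0] + ", tag 1 cached: " + cached[1]);
  }
}
```

With a limit of 4, input 0 fills the whole cache and input 1 gets none of it, even if input 1 is the one that is read repeatedly. A larger limit lets more elements fit, while prioritizing by runtime information, as the issue proposes, would change which elements are kept.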