[ https://issues.apache.org/jira/browse/FLINK-32098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl resolved FLINK-32098.
-----------------------------------
    Assignee: Weijie Guo
  Resolution: Fixed

master: c6d58e17e8ce736a062234e1558ac8d7b65990ef

> Dispatcher#submitJob calls Dispatcher#isInGloballyTerminalState up to three times which might be expensive due to IO
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-32098
>                 URL: https://issues.apache.org/jira/browse/FLINK-32098
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.17.0, 1.16.1, 1.18.0
>            Reporter: Matthias Pohl
>            Assignee: Weijie Guo
>            Priority: Major
>              Labels: pull-request-available
>
> {{Dispatcher#submitJob}} calls {{Dispatcher#isInGloballyTerminalState}} up to
> three times (1x through {{Dispatcher#isDuplicateJob}} and 2x directly), which
> calls {{JobResultStore#hasJobResultEntry}}. {{hasJobResultEntry}} calls
> {{hasDirtyJobResultEntry}} and {{hasCleanJobResultEntry}} if the underlying
> job has not yet completed globally. Both calls run {{FileSystem#exists}} on
> a non-existing file, which can be quite an expensive operation (depending on
> the {{FileSystem}} implementation for object storage) since it might require
> a full table scan.
> To be honest, nobody has complained so far. But we might want to either
> reconsider the
> {{FileSystemJobResultStore}}/{{JobResultStore#hasJobResultEntry}}
> implementation or, at least, reduce the number of
> {{isInGloballyTerminalState}} calls in the {{Dispatcher}} and document the
> performance issue in the JavaDoc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
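
The call pattern described in the issue can be illustrated with a minimal, self-contained sketch. This is not the Flink code and not necessarily the fix that was merged: {{SubmitJobSketch}}, {{CountingJobResultStore}}, {{submitNaive}} and {{submitSingleCheck}} are hypothetical names, job IDs are plain strings, and an in-memory counter stands in for the {{FileSystem#exists}} calls. The sketch models only two of the up to three terminal-state checks mentioned above, and shows how evaluating the check once and reusing the result halves the simulated IO for a job that is still running.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in types; none of this is Flink's actual implementation.
class SubmitJobSketch {

    /** Simulated job result store that counts each "exists" lookup. */
    static class CountingJobResultStore {
        private final Set<String> dirty = new HashSet<>();
        private final Set<String> clean = new HashSet<>();
        int existsCalls = 0;

        boolean hasDirtyJobResultEntry(String jobId) {
            existsCalls++; // stands in for one FileSystem#exists call
            return dirty.contains(jobId);
        }

        boolean hasCleanJobResultEntry(String jobId) {
            existsCalls++; // stands in for one FileSystem#exists call
            return clean.contains(jobId);
        }

        boolean hasJobResultEntry(String jobId) {
            // Both lookups run when the job has no entry at all.
            return hasDirtyJobResultEntry(jobId) || hasCleanJobResultEntry(jobId);
        }
    }

    static boolean isInGloballyTerminalState(CountingJobResultStore store, String jobId) {
        return store.hasJobResultEntry(jobId);
    }

    static boolean isDuplicateJob(CountingJobResultStore store, Set<String> running, String jobId) {
        return isInGloballyTerminalState(store, jobId) || running.contains(jobId);
    }

    /** Checks the terminal state indirectly and then again directly. */
    static String submitNaive(CountingJobResultStore store, Set<String> running, String jobId) {
        if (isDuplicateJob(store, running, jobId)) {
            if (isInGloballyTerminalState(store, jobId)) {
                return "DUPLICATE_TERMINATED";
            }
            return "DUPLICATE_RUNNING";
        }
        return "ACCEPTED";
    }

    /** Evaluates the terminal state once and reuses the result. */
    static String submitSingleCheck(CountingJobResultStore store, Set<String> running, String jobId) {
        boolean terminated = isInGloballyTerminalState(store, jobId);
        if (terminated) {
            return "DUPLICATE_TERMINATED";
        }
        if (running.contains(jobId)) {
            return "DUPLICATE_RUNNING";
        }
        return "ACCEPTED";
    }

    public static void main(String[] args) {
        Set<String> running = new HashSet<>();
        running.add("job-1");

        CountingJobResultStore naiveStore = new CountingJobResultStore();
        submitNaive(naiveStore, running, "job-1");
        System.out.println("naive exists calls: " + naiveStore.existsCalls);

        CountingJobResultStore singleStore = new CountingJobResultStore();
        submitSingleCheck(singleStore, running, "job-1");
        System.out.println("single-check exists calls: " + singleStore.existsCalls);
    }
}
```

In this sketch, resubmitting a running job costs four simulated lookups with the naive variant and two with the single-check variant. Both variants return the same result.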