Till Rohrmann created FLINK-10866:
-------------------------------------

             Summary: Queryable state can prevent cluster from starting
                 Key: FLINK-10866
                 URL: https://issues.apache.org/jira/browse/FLINK-10866
             Project: Flink
          Issue Type: Improvement
          Components: Local Runtime
    Affects Versions: 1.6.2, 1.5.5, 1.7.0
            Reporter: Till Rohrmann

The {{KvStateServerImpl}} can currently prevent the {{TaskExecutor}} from starting. By default, the queryable state (QS) server starts on port {{9067}}. If this port is not free, the server fails to start and aborts the whole initialization of the {{TaskExecutor}}.

I think the QS server should not stop the {{TaskExecutor}} from starting. We should at least change the default port to {{0}} to avoid port conflicts. However, this will break all setups that don't explicitly set the QS port, because the port then either needs to be configured or extracted from the logs.

Additionally, we should think about whether a QS server startup failure should lead to a {{TaskExecutor}} failure or simply be logged. Both approaches have pros and cons. Currently, a failing QS server also affects users who don't want to use QS. If we tolerate failures in the QS server, then a user who wants to use QS might run into problems with state not being reachable.
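
To illustrate the two options discussed above, here is a minimal sketch in plain Java. It is not Flink code: the class {{OptionalServerStartup}} and the method {{tryStart}} are hypothetical names, and a bare {{java.net.ServerSocket}} stands in for the QS server. It shows that binding to a fixed port fails when the port is taken, that port {{0}} lets the OS pick a free port (which then has to be read back and logged), and how a startup failure can be logged instead of propagated.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.Optional;

/** Hypothetical stand-in for an optional auxiliary server; not part of Flink. */
public class OptionalServerStartup {

    /**
     * Tries to bind a server socket to the given port.
     * On failure the error is logged and an empty Optional is returned,
     * so the caller can continue its own initialization.
     */
    public static Optional<ServerSocket> tryStart(int port) {
        try {
            ServerSocket server = new ServerSocket(port);
            // With port 0 the OS chooses the port, so it must be read back and logged.
            System.out.println("Optional server listening on port " + server.getLocalPort());
            return Optional.of(server);
        } catch (IOException e) {
            System.err.println("Optional server could not bind to port " + port
                    + ", continuing without it: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        // Occupy some port first to simulate a conflict on a fixed default port.
        try (ServerSocket occupied = new ServerSocket(0)) {
            int takenPort = occupied.getLocalPort();

            // Fixed port that is already in use: the failure is tolerated and logged.
            Optional<ServerSocket> fixed = tryStart(takenPort);
            System.out.println("Started on fixed port: " + fixed.isPresent());

            // Port 0: no conflict, but clients need to learn the chosen port.
            Optional<ServerSocket> ephemeral = tryStart(0);
            System.out.println("Started on OS-chosen port: " + ephemeral.isPresent());

            if (fixed.isPresent()) {
                fixed.get().close();
            }
            if (ephemeral.isPresent()) {
                ephemeral.get().close();
            }
        }
    }
}
```

The sketch also makes the trade-off visible: when {{tryStart}} returns an empty result, the surrounding process keeps running, but anything that relied on the server being reachable will only notice the problem later.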