Heart Zhou created FLINK-37606:
----------------------------------

             Summary: Blocklist timeout check may lost
                 Key: FLINK-37606
                 URL: https://issues.apache.org/jira/browse/FLINK-37606
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.20.1
            Reporter: Heart Zhou

The blocklist timeout check may be scheduled before the RPC server starts.

The check is scheduled through the mainThreadExecutor in the DefaultBlocklistHandler constructor:

{code:java}
DefaultBlocklistHandler(xxx,
        Duration timeoutCheckInterval,
        ComponentMainThreadExecutor mainThreadExecutor,
        xxx) {
    xxx
    this.timeoutCheckInterval = checkNotNull(timeoutCheckInterval);
    this.mainThreadExecutor = checkNotNull(mainThreadExecutor);
    xxx
    scheduleTimeoutCheck();
}
{code}

When the scheduled check fires, the org.apache.flink.runtime.rpc.RpcEndpoint#start method may not have been called yet, although it will be called very soon. In that case the check is handed to an endpoint that is not running yet, so it may be lost.

The check is scheduled through the following method, which forwards the command via gateway.runAsync once the delay has elapsed:

{code:java}
public ScheduledFuture<?> schedule(Runnable command, long delay, TimeUnit unit) {
    final long delayMillis = TimeUnit.MILLISECONDS.convert(delay, unit);
    FutureTask<Void> ft = new FutureTask<>(command, null);
    if (mainScheduledExecutor.isShutdown()) {
        log.warn(
                "The scheduled executor service is shutdown and ignores the command {}",
                command);
    } else {
        // The timer only forwards the task; the endpoint itself has to run it.
        mainScheduledExecutor.schedule(
                () -> gateway.runAsync(ft), delayMillis, TimeUnit.MILLISECONDS);
    }
    return new ScheduledFutureAdapter<>(ft, delayMillis, TimeUnit.MILLISECONDS);
}
{code}
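
To make the ordering problem easier to see, here is a minimal standalone sketch of the suspected race. It does not use Flink classes: ToyEndpoint, checksExecuted and LostTimeoutCheckSketch are hypothetical names, and the rule that a message arriving before start() is silently discarded is an assumption made for illustration, not a statement about the actual RPC implementation.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class LostTimeoutCheckSketch {

    /** Toy stand-in for an RPC endpoint (hypothetical, not Flink's RpcEndpoint). */
    static final class ToyEndpoint {
        private final ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();
        private volatile boolean started;

        void start() {
            started = true;
        }

        /** The timer fires after the delay and only then hands the command to the endpoint. */
        ScheduledFuture<?> schedule(Runnable command, long delayMillis) {
            return timer.schedule(() -> runAsync(command), delayMillis, TimeUnit.MILLISECONDS);
        }

        /** Assumption: a command that arrives before start() is silently discarded. */
        private void runAsync(Runnable command) {
            if (started) {
                command.run();
            }
        }

        void shutdown() {
            timer.shutdownNow();
        }
    }

    /** Returns how many times the check ran for the given ordering of schedule and start. */
    static int checksExecuted(boolean scheduleBeforeStart) {
        ToyEndpoint endpoint = new ToyEndpoint();
        AtomicInteger executed = new AtomicInteger();
        Runnable check = executed::incrementAndGet;
        try {
            if (scheduleBeforeStart) {
                // Like scheduling in a constructor: wait until the timer has fired,
                // so the command reaches the endpoint before start() is called.
                endpoint.schedule(check, 0).get(5, TimeUnit.SECONDS);
                endpoint.start();
            } else {
                endpoint.start();
                endpoint.schedule(check, 0).get(5, TimeUnit.SECONDS);
            }
            return executed.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            endpoint.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("scheduled before start: " + checksExecuted(true));
        System.out.println("scheduled after start:  " + checksExecuted(false));
    }
}
```

In this toy model the check scheduled before start() never runs, while the one scheduled after start() runs once. The sketch forces the bad ordering deterministically by waiting on the timer; in the real code the ordering would depend on timing.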