[ https://issues.apache.org/jira/browse/FLINK-25316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459923#comment-17459923 ]
Chesnay Schepler commented on FLINK-25316:
------------------------------------------

So I heard you like virtualization...

It'd be a bit weird if this would only happen on ARM, but maybe somewhere in there the signal gets lost?

Where do we actually interrupt the blobserver thread when doing an orderly shutdown?

> BlobServer can get stuck during shutdown
> ----------------------------------------
>
>                 Key: FLINK-25316
>                 URL: https://issues.apache.org/jira/browse/FLINK-25316
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Robert Metzger
>            Priority: Minor
>             Fix For: 1.15.0
>
>
> The cluster shutdown can get stuck
> {code}
> "AkkaRpcService-Supervisor-Termination-Future-Executor-thread-1" #89 daemon prio=5 os_prio=0 tid=0x0000004017d70000 nid=0x2ec in Object.wait() [0x000000402a9b5000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0x00000000d6c48368> (a org.apache.flink.runtime.blob.BlobServer)
>     at java.lang.Thread.join(Thread.java:1252)
>     - locked <0x00000000d6c48368> (a org.apache.flink.runtime.blob.BlobServer)
>     at java.lang.Thread.join(Thread.java:1326)
>     at org.apache.flink.runtime.blob.BlobServer.close(BlobServer.java:319)
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.stopClusterServices(ClusterEntrypoint.java:406)
>     - locked <0x00000000d5d27350> (a java.lang.Object)
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$shutDownAsync$4(ClusterEntrypoint.java:505
> {code}
> because the BlobServer.run() method ignores interrupts:
> {code}
> "BLOB Server listener at 6124" #30 daemon prio=5 os_prio=0 tid=0x000000401c929800 nid=0x2b4 runnable [0x00000040263f9000]
>    java.lang.Thread.State: RUNNABLE
>     at java.net.PlainSocketImpl.socketAccept(Native Method)
>     at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
>     at java.net.ServerSocket.implAccept(ServerSocket.java:560)
>     at java.net.ServerSocket.accept(ServerSocket.java:528)
>     at org.apache.flink.util.NetUtils.acceptWithoutTimeout(NetUtils.java:143)
>     at org.apache.flink.runtime.blob.BlobServer.run(BlobServer.java:268)
> {code}
> This issue was introduced in FLINK-24156 and first mentioned in
> https://issues.apache.org/jira/browse/FLINK-24113?focusedCommentId=17459414&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17459414

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
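
For reference, the two thread dumps in the issue show one thread waiting in `Thread.join()` on a listener thread that is blocked in `ServerSocket.accept()`. The following is a minimal, standalone Java sketch of that situation, assuming a plain `java.net.ServerSocket` on an ordinary (platform) thread. It is not Flink's `BlobServer` code: `AcceptShutdownSketch`, `Listener`, `shutdown()` and the two helper methods are hypothetical names used only for illustration. It tries `Thread.interrupt()` first, which is not expected to wake a thread blocked in `accept()`, and then closes the server socket, which should make `accept()` throw and let the thread exit.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.concurrent.atomic.AtomicBoolean;

class AcceptShutdownSketch {

    /** Hypothetical stand-in for a listener thread; NOT Flink's BlobServer. */
    static final class Listener extends Thread {
        private final ServerSocket serverSocket;
        private final AtomicBoolean shutdownRequested = new AtomicBoolean();

        Listener() throws IOException {
            super("sketch-listener");
            // Port 0 lets the OS pick a free port; bind to loopback only.
            this.serverSocket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
            setDaemon(true);
        }

        @Override
        public void run() {
            while (!shutdownRequested.get()) {
                try {
                    // Blocks until a client connects or the server socket is closed.
                    serverSocket.accept().close();
                } catch (IOException e) {
                    // Closing the server socket from another thread surfaces here.
                    return;
                }
            }
        }

        /** Requests shutdown and closes the socket so a blocked accept() returns. */
        void shutdown() throws IOException {
            shutdownRequested.set(true);
            serverSocket.close();
        }
    }

    /** Interrupts the listener, waits up to waitMillis, and reports whether it is still alive. */
    static boolean aliveAfterInterrupt(Listener listener, long waitMillis)
            throws InterruptedException {
        listener.interrupt();
        listener.join(waitMillis);
        return listener.isAlive();
    }

    /** Closes the listener's socket, waits up to waitMillis, and reports whether it is still alive. */
    static boolean aliveAfterClose(Listener listener, long waitMillis)
            throws IOException, InterruptedException {
        listener.shutdown();
        listener.join(waitMillis);
        return listener.isAlive();
    }

    public static void main(String[] args) throws Exception {
        Listener listener = new Listener();
        listener.start();
        System.out.println("alive after interrupt: " + aliveAfterInterrupt(listener, 500));
        System.out.println("alive after close:     " + aliveAfterClose(listener, 5000));
    }
}
```

The sketch uses a bounded `join(millis)` so that the caller cannot hang indefinitely, unlike the untimed `Thread.join()` visible in the first thread dump. Whether closing the socket before joining is the right fix for `BlobServer.close()` is a question for the issue itself and is not something this sketch settles.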