walterddr commented on a change in pull request #13163:
URL: https://github.com/apache/flink/pull/13163#discussion_r471568050

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/management/JMXService.java
##########
@@ -85,6 +86,9 @@ private static JMXServer startJMXServerWithPortRanges(Iterator<Integer> ports) {
 		while (ports.hasNext() && successfullyStartedServer == null) {
 			JMXServer server = new JMXServer();
 			int port = ports.next();
+			if (port == 0) { // try poke with a random port when port is set to zero
+				port = tryPokeForNewPort();

Review comment:
Good point. This approach is indeed unreliable, and given the number of containers in our typical production use case, it is very likely that more than one of them will hit this corner case.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/management/JMXService.java
##########
@@ -85,6 +86,9 @@ private static JMXServer startJMXServerWithPortRanges(Iterator<Integer> ports) {
 		while (ports.hasNext() && successfullyStartedServer == null) {
 			JMXServer server = new JMXServer();
 			int port = ports.next();
+			if (port == 0) { // try poke with a random port when port is set to zero

Review comment:
Actually, mapping port 0 to a large port range might be a much more reliable solution. Just one concern: since the JMXServer startup is on the container startup code path, I was wondering whether we should set a timeout on the while loop. I haven't done any profiling, but even if we assume each port poke takes ~1 ms, the total could still be significant when the range is large.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
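
For illustration, the timeout concern raised in the second review comment could be sketched roughly as below. This is not the code from the PR: the class `PortProbeSketch` and the method `bindFirstFreePort` are hypothetical names, binding a plain `java.net.ServerSocket` stands in for starting a `JMXServer`, and the 200 ms budget and the 50100-50200 range are arbitrary assumptions.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.time.Duration;
import java.util.Iterator;
import java.util.Optional;
import java.util.stream.IntStream;

/** Hypothetical sketch: probe a port range, but give up once a time budget is spent. */
public class PortProbeSketch {

    /**
     * Tries each candidate port in order until one binds, the candidates run out,
     * or the deadline passes. Binding a ServerSocket is only a stand-in for
     * starting a JMX server (an assumption made for this example).
     */
    static Optional<ServerSocket> bindFirstFreePort(Iterator<Integer> ports, Duration timeout) {
        // Use the monotonic clock so wall-clock adjustments cannot affect the deadline.
        final long deadlineNanos = System.nanoTime() + timeout.toNanos();
        while (ports.hasNext() && System.nanoTime() - deadlineNanos < 0) {
            int port = ports.next();
            try {
                return Optional.of(new ServerSocket(port));
            } catch (IOException e) {
                // Port could not be bound (for example, already in use); try the next one.
            }
        }
        // Either the range was exhausted or the time budget ran out.
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Arbitrary example range and budget; real values would come from configuration.
        Iterator<Integer> candidates = IntStream.rangeClosed(50100, 50200).iterator();
        Optional<ServerSocket> bound = bindFirstFreePort(candidates, Duration.ofMillis(200));
        if (bound.isPresent()) {
            try (ServerSocket socket = bound.get()) {
                System.out.println("Bound port " + socket.getLocalPort());
            }
        } else {
            System.out.println("No port bound before the deadline");
        }
    }
}
```

The deadline is checked once per iteration, so a single slow bind attempt can still overrun the budget. If that matters, the probing would need to run on a separate thread with a bounded wait.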