Dominik Dębowczyk created FLINK-39316:
-----------------------------------------

             Summary: BlobServer.getAddress() uses InetAddress.getLocalHost() 
which fails with VPN networking
                 Key: FLINK-39316
                 URL: https://issues.apache.org/jira/browse/FLINK-39316
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 2.1.1, 2.2.0, 2.1.0
            Reporter: Dominik Dębowczyk


{{BlobServer.getAddress()}} returns an unreachable address when the server is 
bound to the wildcard address (0.0.0.0) and the machine's hostname resolves to 
a non-local IP (e.g. a VPN address). This causes blob uploads to fail with 
Connection reset during job submission.
h3. Root Cause

FLINK-38109 changed {{MiniCluster.createBlobServerAddress()}} from using the 
Dispatcher's RPC hostname (typically localhost) to using 
{{BlobServer.getAddress()}}. When the BlobServer binds to 0.0.0.0, 
{{getAddress()}} falls back to {{InetAddress.getLocalHost()}}.

{{InetAddress.getLocalHost()}} resolves the machine's hostname through the 
system name service (hosts file or DNS). On machines with VPN software (e.g. 
corporate VPNs), the hostname can resolve to a VPN-assigned IP that is not 
directly reachable on any local interface.

In the failing case the hostname resolved to 100.64.1.5. The TCP connection to 
that address completes at the kernel level (routed through the VPN), but the 
packets never reach the local BlobServer's accept queue. The BlobServer never 
processes the request, and the client gets a {{Connection reset}} when reading 
the response.
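
To check whether a machine is affected, the following standalone diagnostic may help. It is a minimal sketch and not part of Flink: the class name {{LocalHostCheck}} and the helper {{isBoundLocally}} are made up for illustration. It prints what {{InetAddress.getLocalHost()}} resolves to and whether that address is assigned to any local network interface.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical diagnostic class, not part of Flink.
class LocalHostCheck {

    /** Returns true if the address is loopback or is assigned to a local network interface. */
    static boolean isBoundLocally(InetAddress address) throws SocketException {
        // getByInetAddress returns null when no local interface carries this address.
        return address.isLoopbackAddress()
                || NetworkInterface.getByInetAddress(address) != null;
    }

    public static void main(String[] args) throws UnknownHostException, SocketException {
        InetAddress resolved = InetAddress.getLocalHost();
        System.out.println("getLocalHost() -> " + resolved);
        System.out.println("assigned to a local interface: " + isBoundLocally(resolved));
    }
}
```

If the second line prints {{false}}, the hostname resolves to an address that no local interface owns, which matches the situation described above.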
h3. How to Reproduce

Run any test that submits a job with user jars through the {{MiniCluster}} on a 
machine where the hostname resolves to a non-local-interface IP:
{{mvn test -pl flink-table/flink-table-planner 
-Dtest="FunctionITCase#testUsingAddJar"}}
Fails with:
{code}
Caused by: java.io.IOException: PUT operation failed: Connection reset
    at o.a.f.runtime.blob.BlobClient.putInputStream(BlobClient.java:496)
    at o.a.f.runtime.blob.BlobClient.uploadFile(BlobClient.java:545)
Caused by: java.net.SocketException: Connection reset
    at o.a.f.runtime.blob.BlobOutputStream.receiveAndCheckPutResponse(BlobOutputStream.java:175)
{code}

h3. Fix
Replace {{InetAddress.getLocalHost()}} with 
{{InetAddress.getLoopbackAddress()}} in {{BlobServer.getAddress()}}. When a 
server binds to 0.0.0.0 (all interfaces), the loopback address (127.0.0.1) is 
always a valid way to reach it locally. This avoids the dependency on DNS 
hostname resolution, which is unreliable across different network configurations.
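
The idea behind the fix can be sketched outside of Flink as follows. This is an illustration under stated assumptions, not the actual patch: the class {{LoopbackFallbackSketch}} and the helper {{localConnectAddress}} are hypothetical names, and a plain {{ServerSocket}} stands in for the BlobServer's socket.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch class, not Flink code.
class LoopbackFallbackSketch {

    /** Picks the address a local client should use to reach the given bound server socket. */
    static InetAddress localConnectAddress(ServerSocket server) {
        InetAddress bound = server.getInetAddress();
        // A wildcard bind listens on every interface, including loopback,
        // so loopback is used instead of a hostname lookup.
        return bound.isAnyLocalAddress() ? InetAddress.getLoopbackAddress() : bound;
    }

    /** Binds to the wildcard address on an ephemeral port and connects via the chosen address. */
    static boolean canConnectLocally() throws IOException {
        // Port 0 requests an ephemeral port; no bind address means the wildcard address.
        try (ServerSocket server = new ServerSocket(0)) {
            InetAddress target = localConnectAddress(server);
            try (Socket client = new Socket()) {
                client.connect(new InetSocketAddress(target, server.getLocalPort()), 2000);
                return client.isConnected();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("connected via loopback: " + canConnectLocally());
    }
}
```

The sketch only covers clients on the same machine as the server, which is the {{MiniCluster}} case described in this issue.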
 



