Component: AppClient
Level: beginner
Scenario: how-to

I have a ZooKeeper (ZK) HA setup for standalone Spark (tested on 2.x.x as well as 3.x.x). I have one URL that points to three master nodes/IPs. When using spark-submit, --master spark://<URL>:<PORT> resolves only to the first IP. If that IP is the ACTIVE master, the job is submitted successfully; otherwise it fails with "All masters are unresponsive".
Looking at the code, it seems Spark expects only one IP per URL. Is there any reason not to accept a URL that resolves to multiple IPs? Could this be a valid feature request?

24/12/27 13:47:13 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-ha-tester.dream11-load.local:7077...
24/12/27 13:47:13 TRACE TransportClientFactory: DNS resolution succeed for spark-ha-tester.dream11-load.local/10.10.19.203:7077 took 0 ms
24/12/27 13:47:13 DEBUG TransportClientFactory: Creating new connection to spark-ha-tester.dream11-load.local/10.10.19.203:7077
24/12/27 13:47:13 DEBUG AbstractByteBuf: -Dio.netty.buffer.checkAccessible: true
24/12/27 13:47:13 DEBUG AbstractByteBuf: -Dio.netty.buffer.checkBounds: true
24/12/27 13:47:13 DEBUG ResourceLeakDetectorFactory: Loaded default ResourceLeakDetector: io.netty.util.ResourceLeakDetector@20df5859
24/12/27 13:47:13 DEBUG TransportClientFactory: Connection to spark-ha-tester.dream11-load.local/10.10.19.203:7077 successful, running bootstraps...
24/12/27 13:47:13 INFO TransportClientFactory: Successfully created connection to spark-ha-tester.dream11-load.local/10.10.19.203:7077 after 44 ms (0 ms spent in bootstraps)
24/12/27 13:47:13 TRACE TransportClient: Sending RPC to spark-ha-tester.dream11-load.local/10.10.19.203:7077
24/12/27 13:47:13 DEBUG Recycler: -Dio.netty.recycler.maxCapacityPerThread: 4096
24/12/27 13:47:13 DEBUG Recycler: -Dio.netty.recycler.maxSharedCapacityFactor: 2
24/12/27 13:47:13 DEBUG Recycler: -Dio.netty.recycler.linkCapacity: 16
24/12/27 13:47:13 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
24/12/27 13:47:13 DEBUG Recycler: -Dio.netty.recycler.delayedQueue.ratio: 8
24/12/27 13:47:13 TRACE TransportClient: Sending request RPC 7616564823635340173 to spark-ha-tester.dream11-load.local/10.10.19.203:7077 took 25 ms
24/12/27 13:47:13 TRACE MessageDecoder: Received message RpcResponse: RpcResponse[requestId=7616564823635340173,body=NettyManagedBuffer[buf=PooledUnsafeDirectByteBuf(ridx: 21, widx: 68, cap: 1024)]]
24/12/27 13:47:33 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-ha-tester.dream11-load.local:7077...
24/12/27 13:47:33 TRACE TransportClient: Sending RPC to spark-ha-tester.dream11-load.local/10.10.19.203:7077
24/12/27 13:47:33 TRACE TransportClient: Sending request RPC 6029480920073015166 to spark-ha-tester.dream11-load.local/10.10.19.203:7077 took 1 ms
24/12/27 13:47:33 TRACE MessageDecoder: Received message RpcResponse: RpcResponse[requestId=6029480920073015166,body=NettyManagedBuffer[buf=PooledUnsafeDirectByteBuf(ridx: 21, widx: 68, cap: 1024)]]
24/12/27 13:47:53 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-ha-tester.dream11-load.local:7077...
24/12/27 13:47:53 TRACE TransportClient: Sending RPC to spark-ha-tester.dream11-load.local/10.10.19.203:7077
24/12/27 13:47:53 TRACE TransportClient: Sending request RPC 8634193967572417974 to spark-ha-tester.dream11-load.local/10.10.19.203:7077 took 1 ms
24/12/27 13:47:53 TRACE MessageDecoder: Received message RpcResponse: RpcResponse[requestId=8634193967572417974,body=NettyManagedBuffer[buf=PooledUnsafeDirectByteBuf(ridx: 21, widx: 68, cap: 512)]]
24/12/27 13:48:13 TRACE HeartbeatReceiver: Checking for hosts with no recent heartbeats in HeartbeatReceiver.
24/12/27 13:48:13 ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
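
For context, the standalone HA documentation describes passing the masters as a comma-separated list in a single URL, for example spark://host1:port1,host2:port2, so that the client tries each one. A possible workaround for a single name with several A records is therefore to expand the name on the client side before calling spark-submit. The following is only a rough sketch under that assumption: the helper names resolve_all_ipv4 and build_master_url are hypothetical (not part of Spark), and it assumes every master listens on the same port.

```python
import socket


def resolve_all_ipv4(hostname, port):
    """Return every distinct IPv4 address the name resolves to, sorted as strings."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


def build_master_url(hosts, port=7077):
    """Join hosts into the comma-separated spark://h1:p,h2:p form.

    Assumption: all masters use the same port.
    """
    if not hosts:
        raise ValueError("at least one master host is required")
    return "spark://" + ",".join(f"{host}:{port}" for host in hosts)


# Hypothetical usage with the hostname from the logs above:
#   ips = resolve_all_ipv4("spark-ha-tester.dream11-load.local", 7077)
#   print(build_master_url(ips, 7077))
# The printed string can then be passed to spark-submit via --master.
```

Whether the resolver returns all three addresses depends on the local DNS setup, so it is worth printing the resulting URL and checking it before relying on it.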