Hi,

I'm running Spark 2.0.1 with Spark Launcher 2.0.1 on a YARN cluster. I
launch a map task that spawns a Spark job via
SparkLauncher#startApplication().

Deploy mode is yarn-client. I'm running on a Mac laptop.

I have this snippet of code:

SparkAppHandle appHandle = sparkLauncher.startApplication();
while (appHandle.getState() == null || !appHandle.getState().isFinal()) {
    if (appHandle.getState() != null) {
        log.info("while: Spark job state is : " + appHandle.getState()); // the highlighted line
        if (appHandle.getAppId() != null) {
            log.info("\t App id: " + appHandle.getAppId() + "\tState: " + appHandle.getState());
        }
    }
}

The above snippet works fine: both the Spark job and the map task that
spawns it complete successfully.

But if I comment out the highlighted line (the first log.info call inside
the loop), the Spark job still launches and finishes successfully, but the
map task hangs for a while in the Running state and then fails with the
exception below.

I run the exact same code in the exact same environment, with only that one
line commented out.

When the highlighted line is commented out, I don't even see the second log
line in stderr. It seems the appHandle never reports anything back (neither
the app ID nor the app state), even though the Spark application starts,
runs and finishes successfully. In the same stderr I can see the Spark job's
logs, its printed results, and the application report indicating its status.
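For comparison, below is a minimal sketch of waiting for a final state without the busy loop used above, which calls getState() repeatedly without sleeping. It does not use Spark: State, SimulatedHandle and awaitFinal are hypothetical stand-ins so the pattern can run on its own. State changes are pushed from another thread into a volatile field, and the waiting thread blocks on a CountDownLatch until a listener sees a final state. In real code the analogous hook would be a SparkAppHandle.Listener passed to SparkLauncher#startApplication(...), counting down the latch from its stateChanged(SparkAppHandle) callback; whether that avoids the hang described here is an untested assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Self-contained sketch: State and SimulatedHandle are hypothetical stand-ins for
// SparkAppHandle.State and SparkAppHandle, so this runs without Spark on the classpath.
class FinalStateWaitSketch {

    // Illustrative set of states; isFinal() plays the role of State.isFinal().
    enum State {
        UNKNOWN, CONNECTED, SUBMITTED, RUNNING, FINISHED, FAILED, KILLED;

        boolean isFinal() {
            return this == FINISHED || this == FAILED || this == KILLED;
        }
    }

    // Hypothetical handle: another thread pushes state changes and a listener is notified.
    static final class SimulatedHandle {
        private volatile State state = State.UNKNOWN; // volatile so writes are visible to other threads
        private final Consumer<SimulatedHandle> listener;

        SimulatedHandle(Consumer<SimulatedHandle> listener) {
            this.listener = listener;
        }

        State getState() {
            return state;
        }

        void setState(State newState) {
            state = newState;
            listener.accept(this); // analogous to a stateChanged(...) callback
        }
    }

    // Blocks until a final state is reported or the timeout expires; no busy loop.
    static State awaitFinal(long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        SimulatedHandle handle = new SimulatedHandle(h -> {
            System.out.println("State changed: " + h.getState());
            if (h.getState().isFinal()) {
                done.countDown(); // wake up the waiting thread
            }
        });

        // Simulates the launcher reporting progress from a separate thread.
        Thread reporter = new Thread(() -> {
            for (State s : new State[] {State.CONNECTED, State.SUBMITTED, State.RUNNING, State.FINISHED}) {
                handle.setState(s);
            }
        });
        reporter.start();

        try {
            done.await(timeoutMs, TimeUnit.MILLISECONDS);
            reporter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
        }
        return handle.getState();
    }

    public static void main(String[] args) {
        System.out.println("Final state: " + awaitFinal(5_000));
    }
}
```

Running main prints each reported state and then the final state; the timeout keeps the caller from blocking forever if no final state ever arrives.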

You can see the exception below (this is from the stderr of the mapper
container that launches the Spark job):
---

INFO: Communication exception: java.net.ConnectException: Call From <my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection exception: java.net.ConnectException: Connection refused;

Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
        at org.apache.hadoop.ipc.Client.call(Client.java:1451)
        ... 5 more

---

Nov 05, 2016 2:41:54 AM org.apache.hadoop.ipc.Client handleConnectionFailure
INFO: Retrying connect to server: <my-hostname>/10.3.8.118:53567. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task run
INFO: Communication exception: java.net.ConnectException: Call From <my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
        at org.apache.hadoop.ipc.Client.call(Client.java:1479)
        at org.apache.hadoop.ipc.Client.call(Client.java:1412)
        at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:242)
        at com.sun.proxy.$Proxy9.ping(Unknown Source)
        at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:767)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
        at org.apache.hadoop.ipc.Client.call(Client.java:1451)
        ... 5 more

---

Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task logThreadInfo
INFO: Process Thread Dump: Communication exception
10 active threads
Thread 24 (org.apache.hadoop.hdfs.PeerCache@4763c727):
  State: TIMED_WAITING
  Blocked count: 0
  Waited count: 79
  Stack:
    java.lang.Thread.sleep(Native Method)
    org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:255)
    org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:46)
    org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:124)
    java.lang.Thread.run(Thread.java:745)
