I want to run my current code on a YARN cluster. If I use
conf.set("spark.master","local[*]") in place of
conf.set("spark.master","yarn"), everything works fine. But when I try to use
yarn as the master, my code fails with the error shown further below.
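For reference, this is the only line I change between the working and the failing run. As far as I understand it, local[*] keeps the driver and the executors inside one JVM, while yarn (in client mode, since the SparkContext is created in-process) starts an ApplicationMaster and executors on the cluster:

```java
// Works: driver and executors all live inside this one local JVM.
conf.set("spark.master", "local[*]");

// Fails: yarn client mode - the driver stays in my local JVM while the
// ApplicationMaster and the executors start on the cluster nodes.
conf.set("spark.master", "yarn");
```

My full code: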
```
package com.example.pocsparkspring;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.*;

public class poc2 {

    public static void main(String[] args) {
        try {
            System.setProperty("hadoop.home.dir", "D:/hadoop");
            String HADOOP_CONF_DIR = "D:/Dev/ws/poc-spark-spring/src/main/resources/";

            // Load local copies of the cluster's Hadoop client configs.
            Configuration configuration1 = new Configuration();
            configuration1.addResource(new Path(HADOOP_CONF_DIR + "core-site.xml"));
            configuration1.addResource(new Path(HADOOP_CONF_DIR + "hdfs-site.xml"));
            configuration1.addResource(new Path(HADOOP_CONF_DIR + "yarn-site.xml"));
            configuration1.addResource(new Path(HADOOP_CONF_DIR + "mapred-site.xml"));

            // Kerberos login from the keytab before anything touches HDFS/YARN.
            UserGroupInformation.setConfiguration(configuration1);
            UserGroupInformation.loginUserFromKeytab("user", HADOOP_CONF_DIR + "mykeytab.keytab");

            SparkConf conf = new SparkConf();
            conf.set("spark.master", "yarn"); // with "local[*]" everything works
            conf.set("spark.kerberos.keytab", HADOOP_CONF_DIR + "mykeytab.keytab");
            conf.set("spark.kerberos.principal", "user");
            conf.set("hadoop.security.authentication", "kerberos");
            conf.set("hadoop.security.authorization", "true");
            conf.set("dfs.client.use.datanode.hostname", "true");
            conf.set("spark.security.credentials.hbase.enabled", "false");
            conf.set("spark.security.credentials.hive.enabled", "false");
            conf.setAppName("poc");
            // Ship the application jar to the cluster.
            conf.setJars(new String[]{"target/poc-spark-spring.jar"});

            JavaSparkContext sparkContext = new JavaSparkContext(conf);
            SparkSession sparkSession = SparkSession.builder()
                    .sparkContext(sparkContext.sc())
                    .getOrCreate();

            // Register the same Hadoop configs and settings on the session's context.
            sparkSession.sparkContext().hadoopConfiguration().addResource(new Path(HADOOP_CONF_DIR + "core-site.xml"));
            sparkSession.sparkContext().hadoopConfiguration().addResource(new Path(HADOOP_CONF_DIR + "hdfs-site.xml"));
            sparkSession.sparkContext().hadoopConfiguration().addResource(new Path(HADOOP_CONF_DIR + "yarn-site.xml"));
            sparkSession.sparkContext().hadoopConfiguration().addResource(new Path(HADOOP_CONF_DIR + "mapred-site.xml"));
            sparkSession.sparkContext().conf().set("spark.kerberos.keytab", HADOOP_CONF_DIR + "mykeytab.keytab");
            sparkSession.sparkContext().conf().set("spark.kerberos.principal", "user");
            sparkSession.sparkContext().conf().set("hadoop.security.authentication", "kerberos");
            sparkSession.sparkContext().conf().set("hadoop.security.authorization", "true");
            sparkSession.sparkContext().conf().set("spark.security.credentials.hbase.enabled", "false");
            sparkSession.sparkContext().conf().set("spark.security.credentials.hive.enabled", "false");

            // Read a table from Oracle over JDBC...
            Dataset<Row> oracleDF = sparkSession.read()
                    .format("jdbc")
                    .option("fetchsize", 10000)
                    .option("url", "jdbc:oracle:thin:@exampledev:1521:exampledev")
                    .option("query", "select * FROM exampletable")
                    .option("driver", "oracle.jdbc.driver.OracleDriver")
                    .option("user", "user")
                    .option("password", "pass")
                    .load();
            oracleDF.show(false);

            // ...and append it to HDFS as CSV.
            oracleDF.write().mode(SaveMode.Append).format("csv").save("hdfs:///hive/stage_space/example");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
As I mentioned, my code fails with the error below. At first everything looks
fine: I check the cluster GUI and I can see my job there, with its status
changing to ACCEPTED. But after a while it fails with the error below, and I
can see the same error in the GUI log.
```
09:38:45.129 [YARN application state monitor] ERROR org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend - YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
09:38:45.129 [YARN application state monitor] ERROR org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend - Diagnostics message: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:110)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:558)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:277)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:926)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:925)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:925)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:957)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.io.IOException: Failed to connect to user.../172.21.242.26:59690
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: user.../172.21.242.26:59690
Caused by: java.net.ConnectException: Connection timed out
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:750)
```
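Reading the Caused by lines, the ApplicationMaster on the cluster seems to time out while connecting back to my machine (172.21.242.26:59690), which I assume is the driver's RPC endpoint, since in yarn client mode the driver keeps running in my local JVM. If that is the problem, I suppose I would have to make the driver explicitly reachable from the cluster nodes, along the lines of the sketch below (the address and port are placeholders taken from the log; the real values depend on my network and firewall):

```java
// Sketch only, not a confirmed fix: pin the driver endpoint so the YARN
// nodes can connect back to it. The values are placeholders and must match
// an address/port that is routable from the cluster and open in the firewall.
conf.set("spark.driver.host", "172.21.242.26");  // my machine as the cluster sees it
conf.set("spark.driver.port", "59690");          // fixed port instead of a random one
conf.set("spark.driver.bindAddress", "0.0.0.0"); // accept connections on all local interfaces
```

But this is just a guess based on the log.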
Actually, I am not sure whether I can use YARN with my code like this at all,
because I could not find any example that looks like mine.