Does anyone have suggestions for setting the path of hive-exec-2.0.0.jar as a property in the application? Something like:

    hiveConf.set("hive.remote.driver.jar", "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar");

2016-03-11 10:53 GMT+08:00 Stana <[email protected]>:

Thanks for the reply.

I have set the property spark.home in my application; otherwise the application throws a 'SPARK_HOME not found' exception.

I found this in the Hive source code, in SparkClientImpl.java:

    private Thread startDriver(final RpcServer rpcServer, final String clientId, final String secret)
            throws IOException {
        ...
        List<String> argv = Lists.newArrayList();
        ...
        argv.add("--class");
        argv.add(RemoteDriver.class.getName());

        String jar = "spark-internal";
        if (SparkContext.jarOfClass(this.getClass()).isDefined()) {
            jar = SparkContext.jarOfClass(this.getClass()).get();
        }
        argv.add(jar);
        ...
    }

When Hive executes spark-submit, it generates the shell command with --class org.apache.hive.spark.client.RemoteDriver and takes the jar path from SparkContext.jarOfClass(this.getClass()).get(), which resolves to the local path of hive-exec-2.0.0.jar.

In my situation, the application and the YARN cluster are in different clusters. When the application ran spark-submit against the YARN cluster with the local path of hive-exec-2.0.0.jar, there was no hive-exec-2.0.0.jar on the cluster, so the application threw the exception "hive-exec-2.0.0.jar does not exist ...".

Can the path of hive-exec-2.0.0.jar be set as a property in the application? Something like:

    hiveConf.set("hive.remote.driver.jar", "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar");

If not, is it possible to achieve this in a future version?

2016-03-10 23:51 GMT+08:00 Xuefu Zhang <[email protected]>:

You can probably avoid the problem by setting the environment variable SPARK_HOME, or the JVM property spark.home, to point to your Spark installation.

--Xuefu

On Thu, Mar 10, 2016 at 3:11 AM, Stana <[email protected]> wrote:

I am trying out Hive on Spark with Hive 2.0.0 and Spark 1.4.1, executing org.apache.hadoop.hive.ql.Driver from a Java application.

My situation:
1. Built the Spark 1.4.1 assembly jar without Hive.
2. Uploaded the Spark assembly jar to the Hadoop cluster.
3. Ran the Java application from the Eclipse IDE on my client computer.

The application worked well and submitted the MR job to the YARN cluster successfully when using hiveConf.set("hive.execution.engine", "mr"), but it threw exceptions with the Spark engine.

Finally, I traced the Hive source code and came to this conclusion: in my situation, the SparkClientImpl class generates the spark-submit shell command and executes it. The command sets --class to RemoteDriver.class.getName() and the jar to SparkContext.jarOfClass(this.getClass()).get(), which is why my application threw the exception.

Is that right? And what can I do to run the application with the Spark engine successfully from my client computer? Thanks a lot!
Java application code:

    import org.apache.hadoop.hive.cli.CliSessionState;
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.ql.CommandNeedRetryException;
    import org.apache.hadoop.hive.ql.Driver;
    import org.apache.hadoop.hive.ql.processors.CommandProcessorResponse;
    import org.apache.hadoop.hive.ql.session.SessionState;

    public class TestHiveDriver {

        private static HiveConf hiveConf;
        private static Driver driver;
        private static CliSessionState ss;

        public static void main(String[] args) {
            String sql = "select * from hadoop0263_0 as a join hadoop0263_0 as b on (a.key = b.key)";

            ss = new CliSessionState(new HiveConf(SessionState.class));
            hiveConf = new HiveConf(Driver.class);

            // HDFS and YARN endpoints
            hiveConf.set("fs.default.name", "hdfs://storm0:9000");
            hiveConf.set("yarn.resourcemanager.address", "storm0:8032");
            hiveConf.set("yarn.resourcemanager.scheduler.address", "storm0:8030");
            hiveConf.set("yarn.resourcemanager.resource-tracker.address", "storm0:8031");
            hiveConf.set("yarn.resourcemanager.admin.address", "storm0:8033");
            hiveConf.set("mapreduce.framework.name", "yarn");
            hiveConf.set("mapreduce.jobhistory.address", "storm0:10020");

            // Metastore connection
            hiveConf.set("javax.jdo.option.ConnectionURL", "jdbc:mysql://storm0:3306/stana_metastore");
            hiveConf.set("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver");
            hiveConf.set("javax.jdo.option.ConnectionUserName", "root");
            hiveConf.set("javax.jdo.option.ConnectionPassword", "123456");

            // Spark engine settings
            hiveConf.setBoolean("hive.auto.convert.join", false);
            hiveConf.set("spark.yarn.jar", "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar");
            hiveConf.set("spark.home", "target/spark");
            hiveConf.set("hive.execution.engine", "spark");
            hiveConf.set("hive.dbname", "default");

            driver = new Driver(hiveConf);
            SessionState.start(hiveConf);

            CommandProcessorResponse res = null;
            try {
                res = driver.run(sql);
            } catch (CommandNeedRetryException e) {
                e.printStackTrace();
            }

            System.out.println("Response Code:" + res.getResponseCode());
            System.out.println("Error Message:" + res.getErrorMessage());
            System.out.println("SQL State:" + res.getSQLState());
        }
    }

Exception from the Spark engine:

16/03/10 18:32:58 INFO SparkClientImpl: Running client driver with argv: /Volumes/Sdhd/Documents/project/island/java/apache/hive-200-test/hive-release-2.0.0/itests/hive-unit/target/spark/bin/spark-submit --properties-file /var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-submit.7697089826296920539.properties --class org.apache.hive.spark.client.RemoteDriver /Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar --remote-host MacBook-Pro.local --remote-port 51331 --conf hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256
16/03/10 18:33:09 INFO SparkClientImpl: 16/03/10 18:33:09 INFO Client:
16/03/10 18:33:09 INFO SparkClientImpl:      client token: N/A
16/03/10 18:33:09 INFO SparkClientImpl:      diagnostics: N/A
16/03/10 18:33:09 INFO SparkClientImpl:      ApplicationMaster host: N/A
16/03/10 18:33:09 INFO SparkClientImpl:      ApplicationMaster RPC port: -1
16/03/10 18:33:09 INFO SparkClientImpl:      queue: default
16/03/10 18:33:09 INFO SparkClientImpl:      start time: 1457180833494
16/03/10 18:33:09 INFO SparkClientImpl:      final status: UNDEFINED
16/03/10 18:33:09 INFO SparkClientImpl:      tracking URL: http://storm0:8088/proxy/application_1457002628102_0043/
16/03/10 18:33:09 INFO SparkClientImpl:      user: stana
16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client: Application report for application_1457002628102_0043 (state: FAILED)
16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client:
16/03/10 18:33:10 INFO SparkClientImpl:      client token: N/A
16/03/10 18:33:10 INFO SparkClientImpl:      diagnostics: Application application_1457002628102_0043 failed 1 times due to AM Container for appattempt_1457002628102_0043_000001 exited with exitCode: -1000
16/03/10 18:33:10 INFO SparkClientImpl: For more detailed output, check application tracking page: http://storm0:8088/proxy/application_1457002628102_0043/ Then, click on links to logs of each attempt.
16/03/10 18:33:10 INFO SparkClientImpl: Diagnostics: java.io.FileNotFoundException: File file:/Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar does not exist
16/03/10 18:33:10 INFO SparkClientImpl: Failing this attempt. Failing the application.
16/03/10 18:33:10 INFO SparkClientImpl:      ApplicationMaster host: N/A
16/03/10 18:33:10 INFO SparkClientImpl:      ApplicationMaster RPC port: -1
16/03/10 18:33:10 INFO SparkClientImpl:      queue: default
16/03/10 18:33:10 INFO SparkClientImpl:      start time: 1457180833494
16/03/10 18:33:10 INFO SparkClientImpl:      final status: FAILED
16/03/10 18:33:10 INFO SparkClientImpl:      tracking URL: http://storm0:8088/cluster/app/application_1457002628102_0043
16/03/10 18:33:10 INFO SparkClientImpl:      user: stana
16/03/10 18:33:10 INFO SparkClientImpl: Exception in thread "main" org.apache.spark.SparkException: Application application_1457002628102_0043 finished with failed status
16/03/10 18:33:10 INFO SparkClientImpl:     at org.apache.spark.deploy.yarn.Client.run(Client.scala:920)
16/03/10 18:33:10 INFO SparkClientImpl:     at org.apache.spark.deploy.yarn.Client$.main(Client.scala:966)
16/03/10 18:33:10 INFO SparkClientImpl:     at org.apache.spark.deploy.yarn.Client.main(Client.scala)
16/03/10 18:33:10 INFO SparkClientImpl:     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
16/03/10 18:33:10 INFO SparkClientImpl:     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
16/03/10 18:33:10 INFO SparkClientImpl:     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16/03/10 18:33:10 INFO SparkClientImpl:     at java.lang.reflect.Method.invoke(Method.java:606)
16/03/10 18:33:10 INFO SparkClientImpl:     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
16/03/10 18:33:10 INFO SparkClientImpl:     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
16/03/10 18:33:10 INFO SparkClientImpl:     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
16/03/10 18:33:10 INFO SparkClientImpl:     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
16/03/10 18:33:10 INFO SparkClientImpl:     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO ShutdownHookManager: Shutdown hook called
16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO ShutdownHookManager: Deleting directory /private/var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-5b92ce20-b6f8-4832-8b15-5e98bd0e0705
16/03/10 18:33:10 WARN SparkClientImpl: Error while waiting for client to connect.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child process exited before connecting back
    at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:98) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:117) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479) [hive-exec-2.0.0.jar:?]
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319) [hive-exec-2.0.0.jar:?]
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255) [hive-exec-2.0.0.jar:?]
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301) [hive-exec-2.0.0.jar:?]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184) [hive-exec-2.0.0.jar:?]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172) [hive-exec-2.0.0.jar:?]
    at org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41) [test-classes/:?]
Caused by: java.lang.RuntimeException: Cancel client '5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child process exited before connecting back
    at org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:179) ~[hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450) ~[hive-exec-2.0.0.jar:2.0.0]
    at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_67]
16/03/10 18:33:10 WARN SparkClientImpl: Child process exited with code 1.
FAILED: SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
16/03/10 18:33:10 ERROR Driver: FAILED: SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
    at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:121)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
    at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181)
    at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119)
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
    at org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41)
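For completeness, a minimal sketch of the workaround Xuefu suggested above, assuming a Spark installation is available on the client machine (the installation path and the helper class SparkHomeSetup are hypothetical examples):

    import org.apache.hadoop.hive.conf.HiveConf;

    public class SparkHomeSetup {
        // Point Hive on Spark at a local Spark installation before the Driver
        // and SessionState are created. Per Xuefu's reply, either the SPARK_HOME
        // environment variable or the spark.home JVM property works.
        public static HiveConf configure(String sparkHome) {
            System.setProperty("spark.home", sparkHome);
            HiveConf conf = new HiveConf();
            conf.set("spark.home", sparkHome);
            return conf;
        }

        public static void main(String[] args) {
            // Hypothetical path; adjust to your environment.
            HiveConf conf = configure("/opt/spark-1.4.1-bin-hadoop2.6");
            System.out.println(conf.get("spark.home"));
        }
    }

Note that this addresses only the 'SPARK_HOME not found' error; it does not make the client-local hive-exec-2.0.0.jar visible to a remote YARN cluster, which is the open question in this thread.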
