Hi,
I am not having much luck making Hive run on Spark!

I tried to build Spark 1.5.2 without the Hive jars. The build worked, but I could not run Hive SQL on Spark.

I saw this link:

http://stackoverflow.com/questions/33233431/hive-on-spark-java-lang-noclassdeffounderror-org-apache-hive-spark-client-job

It has a comment stating:

"This issue was solved by moving to spark 1.3.0 version and rebuilding it without hive."
(Arvindkumar <http://stackoverflow.com/users/647955/arvindkumar>, Oct 27, <http://stackoverflow.com/questions/33233431/hive-on-spark-java-lang-noclassdeffounderror-org-apache-hive-spark-client-job#comment54530821_33233431>)

So I downloaded the Spark 1.3 source and tried to build it myself, using the following command:

hduser@rhes564::/usr/lib/spark-1.3.0> build/mvn -X -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package

It comes back OK, I believe:

[DEBUG] Scalastyle:check no violations found
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 3.518 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 9.662 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 5.272 s]
[INFO] Spark Project Core ................................. SUCCESS [02:47 min]
[INFO] Spark Project Bagel ................................ SUCCESS [ 6.522 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 18.118 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 31.471 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [ 36.314 s]
[INFO] Spark Project SQL .................................. SUCCESS [ 44.442 s]
[INFO] Spark Project ML Library ........................... SUCCESS [ 53.826 s]
[INFO] Spark Project Tools ................................ SUCCESS [ 2.879 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 34.870 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 10.789 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 11.262 s]
[INFO] Spark Project Assembly ............................. SUCCESS [01:44 min]
[INFO] Spark Project External Twitter ..................... SUCCESS [ 6.754 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 5.013 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 8.276 s]
[INFO] Spark Project External MQTT ........................ SUCCESS [ 6.630 s]
[INFO] Spark Project External ZeroMQ ...................... SUCCESS [ 6.293 s]
[INFO] Spark Project External Kafka ....................... SUCCESS [ 10.764 s]
[INFO] Spark Project Examples ............................. SUCCESS [01:58 min]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 6.819 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 35.834 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12:27 min
[INFO] Finished at: 2015-11-26T15:46:05+00:00
[INFO] Final Memory: 82M/691M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "hadoop-2.6" could not be activated because it does not exist.

Now I try to build a tar file:

./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"

First I get this:

***NOTE***: JAVA_HOME is not set to a JDK 6 installation. The resulting distribution may not work well with PySpark and will not run with Java 6 (See SPARK-1703 and SPARK-1911).
This test can be disabled by adding --skip-java-test.
Output from 'java -version' was:
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
Would you like to continue anyways? [y,n]:

Then I get the following error:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 3.534 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 9.733 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 4.987 s]
[INFO] Spark Project Core ................................. SUCCESS [02:43 min]
[INFO] Spark Project Bagel ................................ SUCCESS [ 5.717 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 17.316 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 32.133 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [ 36.060 s]
[INFO] Spark Project SQL .................................. SUCCESS [ 41.609 s]
[INFO] Spark Project ML Library ........................... SUCCESS [ 53.484 s]
[INFO] Spark Project Tools ................................ SUCCESS [ 2.323 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 33.704 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 9.625 s]
[INFO] Spark Project YARN ................................. FAILURE [ 0.035 s]
[INFO] Spark Project Assembly ............................. SKIPPED
[INFO] Spark Project External Twitter ..................... SKIPPED
[INFO] Spark Project External Flume Sink .................. SKIPPED
[INFO] Spark Project External Flume ....................... SKIPPED
[INFO] Spark Project External MQTT ........................ SKIPPED
[INFO] Spark Project External ZeroMQ ...................... SKIPPED
[INFO] Spark Project External Kafka ....................... SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Project YARN Shuffle Service ................. SKIPPED
[INFO] Spark Project External Kafka Assembly .............. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:54 min
[INFO] Finished at: 2015-11-26T16:29:02+00:00
[INFO] Final Memory: 52M/475M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "hadoop-2.6" could not be activated because it does not exist.
[ERROR] Failed to execute goal on project spark-yarn_2.10: Could not resolve dependencies for project org.apache.spark:spark-yarn_2.10:jar:1.3.0: The following artifacts could not be resolved: org.apache.hadoop:hadoop-yarn-api:jar:1.0.4, org.apache.hadoop:hadoop-yarn-common:jar:1.0.4, org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:1.0.4, org.apache.hadoop:hadoop-yarn-client:jar:1.0.4, org.apache.hadoop:hadoop-yarn-server-tests:jar:tests:1.0.4: Failure to find org.apache.hadoop:hadoop-yarn-api:jar:1.0.4 in https://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :spark-yarn_2.10

Mich Talebzadeh

Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.
co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4
Publications due shortly:
Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

http://talebzadehmich.wordpress.com

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.

From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: 25 November 2015 23:42
To: u...@hive.apache.org
Subject: RE: hive1.2.1 on spark connection time out

Yes I sorted out the issue.
It was using an older version of Maven when I was running as hduser (not root):

hduser@rhes564::/usr/lib/spark> build/mvn -X -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package > log

Using `mvn` from path: /usr/local/apache-maven/apache-maven-3.3.1/bin/mvn

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message:
Detected Maven Version: 3.3.1 is not in the allowed range 3.3.3.

I changed the Maven version in the environment file to use Maven 3.3.3 for user hduser, ran the command again, and it worked:

build/mvn -X -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package > log

Using `mvn` from path: /usr/local/apache-maven/apache-maven-3.3.3/bin/mvn

……

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 39.937 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 44.718 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 11.294 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 4.720 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 10.705 s]
[INFO] Spark Project Core ................................. SUCCESS [02:52 min]
[INFO] Spark Project Bagel ................................ SUCCESS [ 5.937 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 15.977 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 36.453 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [ 54.381 s]
[INFO] Spark Project SQL .................................. SUCCESS [01:07 min]
[INFO] Spark Project ML Library ........................... SUCCESS [01:22 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 2.493 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 58.496 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 9.278 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 12.424 s]
[INFO] Spark Project Assembly ............................. SUCCESS [01:51 min]
[INFO] Spark Project External Twitter ..................... SUCCESS [ 7.604 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 7.580 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 9.526 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 3.163 s]
[INFO] Spark Project External MQTT ........................ SUCCESS [ 31.774 s]
[INFO] Spark Project External MQTT Assembly ............... SUCCESS [ 8.698 s]
[INFO] Spark Project External ZeroMQ ...................... SUCCESS [ 6.992 s]
[INFO] Spark Project External Kafka ....................... SUCCESS [ 11.487 s]
[INFO] Spark Project Examples ............................. SUCCESS [02:12 min]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 9.046 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 6.097 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16:16 min
[INFO] Finished at: 2015-11-25T23:34:35+00:00
[INFO] Final Memory: 90M/1312M
[INFO] ------------------------------------------------------------------------

Mich Talebzadeh

From: Xuefu Zhang [mailto:xzh...@cloudera.com]
Sent: 25 November 2015 23:33
To: u...@hive.apache.org
Subject: Re: hive1.2.1 on spark connection time out

There are usually a few more messages before this, but after "spark-submit", in hive.log. Do you have spark.home set?

On Sun, Nov 22, 2015 at 10:17 PM, zhangjp <smart...@hotmail.com> wrote:

I'm using Hive 1.2.1. I want to run in Hive on Spark mode, but there are some issues. I have set spark.master=yarn-client; the Spark version is 1.4.1, and running spark-shell --master yarn-client works without problems.

Log:

2015-11-23 13:54:56,068 ERROR [main]: spark.SparkTask (SessionState.java:printError(960)) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:57)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:112)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:101)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection.
    at com.google.common.base.Throwables.propagate(Throwables.java:156)
    at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:109)
    at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:90)
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:65)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
    ... 21 more
Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection.
    at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
    at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:99)
    ... 25 more
Caused by: java.util.concurrent.TimeoutException: Timed out waiting for client connection.
    at org.apache.hive.spark.client.rpc.RpcServer$2.run(RpcServer.java:141)
    at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:123)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
    at java.lang.Thread.run(Thread.java:745)
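
Xuefu's reply in this thread points at the messages that hive.log records after "spark-submit", since those usually say more than the timeout trace does. One way to pull them out is sketched below. It is only a rough helper under stated assumptions: the function name show_after_spark_submit is made up for illustration, and the log path in the usage comment is a guess at a typical location, not something stated in this thread.

```shell
# Hypothetical helper (name is an assumption, not part of Hive or Spark):
# print hive.log from the LAST line mentioning "spark-submit" to the end
# of the file, so the messages that follow the launch attempt are easy to read.
show_after_spark_submit() {
  log_file="$1"
  # On every "spark-submit" match, restart the buffer so only the most
  # recent launch attempt is kept; print whatever was buffered at the end.
  awk '
    /spark-submit/ { buf = ""; found = 1 }
    found          { buf = buf $0 "\n" }
    END            { printf "%s", buf }
  ' "$log_file"
}

# Usage (the path is an assumption; adjust to wherever your hive.log lives):
#   show_after_spark_submit /tmp/hduser/hive.log
```

If nothing in the file mentions "spark-submit", the function prints nothing, which by itself suggests the launch never got that far.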