building spark from 1.3 release without Hive

Mich Talebzadeh Thu, 26 Nov 2015 08:31:59 -0800

Hi,


I am not having much luck making Hive run on Spark! I tried to build Spark 
1.5.2 without the Hive jars. The build worked, but I could not run Hive SQL on Spark.

 

I saw in this link:

 

http://stackoverflow.com/questions/33233431/hive-on-spark-java-lang-noclassdeffounderror-org-apache-hive-spark-client-job

 

stating that

 

“This issue was solved by moving to spark 1.3.0 version and rebuilding it 
without hive.” – Arvindkumar <http://stackoverflow.com/users/647955/arvindkumar>, Oct 27
<http://stackoverflow.com/questions/33233431/hive-on-spark-java-lang-noclassdeffounderror-org-apache-hive-spark-client-job#comment54530821_33233431>

 

So I downloaded the Spark 1.3.0 source and tried to build it myself, using the 
following command:

 

hduser@rhes564::/usr/lib/spark-1.3.0> build/mvn -X -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package

 

The build comes back OK, I believe, although Maven warns at the end that the 
hadoop-2.6 profile does not exist:

 

[DEBUG] Scalastyle:check no violations found

[INFO] ------------------------------------------------------------------------

[INFO] Reactor Summary:

[INFO]

[INFO] Spark Project Parent POM ........................... SUCCESS [  3.518 s]

[INFO] Spark Project Networking ........................... SUCCESS [  9.662 s]

[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  5.272 s]

[INFO] Spark Project Core ................................. SUCCESS [02:47 min]

[INFO] Spark Project Bagel ................................ SUCCESS [  6.522 s]

[INFO] Spark Project GraphX ............................... SUCCESS [ 18.118 s]

[INFO] Spark Project Streaming ............................ SUCCESS [ 31.471 s]

[INFO] Spark Project Catalyst ............................. SUCCESS [ 36.314 s]

[INFO] Spark Project SQL .................................. SUCCESS [ 44.442 s]

[INFO] Spark Project ML Library ........................... SUCCESS [ 53.826 s]

[INFO] Spark Project Tools ................................ SUCCESS [  2.879 s]

[INFO] Spark Project Hive ................................. SUCCESS [ 34.870 s]

[INFO] Spark Project REPL ................................. SUCCESS [ 10.789 s]

[INFO] Spark Project YARN ................................. SUCCESS [ 11.262 s]

[INFO] Spark Project Assembly ............................. SUCCESS [01:44 min]

[INFO] Spark Project External Twitter ..................... SUCCESS [  6.754 s]

[INFO] Spark Project External Flume Sink .................. SUCCESS [  5.013 s]

[INFO] Spark Project External Flume ....................... SUCCESS [  8.276 s]

[INFO] Spark Project External MQTT ........................ SUCCESS [  6.630 s]

[INFO] Spark Project External ZeroMQ ...................... SUCCESS [  6.293 s]

[INFO] Spark Project External Kafka ....................... SUCCESS [ 10.764 s]

[INFO] Spark Project Examples ............................. SUCCESS [01:58 min]

[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [  6.819 s]

[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 35.834 s]

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

[INFO] ------------------------------------------------------------------------

[INFO] Total time: 12:27 min

[INFO] Finished at: 2015-11-26T15:46:05+00:00

[INFO] Final Memory: 82M/691M

[INFO] ------------------------------------------------------------------------

[WARNING] The requested profile "hadoop-2.6" could not be activated because it 
does not exist.

 

 

Now I try to build a tar file:

./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"

 

First I get this

 

***NOTE***: JAVA_HOME is not set to a JDK 6 installation. The resulting

            distribution may not work well with PySpark and will not run

            with Java 6 (See SPARK-1703 and SPARK-1911).

            This test can be disabled by adding --skip-java-test.

Output from 'java -version' was:

java version "1.7.0_25"

Java(TM) SE Runtime Environment (build 1.7.0_25-b15)

Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)

Would you like to continue anyways? [y,n]:

 

Then I get the following error

 

[INFO] ------------------------------------------------------------------------

[INFO] Reactor Summary:

[INFO]

[INFO] Spark Project Parent POM ........................... SUCCESS [  3.534 s]

[INFO] Spark Project Networking ........................... SUCCESS [  9.733 s]

[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  4.987 s]

[INFO] Spark Project Core ................................. SUCCESS [02:43 min]

[INFO] Spark Project Bagel ................................ SUCCESS [  5.717 s]

[INFO] Spark Project GraphX ............................... SUCCESS [ 17.316 s]

[INFO] Spark Project Streaming ............................ SUCCESS [ 32.133 s]

[INFO] Spark Project Catalyst ............................. SUCCESS [ 36.060 s]

[INFO] Spark Project SQL .................................. SUCCESS [ 41.609 s]

[INFO] Spark Project ML Library ........................... SUCCESS [ 53.484 s]

[INFO] Spark Project Tools ................................ SUCCESS [  2.323 s]

[INFO] Spark Project Hive ................................. SUCCESS [ 33.704 s]

[INFO] Spark Project REPL ................................. SUCCESS [  9.625 s]

[INFO] Spark Project YARN ................................. FAILURE [  0.035 s]

[INFO] Spark Project Assembly ............................. SKIPPED

[INFO] Spark Project External Twitter ..................... SKIPPED

[INFO] Spark Project External Flume Sink .................. SKIPPED

[INFO] Spark Project External Flume ....................... SKIPPED

[INFO] Spark Project External MQTT ........................ SKIPPED

[INFO] Spark Project External ZeroMQ ...................... SKIPPED

[INFO] Spark Project External Kafka ....................... SKIPPED

[INFO] Spark Project Examples ............................. SKIPPED

[INFO] Spark Project YARN Shuffle Service ................. SKIPPED

[INFO] Spark Project External Kafka Assembly .............. SKIPPED

[INFO] ------------------------------------------------------------------------

[INFO] BUILD FAILURE

[INFO] ------------------------------------------------------------------------

[INFO] Total time: 06:54 min

[INFO] Finished at: 2015-11-26T16:29:02+00:00

[INFO] Final Memory: 52M/475M

[INFO] ------------------------------------------------------------------------

[WARNING] The requested profile "hadoop-2.6" could not be activated because it does not exist.
[ERROR] Failed to execute goal on project spark-yarn_2.10: Could not resolve dependencies for project org.apache.spark:spark-yarn_2.10:jar:1.3.0: The following artifacts could not be resolved: org.apache.hadoop:hadoop-yarn-api:jar:1.0.4, org.apache.hadoop:hadoop-yarn-common:jar:1.0.4, org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:1.0.4, org.apache.hadoop:hadoop-yarn-client:jar:1.0.4, org.apache.hadoop:hadoop-yarn-server-tests:jar:tests:1.0.4: Failure to find org.apache.hadoop:hadoop-yarn-api:jar:1.0.4 in https://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :spark-yarn_2.10
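
A possible reading of the failure above: the "hadoop-2.6" profile is reported as missing, and the unresolved YARN artifacts carry version 1.0.4, which suggests the build fell back to a default Hadoop version because this command, unlike the earlier build/mvn run, does not pass -Dhadoop.version. One way to check which profiles a given Spark source tree really defines is to read them out of its pom.xml. The sketch below does that with awk; the helper names list_profiles and has_profile are made up for illustration, and it assumes each profile's <id> is the first <id> element after its <profile> tag and sits on one line.

```shell
#!/bin/sh
# list_profiles POM
# Print the id of every <profile> block in a Maven POM.
# Assumption: the profile id is the first <id> element after each
# <profile> tag and is written on a single line.
list_profiles() {
  awk '
    /<profile>/ { want = 1 }
    want && /<id>/ {
      gsub(/.*<id>|<\/id>.*/, "")   # keep only the text between the tags
      print
      want = 0                      # ignore nested ids (repositories, plugins)
    }
  ' "$1"
}

# has_profile POM NAME
# Succeed only if NAME is exactly one of the profile ids in POM.
has_profile() {
  list_profiles "$1" | grep -qxF -- "$2"
}
```

Run from the top of the source tree, something like `has_profile pom.xml hadoop-2.6 || list_profiles pom.xml` would show the available names when the requested one is absent (Maven's own `mvn help:all-profiles` goal should give similar information). If hadoop-2.6 is indeed not defined in 1.3.0, picking the newest Hadoop profile that is listed and adding -Dhadoop.version=2.6.0 to the make-distribution.sh arguments may be worth trying; that is an untested suggestion, not a confirmed fix.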

 

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", 
ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 
978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Ltd, its subsidiaries nor their employees accept any 
responsibility.

 

From: Mich Talebzadeh [mailto:m...@peridale.co.uk] 
Sent: 25 November 2015 23:42
To: u...@hive.apache.org
Subject: RE: hive1.2.1 on spark connection time out

 

Yes, I sorted out the issue. The build was picking up an older version of Maven 
when I ran it as hduser (not root).

 

 

hduser@rhes564::/usr/lib/spark> build/mvn -X -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package > log

Using `mvn` from path: /usr/local/apache-maven/apache-maven-3.3.1/bin/mvn

 

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message:

Detected Maven Version: 3.3.1 is not in the allowed range 3.3.3.
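
In the Maven enforcer's version notation a bare "3.3.3" means "3.3.3 or newer", so the message above says the detected 3.3.1 is below the minimum. A small pre-flight check along these lines can catch that before a long build starts. This is only a sketch: maven_version and version_ge are hypothetical helper names, the 3.3.3 minimum is taken from the log above, and the comparison relies on `sort -V` as provided by GNU coreutils.

```shell
#!/bin/sh
# maven_version
# Read the output of `mvn -version` on stdin and print the version number
# from the line that starts with "Apache Maven".
maven_version() {
  awk '/^Apache Maven / { print $3; exit }'
}

# version_ge A B
# Succeed when version A is greater than or equal to version B.
# Uses version sort, so 3.10.0 is correctly treated as newer than 3.3.3.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge "$(mvn -version | maven_version)" 3.3.3 || echo "Maven too old for this build"` run as the build user (hduser here) would show which Maven that user's environment actually resolves.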

 

I changed the Maven version in the environment file so that user hduser picks up 
the maven-3.3.3 build, ran the command again, and it worked:

 

build/mvn -X -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package > log

Using `mvn` from path: /usr/local/apache-maven/apache-maven-3.3.3/bin/mvn

……

 

[INFO] ------------------------------------------------------------------------

[INFO] Reactor Summary:

[INFO]

[INFO] Spark Project Parent POM ........................... SUCCESS [ 39.937 s]

[INFO] Spark Project Launcher ............................. SUCCESS [ 44.718 s]

[INFO] Spark Project Networking ........................... SUCCESS [ 11.294 s]

[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  4.720 s]

[INFO] Spark Project Unsafe ............................... SUCCESS [ 10.705 s]

[INFO] Spark Project Core ................................. SUCCESS [02:52 min]

[INFO] Spark Project Bagel ................................ SUCCESS [  5.937 s]

[INFO] Spark Project GraphX ............................... SUCCESS [ 15.977 s]

[INFO] Spark Project Streaming ............................ SUCCESS [ 36.453 s]

[INFO] Spark Project Catalyst ............................. SUCCESS [ 54.381 s]

[INFO] Spark Project SQL .................................. SUCCESS [01:07 min]

[INFO] Spark Project ML Library ........................... SUCCESS [01:22 min]

[INFO] Spark Project Tools ................................ SUCCESS [  2.493 s]

[INFO] Spark Project Hive ................................. SUCCESS [ 58.496 s]

[INFO] Spark Project REPL ................................. SUCCESS [  9.278 s]

[INFO] Spark Project YARN ................................. SUCCESS [ 12.424 s]

[INFO] Spark Project Assembly ............................. SUCCESS [01:51 min]

[INFO] Spark Project External Twitter ..................... SUCCESS [  7.604 s]

[INFO] Spark Project External Flume Sink .................. SUCCESS [  7.580 s]

[INFO] Spark Project External Flume ....................... SUCCESS [  9.526 s]

[INFO] Spark Project External Flume Assembly .............. SUCCESS [  3.163 s]

[INFO] Spark Project External MQTT ........................ SUCCESS [ 31.774 s]

[INFO] Spark Project External MQTT Assembly ............... SUCCESS [  8.698 s]

[INFO] Spark Project External ZeroMQ ...................... SUCCESS [  6.992 s]

[INFO] Spark Project External Kafka ....................... SUCCESS [ 11.487 s]

[INFO] Spark Project Examples ............................. SUCCESS [02:12 min]

[INFO] Spark Project External Kafka Assembly .............. SUCCESS [  9.046 s]

[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [  6.097 s]

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

[INFO] ------------------------------------------------------------------------

[INFO] Total time: 16:16 min

[INFO] Finished at: 2015-11-25T23:34:35+00:00

[INFO] Final Memory: 90M/1312M

[INFO] ------------------------------------------------------------------------

 

 

Mich Talebzadeh

 


 

From: Xuefu Zhang [mailto:xzh...@cloudera.com] 
Sent: 25 November 2015 23:33
To: u...@hive.apache.org
Subject: Re: hive1.2.1 on spark connection time out

 

There are usually a few more messages before this one, but after "spark-submit", 
in hive.log. Do you have spark.home set?
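
To pull out the messages referred to here, something like the following could be used. It is a minimal sketch: after_spark_submit is a made-up helper name, and the log location is an assumption (Hive commonly writes hive.log under /tmp/<user>/ unless hive.log.dir is configured otherwise).

```shell
#!/bin/sh
# after_spark_submit LOG [N]
# Print every line of LOG that mentions spark-submit, together with the
# N lines that follow each match (default 20).
after_spark_submit() {
  grep -A "${2:-20}" -- 'spark-submit' "$1"
}
```

A call such as `after_spark_submit "/tmp/$USER/hive.log" 30` would then show what the launched Spark client printed before the timeout. To see whether spark.home is set, running `set spark.home;` at the Hive prompt should print either its value or a note that it is undefined.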

 

On Sun, Nov 22, 2015 at 10:17 PM, zhangjp <smart...@hotmail.com> wrote:

 

I'm using Hive 1.2.1. I want to run Hive in Hive-on-Spark mode, but there are 
some issues.

I have set spark.master=yarn-client;

The Spark version is 1.4.1, and running spark-shell --master yarn-client with it 
works without problems.

 

log

2015-11-23 13:54:56,068 ERROR [main]: spark.SparkTask (SessionState.java:printError(960)) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:57)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:112)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:101)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection.
    at com.google.common.base.Throwables.propagate(Throwables.java:156)
    at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:109)
    at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:90)
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:65)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
    ... 21 more
Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection.
    at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
    at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:99)
    ... 25 more
Caused by: java.util.concurrent.TimeoutException: Timed out waiting for client connection.
    at org.apache.hive.spark.client.rpc.RpcServer$2.run(RpcServer.java:141)
    at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:123)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
    at java.lang.Thread.run(Thread.java:745)
