These missing classes are in the Hadoop jars. If you have HADOOP_HOME set, they should be on the Hive classpath.
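(A quick way to sanity-check this, a sketch with the parcel path taken from the log below; the exact jar name is illustrative and may differ on your nodes:

    # Is HADOOP_HOME set in the environment HiveServer2 runs under?
    echo "$HADOOP_HOME"

    # Does the classpath the hadoop scripts assemble include hadoop-common,
    # the jar that contains org.apache.hadoop.fs.FSDataInputStream?
    hadoop classpath | tr ':' '\n' | grep hadoop-common

    # Is the class actually inside that jar? (jar name illustrative,
    # based on the CDH 5.4.3 parcel layout shown in the log below)
    unzip -l /export/hdb3/data/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/jars/hadoop-common-2.6.0-cdh5.4.3.jar \
        | grep org/apache/hadoop/fs/FSDataInputStream
)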
--Xuefu

On Thu, Dec 17, 2015 at 10:12 AM, Ophir Etzion <op...@foursquare.com> wrote:

> It seems the problem is that the Spark client needs FSDataInputStream, but that class is not included in the hive-exec-1.1.0-cdh5.4.3.jar that is passed on the classpath. I need to look further at spark-submit / org.apache.spark.deploy to see whether there is a way to include more jars.
>
> 2015-12-17 17:34:01,679 INFO org.apache.hive.spark.client.SparkClientImpl: Running client driver with argv: /export/hdb3/data/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/lib/spark/bin/spark-submit --executor-cores 1 --executor-memory 268435456 --proxy-user anonymous --properties-file /tmp/spark-submit.1508744664719491459.properties --class org.apache.hive.spark.client.RemoteDriver /export/hdb3/data/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/jars/hive-exec-1.1.0-cdh5.4.3.jar --remote-host ezaq6.prod.foursquare.com --remote-port 44306 --conf hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at org.apache.spark.deploy.SparkSubmitDriverBootstrapper$.main(SparkSubmitDriverBootstrapper.scala:71)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at org.apache.spark.deploy.SparkSubmitDriverBootstrapper.main(SparkSubmitDriverBootstrapper.scala)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at java.security.AccessController.doPrivileged(Native Method)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     ... 2 more
> 2015-12-17 17:34:02,438 WARN org.apache.hive.spark.client.SparkClientImpl: Child process exited with code 1.
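(On the "way to include more jars" question: a sketch of the standard Spark 1.x knobs, with illustrative paths and jar names. Note that the NoClassDefFoundError above is raised in the launcher process, SparkSubmitDriverBootstrapper, before the driver even starts, so the Hadoop classpath likely needs to be visible to spark-submit's own JVM, not just the application:

    # 1) Make the full Hadoop classpath visible to the JVMs spark-submit
    #    starts; typically set in $SPARK_HOME/conf/spark-env.sh. Support
    #    for this variable depends on the Spark build (CDH and the
    #    "Hadoop free" upstream builds honor it):
    export SPARK_DIST_CLASSPATH="$(hadoop classpath)"

    # 2) Driver/executor classpath entries can also go into the properties
    #    file Hive hands to spark-submit (see --properties-file above):
    #      spark.driver.extraClassPath=/export/hdb3/data/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/jars/*
    #      spark.executor.extraClassPath=/export/hdb3/data/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/jars/*

    # 3) Individual jars can be shipped per job with a comma-separated
    #    --jars list on the spark-submit command line (jar name illustrative):
    #      --jars /export/hdb3/data/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/jars/hadoop-common-2.6.0-cdh5.4.3.jar
)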
> On Tue, Dec 15, 2015 at 11:15 PM, Xuefu Zhang <xzh...@cloudera.com> wrote:
>
>> As to the Spark versions that are supported: Spark made incompatible API changes in 1.5, and that is why Hive 1.1.0 does not work with Spark 1.5. However, the latest Hive in master or branch-1 should work with Spark 1.5.
>>
>> Also, later CDH 5.4.x versions already support Spark 1.5. CDH 5.7, which is coming soon, will support Spark 1.6.
>>
>> --Xuefu
>>
>> On Tue, Dec 15, 2015 at 3:50 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>>
>>> To answer your point:
>>>
>>> "why would spark 1.5.2 specifically not work with hive?"
>>>
>>> Because I tried Spark 1.5.2 and it did not work; unfortunately the only version that seems to work (albeit with some messing around) is Spark 1.3.1.
>>>
>>> Look at the threads on "Managed to make Hive run on Spark engine" in user@hive.apache.org.
>>>
>>> HTH,
>>>
>>> Mich Talebzadeh
>>>
>>> From: Ophir Etzion [mailto:op...@foursquare.com]
>>> Sent: 15 December 2015 22:42
>>> To: user@hive.apache.org
>>> Cc: u...@spark.apache.org
>>> Subject: Re: Hive on Spark - Error: Child process exited before connecting back
>>>
>>> Hi,
>>>
>>> The versions are Spark 1.3.0 and Hive 1.1.0, as packaged in Cloudera 5.4.3.
>>>
>>> I find it odd that this would work only with the versions you mentioned, as there is documentation (not good documentation, but still) on how to do it with Cloudera, which packages different versions. Thanks for the answer though. Why would Spark 1.5.2 specifically not work with Hive?
>>>
>>> Ophir
>>>
>>> On Tue, Dec 15, 2015 at 5:33 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>>>
>>> Hi,
>>>
>>> The only combination I have managed to run Hive with the Spark engine on is Spark 1.3.1 with Hive 1.2.1.
>>>
>>> Can you confirm the version of Spark you are running?
>>>
>>> FYI, Spark 1.5.2 will not work with Hive.
>>>
>>> HTH
>>>
>>> Mich Talebzadeh
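(Since the working combinations above differ only in which Spark build Hive launches, it is worth noting that Hive locates that build via the SPARK_HOME environment variable, or the spark.home setting mentioned in the Getting Started guide. A sketch, with an illustrative install path:

    # Point Hive at a specific Spark build (path illustrative)
    export SPARK_HOME=/opt/spark-1.3.1-bin-hadoop2.6

    # or, per session, from the Hive CLI / beeline:
    #   set spark.home=/opt/spark-1.3.1-bin-hadoop2.6;
    #   set hive.execution.engine=spark;
)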
>>> From: Ophir Etzion [mailto:op...@foursquare.com]
>>> Sent: 15 December 2015 22:27
>>> To: u...@spark.apache.org; user@hive.apache.org
>>> Subject: Hive on Spark - Error: Child process exited before connecting back
>>>
>>> Hi,
>>>
>>> When trying to run Hive on Spark on CDH 5.4.3, I get the following error when running a simple query through Spark. I have tried setting everything described in https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started as well as what the CDH documentation recommends.
>>>
>>> Has anyone encountered this as well? (Searching for it didn't help much.)
>>>
>>> The error:
>>>
>>> ERROR : Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
>>> org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
>>>     at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:57)
>>>     at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
>>>     at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:120)
>>>     at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:97)
>>>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>>>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>>>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1640)
>>>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1399)
>>>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
>>>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>>>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044)
>>>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:144)
>>>     at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
>>>     at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>>     at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '2b2d7314-e0cc-4933-82a1-992a3299d109'. Error: Child process exited before connecting back
>>>     at com.google.common.base.Throwables.propagate(Throwables.java:156)
>>>     at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:109)
>>>     at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
>>>     at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:91)
>>>     at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:65)
>>>     at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
>>>     ... 22 more
>>> Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '2b2d7314-e0cc-4933-82a1-992a3299d109'. Error: Child process exited before connecting back
>>>     at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
>>>     at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:99)
>>>     ... 26 more
>>> Caused by: java.lang.RuntimeException: Cancel client '2b2d7314-e0cc-4933-82a1-992a3299d109'. Error: Child process exited before connecting back
>>>     at org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:179)
>>>     at org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:427)
>>>     ... 1 more
>>>
>>> (The same ERROR line and stack trace are then printed a second time.)
>>>
>>> Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=1)
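(For reference, the Getting Started page linked in the original message boils down to a handful of session settings; a minimal smoke test might look like this sketch, where the master URL, executor memory, and table name are illustrative placeholders:

    # Run a trivial query on the Spark engine with the baseline settings
    # from the Hive on Spark Getting Started page:
    hive -e "
        set hive.execution.engine=spark;
        set spark.master=yarn-client;
        set spark.executor.memory=512m;
        set spark.serializer=org.apache.spark.serializer.KryoSerializer;
        select count(*) from some_table;
    "
)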