These missing classes are in the Hadoop jars. If you have HADOOP_HOME set, they should be on the Hive classpath.
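(A quick way to sanity-check this, a sketch with the parcel path taken from the log below; the exact jar name is illustrative and may differ on your nodes:

    # Is HADOOP_HOME set in the environment HiveServer2 runs under?
    echo "$HADOOP_HOME"

    # Does the classpath the hadoop scripts assemble include hadoop-common,
    # the jar that contains org.apache.hadoop.fs.FSDataInputStream?
    hadoop classpath | tr ':' '\n' | grep hadoop-common

    # Is the class actually inside that jar? (jar name illustrative,
    # based on the CDH 5.4.3 parcel layout shown in the log below)
    unzip -l /export/hdb3/data/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/jars/hadoop-common-2.6.0-cdh5.4.3.jar \
        | grep org/apache/hadoop/fs/FSDataInputStream
)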
--Xuefu

On Thu, Dec 17, 2015 at 10:12 AM, Ophir Etzion <op...@foursquare.com> wrote:

> It seems the problem is that the Spark client needs FSDataInputStream, but that class is not included in the hive-exec-1.1.0-cdh5.4.3.jar that is passed on the classpath. I need to look further at spark-submit / org.apache.spark.deploy to see whether there is a way to include more jars.
>
> 2015-12-17 17:34:01,679 INFO org.apache.hive.spark.client.SparkClientImpl: Running client driver with argv: /export/hdb3/data/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/lib/spark/bin/spark-submit --executor-cores 1 --executor-memory 268435456 --proxy-user anonymous --properties-file /tmp/spark-submit.1508744664719491459.properties --class org.apache.hive.spark.client.RemoteDriver /export/hdb3/data/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/jars/hive-exec-1.1.0-cdh5.4.3.jar --remote-host ezaq6.prod.foursquare.com --remote-port 44306 --conf hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at org.apache.spark.deploy.SparkSubmitDriverBootstrapper$.main(SparkSubmitDriverBootstrapper.scala:71)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at org.apache.spark.deploy.SparkSubmitDriverBootstrapper.main(SparkSubmitDriverBootstrapper.scala)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at java.security.AccessController.doPrivileged(Native Method)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl:     ... 2 more
> 2015-12-17 17:34:02,438 WARN org.apache.hive.spark.client.SparkClientImpl: Child process exited with code 1.
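(On the "way to include more jars" question: a sketch of the standard Spark 1.x knobs, with illustrative paths and jar names. Note that the NoClassDefFoundError above is raised in the launcher process, SparkSubmitDriverBootstrapper, before the driver even starts, so the Hadoop classpath likely needs to be visible to spark-submit's own JVM, not just the application:

    # 1) Make the full Hadoop classpath visible to the JVMs spark-submit
    #    starts; typically set in $SPARK_HOME/conf/spark-env.sh. Support
    #    for this variable depends on the Spark build (CDH and the
    #    "Hadoop free" upstream builds honor it):
    export SPARK_DIST_CLASSPATH="$(hadoop classpath)"

    # 2) Driver/executor classpath entries can also go into the properties
    #    file Hive hands to spark-submit (see --properties-file above):
    #      spark.driver.extraClassPath=/export/hdb3/data/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/jars/*
    #      spark.executor.extraClassPath=/export/hdb3/data/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/jars/*

    # 3) Individual jars can be shipped per job with a comma-separated
    #    --jars list on the spark-submit command line (jar name illustrative):
    #      --jars /export/hdb3/data/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/jars/hadoop-common-2.6.0-cdh5.4.3.jar
)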
> On Tue, Dec 15, 2015 at 11:15 PM, Xuefu Zhang <xzh...@cloudera.com> wrote:
>
>> As to the Spark versions that are supported: Spark made incompatible API changes in 1.5, and that is why Hive 1.1.0 does not work with Spark 1.5. However, the latest Hive in master or branch-1 should work with Spark 1.5.
>>
>> Also, later CDH 5.4.x versions already support Spark 1.5. CDH 5.7, which is coming soon, will support Spark 1.6.
>>
>> --Xuefu
>>
>> On Tue, Dec 15, 2015 at 3:50 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>>
>>> To answer your point:
>>>
>>> "why would spark 1.5.2 specifically not work with hive?"
>>>
>>> Because I tried Spark 1.5.2 and it did not work; unfortunately the only version that seems to work (albeit with some messing around) is Spark 1.3.1.
>>>
>>> Look at the threads on "Managed to make Hive run on Spark engine" in user@hive.apache.org.
>>>
>>> HTH,
>>>
>>> Mich Talebzadeh
>>>
>>> From: Ophir Etzion [mailto:op...@foursquare.com]
>>> Sent: 15 December 2015 22:42
>>> To: user@hive.apache.org
>>> Cc: u...@spark.apache.org
>>> Subject: Re: Hive on Spark - Error: Child process exited before connecting back
>>>
>>> Hi,
>>>
>>> The versions are Spark 1.3.0 and Hive 1.1.0, as packaged in Cloudera 5.4.3.
>>>
>>> I find it odd that this would work only with the versions you mentioned, as there is documentation (not good documentation, but still) on how to do it with Cloudera, which packages different versions. Thanks for the answer though. Why would Spark 1.5.2 specifically not work with Hive?
>>>
>>> Ophir
>>>
>>> On Tue, Dec 15, 2015 at 5:33 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>>>
>>> Hi,
>>>
>>> The only combination I have managed to run Hive with the Spark engine on is Spark 1.3.1 with Hive 1.2.1.
>>>
>>> Can you confirm the version of Spark you are running?
>>>
>>> FYI, Spark 1.5.2 will not work with Hive.
>>>
>>> HTH
>>>
>>> Mich Talebzadeh
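(Since the working combinations above differ only in which Spark build Hive launches, it is worth noting that Hive locates that build via the SPARK_HOME environment variable, or the spark.home setting mentioned in the Getting Started guide. A sketch, with an illustrative install path:

    # Point Hive at a specific Spark build (path illustrative)
    export SPARK_HOME=/opt/spark-1.3.1-bin-hadoop2.6

    # or, per session, from the Hive CLI / beeline:
    #   set spark.home=/opt/spark-1.3.1-bin-hadoop2.6;
    #   set hive.execution.engine=spark;
)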
>>> From: Ophir Etzion [mailto:op...@foursquare.com]
>>> Sent: 15 December 2015 22:27
>>> To: u...@spark.apache.org; user@hive.apache.org
>>> Subject: Hive on Spark - Error: Child process exited before connecting back
>>>
>>> Hi,
>>>
>>> When trying to run Hive on Spark on CDH 5.4.3, I get the following error when running a simple query through Spark. I have tried setting everything described in https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started as well as what the CDH documentation recommends.
>>>
>>> Has anyone encountered this as well? (Searching for it didn't help much.)
>>>
>>> The error:
>>>
>>> ERROR : Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
>>> org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
>>>     at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:57)
>>>     at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
>>>     at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:120)
>>>     at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:97)
>>>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>>>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>>>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1640)
>>>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1399)
>>>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
>>>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>>>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044)
>>>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:144)
>>>     at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
>>>     at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>>     at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '2b2d7314-e0cc-4933-82a1-992a3299d109'. Error: Child process exited before connecting back
>>>     at com.google.common.base.Throwables.propagate(Throwables.java:156)
>>>     at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:109)
>>>     at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
>>>     at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:91)
>>>     at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:65)
>>>     at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
>>>     ... 22 more
>>> Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '2b2d7314-e0cc-4933-82a1-992a3299d109'. Error: Child process exited before connecting back
>>>     at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
>>>     at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:99)
>>>     ... 26 more
>>> Caused by: java.lang.RuntimeException: Cancel client '2b2d7314-e0cc-4933-82a1-992a3299d109'. Error: Child process exited before connecting back
>>>     at org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:179)
>>>     at org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:427)
>>>     ... 1 more
>>>
>>> (The same ERROR line and stack trace are then printed a second time.)
>>>
>>> Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=1)
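(For reference, the Getting Started page linked in the original message boils down to a handful of session settings; a minimal smoke test might look like this sketch, where the master URL, executor memory, and table name are illustrative placeholders:

    # Run a trivial query on the Spark engine with the baseline settings
    # from the Hive on Spark Getting Started page:
    hive -e "
        set hive.execution.engine=spark;
        set spark.master=yarn-client;
        set spark.executor.memory=512m;
        set spark.serializer=org.apache.spark.serializer.KryoSerializer;
        select count(*) from some_table;
    "
)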