[ https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti updated HIVE-7747:
----------------------------------

    Attachment: HIVE-7747.1.patch

The problem is that we ship the wrong jar to the Spark cluster: hive-common 
instead of hive-exec. In SparkClient, we get the jar from HiveConf.getJar(), 
which returns the jar containing the initialization class. The initialization 
class given to HiveConf differs between HS2 and the CLI. In CliDriver (see the 
run() method), SessionState.class (contained in the hive-exec jar) is passed to 
HiveConf. In HS2 no initialization class is passed, so it defaults to 
HiveConf.class (contained in hive-common).

The error thrown in the Spark task is strange; it is unclear whether this is 
the standard error thrown when a class is missing from the classpath. Attaching 
a fix that passes SessionState.class as the initialization class to HiveConf in 
HiveSessionImpl. It is a general fix, not specific to the Spark branch.
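For context, the jar-lookup mechanism described above boils down to resolving 
the code source of whichever class was handed to HiveConf. A minimal 
self-contained sketch (JarLocator and its jarOf() helper are hypothetical names, 
not Hive API) of that resolution:

```java
import java.security.CodeSource;

public class JarLocator {
    // Hypothetical helper mirroring what HiveConf.getJar() effectively does:
    // return the jar (or class-path entry) the given class was loaded from.
    // So passing SessionState.class resolves to hive-exec, while the default
    // HiveConf.class resolves to hive-common -- the wrong jar for Spark tasks.
    static String jarOf(Class<?> clazz) {
        CodeSource src = clazz.getProtectionDomain().getCodeSource();
        // Bootstrap-loaded classes (e.g. java.lang.String) have no code source.
        return (src == null) ? null : src.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println("String     -> " + jarOf(String.class));
        System.out.println("JarLocator -> " + jarOf(JarLocator.class));
    }
}
```

This is why the fix simply supplies SessionState.class in HiveSessionImpl: the 
lookup itself is fine, it was just anchored to a class from the wrong jar.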

> Submitting a query to Spark from HiveServer2 fails [Spark Branch]
> -----------------------------------------------------------------
>
>                 Key: HIVE-7747
>                 URL: https://issues.apache.org/jira/browse/HIVE-7747
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Venki Korukanti
>            Assignee: Venki Korukanti
>         Attachments: HIVE-7747.1.patch
>
>
> {{spark.serializer}} is set to 
> {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine 
> from Hive CLI.
> Spark tasks fail with the following error:
> {code}
> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): java.lang.IllegalStateException: unread block data
>         java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>         org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
