As far as I know, Spark can't read Hive's transactional tables yet:
https://issues.apache.org/jira/browse/SPARK-16996
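
For what it's worth, the NumberFormatException quoted below is consistent with a reader that expects a two-part delta directory name (delta_<minTxn>_<maxTxn>) being handed a three-part name with a trailing statement-id suffix. The Python sketch below only illustrates that failure mode; it is not Hive's actual code. The function names are hypothetical, and the two-part/three-part layouts are my assumption based on the directory name and error message in this thread.

```python
import re

DELTA_PREFIX = "delta_"


def strict_long(text):
    """Accept decimal digits only, roughly as Java's Long.parseLong does here.

    Python's int() would accept "0645253_0001" because it allows underscores
    between digits, so the check is done explicitly.
    """
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f'For input string: "{text}"')
    return int(text)


def parse_delta_two_part(dirname):
    """Hypothetical older-style parser: expects delta_<minTxn>_<maxTxn>."""
    rest = dirname[len(DELTA_PREFIX):]
    split = rest.index("_")
    # Everything after the first underscore is treated as a single number.
    return strict_long(rest[:split]), strict_long(rest[split + 1:])


def parse_delta_with_statement_id(dirname):
    """Hypothetical newer-style parser: tolerates an optional third field."""
    parts = dirname[len(DELTA_PREFIX):].split("_")
    min_txn, max_txn = strict_long(parts[0]), strict_long(parts[1])
    stmt_id = strict_long(parts[2]) if len(parts) > 2 else None
    return min_txn, max_txn, stmt_id


if __name__ == "__main__":
    name = "delta_0645253_0645253_0001"  # directory name from this thread
    print(parse_delta_with_statement_id(name))
    try:
        parse_delta_two_part(name)
    except ValueError as err:
        print("two-part parser failed:", err)
```

The two-part parser fails on the same substring that appears in the stack trace ("0645253_0001"), which is why the problem looks like a reader/writer version mismatch rather than corrupt data.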




On Thu, Aug 24, 2017 at 4:34 AM, Aviral Agarwal <aviral12...@gmail.com>
wrote:

> So there is currently no way for Spark to read Hive 2.x data?
>
> On Thu, Aug 24, 2017 at 12:17 AM, Eugene Koifman <ekoif...@hortonworks.com>
> wrote:
>
>> It looks like you have data written by Hive 2.x and Hive 1.x code
>> trying to read it.
>>
>> That is not supported.
>>
>>
>>
>> *From: *Aviral Agarwal <aviral12...@gmail.com>
>> *Reply-To: *"user@hive.apache.org" <user@hive.apache.org>
>> *Date: *Wednesday, August 23, 2017 at 12:24 AM
>> *To: *"user@hive.apache.org" <user@hive.apache.org>
>> *Subject: *Re: ORC Transaction Table - Spark
>>
>>
>>
>> Hi,
>>
>> Yes, it is caused by the wrong naming convention of the delta directory:
>>
>> /apps/hive/warehouse/foo.db/bar/year=2017/month=5/delta_0645253_0645253_0001
>>
>> How do I solve this?
>>
>> Thanks!
>> Aviral Agarwal
>>
>>
>>
>> On Tue, Aug 22, 2017 at 11:50 PM, Eugene Koifman <ekoif...@hortonworks.com> wrote:
>>
>> Could you do a recursive “ls” on the table or partition that you are
>> trying to read?
>>
>> Most likely you have files that don’t follow the expected naming convention.
>>
>>
>>
>> Eugene
>>
>>
>>
>>
>>
>> *From: *Aviral Agarwal <aviral12...@gmail.com>
>> *Reply-To: *"user@hive.apache.org" <user@hive.apache.org>
>> *Date: *Tuesday, August 22, 2017 at 5:39 AM
>> *To: *"user@hive.apache.org" <user@hive.apache.org>
>> *Subject: *ORC Transaction Table - Spark
>>
>>
>>
>> Hi,
>>
>>
>>
>> I am trying to read a Hive ORC transactional table through Spark, but I am
>> getting the following error:
>>
>>
>> Caused by: java.lang.RuntimeException: serious problem
>> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
>> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
>> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
>> .....
>> Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0645253_0001"
>> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
>> ... 118 more
>>
>>
>> Any help would be appreciated.
>>
>> Thanks and Regards,
>> Aviral Agarwal
>>
>>
>>
>
>
