As far as I know, Spark can't read Hive's transactional tables yet: https://issues.apache.org/jira/browse/SPARK-16996
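
For what it's worth, the NumberFormatException in the trace quoted below fits that picture. As I understand it, Hive 2.x appends a statement id to delta directory names (delta_<min>_<max>_<stmt>), while older reader code expects only delta_<min>_<max> and tries to parse everything after the first field as a single number. Here is a minimal Python sketch of that mismatch. It is not Hive's actual code: the regexes and the names classify_delta_dir and parse_old_style are assumptions made up for illustration.

```python
import re

# Assumed naming patterns, inferred from the thread (not copied from Hive's source).
OLD_STYLE = re.compile(r"^delta_(\d+)_(\d+)$")        # delta_<min>_<max>
NEW_STYLE = re.compile(r"^delta_(\d+)_(\d+)_(\d+)$")  # delta_<min>_<max>_<stmt>


def classify_delta_dir(name):
    """Return 'old', 'new', or 'unknown' for a delta directory name."""
    if OLD_STYLE.match(name):
        return "old"
    if NEW_STYLE.match(name):
        return "new"
    return "unknown"


def parse_old_style(name):
    """Hypothetical two-field parser: split once after the 'delta_' prefix.

    Raises ValueError when either field is not purely decimal digits, which
    is what happens to the second field of a three-part name.
    """
    rest = name[len("delta_"):]
    first, _, second = rest.partition("_")
    for field in (first, second):
        # Explicit digit check: Python's int() would accept "0645253_0001"
        # because it allows underscores between digits.
        if not field.isdecimal():
            raise ValueError(f"not a number: {field!r}")
    return int(first), int(second)


if __name__ == "__main__":
    for d in ("delta_0645253_0645253", "delta_0645253_0645253_0001"):
        print(d, "->", classify_delta_dir(d))
```

Running classify_delta_dir over the output of a recursive listing of the table directory would show which partitions hold three-part delta names that a two-field parser would choke on.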

On Thu, Aug 24, 2017 at 4:34 AM, Aviral Agarwal <aviral12...@gmail.com> wrote:

> So, there is no way possible right now for Spark to read Hive 2.x data ?
>
> On Thu, Aug 24, 2017 at 12:17 AM, Eugene Koifman <ekoif...@hortonworks.com> wrote:
>
>> This looks like you have some data written by Hive 2.x and Hive 1.x code
>> trying to read it.
>>
>> That is not supported.
>>
>> From: Aviral Agarwal <aviral12...@gmail.com>
>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
>> Date: Wednesday, August 23, 2017 at 12:24 AM
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Re: ORC Transaction Table - Spark
>>
>> Hi,
>>
>> Yes it caused by wrong naming convention of the delta directory :
>>
>> /apps/hive/warehouse/foo.db/bar/year=2017/month=5/delta_0645253_0645253_0001
>>
>> How do I solve this ?
>>
>> Thanks !
>> Aviral Agarwal
>>
>> On Tue, Aug 22, 2017 at 11:50 PM, Eugene Koifman <ekoif...@hortonworks.com> wrote:
>>
>> Could you do recursive “ls” in your table or partition that you are
>> trying to read?
>>
>> Most likely you have files that don’t follow expected naming convention
>>
>> Eugene
>>
>> From: Aviral Agarwal <aviral12...@gmail.com>
>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
>> Date: Tuesday, August 22, 2017 at 5:39 AM
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: ORC Transaction Table - Spark
>>
>> Hi,
>>
>> I am trying to read hive orc transaction table through Spark but I am
>> getting the following error
>>
>> Caused by: java.lang.RuntimeException: serious problem
>> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
>> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
>> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
>> .....
>> Caused by: java.util.concurrent.ExecutionException:
>> java.lang.NumberFormatException: For input string: "0645253_0001"
>> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
>> ... 118 more
>>
>> Any help would be appreciated.
>>
>> Thanks and Regards,
>> Aviral Agarwal