Just some clarifying points, please:

1. Is this the general case for all file formats?
2. Or is this an artifact of ORC files written by the Hive 2.x ORC serde not being readable by the Hive 1.x ORC serde?
3. Is there a difference in the ORC file format spec at play here?
4. Or is any incompatibility limited to the Hive ORC serde implementations in Hive 1.x and 2.x?
5. What's the mechanism that affects Spark here?
   a. The same ORC serdes as Hive?
   b. Similar issues in the Spark ORC serde implementation(s) as in the Hive 1.x ORC serde?
6. Any similar issues with the Parquet format in Hive 1.x and 2.x?

From: Aviral Agarwal [mailto:aviral12...@gmail.com]
Sent: Wednesday, August 23, 2017 10:34 PM
To: user@hive.apache.org
Subject: Re: ORC Transaction Table - Spark

So, there is no way right now for Spark to read Hive 2.x data?

On Thu, Aug 24, 2017 at 12:17 AM, Eugene Koifman <ekoif...@hortonworks.com> wrote:

This looks like you have some data written by Hive 2.x and Hive 1.x code trying to read it. That is not supported.

From: Aviral Agarwal <aviral12...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, August 23, 2017 at 12:24 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: ORC Transaction Table - Spark

Hi,

Yes, it is caused by the delta directory not following the expected naming convention:

/apps/hive/warehouse/foo.db/bar/year=2017/month=5/delta_0645253_0645253_0001

How do I solve this?

Thanks!
Aviral Agarwal

On Tue, Aug 22, 2017 at 11:50 PM, Eugene Koifman <ekoif...@hortonworks.com> wrote:

Could you do a recursive "ls" in the table or partition that you are trying to read? Most likely you have files that don't follow the expected naming convention.

Eugene

From: Aviral Agarwal <aviral12...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, August 22, 2017 at 5:39 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: ORC Transaction Table - Spark

Hi,

I am trying to read a Hive ORC transactional table through Spark, but I am getting the following error:

Caused by: java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
    .....
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0645253_0001"
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
    ... 118 more

Any help would be appreciated.

Thanks and Regards,
Aviral Agarwal
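For reference, below is a minimal sketch of the kind of check Eugene describes: recursively list the partition and flag delta directories that carry the Hive 2.x statement-id suffix, i.e. the part ("0645253_0001") that an older, Hive 1.x-style reader ends up trying to parse as a single number. The partition path is the one from the thread; the directory-name patterns are simplified for illustration and are not Hive's actual ACID parsing code.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable

object CheckDeltaNames {
  // Simplified delta-directory patterns (illustrative only):
  //   Hive 1.x-style:          delta_<minTxnId>_<maxTxnId>
  //   Hive 2.x can also write: delta_<minTxnId>_<maxTxnId>_<statementId>
  // A reader that only understands the first form ends up treating
  // "<maxTxnId>_<statementId>" (e.g. "0645253_0001") as one number,
  // which matches the NumberFormatException in the stack trace above.
  private val newStyle = """delta_(\d+)_(\d+)_(\d+)""".r
  private val oldStyle = """delta_(\d+)_(\d+)""".r

  def main(args: Array[String]): Unit = {
    // Partition path taken from the thread; adjust as needed.
    val partition = new Path("/apps/hive/warehouse/foo.db/bar/year=2017/month=5")
    val fs = FileSystem.get(new Configuration())

    val flagged = mutable.LinkedHashSet[String]()
    val files = fs.listFiles(partition, /* recursive = */ true)
    while (files.hasNext) {
      val dirName = files.next().getPath.getParent.getName
      dirName match {
        case newStyle(_, _, _) => flagged += dirName // Hive 2.x-style name
        case oldStyle(_, _)    => // parseable by a Hive 1.x-style reader
        case _                 => // base_* directories, bucket files, etc.
      }
    }
    flagged.foreach(d => println(s"Hive 2.x-style delta (statement-id suffix): $d"))
  }
}

Run with the cluster's Hadoop configuration on the classpath; any directory it prints is one that a Hive 1.x-based ORC reader (such as the one Spark was using in this thread) would likely fail on.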