Just some clarifying points, please:

1. Is this the general case for all file formats?
2. Or is this an artifact of ORC files written by the Hive 2.x ORC serde not being readable by the Hive 1.x ORC serde?
3. Is there a difference in the ORC file format spec at play here?
4. Or is any incompatibility limited to the Hive ORC serde implementations in Hive 1.x and 2.x?
5. What's the mechanism that affects Spark here?
   a. The same ORC serdes as Hive?
   b. Similar issues in the Spark ORC serde implementation(s) as in the Hive 1.x ORC serde?
6. Any similar issues with the Parquet format in Hive 1.x and 2.x?

From: Aviral Agarwal [mailto:aviral12...@gmail.com]
Sent: Wednesday, August 23, 2017 10:34 PM
To: user@hive.apache.org
Subject: Re: ORC Transaction Table - Spark

So, there is no way right now for Spark to read Hive 2.x data?

On Thu, Aug 24, 2017 at 12:17 AM, Eugene Koifman <ekoif...@hortonworks.com> wrote:

This looks like you have some data written by Hive 2.x and Hive 1.x code trying to read it. That is not supported.

From: Aviral Agarwal <aviral12...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, August 23, 2017 at 12:24 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: ORC Transaction Table - Spark

Hi,

Yes, it is caused by the delta directory not following the expected naming convention:

/apps/hive/warehouse/foo.db/bar/year=2017/month=5/delta_0645253_0645253_0001

How do I solve this?

Thanks!
Aviral Agarwal

On Tue, Aug 22, 2017 at 11:50 PM, Eugene Koifman <ekoif...@hortonworks.com> wrote:

Could you do a recursive "ls" in the table or partition that you are trying to read? Most likely you have files that don't follow the expected naming convention.

Eugene

From: Aviral Agarwal <aviral12...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, August 22, 2017 at 5:39 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: ORC Transaction Table - Spark

Hi,

I am trying to read a Hive ORC transactional table through Spark, but I am getting the following error:

Caused by: java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
    .....
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0645253_0001"
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
    ... 118 more

Any help would be appreciated.

Thanks and Regards,
Aviral Agarwal
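For reference, below is a minimal sketch of the kind of check Eugene describes: recursively list the partition and flag delta directories that carry the Hive 2.x statement-id suffix, i.e. the part ("0645253_0001") that an older, Hive 1.x-style reader ends up trying to parse as a single number. The partition path is the one from the thread; the directory-name patterns are simplified for illustration and are not Hive's actual ACID parsing code.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable

object CheckDeltaNames {
  // Simplified delta-directory patterns (illustrative only):
  //   Hive 1.x-style:          delta_<minTxnId>_<maxTxnId>
  //   Hive 2.x can also write: delta_<minTxnId>_<maxTxnId>_<statementId>
  // A reader that only understands the first form ends up treating
  // "<maxTxnId>_<statementId>" (e.g. "0645253_0001") as one number,
  // which matches the NumberFormatException in the stack trace above.
  private val newStyle = """delta_(\d+)_(\d+)_(\d+)""".r
  private val oldStyle = """delta_(\d+)_(\d+)""".r

  def main(args: Array[String]): Unit = {
    // Partition path taken from the thread; adjust as needed.
    val partition = new Path("/apps/hive/warehouse/foo.db/bar/year=2017/month=5")
    val fs = FileSystem.get(new Configuration())

    val flagged = mutable.LinkedHashSet[String]()
    val files = fs.listFiles(partition, /* recursive = */ true)
    while (files.hasNext) {
      val dirName = files.next().getPath.getParent.getName
      dirName match {
        case newStyle(_, _, _) => flagged += dirName // Hive 2.x-style name
        case oldStyle(_, _)    => // parseable by a Hive 1.x-style reader
        case _                 => // base_* directories, bucket files, etc.
      }
    }
    flagged.foreach(d => println(s"Hive 2.x-style delta (statement-id suffix): $d"))
  }
}

Run with the cluster's Hadoop configuration on the classpath; any directory it prints is one that a Hive 1.x-based ORC reader (such as the one Spark was using in this thread) would likely fail on.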