Hi Evyatar,

Yes, reading the Parquet data directly works. Since we use the Hive metastore to abstract away the underlying datastore details, we want to avoid accessing the files directly. I guess the only options, then, are to either change the data or change the schema in the Hive metastore, as you suggested, right? Widening int to long/bigint seems like a reasonable schema evolution, though (correct me if I'm wrong). Is it possible to reopen the JIRA I mentioned earlier? Any reason for it getting closed?
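For concreteness, the "change the data" option, as I understand it, would look roughly like this minimal sketch (Spark 2.4, Scala; the path, table, and column names below are placeholders, not our actual ones):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.LongType

    val spark = SparkSession.builder()
      .enableHiveSupport()
      .getOrCreate()

    // Bypass the metastore and read the Parquet files directly, so the
    // physical INT type is used as-is and no IntWritable -> LongWritable
    // cast is attempted.
    val df = spark.read.parquet("/warehouse/db.db/events")

    // Widen the INT column to LONG so it matches the table's BIGINT schema.
    val widened = df.withColumn("event_id", col("event_id").cast(LongType))

    // Rewrite the table from the widened dataframe. This rewrites the
    // underlying data, not just the metastore schema. (If the target table
    // points at the same path being read, write to a temporary table or
    // location first.)
    widened.write.mode("overwrite").saveAsTable("db.events")

The alternative, changing the metastore column back to int, should only be a metadata change on the Hive side, but then we would lose the bigint definition.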
Regards,
Naresh

On Mon, Nov 7, 2022, 16:55 Evy M <evya...@gmail.com> wrote:

> Hi Naresh,
>
> Have you tried any of the following to resolve your issue:
>
> 1. Reading the Parquet files directly, not via Hive (i.e.,
> spark.read.parquet(<path>)), casting to LongType, and creating the Hive
> table based on this dataframe? Hive's BigInt and Spark's Long should have
> the same values, as seen here: Hive Types
> <https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-IntegralTypes(TINYINT,SMALLINT,INT/INTEGER,BIGINT)>;
> Spark Types <https://spark.apache.org/docs/latest/sql-ref-datatypes.html>.
> 2. Modifying the Hive table to define the column as INT? If the
> underlying data is an INT, I guess there is no reason to have a BigInt
> definition for that column.
>
> I hope this helps.
>
> Best,
> Evyatar
>
> On Sun, 6 Nov 2022 at 15:21, Naresh Peshwe <nareshpeshwe12...@gmail.com>
> wrote:
>
>> Hi all,
>> I am trying to read data (using Spark SQL) via a Hive metastore that has
>> a column of type bigint. The underlying Parquet data has int as the
>> datatype for the same column. I am getting the following error while
>> trying to read the data using Spark SQL:
>>
>> java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be
>> cast to org.apache.hadoop.io.LongWritable
>>   at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector.get(WritableLongObjectInspector.java:36)
>>   at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$6.apply(TableReader.scala:418)
>>   ...
>>
>> I believe it is related to
>> https://issues.apache.org/jira/browse/SPARK-17477. Any suggestions on how
>> I can work around this issue?
>>
>> Spark version: 2.4.5
>>
>> Regards,
>> Naresh