Hi Evyatar,
Yes, reading the Parquet data directly works. However, since we use the Hive
metastore to abstract away the details of the underlying datastore, we want
to avoid accessing the files directly.
I guess the only options then are to either change the data or change the
schema in the Hive metastore, as you suggested, right?
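For concreteness, I assume the metastore-side change would be Hive DDL along
these lines (table and column names are placeholders), run directly against
Hive, e.g. via beeline:

    -- Metadata-only change: align the declared type with the Parquet INT
    ALTER TABLE my_table CHANGE COLUMN id id INT;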
That said, int to long/bigint seems like a reasonable schema evolution
(correct me if I'm wrong). Is it possible to reopen the JIRA I mentioned
earlier? Was there a reason it was closed?


Regards,
Naresh


On Mon, Nov 7, 2022, 16:55 Evy M <evya...@gmail.com> wrote:

> Hi Naresh,
>
> Have you tried any of the following in order to resolve your issue:
>
>    1. Reading the Parquet files directly, not via Hive (i.e.,
>    spark.read.parquet(<path>)), casting the column to LongType, and
>    recreating the Hive table from the resulting DataFrame (see the sketch
>    after this list)? Hive's BIGINT and Spark's LongType cover the same
>    value range; see Hive Types
>    <https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-IntegralTypes(TINYINT,SMALLINT,INT/INTEGER,BIGINT)>
>    and Spark Types
>    <https://spark.apache.org/docs/latest/sql-ref-datatypes.html>.
>    2. Modifying the Hive table so the column is declared as INT? If the
>    underlying data is INT, there is arguably no reason to keep a BIGINT
>    definition for that column.
>
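> A rough PySpark sketch of option 1 (the column name "id", table name
> "my_table", and the path are placeholders):
>
>     # Read the Parquet files directly, bypassing the Hive metastore
>     df = spark.read.parquet("<path>")
>
>     # Cast the INT column to long to match the table's BIGINT definition
>     from pyspark.sql.functions import col
>     df = df.withColumn("id", col("id").cast("long"))
>
>     # Recreate the Hive table from the corrected DataFrame
>     df.write.mode("overwrite").saveAsTable("my_table")
>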
> I hope this might help.
>
> Best,
> Evyatar
>
> On Sun, 6 Nov 2022 at 15:21, Naresh Peshwe <nareshpeshwe12...@gmail.com>
> wrote:
>
>> Hi all,
>> I am trying to read data (using Spark SQL) via a Hive metastore table that
>> has a column of type bigint, while the underlying Parquet data has int as
>> the datatype for the same column. I get the following error when reading
>> the data with Spark SQL:
>>
>> java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be 
>> cast to org.apache.hadoop.io.LongWritable
>> at 
>> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector.get(WritableLongObjectInspector.java:36)
>> at 
>> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$6.apply(TableReader.scala:418)
>> ...
>>
>> I believe it is related to 
>> https://issues.apache.org/jira/browse/SPARK-17477. Any suggestions on how I 
>> can work around this issue?
>>
>> Spark version: 2.4.5
>>
>> Regards,
>>
>> Naresh
>>
>>
>>
