Issue of Hive parquet partitioned table schema mismatch

Rex Xiong Fri, 30 Oct 2015 04:06:09 -0700

Hi folks,

I have a partitioned Hive external table.
Every day, an application generates a new partition, day=yyyy-MM-dd, stored as
Parquet, and runs the Hive add-partition command.
In some cases we add a column to the new partitions and update the
Hive table schema. After that, a query spanning both new and old partitions
fails with this exception:


org.apache.hive.service.cli.HiveSQLException:
org.apache.spark.sql.AnalysisException: cannot resolve 'newcolumn' given
input columns ....

We have tried the schema merging feature, but it is too slow because there are
hundreds of partitions.
Is it possible to bypass this schema check and return a default value (such
as null) for missing columns?
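
To make the requested behavior concrete, here is a minimal sketch in plain
Python. It is only an illustration of "return null for missing columns", not
an existing Spark or Hive feature: the helper name null_padded_projection and
the sample column names id and name are hypothetical, and only newcolumn comes
from the error message above. Given the full table schema and the columns
actually present in one partition, it builds a SELECT list that pads the
missing columns with typed NULLs.

```python
# Hypothetical helper for illustration; not part of Spark or Hive.
def null_padded_projection(table_schema, partition_columns):
    """Build a SELECT list that covers every column of the table.

    table_schema: list of (column_name, sql_type) pairs for the full table.
    partition_columns: column names present in one partition's Parquet files.
    """
    # Compare names case-insensitively, since Hive column names are not
    # case-sensitive.
    present = {name.lower() for name in partition_columns}
    select_list = []
    for name, sql_type in table_schema:
        if name.lower() in present:
            # The column exists in this partition: select it as-is.
            select_list.append(name)
        else:
            # The column is missing: substitute a NULL of the declared type.
            select_list.append("CAST(NULL AS {}) AS {}".format(sql_type, name))
    return ", ".join(select_list)


# Assumed example schema; an old partition lacks 'newcolumn'.
table_schema = [("id", "BIGINT"), ("name", "STRING"), ("newcolumn", "STRING")]
old_partition = ["id", "name"]
print(null_padded_projection(table_schema, old_partition))
# prints: id, name, CAST(NULL AS STRING) AS newcolumn
```

A projection like this could be applied per partition and the results combined
with UNION ALL, but whether that is faster than schema merging across hundreds
of partitions is untested here.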

Thank you
