Hi folks,

I have a partitioned Hive external table.
Every day, an application writes a new partition (day=yyyy-MM-dd), stored as
Parquet, and runs the Hive add-partition command.
Occasionally we add a column to the new partitions and update the Hive table
schema. After that, a query spanning both new and old partitions fails with
this exception:

org.apache.hive.service.cli.HiveSQLException:
org.apache.spark.sql.AnalysisException: cannot resolve 'newcolumn' given
input columns ....

We have tried the schema merging feature, but it is too slow because there
are hundreds of partitions.
Is it possible to bypass this schema check and return a default value (such
as null) for missing columns?
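
To make the behavior I am asking for concrete, here is a rough sketch in plain Python. It is not Spark or Hive code, and the function name, column names, and dates are made up for illustration. It only shows the semantics I would like: every row is projected onto the current table schema, and a column that an old partition does not have comes back as null.

```python
from typing import Any, Dict, List


def project_row(row: Dict[str, Any],
                table_columns: List[str],
                default: Any = None) -> Dict[str, Any]:
    """Project a row onto the table schema.

    Columns that the row's partition does not contain are filled
    with `default` instead of raising an error.
    """
    return {col: row.get(col, default) for col in table_columns}


# Hypothetical table schema after 'newcolumn' was added.
table_columns = ["id", "day", "newcolumn"]

# Hypothetical rows: one from a partition written before the schema
# change, one from a partition written after it.
old_row = {"id": 1, "day": "2016-01-01"}
new_row = {"id": 2, "day": "2016-02-01", "newcolumn": "x"}

rows = [project_row(r, table_columns) for r in (old_row, new_row)]
```

With this, the old row gets newcolumn = None while the new row keeps its value, which is the result I would expect from a query across both kinds of partitions.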

Thank you
