Hi folks, I have a Hive external table with partitions. Every day, an application generates a new partition, day=yyyy-MM-dd, stored as Parquet, and runs the Hive add-partition command. In some cases we add an additional column to the new partitions and update the Hive table schema. A query that spans both new and old partitions then fails with this exception:
org.apache.hive.service.cli.HiveSQLException: org.apache.spark.sql.AnalysisException: cannot resolve 'newcolumn' given input columns ....

We have tried the schema merging feature, but it is too slow because there are hundreds of partitions. Is it possible to bypass this schema check and return a default value (such as null) for missing columns? Thank you.
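
To make the behavior I am asking for concrete, here is a minimal plain-Python sketch. It is not Spark or Hive code, and the column names other than newcolumn, the sample rows, and the helper read_with_defaults are all made up for illustration. It only shows the result I would like: rows from old partitions that lack the new column come back with null in that column.

```python
# Hypothetical illustration only: plain Python, not Spark or Hive API calls.
# Assumed table schema after the new column was added.
TABLE_COLUMNS = ["id", "name", "newcolumn"]


def read_with_defaults(rows, table_columns, default=None):
    """Project each row onto the table schema, filling missing columns with a default."""
    return [{col: row.get(col, default) for col in table_columns} for row in rows]


# An old partition written before the column existed, and a new one that has it.
old_partition = [{"id": 1, "name": "a"}]
new_partition = [{"id": 2, "name": "b", "newcolumn": "x"}]

result = read_with_defaults(old_partition + new_partition, TABLE_COLUMNS)
for row in result:
    print(row)
```

In other words, the table schema would be the single source of truth, and any column absent from a partition's files would be read as null rather than causing the query to fail.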
