Hi all, I'm a little confused about how `refresh table` (SPARK-5833) should work, so I tried the following:
```scala
val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
df1.write.parquet("hdfs://<path>/test_table/key=1")
```

Then I created an external table over that location:

```sql
CREATE EXTERNAL TABLE `tmp_table` (
  `single` int,
  `double` int)
PARTITIONED BY (`key` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://<path>/test_table/'
```

and registered the partition with `alter table tmp_table add partition (key=1) location 'hdfs://..`.

Next I wrote a second partition with a different schema (`triple` instead of `double`):

```scala
val df2 = sc.makeRDD(1 to 5).map(i => (i, i * 3)).toDF("single", "triple")
df2.write.parquet("hdfs://<path>/test_table/key=2")
```

and added that partition to the table with another `alter table ..`. But after running `refresh table tmp_table`, `describe tmp_table` still doesn't pick up the new column `triple`.

Can someone explain to me how partition discovery and schema merging are supposed to work with `refresh table`? Thanks
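P.S. To make my expectation concrete, here is a minimal sketch of the direct-read path I had in mind. It assumes Spark 1.5+ (where the `mergeSchema` read option exists) and the usual `sqlContext` from spark-shell; it bypasses the metastore table entirely and lets the Parquet data source do partition discovery and schema merging on its own:

```scala
// Sketch only: assumes Spark 1.5+ and a `sqlContext` as in spark-shell.
// Reading the table root directly, Spark should discover key=1 and key=2
// as partitions and, with mergeSchema enabled, union the two file schemas.
val merged = sqlContext.read
  .option("mergeSchema", "true")
  .parquet("hdfs://<path>/test_table")
merged.printSchema()
// I would expect: single, double, triple, plus the partition column key.
```

So my question is essentially whether `refresh table` is meant to give the metastore-backed `tmp_table` this same merged view, or whether the table's schema stays fixed at whatever the DDL declared.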