Hi all, I'm a little confused about how `refresh table` (SPARK-5833) should work, so I tried the following:
```scala
val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
df1.write.parquet("hdfs://<path>/test_table/key=1")
```

Then I created an external table over that location:

```sql
CREATE EXTERNAL TABLE `tmp_table` (
  `single` int,
  `double` int)
PARTITIONED BY (`key` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://<path>/test_table/'
```

and registered the partition with `alter table tmp_table add partition (key=1) location 'hdfs://..`.

Next I wrote a second partition with a different schema (`triple` instead of `double`):

```scala
val df2 = sc.makeRDD(1 to 5).map(i => (i, i * 3)).toDF("single", "triple")
df2.write.parquet("hdfs://<path>/test_table/key=2")
```

and added that partition to the table with another `alter table ..`. But after running `refresh table tmp_table`, `describe tmp_table` still doesn't pick up the new column `triple`.

Can someone explain to me how partition discovery and schema merging are supposed to work with `refresh table`? Thanks
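P.S. To make my expectation concrete, here is a minimal sketch of the direct-read path I had in mind. It assumes Spark 1.5+ (where the `mergeSchema` read option exists) and the usual `sqlContext` from spark-shell; it bypasses the metastore table entirely and lets the Parquet data source do partition discovery and schema merging on its own:

```scala
// Sketch only: assumes Spark 1.5+ and a `sqlContext` as in spark-shell.
// Reading the table root directly, Spark should discover key=1 and key=2
// as partitions and, with mergeSchema enabled, union the two file schemas.
val merged = sqlContext.read
  .option("mergeSchema", "true")
  .parquet("hdfs://<path>/test_table")
merged.printSchema()
// I would expect: single, double, triple, plus the partition column key.
```

So my question is essentially whether `refresh table` is meant to give the metastore-backed `tmp_table` this same merged view, or whether the table's schema stays fixed at whatever the DDL declared.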