I found a set of slides from Facebook online about Hive that claims you can have a schema per partition in the table. This is exciting to us, because we have a table like so:

    id     int
    name   string
    level  int
    date   string

And it's broken up into partitions by date. However, on a particular date last year, the table dramatically changed its schema to:

    id       int
    level    int
    date     string
    name_id  int

So now if I do "select * from table" in hive, the data is completely garbled for whichever portion of data doesn't fit the Hive schema.

We are considering re-writing the datafiles so they're the same before/after that date, but if Hive supports having two entirely different schemas depending on the partition, that'd be really convenient, since these datafiles are hundreds of gigabytes in size (and we do sort of like the idea of knowing how the datafile looked back then...).

This page:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable%2FPartitionStatements

doesn't seem to have an appropriate example, so I'm left wondering. Has anyone done anything like this?

--
Tim Ellis
Data Architect, Riot Games