I found a set of slides from Facebook online about Hive that claims you can
have a different schema per partition in a table. This is exciting to us
because we have a table like so:

id     int
name   string
level  int
date   string

And it's broken up into partitions by date. However, on a particular date
last year, the table dramatically changed its schema to:

id       int
level    int
date     string
name_id  int

So now if I do "select * from table" in Hive, the data is completely garbled
for whichever portion of the data doesn't fit the Hive schema. We are
considering rewriting the datafiles so they're the same before/after that
date, but if Hive supports having two entirely different schemas depending
on the partition, that'd be really convenient, since these datafiles are
hundreds of gigabytes in size (and we do sort of like the idea of knowing
how the datafile looked back then...).
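
To illustrate the garbling described above, here is a minimal Python sketch. It
is not Hive code. It assumes the datafiles are tab-delimited text, that each row
stores its fields in the order listed above, and that fields are mapped to
columns purely by position. The sample row and the read_row helper are made up
for illustration.

```python
# Hypothetical illustration only: not Hive code, and the sample row is invented.
# Assumption: rows are tab-delimited and columns are matched by position.

OLD_SCHEMA = ["id", "name", "level", "date"]
NEW_SCHEMA = ["id", "level", "date", "name_id"]


def read_row(line, schema):
    """Map the tab-delimited fields of one row onto column names by position."""
    fields = line.rstrip("\n").split("\t")
    return dict(zip(schema, fields))


# A row written in the new layout: id, level, date, name_id
new_row = "42\t7\t2012-03-01\t1001"

as_new = read_row(new_row, NEW_SCHEMA)  # read with the matching schema
as_old = read_row(new_row, OLD_SCHEMA)  # read with the old table-wide schema

print("new schema:", as_new)
print("old schema:", as_old)
```

Read with the old schema, the level lands in "name", the date lands in "level",
and the name_id lands in "date", which is the kind of shifted output we see for
one side of the cutover date.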

This page:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable%2FPartitionStatements
doesn't seem to have an appropriate example, so I'm left wondering.

Has anyone done anything like this?

-- 
Tim Ellis
Data Architect, Riot Games
