Hello, Ashutosh. I did nothing like that... :)
It seems the problem here is that I didn't RTFM. Perchance, could you say where you figured this out? I am going from the Hive DDL page on Confluence [1], and although it mentions partitions and it mentions the "replace columns" you've mentioned here, it doesn't mention them together, as far as I can see.

I would like to document this for future generations. Is that the proper page to document it on? I would probably create a section explicitly titled "Different Schemas per Partition" and basically give the syntax from your message quoted below (assuming it works once I test it).

[1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable%2FPartitionStatements

On Wed, Aug 24, 2011 at 6:14 PM, Ashutosh Chauhan <hashut...@apache.org> wrote:

> Hey Tim,
>
> Hive does support different schemas for different partitions. If your data
> comes out garbled, that seems to be a bug then. In your case, does the
> following sequence of steps resemble what you did:
>
> a) create table tbl (id int, name string, level int) partitioned by
> (date string);
> b) -- add partitions
> c) alter table tbl replace columns (id int, level int, name_id int);
> d) -- add more partitions.
>
> If you do select * from tbl, then this should work. You need not rewrite
> any of your data. Can you provide more info about what output you were
> expecting and what you got? Are there any error logs?
>
> Ashutosh
>
>
> On Mon, Aug 22, 2011 at 14:34, Time Less <timelessn...@gmail.com> wrote:
>
>> I found a set of slides from Facebook online about Hive that claims you
>> can have a schema per partition in the table. This is exciting to us,
>> because we have a table like so:
>>
>> id int
>> name string
>> level int
>> date string
>>
>> And it's broken up into partitions by date. However, on a particular date
>> last year, the table dramatically changed its schema to:
>>
>> id int
>> level int
>> date string
>> name_id int
>>
>> So now if I do "select * from table" in Hive, the data is completely
>> garbled for whichever portion of data doesn't fit the Hive schema. We are
>> considering rewriting the datafiles so they're the same before/after that
>> date, but if Hive supports having two entirely different schemas depending
>> on the partition, that'd be really convenient, since these datafiles are
>> hundreds of gigabytes in size (and we do sort of like the idea of knowing
>> how the datafile looked back then...).
>>
>> This page:
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable%2FPartitionStatements
>> doesn't seem to have an appropriate example, so I'm left wondering.
>>
>> Has anyone done anything like this?
>>
>> --
>> Tim Ellis
>> Data Architect, Riot Games

--
Tim
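For the "Different Schemas per Partition" section proposed in this thread, Ashutosh's steps (a) to (d) might be written out as a full HiveQL script roughly like the one below. This is an untested sketch: the table name `tbl`, the partition values, and back-quoting `date` (a reserved word in newer Hive releases) are assumptions for illustration, not something confirmed in the thread.

```sql
-- (a) Table created with the original (pre-change) layout.
-- `date` is back-quoted as an assumption, since newer Hive versions reserve it.
CREATE TABLE tbl (id INT, name STRING, level INT)
PARTITIONED BY (`date` STRING);

-- (b) Register partitions that hold old-layout data.
-- These partition values are hypothetical.
ALTER TABLE tbl ADD PARTITION (`date` = '2010-01-01');
ALTER TABLE tbl ADD PARTITION (`date` = '2010-01-02');

-- (c) Change the table-level schema to the new layout.
-- Assumption: partitions that already exist keep the column list
-- they were created with; only the table metadata changes here.
ALTER TABLE tbl REPLACE COLUMNS (id INT, level INT, name_id INT);

-- (d) Partitions added from now on pick up the new column list.
ALTER TABLE tbl ADD PARTITION (`date` = '2010-07-01');

-- Query across both layouts without rewriting any data files.
SELECT * FROM tbl;
```

The order of the steps is what matters in this sketch: it assumes each partition records the column list in effect when it is added, so registering the old-layout partitions only after step (c) would likely reproduce the garbled output described above.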