How is it broken? What was the result you were expecting? Ashutosh
On Thu, Oct 6, 2011 at 11:13, Time Less <timelessn...@gmail.com> wrote: > I have finally gotten around to testing this functionality, and it would > doesn't work. The ALTER table change columns command just changes the > metadata for the table, not for the partition. Follows is exactly what I did > to test this, and the (broken) result: > > hive> create table multischema_test ( > > id int, > > crdt string, > > name string, > > age int > > ) > > partitioned by (dtp string) > > row format delimited > > fields terminated by '\t' > > lines terminated by '\n' > > stored as textfile; > OK > Time taken: 0.345 seconds > hive> alter table multischema_test add partition (dtp=20110101) location > '/user/hive/warehouse/test/multischema_test/20110101'; > OK > Time taken: 0.662 seconds > hive> alter table multischema_test replace columns (id int, name string, > gender string, age int, crdt string) ; > OK > Time taken: 0.119 seconds > hive> alter table multischema_test add partition (dtp=20110102) location > '/user/hive/warehouse/test/multischema_test/20110102'; > OK > Time taken: 0.186 seconds > hive> select * from multischema_test ; > OK > 1 2010-07-01 Jeff 32 NULL 20110101 > 2 2010-07-01 Lisa 33 NULL 20110101 > 3 2010-07-01 Bob 22 NULL 20110101 > 4 2010-07-01 Fred 27 NULL 20110101 > 100 Gregory Male 45 2010-08-01 20110102 > 101 Horus Male 14 2010-08-01 20110102 > 102 Verdann Male 33 2010-08-01 20110102 > 103 Gennefer Female 32 2010-08-01 20110102 > Time taken: 0.348 seconds > hive> select name,gender from multischema_test ; > Total MapReduce jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks is set to 0 since there's no reduce operator > Starting Job = job_201108291505_28322, Tracking URL = > http://laxhadoop1-001:50030/jobdetails.jsp?jobid=job_201108291505_28322 > Kill Command = /usr/lib/hadoop/bin/hadoop job > -Dmapred.job.tracker=laxhadoop1-001:54311 -kill job_201108291505_28322 > 2011-10-06 11:02:27,099 Stage-1 map = 0%, reduce = 0% > 2011-10-06 11:02:31,129 Stage-1 map = 100%, reduce = 0% > 2011-10-06 11:02:33,144 Stage-1 map = 100%, reduce = 100% > Ended Job = job_201108291505_28322 > OK > Gregory Male > Horus Male > Verdann Male > Gennefer Female > 2010-07-01 Jeff > 2010-07-01 Lisa > 2010-07-01 Bob > 2010-07-01 Fred > Time taken: 8.977 seconds > hive> > > The two text files that make up the data in the table are like this: > > [hdfs@laxhadoop1-012 ~/Tim] :) cat multischema-datafile-20110101 | hadoop > fs -put - /user/hive/warehouse/test/multischema_test/20110101/datafile > [hdfs@laxhadoop1-012 ~/Tim] :) cat multischema-datafile-20110102 | hadoop > fs -put - /user/hive/warehouse/test/multischema_test/20110102/datafile > [hdfs@laxhadoop1-012 ~/Tim] :) cat multischema-datafile-20110101 > 1 2010-07-01 Jeff 32 > 2 2010-07-01 Lisa 33 > 3 2010-07-01 Bob 22 > 4 2010-07-01 Fred 27 > [hdfs@laxhadoop1-012 ~/Tim] :) cat multischema-datafile-20110102 > 100 Gregory Male 45 2010-08-01 > 101 Horus Male 14 2010-08-01 > 102 Verdann Male 33 2010-08-01 > 103 Gennefer Female 32 2010-08-01 > > Did I do something wrong? > > > > On Mon, Aug 29, 2011 at 10:46 PM, Ashutosh Chauhan > <hashut...@apache.org>wrote: > >> Hi Tim, >> >> I figured that both reading the code and manual. I don't think >> its explicitly documented anywhere, so it will be great if you document >> this. This page looks right place where this place of information can live. >> Thanks for the help in making Hive better. >> >> Ashutosh >> >> On Mon, Aug 29, 2011 at 15:26, Time Less <timelessn...@gmail.com> wrote: >> >>> Hello, Ashutosh, >>> >>> I did nothing like that... :) >>> >>> It seems the problem here is I didn't RTFM. Perchance, could you say >>> where you figured this out? I am going from the Hive DDL page on >>> confluence[1], and although it mentions partitions and it mentions the >>> "replace columns" you've mentioned here, it doesn't mention them together >>> that I see. I would like to document this for future generations. Is that >>> the proper page where I'd document this? >>> >>> I would probably explicitly create a section titled "Different Schemas >>> per Partition" and basically give the syntax you give (from quoted, assuming >>> after I test it, it works). >>> >>> >>> [1] >>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable%2FPartitionStatements >>> >>> >>> On Wed, Aug 24, 2011 at 6:14 PM, Ashutosh Chauhan >>> <hashut...@apache.org>wrote: >>> >>>> Hey Tim, >>>> >>>> Hive does support different schema's for different partitions. If your >>>> data comes out garbled, that seems to be a bug then. In your case, is the >>>> following sequence of steps resemble what you did: >>>> >>>> a) create table tbl (id: int, name: string, level: int) partitioned by >>>> date; >>>> b) -- add partitions >>>> c) alter table tbl replace columns (id: int, level: int, name_id: int) >>>> d) -- add more partitions. >>>> >>>> If you do select * from tbl, then this should work. You need not to >>>> rewrite any of your data. Can you provide more info about what output you >>>> were expecting and what you got. Are there any error logs? >>>> >>>> Ashutosh >>>> >>>> >>>> On Mon, Aug 22, 2011 at 14:34, Time Less <timelessn...@gmail.com>wrote: >>>> >>>>> I found a set of slides from Facebook online about Hive that claims you >>>>> can have a schema per partition in the table, this is exciting to us, >>>>> because we have a table like so: >>>>> >>>>> id int >>>>> name string >>>>> level int >>>>> date string >>>>> >>>>> And it's broken up into partitions by date. However, on a particular >>>>> date last year, the table dramatically changed its schema to: >>>>> >>>>> id int >>>>> level int >>>>> date string >>>>> name_id int >>>>> >>>>> So now if I do "select * from table" in hive, the data is completely >>>>> garbled for whichever portion of data doesn't fit the Hive schema. We are >>>>> considering re-writing the datafiles so they're the same before/after that >>>>> date, but if Hive supports having two entirely different schemas depending >>>>> on the partition, that'd be really convenient, since these datafiles are >>>>> hundreds of gigabytes in size (and we do sort of like the idea of knowing >>>>> how the datafile looked back then...). >>>>> >>>>> This page: >>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable%2FPartitionStatementsdoesn't >>>>> seem to have an appropriate example, so I'm left wondering. >>>>> >>>>> Has anyone done anything like this? >>>>> >>>>> -- >>>>> Tim Ellis >>>>> Data Architect, Riot Games >>>>> >>>>> >>>> >>> >>> >>> -- >>> Tim >>> >> >> > > > -- > Tim >