Thanks much for your help. Your earlier query worked well, but the latest query produces an error about a mismatch: Error in semantic analysis: Line 2:23 Cannot insert into target table because column number/types are different brand: Table insclause-0 has 5 columns, but query has 3 columns.
I tried the redundant version: FROM raw insert overwrite table polished partition (partition1, partition2) select TRANSFORM(raw.partition1, raw.partition2, raw.data) USING 'python parser.py' AS (foo STRING, date STRING, bar MAP<STRING, STRING>, partition1 STRING, partition2 STRING) CLUSTER BY date But that produced a table with partitons set to __HIVE_DEFAULT_PARTITION__ This is turning to be a little bit harder than I expected ;) Adriaan On 2011/09/18, at 23:23, Sumanth V wrote: > The earlier query was just to show the dynamic partitions concept. > For your case, you will have to use a query something like the one below. > Although I have not given it a try, in theory this should work - > > FROM raw > insert overwrite table polished partition (partition1, partition2) > select TRANSFORM(raw.data) > USING 'python parser.py' AS (foo STRING, date STRING, bar MAP<STRING, > STRING>) > CLUSTER BY date > > > Sumanth > > > > On Sun, Sep 18, 2011 at 12:28 PM, Adriaan Tijsseling <[email protected] >> wrote: > >> I looked at your solution, but the problem is still that the "data" column >> needs to be processed still. What I want is to process "data" and put the >> results into a table with partitioned defined by the other columns. With >> your solution, I get partitions but still with the same unprocessed data. >> >> Adriaan >> >> On 2011/09/18, at 04:56, Sumanth V wrote: >> >>> Hi Adriaan, >>> >>> To use dynamic partition, follow the following steps inside hive shell - >>> >>> #Set the following values - >>> >>> set hive.exec.dynamic.partition.mode=nonstrict; >>> >>> set hive.exec.dynamic.partition=true >>> >>> #Create another table - >>> >>> create table raw_2 >>> ( >>> data string >>> ) >>> partitioned by (partition1 string, partition2 string); >>> >>> #Now insert the values stored in table raw into table raw_2 using the >>> following query - >>> >>> from raw >>> insert overwrite table raw_2 partition (partition1, partition2) >>> select data, partition1, partition2; >>> >>> This will dynamically create the 2 partitions based on the values of >>> partition1 and partition2 and insert the values of 'data' in the >> appropriate >>> partition. >>> >>> Regards, >>> Sumanth >>> >>> >>> >>> On Sat, Sep 17, 2011 at 2:18 PM, Adriaan Tijsseling >>> <[email protected]>wrote: >>> >>>> Hi, >>>> >>>> I have a table created with >>>> >>>> CREATE TABLE raw(partition1 string, partition2 string, data string) ROW >>>> FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE; >>>> >>>> I want to further process "data" and put it in a partition (partition1, >>>> partition2) defined by the values in the relevant row. >>>> >>>> I'm however stuck at trying to use dynamic partitions in a query. With >>>> predefined partition values it's straightforward: >>>> >>>> FROM ( >>>> FROM raw >>>> SELECT TRANSFORM(raw.data) >>>> USING 'python parser.py' AS (foo STRING, date STRING, bar >>>> MAP<STRING,STRING>) >>>> CLUSTER BY date >>>> ) tmap >>>> INSERT OVERWRITE TABLE polished PARTITION (partition1='p1', >>>> partition2='p2') SELECT foo, date, bar; >>>> >>>> What would be the best way to define the partition using raw.partition1 >> and >>>> raw.partition2 as values? >>>> >>>> Thanks much, >>>> >>>> Adriaan >>>> >>>> >> >>
