Sorry, query 1 should be: create table tmp__imp as select requestbegintime, count(*) from impressions2 where requestbegintime<'1239572996000' group by requestbegintime;
-----Original Message----- From: Lu, Wei [mailto:w...@microstrategy.com] Sent: Wednesday, March 07, 2012 9:08 AM To: user@hive.apache.org Subject: RE: how to split query result into several smaller tables without creating temp table?? Hi, Mark Query 1 is: 1) create table tmp__imp as select requestbegintime, count(*) from impressions2 where requestbegintime<'1239572996000'; from tmp__imp insert OVERWRITE LOCAL DIRECTORY '/disk2/is1' select * where requestbegintime<'1239572956000' insert OVERWRITE LOCAL DIRECTORY '/disk2/is2' select * where requestbegintime>='1239572956000'; Do you have any idea about the failure of query 2 and 3? Is the failure caused by some issues of Hive? Regards, Wei -----Original Message----- From: Mark Grover [mailto:mgro...@oanda.com] Sent: Wednesday, March 07, 2012 1:11 AM To: user@hive.apache.org Subject: Re: how to split query result into several smaller tables without creating temp table?? Hi Wei, In query 1, it's invalid to requestbegintime in the select list if it's not in the group by clause. There doesn't seem to be a group by clause there. Is that the right query? Mark Mark Grover, Business Intelligence Analyst OANDA Corporation www: oanda.com www: fxtrade.com "Best Trading Platform" - World Finance's Forex Awards 2009. "The One to Watch" - Treasury Today's Adam Smith Awards 2009. ----- Original Message ----- From: "Wei Lu" <w...@microstrategy.com> To: user@hive.apache.org Sent: Tuesday, March 6, 2012 4:40:00 AM Subject: how to split query result into several smaller tables without creating temp table?? Hi, I tried to do aggregation based on Table impressions2, and then need to save the results to two different local files (or tables). I tried three methods, only the first one succeeded: 1) create a new table and store aggregation results to it, and then use multi-insert to split the results to local disk: create table tmp__imp as select requestbegintime, count(*) from impressions2 where requestbegintime<'1239572996000'; from tmp__imp insert OVERWRITE LOCAL DIRECTORY '/disk2/is1' select * where requestbegintime<'1239572956000' insert OVERWRITE LOCAL DIRECTORY '/disk2/is2' select * where requestbegintime>='1239572956000'; The method works, and #rows of each local file: 40 Rows loaded to /disk2/is2 31057 Rows loaded to /disk2/is1 2) use one statement without create new table: from (select requestbegintime, count(*) as ct from impressions2 where requestbegintime<'1239572996000' GROUP BY requestbegintime)mt insert OVERWRITE LOCAL DIRECTORY '/disk2/is1' select mt.requestbegintime,mt.ct where mt.requestbegintime<'1239572956000' insert OVERWRITE LOCAL DIRECTORY '/disk2/is2' select mt.requestbegintime,mt.ct where mt.requestbegintime>='1239572956000'; But all aggregated results are copies to both files: 31097 Rows loaded to /disk2/is2 31097 Rows loaded to /disk2/is1 3) create a view and store aggregation results to it, and then use multi-insert to split the results to local disk: create view view_imp(rt, ct) as select requestbegintime, count(*) as ct from impressions2 where requestbegintime<'1239572996000' GROUP BY requestbegintime; from view_imp insert OVERWRITE LOCAL DIRECTORY '/disk2/is1' select * where rt<'1239572956000' insert OVERWRITE LOCAL DIRECTORY '/disk2/is2' select * where rt>='1239572956000'; But, again, all aggregated results are copies to both files: 31097 Rows loaded to /disk2/is2 31097 Rows loaded to /disk2/is1 Why does 2) and 3) fail? How can I split the results without creating new table? Regards, Wei