Thanks, that was it.

From: Namit Jain [mailto:[email protected]]
Sent: Tuesday, January 25, 2011 7:04 PM
To: [email protected]
Subject: Re: Distinct in hive

Is there skew in the data? You may want to set the parameter hive.groupby.skewindata to true.

Thanks,
-namit

From: Guy Doulberg <[email protected]>
Reply-To: <[email protected]>
Date: Tue, 25 Jan 2011 08:25:36 -0800
To: "[email protected]" <[email protected]>
Subject: Distinct in hive

Hey,

We wrote a query in Hive that calculates the number of distinct values within a group by. On a small portion of the data it worked well; however, when we ran the query over a large portion of the data, it failed with OutOfMemory errors in some of the reducers.

We wonder how the distinct operator works in Hive. Does it use some sort of data structure whose size is proportional to the number of distinct values?

Many thanks
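
For anyone finding this thread later, here is a minimal sketch of how the setting suggested above might be applied for a single session before running a distinct count. The table name (events) and column names (group_key, user_id) are hypothetical placeholders, not taken from the original query; only the parameter name hive.groupby.skewindata comes from the reply.

```sql
-- Enable skew handling for GROUP BY in this session
-- (the parameter named in the reply above).
SET hive.groupby.skewindata=true;

-- Hypothetical table and columns, for illustration only:
-- count the distinct user_id values within each group_key.
SELECT group_key,
       COUNT(DISTINCT user_id) AS distinct_users
FROM events
GROUP BY group_key;
```

The setting only takes effect for the session in which it is issued, so it needs to come before the query in the same script or CLI session.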
