Distinct in hive

Guy Doulberg Tue, 25 Jan 2011 08:27:34 -0800

Hey,
We wrote a Hive query that calculates the number of distinct values within
each group of a GROUP BY.
On a small portion of the data it worked well; however, when we ran the query
over a large portion of the data, it failed with an OutOfMemory error in some
of the reducers.


We wonder how the DISTINCT operator works in Hive. Does it use some sort of
data structure whose size is proportional to the number of distinct values?
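
To make the question concrete, here is a minimal Python sketch of the kind of
structure we have in mind. It is only our guess at a naive approach, not Hive's
actual implementation, and the function name and sample rows are made up for
illustration. Each group keeps a set of the values it has seen, so memory grows
with the number of distinct values per group.

```python
from collections import defaultdict


def count_distinct_per_group(rows):
    """Count distinct values per group key (hypothetical helper, not Hive code).

    rows is an iterable of (group_key, value) pairs.
    """
    # One set per group: its size is proportional to the number of
    # distinct values seen for that group, which is our suspected
    # source of memory pressure in a reducer.
    seen = defaultdict(set)
    for key, value in rows:
        seen[key].add(value)
    return {key: len(values) for key, values in seen.items()}


# Made-up sample data: group "a" has two distinct values, group "b" has one.
rows = [("a", 1), ("a", 1), ("a", 2), ("b", 3)]
result = count_distinct_per_group(rows)
print(result)
```

If something like this runs inside each reducer, a group with very many
distinct values would need a correspondingly large set in memory.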

Many thanks
