cluster by in a subquery

Imran Akbar Mon, 10 Mar 2014 12:02:29 -0700

Hi,
   I'm trying to use the CLUSTER BY clause in Hive to write a query
like this:


FROM (SELECT * FROM attribute_table
CLUSTER BY id, name, value, amount) map_output
INSERT OVERWRITE TABLE attributed_table
SELECT TRANSFORM (map_output.id,...)
USING 'python2.7 data_attribution.py'
AS id, name, value;

but it doesn't seem to send all the rows with the same
(id, name, value, amount) to the same reducer. It does work when I set the
number of reducers to 1.
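
To make the expectation concrete, here is a minimal Python sketch of the
behaviour I expect from CLUSTER BY. It only illustrates generic hash
partitioning and is not Hive's actual partitioner; the function names and
the sample rows are made up for illustration.

```python
from collections import defaultdict


def assign_reducer(key, num_reducers):
    """Hypothetical stand-in for a hash partitioner (Hive's real hashing differs)."""
    return hash(key) % num_reducers


def partition_rows(rows, num_reducers):
    """Group rows into reducer buckets by their (id, name, value, amount) key."""
    buckets = defaultdict(list)
    for row in rows:
        key = (row["id"], row["name"], row["value"], row["amount"])
        buckets[assign_reducer(key, num_reducers)].append(row)
    return dict(buckets)


# Made-up sample data: the first two rows share the same clustering key.
rows = [
    {"id": 1, "name": "a", "value": "x", "amount": 10},
    {"id": 1, "name": "a", "value": "x", "amount": 10},
    {"id": 2, "name": "b", "value": "y", "amount": 20},
]
buckets = partition_rows(rows, num_reducers=4)
```

Under this model, rows with an equal key always land in the same bucket,
although one bucket may still hold several different keys.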

Could someone help point me in the right direction please?

thanks,
imran
