Hi, I'm trying to use the "cluster by" statement in Hive to write a query like this:

    FROM (
      SELECT * FROM attribute_table
      CLUSTER BY id, name, value, amount
    ) map_output
    INSERT OVERWRITE TABLE attributed_table
    SELECT TRANSFORM (map_output.id,...)
    USING 'python2.7 data_attribution.py'
    AS id, name, value;

However, it doesn't seem to send all the rows with the same id, name, value, and amount to the same reducer. It does work when I set the number of reducers to 1.

Could someone please point me in the right direction?

Thanks,
Imran