Hi all, I am using UDFRowSequence as follows:

    CREATE TEMPORARY FUNCTION rowSequence AS 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
    set mapred.reduce.tasks=1;
    CREATE TABLE temp_tc1_test AS
    SELECT rowSequence() AS id, data_resource_id, local_id, local_parent_id, name, author
    FROM normalized;

I see 2 jobs, the first of which runs with 2 map tasks and 0 reduce tasks on my small test data. I believe rowSequence() is being called in the map and not in the reduce, because the results contain duplicate IDs:

    select * from temp_tc1_test where id=8915;

    8915    167    1148    1113    Cytospora elaeagni     Allesch.
    8915    168    7       6       Achromadora inflata    Abebe & Coomans, 1996

Is there any way to force the UDF to be called in the reduce?

Thanks,
Tim
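
P.S. To illustrate what I think is happening, here is a minimal sketch in Python (not Hive code). It assumes the UDF keeps a private counter per instance and that each map task creates its own instance; I have not confirmed that this is the actual cause. The names RowSequence, run_map_task, split_a and split_b are hypothetical and exist only for this illustration.

```python
class RowSequence:
    """Hypothetical stand-in for a stateful sequence UDF (assumption:
    one private counter per instance, starting at zero)."""

    def __init__(self):
        self.counter = 0

    def evaluate(self):
        # Increment and return the counter, once per row.
        self.counter += 1
        return self.counter


def run_map_task(rows):
    """Simulate one map task: it builds its own RowSequence instance
    and tags every row in its input split with the next ID."""
    seq = RowSequence()
    return [(seq.evaluate(), row) for row in rows]


# Two input splits, processed by two independent map tasks.
split_a = ["row-a1", "row-a2", "row-a3"]
split_b = ["row-b1", "row-b2"]

output = run_map_task(split_a) + run_map_task(split_b)
ids = [row_id for row_id, _ in output]

# IDs that were handed out more than once across the two tasks.
duplicates = sorted({i for i in ids if ids.count(i) > 1})
print(duplicates)
```

If all rows instead passed through a single task, for example run_map_task(split_a + split_b), there would be only one counter and the IDs would be unique, which is the behaviour I was hoping to get by running the UDF in a single reducer.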