UDFRowSequence called in Map() ?

Tim Robertson Sun, 20 Feb 2011 20:48:37 -0800

Hi all,

I am using UDFRowSequence as follows:


CREATE TEMPORARY FUNCTION rowSequence AS
'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
SET mapred.reduce.tasks=1;
CREATE TABLE temp_tc1_test
as
SELECT
  rowSequence() AS id,
  data_resource_id,
  local_id,
  local_parent_id,
  name,
  author
FROM normalized;

I see 2 jobs, the first of which runs with 2 map tasks and 0 reduce tasks
on my small test data.  I believe rowSequence() is being called in
the map phase and not the reduce phase, as the results contain duplicate IDs:

select * from temp_tc1_test where id=8915;
8915    167     1148    1113    Cytospora elaeagni      Allesch.
8915    168     7       6       Achromadora inflata     Abebe & Coomans, 1996
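
To illustrate what I suspect is happening, here is a minimal sketch. It
assumes each map task creates its own instance of the UDF, with a private
counter that starts from the same value. The class and method names are
hypothetical stand-ins, not the actual Hive contrib code.

```java
import java.util.ArrayList;
import java.util.List;

public class RowSequenceSketch {

    // Hypothetical stand-in for a sequence UDF: one counter per instance.
    static class LocalSequence {
        private long current = 0;

        long next() {
            return ++current;
        }
    }

    // Simulates one map task: it gets a fresh UDF instance and
    // assigns an ID to each of its rows.
    static List<Long> runTask(int rows) {
        LocalSequence seq = new LocalSequence();
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            ids.add(seq.next());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Two independent tasks, as with the 2 map tasks in my job.
        List<Long> task1 = runTask(3);
        List<Long> task2 = runTask(3);
        System.out.println("task 1 ids: " + task1);
        System.out.println("task 2 ids: " + task2);
    }
}
```

Under that assumption both tasks hand out the same IDs, which would match
the duplicate rows shown above.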

Is there any way to force the UDF to be called in the reduce phase?

Thanks,
Tim
