Hi all, I am porting custom MapReduce (MR) code to Hive and have written working UDFs where I need them. Is there a workaround that avoids having to do this in Hive:
    SELECT * FROM (
      SELECT name_id,
             toTileX(longitude, 0) AS x,
             toTileY(latitude, 0) AS y,
             0 AS zoom,
             funct2(longitude, 0) AS f2_x,
             funct2(latitude, 0) AS f2_y,
             count(1) AS count
      FROM table
      GROUP BY name_id, toTileX(longitude, 0), toTileY(latitude, 0),
               funct2(longitude, 0), funct2(latitude, 0)
      UNION ALL
      SELECT name_id,
             toTileX(longitude, 1) AS x,
             toTileY(latitude, 1) AS y,
             1 AS zoom,
             funct2(longitude, 1) AS f2_x,
             funct2(latitude, 1) AS f2_y,
             count(1) AS count
      FROM table
      GROUP BY name_id, toTileX(longitude, 1), toTileY(latitude, 1),
               funct2(longitude, 1), funct2(latitude, 1)
      -- etc., one branch per zoom level
    ) t

The issue is that this makes many passes over the table, whereas previously my Map() would simply emit many times from the same input record and let the shuffle and sort do the grouping. I actually emit 184 times per input record (23 Google Maps zoom levels × 8 ways to derive the name_id), which means 184 UNION ALL branches.

Is it possible in Hive to force it to emit many rows from the source record in the stage-1 map? Failing that (ahem), does anyone know whether Pig can do this?

I hope I have explained this well enough to make sense.

Thanks in advance,
Tim
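A minimal sketch of the single-pass idea, under stated assumptions. In Hive, a table-generating function in a LATERAL VIEW (for example `LATERAL VIEW explode(array(0, 1, ..., 22)) z AS zoom`) should fan each row out to one row per zoom level before a single GROUP BY, and a CROSS JOIN against a tiny table of zoom levels has the same shape. The Python below models this rather than running Hive: `count_tiles_mr` mimics the old Map() emitting one key per zoom level, and `count_tiles_sql` uses SQLite (which has no LATERAL VIEW) with the CROSS JOIN variant. `to_tile_x` and `to_tile_y` are hypothetical Web Mercator stand-ins for the toTileX/toTileY UDFs; funct2 and the 8 name_id derivations are left out for brevity.

```python
import math
import sqlite3
from collections import Counter

ZOOM_LEVELS = range(23)  # Google Maps zoom levels 0..22


def to_tile_x(lon, zoom):
    # Hypothetical stand-in for toTileX: Web Mercator tile column.
    return math.floor((lon + 180.0) / 360.0 * (1 << zoom))


def to_tile_y(lat, zoom):
    # Hypothetical stand-in for toTileY: Web Mercator tile row.
    lat_rad = math.radians(lat)
    merc = math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad))
    return math.floor((1.0 - merc / math.pi) / 2.0 * (1 << zoom))


def map_record(record):
    """MR-style fan-out: one input record emits one key per zoom level."""
    name_id, lon, lat = record
    for zoom in ZOOM_LEVELS:
        yield (name_id, to_tile_x(lon, zoom), to_tile_y(lat, zoom), zoom)


def count_tiles_mr(records):
    # Single pass over the input; the Counter stands in for shuffle + reduce.
    counts = Counter()
    for record in records:
        counts.update(map_record(record))
    return counts


def count_tiles_sql(records):
    # Same result in SQL: CROSS JOIN with a small zoom table, then one GROUP BY.
    conn = sqlite3.connect(":memory:")
    conn.create_function("toTileX", 2, to_tile_x)
    conn.create_function("toTileY", 2, to_tile_y)
    conn.execute("CREATE TABLE points (name_id TEXT, longitude REAL, latitude REAL)")
    conn.executemany("INSERT INTO points VALUES (?, ?, ?)", records)
    conn.execute("CREATE TABLE zooms (zoom INTEGER)")
    conn.executemany("INSERT INTO zooms VALUES (?)", [(z,) for z in ZOOM_LEVELS])
    rows = conn.execute(
        """
        SELECT name_id,
               toTileX(longitude, zoom) AS x,
               toTileY(latitude, zoom) AS y,
               zoom,
               COUNT(1) AS cnt
        FROM points CROSS JOIN zooms
        GROUP BY name_id, toTileX(longitude, zoom), toTileY(latitude, zoom), zoom
        """
    ).fetchall()
    conn.close()
    return Counter({(n, x, y, z): c for n, x, y, z, c in rows})


if __name__ == "__main__":
    sample = [("a", -0.1276, 51.5072), ("a", -0.1280, 51.5070), ("b", 151.2093, -33.8688)]
    mr = count_tiles_mr(sample)
    assert mr == count_tiles_sql(sample)
    print(len(mr), "distinct (name_id, x, y, zoom) keys")
```

Both functions should give identical counts. The point of the design is that the input is read once and each record fans out to 23 rows (184 with the name_id variants) before grouping, instead of one full scan per UNION ALL branch.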