For anyone stumbling across this thread the (pretty much) working UDTF can be seen: http://code.google.com/p/gbif-occurrencestore/source/browse/trunk/occurrence-spatial/src/main/java/org/gbif/occurrence/spatial/udf/TaxonDensityUDTF.java
allowing for the following query: create table taxon_density as select taxonId,tileX,tileY,zoom,clusterX,clusterY,sum(count) from occurrence_record lateral view taxonDensityUDTF(kingdom_concept_id, phylum_concept_id, class_concept_id, order_concept_id, family_concept_id, genus_concept_id, species_concept_id,nub_concept_id, latitude, longitude, 10) e as taxonId,tileX,tileY,zoom,clusterX,clusterY,count group by taxonId,tileX,tileY,zoom,clusterX,clusterY; Thanks again for the pointers Sonal and Namit, and also on the other thread, Tim On Mon, Nov 8, 2010 at 9:17 AM, Tim Robertson <timrobertson...@gmail.com> wrote: > I am writing a GenericUDTF now, but notice on > http://wiki.apache.org/hadoop/Hive/DeveloperGuide/UDTF > > the method docs show: > /** > * Called to notify the UDTF that there are no more rows to process. Note > that > * forward() should not be called in this function. Only clean up code should > * be run. > */ > public abstract void close() throws HiveException; > > but the example does exactly that: > @Override > public void close() throws HiveException { > forwardObj[0] = count; > forward(forwardObj); > forward(forwardObj); > } > > I'll assume the example is correct and continue, but it might be worth > fixing that page. > > Cheers, > Tim > > > > > > On Mon, Nov 8, 2010 at 7:35 AM, Tim Robertson <timrobertson...@gmail.com> > wrote: >> Thank you both, >> >> A quick glance looks like that is what I am looking for. When I get >> it working, I'll post the solution. >> >> Cheers, >> Tim >> >> On Mon, Nov 8, 2010 at 6:55 AM, Namit Jain <nj...@facebook.com> wrote: >>> Other option would be to create a wrapper script (not use either UDF or >>> UDTF) >>> That script, in any language, can emit any number of output rows per input >>> row. >>> >>> Look at: >>> http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform >>> for details >>> >>> ________________________________ >>> From: Sonal Goyal [sonalgoy...@gmail.com] >>> Sent: Sunday, November 07, 2010 8:40 PM >>> To: user@hive.apache.org >>> Subject: Re: Unions causing many scans of input - workaround? >>> >>> Hey Tim, >>> >>> You have an interesting problem. Have you tried creating a UDTF for your >>> case, so that you can possibly emit more than one record for each row of >>> your input? >>> >>> http://wiki.apache.org/hadoop/Hive/DeveloperGuide/UDTF >>> >>> Thanks and Regards, >>> Sonal >>> >>> Sonal Goyal | Founder and CEO | Nube Technologies LLP >>> http://www.nubetech.co | http://in.linkedin.com/in/sonalgoyal >>> >>> >>> >>> >>> >>> On Mon, Nov 8, 2010 at 2:31 AM, Tim Robertson <timrobertson...@gmail.com> >>> wrote: >>>> >>>> Hi all, >>>> >>>> I am porting custom MR code to Hive and have written working UDFs >>>> where I need them. Is there a work around to having to do this in >>>> Hive: >>>> >>>> select * from >>>> ( >>>> select name_id, toTileX(longitude,0) as x, toTileY(latitude,0) as >>>> y, 0 as zoom, funct2(lontgitude, 0) as f2_x, funct2(latitude,0) as >>>> f2_y, count (1) as count >>>> from table >>>> group by name_id, x, y, f2_x, f2_y >>>> >>>> UNION ALL >>>> >>>> select name_id, toTileX(longitude,1) as x, toTileY(latitude,1) as >>>> y, 1 as zoom, funct2(lontgitude, 1) as f2_x, funct2(latitude,1) as >>>> f2_y, count (1) as count >>>> from table >>>> group by name_id, x, y, f2_x, f2_y >>>> >>>> --- etc etc increasing in zoom >>>> ) >>>> >>>> The issue being that this does many passes over the table, whereas >>>> previously in my Map() I would just emit many times from the same >>>> input record and then let it all group in the shuffle and sort. >>>> I actually emit 184 times for an input record (23 zoom levels of >>>> google maps, and 8 ways to derive the name_id) for a single record >>>> which means 184 union statements - Is it possible in hive to force it >>>> to emit many times from the source record in the stage-1 map? >>>> >>>> (ahem) Does anyone know if Pig can do this if not in Hive? >>>> >>>> I hope I have explained this well enough to make sense. >>>> >>>> Thanks in advance, >>>> Tim >>> >>> >> >