Re: UDAF and group by

2011-09-05 Thread Koert Kuipers
Thanks for your response. My thinking was that by turning off hive.map.aggr, Hive would do the following: col3 becomes the key in mapping. All rows with the same col3 go to the same reducer. In the reducer the values (=col1,col2) are sorted by key (=col3) and myUdf iterates over the values, with te

Re: UDAF and group by

2011-09-04 Thread Huan Li
Setting hive.map.aggr to false will reduce the chance of terminatePartial() and merge() being called, though I don't think it will eliminate the possibility. If your data is large, it's still possible that a group of data is processed by multiple reducers and those two methods are needed. If you need
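
To make the role of terminatePartial() and merge() concrete, here is a minimal sketch in plain Python. It is only a simplified model of the two aggregation paths discussed in this thread, not Hive's API or its actual execution plan. The class SumUdaf, the two helper functions, and the sample rows are all hypothetical, and the method names are snake_case stand-ins for the UDAF methods iterate(), terminatePartial(), merge() and terminate().

```python
from collections import defaultdict


class SumUdaf:
    """Toy aggregator (hypothetical) that mirrors the UDAF lifecycle method names."""

    def __init__(self):
        self.total = 0

    def iterate(self, value):
        # Consume one raw input row.
        self.total += value

    def terminate_partial(self):
        # Emit an intermediate result that another aggregator can merge.
        return self.total

    def merge(self, partial):
        # Fold in an intermediate result produced elsewhere.
        self.total += partial

    def terminate(self):
        # Emit the final result for the group.
        return self.total


def aggregate_single_pass(rows):
    """Model of the 'no partial aggregation' path: every row of a group
    reaches the same aggregator, so only iterate() and terminate() run."""
    groups = defaultdict(SumUdaf)
    for key, value in rows:
        groups[key].iterate(value)
    return {key: agg.terminate() for key, agg in groups.items()}


def aggregate_with_partials(splits):
    """Model of the 'partial aggregation' path: each split pre-aggregates,
    then the partial results are merged per group."""
    partials = []
    for split in splits:
        local = defaultdict(SumUdaf)
        for key, value in split:
            local[key].iterate(value)
        partials.extend((key, agg.terminate_partial()) for key, agg in local.items())

    merged = defaultdict(SumUdaf)
    for key, partial in partials:
        merged[key].merge(partial)
    return {key: agg.terminate() for key, agg in merged.items()}
```

A UDAF that cannot implement terminate_partial() and merge() only works on the first path. If a group is ever split across several aggregators, the second path is the only way to combine the pieces, which is why the two methods matter.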

Re: UDAF and group by

2011-09-04 Thread Koert Kuipers
Hey, my question wasn't very clear. I have a UDAF that I apply per group. The UDAF does not support terminatePartial() and merge(). So to do this I run: set hive.map.aggr=false; select myUdf(col1, col2) from table group by col3; Now this seems to work. But are my assumptions correct that this wil

Re: UDAF and group by

2011-09-01 Thread Huan Li
Koert, Not sure what you mean by "results can be merged between groups". A UDAF should be used to aggregate records by group. Why would you need to merge between groups? Can you give some examples of what kind of query you'd like to run? 2011/8/30 Koert Kuipers > If i run my own UDAF with group by, can i