Thanks for your response.
My thinking was that, by turning off hive.map.aggr, Hive would do the
following:
col3 becomes the key in the map phase. All rows with the same col3 go to the same reducer.
In the reducer the values (=col1,col2) are sorted by key (=col3) and myUdf
iterates over the values, with te
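The reduce-side-only flow described above can be simulated in plain Java. This is a sketch under stated assumptions, not Hive code. It assumes a simple hash-partitioned shuffle on col3, and one evaluator pass per key: iterate() for each row, then terminate(). The method names mirror Hive's UDAF lifecycle. The names ReduceSideSim, ConcatEvaluator and run, and the concatenating aggregate, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of reduce-side-only aggregation (hive.map.aggr=false).
// Method names mirror Hive's UDAF lifecycle; this is NOT Hive's actual API.
class ReduceSideSim {
    // Hypothetical per-group aggregate: concatenates col1:col2 pairs.
    static class ConcatEvaluator {
        private final StringBuilder sb = new StringBuilder();
        void reset() { sb.setLength(0); }
        void iterate(String col1, String col2) {
            if (sb.length() > 0) sb.append(',');
            sb.append(col1).append(':').append(col2);
        }
        String terminate() { return sb.toString(); }
    }

    // Each row is {col1, col2, col3}; col3 is the GROUP BY key.
    static List<Map<String, String>> run(List<String[]> rows, int numReducers) {
        // "Shuffle": route each row to a reducer by hash(col3). TreeMap keeps the
        // keys sorted; values keep arrival order here, which real MapReduce
        // does not guarantee without a secondary sort.
        List<TreeMap<String, List<String[]>>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) reducers.add(new TreeMap<>());
        for (String[] row : rows) {
            int r = Math.floorMod(row[2].hashCode(), numReducers);
            reducers.get(r).computeIfAbsent(row[2], k -> new ArrayList<>()).add(row);
        }
        // "Reduce": for each key, iterate() over every row in the group, then terminate().
        List<Map<String, String>> outputs = new ArrayList<>();
        ConcatEvaluator eval = new ConcatEvaluator();
        for (TreeMap<String, List<String[]>> groups : reducers) {
            Map<String, String> out = new TreeMap<>();
            for (Map.Entry<String, List<String[]>> g : groups.entrySet()) {
                eval.reset();
                for (String[] row : g.getValue()) eval.iterate(row[0], row[1]);
                out.put(g.getKey(), eval.terminate());
            }
            outputs.add(out);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"a", "1", "x"},
            new String[] {"b", "2", "y"},
            new String[] {"c", "3", "x"});
        // Each key appears in exactly one reducer's output map.
        System.out.println(run(rows, 2));
    }
}
```

In this simplified model, terminatePartial() and merge() are never needed, because a whole group is seen by a single evaluator pass.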
Setting hive.map.aggr to false will reduce the chance of terminatePartial() and
merge() being called, though I don't think it will eliminate the
possibility. If your data is large, it's still possible that a group of data
is processed by multiple reducers, in which case those two methods are needed.
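When a group's rows are aggregated in more than one place, the partial states have to be combined. This happens, for example, on different mappers when hive.map.aggr is true. The sketch below illustrates what terminatePartial() and merge() are for, using an average as the example. It is a hypothetical plain-Java class whose method names mirror Hive's UDAF lifecycle. It is not Hive's GenericUDAFEvaluator API.

```java
import java.util.List;

// Hypothetical average aggregator; method names mirror Hive's UDAF lifecycle,
// but this is plain Java, not Hive's actual evaluator interface.
class AvgEvaluator {
    private double sum;
    private long count;

    void iterate(double value) { sum += value; count++; }

    // Partial state that can be shipped elsewhere: {sum, count}.
    double[] terminatePartial() { return new double[] {sum, count}; }

    // Fold another instance's partial state into this one.
    void merge(double[] partial) { sum += partial[0]; count += (long) partial[1]; }

    Double terminate() { return count == 0 ? null : sum / count; }

    public static void main(String[] args) {
        // Two "mappers" each see part of the same group.
        AvgEvaluator m1 = new AvgEvaluator();
        for (double v : List.of(1.0, 2.0)) m1.iterate(v);
        AvgEvaluator m2 = new AvgEvaluator();
        for (double v : List.of(3.0, 4.0, 5.0)) m2.iterate(v);
        // The "reducer" merges the partials instead of seeing the raw rows.
        AvgEvaluator reducer = new AvgEvaluator();
        reducer.merge(m1.terminatePartial());
        reducer.merge(m2.terminatePartial());
        System.out.println(reducer.terminate());
    }
}
```

The partial state carries (sum, count) rather than a mean. Averaging the two partial means (1.5 and 4.0) would give 2.75 instead of the correct 3.0. A UDAF that cannot express its state this way cannot safely support merge().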
If you need
Hey, my question wasn't very clear. I have a UDAF that I apply per group.
The UDAF does not support terminatePartial() and merge(). So to do this, I
run:
set hive.map.aggr=false;
select myUdf(col1, col2) from table group by col3;
Now this seems to work. But are my assumptions correct that this wil
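A UDAF that does not support terminatePartial() and merge() can be made to fail loudly rather than quietly return wrong results. This is one defensive option, offered as a sketch: have both methods throw, so the query errors out if the engine ever plans partial aggregation for it. The class below is hypothetical plain Java with Hive-style method names, not Hive's API. Its col1/col2 collecting logic is only a placeholder for the real aggregation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical evaluator that only supports complete (single-pass) aggregation.
// Method names mirror Hive's UDAF lifecycle; this is NOT Hive's actual API.
class NoMergeEvaluator {
    private final List<String> seen = new ArrayList<>();

    // Placeholder aggregation logic: record each (col1, col2) pair in arrival order.
    void iterate(String col1, String col2) { seen.add(col1 + "/" + col2); }

    String terminate() { return String.join(",", seen); }

    // Fail loudly instead of silently producing a wrong answer.
    Object terminatePartial() {
        throw new UnsupportedOperationException("myUdf cannot produce partial results");
    }

    void merge(Object partial) {
        throw new UnsupportedOperationException("myUdf cannot merge partial results");
    }

    public static void main(String[] args) {
        NoMergeEvaluator e = new NoMergeEvaluator();
        e.iterate("a", "1");
        e.iterate("b", "2");
        System.out.println(e.terminate());
        try {
            e.terminatePartial();
        } catch (UnsupportedOperationException ex) {
            System.out.println("refused: " + ex.getMessage());
        }
    }
}
```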
Koert, not sure what you mean by "results can be merged between groups".
A UDAF should be used to aggregate records by group. Why would you need to merge
between groups?
Can you give some examples of what kind of query you'd like to run?
2011/8/30 Koert Kuipers
> If I run my own UDAF with group by, can I