Setting hive.map.aggr false will reduce the chance of terminatePartial() and merge() being called. Though I don't think it will eliminate the possibility. If your data is large, it's still possible that a group of data is processed by multiple reducers and those two methods are needed.
If you need to process records in each group in a single method, you can first use collect_set to collect your group data and process them in a UDF. 2011/9/4 Koert Kuipers <ko...@tresata.com> > Hey, my question wasn't very clear. I have a UDAF that I apply per group. > The UDAF does not support terminatePartial() and merge(). So to do this i > run: > > set hive.map.aggr=false; > select myUdf(col1, col2) from table group by col3; > > Now this seems to work. But are my assumptions correct that this will never > call terminatePartial() or merge()? > Thanks Koert > > > On Thu, Sep 1, 2011 at 11:59 PM, Huan Li <yrlih...@gmail.com> wrote: > >> Koert, Not sure what you mean by "results can be merged between groups". >> UDAF should be used to aggregated records by group. Why need to merge >> between groups? >> >> Can you give some examples of what kind of query you'd like to run? >> >> >> 2011/8/30 Koert Kuipers <ko...@tresata.com> >> >>> If i run my own UDAF with group by, can i be sure that a single UDAF >>> instance initialized once will process all members in a group? Or should i >>> code so as to take into account the situation where even within a group >>> multiple UDAFs could run, and i would have to deal with terminatePartial() >>> and merge() even within a group? >>> My problem is that my results within a group are not easily merged, but >>> between groups they are. >>> >> >> >