Your understanding is correct -- there is currently no partial aggregation for Hive UDAFs.
However, there is a PR to fix that: https://github.com/apache/spark/pull/5542

On Thu, Apr 23, 2015 at 1:30 AM, daniel.mescheder <
daniel.mesche...@realimpactanalytics.com> wrote:

> Hi everyone,
>
> I was playing with the integration of Hive UDAFs in Spark SQL and noticed
> that the terminatePartial and merge methods of custom UDAFs were not
> called. This made me curious, as those two methods are the ones responsible
> for distributing the UDAF execution in Hive.
>
> Looking at the code of HiveUdafFunction, which seems to be the wrapper for
> all native Hive functions for which there exists no Spark SQL specific
> implementation, I noticed that it
>
> a) extends AggregateFunction and not PartialAggregate
> b) only contains calls to iterate and evaluate, but never to merge of the
> underlying UDAFEvaluator object
>
> My question is thus twofold: Is my observation correct that, to achieve
> distributed execution of a UDAF, I have to add a custom implementation at
> the Spark SQL layer (like the examples in aggregates.scala)? If that is the
> case, how difficult would it be to use the terminatePartial and merge
> functions provided by the UDAFEvaluator to make Hive UDAFs distributed by
> default?
>
>
> Cheers,
>
> Daniel
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/In-Spark-SQL-is-there-support-for-distributed-execution-of-native-Hive-UDAFs-tp11753.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
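For anyone following along, the lifecycle Daniel describes can be sketched in plain Scala. This is a hypothetical stand-in for Hive's real UDAFEvaluator interface (which needs Hadoop on the classpath), not Spark's actual wrapper code; the method names mirror Hive's: iterate accumulates rows on each partition, terminatePartial emits a partial state, merge combines partial states, and terminate produces the final result.

```scala
// Hypothetical partial state for a distributed average.
case class PartialAvg(sum: Double, count: Long)

// A toy evaluator shaped like Hive's UDAFEvaluator contract.
class AvgEvaluator {
  private var sum = 0.0
  private var count = 0L

  def init(): Unit = { sum = 0.0; count = 0L }                 // reset per task
  def iterate(value: Double): Unit = { sum += value; count += 1 }
  def terminatePartial(): PartialAvg = PartialAvg(sum, count)  // map-side output
  def merge(other: PartialAvg): Unit = {                       // reduce-side combine
    sum += other.sum; count += other.count
  }
  def terminate(): Option[Double] =
    if (count == 0) None else Some(sum / count)
}

// Simulate two partitions, each producing a partial aggregate,
// then a final evaluator merging them -- the merge step that
// HiveUdafFunction currently skips by only calling iterate/evaluate.
val left = new AvgEvaluator; left.init()
Seq(1.0, 2.0, 3.0).foreach(left.iterate)

val right = new AvgEvaluator; right.init()
Seq(4.0, 5.0).foreach(right.iterate)

val combiner = new AvgEvaluator; combiner.init()
combiner.merge(left.terminatePartial())
combiner.merge(right.terminatePartial())
println(combiner.terminate()) // Some(3.0)
```

Without the merge/terminatePartial path, all rows must flow through a single evaluator's iterate, which is why the current wrapper cannot do partial (map-side) aggregation.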