Hi,

Issue #1:
I'm using the new UDAF interface (UserDefinedAggregateFunction) in the
Spark 1.5.0 release. Is it possible to aggregate all values in the
MutableAggregationBuffer into an array in a robust manner? I'm writing an
aggregation function that collects values into an array from all input rows
and then computes the final result of the UDAF from that array/list. The
issue I'm running into is that the values contained in the
MutableAggregationBuffer are immutable, so I have to copy the whole array
every time I append a new value. This of course makes it very slow for any
significant number of elements.
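To make the cost concrete, here is a plain-Scala sketch (no Spark; the object and method names are mine, for illustration only) contrasting the copy-on-append pattern the immutable buffer forces with an amortized O(1) append into a mutable ArrayBuffer. With copy-on-append, n appends perform O(n^2) element copies in total, which is presumably what makes the UDAF slow:

```scala
import scala.collection.mutable.ArrayBuffer

object AppendCost {
  // Copy-on-append: `acc :+ i` allocates a fresh array and copies every
  // existing element on each step, so n appends cost O(n^2) copies overall.
  // This mirrors replacing an immutable array value in the buffer per row.
  def collectByCopy(n: Int): Array[Int] = {
    var acc = Array.empty[Int]
    var i = 0
    while (i < n) {
      acc = acc :+ i // full copy of the array so far
      i += 1
    }
    acc
  }

  // Mutable buffer: amortized O(1) per append; convert to an array once
  // at the end, for O(n) total work.
  def collectByBuffer(n: Int): Array[Int] = {
    val buf = ArrayBuffer.empty[Int]
    var i = 0
    while (i < n) {
      buf += i
      i += 1
    }
    buf.toArray
  }

  def main(args: Array[String]): Unit = {
    val n = 10000
    // Both produce the same result; only the cost differs.
    assert(collectByCopy(n).sameElements(collectByBuffer(n)))
    println(s"collected $n elements both ways")
  }
}
```

The catch, of course, is that MutableAggregationBuffer only lets you store values of the declared (immutable) SQL types between update calls, so the ArrayBuffer trick cannot simply be stored in the buffer itself; this sketch only quantifies why the forced copying hurts.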

Issue #2:
I also tried the Hive 'collect_list' UDAF, but since the input values are
UDTs, it fails with a scala.MatchError. I suppose the Hive UDAFs only work
with primitive parameter types?

-JP



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/UDAF-and-UDT-with-SparkSQL-1-5-0-tp24670.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
