Thanks. It appears that TypedImperativeAggregate won't be available till
2.2.x. I'm stuck with my RDD approach then :(
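For reference, the RDD fallback I mean looks roughly like this sketch
(names and schema are simplified stand-ins for the real job, which
collects full rows per key):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rdd-fallback").getOrCreate()
import spark.implicits._

// Simplified stand-in for the real input.
val df = Seq(("a", 1L), ("a", 2L), ("b", 3L)).toDF("key", "value")

// Drop to the RDD API and collect every value per key, which is the
// part the UDAF cannot do through the hash-based aggregate.
val grouped = df.rdd
  .map(r => (r.getString(0), r.getLong(1)))
  .aggregateByKey(List.empty[Long])(
    (acc, v) => v :: acc,  // fold one value into the per-key buffer
    (a, b) => a ::: b      // merge buffers across partitions
  )

grouped.collect().foreach(println)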
---
Regards,
Andy
On Tue, Jan 10, 2017 at 2:01 AM, Liang-Chi Hsieh wrote:
Hi Andy,
Because the hash-based aggregate uses UnsafeRow as its aggregation state, the
aggregation buffer schema must contain only types that are mutable in
UnsafeRow.
If you can use TypedImperativeAggregate to implement your aggregation
function, Spark SQL has ObjectHashAggregateExec, which supports hash-based
aggregation with arbitrary JVM objects as aggregation buffers.
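TypedImperativeAggregate itself is an internal Catalyst API. As a public-API
illustration of the same idea (an arbitrary JVM object as the aggregation
buffer), here is a sketch using a typed Aggregator; all names are invented:

import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

val spark = SparkSession.builder().appName("object-buffer-agg").getOrCreate()
import spark.implicits._

// The buffer is a plain JVM List, not an UnsafeRow; Kryo handles its
// (de)serialization between operators.
object CollectValues extends Aggregator[Long, List[Long], Seq[Long]] {
  def zero: List[Long] = Nil
  def reduce(buf: List[Long], v: Long): List[Long] = v :: buf
  def merge(a: List[Long], b: List[Long]): List[Long] = a ::: b
  def finish(buf: List[Long]): Seq[Long] = buf
  def bufferEncoder: Encoder[List[Long]] = Encoders.kryo[List[Long]]
  def outputEncoder: Encoder[Seq[Long]] = implicitly[Encoder[Seq[Long]]]
}

val ds = Seq(1L, 2L, 3L).toDS()
ds.select(CollectValues.toColumn).show()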
---
Hi Takeshi,
Thanks for the answer. My UDAF aggregates data into an array of rows.
Apparently this makes it ineligible for the hash-based aggregate, given the
logic at:
https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeFixedWidthAggregationM
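Reduced to a hypothetical skeleton (field names invented), the shape of my
UDAF is:

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// The buffer is an array of structs. ArrayType is not one of UnsafeRow's
// mutable field types, so this buffer schema fails the check linked above
// and the plan falls back to sort-based aggregation.
class CollectRows extends UserDefinedAggregateFunction {
  private val rowType = StructType(Seq(
    StructField("id", LongType),
    StructField("name", StringType)))

  def inputSchema: StructType = rowType
  def bufferSchema: StructType =
    StructType(Seq(StructField("rows", ArrayType(rowType))))
  def dataType: DataType = ArrayType(rowType)
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit =
    buffer(0) = Seq.empty[Row]

  def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    buffer(0) = buffer.getSeq[Row](0) :+ input

  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
    buffer1(0) = buffer1.getSeq[Row](0) ++ buffer2.getSeq[Row](0)

  def evaluate(buffer: Row): Any = buffer.getSeq[Row](0)
}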
---
Hi,
Spark always uses hash-based aggregates when the types of the aggregated data
are supported there; otherwise, it falls back to sort-based aggregation.
See:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.s
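If it helps, you can test a buffer schema against that check directly; a
small sketch (the static method below is the one the planner consults, as
of the versions discussed here):

import org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap
import org.apache.spark.sql.types._

val fixedWidth = StructType(Seq(StructField("sum", LongType)))
val withArray = StructType(Seq(StructField("rows",
  ArrayType(StructType(Seq(StructField("id", LongType)))))))

// true: every field is mutable in UnsafeRow
println(UnsafeFixedWidthAggregationMap.supportsAggregationBufferSchema(fixedWidth))
// false: ArrayType is not mutable, so planning falls back to sort-based
println(UnsafeFixedWidthAggregationMap.supportsAggregationBufferSchema(withArray))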