subject:"Re\: How to hint Spark to use HashAggregate\(\) for UDAF"

Re: How to hint Spark to use HashAggregate() for UDAF

2017-01-10 Thread Andy Dang

Thanks. It appears that TypedImperativeAggregate won't be available till 2.2.x. I'm stuck with my RDD approach then :( --- Regards, Andy On Tue, Jan 10, 2017 at 2:01 AM, Liang-Chi Hsieh wrote: > > Hi Andy, > > Because hash-based aggregate uses unsafe row as aggregation states, so the > aggr

Re: How to hint Spark to use HashAggregate() for UDAF

2017-01-09 Thread Liang-Chi Hsieh

Hi Andy, Because hash-based aggregate uses unsafe row as aggregation states, so the aggregation buffer schema must be mutable types in unsafe row. If you can use TypedImperativeAggregate to implement your aggregation function, SparkSQL has ObjectHashAggregateExec which supports hash-based aggreg

Re: How to hint Spark to use HashAggregate() for UDAF

2017-01-09 Thread Andy Dang

Hi Takeshi, Thanks for the answer. My UDAF aggregates data into an array of rows. Apparently this makes it ineligible to using Hash-based aggregate based on the logic at: https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeFixedWidthAggregationM

Re: How to hint Spark to use HashAggregate() for UDAF

2017-01-09 Thread Takeshi Yamamuro

Hi, Spark always uses hash-based aggregates if the types of aggregated data are supported there; otherwise, spark fails to use hash-based ones, then it uses sort-based ones. See: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.s