This looks like the following issue:
https://github.com/apache/incubator-datasketches-hive/issues/34
You did not mention which version of Spark you are using. The issue must
have been fixed in Spark a long time ago.
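
For context, the ClassCastException in the quoted trace is the kind of error you get when merge() is handed a buffer of a different type than the one it casts to. Below is a minimal Java sketch of that failure mode. All classes here are hypothetical stand-ins, not the DataSketches or Spark sources, and the idea that the buffer type is chosen by aggregation mode is an assumption made for illustration only:

```java
public class MergeCastSketch {
    // Hypothetical stand-ins for the two buffer types named in the stack trace.
    static class State { }
    static class SketchState extends State { }
    static class UnionState extends State { int merged = 0; }

    // Simplified aggregation modes (assumption: only two are needed here).
    enum Mode { PARTIAL1, FINAL }

    // Assumption: the evaluator picks the buffer type from its mode.
    static State newBuffer(Mode mode) {
        return mode == Mode.PARTIAL1 ? new SketchState() : new UnionState();
    }

    // merge() assumes it always receives a UnionState and casts without checking.
    static void merge(State buffer) {
        UnionState union = (UnionState) buffer;
        union.merged++;
    }

    // Helper that reports whether merge() succeeds for a buffer created in the given mode.
    static String tryMerge(Mode bufferMode) {
        try {
            merge(newBuffer(bufferMode));
            return "ok";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryMerge(Mode.FINAL));    // buffer type matches the cast
        System.out.println(tryMerge(Mode.PARTIAL1)); // buffer type does not match
    }
}
```

If the caller creates the buffer under one mode and merges under another, the cast fails. Whether that is exactly what happened here depends on the Spark version in use.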

On Mon, Dec 21, 2020 at 10:10 AM Dong Jiang <dong.ji...@gmail.com> wrote:

> Hi,
>
> DataSketches has an out-of-the-box HLL UDAF for Hive. When I tried it in
> Spark, I got errors. Can someone explain why it is failing in Spark?
>
> spark-shell --jars datasketches-memory-1.2.0-incubating.jar,datasketches-hive-1.0.0-incubating.jar,datasketches-java-1.2.0-incubating.jar
>
> spark.sql("""create temporary function data2sketch as 'org.apache.datasketches.hive.hll.DataToSketchUDAF'""")
>
> spark.sql("""with v as (select 'a' x union select 'b') select data2sketch(x) from v""").show
>
> Caused by: java.lang.ClassCastException: org.apache.datasketches.hive.hll.SketchState cannot be cast to org.apache.datasketches.hive.hll.UnionState
>   at org.apache.datasketches.hive.hll.SketchEvaluator.merge(SketchEvaluator.java:69)
>   at org.apache.datasketches.hive.hll.DataToSketchUDAF$DataToSketchEvaluator.merge(DataToSketchUDAF.java:114)
>   at org.apache.spark.sql.hive.HiveUDAFFunction.merge(hiveUDFs.scala:421)
>   at org.apache.spark.sql.hive.HiveUDAFFunction.merge(hiveUDFs.scala:307)
>   at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.merge(interfaces.scala:541)
>   at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1$$anonfun$applyOrElse$2.apply(AggregationIterator.scala:174)
>   at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1$$anonfun$applyOrElse$2.apply(AggregationIterator.scala:174)
>   at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateProcessRow$1.apply(AggregationIterator.scala:188)
>   at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateProcessRow$1.apply(AggregationIterator.scala:182)
>   at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.processInputs(ObjectAggregationIterator.scala:152)
>   at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.<init>(ObjectAggregationIterator.scala:78)
>   at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:114)
>   at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:105)
> ...
>
>
> Best
>
