This looks like the following issue:
https://github.com/apache/incubator-datasketches-hive/issues/34

You did not mention which version of Spark you are using. The problem should have been addressed in Spark a long time ago, so it is worth checking whether you are running an older release.
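
To illustrate the failure mode visible in the stack trace quoted below, here is a minimal sketch. The class names mirror the ones in the trace, but they are hypothetical stand-ins, not the real datasketches-hive classes. It assumes that the merge step expects a buffer created for the merge phase and is instead handed one created for the update phase. It does not show why Spark passes the wrong buffer.

```java
// Hypothetical stand-ins; these are NOT the real datasketches-hive classes.
public class MergeCastDemo {
    /** Base type for aggregation buffers. */
    static abstract class State {}

    /** Buffer used while consuming raw input rows (update phase). */
    static class SketchState extends State {}

    /** Buffer used while combining partial results (merge phase). */
    static class UnionState extends State {}

    /** Mimics a merge() that assumes its buffer was created for the merge phase. */
    static String merge(State buffer) {
        UnionState union = (UnionState) buffer; // throws if buffer is a SketchState
        return "merged into " + union.getClass().getSimpleName();
    }

    /** Runs merge() and reports whether the cast succeeded. */
    static String tryMerge(State buffer) {
        try {
            return merge(buffer);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryMerge(new UnionState()));  // cast succeeds
        System.out.println(tryMerge(new SketchState())); // cast fails
    }
}
```

If the caller creates the buffer in one phase and calls merge() on it in another, the cast fails in the same way as in the reported trace.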

On Mon, Dec 21, 2020 at 10:10 AM Dong Jiang <dong.ji...@gmail.com> wrote:

> Hi,
>
> Datasketches has out-of-box HLL UDAF in hive, when I tried in spark, I got
> errors. Can someone explain why it is failing in spark?
>
> spark-shell --jars datasketches-memory-1.2.0-incubating.jar,datasketches-hive-1.0.0-incubating.jar,datasketches-java-1.2.0-incubating.jar
>
> spark.sql("""create temporary function data2sketch as
> 'org.apache.datasketches.hive.hll.DataToSketchUDAF'""")
>
> spark.sql("""with v as (select 'a' x union select 'b') select
> data2sketch(x) from v""").show
>
> Caused by: java.lang.ClassCastException: org.apache.datasketches.hive.hll.SketchState cannot be cast to org.apache.datasketches.hive.hll.UnionState
>   at org.apache.datasketches.hive.hll.SketchEvaluator.merge(SketchEvaluator.java:69)
>   at org.apache.datasketches.hive.hll.DataToSketchUDAF$DataToSketchEvaluator.merge(DataToSketchUDAF.java:114)
>   at org.apache.spark.sql.hive.HiveUDAFFunction.merge(hiveUDFs.scala:421)
>   at org.apache.spark.sql.hive.HiveUDAFFunction.merge(hiveUDFs.scala:307)
>   at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.merge(interfaces.scala:541)
>   at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1$$anonfun$applyOrElse$2.apply(AggregationIterator.scala:174)
>   at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1$$anonfun$applyOrElse$2.apply(AggregationIterator.scala:174)
>   at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateProcessRow$1.apply(AggregationIterator.scala:188)
>   at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateProcessRow$1.apply(AggregationIterator.scala:182)
>   at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.processInputs(ObjectAggregationIterator.scala:152)
>   at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.<init>(ObjectAggregationIterator.scala:78)
>   at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:114)
>   at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:105)
>   ...
>
> Best