Hi,

DataSketches provides an out-of-the-box HLL UDAF for Hive, but when I tried to use it in Spark, I got the error below. Can someone explain why it fails in Spark?

spark-shell --jars datasketches-memory-1.2.0-incubating.jar,datasketches-hive-1.0.0-incubating.jar,datasketches-java-1.2.0-incubating.jar


spark.sql("""create temporary function data2sketch as 'org.apache.datasketches.hive.hll.DataToSketchUDAF'""")


spark.sql("""with v as (select 'a' x union select 'b') select data2sketch(x) from v""").show


Caused by: java.lang.ClassCastException: org.apache.datasketches.hive.hll.SketchState cannot be cast to org.apache.datasketches.hive.hll.UnionState
  at org.apache.datasketches.hive.hll.SketchEvaluator.merge(SketchEvaluator.java:69)
  at org.apache.datasketches.hive.hll.DataToSketchUDAF$DataToSketchEvaluator.merge(DataToSketchUDAF.java:114)
  at org.apache.spark.sql.hive.HiveUDAFFunction.merge(hiveUDFs.scala:421)
  at org.apache.spark.sql.hive.HiveUDAFFunction.merge(hiveUDFs.scala:307)
  at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.merge(interfaces.scala:541)
  at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1$$anonfun$applyOrElse$2.apply(AggregationIterator.scala:174)
  at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1$$anonfun$applyOrElse$2.apply(AggregationIterator.scala:174)
  at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateProcessRow$1.apply(AggregationIterator.scala:188)
  at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateProcessRow$1.apply(AggregationIterator.scala:182)
  at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.processInputs(ObjectAggregationIterator.scala:152)
  at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.<init>(ObjectAggregationIterator.scala:78)
  at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:114)
  at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:105)
...
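
For what it's worth, the trace shows SketchEvaluator.merge casting its aggregation buffer to UnionState and receiving a SketchState instead. Below is a simplified, hypothetical Java model (not the real DataSketches or Spark code) of one way that can happen. It assumes that (a) the evaluator picks the buffer class from the mode it was initialized with, SketchState for raw-input modes and UnionState for merge modes, and (b) Spark's HiveUDAFFunction creates the buffer with a PARTIAL1-mode evaluator but calls merge on a PARTIAL2-mode one. Neither assumption has been checked against the actual sources, and the class and method names below are illustrative only.

```java
// Simplified, hypothetical model of a Hive-style UDAF evaluator whose
// aggregation-buffer type depends on the evaluator's mode. The class names
// only mimic those in the stack trace; this is NOT the real DataSketches
// or Spark implementation.
public class ModeMismatchDemo {

    enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    interface Buffer {}

    // Buffer used while consuming raw input rows (assumed PARTIAL1/COMPLETE).
    static final class SketchState implements Buffer {}

    // Buffer used while merging partial results (assumed PARTIAL2/FINAL).
    static final class UnionState implements Buffer {
        int merged = 0;
    }

    static final class Evaluator {
        private final Mode mode;

        Evaluator(Mode mode) { this.mode = mode; }

        // The buffer class is chosen from the mode this evaluator was created with.
        Buffer newBuffer() {
            return (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE)
                ? new SketchState()
                : new UnionState();
        }

        // merge() assumes it only sees buffers created in a merge mode,
        // so it casts unconditionally, as SketchEvaluator.merge appears to.
        void merge(Buffer buf, Object partial) {
            UnionState state = (UnionState) buf;
            state.merged++;
        }
    }

    // Creates a buffer with one evaluator and merges into it with another.
    // Returns "ok" on success, or the simple name of the exception thrown.
    static String run(Mode bufferMode, Mode mergeMode) {
        Evaluator creator = new Evaluator(bufferMode);
        Evaluator merger = new Evaluator(mergeMode);
        Buffer buf = creator.newBuffer();
        try {
            merger.merge(buf, new Object());
            return "ok";
        } catch (ClassCastException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Buffer created and merged by merge-mode evaluators: works.
        System.out.println(run(Mode.PARTIAL2, Mode.PARTIAL2));
        // Buffer from an input-mode evaluator, merged by a merge-mode one:
        // fails with the same kind of cast error as in the trace.
        System.out.println(run(Mode.PARTIAL1, Mode.PARTIAL2));
    }
}
```

Running main prints "ok" for the matching case and "ClassCastException" for the mismatched one, which is the shape of the failure above if assumptions (a) and (b) hold.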


Best
