Hmm, in general we try to support all the UDAFs, but this one must be using
a different base class that we don't have a wrapper for.  JIRA here:
https://issues.apache.org/jira/browse/SPARK-2693
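
Until that is fixed, one possible workaround (just a rough sketch, not tested; it
assumes Spark 1.0.x APIs and that field4 is a numeric column) is to let HiveQL
select only the grouping keys and the raw column, and then compute the percentile
per group with plain RDD operations:

// Rough workaround sketch, untested. The table/column names come from your
// query; the SparkContext setup is only for illustration.
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._   // implicit pair-RDD functions (groupByKey)
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext("local[*]", "percentile-workaround")
val hiveCtx = new HiveContext(sc)

// Pull just the grouping keys and the column we need the percentile of.
val rows = hiveCtx.hql("SELECT year, month, day, field4 FROM raw_data_table")

// Group by (year, month, day) and take the median (50th percentile) by hand.
val medians = rows
  .map(r => ((r.getInt(0), r.getInt(1), r.getInt(2)), r.getDouble(3)))
  .groupByKey()
  .mapValues { vs =>
    val sorted = vs.toSeq.sorted
    sorted((sorted.size - 1) / 2)   // lower median; interpolate if you need exact values
  }

medians.collect().foreach(println)

That obviously loses the convenience of doing everything in a single query, but it
should unblock you until the JIRA above is resolved.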


On Fri, Jul 25, 2014 at 8:06 AM, <vinay.kash...@socialinfra.net> wrote:

>
> Hi all,
>
> I am using Spark 1.0.0 with CDH 5.1.0.
>
> I want to aggregate the data in a raw table using a simple query like below
>
> SELECT MIN(field1), MAX(field2), AVG(field3), PERCENTILE(field4),
> year, month, day FROM raw_data_table GROUP BY year, month, day
>
> MIN, MAX and AVG functions work fine for me, but with PERCENTILE, I get an
> error as shown below.
>
> Exception in thread "main" java.lang.RuntimeException: No handler for udf
> class org.apache.hadoop.hive.ql.udf.UDAFPercentile
>         at scala.sys.package$.error(package.scala:27)
>         at
> org.apache.spark.sql.hive.HiveFunctionRegistry$.lookupFunction(hiveUdfs.scala:69)
>         at
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:115)
>         at
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:113)
>         at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>
> I have read in the documentation that with HiveContext Spark SQL supports
> all the UDFs supported in Hive.
>
> I want to know if there is anything else I need to do to use PERCENTILE with
> Spark SQL, or whether there are still limitations in Spark SQL with respect to
> UDFs and UDAFs in the version I am using?
>
>
> Thanks and regards
>
> Vinay Kashyap
>
