Support for Percentile and Variance Aggregation functions in Spark with HiveContext

vinay . kashyap Fri, 25 Jul 2014 08:07:53 -0700




Hi all,
I am using Spark 1.0.0 with CDH 5.1.0.
I want to aggregate the data in a raw table using a simple query like the one below:

    SELECT MIN(field1), MAX(field2), AVG(field3), PERCENTILE(field4), year, month, day
    FROM raw_data_table
    GROUP BY year, month, day
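
For reference, here is a minimal plain-Python sketch (not Spark code) of what the query is meant to compute for each (year, month, day) group. It assumes Hive's percentile(col, p) semantics: an exact percentile that interpolates linearly between the two nearest ranks at position (n - 1) * p. Hive's PERCENTILE also expects a second argument p in [0, 1], which the query above omits, so p = 0.5 (the median) is an assumption here. For the variance aggregate mentioned in the subject, the sketch uses population variance, which is what Hive's variance()/var_pop() return. The sample rows and the helper names (hive_percentile, population_variance, aggregate) are hypothetical.

```python
import math
from collections import defaultdict


def hive_percentile(values, p):
    """Exact p-th percentile, interpolating linearly between the two nearest ranks.

    Assumed to mirror Hive's percentile(col, p); p must lie in [0, 1].
    """
    if not values:
        return None  # an aggregate over an empty group yields NULL in SQL
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p           # fractional rank of the percentile
    lo, hi = math.floor(pos), math.ceil(pos)
    if lo == hi:
        return float(ordered[lo])          # rank falls exactly on one value
    return (hi - pos) * ordered[lo] + (pos - lo) * ordered[hi]


def population_variance(values):
    """Population variance (divide by n), as returned by Hive's variance()/var_pop()."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def aggregate(rows, p=0.5):
    """Group rows by (year, month, day) and compute the query's aggregates per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["year"], row["month"], row["day"])].append(row)

    result = {}
    for key, grp in groups.items():
        f4 = [r["field4"] for r in grp]
        result[key] = {
            "min_field1": min(r["field1"] for r in grp),
            "max_field2": max(r["field2"] for r in grp),
            "avg_field3": sum(r["field3"] for r in grp) / len(grp),
            "percentile_field4": hive_percentile(f4, p),
            "variance_field4": population_variance(f4),
        }
    return result


# Hypothetical sample rows standing in for raw_data_table.
rows = [
    {"year": 2014, "month": 7, "day": 25, "field1": 3, "field2": 10, "field3": 1.0, "field4": 1},
    {"year": 2014, "month": 7, "day": 25, "field1": 1, "field2": 20, "field3": 2.0, "field4": 2},
    {"year": 2014, "month": 7, "day": 25, "field1": 2, "field2": 15, "field3": 3.0, "field4": 3},
    {"year": 2014, "month": 7, "day": 25, "field1": 5, "field2": 5, "field3": 6.0, "field4": 4},
    {"year": 2014, "month": 7, "day": 26, "field1": 7, "field2": 7, "field3": 7.0, "field4": 9},
]

if __name__ == "__main__":
    for key, aggs in sorted(aggregate(rows).items()):
        print(key, aggs)
```

With these rows, the 2014-07-25 group has field4 values [1, 2, 3, 4], so the median falls at rank 1.5 and is interpolated halfway between 2 and 3.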

The MIN, MAX and AVG functions work fine for me, but with PERCENTILE I get the following error:
Exception in thread "main" java.lang.RuntimeException: No handler for udf class org.apache.hadoop.hive.ql.udf.UDAFPercentile
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.hive.HiveFunctionRegistry$.lookupFunction(hiveUdfs.scala:69)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:115)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:113)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)

I have read in the documentation that, with HiveContext, Spark SQL supports all the UDFs supported in Hive. Is there anything else I need to do to use PERCENTILE with Spark SQL? Or are there still limitations in Spark SQL with respect to UDFs and UDAFs in the version I am using?

Thanks and regards,
Vinay Kashyap

Support for Percentile and Variance Aggregation functions in Spark with HiveContext

Reply via email to