Hi all,
I am using Spark 1.0.0 with CDH 5.1.0.
I want to
aggregate the data in a raw table using a simple query like
below
SELECT MIN(field1), MAX(field2), AVG(field3),
PERCENTILE(field4), year,month,day FROM raw_data_table GROUP
BY year, month, day
MIN, MAX and AVG functions work fine
for me, but with PERCENTILE, I get an error as shown
below.
Exception in thread "main"
java.lang.RuntimeException: No handler for udf class
org.apache.hadoop.hive.ql.udf.UDAFPercentile
at
scala.sys.package$.error(package.scala:27)
at
org.apache.spark.sql.hive.HiveFunctionRegistry$.lookupFunction(hiveUdfs.scala:69)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:115)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:113)
at
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
I
have read in the documentation that with HiveContext Spark SQL supports
all the UDFs supported in Hive.
I want to know if there is anything
else I need to follow to use Percentile with Spark SQL..?? Or .. Are there
any limitations still in Spark SQL with respect to UDFs and UDAFs in the
version I am using..??
Thanks and
regards
Vinay Kashyap