The UDF is defined in Hive's GenericUDAFPercentileApprox class.

When spark-shell runs, it has access to the above class, which is packaged
in assembly/target/scala-2.10/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.0.jar:

  2143 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$1.class
  4602 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class
  1697 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator$PercentileAggBuf.class
  6570 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.class
  4334 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFSinglePercentileApproxEvaluator.class
  6293 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.class

That is the cause of the different behavior: in spark-shell, the Hive class
backing percentile_approx is on the classpath.
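
If you want to confirm this in your own environment, a minimal sketch along
these lines should work when pasted into spark-shell or compiled on its own.
The object name ClasspathCheck and the helper isOnClasspath are made up for
illustration and are not part of Spark or Hive; the only real API used is
java.lang.Class.forName.

```scala
import scala.util.Try

// Hypothetical helper (not part of Spark or Hive): reports whether a class
// can be loaded from the current classpath.
object ClasspathCheck {
  // initialize = false loads the class without running its static initializers.
  // A missing class raises ClassNotFoundException, which Try turns into a Failure.
  def isOnClasspath(className: String): Boolean =
    Try(Class.forName(className, false, getClass.getClassLoader)).isSuccess

  def main(args: Array[String]): Unit = {
    // Fully qualified name taken from the jar listing above.
    val hiveUdaf =
      "org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox"
    println(s"$hiveUdaf on classpath: ${isOnClasspath(hiveUdaf)}")
  }
}
```

Running the same check from spark-shell and from a plain test classpath
should show whether the Hive class is visible in each case.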

FYI

On Sun, Oct 18, 2015 at 12:10 AM, unk1102 <[email protected]> wrote:

> Hi, I am starting a new thread to follow up on the old one. It looks like
> the code needed to compile
> callUdf("percentile_approx",col("mycol"),lit(0.25)) is not merged into the
> Spark 1.5.1 source, but I don't understand why this function call works in
> the Spark 1.5.1 spark-shell. Please guide.
>
> ---------- Forwarded message ----------
> From: "Ted Yu" <[email protected]>
> Date: Oct 14, 2015 3:26 AM
> Subject: Re: How to calculate percentile of a column of DataFrame?
> To: "Umesh Kacha" <[email protected]>
> Cc: "Michael Armbrust" <[email protected]>,
> "<[email protected]>" <[email protected]>,
> "user" <[email protected]>
>
> I modified DataFrameSuite in the master branch to call percentile_approx
> instead of simpleUDF:
>
> - deprecated callUdf in SQLContext
> - callUDF in SQLContext *** FAILED ***
>   org.apache.spark.sql.AnalysisException: undefined function percentile_approx;
>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:63)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>   at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:505)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:502)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
>
> SPARK-10671 is included in the master branch.
> For 1.5.1, I guess the absence of SPARK-10671 means that Spark SQL treats
> percentile_approx as a normal UDF.
>
> Experts can correct me, if there is any misunderstanding.
>
> Cheers
>
