Have you tried using avg in place of mean?

import sqlContext.implicits._  // needed for toDF when not running in the spark-shell

(1 to 5).foreach { i =>
  val df = (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b")
  df.save(s"/tmp/partitioned/i=$i")
}

sqlContext.sql("""
  CREATE TEMPORARY TABLE partitionedParquet
  USING org.apache.spark.sql.parquet
  OPTIONS (
    path '/tmp/partitioned'
  )""")

sqlContext.sql("""select avg(a) from partitionedParquet""").show()
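For the (name, age) example described further down the thread, the same fix applied to the original query would look roughly like this (only a sketch; the sample rows, the `df` name and the temp-table registration are assumptions, not something from the thread):

import sqlContext.implicits._

// hypothetical data matching the (name, age: Double) schema described below
val df = Seq(("alice", 30.0), ("bob", 25.0)).toDF("name", "age")
df.registerTempTable("table")

// avg is a registered SQL function name; the thread suggests mean is not in 1.5.x
sqlContext.sql("SELECT avg(`age`) AS `data` FROM `table`").show()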
Cheers

On Tue, Oct 27, 2015 at 10:12 AM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:

> So I tried @Reynold's suggestion. I could get countDistinct and
> sumDistinct running, but mean and approxCountDistinct do not work. (I
> guess I am using the wrong syntax for approxCountDistinct.) For mean, I
> think the registry entry is missing. Can someone clarify that as well?
>
> On Tue, Oct 27, 2015 at 8:02 PM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>
>> Will try in a while when I get back. I assume this applies to all
>> functions other than mean. Also, countDistinct is defined along with all
>> other SQL functions, so I don't get the "distinct is not part of the
>> function name" part.
>> On 27 Oct 2015 19:58, "Reynold Xin" <r...@databricks.com> wrote:
>>
>>> Try
>>>
>>> count(distinct columnName)
>>>
>>> In SQL, distinct is not part of the function name.
>>>
>>> On Tuesday, October 27, 2015, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>>
>>>> Oops, seems I made a mistake. The error message is: Exception in thread
>>>> "main" org.apache.spark.sql.AnalysisException: undefined function
>>>> countDistinct
>>>> On 27 Oct 2015 15:49, "Shagun Sodhani" <sshagunsodh...@gmail.com> wrote:
>>>>
>>>>> Hi! I was trying out some aggregate functions in Spark SQL and I
>>>>> noticed that certain aggregate operators are not working. This includes:
>>>>>
>>>>> approxCountDistinct
>>>>> countDistinct
>>>>> mean
>>>>> sumDistinct
>>>>>
>>>>> For example, using countDistinct results in an error saying:
>>>>> *Exception in thread "main" org.apache.spark.sql.AnalysisException:
>>>>> undefined function cosh;*
>>>>>
>>>>> I had a similar issue with the cosh operator
>>>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Exception-when-using-cosh-td14724.html>
>>>>> some time back, and it turned out that it was not registered in the
>>>>> registry:
>>>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>>>>>
>>>>> *I think it is the same issue again and would be glad to send over
>>>>> a PR if someone can confirm that this is an actual bug and not some
>>>>> mistake on my part.*
>>>>>
>>>>> Query I am using: SELECT countDistinct(`age`) as `data` FROM `table`
>>>>> Spark Version: 10.4
>>>>> SparkSql Version: 1.5.1
>>>>>
>>>>> I am using the standard example of a (name, age) schema (though I am
>>>>> setting age as Double rather than Int, as I am trying out maths
>>>>> functions).
>>>>>
>>>>> The entire error stack can be found here
>>>>> <http://pastebin.com/G6YzQXnn>.
>>>>>
>>>>> Thanks!
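For completeness, the count(distinct ...) point from the thread, shown in both APIs (again just a sketch against the same hypothetical `df` and `table` as above):

import org.apache.spark.sql.functions.countDistinct

// In SQL, DISTINCT is a modifier inside the call rather than part of the function
// name, so count(DISTINCT ...) parses, while countDistinct(...) triggers a registry
// lookup that fails with "undefined function countDistinct".
sqlContext.sql("SELECT count(DISTINCT `age`) AS `data` FROM `table`").show()

// In the DataFrame API, countDistinct is an ordinary function from
// org.apache.spark.sql.functions, so no registry entry is involved.
df.agg(countDistinct("age").as("data")).show()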