Have you tried using avg in place of mean?

import sqlContext.implicits._  // needed for toDF when not running in the spark-shell

(1 to 5).foreach { i =>
  val df = (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b")
  df.save(s"/tmp/partitioned/i=$i")
}

sqlContext.sql("""
  CREATE TEMPORARY TABLE partitionedParquet
  USING org.apache.spark.sql.parquet
  OPTIONS (
    path '/tmp/partitioned'
  )""")

sqlContext.sql("""select avg(a) from partitionedParquet""").show()
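For the (name, age) example described further down the thread, the same fix applied to the original query would look roughly like this (only a sketch; the sample rows, the `df` name and the temp-table registration are assumptions, not something from the thread):

import sqlContext.implicits._

// hypothetical data matching the (name, age: Double) schema described below
val df = Seq(("alice", 30.0), ("bob", 25.0)).toDF("name", "age")
df.registerTempTable("table")

// avg is a registered SQL function name; the thread suggests mean is not in 1.5.x
sqlContext.sql("SELECT avg(`age`) AS `data` FROM `table`").show()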
Cheers

On Tue, Oct 27, 2015 at 10:12 AM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:

> So I tried @Reynold's suggestion. I could get countDistinct and
> sumDistinct running, but mean and approxCountDistinct do not work. (I
> guess I am using the wrong syntax for approxCountDistinct.) For mean, I
> think the registry entry is missing. Can someone clarify that as well?
>
> On Tue, Oct 27, 2015 at 8:02 PM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>
>> Will try in a while when I get back. I assume this applies to all
>> functions other than mean. Also, countDistinct is defined along with all
>> other SQL functions, so I don't get the "distinct is not part of the
>> function name" part.
>> On 27 Oct 2015 19:58, "Reynold Xin" <r...@databricks.com> wrote:
>>
>>> Try
>>>
>>> count(distinct columnName)
>>>
>>> In SQL, distinct is not part of the function name.
>>>
>>> On Tuesday, October 27, 2015, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>>
>>>> Oops, seems I made a mistake. The error message is: Exception in thread
>>>> "main" org.apache.spark.sql.AnalysisException: undefined function
>>>> countDistinct
>>>> On 27 Oct 2015 15:49, "Shagun Sodhani" <sshagunsodh...@gmail.com> wrote:
>>>>
>>>>> Hi! I was trying out some aggregate functions in Spark SQL and I
>>>>> noticed that certain aggregate operators are not working. This includes:
>>>>>
>>>>> approxCountDistinct
>>>>> countDistinct
>>>>> mean
>>>>> sumDistinct
>>>>>
>>>>> For example, using countDistinct results in an error saying:
>>>>> *Exception in thread "main" org.apache.spark.sql.AnalysisException:
>>>>> undefined function cosh;*
>>>>>
>>>>> I had a similar issue with the cosh operator
>>>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Exception-when-using-cosh-td14724.html>
>>>>> some time back, and it turned out that it was not registered in the
>>>>> registry:
>>>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>>>>>
>>>>> *I think it is the same issue again and would be glad to send over
>>>>> a PR if someone can confirm that this is an actual bug and not some
>>>>> mistake on my part.*
>>>>>
>>>>> Query I am using: SELECT countDistinct(`age`) as `data` FROM `table`
>>>>> Spark Version: 10.4
>>>>> SparkSql Version: 1.5.1
>>>>>
>>>>> I am using the standard example of a (name, age) schema (though I am
>>>>> setting age as Double rather than Int, as I am trying out maths
>>>>> functions).
>>>>>
>>>>> The entire error stack can be found here
>>>>> <http://pastebin.com/G6YzQXnn>.
>>>>>
>>>>> Thanks!
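For completeness, the count(distinct ...) point from the thread, shown in both APIs (again just a sketch against the same hypothetical `df` and `table` as above):

import org.apache.spark.sql.functions.countDistinct

// In SQL, DISTINCT is a modifier inside the call rather than part of the function
// name, so count(DISTINCT ...) parses, while countDistinct(...) triggers a registry
// lookup that fails with "undefined function countDistinct".
sqlContext.sql("SELECT count(DISTINCT `age`) AS `data` FROM `table`").show()

// In the DataFrame API, countDistinct is an ordinary function from
// org.apache.spark.sql.functions, so no registry entry is involved.
df.agg(countDistinct("age").as("data")).show()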