That was the case. Thanks for the quick and clean answer, Hemanth.

*Regards, Grüße, Cordialement, Recuerdos, Saluti, προσρήσεις, 问候, تحياتي.*

*Mohamed Nadjib Mami*
*Research Associate @ Fraunhofer IAIS - PhD Student @ Bonn University*
*About me! <http://www.strikingly.com/mohamed-nadjib-mami>*
*LinkedIn <http://fr.linkedin.com/in/mohamednadjibmami/>*
On Thu, Apr 6, 2017 at 7:33 PM, Hemanth Gudela <hemanth.gud...@qvantel.com> wrote:

> Nulls are excluded with *spark.sql("SELECT count(distinct col) FROM Table").show()*
>
> I think it is ANSI SQL behaviour.
>
> scala> spark.sql("select distinct count(null)").show(false)
> +-----------+
> |count(NULL)|
> +-----------+
> |0          |
> +-----------+
>
> scala> spark.sql("select distinct null").count
> res1: Long = 1
>
> Regards,
> Hemanth
>
> *From:* Mohamed Nadjib Mami <mohamed.nadjib.m...@gmail.com>
> *Date:* Thursday, 6 April 2017 at 20.29
> *To:* "user@spark.apache.org" <user@spark.apache.org>
> *Subject:* df.count() returns one more count than SELECT COUNT()
>
> *spark.sql("SELECT count(distinct col) FROM Table").show()*
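For readers landing on this thread later: the off-by-one comes from ANSI SQL NULL handling, as Hemanth's scala output shows. A minimal plain-Python sketch (not Spark code; the column data here is hypothetical) mimicking those semantics:

```python
# ANSI SQL semantics in miniature:
# - COUNT(DISTINCT col) ignores NULLs entirely.
# - SELECT DISTINCT col keeps NULL as its own distinct value, so
#   counting the resulting rows (like df.count()) includes it.
col = ["a", "b", "b", None]  # hypothetical column; None stands in for NULL

# Equivalent of SELECT count(distinct col): distinct non-null values only
count_distinct = len({v for v in col if v is not None})

# Equivalent of SELECT DISTINCT col followed by df.count(): NULL is a row
distinct_rows = len(set(col))

print(count_distinct)  # 2  ("a", "b")
print(distinct_rows)   # 3  ("a", "b", NULL) -- one more, as observed
```

So whenever the column contains at least one NULL, `df.select("col").distinct().count()` will report exactly one more than `SELECT count(distinct col)`.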