I think it's an expression you use inside select, rather than a method you'd find on the DataFrame API
(as a method-style alternative you could do df.select(col).distinct.count)
This will give you the number of distinct combinations of the two columns:
scala> df.select(countDistinct("name", "age"))
res397: org.apache.spark.sql.DataFrame = [COUNT(DIST
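For completeness, here is a minimal, self-contained sketch showing where the functions live and how to call them. The imports, toy data, and local-mode setup below are my own assumptions (not from this thread), written against the Spark 1.x DataFrame API:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
// countDistinct and approxCountDistinct live in org.apache.spark.sql.functions
import org.apache.spark.sql.functions.{countDistinct, approxCountDistinct}

val conf = new SparkConf().setAppName("distinct-count-example").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Toy DataFrame; in practice df would come from your own data source
val df = Seq(("alice", 30), ("bob", 25), ("alice", 30), ("carol", 25)).toDF("name", "age")

// Exact count of distinct (name, age) combinations
df.select(countDistinct("name", "age")).show()

// Approximate distinct count per column (cheaper on large data)
df.select(approxCountDistinct("name"), approxCountDistinct("age")).show()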
Thanks Yanbo,
Thanks for the help, but I'm not able to find the countDistinct or
approxCountDistinct functions. Are these part of the DataFrame API or in some
other package?
On Tue, Jan 5, 2016 at 3:24 PM, Yanbo Liang wrote:
> Hi Arunkumar,
>
> You can use datasetDF.select(countDistinct(col1, col2, col3, ...)) or
> approxCountDistinct for an approximate result.
Hi Arunkumar,
You can use datasetDF.select(countDistinct(col1, col2, col3, ...)) or
approxCountDistinct for an approximate result.
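To get a distinct count for every column in one pass (closer to what the original question asks), something along these lines should work; the DataFrame name datasetDF is just a placeholder:

import org.apache.spark.sql.functions.approxCountDistinct

// Build one approxCountDistinct expression per column and evaluate them together
val perColumnCounts = datasetDF.select(
  datasetDF.columns.map(c => approxCountDistinct(c).alias(c)): _*
)
perColumnCounts.show()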
2016-01-05 17:11 GMT+08:00 Arunkumar Pillai :
> Hi
>
> Is there a function to find the distinct count of all the variables in a
> DataFrame?
>
> val sc = new SparkCont