Re: finding distinct count using dataframe

2016-01-05 Thread Kristina Rogale Plazonic
I think it's an expression rather than a function you'd find in the API (as a function you could do df.select(col).distinct.count). This will give you the number of distinct rows across both columns:

scala> df.select(countDistinct("name", "age"))
res397: org.apache.spark.sql.DataFrame = [COUNT(DIST
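For context, a minimal sketch of the countDistinct expression in a Spark shell session; the column names "name" and "age" and the sample values are assumptions for illustration, not data from the thread:

    import org.apache.spark.sql.functions.countDistinct

    // small example DataFrame with a duplicated (name, age) row
    val df = sqlContext.createDataFrame(Seq(
      ("alice", 30), ("bob", 25), ("alice", 30)
    )).toDF("name", "age")

    // countDistinct over multiple columns counts distinct (name, age) tuples
    df.select(countDistinct("name", "age")).show()
    // prints a single-row DataFrame with the distinct tuple count, here 2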

Re: finding distinct count using dataframe

2016-01-05 Thread Arunkumar Pillai
Thanks Yanbo, thanks for the help. But I'm not able to find the countDistinct or approxCountDistinct functions. Are these functions part of DataFrame or of some other package?

On Tue, Jan 5, 2016 at 3:24 PM, Yanbo Liang wrote:
> Hi Arunkumar,
>
> You can use datasetDF.select(countDistinct(col1, col2, col
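For readers hitting the same question: countDistinct and approxCountDistinct are not methods on DataFrame itself but live in the org.apache.spark.sql.functions object, so they need to be imported. A minimal sketch, assuming a DataFrame named df with a "name" column:

    import org.apache.spark.sql.functions.{countDistinct, approxCountDistinct}

    // exact distinct count over a single column
    df.select(countDistinct("name"))

    // approximate distinct count, cheaper to compute on large data
    df.select(approxCountDistinct("name"))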

Re: finding distinct count using dataframe

2016-01-05 Thread Yanbo Liang
Hi Arunkumar,

You can use datasetDF.select(countDistinct(col1, col2, col3, ...)) or approxCountDistinct for an approximate result.

2016-01-05 17:11 GMT+08:00 Arunkumar Pillai:
> Hi
>
> Is there any function to find the distinct count of all the variables in a
> dataframe?
>
> val sc = new SparkCont
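One way to read this suggestion against the original question (a distinct count for every variable in the DataFrame) is to build one countDistinct expression per column rather than a single count over the column tuple. A minimal sketch under that assumption, using a hypothetical DataFrame df:

    import org.apache.spark.sql.functions.countDistinct

    // one distinct count per column, all computed in a single select
    val distinctCounts = df.select(df.columns.map(c => countDistinct(c).alias(c)): _*)
    distinctCounts.show()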