Re: Using countApproxDistinct in pyspark

2014-08-04 Thread Diederik
> > ...rdd.rdd().countApproxDistinct(4, 0)
> > Out[7]: 29L
> >
> > In [8]: rdd._jrdd.rdd().countApproxDistinct(8, 0)
> > Out[8]: 26L
> >
> >
> > Clearly, I am doing something wrong here :) What is also weird is that when
> > I set p to 8, I should get a mor...

Re: Using countApproxDistinct in pyspark

2014-07-29 Thread Davies Liu
> ...countApproxDistinct(4, 0)
> Out[7]: 29L
>
> In [8]: rdd._jrdd.rdd().countApproxDistinct(8, 0)
> Out[8]: 26L
>
>
> Clearly, I am doing something wrong here :) What is also weird is that when
> I set p to 8, I should get a more accurate number, but it's actually
> smaller

Using countApproxDistinct in pyspark

2014-07-29 Thread Diederik
...What is also weird is that when I set p to 8, I should get a more accurate number, but it's actually smaller. Any tips or pointers are much appreciated!

Best,
Diederik

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-countApproxDistinct-in-pyspark-tp1087
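
On the question of why p=8 is expected to be more accurate than p=4: as far as I know, the Scala countApproxDistinct(p, sp) is documented as a HyperLogLog-family estimator, where p sets the number of registers to 2**p. The sketch below is a plain HyperLogLog written in pure Python to illustrate that relationship. It is not Spark's implementation, and the names hll_estimate and expected_relative_error are made up for this example. The 1.04 / sqrt(2**p) figure is the usual rule of thumb for plain HyperLogLog and is assumed here to be a reasonable guide only.

```python
import hashlib
import math


def hll_estimate(items, p):
    """Estimate the number of distinct items using 2**p registers.

    Illustrative plain HyperLogLog; NOT the code Spark runs.
    """
    m = 1 << p
    registers = [0] * m
    for item in items:
        # 64-bit hash taken from the first 8 bytes of SHA-1 over repr(item).
        digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - p)                  # top p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)     # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit in the remaining bits
        # (64 - p + 1 when all of them are zero).
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    # Standard bias-correction constants for plain HyperLogLog.
    alpha = {16: 0.673, 32: 0.697, 64: 0.709}.get(m, 0.7213 / (1 + 1.079 / m))
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw


def expected_relative_error(p):
    """Rule-of-thumb relative standard error for 2**p registers."""
    return 1.04 / math.sqrt(1 << p)


if __name__ == "__main__":
    for p in (4, 8):
        print(p, round(expected_relative_error(p), 3))  # 4 -> 0.26, 8 -> 0.065
```

Under these assumptions, a single estimate at p=4 can easily be off by a quarter of the true count, so a larger p giving a smaller number does not by itself mean the larger p is less accurate. Whether that explains the 29 versus 26 results quoted above depends on the true count, which the thread excerpt does not give.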