Re: Spark SQL Count Distinct

2016-06-16 Thread Reynold Xin
> d all of the data into RAM. The Spark Core implementation which uses reduce to 1 and sum doesn't have this risk. I've found this old thread which compares Spark Core and Spark SQL count distinct performance: http://apache-spark-developers-list.10

Spark SQL Count Distinct

2016-06-16 Thread Avshalom
of memory overload if the distinct implementation has to load all of the data into RAM. The Spark Core implementation which uses reduce to 1 and sum doesn't have this risk. I've found this old thread which compares Spark Core and Spark SQL count distinct performance: http://apache-spark-
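
For readers unfamiliar with the "reduce to 1 and sum" pattern mentioned above, here is a minimal sketch of one plausible reading of it, written in plain Python rather than against the Spark API. The function name `count_distinct`, the `num_reducers` parameter, and the list-of-lists input are all hypothetical and only stand in for partitions and shuffle buckets; the actual Spark Core code from the thread is not shown in the snippet, so this is an assumption about its shape.

```python
def count_distinct(partitions, num_reducers=4):
    """Count distinct values by pairing each value with 1, collapsing
    duplicates of a key to a single 1, and summing the ones.

    `partitions` is a list of lists standing in for distributed partitions
    (hypothetical stand-in, not a Spark RDD).
    """
    # One dict per simulated reducer; each holds only its share of the keys.
    buckets = [dict() for _ in range(num_reducers)]

    for part in partitions:
        # Map side: emit (value, 1), combining duplicates within the partition.
        for value in set(part):
            # Shuffle: route the key to exactly one reducer by its hash.
            # Reduce: any number of ones for the same key collapses to 1.
            buckets[hash(value) % num_reducers][value] = 1

    # Each reducer sums its ones; the partial sums are then added together.
    return sum(sum(bucket.values()) for bucket in buckets)
```

Called as `count_distinct([["a", "b"], ["b", "c"]])`, this returns 3. The point of the sketch is that no single reducer needs to see every distinct value: keys are spread across buckets and only the small partial counts are combined at the end.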