of memory overload if the distinct
implementation has to load all of the data into RAM.
The Spark Core implementation, which reduces each distinct key to 1 and then
sums the results, doesn't have this risk.
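
To make the "reduce to 1 and sum" idea concrete, here is a minimal sketch in plain Python. It is not Spark code and not Spark's actual implementation, only a model of the approach as I understand it. The names `count_distinct` and `num_reducers` are made up for illustration, and the lists stand in for partitions.

```python
from typing import Dict, Hashable, Iterable, List


def count_distinct(partitions: Iterable[Iterable[Hashable]],
                   num_reducers: int = 4) -> int:
    """Count distinct values by reducing each key to 1 and summing the 1s."""
    # One dict per simulated reducer. Each reducer only ever sees the keys
    # whose hash routes to it, never the whole dataset.
    reducers: List[Dict[Hashable, int]] = [dict() for _ in range(num_reducers)]

    for partition in partitions:
        for value in partition:
            # "Shuffle": all copies of the same value land in the same reducer.
            bucket = reducers[hash(value) % num_reducers]
            # "Reduce to 1": any number of duplicates collapses to a single 1.
            bucket[value] = 1

    # "Sum": add up the 1s from every reducer.
    return sum(sum(bucket.values()) for bucket in reducers)


if __name__ == "__main__":
    data = [[1, 2, 2], [2, 3], [3, 3, 4]]
    print(count_distinct(data))
```

In this model each reducer keeps only its own share of the distinct keys, which is the property I'm relying on above. Whether the real Spark Core and Spark SQL code paths behave this way is exactly what I'd like to confirm.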
I've found this old thread, which compares the count-distinct performance of
Spark Core and Spark SQL:
http://apache-spark-