Thanks Michael, much appreciated!
Nothing should be held in memory for a query like this (other than a single
count per partition), so I don't think memory is the problem. There is
likely an error buried somewhere.
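To illustrate the point about memory: a count reduces to one running tally per partition, which the driver then sums. Below is a minimal plain-Python sketch of that shape (the partition contents and variable names are illustrative, not Spark internals):

```python
# Illustrative partitions; in Spark these would be distributed splits
# of the dataset, not in-memory lists.
partitions = [
    [10, 20, 30],        # partition 0
    [40, 50],            # partition 1
    [60, 70, 80, 90],    # partition 2
]

# Each partition contributes a single integer count -- that is the
# only state that must be held, not the rows themselves.
per_partition_counts = [sum(1 for _ in part) for part in partitions]
total = sum(per_partition_counts)
print(per_partition_counts, total)  # [3, 2, 4] 9
```

So per-partition memory stays constant regardless of row count, which is why an aggregate like this should not blow up memory.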
Regarding your comments above: I don't get any error; I just get NULL as the
return value.
>
> Much appreciated! I am not comparing against "select count(*)" for
> performance; it was just one simple thing I tried to check the performance
> :). I think it now makes sense, since Spark tries to extract all records
> before doing the count. I thought having an aggregate function query
> submitt