Here is the JIRA issue: https://issues.apache.org/jira/browse/SPARK-4243
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/SQL-COUNT-DISTINCT-tp17818p18166.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
On Mon, Nov 3, 2014 at 12:45 AM, Bojan Kostic wrote:
>
> But will this improvement also help when you want to count distinct values
> on two or more fields:
> SELECT COUNT(f1), COUNT(DISTINCT f2), COUNT(DISTINCT f3), COUNT(DISTINCT f4)
> FROM parquetFile
>
Unfortunately I think this case may be harder
questions or anything like that.
Best regards
Bojan
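
For anyone reading along, here is a minimal sketch of the idea behind the multi-column query quoted above: each partition builds its own partial result (a non-null count for f1 plus one set of seen values per DISTINCT column), and the partials are merged at the end. This is plain Scala collections, not Spark code, and it is not a description of how Spark SQL actually plans the query. `Record`, `Partial`, and `CountDistinctSketch` are hypothetical names made up for illustration, and f1 is modeled as an `Option` on the assumption that COUNT(f1) skips nulls.

```scala
// Hypothetical row type standing in for the parquetFile table in the thread.
final case class Record(f1: Option[Int], f2: String, f3: String, f4: String)

// Partial aggregate for one partition: a non-null count for f1 and
// one set of already-seen values for each DISTINCT column.
final case class Partial(f1Count: Long, f2: Set[String], f3: Set[String], f4: Set[String]) {
  // Fold a single record into this partial result.
  def add(r: Record): Partial =
    Partial(f1Count + (if (r.f1.isDefined) 1L else 0L), f2 + r.f2, f3 + r.f3, f4 + r.f4)

  // Combine two partial results; set union keeps the values distinct.
  def merge(o: Partial): Partial =
    Partial(f1Count + o.f1Count, f2 ++ o.f2, f3 ++ o.f3, f4 ++ o.f4)
}

object Partial {
  val empty: Partial = Partial(0L, Set.empty, Set.empty, Set.empty)
}

object CountDistinctSketch {
  // Returns (COUNT(f1), COUNT(DISTINCT f2), COUNT(DISTINCT f3), COUNT(DISTINCT f4)).
  def countAll(partitions: Seq[Seq[Record]]): (Long, Int, Int, Int) = {
    val merged = partitions
      .map(p => p.foldLeft(Partial.empty)((acc, r) => acc.add(r))) // per-partition pass
      .foldLeft(Partial.empty)((a, b) => a.merge(b))               // final merge
    (merged.f1Count, merged.f2.size, merged.f3.size, merged.f4.size)
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq(Record(Some(1), "a", "x", "p"), Record(None, "a", "y", "p")),
      Seq(Record(Some(2), "b", "x", "q"), Record(Some(3), "c", "x", "p"))
    )
    println(countAll(partitions))
  }
}
```

The per-partition pass is the part that could run in parallel; only the merge needs to see every partial. The catch is that the sets grow with the number of distinct values, which may be one reason the multi-column case is harder.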
"SELECT COUNT(DISTINCT f2) FROM parquetFile")
>> count.map(t => t(0)).collect().foreach(println)
>>
>> I guess this is because the distinct processing must run on a single node.
>> But I wonder whether I can add some parallelism to the collect process.
>>
>>
I guess this is because the distinct processing must run on a single node.
But I wonder whether I can add some parallelism to the collect process.