Re: SQL COUNT DISTINCT

2014-11-05 Thread Bojan Kostic
Here is the link on JIRA: https://issues.apache.org/jira/browse/SPARK-4243
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SQL-COUNT-DISTINCT-tp17818p18166.html
Sent from the Apache Spark Use

Re: SQL COUNT DISTINCT

2014-11-03 Thread Michael Armbrust
On Mon, Nov 3, 2014 at 12:45 AM, Bojan Kostic wrote:
>
> But will this improvement also affect when you want to count distinct on 2
> or more fields:
> SELECT COUNT(f1), COUNT(DISTINCT f2), COUNT(DISTINCT f3), COUNT(DISTINCT f4)
> FROM parquetFile
>

Unfortunately I think this case may be harder
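
A rough illustration of why several DISTINCT aggregates in one query are harder to distribute than a single one: with one distinct column the rows can be shuffled by that column, but with f2, f3 and f4 no single partitioning keeps equal values together for every column at once. The plain-Python sketch below is not Spark's implementation. It assumes a naive fallback in which each partition builds one set per distinct column and the sets are merged at the end. The function names and the shape of the rows are made up for illustration.

```python
from functools import reduce

def partial_aggregate(rows):
    """Aggregate one partition of (f1, f2, f3, f4) rows into a mergeable state."""
    state = {"f1_count": 0, "f2": set(), "f3": set(), "f4": set()}
    for f1, f2, f3, f4 in rows:
        # COUNT(f1) counts non-NULL values only.
        if f1 is not None:
            state["f1_count"] += 1
        # Each COUNT(DISTINCT ...) needs its own set; NULLs are ignored.
        for name, value in (("f2", f2), ("f3", f3), ("f4", f4)):
            if value is not None:
                state[name].add(value)
    return state

def merge(a, b):
    """Combine the states of two partitions."""
    return {
        "f1_count": a["f1_count"] + b["f1_count"],
        "f2": a["f2"] | b["f2"],
        "f3": a["f3"] | b["f3"],
        "f4": a["f4"] | b["f4"],
    }

def count_query(partitions):
    """Return (COUNT(f1), COUNT(DISTINCT f2), COUNT(DISTINCT f3), COUNT(DISTINCT f4))."""
    empty = partial_aggregate([])
    total = reduce(merge, (partial_aggregate(p) for p in partitions), empty)
    return (total["f1_count"], len(total["f2"]), len(total["f3"]), len(total["f4"]))
```

In this sketch the merged sets grow with the number of distinct values in each column, and that state has to be moved between workers, which is one plausible reason the multi-column case costs more than the single-column one.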

Re: SQL COUNT DISTINCT

2014-11-03 Thread Bojan Kostic
questions or anything like that. Best regards Bojan -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SQL-COUNT-DISTINCT-tp17818p17939.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: SQL COUNT DISTINCT

2014-10-31 Thread Michael Armbrust
"SELECT COUNT(DISTINCT f2) FROM parquetFile")
>> count.map(t => t(0)).collect().foreach(println)
>>
>> I guess because of the distinct process must be on single node. But i wonder
>> can i add some parallelism to the collect process.

Re: SQL COUNT DISTINCT

2014-10-31 Thread Nicholas Chammas
n)
>
> I guess because of the distinct process must be on single node. But i wonder
> can i add some parallelism to the collect process.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com

SQL COUNT DISTINCT

2014-10-31 Thread Bojan Kostic
use of the distinct process must be on a single node. But I wonder whether I can add some parallelism to the collect process.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SQL-COUNT-DISTINCT-tp17818.html
Sent from the Apache Spark User L
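
On the parallelism question: a distinct count does not inherently have to run on one node. A common approach, sketched here in plain Python rather than Spark, is to route values by hash so that equal values always land in the same partition, deduplicate each partition independently, and sum the per-partition counts. Whether Spark SQL planned the query this way at the time is not stated in the thread; the helper names and the default partition count are illustrative assumptions.

```python
from collections import defaultdict

def partition_by_hash(values, num_partitions):
    """Group values so that equal values always share a partition; NULLs are dropped."""
    partitions = defaultdict(list)
    for v in values:
        if v is not None:  # COUNT(DISTINCT ...) ignores NULLs
            partitions[hash(v) % num_partitions].append(v)
    return list(partitions.values())

def count_distinct_partitioned(values, num_partitions=4):
    """Count distinct values by deduplicating each hash partition on its own."""
    parts = partition_by_hash(values, num_partitions)
    # Each partition could be handled by a different worker: no value
    # appears in two partitions, so the local counts simply add up.
    local_counts = [len(set(p)) for p in parts]
    return sum(local_counts)
```

Because only one integer per partition reaches the final step, the last stage stays cheap even when the column has many distinct values.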