Re: Count distinct with groupBy usage

2014-07-15 Thread buntu
Thanks Sean!! That's what I was looking for: group by on multiple fields. I'm gonna play with it now. Thanks again! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Count-distinct-with-groupBy-usage-tp9781p9803.html Sent from the Apache Spark User
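Grouping on several fields at once comes down to keying each record by a tuple of those fields. Below is a minimal sketch with plain Scala collections (so it runs without a cluster). The `Visit` record and its fields are hypothetical, since the thread shows neither the CSV schema nor the exact code Sean suggested.

```scala
// Hypothetical record layout: the real CSV columns are not shown in the thread.
final case class Visit(userId: String, page: String, date: String)

object UniqueVisitorsByPageAndDate {
  // Count distinct userIds per (page, date) by grouping on a composite key.
  def count(visits: Seq[Visit]): Map[(String, String), Int] =
    visits
      .map(v => ((v.page, v.date), v.userId)) // key on both fields at once
      .distinct                               // drop repeat visits by the same user
      .groupBy(_._1)                          // one group per (page, date)
      .map { case (key, pairs) => key -> pairs.size }
}
```

On a Spark RDD the same chain should translate to `map` into `((page, date), userId)` pairs, then `distinct()`, then `countByKey()` (which brings the counts back to the driver) or `reduceByKey` if the result should stay distributed.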

Re: Count distinct with groupBy usage

2014-07-15 Thread Sean Owen

Re: Count distinct with groupBy usage

2014-07-15 Thread buntu
That is correct, Raffy. Assuming I convert the timestamp field to a date in the required format, is it possible to report it by date?
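Converting a formatted timestamp to a plain date can be sketched with `java.time`. The input pattern `yyyy-MM-dd HH:mm:ss` is an assumption, because the thread does not show what the raw timestamp field looks like. The resulting date string can then serve as the date part of a grouping key.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object TimestampToDate {
  // Assumed input format; adjust the pattern to match the real timestamp field.
  private val inputFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Parse the timestamp and keep only its calendar date, rendered as ISO yyyy-MM-dd.
  def toDate(timestamp: String): String =
    LocalDateTime.parse(timestamp, inputFormat).toLocalDate.toString
}
```

Every timestamp within the same day maps to the same string, which is what makes a per-date report possible.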

Re: Count distinct with groupBy usage

2014-07-15 Thread Raffael Marty
> All I'm attempting is to report number of unique visitors per page by date.

But the way you are doing it currently, you will get a count per second. You have to bucketize your dates by whatever time resolution you want.

-raffy
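Bucketizing means mapping every timestamp to the start of the interval it falls in, so all events inside that interval share one key. A sketch, assuming the timestamp field holds Unix epoch seconds and that UTC day boundaries are acceptable (the thread specifies neither):

```scala
import java.time.{Instant, ZoneOffset}

object TimeBuckets {
  // Truncate an epoch-seconds timestamp down to the start of its bucket,
  // e.g. resolution 3600 for hourly buckets, 86400 for daily (UTC) buckets.
  def bucketStart(epochSeconds: Long, resolutionSeconds: Long): Long = {
    require(resolutionSeconds > 0, "resolution must be positive")
    // floorMod keeps pre-1970 (negative) timestamps in the correct bucket.
    epochSeconds - Math.floorMod(epochSeconds, resolutionSeconds)
  }

  // Daily bucket rendered as an ISO date, assuming timestamps are in UTC.
  def utcDate(epochSeconds: Long): String =
    Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC).toLocalDate.toString
}
```

Changing the resolution argument is all it takes to report per hour, per day, or per any fixed interval instead of per second.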

Re: Count distinct with groupBy usage

2014-07-15 Thread buntu
Thanks Nick. All I'm attempting is to report the number of unique visitors per page by date.

Re: Count distinct with groupBy usage

2014-07-15 Thread buntu

Re: Count distinct with groupBy usage

2014-07-15 Thread Zongheng Yang
>> csv.groupBy(_(1)).count
>>
>> But not able to see how to do count distinct on userId and also apply
>> another groupBy on timestamp field. Please let me know how to handle such
>> cases.
>>
>> Thanks!

Re: Count distinct with groupBy usage

2014-07-15 Thread Nick Pentreath
> csv.groupBy(_(1)).count
>
> But not able to see how to do count distinct on userId and also apply
> another groupBy on timestamp field. Please let me know how to handle such
> cases.
>
> Thanks!
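On a Spark RDD, `groupBy(_(1)).count` counts the groups themselves (the number of distinct values in column 1), not the rows or users in each group. Counting distinct users per page instead means deduplicating (page, userId) pairs before counting. A minimal sketch with plain Scala collections, where the column positions are assumptions: index 1 is taken to be the page (as in the question's `_(1)`) and index 0 the userId.

```scala
object DistinctUsersPerPage {
  // Each row is an already-split CSV line; column positions are hypothetical:
  // index 0 = userId, index 1 = page.
  def count(rows: Seq[Array[String]]): Map[String, Int] =
    rows
      .map(r => (r(1), r(0))) // (page, userId)
      .distinct               // keep one entry per user per page
      .groupBy(_._1)          // one group per page
      .map { case (page, pairs) => page -> pairs.size }
}
```

The RDD version should follow the same chain: `map` to `(page, userId)`, `distinct()`, then a per-key count.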

Count distinct with groupBy usage

2014-07-15 Thread buntu