Thanks Sean!! That's what I was looking for -- group by on multiple fields.
I'm gonna play with it now. Thanks again!
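For the archive: the multi-field grouping discussed above boils down to keying on a tuple rather than a single column. A plain-Scala collection sketch of the idea -- the (page, userId, date) record layout and the sample data here are assumptions, but Spark's `RDD.groupBy` accepts the same kind of key function as the single-field `csv.groupBy(_(1))` in the original question:

```scala
// Assumed record layout: (page, userId, date).
val rows = Seq(
  ("home",  "u1", "2014-07-14"),
  ("home",  "u2", "2014-07-14"),
  ("about", "u1", "2014-07-14")
)

// Single-field grouping (as in csv.groupBy(_(1))) keys on one column;
// a tuple key groups on several fields at once.
val perPageAndDate = rows.groupBy { case (page, _, date) => (page, date) }

// Rows per (page, date) bucket.
val counts = perPageAndDate.map { case (key, group) => key -> group.size }
println(counts)
```

The same chain works on an RDD, with the caveat that for large data a `reduceByKey`-style aggregation avoids materializing each group.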
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Count-distinct-with-groupBy-usage-tp9781p9803.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
That's correct, Raffy. Assuming I convert the timestamp field to a date in
the required format, is it possible to report it by date?
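Once each record carries a date rather than a raw timestamp, reporting by date comes down to keying on (page, date), deduplicating userIds, and counting per key. A plain-Scala sketch under an assumed (page, userId, date) layout (the Spark RDD equivalent of the chain below would be `rdd.map(v => ((page, date), user)).distinct().countByKey()`):

```scala
// Assumed record layout: (page, userId, date), timestamps already bucketed.
val visits = Seq(
  ("home", "u1", "2014-07-14"),
  ("home", "u2", "2014-07-14"),
  ("home", "u1", "2014-07-15"),
  ("home", "u1", "2014-07-15")  // same visitor twice on one day: count once
)

// Key by (page, date), keep only the userId, dedupe, then count per key.
val report = visits
  .map { case (page, user, date) => ((page, date), user) }
  .distinct
  .groupBy(_._1)
  .map { case (key, pairs) => key -> pairs.size }

println(report)
```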
> All I'm attempting is to report number of unique visitors per page by date.
But the way you are doing it currently, you will get a count per second. You
have to bucketize your dates by whatever time resolution you want.
-raffy
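The bucketizing raffy describes can be sketched by truncating each timestamp to its calendar date before grouping. Assuming epoch-second timestamps (the input format is a guess -- adjust the parsing to whatever the CSV actually contains):

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

// Formatter pinned to UTC so every timestamp lands in a deterministic day.
val dayFormat =
  DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC)

// Collapse an epoch-second timestamp to its date bucket.
def toDateBucket(epochSeconds: Long): String =
  dayFormat.format(Instant.ofEpochSecond(epochSeconds))

val bucketA = toDateBucket(1405296000L) // 2014-07-14 00:00:00 UTC
val bucketB = toDateBucket(1405296060L) // one minute later, same bucket
println(bucketA == bucketB)             // both map to "2014-07-14"
```

Grouping on this bucket instead of the raw timestamp gives one count per day rather than one per second; coarser or finer resolutions just change the truncation.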
Thanks Nick.
All I'm attempting is to report the number of unique visitors per page by date.
>> csv.groupBy(_(1)).count
>>
>> But not able to see how to do count distinct on userId and also apply
>> another groupBy on timestamp field. Please let me know how to handle such
>> cases.
>>
>> Thanks!