Thanks, I will try this.
On Fri, Dec 5, 2014 at 1:19 AM, Cheng Lian wrote:
Oh, sorry. So neither SQL nor Spark SQL is preferred. Then you may write
your own aggregation with aggregateByKey:

    users.aggregateByKey((0, Set.empty[String]))({ case ((count, seen), user) =>
      (count + 1, seen + user)
    }, { case ((count0, seen0), (count1, seen1)) =>
      (count0 + count1, seen0 ++ seen1)
    })
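To see what that seqOp/combOp pair computes per key, here is the same accumulator applied with plain Scala collections, outside Spark. The sample (zip, user) pairs are invented purely for illustration:

```scala
// Hypothetical sample data: (zip, user) pairs, made up for illustration.
val users = Seq(
  ("94110", "alice"), ("94110", "bob"),
  ("94110", "alice"), ("10001", "carol"))

// Same accumulator as the aggregateByKey seqOp: a running count plus a
// set of distinct users, folded per zip code.
val perZip: Map[String, (Int, Int)] = users.groupBy(_._1).map {
  case (zip, pairs) =>
    val (count, seen) = pairs.foldLeft((0, Set.empty[String])) {
      case ((c, s), (_, user)) => (c + 1, s + user)
    }
    (zip, (count, seen.size)) // (COUNT(user), COUNT(DISTINCT user))
}

// perZip("94110") == (3, 2): three rows, two distinct users.
```

In Spark the groupBy-then-fold would happen distributedly via aggregateByKey; the fold logic itself is identical.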
Disclaimer: I am new at Spark.
I did something similar in a prototype which works, but I have not tested it
at scale yet:

    val agg = users.mapValues(_ => 1).aggregateByKey(new
      CustomAggregation())(CustomAggregation.sequenceOp, CustomAggregation.comboOp)
class CustomAggregation() extends
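The class definition above was cut off in the archive. Purely as an illustration, a minimal class with the sequenceOp/comboOp members the snippet references might look like the sketch below; the field names and logic are my assumptions, not the original author's code. Note that with mapValues(_ => 1) the aggregate can only count rows; tracking distinct users would additionally need a Set, as in the aggregateByKey suggestion above.

```scala
// Hypothetical sketch only: the original CustomAggregation was truncated,
// so everything below is an assumption about its shape.
class CustomAggregation(var count: Int = 0) extends Serializable

object CustomAggregation {
  // seqOp: fold one mapped value (the constant 1 from mapValues(_ => 1))
  // into the running aggregate for a key.
  def sequenceOp(agg: CustomAggregation, value: Int): CustomAggregation = {
    agg.count += value
    agg
  }

  // combOp: merge two partial aggregates produced on different partitions.
  def comboOp(a: CustomAggregation, b: CustomAggregation): CustomAggregation = {
    a.count += b.count
    a
  }
}
```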
Is that Spark SQL? I'm wondering if it's possible without Spark SQL.
On Wed, Dec 3, 2014 at 8:08 PM, Cheng Lian wrote:
You may do this:

    table("users").groupBy('zip)('zip, count('user), countDistinct('user))
On 12/4/14 8:47 AM, Arun Luthra wrote:
I'm wondering how to do this kind of SQL query with PairRDDFunctions.
    SELECT zip, COUNT(user), COUNT(DISTINCT user)
    FROM users
    GROUP BY zip
In the Spark Scala API, I can make an RDD (called "users") of key-value
pairs where the keys are zips (as in ZIP code) and the values are user IDs.
Then I ca