I am exploring Spark SQL and DataFrames and am trying to aggregate by
column and generate a single JSON row containing the aggregated values. Any
input on the right approach would be helpful.
Here is my sample data:
user,sports,major,league,count
[test1,Ice Hockey,Switzerland,NLA,6]
[test1,Football,Australia,A-League,6]
[test1,Ice Hockey,Sweden,SHL,3]
[test1,Ice Hockey,Switzerland,NLB,2]
[test1,Football,Romania,Liga I,1]
I want to aggregate by user and create a single JSON row:
{ "user" : "test1",
  "sports" : [ { "Ice Hockey" : 11, "Football" : 7 } ],
  "major" : [ { "Switzerland" : 8, "Australia" : 6, "Sweden" : 3, "Romania" : 1 } ],
  "league" : [ { "NLA" : 6, "A-League" : 6, "SHL" : 3, "NLB" : 2, "Liga I" : 1 } ],
  "total" : 18 }
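
To pin down the result I am after, here is a minimal sketch in plain Python (not Spark) that computes the same per-user totals from the sample rows. The names aggregate_by_user and DIMENSIONS are made up for illustration, and the first row is taken as Ice Hockey, since the expected sports totals (Ice Hockey = 11) require it. A Spark version would presumably need to produce the same document per user.

```python
import json
from collections import defaultdict

# Sample rows from the post: (user, sports, major, league, count).
# Assumption: the first row's sport is "Ice Hockey", because the expected
# sports totals (Ice Hockey = 11) only add up that way.
rows = [
    ("test1", "Ice Hockey", "Switzerland", "NLA", 6),
    ("test1", "Football", "Australia", "A-League", 6),
    ("test1", "Ice Hockey", "Sweden", "SHL", 3),
    ("test1", "Ice Hockey", "Switzerland", "NLB", 2),
    ("test1", "Football", "Romania", "Liga I", 1),
]

# Columns whose values become keys in the output (illustrative name).
DIMENSIONS = ("sports", "major", "league")


def aggregate_by_user(rows):
    """Return one dict per user with summed counts per column value."""
    totals = {}                # user -> {dimension -> {value -> summed count}}
    grand = defaultdict(int)   # user -> overall count
    for user, sports, major, league, count in rows:
        dims = totals.setdefault(user, {d: defaultdict(int) for d in DIMENSIONS})
        for name, value in zip(DIMENSIONS, (sports, major, league)):
            dims[name][value] += count
        grand[user] += count

    result = []
    for user, dims in totals.items():
        doc = {"user": user}
        for name in DIMENSIONS:
            # Wrapped in a one-element list to match the layout in the post.
            doc[name] = [dict(dims[name])]
        doc["total"] = grand[user]
        result.append(doc)
    return result


# One JSON document per user.
for doc in aggregate_by_user(rows):
    print(json.dumps(doc))
```

Each column is summed independently over the same rows, so the three maps are separate group-by results merged into one record per user, and total is the plain sum of count for that user.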