The following is a simplified example of what I am trying to accomplish. Say I have an RDD of objects like this:
{ "country": "USA", "name": "Franklin", "age": 24, "hits": 224}
{ "country": "USA", "name": "Bob", "age": 55, "hits": 108}
{ "country": "France", "name": "Remi", "age": 33, "hits": 72}

I want to find the average age and total number of hits per country. Ideally, I would like to scan the data once and perform both aggregations simultaneously.

What is a good approach to doing this?

I’m thinking that we’d want to keyBy(country), and then somehow reduceByKey(). The problem is, I don’t know how to approach writing a function that can be passed to reduceByKey() and that will track a running average and total simultaneously.

Nick

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Patterns-for-making-multiple-aggregations-in-one-pass-tp7874.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
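
P.S. To make the keyBy/reduceByKey idea concrete, here is a rough sketch of one shape the function might take. I am not sure it is the idiomatic pattern. It assumes each record is first mapped to a (country, (age_sum, count, hits_sum)) pair, so the reducer carries a sum and a count rather than a running average, and the average is computed only at the end. The names to_pair, combine and finalize are made up for illustration, and the grouping loop is a plain-Python stand-in for Spark so the snippet runs without a cluster.

```python
from collections import defaultdict
from functools import reduce

records = [
    {"country": "USA", "name": "Franklin", "age": 24, "hits": 224},
    {"country": "USA", "name": "Bob", "age": 55, "hits": 108},
    {"country": "France", "name": "Remi", "age": 33, "hits": 72},
]

def to_pair(record):
    # Hypothetical helper: one record -> (key, (age_sum, count, hits_sum))
    return record["country"], (record["age"], 1, record["hits"])

def combine(a, b):
    # Element-wise sum of two partial aggregates. This is associative and
    # commutative, which is what reduceByKey() needs from its function.
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def finalize(acc):
    # Turn the accumulated sums into the two answers we actually want.
    age_sum, count, hits_sum = acc
    return {"avg_age": age_sum / count, "total_hits": hits_sum}

# Local stand-in for what would presumably be, on an RDD:
#   rdd.map(to_pair).reduceByKey(combine).mapValues(finalize)
grouped = defaultdict(list)
for rec in records:
    key, value = to_pair(rec)
    grouped[key].append(value)

result = {k: finalize(reduce(combine, vs)) for k, vs in grouped.items()}
print(result)
```

The point of carrying (sum, count) is that two partial averages cannot be merged correctly on their own, while two partial sums and counts can. Any further aggregation would just add another slot to the tuple.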