Re: new user question on using scala collections inside RDDs

2014-03-14 Thread Ewen Cheslack-Postava
Code in a transformation (i.e., inside the function passed to RDD.map() or RDD.filter()) runs on the workers, not the driver, and it runs in parallel across the RDD's partitions. In Spark, the driver actually doesn't do much -- it just builds up a description of the computation to be performed and then sends it off to th
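A minimal sketch of the driver/worker split described above. It assumes a recent Spark version (on Spark 1.0 to 1.2 you would also need `import org.apache.spark.SparkContext._` for some RDD helpers). The object name `LazyTransformations` is hypothetical, and it uses `local[2]` so it runs in one JVM with two worker threads rather than on a real cluster. `map` and `filter` only record steps in the lineage. Nothing executes until the `collect()` action sends the closures to the workers.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object, not taken from the thread.
object LazyTransformations {
  def run(): Array[Int] = {
    // local[2] = two worker threads in this JVM; on a cluster the closures
    // below would run on executors instead of inside the driver process.
    val conf = new SparkConf().setAppName("lazy-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val nums = sc.parallelize(1 to 10, numSlices = 2)

      // No work happens here: map and filter only extend the lineage,
      // i.e. the "description of the computation" held by the driver.
      val evenSquares = nums.map(n => n * n).filter(_ % 2 == 0)

      // collect() is an action: Spark ships the closures to the workers,
      // evaluates both partitions in parallel, and returns the results.
      evenSquares.collect()
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(", ")) // prints: 4, 16, 36, 64, 100
}
```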

new user question on using scala collections inside RDDs

2014-03-14 Thread Peter
Hi, I'm new to Spark. I have played with some data locally, but I'm starting to wonder if I'm going down the wrong track by using Scala collections inside RDDs. I'm looking at a log file of events from mobile clients. One of the engagement metrics we're interested in is lifetime (not terribly interes
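The post is cut off before it defines "lifetime," so the sketch below makes some assumptions. It treats lifetime as the time between a client's first and last event. It also assumes each log line has already been parsed into a record with a client id and a timestamp. `Event`, `ClientLifetime`, `lifetimes`, and `lifetimesGrouped` are all hypothetical names. The sketch compares two approaches. One uses `groupByKey`, which gives you a Scala `Iterable` per client, so it really does put a Scala collection inside the RDD. The other uses `reduceByKey`, which carries only a small `(min, max)` tuple per client and combines partial results on each partition before the shuffle.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical parsed log record; the real log format is not shown in the thread.
case class Event(clientId: String, timestampMs: Long)

object ClientLifetime {
  // Assumed definition: lifetime = last event time minus first event time.
  // Only a (min, max) pair per client is kept and shuffled.
  def lifetimes(events: RDD[Event]): RDD[(String, Long)] =
    events
      .map(e => (e.clientId, (e.timestampMs, e.timestampMs)))
      .reduceByKey((x, y) => (math.min(x._1, y._1), math.max(x._2, y._2)))
      .mapValues { case (first, last) => last - first }

  // Same result with a Scala collection per key: convenient, but every
  // timestamp is shuffled and one client's events must fit in memory.
  def lifetimesGrouped(events: RDD[Event]): RDD[(String, Long)] =
    events
      .map(e => (e.clientId, e.timestampMs))
      .groupByKey()
      .mapValues(ts => ts.max - ts.min)

  // Runs both versions on a tiny in-memory sample (made-up data).
  def run(): (Map[String, Long], Map[String, Long]) = {
    val conf = new SparkConf().setAppName("lifetime").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val events = sc.parallelize(Seq(
        Event("a", 1000L), Event("b", 5000L), Event("a", 4000L),
        Event("a", 2500L), Event("b", 5000L)))
      (lifetimes(events).collect().toMap, lifetimesGrouped(events).collect().toMap)
    } finally {
      sc.stop()
    }
  }
}
```

On that sample, both versions should give `Map("a" -> 3000, "b" -> 0)`. Grouping into a collection is fine when each client has few events. The `reduceByKey` form usually scales better when some clients have very long event histories.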