Code in a transformation
(i.e. inside the function passed to RDD.map() or RDD.filter()) will run
on the workers, not the driver, and it will run in parallel across
partitions. In Spark, the driver actually doesn't do much -- it just
builds up a description of the computation to be performed and then
sends it off to the executors.
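As a minimal sketch of that split (assuming a SparkContext named sc, a hypothetical log path, and tab-separated lines -- none of these come from your setup):

```scala
// The lambdas passed to map/filter are serialized and shipped to the
// workers; the driver only records the lineage of transformations.
val events = sc.textFile("hdfs:///logs/events")   // hypothetical path

// These transformations run on the workers, partition by partition:
val userIds = events
  .map(line => line.split("\t")(0))   // assumes tab-separated logs
  .filter(_.nonEmpty)

// Nothing has executed yet. An action such as count() is what triggers
// the actual distributed computation and returns a result to the driver:
val n = userIds.count()
```

The key distinction is transformations (map, filter) versus actions (count, collect): only the latter cause work to happen on the cluster.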
Hi
I'm new to Spark. I have played with some data locally but I'm starting to
wonder if I'm going down the wrong track by using Scala collections inside RDDs.
I'm looking at a log file of events from mobile clients. One of the engagement
metrics we're interested in is lifetime (not terribly interesting