Let's say I have to apply a complex sequence of operations to a certain RDD.
To make the code more modular and readable, I would typically write
something like this:
object myObject {
  def main(args: Array[String]): Unit = {
    val rdd1 = function1(myRdd)
    val rdd2 = function2(rdd1)
    val rdd3 = function3(rdd2)
  }

  def function1(rdd: RDD): RDD = { doSomething }
  def function2(rdd: RDD): RDD = { doSomethingElse }
  def function3(rdd: RDD): RDD = { doSomethingElseYet }
}
So I am explicitly declaring vals for the intermediate steps. Does this end
up using more storage than if I just chained all of the operations and
declared only one val instead?
If yes, is there a better way to chain together the operations?
Ideally I would like to do something like:
val rdd = function1.function2.function3
Is there a way I can write the signature of my functions to accomplish
this? Is this also an efficiency issue or just a stylistic one?
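To make the question concrete, here is a rough sketch of the two chaining styles I can think of. It is only an illustration under some assumptions: I use a plain Seq[Int] as a stand-in for the RDD so it runs without Spark, and the names ChainingSketch, Data, pipeline, DataOps, and step1/step2/step3 are made up for this example, as are the bodies of the three functions.

```scala
object ChainingSketch {
  // Assumption: Seq[Int] stands in for an RDD so the sketch runs without Spark.
  type Data = Seq[Int]

  // Hypothetical steps; each takes the data and returns the transformed data.
  def function1(d: Data): Data = d.map(_ + 1)
  def function2(d: Data): Data = d.filter(_ % 2 == 0)
  def function3(d: Data): Data = d.map(_ * 10)

  // Style A: compose the steps into a single function with andThen.
  val pipeline: Data => Data =
    (function1 _) andThen function2 andThen function3

  // Style B: an implicit class exposes the steps as methods,
  // which allows dot-chaining on the data itself.
  implicit class DataOps(d: Data) {
    def step1: Data = function1(d)
    def step2: Data = function2(d)
    def step3: Data = function3(d)
  }

  def main(args: Array[String]): Unit = {
    val input: Data = Seq(1, 2, 3, 4)
    println(pipeline(input))          // List(20, 40)
    println(input.step1.step2.step3)  // List(20, 40)
  }
}
```

Both styles leave me with a single val at the call site, but I do not know whether either one behaves any differently from the version with intermediate vals once real RDDs are involved.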
Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini