(Maybe unrelated FYI): if you're using only Scala or Java with Spark, I would recommend using Datasets instead of DataFrames. They provide essentially the same functionality but offer more type safety.
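As a minimal sketch of the difference (assuming Spark 2.x; the User case class and users.json are made up for illustration):

    import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

    // Hypothetical record type, just for illustration
    case class User(name: String, age: Long)

    val spark = SparkSession.builder().appName("example").getOrCreate()
    import spark.implicits._

    // DataFrame: column names are only resolved at runtime
    val df: DataFrame = spark.read.json("users.json")
    df.select("agee")   // typo compiles, fails with AnalysisException at runtime

    // Dataset[User]: field access is checked by the Scala compiler
    val ds: Dataset[User] = df.as[User]
    val older: Dataset[User] = ds.map(u => u.copy(age = u.age + 1))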
On Thu, Sep 8, 2016 at 11:05 AM, Lee Becker <lee.bec...@hapara.com> wrote:
>
> On Thu, Sep 8, 2016 at 11:35 AM, Ashish Tadose <ashishtad...@gmail.com>
> wrote:
>>
>> I wish to organize these dataframe operations by grouping them into
>> Scala object methods, something like below:
>>
>>> object Driver {
>>>   def main(args: Array[String]): Unit = {
>>>     val df = Operations.process(sparkContext)
>>>   }
>>> }
>>>
>>> object Operations {
>>>   def process(sparkContext: SparkContext): DataFrame = {
>>>     // series of dataframe operations
>>>   }
>>> }
>>
>> My stupid question is: would retrieving a DF from another Scala object's
>> method as a return type be the right thing to do at large scale?
>> Would returning the DF to the driver cause all the data to be passed to
>> the driver code, or would it return just a pointer to the DF?
>
> As long as the methods do not trigger any executions, it is fine to pass
> a DataFrame back to the driver. Think of a DataFrame as an abstraction
> over RDDs. When you return an RDD or DataFrame, you're not returning the
> object itself. Instead you're returning a recipe that details the series
> of operations needed to produce the data.
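To illustrate the "recipe" behavior Lee describes, here is a minimal sketch (assuming Spark 2.x, so process takes a SparkSession rather than a SparkContext; the path and column names are made up):

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.col

    object Operations {
      // Only builds a logical plan; no Spark job runs in here
      def process(spark: SparkSession): DataFrame =
        spark.read.parquet("/data/events")   // hypothetical path
          .filter(col("status") === "ok")
          .groupBy(col("userId"))
          .count()
    }

    // Returning the DataFrame moves no data to the driver;
    // df is just a description of the computation
    val df = Operations.process(spark)

    // A job is triggered only by an action:
    df.show()   // or df.count(), df.collect(), df.write...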