(Maybe an unrelated FYI): if you're using only Scala or Java with
Spark, I would recommend using Datasets instead of DataFrames. They
provide exactly the same functionality, yet offer more type safety.
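The switch is usually just an `.as[T]` call on an existing DataFrame. A minimal sketch (the case class, file path, and column names here are illustrative assumptions, not from the thread):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative schema; field names and types are assumptions.
case class Person(name: String, age: Long)

val spark = SparkSession.builder.appName("example").getOrCreate()
import spark.implicits._

// DataFrame: untyped rows; a misspelled column name only fails at runtime.
val df = spark.read.json("people.json")

// Dataset[Person]: the same data, but field access is checked at compile time.
val ds = df.as[Person]
val adults = ds.filter(_.age >= 18)  // typed lambda over Person, not a string expression
```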

On Thu, Sep 8, 2016 at 11:05 AM, Lee Becker <lee.bec...@hapara.com> wrote:
>
> On Thu, Sep 8, 2016 at 11:35 AM, Ashish Tadose <ashishtad...@gmail.com>
> wrote:
>>
>> I wish to organize these dataframe operations by grouping them Scala
>> Object methods.
>> Something like below
>>
>>
>>> object Driver {
>>>   def main(args: Array[String]): Unit = {
>>>     val df = Operations.process(sparkContext)
>>>   }
>>> }
>>>
>>> object Operations {
>>>   def process(sparkContext: SparkContext): DataFrame = {
>>>     // series of DataFrame operations
>>>   }
>>> }
>>
>>
>> My question is: at large scale, is it the right thing to do to
>> retrieve a DataFrame as the return value of another Scala object's method?
>> Would returning the DataFrame to the driver cause all the data to be
>> sent to the driver code, or would it return just a reference to the DataFrame?
>
>
> As long as the methods do not trigger any actions, it is fine to pass a
> DataFrame back to the driver.  Think of a DataFrame as an abstraction over
> RDDs.  When you return an RDD or DataFrame you're not returning the data
> itself.  Instead you're returning a recipe that details the series of
> operations needed to produce the data.
>
>
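The laziness described above is easy to demonstrate. In the sketch below (a hypothetical `process` method; the path and column names are made up), the driver receives only a query plan, and no data moves until an action such as `count()` runs:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object Operations {
  // Builds and returns a lazy plan; nothing is executed in this method.
  def process(spark: SparkSession): DataFrame = {
    import spark.implicits._
    spark.read.parquet("events.parquet")  // illustrative path
      .filter($"status" === "ok")         // transformation: extends the plan
      .groupBy($"user_id")                // transformation: extends the plan
      .count()
  }
}

// Driver side: df is still just a recipe at this point...
val df = Operations.process(spark)
// ...until an action triggers distributed execution:
df.count()
```

Only the small aggregated result of the action comes back to the driver; the full input data never does.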
