I would think DF = RDD + schema + some additional methods. In fact, a DF object exposes its underlying RDD as DF.rdd, so you can convert DF <=> RDD easily if needed.
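
To make the "DF = RDD + schema" idea concrete, here is a toy sketch in plain Python. It is not Spark code: ToyRDD, ToyDataFrame, union_all and the sample rows are hypothetical names and data made up for illustration. It only models the behaviour discussed in this thread, where transformations return a new lazy object and an action such as collect() triggers evaluation. In Spark itself the two directions of the conversion are df.rdd and sqlContext.createDataFrame(rdd, schema).

```python
from itertools import chain


class ToyRDD:
    """Hypothetical stand-in for an RDD: stores a recipe, not results."""

    def __init__(self, compute):
        self._compute = compute  # zero-argument callable returning an iterator

    def map(self, fn):
        # Transformation: wraps the parent recipe, evaluates nothing.
        return ToyRDD(lambda: (fn(x) for x in self._compute()))

    def filter(self, pred):
        # Transformation: also lazy.
        return ToyRDD(lambda: (x for x in self._compute() if pred(x)))

    def union(self, other):
        # Transformation: chains the two recipes.
        return ToyRDD(lambda: chain(self._compute(), other._compute()))

    def collect(self):
        # Action: runs the whole recipe and returns the rows.
        return list(self._compute())


class ToyDataFrame:
    """Hypothetical stand-in for a DataFrame: an RDD of rows plus a schema."""

    def __init__(self, rdd, schema):
        self.rdd = rdd        # the underlying row collection (DF => RDD)
        self.schema = schema  # list of column names

    def select(self, *cols):
        idx = [self.schema.index(c) for c in cols]
        rows = self.rdd.map(lambda row: tuple(row[i] for i in idx))
        return ToyDataFrame(rows, list(cols))

    def filter(self, pred):
        # pred receives each row as a dict of column name -> value.
        schema = self.schema
        rows = self.rdd.filter(lambda row: pred(dict(zip(schema, row))))
        return ToyDataFrame(rows, schema)

    def union_all(self, other):
        # Assumes both sides share the same schema.
        return ToyDataFrame(self.rdd.union(other.rdd), self.schema)

    def collect(self):
        return self.rdd.collect()


reads = []  # records every row that is actually read from the source


def source():
    for row in [("alice", 34), ("bob", 19), ("carol", 52)]:
        reads.append(row)
        yield row


# RDD => DF: attach a schema to a collection of rows.
df = ToyDataFrame(ToyRDD(source), ["name", "age"])

# Transformations only build up a plan.
adults = df.filter(lambda r: r["age"] >= 21).select("name")
assert reads == []  # nothing has been read yet

# The action triggers evaluation of the whole chain.
result = adults.collect()
assert result == [("alice",), ("carol",)]
assert len(reads) == 3
```

The point of the reads list is just to show that filter() and select() do not touch the source, while collect() does. A real DataFrame additionally goes through a query optimizer before execution, which this toy model ignores.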

On Mon, Jun 8, 2015 at 5:41 PM, kiran lonikar <loni...@gmail.com> wrote:

> Thanks. Can you point me to a place in the SQL programming guide or the
> DataFrame scaladoc where these transformations and actions are grouped
> like in the case of RDD?
>
> Also if you can tell me if sqlContext.load and unionAll are
> transformations or actions...
>
> I answered a question on the forum assuming unionAll is a blocking call
> and said execution of multiple load and df.unionAll in different threads
> would benefit performance :)
>
> Kiran
> On 08-Jun-2015 4:37 pm, "Cheng Lian" <lian.cs....@gmail.com> wrote:
>
>> For DataFrame, there are also transformations and actions. And
>> transformations are also lazily evaluated. However, DataFrame
>> transformations like filter(), select(), agg() return a DataFrame rather
>> than an RDD. Other methods like show() and collect() are actions.
>>
>> Cheng
>>
>> On 6/8/15 1:33 PM, kiran lonikar wrote:
>>
>> Thanks for replying twice :) I think I sent this question by email and
>> somehow thought I did not send it, hence created the other one on the web
>> interface. Let's retain this thread since you have provided more details
>> here.
>>
>> Great, it confirms my intuition about DataFrame. It's similar to Shark
>> columnar layout, with the addition of compression. There it used Java NIO's
>> ByteBuffer to hold actual data. I will go through the code you pointed.
>>
>> I have another question about DataFrame: The RDD operations are divided
>> into two groups: *transformations*, which are lazily evaluated and return a
>> new RDD, and *actions*, which evaluate lineage defined by transformations,
>> invoke actions and return results. What about DataFrame operations like
>> join, groupBy, agg, unionAll etc. which are all transformations in RDD? Are
>> they lazily evaluated or immediately executed?

--
Best Regards,
Ayan Guha