Thanks for replying twice :) I think I sent this question by email and
somehow thought I had not sent it, hence I created the other one on the web
interface. Let's retain this thread since you have provided more details
here.

Great, it confirms my intuition about DataFrame. It's similar to Shark's
columnar layout, with the addition of compression. There, it used Java NIO's
ByteBuffer to hold the actual data. I will go through the code you pointed to.

I have another question about DataFrame: RDD operations are divided into
two groups: *transformations*, which are lazily evaluated and return a new
RDD, and *actions*, which evaluate the lineage defined by the transformations
and return results. What about DataFrame operations like join,
groupBy, agg, unionAll etc., which are all transformations on RDDs? Are they
lazily evaluated or immediately executed?
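
To make the lazy/eager distinction I am asking about concrete, here is a
small plain-Python analogy. It is only a sketch and not Spark code: LazySeq,
its map/filter/collect methods and the log list are hypothetical names I made
up for illustration. map and filter play the role of transformations (they
only record a step), and collect plays the role of an action (it walks the
recorded lineage and returns results).

```python
# Plain-Python analogy of lazy transformations vs. eager actions.
# LazySeq and its methods are hypothetical names, not Spark's API.

class LazySeq:
    def __init__(self, source, log):
        self._source = source  # zero-argument callable returning an iterator
        self._log = log        # shared list recording when work actually happens

    def map(self, f):
        # "Transformation": only records the step and returns a new LazySeq.
        def compute():
            for x in self._source():
                self._log.append(("map", x))
                yield f(x)
        return LazySeq(compute, self._log)

    def filter(self, pred):
        # "Transformation": also deferred until an action runs.
        def compute():
            for x in self._source():
                self._log.append(("filter", x))
                if pred(x):
                    yield x
        return LazySeq(compute, self._log)

    def collect(self):
        # "Action": walks the whole recorded lineage and returns results.
        return list(self._source())


log = []
data = [1, 2, 3, 4]
base = LazySeq(lambda: iter(data), log)
pipeline = base.map(lambda x: x * 10).filter(lambda x: x > 20)

work_before_action = len(log)  # no element has been processed yet
result = pipeline.collect()    # the action triggers evaluation
work_after_action = len(log)   # now every map/filter step has been logged
```

In this sketch no element is touched until collect is called. My question is
whether join, groupBy, agg and unionAll on a DataFrame behave like map/filter
here, or whether they run as soon as they are called.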
