On Fri, Aug 4, 2017 at 4:36 PM, Jean Georges Perrin <j...@jgp.net> wrote:

> Thanks Daniel,
>
> I like your answer for #1. It makes sense.
>
> However, I don't get why you say that there are always pending
> transformations... After you call an action, you should be "clean" from
> pending transformations, no?
>

Nope. Say you have val df = spark.read.csv("data.csv"); println(df.count +
df.count). The first "df.count" reads in the file and counts the rows. The
action was executed, but "df" still represents the same pending
transformations. The second "df.count" again reads in the file and counts
the rows. Actions do not modify DataFrames/RDDs. (The closest thing to an
exception is "cache()", which is not an action itself: it marks the data to
be kept once an action computes it, so that later actions can reuse it.)
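
If you want to see this for yourself, here is a minimal sketch. It assumes a
local SparkSession and uses an accumulator to count how many times the map
function runs. The names (CountTwice, evaluationsAfterEachCount, "evaluated")
are made up for illustration, and it uses a small in-memory RDD instead of
"data.csv" so that it does not depend on any file.

```scala
import org.apache.spark.sql.SparkSession

object CountTwice {
  // Returns the accumulator value after the first and after the second count.
  // (Hypothetical helper, only for this illustration.)
  def evaluationsAfterEachCount(spark: SparkSession): (Long, Long) = {
    val sc = spark.sparkContext
    val evaluated = sc.longAccumulator("evaluated")

    // map is a transformation: nothing is computed on this line.
    val rdd = sc.parallelize(1 to 100).map { x => evaluated.add(1); x }

    rdd.count()                      // first action: runs the map
    val afterFirst = evaluated.sum
    rdd.count()                      // second action: runs the map again
    val afterSecond = evaluated.sum
    (afterFirst, afterSecond)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("count-twice")
      .getOrCreate()
    try {
      val (first, second) = evaluationsAfterEachCount(spark)
      println(s"after first count: $first, after second count: $second")
    } finally {
      spark.stop()
    }
  }
}
```

The second number should be larger than the first, because the second count
re-evaluates the same pending transformations. If you call rdd.cache() before
the first count, the second count should be able to reuse the stored
partitions instead, as long as they fit in memory.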
