On Wed, Aug 2, 2017 at 2:16 PM, Jean Georges Perrin <j...@jgp.net> wrote:
> Hi Sparkians,
>
> I understand the lazy evaluation mechanism with transformations and
> actions. My question is simpler: 1) are show() and/or printSchema()
> actions? I would assume so...

show() is an action (it prints data), but printSchema() is not an action. Spark can tell you the schema of the result without computing the result.

> and optional question: 2) is there a way to know if there are
> transformations "pending"?

There are always transformations pending :). An RDD or DataFrame is a series of pending transformations. If you say val df = spark.read.csv("foo.csv"), that is a pending transformation. Even spark.emptyDataFrame is best understood as a pending transformation: it does not do anything on the cluster, but records locally what it will have to do on the cluster.
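To make both answers concrete, here is a minimal sketch, assuming a local SparkSession. The object name, the accumulator, and the UDF are hypothetical and exist only for illustration. It uses spark.range instead of a CSV file so that it runs without any input data. The accumulator counts how many rows the UDF has actually processed, which gives a rough way to see that printSchema() and explain() do not run the plan while show() does.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object PendingTransformations {
  // Returns the accumulator value before and after the action.
  def run(spark: SparkSession): (Long, Long) = {
    // Counts how many rows the UDF below has actually processed.
    val rowsSeen = spark.sparkContext.longAccumulator("rowsSeen")
    val doubled = udf { (x: Long) => rowsSeen.add(1); x * 2 }

    // Only records a plan locally; nothing is computed yet.
    val df = spark.range(5).withColumn("doubled", doubled(col("id")))

    df.printSchema()   // schema is derived from the plan alone
    df.explain(true)   // prints the pending logical and physical plans
    val before: Long = rowsSeen.value

    df.show()          // action: the plan is executed now
    val after: Long = rowsSeen.value

    (before, after)
  }

  def main(args: Array[String]): Unit = {
    // Master and app name are assumptions for a local test run.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pending-transformations")
      .getOrCreate()

    val (before, after) = run(spark)
    println(s"rows processed before show(): $before")
    println(s"rows processed after show(): $after")

    spark.stop()
  }
}
```

The first count should still be zero, because printSchema() and explain() only inspect the plan. For the "pending" question, explain(true) is probably the closest thing to a direct answer: it shows what the DataFrame would do if an action were called on it.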