Spark DataFrame Creation

Mark Bidewell Wed, 22 Jul 2020 14:47:38 -0700

Sorry if this is the wrong place for this.  I am trying to debug an issue
with this library:
https://github.com/springml/spark-sftp


When I attempt to create a dataframe:

val df = spark.read.
            format("com.springml.spark.sftp").
            option("host", "...").
            option("username", "...").
            option("password", "...").
            option("fileType", "csv").
            option("inferSchema", "true").
            option("tempLocation","/srv/spark/tmp").
            option("hdfsTempLocation","/srv/spark/tmp").
            load("...")

What I am seeing is that the download happens on the Spark driver, not on
the Spark worker. This leads to a failure when Spark tries to create the
DataFrame on the worker.

I'm confused by this behavior. My understanding was that load() is executed
lazily on the Spark workers. Why would some of this work execute on the
driver?
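To make the driver/worker split concrete, here is a minimal sketch assuming the Spark Data Source V1 API (org.apache.spark.sql.sources). The package `demo` and the classes DefaultSource and DemoRelation are hypothetical names invented for illustration; they are not part of spark-sftp. With a V1 source, DataFrameReader.load() resolves the relation immediately by calling the provider's createRelation on the driver. Only the RDD returned by buildScan is computed lazily, inside executor tasks, when an action such as show() runs. If a source does its download inside createRelation (not verified here for spark-sftp), that download would run on the driver, and a file written to a driver-local path would not be visible to the workers.

```scala
// Hypothetical demo data source (NOT spark-sftp) showing which parts of a
// Data Source V1 relation run on the driver and which run in executor tasks.
package demo

import org.apache.spark.TaskContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, SparkSession}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

class DefaultSource extends RelationProvider {
  // Invoked from DataFrameReader.load(), i.e. eagerly, on the driver.
  override def createRelation(sqlContext: SQLContext,
                              parameters: Map[String, String]): BaseRelation = {
    // TaskContext.get() is null here because no task is running on the driver.
    println(s"createRelation: TaskContext = ${TaskContext.get()}")
    new DemoRelation(sqlContext)
  }
}

class DemoRelation(val sqlContext: SQLContext) extends BaseRelation with TableScan {
  override def schema: StructType = StructType(Seq(StructField("where", StringType)))

  // buildScan() itself is also called on the driver during query planning;
  // only the closure passed to mapPartitions runs inside executor tasks.
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(1 to 2, 2).mapPartitions { _ =>
      val partition = TaskContext.get().partitionId()
      Iterator(Row(s"executor task, partition $partition"))
    }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("demo").getOrCreate()
    // Spark resolves format("demo") to the class demo.DefaultSource.
    val df = spark.read.format("demo").load() // createRelation runs here, on the driver
    df.show(truncate = false)                  // the scan closure runs here, in tasks
    spark.stop()
  }
}
```

Running Demo.main prints the createRelation line at load() time, before show() triggers the tasks that produce the rows.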

Thanks for your help
-- 
Mark Bidewell
http://www.linkedin.com/in/markbidewell
