Sorry if this is the wrong place for this.  I am trying to debug an issue
with this library:
https://github.com/springml/spark-sftp

When I attempt to create a DataFrame:

spark.read.
            format("com.springml.spark.sftp").
            option("host", "...").
            option("username", "...").
            option("password", "...").
            option("fileType", "csv").
            option("inferSchema", "true").
            option("tempLocation", "/srv/spark/tmp").
            option("hdfsTempLocation", "/srv/spark/tmp").
            load("...")

What I am seeing is that the download occurs on the Spark driver, not on
a Spark worker. This leads to a failure when Spark tries to create the
DataFrame on the worker.

I'm confused by this behavior. My understanding was that load() is
executed lazily on the Spark workers. Why would some steps execute on the
driver?

Thanks for your help
-- 
Mark Bidewell
http://www.linkedin.com/in/markbidewell
