Hi, I'm trying to run an external script on Spark using rdd.pipe(). It runs successfully in standalone mode, but on the cluster it throws an error. The error comes from the executors and is: "Cannot run program "path/to/program": error=2, No such file or directory".
Does the external script need to be available on all nodes in the cluster when using rdd.pipe()? What if I don't have permission to install anything on the nodes of the cluster? Is there any other way to make the script available to the worker nodes? (The external script is stored in HDFS and its path is passed to the driver class through args.)
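For context, here is roughly what the driver code looks like. This is only a minimal sketch of the setup described above; the object name, the sample data, and the assumption that the script path arrives as the first program argument are all placeholders, not the actual code.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PipeExample {
  def main(args: Array[String]): Unit = {
    // Assumed: the HDFS path of the external script is passed as args(0).
    val scriptPath = args(0)

    val sc = new SparkContext(new SparkConf().setAppName("PipeExample"))

    // Placeholder data just to drive the pipe.
    val rdd = sc.parallelize(Seq("a", "b", "c"))

    // Each executor tries to exec scriptPath as a local command; this is
    // where "error=2, No such file or directory" shows up on the cluster.
    val piped = rdd.pipe(scriptPath)
    piped.collect().foreach(println)

    sc.stop()
  }
}
```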