In my answer I assumed you run your program with the "pyspark" command (e.g.
"pyspark mymainscript.py"; pyspark should be on your path). In this case the
workflow is as follows:
1. You create a SparkConf object that simply contains your app's options.
2. You create a SparkContext, which initializes your application (see the sketch below).
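
A minimal sketch of those two steps plus addPyFile. The app name and the
master URL are just placeholder assumptions, adjust them for your own setup:

from pyspark import SparkConf, SparkContext

# 1. SparkConf simply holds the application's options.
conf = (SparkConf()
        .setAppName("MyApp")        # placeholder app name
        .setMaster("local[*]"))     # assumption: local mode; point this at your cluster instead

# 2. SparkContext initializes the application with that configuration.
sc = SparkContext(conf=conf)

# Ship an extra Python module to the executors so worker code can import it.
sc.addPyFile("/path/to/yourmodule.py")

rdd = sc.parallelize(range(10))
print(rdd.map(lambda x: x * 2).collect())

sc.stop()
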
Hi Andrei,
Thank you for your help! Just to make sure I understand: when I run the
command sc.addPyFile("/path/to/yourmodule.py"), I need to already be logged
into the master node and have my Python files somewhere on it, is that correct?
For third-party libraries the simplest way is to use Puppet [1] or Chef [2]
or any similar automation tool to install packages (either from PIP [2] or
from the distribution's repository). It's easy because, if you manage your
cluster's software, you are most probably already using one of these
automation tools.
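
For example, here is a sketch of what that buys you, assuming numpy has
already been installed on every worker node by one of those tools: executor
code can just import the library, with no addPyFile call needed (the app
name is again only a placeholder):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("ThirdPartyLibs")  # placeholder app name
sc = SparkContext(conf=conf)

def mean_of_partition(values):
    # Assumption: numpy was installed cluster-wide (e.g. via Puppet/Chef + pip),
    # so the import succeeds inside executor code without shipping it ourselves.
    import numpy as np
    vals = list(values)
    return [float(np.mean(vals))] if vals else []

rdd = sc.parallelize(range(100), 4)
print(rdd.mapPartitions(mean_of_partition).collect())

sc.stop()
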