Re: Loading Python libraries into Spark

2014-06-05 Thread Andrei
In my answer I assumed you run your program with the "pyspark" command (e.g. "pyspark mymainscript.py"; pyspark should be on your path). In this case the workflow is as follows:
1. You create a SparkConf object that simply contains your app's options.
2. You create a SparkContext, which initializes your application
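
A minimal sketch of the two steps described above, assuming the script is launched as "pyspark mymainscript.py"; the file name and app name are placeholders, not from the thread:

    # mymainscript.py
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("MyApp")   # 1. app options live in SparkConf
    sc = SparkContext(conf=conf)             # 2. SparkContext initializes the application

    rdd = sc.parallelize(range(10))
    print(rdd.count())

    sc.stop()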

Re: Loading Python libraries into Spark

2014-06-05 Thread mrm
Hi Andrei, Thank you for your help! Just to make sure I understand: when I run the command sc.addPyFile("/path/to/yourmodule.py"), do I need to be already logged into the master node and have my Python files there? Is that correct? -- View this message in context: http://apache-spark-user-li
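
For context, a hedged sketch of how sc.addPyFile is typically used from the driver; the path and module name come from the thread, while some_function is a hypothetical placeholder:

    from pyspark import SparkContext

    sc = SparkContext(appName="AddPyFileExample")
    sc.addPyFile("/path/to/yourmodule.py")   # ships the file to every worker

    def use_module(x):
        import yourmodule                    # import inside the function so it
        return yourmodule.some_function(x)   # resolves on the worker, not only the driver

    result = sc.parallelize([1, 2, 3]).map(use_module).collect()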

Re: Loading Python libraries into Spark

2014-06-05 Thread Andrei
For third-party libraries the simplest way is to use Puppet [1], Chef [2], or any similar automation tool to install packages (either from pip [2] or from your distribution's repository). It's easy because, if you manage your cluster's software, you are most probably already using one of these automation tools
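
A small sketch (not from the thread) of checking that a package installed cluster-wide by such a tool is actually importable on the workers; numpy is used here only as an example library:

    from pyspark import SparkContext

    sc = SparkContext(appName="CheckWorkerPackages")

    def numpy_version(_):
        import numpy                  # resolved from the worker's system-level install
        return numpy.__version__

    versions = sc.parallelize(range(sc.defaultParallelism)) \
                 .map(numpy_version) \
                 .distinct() \
                 .collect()
    print(versions)                   # ideally every worker reports the same version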