You can basically add one function call to install the packages you want. If
you look at the spark-ec2 script, there is a function named setup_cluster(..)
which does all the setup
<https://github.com/apache/spark/blob/master/ec2/spark_ec2.py#L625>. Now,
if you want to install a Python library (assuming pip is already
installed), you can add one more line to that function, like:

ssh(master, opts, "pip install pandas")

This will install it on the master node. The slave_nodes variable holds the
info for all the slave machines; you can iterate through it and run the same
command on each of them, as in the sketch below.
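For example, here is a rough sketch of what that extra bit inside
setup_cluster(..) could look like. It assumes the script's own
ssh(host, opts, command) helper, that each node object exposes
public_dns_name, and uses pandas purely as an illustrative package:

    # inside setup_cluster(..), after the existing setup steps
    master = master_nodes[0].public_dns_name
    ssh(master, opts, "pip install pandas")   # install on the master

    # repeat the same install on every slave node
    for slave in slave_nodes:
        ssh(slave.public_dns_name, opts, "pip install pandas")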


Thanks
Best Regards

On Sun, Feb 8, 2015 at 2:16 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:

> Hi,
>   I want to install a couple of Python libraries (pip install
> python_library) which I want to use on a PySpark cluster that was deployed
> using the ec2 scripts.
> Is there a way to specify these libraries when I am building those ec2
> clusters?
> What's the best way to install these libraries on each ec2 node?
> Thanks
>
