Deploying Python code on a Spark EC2 cluster

Shubhabrata Thu, 24 Apr 2014 06:47:31 -0700

I have been stuck on an issue for the last two days and have not found a
solution after several hours of googling. Here are the details.


The following is a simple Python script (Temp.py):

import sys
from random import random
from operator import add

from pyspark import SparkContext
from pyspark import SparkConf

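if __name__ == "__main__":

    # Standalone master on EC2 (could instead be read from sys.argv[1]);
    # 'local[5]' runs everything on the local machine instead.
    master = 'spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077'
    conf = SparkConf()
    conf.setMaster(master)
    conf.setAppName("PythonPi")
    conf.set("spark.executor.memory", "2g")  # memory requested per executor
    conf.set("spark.cores.max", "10")        # cap on total cores across the cluster
    conf.setSparkHome("/root/spark")         # Spark install path on the workers

    sc = SparkContext(conf=conf)

    # Monte Carlo estimate of pi: sample n points in [-1, 1] x [-1, 1]
    # and count how many fall inside the unit circle.
    slices = 2
    n = 100000 * slices
    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0
    count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
    print "Pi is roughly %f" % (4.0 * count / n)

    sc.stop()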

I have Spark installed on my local machine, and when I run the code locally
with pyspark (master = 'local[5]' in the above code), it works fine.
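
As a point of reference, the computation in Temp.py can be reproduced in plain Python with no Spark at all, which gives an expected value to compare against whatever the cluster eventually returns. This is a minimal sketch: inside_unit_circle and estimate_pi are hypothetical names, and the optional seed exists only to make runs repeatable.

```python
import random
from functools import reduce
from operator import add


def inside_unit_circle(_):
    """Same test as f() in Temp.py: sample a point in [-1, 1] x [-1, 1]
    and return 1 if it lies inside the unit circle, else 0."""
    x = random.random() * 2 - 1
    y = random.random() * 2 - 1
    return 1 if x ** 2 + y ** 2 < 1 else 0


def estimate_pi(n, seed=None):
    """Plain-Python counterpart of parallelize(...).map(f).reduce(add)."""
    if seed is not None:
        random.seed(seed)  # reseeds the global generator for repeatable runs
    # The initial value 0 keeps reduce() safe for n == 0; Spark's reduce has no initializer.
    count = reduce(add, map(inside_unit_circle, range(1, n + 1)), 0)
    return 4.0 * count / n


if __name__ == "__main__":
    slices = 2
    n = 100000 * slices
    print("Pi is roughly %f" % estimate_pi(n, seed=42))
```

With 200,000 samples the estimate typically lands within about 0.01 of pi, so a cluster result far outside that range would point to a problem in the job rather than in the sampling.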

Next, I installed Spark on EC2, where I can create a master and a number of
slaves to deploy my code on. After creating a master I get its URL, which has
the following format:
spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077. However, when
I run the script with pyspark (./bin/pyspark Temp.py), I get the following
warning:
 TaskSchedulerImpl: Initial job has not accepted any resources; check your
cluster UI to ensure that workers are registered and have sufficient memory
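
Since the master URL is hard-coded in Temp.py, with sys.argv[1] left as a commented-out alternative, one option is to take it from the script's first argument so the same file runs against local[5] or the EC2 master without editing. The sketch below assumes that approach; resolve_master and both regular expressions are hypothetical helpers, not part of PySpark, and the check only rejects malformed URLs. It cannot tell whether the workers and the master can actually reach each other.

```python
import re
import sys

# Hypothetical patterns: a standalone master URL of the form spark://HOST:PORT,
# e.g. spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077,
# and a local master such as "local" or "local[5]".
STANDALONE_MASTER = re.compile(r"^spark://([A-Za-z0-9.-]+):(\d+)$")
LOCAL_MASTER = re.compile(r"^local(\[\d+\])?$")


def resolve_master(argv, default="local[5]"):
    """Return argv[1] if present, else the default.

    Raises ValueError if the value is neither a local master nor a
    spark://HOST:PORT URL with a valid port number.
    """
    master = argv[1] if len(argv) > 1 else default
    if LOCAL_MASTER.match(master):
        return master
    m = STANDALONE_MASTER.match(master)
    if m is None or not (0 < int(m.group(2)) < 65536):
        raise ValueError("unexpected master URL: %r" % master)
    return master


if __name__ == "__main__":
    print(resolve_master(sys.argv))
```

In Temp.py this would replace the hard-coded line with conf.setMaster(resolve_master(sys.argv)).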

I have checked in the UI that each worker has 2.7 GB of memory, none of which
is in use. Could you please give me any idea what causes this error?
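
The memory side of the warning comes down to simple arithmetic: Temp.py asks for spark.executor.memory = 2g, which is 2048 MB, while each worker reports 2.7 GB, roughly 2765 MB assuming the UI uses 1024-based units. A sketch of that check follows; memory_to_mb and fits_on_worker are hypothetical helpers that only approximate how suffixed memory strings such as "2g" or "512m" are read, and they require an explicit k/m/g/t suffix rather than guessing a default unit.

```python
import re

# Hypothetical conversion table: megabytes per unit, using 1024-based units.
_UNITS_MB = {"k": 1.0 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}


def memory_to_mb(value):
    """Convert a suffixed memory string like '2g' or '512m' to megabytes."""
    m = re.match(r"^(\d+)([kmgt])$", value.strip().lower())
    if m is None:
        raise ValueError("unrecognised memory string: %r" % value)
    return int(m.group(1)) * _UNITS_MB[m.group(2)]


def fits_on_worker(executor_memory, worker_free_gb):
    """True if one executor of the requested size fits in the worker's free memory."""
    return memory_to_mb(executor_memory) <= worker_free_gb * 1024


if __name__ == "__main__":
    requested = memory_to_mb("2g")  # spark.executor.memory in Temp.py
    available = 2.7 * 1024          # per-worker memory reported by the UI
    print("requested %.0f MB, available %.1f MB, fits: %s"
          % (requested, available, fits_on_worker("2g", 2.7)))
```

By this arithmetic a single 2g executor fits on a 2.7 GB worker, whereas a request of 3g would not.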

Looking forward to hearing from you.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Deploying-a-python-code-on-a-spark-EC2-cluster-tp4758.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
