I have been stuck on an issue for the last two days and have not found a
solution after several hours of googling. Here are the details.
The following is a simple Python script (Temp.py):

import sys
from random import random
from operator import add

from pyspark import SparkContext
from pyspark import SparkConf

if __name__ == "__main__":
    master = 'spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077'
    # sys.argv[1]
    conf = SparkConf()
    conf.setMaster(master)
    conf.setAppName("PythonPi")
    conf.set("spark.executor.memory", "2g")
    conf.set("spark.cores.max", "10")
    conf.setSparkHome("/root/spark")
    sc = SparkContext(conf = conf)
    slices = 2
    n = 100000 * slices

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
    print "Pi is roughly %f" % (4.0 * count / n)
    sc.stop()

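To separate the computation from the cluster setup, the same Monte Carlo estimate can be run without Spark at all. The following is only a sketch: inside_unit_circle and estimate_pi are hypothetical names that do not appear in Temp.py, the seed parameter is added here for repeatability, and range and the print call are used so that it runs on both Python 2 and Python 3.

```python
import random


def inside_unit_circle(rng):
    """Return 1 if a random point in [-1, 1) x [-1, 1) is inside the unit circle, else 0."""
    x = rng.random() * 2 - 1
    y = rng.random() * 2 - 1
    return 1 if x ** 2 + y ** 2 < 1 else 0


def estimate_pi(n, seed=None):
    """Estimate pi from n samples using plain Python instead of an RDD."""
    rng = random.Random(seed)  # seeded generator so that runs are repeatable
    count = sum(inside_unit_circle(rng) for _ in range(n))
    return 4.0 * count / n


if __name__ == "__main__":
    # 200000 matches n = 100000 * slices with slices = 2 in Temp.py
    print("Pi is roughly %f" % estimate_pi(200000, seed=0))
```

If this prints a value close to 3.14, the sampling logic is fine and any remaining problem lies in how the job reaches the cluster.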
I have Spark installed on my local machine, and when I run the code locally
with pyspark it works fine (with master = 'local[5]' in the above code).
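Switching between a local master and a cluster master currently means editing the file. The commented-out sys.argv[1] in Temp.py suggests reading the master URL from the command line instead. A minimal sketch of that idea, where resolve_master and DEFAULT_MASTER are hypothetical names and the 'local[5]' fallback is simply the local setting mentioned above:

```python
import sys

DEFAULT_MASTER = "local[5]"  # the local setting that works in this post


def resolve_master(argv, default=DEFAULT_MASTER):
    """Return argv[1] if a master URL was passed on the command line, else the default."""
    if len(argv) > 1 and argv[1].strip():
        return argv[1].strip()
    return default


if __name__ == "__main__":
    master = resolve_master(sys.argv)
    print("Using master: %s" % master)
    # In Temp.py this value would then be passed to conf.setMaster(master).
```

With this in place the same file could be started with no argument for a local run, or with the spark:// URL as the first argument for a cluster run.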
Next I installed Spark on EC2, where I can create a master and a number of
slaves for deploying my code. After I create the master I get its URL, which
has the following format:
spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077. However, when
I run the script with pyspark
./bin/pyspark Temp.py
I get the following warning:
TaskSchedulerImpl: Initial job has not accepted any resources; check your
cluster UI to ensure that workers are registered and have sufficient memory
I have checked in the UI that each worker has 2.7 GB of memory, none of which
is in use. Could you please give me an idea of what causes this warning?
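As a sanity check on the numbers above, the requested executor memory can be compared with the per-worker figure from the UI. This is only a sketch: parse_memory_mb and fits_on_worker are hypothetical helpers, not part of Spark, and treating "g" as 1024 MB is an assumption of the sketch.

```python
def parse_memory_mb(value):
    """Convert a memory string such as "2g" or "512m" to megabytes.

    Assumes binary units (1g = 1024m); that convention is an assumption here.
    """
    units = {"k": 1.0 / 1024, "m": 1.0, "g": 1024.0, "t": 1024.0 * 1024}
    text = value.strip().lower()
    if text and text[-1] in units:
        return float(text[:-1]) * units[text[-1]]
    raise ValueError("expected a size with a k, m, g or t suffix: %r" % value)


def fits_on_worker(requested, worker_free):
    """Return True if the requested executor memory is within a worker's free memory."""
    return parse_memory_mb(requested) <= parse_memory_mb(worker_free)


if __name__ == "__main__":
    # "2g" is spark.executor.memory from Temp.py; "2.7g" is the per-worker figure from the UI.
    print(fits_on_worker("2g", "2.7g"))
```

For the values in this post the check passes, so the memory request by itself does not obviously account for the warning.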
Looking forward to hearing from you.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Deploying-a-python-code-on-a-spark-EC2-cluster-tp4758.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.