I am trying to use PyCharm for Spark development on a Windows 8.1 machine.
I have installed py4j, added the Spark python directory as a content root, and
have Cygwin on my path.
Using IntelliJ for Spark Java code works.
When I run the simple word count below, I get errors launching a Spark
local cluster. Does anyone know what I am doing wrong?
============ Simple Word Count ========
import os
import sys

# Set the path for the Spark installation
# (this is the path where you have built Spark using sbt/sbt assembly)
os.environ['SPARK_HOME'] = "E:/spark-1.2.0-bin-hadoop2.3/"
# os.environ['SPARK_HOME'] = "/home/jie/d2/spark-0.9.1"

# Append to PYTHONPATH so that pyspark can be found
sys.path.append("E:/spark-1.2.0-bin-hadoop2.3/python")
# sys.path.append("/home/jie/d2/spark-0.9.1/python")

# Now we are ready to import the Spark modules
try:
    from pyspark import SparkContext
    from pyspark import SparkConf
except ImportError as e:
    print("Error importing Spark Modules", e)
    sys.exit(1)

if __name__ == '__main__':
    conf = SparkConf()
    conf.setMaster("local[*]")
    conf.setAppName("word count")
    conf.set("spark.executor.memory", "4g")
    sc = SparkContext(conf=conf)
    # Split each line on spaces, then count occurrences of each word
    file = sc.textFile("E:/sparkhydra/trunk/data/war_and_peace.txt")
    counts = file.flatMap(lambda line: line.split(" ")) \
                 .map(lambda word: (word, 1)) \
                 .reduceByKey(lambda a, b: a + b)
    items = counts.collect()
=============================================================
I get the following error:
C:\Python27\python.exe "C:\Program Files (x86)\JetBrains\PyCharm 2.6.3\helpers\pydev\pydevd.py" --multiproc --client 127.0.0.1 --port 58178 --file E:/PySparkRunner/SimpleApp.py
pydev debugger: process 14312 is connecting
Connected to pydev debugger (build 121.378)
'cmd' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 2.6.3\helpers\pydev\pydevd.py", line 1457, in <module>
    debugger.run(setup['file'], None, None)
  File "C:\Program Files (x86)\JetBrains\PyCharm 2.6.3\helpers\pydev\pydevd.py", line 1103, in run
    pydev_imports.execfile(file, globals, locals) #execute the script
  File "E:/PySparkRunner/SimpleApp.py", line 15, in <module>
    sc = SparkContext("local", "Simple App")
  File "E:\spark-1.2.0-bin-hadoop2.3\python\pyspark\context.py", line 102, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "E:\spark-1.2.0-bin-hadoop2.3\python\pyspark\context.py", line 211, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "E:\spark-1.2.0-bin-hadoop2.3\python\pyspark\java_gateway.py", line 73, in launch_gateway
    raise Exception(error_msg)
Exception: Launching GatewayServer failed with exit code 1!
Warning: Expected GatewayServer to output a port, but found no output.
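
The "'cmd' is not recognized" line suggests, though this is only an assumption, that the PATH seen by the Python process PyCharm launches does not include the Windows directory containing cmd.exe. Below is a minimal diagnostic sketch that could be run from the same PyCharm run configuration to check this. The helper name find_on_path is hypothetical (it is not part of Spark, py4j, or PyCharm), and the code is written to run on Python 2.7 as well as Python 3.

```python
import os


def find_on_path(name, path=None, exts=("", ".exe", ".bat", ".cmd")):
    """Return the first file named `name` (plus an optional extension)
    found in the directories listed in `path`, or None if there is none.

    `path` defaults to the PATH of the current process, which is the
    PATH that any child process started by this script would inherit.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')  # PATH entries are sometimes quoted
        if not directory:
            continue
        for ext in exts:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    # A result of None means that program cannot be started by name.
    print("cmd resolves to: %s" % find_on_path("cmd"))
    print("java resolves to: %s" % find_on_path("java"))
    print("SPARK_HOME = %s" % os.environ.get("SPARK_HOME"))
```

If cmd resolves to None here, the PATH in the PyCharm run configuration (or the system PATH) would be the first thing to inspect. If it resolves correctly, the cause is probably elsewhere.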