I am trying to use PyCharm for Spark development on a Windows 8.1 machine. I have installed py4j, added the Spark python directory as a content root, and have Cygwin in my PATH. Using IntelliJ for Spark Java code also works. When I run the simple word count below, I get errors launching a local Spark cluster. Does anyone know what I am doing wrong?
============ Simple Word Count ============

import os
import sys

# Set the path for the Spark installation
# (this is the path where you have built Spark using sbt/sbt assembly)
os.environ['SPARK_HOME'] = "E:/spark-1.2.0-bin-hadoop2.3/"
# os.environ['SPARK_HOME'] = "/home/jie/d2/spark-0.9.1"

# Append to PYTHONPATH so that pyspark can be found
sys.path.append("E:/spark-1.2.0-bin-hadoop2.3/python")
# sys.path.append("/home/jie/d2/spark-0.9.1/python")

# Now we are ready to import the Spark modules
try:
    from pyspark import SparkContext
    from pyspark import SparkConf
except ImportError as e:
    print ("Error importing Spark modules", e)
    sys.exit(1)

if __name__ == '__main__':
    conf = SparkConf()
    conf.setMaster("local[*]")
    conf.setAppName("word count")
    conf.set("spark.executor.memory", "4g")
    sc = SparkContext(conf=conf)

    file = sc.textFile("E:/sparkhydra/trunk/data/war_and_peace.txt")
    counts = file.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)
    items = counts.collect()

=============================================================

I get the following error:

C:\Python27\python.exe "C:\Program Files (x86)\JetBrains\PyCharm 2.6.3\helpers\pydev\pydevd.py" --multiproc --client 127.0.0.1 --port 58178 --file E:/PySparkRunner/SimpleApp.py
pydev debugger: process 14312 is connecting

Connected to pydev debugger (build 121.378)
'cmd' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 2.6.3\helpers\pydev\pydevd.py", line 1457, in <module>
    debugger.run(setup['file'], None, None)
  File "C:\Program Files (x86)\JetBrains\PyCharm 2.6.3\helpers\pydev\pydevd.py", line 1103, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "E:/PySparkRunner/SimpleApp.py", line 15, in <module>
    sc = SparkContext("local", "Simple App")
  File "E:\spark-1.2.0-bin-hadoop2.3\python\pyspark\context.py", line 102, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "E:\spark-1.2.0-bin-hadoop2.3\python\pyspark\context.py", line 211, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "E:\spark-1.2.0-bin-hadoop2.3\python\pyspark\java_gateway.py", line 73, in launch_gateway
    raise Exception(error_msg)
Exception: Launching GatewayServer failed with exit code 1!
Warning: Expected GatewayServer to output a port, but found no output.
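Since "'cmd' is not recognized" normally means Windows cannot locate cmd.exe, my guess (only a guess) is that the PATH seen by PyCharm's run configuration, after my Cygwin entries were added, no longer contains C:\Windows\System32, so pyspark's java_gateway cannot spawn the gateway process. A minimal check I can run from the same interpreter, using only the standard library (distutils.spawn.find_executable is available in Python 2.7), would be:

import os
import distutils.spawn

# Check whether the executables PySpark needs to launch the Java
# gateway are resolvable on the PATH this interpreter sees.
for exe in ("cmd.exe", "java.exe"):
    print("%s -> %s" % (exe, distutils.spawn.find_executable(exe)))

print("PATH = %s" % os.environ.get("PATH"))
print("JAVA_HOME = %s" % os.environ.get("JAVA_HOME"))

If either executable prints None under PyCharm but resolves from a regular Command Prompt, that would point at the run configuration's environment rather than the Spark install itself.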