I am trying to use PyCharm for Spark development on a Windows 8.1 machine.
I have installed py4j, added the Spark python directory as a content root,
and have Cygwin in my PATH. Using IntelliJ for Spark Java code also works.
When I run the simple word count below, I get errors launching a local
Spark cluster - does anyone know what I am doing wrong?

============ Simple Word Count ========
import os
import sys

# Set the path for the Spark installation
# (this is the path where you have built Spark using sbt/sbt assembly)
os.environ['SPARK_HOME'] = "E:/spark-1.2.0-bin-hadoop2.3/"
# os.environ['SPARK_HOME'] = "/home/jie/d2/spark-0.9.1"
# Append to sys.path so that pyspark can be found
sys.path.append("E:/spark-1.2.0-bin-hadoop2.3/python")
# sys.path.append("/home/jie/d2/spark-0.9.1/python")

# Now we are ready to import Spark Modules
try:
    from pyspark import SparkContext
    from pyspark import SparkConf

except ImportError as e:
    print("Error importing Spark modules: %s" % e)
    sys.exit(1)

if __name__ == '__main__':
    conf = SparkConf()
    conf.setMaster("local[*]")
    conf.setAppName("word count")
    conf.set("spark.executor.memory", "4g")
    sc = SparkContext(conf=conf)

    file = sc.textFile("E:/sparkhydra/trunk/data/war_and_peace.txt")
    counts = file.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)
    items = counts.collect()
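
As a sanity check that the counting logic itself is not the problem
(independent of Spark), here is the same flatMap/map/reduceByKey pipeline
expressed in plain Python - a sketch only, the function name is my own:

```python
from collections import defaultdict

def word_count(lines):
    """Plain-Python equivalent of the flatMap/map/reduceByKey chain above."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.split(" "):   # flatMap: split each line into words
            counts[word] += 1          # map to (word, 1) and reduce by key
    return dict(counts)

sample = ["war and peace", "and war"]
print(word_count(sample))  # war -> 2, and -> 2, peace -> 1
```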

=============================================================
I get the following error:

C:\Python27\python.exe "C:\Program Files (x86)\JetBrains\PyCharm
2.6.3\helpers\pydev\pydevd.py" --multiproc --client 127.0.0.1 --port 58178
--file E:/PySparkRunner/SimpleApp.py
pydev debugger: process 14312 is connecting
Connected to pydev debugger (build 121.378)
'cmd' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm
2.6.3\helpers\pydev\pydevd.py", line 1457, in <module>
    debugger.run(setup['file'], None, None)
  File "C:\Program Files (x86)\JetBrains\PyCharm
2.6.3\helpers\pydev\pydevd.py", line 1103, in run
    pydev_imports.execfile(file, globals, locals) #execute the script
  File "E:/PySparkRunner/SimpleApp.py", line 15, in <module>
    sc = SparkContext("local", "Simple App")
  File "E:\spark-1.2.0-bin-hadoop2.3\python\pyspark\context.py", line 102,
in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "E:\spark-1.2.0-bin-hadoop2.3\python\pyspark\context.py", line 211,
in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "E:\spark-1.2.0-bin-hadoop2.3\python\pyspark\java_gateway.py", line
73, in launch_gateway
    raise Exception(error_msg)
Exception: Launching GatewayServer failed with exit code 1!
Warning: Expected GatewayServer to output a port, but found no output.
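
The "'cmd' is not recognized" line suggests the PySpark gateway launcher
cannot find cmd.exe, i.e. the PATH seen inside the PyCharm run
configuration may be broken (Cygwin entries can replace rather than extend
the Windows PATH). A minimal diagnostic sketch - the helper name is my
own, not part of PySpark - to see what the interpreter's PATH resolves:

```python
import os

def resolve_on_path(exe_name):
    """Search each PATH entry for exe_name, trying PATHEXT suffixes on Windows."""
    exts = os.environ.get('PATHEXT', '').split(os.pathsep)
    for directory in os.environ.get('PATH', '').split(os.pathsep):
        for ext in [''] + exts:
            candidate = os.path.join(directory, exe_name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None

if __name__ == '__main__':
    # On Windows this should print something like C:\Windows\System32\cmd.exe;
    # None here would explain the "'cmd' is not recognized" failure above.
    print(resolve_on_path('cmd'))
    # Inspect the PATH the interpreter actually sees inside PyCharm.
    for entry in os.environ.get('PATH', '').split(os.pathsep):
        print(entry)
```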
