I wonder if I am starting IPython Notebook incorrectly. The example in my original email does not work: it looks like stdout is not configured correctly. If I submit the same code as a Python .py file, it works correctly.

Any idea what the problem is?

Thanks,

Andy
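For reference, a minimal sketch of one possible workaround (not from the original email, and assuming sc is the SparkContext created as in the quoted code below): the print statement inside a function passed to rdd.foreach() runs in the worker Python processes, so its stdout goes to the executors and the shell that launched pyspark, not to the notebook cell. Collecting the elements back to the driver first makes the output appear in the notebook.

# Sketch of a workaround: bring the elements back to the driver with
# collect() and print there, so the output shows up in the notebook cell
# instead of on the worker processes' stdout.
data = [1, 2, 3, 4, 5]
rdd = sc.parallelize(data)

for x in rdd.collect():    # collect() returns the RDD's elements as a list
    print "python received: %s" % x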
From: Andrew Davidson <a...@santacruzintegration.com>
Date: Tuesday, October 7, 2014 at 4:23 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: bug with IPython notebook?

> Hi
>
> I think I found a bug in the IPython Notebook integration. I am not sure how
> to report it.
>
> I am running spark-1.1.0-bin-hadoop2.4 on an AWS EC2 cluster. I start the
> cluster using the launch script provided by Spark.
>
> I start IPython Notebook on my cluster master as follows and use an ssh tunnel
> to open the notebook in a browser running on my local computer:
>
> [ec2-user@ip-172-31-20-107 ~]$ IPYTHON_OPTS="notebook --pylab inline
> --no-browser --port=7000" /root/spark/bin/pyspark
>
> Below is the code my notebook executes.
>
> Bug list:
> 1. Why do I need to create a SparkContext? If I run pyspark interactively, the
>    context is created automatically for me.
> 2. The print statement causes the output to be displayed in the terminal where
>    I started pyspark, not in the notebook's output.
>
> Any comments or suggestions would be greatly appreciated.
>
> Thanks
>
> Andy
>
> import sys
> from operator import add
>
> from pyspark import SparkContext
>
> # only stand-alone jobs should create a SparkContext
> sc = SparkContext(appName="pyStreamingSparkRDDPipe")
>
> data = [1, 2, 3, 4, 5]
> rdd = sc.parallelize(data)
>
> def echo(data):
>     # output winds up in the shell console on my cluster
>     # (i.e. the machine I launched pyspark from)
>     print "python received: %s" % (data)
>
> rdd.foreach(echo)
>
> print "we are done"
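On point 1, a minimal sketch of one possible approach (an assumption on my part, not something the quoted email tried): the pyspark shell normally binds its SparkContext to the name sc, so notebook code can reuse that context when it exists and only create its own when it does not.

# Sketch: reuse the shell's SparkContext if pyspark already created one,
# otherwise create our own (e.g. when the script is submitted stand-alone).
from pyspark import SparkContext

try:
    sc                      # bound by bin/pyspark when the shell starts normally
except NameError:
    sc = SparkContext(appName="pyStreamingSparkRDDPipe")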