I wonder if I am starting the IPython notebook incorrectly. The example in my
original email does not work; it looks like stdout is not configured
correctly. If I submit the same code as a standalone .py file, it works fine.

Any idea what the problem is?


Thanks

Andy


From:  Andrew Davidson <a...@santacruzintegration.com>
Date:  Tuesday, October 7, 2014 at 4:23 PM
To:  "user@spark.apache.org" <user@spark.apache.org>
Subject:  bug with IPython notebook?

> Hi
> 
> I think I found a bug in the IPython notebook integration. I am not sure how
> to report it.
> 
> I am running spark-1.1.0-bin-hadoop2.4 on an AWS ec2 cluster. I start the
> cluster using the launch script provided by spark
> 
> I start the IPython notebook on my cluster master as follows and use an SSH
> tunnel to open the notebook in a browser running on my local computer.
> 
> [ec2-user@ip-172-31-20-107 ~]$ IPYTHON_OPTS="notebook --pylab inline
> --no-browser --port=7000" /root/spark/bin/pyspark
> 
> 
> Below is the code my notebook executes:
> 
> 
> Bug list:
> 1. Why do I need to create a SparkContext? If I run pyspark interactively, the
> context is created automatically for me.
> 2. The print statement causes the output to be displayed in the terminal where I
> started pyspark, not in the notebook's output (a workaround sketch follows the
> code below).
> Any comments or suggestions would be greatly appreciated.
> 
> Thanks
> 
> Andy
> 
> 
> import sys
> from operator import add
> 
> from pyspark import SparkContext
> 
> # only stand alone jobs should create a SparkContext
> sc = SparkContext(appName="pyStreamingSparkRDDPipe")
> 
> data = [1, 2, 3, 4, 5]
> rdd = sc.parallelize(data)
> 
> def echo(data):
>     # this output winds up in the shell console on my cluster (i.e. the
>     # machine I launched pyspark from), not in the notebook
>     print "python received: %s" % (data)
> 
> rdd.foreach(echo)
> print "we are done"
> 
> 
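A minimal sketch of one possible workaround for item 2, assuming the goal is only
to see the element values in the notebook cell: foreach(echo) runs on the
executors, so its print statements go to the executors' stdout (the console where
pyspark was launched), not to the notebook. Collecting the elements back to the
driver and printing there keeps the output in the notebook cell. The collect-based
loop below is a suggested sketch, not code from the original post.

    # reuse the sc and rdd created above; collect() pulls the elements back to
    # the driver, so print writes to the notebook's stdout instead of the
    # executors' stdout
    for x in rdd.collect():
        print "python received: %s" % x

    print "we are done"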

