Re: PySpark without PySpark

2015-07-10 Thread Sujit Pal
Hi Ashish, Cool, glad it worked out. I have only used Spark clusters on EC2, which I spin up using the spark-ec2 scripts (part of the Spark downloads), so I don't have any experience setting up in-house clusters like you want to do. But I found some documentation here that may be helpful. https://doc

Re: PySpark without PySpark

2015-07-09 Thread Sujit Pal
Hi Ashish, Julian's approach is probably better, but a few observations: 1) Your SPARK_HOME should be C:\spark-1.3.0 (not C:\spark-1.3.0\bin). 2) If you have Anaconda Python installed (I saw that you had set this up in a separate thread), py4j should be part of the package - at least I think so. To
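
A minimal sketch of point 1, assuming the only mistake is a trailing bin directory on the path. The helper name normalize_spark_home is hypothetical and not part of Spark; it uses ntpath so the Windows path is handled the same way on any OS.

```python
import ntpath


def normalize_spark_home(spark_home):
    """Return the Spark install root, dropping a trailing 'bin' component if present."""
    # Strip trailing separators so basename() sees the last real component.
    trimmed = spark_home.rstrip("\\/")
    if ntpath.basename(trimmed).lower() == "bin":
        # SPARK_HOME should be the install root, not its bin subdirectory.
        return ntpath.dirname(trimmed)
    return trimmed


if __name__ == "__main__":
    print(normalize_spark_home(r"C:\spark-1.3.0\bin"))  # prints C:\spark-1.3.0
```

This only rewrites the string; setting the corrected value in the Windows environment variables is still a manual step.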

Re: PySpark without PySpark

2015-07-09 Thread Sujit Pal
Hi Ashish, Your 00-pyspark-setup file looks very different from mine (and from the one described in the blog post). Questions: 1) Do you have SPARK_HOME set up in your environment? Because if not, it sets it to None in your code. You should provide the path to your Spark installation. In my case
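
A rough sketch of the check described in question 1, not the actual 00-pyspark-setup.py from the blog post. It assumes the usual Spark layout where the Python sources live under SPARK_HOME/python and py4j ships as a zip under python/lib; the function names pyspark_paths and add_pyspark_to_path are hypothetical.

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Return the entries needed on sys.path before `import pyspark` can work."""
    if not spark_home:
        # Fail loudly instead of silently carrying None around.
        raise ValueError("SPARK_HOME is not set; point it at your Spark installation")
    python_dir = os.path.join(spark_home, "python")
    # The py4j version in the zip name differs between Spark releases, so glob for it.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_pyspark_to_path(environ=os.environ, path=sys.path):
    """Prepend the PySpark entries to `path`, skipping ones already present."""
    for entry in pyspark_paths(environ.get("SPARK_HOME")):
        if entry not in path:
            path.insert(0, entry)
    return path
```

Calling add_pyspark_to_path() at the top of a profile startup file raises a clear error when SPARK_HOME is missing, instead of failing later on the import.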

Re: PySpark without PySpark

2015-07-08 Thread Ashish Dutt
Hi Sujit, Thanks for your response. So I opened a new notebook using the command ipython notebook --profile spark and tried the sequence of commands. I am getting errors. Attached is a screenshot of the same. I am also attaching the 00-pyspark-setup.py for your reference. Looks like, I have wri

Re: PySpark without PySpark

2015-07-08 Thread Bhupendra Mishra
Very interesting and well-organized post. Thanks for sharing. On Wed, Jul 8, 2015 at 10:29 PM, Sujit Pal wrote: > Hi Julian, > > I recently built a Python+Spark application to do search relevance > analytics. I use spark-submit to submit PySpark jobs to a Spark cluster on > EC2 (so I don't use th

Re: PySpark without PySpark

2015-07-08 Thread Sujit Pal
Hi Ashish, >> Nice post. Agreed, kudos to the author of the post, Benjamin Bengfort of District Data Labs. >> Following your post, I get this problem; Again, not my post. I did try setting up IPython with the Spark profile for the edX Intro to Spark course (because I didn't want to use the Vagrant con

Re: PySpark without PySpark

2015-07-08 Thread Ashish Dutt
Hi Sujit, Nice post. Exactly what I had been looking for. I am a relative beginner with Spark and real-time data processing. We have a server with CDH5.4 with 4 nodes. The Spark version on our server is 1.3.0. On my laptop I have Spark 1.3.0 too, running on Windows 7. As per point 5

Re: PySpark without PySpark

2015-07-08 Thread Sujit Pal
You are welcome, Davies. Just to clarify, I didn't write the post (not sure if my earlier post gave that impression, apologies if so), although I agree it's great :-). -sujit On Wed, Jul 8, 2015 at 10:36 AM, Davies Liu wrote: > Great post, thanks for sharing with us! > > On Wed, Jul 8, 2015 at 9

Re: PySpark without PySpark

2015-07-08 Thread Davies Liu
Great post, thanks for sharing with us! On Wed, Jul 8, 2015 at 9:59 AM, Sujit Pal wrote: > Hi Julian, > > I recently built a Python+Spark application to do search relevance > analytics. I use spark-submit to submit PySpark jobs to a Spark cluster on > EC2 (so I don't use the PySpark shell, hopefu

Re: PySpark without PySpark

2015-07-08 Thread Sujit Pal
Hi Julian, I recently built a Python+Spark application to do search relevance analytics. I use spark-submit to submit PySpark jobs to a Spark cluster on EC2 (so I don't use the PySpark shell, hopefully that's what you are looking for). Can't share the code, but the basic approach is covered in this