Hi Ashish,

>> Nice post.

Agreed, kudos to the author of the post, Benjamin Bengfort of District Data Labs.
>> Following your post, I get this problem;

Again, not my post. I did try setting up IPython with the Spark profile for
the edX Intro to Spark course (because I didn't want to use the Vagrant
container) and it worked flawlessly with the instructions provided (on OS X).
I haven't used the IPython/PySpark environment beyond very basic tasks since
then, though, because my employer has a Databricks license which we were
already using for other stuff, so we ended up doing the labs on Databricks.

Looking at your screenshot, though, I don't see why you think it's picking up
the default profile. One simple way of checking whether things are working is
to open a new notebook and try this sequence of commands:

from pyspark import SparkContext
sc = SparkContext("local", "pyspark")
sc

You should see something like this after a little while:

<pyspark.context.SparkContext at 0x1093c9b10>

While the context is being instantiated, you should also see lots of log
lines scroll by on the terminal where you started the
"ipython notebook --profile spark" command - these log lines are from Spark.
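If the profile itself turns out to be the problem, it may also be worth
checking the startup file the post has you create under the spark profile
(something like 00-pyspark-setup.py in the profile's startup directory).
Roughly, it should do something like the following - the paths and the py4j
version here are just placeholders, so adjust them to your own install (and
note that SPARK_HOME should normally point at the root of the Spark
directory, e.g. C:\spark-1.3.0, rather than its bin subdirectory):

    import os
    import sys

    # Locate the Spark installation from the environment
    spark_home = os.environ.get("SPARK_HOME")
    if not spark_home:
        raise ValueError("SPARK_HOME environment variable is not set")

    # Put PySpark and its bundled py4j on the Python path
    # (the py4j zip name depends on your Spark version)
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.insert(0, os.path.join(spark_home, "python", "lib",
                                    "py4j-0.8.2.1-src.zip"))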
Hope this helps,
Sujit

On Wed, Jul 8, 2015 at 6:04 PM, Ashish Dutt <ashish.du...@gmail.com> wrote:
> Hi Sujit,
> Nice post.. Exactly what I had been looking for.
> I am relatively new to Spark and real-time data processing.
> We have a server with CDH 5.4 and 4 nodes. The Spark version on our server
> is 1.3.0.
> On my laptop I have Spark 1.3.0 too, and it's running in a Windows 7
> environment.
> As per point 5 of your post, I am able to invoke pyspark locally in
> standalone mode.
>
> Following your post, I get this problem:
>
> 1. In the section "Using IPython notebook with Spark" I cannot understand
> why it is picking up the default profile and not the pyspark profile. I am
> sure it is because of the path variables. Attached is the screenshot. Can
> you suggest how to solve this?
>
> Currently the path variables on my laptop are:
> SPARK_HOME="C:\SPARK-1.3.0\BIN", JAVA_HOME="C:\PROGRAM
> FILES\JAVA\JDK1.7.0_79", HADOOP_HOME="D:\WINUTILS", M2_HOME="D:\MAVEN\BIN",
> MAVEN_HOME="D:\MAVEN\BIN", PYTHON_HOME="C:\PYTHON27\", SBT_HOME="C:\SBT\"
>
> Sincerely,
> Ashish Dutt
> PhD Candidate
> Department of Information Systems
> University of Malaya, Lembah Pantai,
> 50603 Kuala Lumpur, Malaysia
>
> On Thu, Jul 9, 2015 at 4:56 AM, Sujit Pal <sujitatgt...@gmail.com> wrote:
>
>> You are welcome, Davies. Just to clarify, I didn't write the post (not
>> sure if my earlier post gave that impression, apologize if so), although
>> I agree it's great :-).
>>
>> -sujit
>>
>> On Wed, Jul 8, 2015 at 10:36 AM, Davies Liu <dav...@databricks.com>
>> wrote:
>>
>>> Great post, thanks for sharing with us!
>>>
>>> On Wed, Jul 8, 2015 at 9:59 AM, Sujit Pal <sujitatgt...@gmail.com>
>>> wrote:
>>> > Hi Julian,
>>> >
>>> > I recently built a Python+Spark application to do search relevance
>>> > analytics. I use spark-submit to submit PySpark jobs to a Spark
>>> > cluster on EC2 (so I don't use the PySpark shell, hopefully that's
>>> > what you are looking for). Can't share the code, but the basic
>>> > approach is covered in this blog post - scroll down to the section
>>> > "Writing a Spark Application".
>>> >
>>> > https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python
>>> >
>>> > Hope this helps,
>>> >
>>> > -sujit
>>> >
>>> > On Wed, Jul 8, 2015 at 7:46 AM, Julian <julian+sp...@magnetic.com>
>>> > wrote:
>>> >>
>>> >> Hey.
>>> >>
>>> >> Is there a resource that has written up what the necessary steps are
>>> >> for running PySpark without using the PySpark shell?
>>> >>
>>> >> I can reverse engineer (by following the tracebacks and reading the
>>> >> shell source) what the relevant Java imports needed are, but I would
>>> >> assume someone has attempted this before and just published something
>>> >> I can either follow or install? If not, I have something that pretty
>>> >> much works and can publish it, but I'm not a heavy Spark user, so
>>> >> there may be some things I've left out that I haven't hit because of
>>> >> how little of pyspark I'm playing with.
>>> >>
>>> >> Thanks,
>>> >> Julian
>>> >>
>>> >> --
>>> >> View this message in context:
>>> >> http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-without-PySpark-tp23719.html
>>> >> Sent from the Apache Spark User List mailing list archive at
>>> >> Nabble.com.
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> >> For additional commands, e-mail: user-h...@spark.apache.org
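A minimal sketch of the standalone, spark-submit-driven approach Sujit
describes in his reply to Julian above - this is illustrative only (the file
name, app name, input path, and master URL are placeholders, not from the
thread), but it shows a PySpark job running without the PySpark shell:

    # wordcount_app.py - illustrative sketch, not the application from the thread
    from pyspark import SparkConf, SparkContext

    def run(sc):
        # Trivial job: word counts over a text file
        counts = (sc.textFile("hdfs:///path/to/input.txt")
                    .flatMap(lambda line: line.split())
                    .map(lambda word: (word, 1))
                    .reduceByKey(lambda a, b: a + b))
        for word, count in counts.take(10):
            print("%s\t%d" % (word, count))

    if __name__ == "__main__":
        # Building the SparkContext yourself replaces what the shell normally does
        conf = SparkConf().setAppName("WordCountSketch")
        sc = SparkContext(conf=conf)
        run(sc)
        sc.stop()

Submitted to a cluster with something like:

    $SPARK_HOME/bin/spark-submit --master spark://<master-host>:7077 wordcount_app.py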