Hi Sujit,

Nice post; exactly what I had been looking for. I am a relative beginner with Spark and real-time data processing. We have a server running CDH 5.4 with 4 nodes, and the Spark version on the server is 1.3.0. On my laptop I also have Spark 1.3.0, running on Windows 7. As per point 5 of your post, I am able to invoke pyspark locally in standalone mode; the quick sanity check I used is sketched just below.
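To confirm the local install works end to end, I ran a tiny job along these lines. This is only a sketch: the file name is made up, and it assumes spark-submit is reachable under %SPARK_HOME%\bin.

    # sanity_check.py (hypothetical name); run with:
    #   %SPARK_HOME%\bin\spark-submit sanity_check.py
    from pyspark import SparkConf, SparkContext

    # "local[*]" runs Spark inside this process using all available cores.
    conf = SparkConf().setMaster("local[*]").setAppName("sanity-check")
    sc = SparkContext(conf=conf)

    # Trivial job: sum the integers 0..99; prints 4950 if everything works.
    print(sc.parallelize(range(100)).sum())
    sc.stop()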
Following your post, I ran into a problem: in the section "Using IPython Notebook with Spark" I cannot understand why it picks up the default profile and not the pyspark profile. I am sure it is because of the path variables. Attached is the screenshot. Can you suggest how to solve this? Currently the path variables on my laptop are:

SPARK_HOME="C:\SPARK-1.3.0\BIN"
JAVA_HOME="C:\PROGRAM FILES\JAVA\JDK1.7.0_79"
HADOOP_HOME="D:\WINUTILS"
M2_HOME="D:\MAVEN\BIN"
MAVEN_HOME="D:\MAVEN\BIN"
PYTHON_HOME="C:\PYTHON27\"
SBT_HOME="C:\SBT\"

Sincerely,
Ashish Dutt
PhD Candidate
Department of Information Systems
University of Malaya, Lembah Pantai,
50603 Kuala Lumpur, Malaysia

On Thu, Jul 9, 2015 at 4:56 AM, Sujit Pal <sujitatgt...@gmail.com> wrote:
> You are welcome, Davies. Just to clarify, I didn't write the post (not sure
> if my earlier post gave that impression; apologies if so), although I agree
> it's great :-).
>
> -sujit
>
> On Wed, Jul 8, 2015 at 10:36 AM, Davies Liu <dav...@databricks.com> wrote:
>> Great post, thanks for sharing with us!
>>
>> On Wed, Jul 8, 2015 at 9:59 AM, Sujit Pal <sujitatgt...@gmail.com> wrote:
>>> Hi Julian,
>>>
>>> I recently built a Python+Spark application to do search relevance
>>> analytics. I use spark-submit to submit PySpark jobs to a Spark cluster
>>> on EC2 (so I don't use the PySpark shell; hopefully that's what you are
>>> looking for). I can't share the code, but the basic approach is covered
>>> in this blog post - scroll down to the section "Writing a Spark
>>> Application".
>>>
>>> https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python
>>>
>>> Hope this helps,
>>>
>>> -sujit
>>>
>>> On Wed, Jul 8, 2015 at 7:46 AM, Julian <julian+sp...@magnetic.com> wrote:
>>>> Hey.
>>>>
>>>> Is there a resource that has written up the necessary steps for running
>>>> PySpark without using the PySpark shell?
>>>>
>>>> I can reverse engineer (by following the tracebacks and reading the
>>>> shell source) what the relevant Java imports are, but I would assume
>>>> someone has attempted this before and published something I can either
>>>> follow or install. If not, I have something that pretty much works and
>>>> I can publish it, but I'm not a heavy Spark user, so there may be some
>>>> things I've left out that I haven't hit because of how little of
>>>> pyspark I'm playing with.
>>>>
>>>> Thanks,
>>>> Julian
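P.S. In case it helps other readers of this thread, here is the sys.path bootstrap I am experimenting with to use PySpark outside the pyspark shell. This is only a sketch for a typical Spark 1.3.0 layout: the py4j zip name, the profile path, and the assumption that SPARK_HOME points at the Spark root (not its bin directory) all need checking against your own install.

    import os
    import sys

    # SPARK_HOME must point at the Spark root (e.g. C:\spark-1.3.0), not bin.
    spark_home = os.environ["SPARK_HOME"]

    # Make the pyspark package and its bundled py4j importable.
    # Spark 1.3.0 ships py4j-0.8.2.1; adjust the zip name for other versions.
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.insert(0, os.path.join(spark_home, "python", "lib",
                                    "py4j-0.8.2.1-src.zip"))

    # With the paths in place, a SparkContext can be created from plain
    # Python; pyspark launches the JVM gateway via spark-submit itself.
    from pyspark import SparkConf, SparkContext
    sc = SparkContext(conf=SparkConf().setMaster("local[*]")
                                      .setAppName("no-shell"))

My understanding is that the same snippet, saved as a startup file such as ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py and launched with "ipython notebook --profile=pyspark", is what should make the notebook pick up the pyspark profile rather than the default one.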