I also saw an example you posted regarding %dep and python
This example
%dep
z.load("org.apache.spark:spark-streaming-kafka_2.10:1.5.1”)
works even if you remove the %dep.
from pyspark.streaming.kafka import KafkaUtils
This import will always resolve – likely because it is part of the spark
assembly already.
Give it a try – reset the interpreter, and just run (with no z.load(…)):
%pyspark
from pyspark.streaming.kafka import KafkaUtils
from pyspark.streaming import StreamingContext
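For what it's worth, the import succeeding only shows that the python module ships
with pyspark – I'd expect a missing jar to surface at the point the JVM-side kafka
classes are actually exercised. A rough sketch of that next step (the broker, group,
and topic names below are just placeholders):

%pyspark
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# 5 second batch interval, reusing the SparkContext Zeppelin exposes as sc
ssc = StreamingContext(sc, 5)

# this is where the spark-streaming-kafka jar should actually matter -
# "localhost:2181", "zeppelin-test" and {"test": 1} are placeholder values
stream = KafkaUtils.createStream(ssc, "localhost:2181", "zeppelin-test", {"test": 1})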
So – still looking for a real-world example of an external dependency loaded in
%dep that demonstrates best practice around %pyspark dependency loading.
I’ll stay tuned – and continue to dig around a bit.
Next step is to start over and try a no-frills basic install with z-manager.
Jeff
From: moon soo Lee
Reply-To: <[email protected]>
Date: Thursday, October 29, 2015 at 8:00 PM
To: <[email protected]>
Subject: Re: pyspark with jar
Hi,
Thanks for the question.
Actually, %pyspark runs in the same JVM process that %spark runs in, and it shares
a single SparkContext instance (although %pyspark also runs an additional python
process).
Libraries loaded from %dep should be available in %pyspark, too.
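For example, something like this should work (the elasticsearch-hadoop coordinate
and conf keys below are only an illustration – please check the es-hadoop docs for
your version):

%dep
z.load("org.elasticsearch:elasticsearch-hadoop:2.1.0.Beta2")

%pyspark
# the jar loaded via %dep is on the classpath of the same JVM that backs
# %pyspark, so its InputFormat can be referenced from python
conf = {"es.resource": "myindex/mytype", "es.nodes": "localhost"}
rdd = sc.newAPIHadoopRDD(
    "org.elasticsearch.hadoop.mr.EsInputFormat",
    "org.apache.hadoop.io.NullWritable",
    "org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=conf)

(Note that %dep has to run before the spark interpreter starts, so you may need to
restart the interpreter first.)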
The interpreter property 'spark.home' is a little bit confusing alongside SPARK_HOME.
At the moment, defining SPARK_HOME in conf/zeppelin-env.sh is recommended
instead of spark.home.
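For example, in conf/zeppelin-env.sh (the path is just a placeholder):

export SPARK_HOME=/path/to/your/spark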
Best,
moon
On Fri, Oct 30, 2015 at 2:44 AM Jeff Steinmetz <[email protected]>
wrote:
That’s a good pointer.
Question still stands: how do you load libraries (jars) for %pyspark?
It's clear how to do it for %spark (scala) via %dep.
Looking for the equivalent of:
./bin/pyspark --master local[2] --jars jars/elasticsearch-hadoop-2.1.0.Beta2.jar
From: Matt Sochor
Reply-To: <[email protected]>
Date: Thursday, October 29, 2015 at 3:19 PM
To: <[email protected]>
Subject: Re: pyspark with jar
I actually *just* figured it out. Zeppelin has sqlContext "already created and
exposed" (https://zeppelin.incubator.apache.org/docs/interpreter/spark.html).
So when I do "sqlContext = SQLContext(sc)" I overwrite sqlContext. Then
Zeppelin cannot see this new sqlContext.
Anyway, anyone out there experiencing this problem, do NOT initialize
sqlContext and it works fine.
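In other words, in a %pyspark paragraph:

%pyspark
# don't do this - it shadows the sqlContext Zeppelin already created and exposed
# from pyspark.sql import SQLContext
# sqlContext = SQLContext(sc)

# just use the sqlContext Zeppelin provides (the path below is only a placeholder)
df = sqlContext.read.json("/path/to/some.json")
df.show()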
On Thu, Oct 29, 2015 at 6:10 PM Jeff Steinmetz <[email protected]>
wrote:
In Zeppelin, what is the equivalent of adding jars in a pyspark call?
Such as running pyspark with the elasticsearch-hadoop jar:
./bin/pyspark --master local[2] --jars jars/elasticsearch-hadoop-2.1.0.Beta2.jar
My assumption is that loading something like this inside a %dep is pointless,
since those dependencies would only live in the %spark scala world (the spark
jvm). In Zeppelin, %pyspark spawns a separate process.
Also, how is the interpreter's "spark.home" property used? How is it different from
the "SPARK_HOME" in zeppelin-env.sh?
And finally – how are the args in the interpreter settings used (what uses them)?
Thank you.
Jeff
--
Best regards,
Matt Sochor
Data Scientist
Mobile Defense
Mobile +1 215 307 7768