I also saw an example you posted regarding %dep and python
This example
%dep
z.load("org.apache.spark:spark-streaming-kafka_2.10:1.5.1”)
works even if you remove the %dep.
from pyspark.streaming.kafka import KafkaUtils
This import will always resolve – likely because it is part of the spark
assembly already.
Give it a try – reset the interpreter, and just run (with no z.load(…)):
%pyspark
from pyspark.streaming.kafka import KafkaUtils
from pyspark.streaming import StreamingContext
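For what it's worth, the import succeeding only shows that the python module ships
with pyspark – I'd expect a missing jar to surface at the point the JVM-side kafka
classes are actually exercised. A rough sketch of that next step (the broker, group,
and topic names below are just placeholders):

%pyspark
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# 5 second batch interval, reusing the SparkContext Zeppelin exposes as sc
ssc = StreamingContext(sc, 5)

# this is where the spark-streaming-kafka jar should actually matter -
# "localhost:2181", "zeppelin-test" and {"test": 1} are placeholder values
stream = KafkaUtils.createStream(ssc, "localhost:2181", "zeppelin-test", {"test": 1})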
So – still looking for a real-world example of an external dependency loaded in
%dep that demonstrates best practice around %pyspark dependency loading.
I’ll stay tuned – and continue to dig around a bit.
Next step is to start over and try a no-frills basic install with z-manager.
Jeff
From: moon soo Lee
Reply-To: <[email protected]>
Date: Thursday, October 29, 2015 at 8:00 PM
To: <[email protected]>
Subject: Re: pyspark with jar
Hi,
Thanks for the question.
Actually, %pyspark runs in the same JVM process that %spark runs in, and it shares
a single SparkContext instance (although %pyspark also runs an additional python
process).
Libraries loaded from %dep should be available in %pyspark, too.
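For example, something like this should work (the elasticsearch-hadoop coordinate
and conf keys below are only an illustration – please check the es-hadoop docs for
your version):

%dep
z.load("org.elasticsearch:elasticsearch-hadoop:2.1.0.Beta2")

%pyspark
# the jar loaded via %dep is on the classpath of the same JVM that backs
# %pyspark, so its InputFormat can be referenced from python
conf = {"es.resource": "myindex/mytype", "es.nodes": "localhost"}
rdd = sc.newAPIHadoopRDD(
    "org.elasticsearch.hadoop.mr.EsInputFormat",
    "org.apache.hadoop.io.NullWritable",
    "org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=conf)

(Note that %dep has to run before the spark interpreter starts, so you may need to
restart the interpreter first.)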
The interpreter property 'spark.home' is a little bit confusing alongside SPARK_HOME.
At the moment, defining SPARK_HOME in conf/zeppelin-env.sh is recommended
instead of spark.home.
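For example, in conf/zeppelin-env.sh (the path is just a placeholder):

export SPARK_HOME=/path/to/your/spark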
Best,
moon
On Fri, Oct 30, 2015 at 2:44 AM Jeff Steinmetz <[email protected]>
wrote:
That’s a good pointer.
Question still stands: how do you load libraries (jars) for %pyspark?
It's clear how to do it for %spark (scala) via %dep.
Looking for the equivalent of:
./bin/pyspark --master local[2] --jars jars/elasticsearch-hadoop-2.1.0.Beta2.jar
From: Matt Sochor
Reply-To: <[email protected]>
Date: Thursday, October 29, 2015 at 3:19 PM
To: <[email protected]>
Subject: Re: pyspark with jar
I actually *just* figured it out. Zeppelin has sqlContext "already created and
exposed" (https://zeppelin.incubator.apache.org/docs/interpreter/spark.html).
So when I do "sqlContext = SQLContext(sc)" I overwrite sqlContext. Then
Zeppelin cannot see this new sqlContext.
Anyway, anyone out there experiencing this problem, do NOT initialize
sqlContext and it works fine.
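In other words, in a %pyspark paragraph:

%pyspark
# don't do this - it shadows the sqlContext Zeppelin already created and exposed
# from pyspark.sql import SQLContext
# sqlContext = SQLContext(sc)

# just use the sqlContext Zeppelin provides (the path below is only a placeholder)
df = sqlContext.read.json("/path/to/some.json")
df.show()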
On Thu, Oct 29, 2015 at 6:10 PM Jeff Steinmetz <[email protected]>
wrote:
In Zeppelin, what is the equivalent of adding jars in a pyspark call?
Such as running pyspark with the elasticsearch-hadoop jar:
./bin/pyspark --master local[2] --jars jars/elasticsearch-hadoop-2.1.0.Beta2.jar
My assumption is that loading something like this inside a %dep is pointless,
since those dependencies would only live in the %spark scala world (the spark
jvm). In Zeppelin, %pyspark spawns a separate process.
Also, how is the interpreter's "spark.home" property used? How is it different from
the "SPARK_HOME" in zeppelin-env.sh?
And finally – how are the args in the interpreter settings used (what uses them)?
Thank you.
Jeff
--
Best regards,
Matt Sochor
Data Scientist
Mobile Defense
Mobile +1 215 307 7768