[jira] [Created] (ZEPPELIN-1419) PySpark dependencies support

Semet (JIRA) Thu, 08 Sep 2016 02:42:07 -0700

Semet created ZEPPELIN-1419:
-------------------------------

             Summary: PySpark dependencies support
                 Key: ZEPPELIN-1419
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1419
             Project: Zeppelin
          Issue Type: Improvement
          Components: python-interpreter
            Reporter: Semet



Would it be possible to let a notebook declare its Python dependencies?

Ideally, one would describe the dependencies at the top of the notebook, the 
way Python developers list them in requirements.txt.
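
As a rough illustration, the snippet below parses a requirements.txt-style 
block from the first paragraph of a notebook. The `%pyspark.requirements` 
marker and the `parse_requirements` helper are assumptions made up for this 
sketch, not existing Zeppelin syntax, and the comment handling is simplified 
compared to what pip accepts.

```python
# Hypothetical first paragraph of a notebook. The "%pyspark.requirements"
# marker is an assumption for illustration, not existing Zeppelin syntax.
NOTEBOOK_HEADER = """\
%pyspark.requirements
# same format as requirements.txt
numpy==1.11.1
pandas>=0.18.0

requests
"""


def parse_requirements(paragraph):
    """Return the requirement specifiers declared in a notebook paragraph."""
    lines = paragraph.splitlines()
    if not lines or lines[0].strip() != "%pyspark.requirements":
        return []
    requirements = []
    for line in lines[1:]:
        # Simplification: treat everything after "#" as a comment.
        spec = line.split("#", 1)[0].strip()
        if spec:
            requirements.append(spec)
    return requirements


print(parse_requirements(NOTEBOOK_HEADER))
# ['numpy==1.11.1', 'pandas>=0.18.0', 'requests']
```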

PySpark would then automatically handle their installation and deployment 
inside a virtualenv or a conda environment.
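
The installation step could look roughly like the sketch below. It uses the 
standard-library `venv` module as a stand-in for virtualenv or conda, and the 
helper names `pip_install_command` and `provision` are hypothetical. It only 
covers creating an environment on one machine; shipping that environment to 
the executors is not shown.

```python
import os
import subprocess
import venv


def pip_install_command(env_dir, requirements):
    """Build the command that installs `requirements` into the env at env_dir."""
    if os.name == "nt":
        python = os.path.join(env_dir, "Scripts", "python.exe")
    else:
        python = os.path.join(env_dir, "bin", "python")
    return [python, "-m", "pip", "install"] + list(requirements)


def provision(env_dir, requirements):
    """Create an isolated environment and install the requirements into it.

    Needs network access, since pip downloads the packages and their
    transitive dependencies from the package index.
    """
    venv.EnvBuilder(with_pip=True).create(env_dir)
    subprocess.check_call(pip_install_command(env_dir, requirements))


# Only print the command here; calling provision() would download packages.
print(pip_install_command("notebook-env", ["numpy==1.11.1"]))
```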

This would allow PySpark jobs to be completely independent from each other. If 
a notebook needs a Python library that is not installed on the cluster, the 
library would be installed automatically, along with all of its transitive 
dependencies, downloaded from pypi.python.org.
Two different jobs could also use the same library in two different 
versions.
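
One possible way to keep jobs isolated, sketched under the assumption that 
each distinct set of requirements gets its own environment directory: derive 
the directory name from a hash of the requirements. The `env_dir_for` helper 
and the `/tmp/pyspark-envs` root are hypothetical.

```python
import hashlib
import os


def env_dir_for(requirements, root="/tmp/pyspark-envs"):
    """Map a set of requirements to a dedicated environment directory."""
    # Sort so that declaration order does not change the resulting key.
    key = "\n".join(sorted(requirements)).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:12]
    return os.path.join(root, digest)


# Two jobs pinning different versions of the same library get separate
# environments, so neither installation affects the other.
job_a = env_dir_for(["numpy==1.10.4"])
job_b = env_dir_for(["numpy==1.11.1"])
print(job_a)
print(job_b)
```

With this scheme, notebooks that declare identical requirements would share 
an environment, which avoids reinstalling the same packages for every job.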

I am working on this support in PySpark under ticket SPARK-16367, and in 
Toree for Jupyter under TOREE-337. Let me know what you think.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
