Refer to this post:
http://blog.prabeeshk.com/blog/2015/04/07/self-contained-pyspark-application/
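
For concreteness, here is a minimal sketch of how the two halves of such a library could be shipped together from a PySpark driver (the artifact names below are made up for illustration; use whatever your build produces):

    from pyspark import SparkConf, SparkContext

    # Hypothetical artifact names -- substitute the jar/egg your build emits.
    conf = (SparkConf()
            .setAppName("mylib-demo")
            # distributes the JVM component to the driver and executors
            .set("spark.jars", "path/to/mylib-assembly.jar"))

    sc = SparkContext(conf=conf)
    # distributes the Python component to the executor Python processes
    sc.addPyFile("path/to/mylib-0.1.egg")

The same pairing works on the command line via spark-submit's --jars and --py-files options.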

On 13 April 2015 at 17:41, Punya Biswal <pbis...@palantir.com> wrote:

> Dear Spark users,
>
> My team is working on a small library that builds on PySpark and is
> organized like PySpark as well -- it has a JVM component (that runs in the
> Spark driver and executor) and a Python component (that runs in the PySpark
> driver and executor processes). What's a good approach for packaging such a
> library?
>
> Some ideas we've considered:
>
>    - Package up the JVM component as a Jar and the Python component as a
>    binary egg. This is reasonable but it means that there are two separate
>    artifacts that people have to manage and keep in sync.
>    - Include Python files in the Jar and add it to the PYTHONPATH. This
>    follows the example of the Spark assembly jar, but deviates from the Python
>    community's standards.
>
> We'd really appreciate hearing experiences from other people who have
> built libraries on top of PySpark.
>
> Punya
>