Refer to this post on building a self-contained PySpark application:
http://blog.prabeeshk.com/blog/2015/04/07/self-contained-pyspark-application/
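
For what it's worth, if you go with the jar + egg approach, a minimal
sketch of wiring both artifacts into the driver might look like the
following (mylib-assembly.jar and mylib-0.1-py2.7.egg are hypothetical
names standing in for your two build outputs):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("mylib-demo")
            # Ship the JVM half of the library to the driver and executors.
            # "mylib-assembly.jar" is a placeholder for your actual jar.
            .set("spark.jars", "mylib-assembly.jar"))
    sc = SparkContext(conf=conf)

    # Ship the Python half to the executor Python processes.
    # "mylib-0.1-py2.7.egg" is a placeholder for your actual egg.
    sc.addPyFile("mylib-0.1-py2.7.egg")

The same pair can instead be shipped at submit time with
spark-submit --jars mylib-assembly.jar --py-files mylib-0.1-py2.7.egg,
which keeps the two artifacts explicit, though they still have to be
versioned and released together.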
On 13 April 2015 at 17:41, Punya Biswal <pbis...@palantir.com> wrote:
> Dear Spark users,
>
> My team is working on a small library that builds on PySpark and is
> organized like PySpark as well -- it has a JVM component (that runs in the
> Spark driver and executor) and a Python component (that runs in the PySpark
> driver and executor processes). What's a good approach for packaging such a
> library?
>
> Some ideas we've considered:
>
> - Package up the JVM component as a Jar and the Python component as a
> binary egg. This is reasonable, but it means that there are two separate
> artifacts that people have to manage and keep in sync.
> - Include Python files in the Jar and add it to the PYTHONPATH. This
> follows the example of the Spark assembly jar, but deviates from the Python
> community's standards.
>
> We'd really appreciate hearing experiences from other people who have
> built libraries on top of PySpark.
>
> Punya