Dear Spark users,

My team is working on a small library that builds on PySpark and is organized 
the same way PySpark is -- it has a JVM component (which runs in the Spark 
driver and executor processes) and a Python component (which runs in the 
PySpark driver and executor processes). What's a good approach for packaging 
such a library?
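
For context, the Python half of the library reaches the JVM half through the 
py4j gateway that PySpark already exposes -- roughly like the sketch below, 
where the package, class, and method names are all made up:

    def count_words(sc, rdd):
        # Hypothetical wrapper: the Python side delegates to the library's
        # JVM side via PySpark's py4j gateway (sc._jvm) and hands over the
        # RDD's underlying JavaRDD (rdd._jrdd).
        counter = sc._jvm.com.example.mylib.WordCounter()
        return counter.count(rdd._jrdd)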

Some ideas we've considered:
1. Package the JVM component as a jar and the Python component as a binary 
   egg. This is reasonable, but it leaves users with two separate artifacts 
   to manage and keep in sync (see the sketch after this list).
2. Include the Python files in the jar and add the jar to the PYTHONPATH. 
   This follows the example of the Spark assembly jar, but deviates from the 
   Python community's packaging standards.
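
For concreteness, here is roughly what option 1 looks like from a user's 
point of view (the artifact names are hypothetical); the pain point is that 
the jar and the egg have to be supplied and versioned together by hand:

    from pyspark import SparkConf, SparkContext

    # JVM component: the jar to put on the driver and executor classpaths
    conf = SparkConf().setAppName("mylib-demo").set(
        "spark.jars", "mylib-assembly-0.1.0.jar")
    sc = SparkContext(conf=conf)
    # Python component: distributed to executors and put on their PYTHONPATH
    sc.addPyFile("mylib-0.1.0-py2.7.egg")

    # Equivalently, at submit time:
    #   spark-submit --jars mylib-assembly-0.1.0.jar \
    #                --py-files mylib-0.1.0-py2.7.egg app.py
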
We'd really appreciate hearing about the experiences of others who have built 
libraries on top of PySpark.

Punya
