This has been proposed before: https://issues.apache.org/jira/browse/SPARK-1267
There's currently tighter coupling between the Python and Java halves of PySpark than just requiring SPARK_HOME to be set; if we did this, I bet we'd run into tons of issues when users try to run a newer version of the Python half of PySpark against an older set of Java components, or vice versa.

On Thu, Jun 4, 2015 at 10:45 PM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:

> Hi everyone,
> Considering that the Python API is just a front end that needs SPARK_HOME
> defined anyway, I think it would be interesting to deploy the Python part
> of Spark on PyPI so that a Python project needing PySpark can handle the
> dependency via pip.
>
> For now I just symlink python/pyspark into my Python install's
> site-packages/ so that PyCharm and other lint tools work properly.
> I can do the setup.py work, or anything else needed.
>
> What do you think?
>
> Regards,
>
> Olivier.
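
For illustration, a minimal setup.py along the lines Olivier is offering to write might look like the sketch below. The package name, version string, and directory layout are assumptions for this example, not Spark's actual packaging; the resulting package would still need a matching SPARK_HOME at runtime, which is exactly the version-coupling concern raised above.

# Hypothetical minimal setup.py for publishing the pure-Python half of
# PySpark to PyPI. Name, version, and layout are illustrative assumptions.
from setuptools import setup, find_packages

setup(
    name="pyspark",
    version="1.4.0",  # would have to track the Spark release exactly
    description="Python front end for Apache Spark (still requires SPARK_HOME)",
    packages=find_packages(where="python"),  # e.g. python/pyspark in the Spark source tree
    package_dir={"": "python"},
)

Installing such a package would only ship the Python sources; launching a SparkContext would still go through the JVM components under SPARK_HOME, so a pip-installed pyspark and an older or newer Spark distribution could still end up mismatched.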