Couldn't we have a pip-installable "pyspark" package that just serves as a
shim to an existing Spark installation? Or it could even download the latest
Spark binary if SPARK_HOME isn't set at installation time. Right now,
Spark doesn't play very well with the usual Python ecosystem. For example,
why do I need to use a strange incantation when booting up IPython if I
want to use PySpark in a notebook with MASTER="local[4]"? It would be much
nicer to just type `from pyspark import SparkContext; sc =
SparkContext("local[4]")` in my notebook.
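
A minimal sketch of what such a shim might do at import time (hypothetical
code, not anything that ships with Spark today; it assumes SPARK_HOME points
at a Spark distribution and that the bundled Py4J zip lives under python/lib/):

# Hypothetical import-time setup for a pip-installed "pyspark" shim: locate an
# existing Spark install via SPARK_HOME and put its Python sources (plus the
# bundled Py4J zip) on sys.path before importing the real package.
import glob
import os
import sys

spark_home = os.environ.get("SPARK_HOME")
if not spark_home:
    raise ImportError("SPARK_HOME is not set; cannot locate a Spark installation")

# $SPARK_HOME/python holds the real pyspark package
sys.path.insert(0, os.path.join(spark_home, "python"))

# the bundled Py4J ships as a versioned zip, e.g. py4j-0.8.2.1-src.zip
for py4j_zip in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    sys.path.insert(0, py4j_zip)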

I did a quick test, and it seems PySpark's basic unit tests do pass as long
as SPARK_HOME is set and Py4J is on the PYTHONPATH:

PYTHONPATH=$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH \
    python $SPARK_HOME/python/pyspark/rdd.py
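
With the paths set up that way (whether via PYTHONPATH or by a shim like the
sketch above), a plain Python or IPython session shouldn't need any special
launcher. A hedged example of what that could look like, with a made-up app
name:

# hypothetical session, assuming $SPARK_HOME/python and the Py4J zip are on sys.path
from pyspark import SparkContext

sc = SparkContext("local[4]", "pip-shim-test")   # master URL and an app name
print(sc.parallelize(range(100)).sum())          # 4950
sc.stop()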

-Jey


On Fri, Jun 5, 2015 at 10:57 AM, Josh Rosen <rosenvi...@gmail.com> wrote:

> This has been proposed before:
> https://issues.apache.org/jira/browse/SPARK-1267
>
> There's currently tighter coupling between the Python and Java halves of
> PySpark than just requiring SPARK_HOME to be set; if we did this, I bet
> we'd run into tons of issues when users try to run a newer version of the
> Python half of PySpark against an older set of Java components or
> vice-versa.
>
> On Thu, Jun 4, 2015 at 10:45 PM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Hi everyone,
> >> Since the Python API is really just a front-end that needs SPARK_HOME
> >> defined anyway, I think it would be interesting to publish the Python part
> >> of Spark on PyPI, so that Python projects needing PySpark could handle the
> >> dependency via pip.
>>
> >> For now I just symlink python/pyspark into my Python install's
> >> site-packages/ so that PyCharm and other lint tools work properly.
> >> I can do the setup.py work, or anything else that's needed.
>>
> >> What do you think?
>>
>> Regards,
>>
>> Olivier.
>>
>
>
