Re: PySpark on PyPi

2015-08-20 Thread westurner
…the spark distribution provided pyspark, then the user would be alerted immediately. …

Re: PySpark on PyPi

2015-08-20 Thread Justin Uang
One other question: Do we have consensus on publishing the pip-installable source distribution to PyPI? If so, is that something that the maintainers need to add to the process that they use to publish releases?

Re: PySpark on PyPi

2015-08-20 Thread Justin Uang
I would prefer to just do it without the jar first as well. My hunch is that to run spark the way it is intended, we need the wrapper scripts, like spark-submit. Does anyone know authoritatively if that is the case?
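A quick way to probe Justin's question, assuming SPARK_HOME points at a Spark distribution and pyspark is already importable (the install path below is illustrative): if this runs under a plain `python`, wrapper scripts are not strictly required by user code, since pyspark's gateway launches the JVM itself.

    # Illustrative experiment: run with plain `python`, not bin/pyspark.
    import os
    os.environ.setdefault("SPARK_HOME", "/opt/spark")  # assumed install path

    from pyspark import SparkContext

    # If this succeeds, pyspark started the JVM gateway on its own;
    # spark-submit was only involved behind the scenes, not as a wrapper.
    sc = SparkContext("local", "wrapper-script-test")
    print(sc.parallelize([1, 2, 3]).count())  # expect 3
    sc.stop()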

Re: PySpark on PyPi

2015-08-20 Thread Brian Granger
I would start with just the plain python package without the JAR and then see if it makes sense to add the JAR over time. On Thu, Aug 20, 2015 at 12:27 PM, Auberon Lopez wrote: "Hi all, I wanted to bubble up a conversation from the PR to this discussion to see if there is support for the idea…"

Re: PySpark on PyPi

2015-08-20 Thread Brian Granger
Auberon, can you also post this to the Jupyter Google Group? On Wed, Aug 19, 2015 at 12:23 PM, Auberon Lopez wrote: "Hi all, I've created an updated PR for this based off of the previous work of @prabinb: https://github.com/apache/spark/pull/8318 I am not very familiar with python pa…"

Re: PySpark on PyPi

2015-08-12 Thread quasiben
The above should work for OSX/Linux-64 and py27/py34. --Ben

Re: PySpark on PyPi

2015-08-11 Thread westurner
…nothing's done yet it will (I guess?). For step 3, maybe just adding a setup.py to the distribution would be enough, I'…

Re: PySpark on PyPi

2015-08-11 Thread westurner
…Regards, Olivier. On Fri, 5 Jun 2015 at 22:12, Jey Kottalam <jey@.berkeley> wrote: …

Re: PySpark on PyPi

2015-08-10 Thread Matt Goodman
I would tentatively suggest also conda packaging. http://conda.pydata.org/docs/ --Matthew Goodman

Re: PySpark on PyPi

2015-08-10 Thread Davies Liu
I think so; any contributions on this are welcome. On Mon, Aug 10, 2015 at 11:03 AM, Brian Granger wrote: "Sorry, trying to follow the context here. Does it look like there is support for the idea of creating a setup.py file and pypi package for pyspark? Cheers, Brian"

Re: PySpark on PyPi

2015-08-06 Thread Davies Liu
We could do that after 1.5 is released; it will have the same release cycle as Spark in the future. On Tue, Jul 28, 2015 at 5:52 AM, Olivier Girardot wrote: "+1 (once again :) )"

Re: PySpark on PyPi

2015-07-28 Thread Justin Uang
// ping: do we have any signoff from the pyspark devs to submit a PR to publish to PyPI?

Re: PySpark on PyPi

2015-07-24 Thread Jeremy Freeman
Hey all, great discussion, just wanted to +1 that I see a lot of value in steps that make it easier to use PySpark as an ordinary python library. You might want to check out this (https://github.com/minrk/findspark), started by Jupyter project devs, that off…
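For readers who have not seen findspark, usage is essentially a one-liner; a minimal session sketch (the explicit path argument is optional and illustrative; by default it falls back to SPARK_HOME):

    import findspark
    findspark.init()  # or findspark.init("/path/to/spark")

    # After init(), pyspark and its bundled py4j are on sys.path.
    import pyspark
    sc = pyspark.SparkContext(master="local[2]", appName="findspark-demo")
    print(sc.parallelize(range(10)).sum())  # expect 45
    sc.stop()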

Re: PySpark on PyPi

2015-07-22 Thread Punyashloka Biswal
I agree with everything Justin just said. An additional advantage of publishing PySpark's Python code in a standards-compliant way is that we'll be able to declare transitive dependencies (Pandas, Py4J) in a way that pip can use. Contrast this with the current situation, where df.toPandas(…
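A sketch of what declaring those transitive dependencies could look like; the package metadata and version pins below are illustrative assumptions, not taken from any actual Spark PR:

    # setup.py -- illustrative sketch only
    from setuptools import setup, find_packages

    setup(
        name="pyspark",
        version="1.5.0",                # assumed version
        packages=find_packages(),
        install_requires=[
            "py4j==0.8.2.1",  # pip would fetch the JVM gateway automatically
            "pandas>=0.13",   # so df.toPandas() works without manual installs
        ],
    )

With this in place, `pip install pyspark` would pull in Py4J and Pandas transitively instead of failing at runtime.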

Re: PySpark on PyPi

2015-07-22 Thread Justin Uang
// + *Davies* for his comments // + Punya for SA For development and CI, like Olivier mentioned, I think it would be hugely beneficial to publish pyspark (only code in the python/ dir) on PyPI. If anyone wants to develop against PySpark APIs, they need to download the distribution and do a lot of

Re: PySpark on PyPi

2015-06-05 Thread Olivier Girardot
Ok, I get it. Now what can we do to improve the current situation? Right now, if I want to set up a CI env for PySpark, I have to:
1. download a pre-built version of pyspark and unzip it somewhere on every agent
2. define the SPARK_HOME env variable
3. symlink this distribution's pyspark dir inside th…
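Concretely, each CI agent ends up needing bootstrap code along these lines before `import pyspark` works (paths are illustrative):

    # CI bootstrap -- illustrative sketch of steps 2 and 3 above
    import glob
    import os
    import sys

    spark_home = os.environ["SPARK_HOME"]  # step 2: must already be defined
    # step 3: put Spark's python dir and its bundled py4j zip on sys.path
    # (equivalent to the symlink approach)
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.extend(glob.glob(
        os.path.join(spark_home, "python", "lib", "py4j-*.zip")))

    import pyspark  # only now does this succeed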

Re: PySpark on PyPi

2015-06-05 Thread Jey Kottalam
Couldn't we have a pip-installable "pyspark" package that just serves as a shim to an existing Spark installation? Or it could even download the latest Spark binary if SPARK_HOME isn't set during installation. Right now, Spark doesn't play very well with the usual Python ecosystem. For example, why…
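A rough sketch of the shim idea; the download URL, version, and directory layout are assumptions for illustration only:

    # Illustrative install-time hook for a "pyspark" shim package.
    import os
    import tarfile
    try:
        from urllib.request import urlretrieve  # Python 3
    except ImportError:
        from urllib import urlretrieve           # Python 2

    SPARK_URL = ("http://archive.apache.org/dist/spark/spark-1.4.1/"
                 "spark-1.4.1-bin-hadoop2.6.tgz")  # assumed artifact

    def ensure_spark(prefix):
        """Defer to SPARK_HOME if set; otherwise fetch a binary distribution."""
        spark_home = os.environ.get("SPARK_HOME")
        if spark_home:
            return spark_home  # shim mode: reuse the existing installation
        archive = os.path.join(prefix, "spark.tgz")
        urlretrieve(SPARK_URL, archive)
        tarfile.open(archive).extractall(prefix)
        return os.path.join(prefix, "spark-1.4.1-bin-hadoop2.6")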

Re: PySpark on PyPi

2015-06-05 Thread Josh Rosen
This has been proposed before: https://issues.apache.org/jira/browse/SPARK-1267 There's currently tighter coupling between the Python and Java halves of PySpark than just requiring SPARK_HOME to be set; if we did this, I bet we'd run into tons of issues when users try to run a newer version of the…
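One way to surface such mismatches immediately (the "user would be alerted immediately" point made later in the thread) would be an import-time guard; the RELEASE-file mechanism below is an assumption for illustration, not Spark's actual behavior:

    # Illustrative version guard for a pip-installed pyspark front-end.
    import os

    PIP_PYSPARK_VERSION = "1.5.0"  # assumed version baked into the package

    def check_spark_home():
        spark_home = os.environ.get("SPARK_HOME")
        if not spark_home:
            raise ImportError("SPARK_HOME is not set; this package is only "
                              "a front-end to a Spark distribution.")
        release = os.path.join(spark_home, "RELEASE")
        if os.path.exists(release):
            with open(release) as f:
                text = f.read()
            if PIP_PYSPARK_VERSION not in text:
                raise ImportError("pyspark %s does not match the Spark "
                                  "distribution at %s" %
                                  (PIP_PYSPARK_VERSION, spark_home))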

PySpark on PyPi

2015-06-04 Thread Olivier Girardot
Hi everyone, considering the Python API is just a front-end that needs SPARK_HOME defined anyway, I think it would be interesting to deploy the Python part of Spark on PyPI in order to handle the dependencies of a Python project needing PySpark via pip. For now I just symlink the python/pyspark in my…