hey evil admin:)
i think the bit about java was from me?
if so, i meant to indicate that the reality for us is java is 1.7 on most
(all?) clusters. i do not believe spark prefers java 1.8. my point was that
even though java 1.7 is getting old as well, it would be a major issue for
me if spark dropped java 1.7 support.

On Tue, Jan 5, 2016 at 6:53 PM, Carlile, Ken <carli...@janelia.hhmi.org>
wrote:

> As one of the evil administrators that runs a RHEL 6 cluster, we already
> provide quite a few different versions of python on our cluster pretty darn
> easily. All you need is a separate install directory and to set the
> PYTHON_HOME environment variable to point to the correct python, then have
> the users make sure the correct python is in their PATH. I understand that
> other administrators may not be so compliant.
>
> Saw a small bit about the java version in there; does Spark currently
> prefer Java 1.8.x?
>
> —Ken
>
> On Jan 5, 2016, at 6:08 PM, Josh Rosen <joshro...@databricks.com> wrote:
>
>> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
>> while continuing to use a vanilla `python` executable on the executors
>
>
> Whoops, just to be clear, this should actually read "while continuing to
> use a vanilla `python` 2.7 executable".
>
> On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen <joshro...@databricks.com>
> wrote:
>
>> Yep, the driver and executors need to have compatible Python versions. I
>> think that there are some bytecode-level incompatibilities between 2.6 and
>> 2.7 which would impact the deserialization of Python closures, so I think
>> you need to be running the same 2.x version for all communicating Spark
>> processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
>> driver while continuing to use a vanilla `python` executable on the
>> executors (we have environment variables which allow you to control these
>> separately).
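>>
>> (For readers who want to try this: a minimal sketch of that separation.
>> The variables below are Spark's standard PYSPARK_PYTHON /
>> PYSPARK_DRIVER_PYTHON pair, which is presumably what is meant here; the
>> interpreter path is a placeholder.)
>>
>>     import os
>>
>>     # PYSPARK_PYTHON selects the python executable the executors use to
>>     # run PySpark workers. It is picked up when the SparkContext is
>>     # created, so it can be set here, though it is more commonly exported
>>     # in the shell or in conf/spark-env.sh before spark-submit.
>>     os.environ.setdefault("PYSPARK_PYTHON", "/opt/python-2.7/bin/python")
>>
>>     # PYSPARK_DRIVER_PYTHON (e.g. "ipython") selects the driver-side
>>     # interpreter, but it only takes effect if set before bin/pyspark
>>     # starts, so it belongs in the shell rather than in this script:
>>     #   export PYSPARK_DRIVER_PYTHON=ipython
>>
>>     from pyspark import SparkContext
>>     sc = SparkContext(appName="separate-python-example")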
>>
>> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> I think all the slaves need the same (or a compatible) version of Python
>>> installed since they run Python code in PySpark jobs natively.
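>>>
>>> (A quick way to check what the cluster actually runs, sketched against a
>>> live SparkContext `sc` such as the one the pyspark shell provides; the
>>> partition count is arbitrary.)
>>>
>>>     import platform
>>>
>>>     driver_version = platform.python_version()
>>>     executor_versions = (sc.parallelize(range(100), 20)
>>>                            .map(lambda _: platform.python_version())
>>>                            .distinct()
>>>                            .collect())
>>>
>>>     print("driver:    %s" % driver_version)
>>>     print("executors: %s" % sorted(executor_versions))
>>>     # more than one executor version, or a driver/executor mismatch in
>>>     # the major.minor numbers, is exactly the situation to avoid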
>>>
>>> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> interesting i didnt know that!
>>>>
>>>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>>>> nicholas.cham...@gmail.com> wrote:
>>>>
>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>> the app we can not ship it with our software because its gpl licensed
>>>>>
>>>>> Not to nitpick, but maybe this is important. The Python license is 
>>>>> GPL-compatible
>>>>> but not GPL <https://docs.python.org/3/license.html>:
>>>>>
>>>>> Note GPL-compatible doesn’t mean that we’re distributing Python under
>>>>> the GPL. All Python licenses, unlike the GPL, let you distribute a 
>>>>> modified
>>>>> version without making your changes open source. The GPL-compatible
>>>>> licenses make it possible to combine Python with other software that is
>>>>> released under the GPL; the others don’t.
>>>>>
>>>>> Nick
>>>>>
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com>
>>>>> wrote:
>>>>>
>>>>>> i do not think so.
>>>>>>
>>>>>> does the python 2.7 need to be installed on all slaves? if so, we do
>>>>>> not have direct access to those.
>>>>>>
>>>>>> also, spark is easy for us to ship with our software since its apache
>>>>>> 2 licensed, and it only needs to be present on the machine that launches
>>>>>> the app (thanks to yarn).
>>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>>> the app we can not ship it with our software because its gpl licensed, so
>>>>>> the client would have to download it and install it themselves, and this
>>>>>> would mean its an independent install which has to be audited and 
>>>>>> approved
>>>>>> and now you are in for a lot of fun. basically it will never happen.
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <joshro...@databricks.com>
>>>>>> wrote:
>>>>>>
>>>>>>> If users are able to install Spark 2.0 on their RHEL clusters, then
>>>>>>> I imagine that they're also capable of installing a standalone Python
>>>>>>> alongside that Spark version (without changing Python systemwide). For
>>>>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>>>>> 2.7.x/3.x without impacting / changing the system Python and don't
>>>>>>> require any special permissions to install (you don't need root / sudo
>>>>>>> access). Does this address the Python versioning concerns for RHEL 
>>>>>>> users?
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> yeah, the practical concern is that we have no control over java or
>>>>>>>> python version on large company clusters. our current reality for the 
>>>>>>>> vast
>>>>>>>> majority of them is java 7 and python 2.6, no matter how outdated that 
>>>>>>>> is.
>>>>>>>>
>>>>>>>> i dont like it either, but i cannot change it.
>>>>>>>>
>>>>>>>> we currently don't use pyspark so i have no stake in this, but if
>>>>>>>> we did i can assure you we would not upgrade to spark 2.x if python 
>>>>>>>> 2.6 was
>>>>>>>> dropped. no point in developing something that doesnt run for the
>>>>>>>> majority of customers.
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> As I pointed out in my earlier email, RHEL will support Python 2.6
>>>>>>>>> until 2020. So I'm assuming these large companies will have the 
>>>>>>>>> option of
>>>>>>>>> riding out Python 2.6 until then.
>>>>>>>>>
>>>>>>>>> Are we seriously saying that Spark should likewise support Python
>>>>>>>>> 2.6 for the next several years? Even though the core Python devs 
>>>>>>>>> stopped
>>>>>>>>> supporting it in 2013?
>>>>>>>>>
>>>>>>>>> If that's not what we're suggesting, then when, roughly, can we
>>>>>>>>> drop support? What are the criteria?
>>>>>>>>>
>>>>>>>>> I understand the practical concern here. If companies are stuck
>>>>>>>>> using 2.6, it doesn't matter to them that it is deprecated. But 
>>>>>>>>> balancing
>>>>>>>>> that concern against the maintenance burden on this project, I would 
>>>>>>>>> say
>>>>>>>>> that "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable
>>>>>>>>> position to take. There are many tiny annoyances one has to put up 
>>>>>>>>> with to
>>>>>>>>> support 2.6.
>>>>>>>>>
>>>>>>>>> I suppose if our main PySpark contributors are fine putting up
>>>>>>>>> with those annoyances, then maybe we don't need to drop support just 
>>>>>>>>> yet...
>>>>>>>>>
>>>>>>>>> Nick
>>>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <ju...@esbet.es> wrote:
>>>>>>>>>
>>>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>>>
>>>>>>>>>> I've been in a couple of projects using Spark (banking industry)
>>>>>>>>>> where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>>>
>>>>>>>>>> That said, I believe it should not be a concern for Spark. Python
>>>>>>>>>> 2.6 is old and busted, which is totally opposite to the Spark 
>>>>>>>>>> philosophy
>>>>>>>>>> IMO.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Jan 5, 2016, at 8:07 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>>>>>
>>>>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>>>>>
>>>>>>>>>> if so, i still know plenty of large companies where python 2.6 is
>>>>>>>>>> the only option. asking them for python 2.7 is not going to work
>>>>>>>>>>
>>>>>>>>>> so i think its a bad idea
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>>>> juliet.hougl...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6.
>>>>>>>>>>> At this point, Python 3 should be the default that is encouraged.
>>>>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>>>>> behind the version they should theoretically use. Dropping python 
>>>>>>>>>>> 2.6
>>>>>>>>>>> support sounds very reasonable to me.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>>
>>>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 6 until 2020
>>>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core 
>>>>>>>>>>>> Python
>>>>>>>>>>>> developers stopped supporting it in 2013. RHEL 6 is not a good enough
>>>>>>>>>>>> enough
>>>>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>>>>
>>>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>>>> believe we currently do).
>>>>>>>>>>>>
>>>>>>>>>>>> Nick
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <
>>>>>>>>>>>> allenzhang...@126.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> plus 1,
>>>>>>>>>>>>>
>>>>>>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <meethu.mat...@flytxt.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> +1
>>>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>>>> r...@databricks.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does anybody here care about us dropping support for Python
>>>>>>>>>>>>>> 2.6 in Spark 2.0?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects
>>>>>>>>>>>>>> (e.g. json parsing) when compared with Python 2.7. Some 
>>>>>>>>>>>>>> libraries that
>>>>>>>>>>>>>> Spark depends on stopped supporting 2.6. We can still convince the library
>>>>>>>>>>>>>> the library
>>>>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm 
>>>>>>>>>>>>>> curious if
>>>>>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
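>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (For anyone curious about the json point, a rough, illustrative
>>>>>>>>>>>>>> micro-benchmark -- run the same snippet under python2.6 and
>>>>>>>>>>>>>> python2.7 and compare; the payload and iteration count are
>>>>>>>>>>>>>> arbitrary.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     import json, timeit
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     payload = json.dumps({"id": 12345, "tags": ["a", "b", "c"],
>>>>>>>>>>>>>>                           "nested": {"x": 1.5, "y": None}})
>>>>>>>>>>>>>>     secs = timeit.timeit(lambda: json.loads(payload), number=200000)
>>>>>>>>>>>>>>     print("parsed 200k small documents in %.2fs" % secs)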
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
>
>
