Shane, I think 0.24.2 is probably more common right now, so if we were to
pick one to test against, I still think it should be that one. Our Pandas
usage in PySpark is fairly conservative, so it's unlikely that we will add
something that would break 0.23.X.
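
For reference, the import-time check Bryan mentions below works roughly like
this (a sketch only - the constant and helper names here are illustrative,
not necessarily PySpark's exact code):

    from distutils.version import LooseVersion

    # Illustrative minimum; the actual value is what this thread is deciding.
    MINIMUM_PANDAS_VERSION = "0.23.2"

    def require_minimum_pandas_version():
        """Raise if the imported Pandas is older than the supported minimum."""
        import pandas as pd  # imported lazily, only when Pandas is actually needed
        if LooseVersion(pd.__version__) < LooseVersion(MINIMUM_PANDAS_VERSION):
            raise ImportError(
                "Pandas >= %s must be installed; however, your version is %s."
                % (MINIMUM_PANDAS_VERSION, pd.__version__))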

On Fri, Jun 14, 2019 at 10:10 AM shane knapp <skn...@berkeley.edu> wrote:

> ah, ok...  should we downgrade the testing env on jenkins then?  any
> specific version?
>
> shane, who is loath (and i mean LOATH) to touch python envs ;)
>
> On Fri, Jun 14, 2019 at 10:08 AM Bryan Cutler <cutl...@gmail.com> wrote:
>
>> I should have stated this earlier, but when the user does something that
>> requires Pandas, the imported version is checked against the required
>> minimum, and an exception is raised if it is lower. So I'm concerned that
>> using 0.24.2 might be a little too new for users running older clusters. To
>> give some release dates: 0.23.2 was released about a year ago, 0.24.0 in
>> January, and 0.24.2 in March.
>>
>> On Fri, Jun 14, 2019 at 9:27 AM shane knapp <skn...@berkeley.edu> wrote:
>>
>>> just so everyone knows, our python 3.6 testing infra is currently on
>>> 0.24.2...
>>>
>>> On Fri, Jun 14, 2019 at 9:16 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>>
>>>> +1
>>>>
>>>> Thank you for this effort, Bryan!
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>> On Fri, Jun 14, 2019 at 4:24 AM Holden Karau <hol...@pigscanfly.ca>
>>>> wrote:
>>>>
>>>>> I’m +1 for upgrading, although since this is probably the last easy
>>>>> chance we’ll have to bump version numbers, I’d suggest 0.24.2
>>>>>
>>>>>
>>>>> On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kwon <gurwls...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I am +1 for going to 0.23.2 - supporting old versions brings some
>>>>>> overhead when testing PyArrow and pandas combinations. Spark 3 should
>>>>>> be a good time to increase it.
>>>>>>
>>>>>> On Fri, Jun 14, 2019 at 9:46 AM, Bryan Cutler <cutl...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> We would like to discuss increasing the minimum supported version of
>>>>>>> Pandas in Spark, which is currently 0.19.2.
>>>>>>>
>>>>>>> Pandas 0.19.2 was released nearly 3 years ago and there are some
>>>>>>> workarounds in PySpark that could be removed if such an old version is
>>>>>>> not required. This will help to keep code clean and reduce maintenance
>>>>>>> effort.
>>>>>>>
>>>>>>> The change is targeted for Spark 3.0.0 release, see
>>>>>>> https://issues.apache.org/jira/browse/SPARK-28041. The current
>>>>>>> thought is to bump the version to 0.23.2, but we would like to discuss
>>>>>>> before making a change. Does anyone else have thoughts on this?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Bryan
>>>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>> https://amzn.to/2MaRAG9
>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>
>>>>
>>>
>>> --
>>> Shane Knapp
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>>
>>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>
