Thanks Dongjoon. That makes much more sense now!

On Fri, Jul 3, 2020 at 12:11 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> Thank you, Hyukjin.
>
> According to the Python community, Python 3.5 also reaches EOL on 2020-09-13
> (only two months left).
>
> - https://www.python.org/downloads/
>
> So, targeting live Python versions in Apache Spark 3.1.0 (December 2020)
> looks reasonable to me.
>
> For old Python versions, we still have Apache Spark 2.4 LTS, and
> Apache Spark 3.0.x will also work.
>
> Bests,
> Dongjoon.
>
>
> On Wed, Jul 1, 2020 at 10:50 PM Yuanjian Li <xyliyuanj...@gmail.com>
> wrote:
>
>> +1, especially Python 2
>>
>> On Thu, Jul 2, 2020 at 10:20 AM Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>>> I’m ok with us dropping Python 2, 3.4, and 3.5 in Spark 3.1 forward. It
>>> will be exciting to get to use more recent Python features. The most recent
>>> Ubuntu LTS ships with 3.7, and while the previous LTS ships with 3.5, if
>>> folks really can’t upgrade there’s conda.
>>>
>>> Is there anyone with a large Python 3.5 fleet who can’t use conda?
>>>
>>> On Wed, Jul 1, 2020 at 7:15 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>
>>>> Yeah, sure. They will be dropped from Spark 3.1 onwards. I don't think we
>>>> should make such changes in maintenance releases.
>>>>
>>>> On Thu, Jul 2, 2020 at 11:13 AM Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>
>>>>> To be clear, the plan is to drop them in Spark 3.1 onwards, yes?
>>>>>
>>>>> On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon <gurwls...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I would like to discuss dropping the deprecated Python versions 2, 3.4
>>>>>> and 3.5 at https://github.com/apache/spark/pull/28957. I assume
>>>>>> people support it in general,
>>>>>> but I am writing this to make sure everybody is happy.
>>>>>>
>>>>>> Fokko did a very good investigation of it, see
>>>>>> https://github.com/apache/spark/pull/28957#issuecomment-652022449.
>>>>>> Judging from the statistics, I think we're pretty safe to drop them.
>>>>>> Also note that dropping Python 2 was actually declared at
>>>>>> https://python3statement.org/
>>>>>>
>>>>>> Roughly speaking, there are several main advantages to dropping them:
>>>>>> 1. It removes a bunch of hacks we added, around 700 lines in PySpark.
>>>>>> 2. PyPy2 has a critical bug that causes a flaky test
>>>>>> (https://issues.apache.org/jira/browse/SPARK-28358), based on my testing
>>>>>> and investigation.
>>>>>> 3. Users can use Python type hints with Pandas UDFs without
>>>>>> thinking about the Python version.
>>>>>> 4. Users can leverage a single, up-to-date cloudpickle,
>>>>>> https://github.com/apache/spark/pull/28950. With Python 3.8+ it can
>>>>>> also leverage C pickle.
>>>>>> 5. ...
>>>>>>
>>>>>> So it benefits both users and devs. WDYT guys?
>>>>>>
>>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>> https://amzn.to/2MaRAG9
>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>