Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-13 Thread Hyukjin Kwon
I am +1 to go for 0.23.2 - it brings some overhead to test PyArrow and pandas combinations. Spark 3 should be a good time to increase it. On Fri, Jun 14, 2019 at 9:46 AM, Bryan Cutler wrote: > Hi All, > > We would like to discuss increasing the minimum supported version of > Pandas in Spark, which is current

[DISCUSS] Increasing minimum supported version of Pandas

2019-06-13 Thread Bryan Cutler
Hi All, We would like to discuss increasing the minimum supported version of Pandas in Spark, which is currently 0.19.2. Pandas 0.19.2 was released nearly 3 years ago and there are some workarounds in PySpark that could be removed if such an old version is not required. This will help to keep cod
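For readers unfamiliar with what a "minimum supported version" check involves, here is a rough sketch in plain Python. It is not PySpark's actual code: the function names and the error message are hypothetical, and the 0.23.2 value is simply the figure proposed in the reply above.

```python
import re

# Assumption: 0.23.2 is the value proposed in the thread, not a confirmed decision.
MINIMUM_PANDAS_VERSION = "0.23.2"


def parse_version(version):
    """Turn '0.23.2' into (0, 23, 2), dropping suffixes such as 'rc1'."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM_PANDAS_VERSION):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def require_minimum_pandas_version(installed, minimum=MINIMUM_PANDAS_VERSION):
    """Hypothetical guard: raise ImportError if the installed version is too old."""
    if not meets_minimum(installed, minimum):
        raise ImportError(
            "Pandas >= %s must be installed; however, "
            "your version was %s." % (minimum, installed)
        )
```

In practice the `installed` argument would come from `pandas.__version__`; it is passed in as a string here so the sketch runs without pandas installed. Comparing integer tuples rather than raw strings matters because "0.9.0" sorts after "0.23.2" as a string.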

[VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-13 Thread Matt Cheah
Hi everyone, I would like to call a vote for the SPIP for SPARK-25299, which proposes to introduce a pluggable storage API for temporary shuffle data. You may find the SPIP document here. The discussion thread for the SPIP was conducted here. Please vote on whether or not this prop

Re: Exposing JIRA issue types at GitHub PRs

2019-06-13 Thread Dongjoon Hyun
Thank you for the feedback and requirements, Hyukjin, Reynold, Marco. Sure, we can do whatever we want. I'll wait for more feedback and proceed to the next steps. Best, Dongjoon. On Wed, Jun 12, 2019 at 11:51 PM Marco Gaido wrote: > Hi Dongjoon, > Thanks for the proposal! I like the idea.

Re: Adding Custom finalize method to RDDs.

2019-06-13 Thread Phillip Henry
If you control the codebase, you control when an RDD goes out of scope. Or am I missing something? (Note that finalize will not necessarily be executed when an object goes out of scope but when the GC runs at some indeterminate point in the future. Please avoid using finalize for the kind of task you