I am +1 to go for 0.23.2 - supporting such an old version brings some
overhead to test PyArrow and pandas combinations. Spark 3 should be a good
time to increase it.
On Fri, Jun 14, 2019 at 9:46 AM, Bryan Cutler wrote:
Hi All,
We would like to discuss increasing the minimum supported version of Pandas
in Spark, which is currently 0.19.2.
Pandas 0.19.2 was released nearly 3 years ago and there are some
workarounds in PySpark that could be removed if such an old version is not
required. This will help to keep the code clean.
Hi everyone,
I would like to call a vote for the SPIP for SPARK-25299, which proposes to
introduce a pluggable storage API for temporary shuffle data.
You may find the SPIP document here.
The discussion thread for the SPIP was conducted here.
Please vote on whether or not this proposal is agreeable to you.
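For anyone unfamiliar with the proposal, below is a rough illustration of
what a pluggable storage API for temporary shuffle data could look like.
This is only a hypothetical sketch in Scala - the names here
(ShuffleStoragePlugin, PartitionWriter, the in-memory implementation) are
made up for illustration and are not the interfaces proposed in the SPIP:

import scala.collection.mutable

// A sink for one map task's output for one reduce partition.
trait PartitionWriter {
  def write(bytes: Array[Byte]): Unit
  def commit(): Unit
}

// The pluggable piece: implementations could target local disk, a
// distributed file system, or a disaggregated store, without changing
// the rest of the shuffle machinery.
trait ShuffleStoragePlugin {
  def createWriter(shuffleId: Int, mapId: Long, partitionId: Int): PartitionWriter
  def readPartition(shuffleId: Int, partitionId: Int): Iterator[Array[Byte]]
}

// Toy in-memory implementation, just to show the contract.
class InMemoryStorage extends ShuffleStoragePlugin {
  private val store =
    mutable.Map.empty[(Int, Int), mutable.ArrayBuffer[Array[Byte]]]

  override def createWriter(shuffleId: Int, mapId: Long, partitionId: Int): PartitionWriter =
    new PartitionWriter {
      private val buf = mutable.ArrayBuffer.empty[Array[Byte]]
      override def write(bytes: Array[Byte]): Unit = buf += bytes
      // Data becomes visible to readers only on commit.
      override def commit(): Unit =
        store.getOrElseUpdate((shuffleId, partitionId),
          mutable.ArrayBuffer.empty) ++= buf
    }

  override def readPartition(shuffleId: Int, partitionId: Int): Iterator[Array[Byte]] =
    store.getOrElse((shuffleId, partitionId), mutable.ArrayBuffer.empty).iterator
}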
Thank you for the feedback and requirements, Hyukjin, Reynold, Marco.
Sure, we can do whatever we want.
I'll wait for more feedback and proceed to the next steps.
Bests,
Dongjoon.
On Wed, Jun 12, 2019 at 11:51 PM Marco Gaido wrote:
> Hi Dongjoon,
> Thanks for the proposal! I like the idea.
If you control the codebase, you control when an RDD goes out of scope. Or
am I missing something?
(Note that finalize will not necessarily be executed when an object goes
out of scope, but when the GC runs at some indeterminate point in the
future. Please avoid using finalize for the kind of task you are
describing.)
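To illustrate, here is a minimal Scala sketch (the TrackedResource class
and its id are made up) of the weak-reference-plus-reference-queue
pattern, which is closer to what Spark's ContextCleaner actually does for
RDDs: cleanup is driven by observing when the GC has really collected the
object, on a thread we control, rather than by a finalizer:

import java.lang.ref.{ReferenceQueue, WeakReference}

object CleanupSketch {
  final class TrackedResource(val id: Int)

  def main(args: Array[String]): Unit = {
    val queue = new ReferenceQueue[TrackedResource]()
    var resource: TrackedResource = new TrackedResource(42)
    // Register a weak reference; it does not keep the object alive.
    val ref = new WeakReference(resource, queue)

    resource = null    // the object is now unreachable...
    System.gc()        // ...but collection is only a hint to the JVM

    // The reference is enqueued only once the GC actually collects the
    // object, at a point we do not control - the same timing caveat as
    // finalize, but cleanup runs on our own thread, not the finalizer's.
    Option(queue.remove(1000)) match {
      case Some(_) => println("resource 42 collected; safe to clean up its files")
      case None    => println("GC has not collected it yet; poll again later")
    }
    // Touching `ref` here keeps the WeakReference itself reachable
    // while we poll; otherwise it could be collected before enqueueing.
    ref.clear()
  }
}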