Re: Thoughts on Cloudpickle Update

2018-01-19 Thread Bryan Cutler
Thanks Holden and Hyukjin. I agree, let's start doing the work first and see if the changes are low-risk enough; then we can evaluate how best to proceed. I made https://issues.apache.org/jira/browse/SPARK-23159 and will get started on the update, and we can continue to discuss in the PR. On F

Re: Thoughts on Cloudpickle Update

2018-01-19 Thread Hyukjin Kwon
Yea, that sounds good to me. 2018-01-19 18:29 GMT+09:00 Holden Karau: > So it is pretty core, but it's one of the better indirectly tested components. I think probably the most reasonable path is to see what the diff ends up looking like and make a call at that point for if we want it to go

Re: Thoughts on Cloudpickle Update

2018-01-19 Thread Holden Karau
So it is pretty core, but it's one of the better indirectly tested components. I think probably the most reasonable path is to see what the diff ends up looking like and make a call at that point on whether we want it to go to master or master & branch-2.3. On Fri, Jan 19, 2018 at 12:30 AM, Hyukjin Kwo

Re: Thoughts on Cloudpickle Update

2018-01-19 Thread Hyukjin Kwon
> So given that it fixes some real-world bugs, any particular reason why? Would you be comfortable with doing it in 2.3.1? Ah, I don't feel strongly about this, but RC2 is underway and cloudpickle is quite a core fix to PySpark. Just thought we might want to have enough time with it. One worry

Re: Thoughts on Cloudpickle Update

2018-01-18 Thread Holden Karau
On Jan 19, 2018 7:28 PM, "Hyukjin Kwon" wrote: > Is it an option to match the latest version of cloudpickle and still set protocol level 2? IMHO, I think this can be an option, but I am not fully sure yet whether we should/could go ahead with it within Spark 2.X. I need to investigate further, including th

Re: Thoughts on Cloudpickle Update

2018-01-18 Thread Hyukjin Kwon
> Is it an option to match the latest version of cloudpickle and still set protocol level 2? IMHO, I think this can be an option, but I am not fully sure yet whether we should/could go ahead with it within Spark 2.X. I need to investigate further, including things about Pyrolite. Let's go ahead with matchin

Re: Thoughts on Cloudpickle Update

2018-01-18 Thread Holden Karau
So if there are different versions of Python on the cluster machines, I think that's already unsupported, so I'm not worried about that. I'd suggest going to the highest released version since there appear to be some useful fixes between 0.4.2 & 0.5.2. Also, let's try to keep track in our commit messag

Re: Thoughts on Cloudpickle Update

2018-01-18 Thread Bryan Cutler
Thanks for all the details and background Hyukjin! Regarding the pickle protocol change, if I understand correctly, it is currently at level 2 in Spark which is good for backwards compatibility for all of Python 2. Choosing HIGHEST_PROTOCOL, which is the default for cloudpickle 0.5.0 and above, wil
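
The difference between a fixed protocol level 2 and HIGHEST_PROTOCOL can be illustrated with a minimal sketch. It uses only the standard-library pickle module, not cloudpickle or any Spark code, and the helper name pickle_protocol and the sample payload are hypothetical names chosen for illustration.

```python
import pickle
import pickletools

# Sample data only; any picklable object would do.
payload = {"name": "spark", "values": [1, 2, 3]}

# Protocol 2 is understood by Python 2.3+ and by every Python 3 release.
compat = pickle.dumps(payload, protocol=2)

# HIGHEST_PROTOCOL depends on the interpreter producing the pickle,
# so an older interpreter may be unable to read the result.
newest = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)


def pickle_protocol(data):
    """Return the protocol recorded in a pickle's leading PROTO opcode.

    Hypothetical helper: returns None for protocols 0 and 1, which do not
    write a PROTO opcode.
    """
    opcode, arg, _pos = next(pickletools.genops(data))
    return arg if opcode.name == "PROTO" else None


print("fixed protocol:  ", pickle_protocol(compat))
print("highest protocol:", pickle_protocol(newest))

# Both byte strings round-trip on the interpreter that wrote them.
assert pickle.loads(compat) == payload
assert pickle.loads(newest) == payload
```

Pinning protocol=2 trades some size and speed for a byte stream that any reader can load, which is presumably why the question of keeping level 2 while updating cloudpickle comes up in the replies.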

Re: Thoughts on Cloudpickle Update

2018-01-15 Thread Hyukjin Kwon
Hi Bryan, Yup, I support matching the version. I have pushed Spark's copy forward a few times before to match https://github.com/cloudpipe/cloudpickle, and also pushed a few fixes to cloudpickle itself. I believe our copy is closest to 0.4.1. I have been trying to follow the changes in c

Thoughts on Cloudpickle Update

2018-01-15 Thread Bryan Cutler
Hi All, I've seen a couple of issues lately related to cloudpickle, notably https://issues.apache.org/jira/browse/SPARK-22674, and would like to get some feedback on updating the version in PySpark, which should fix these issues and allow us to remove some workarounds. Spark is currently using a fork