Hi All,

I've seen a couple of issues lately related to cloudpickle, notably https://issues.apache.org/jira/browse/SPARK-22674, and would like to get some feedback on updating the version in PySpark, which should fix these issues and allow us to remove some workarounds.

Spark currently uses a forked version of cloudpickle, and updates seem to be made every now and then as needed, but it's not really clear what the current state is or how far it has diverged from upstream. This makes back-porting fixes difficult. There was a previous discussion on moving it to a dependency here <http://apache-spark-developers-list.1001551.n3.nabble.com/PYTHON-DISCUSS-Moving-to-cloudpickle-and-or-Py4J-as-a-dependencies-td20954.html>, but given the status right now I think it would be best to do another update and bring things closer to upstream before we talk about moving it outside of Spark completely.

Before starting another update, it might be good to discuss the strategy a little. Should the version in Spark be derived from a release, or at least tied to a specific commit? It would also be good if we could document where it has diverged. For those who follow cloudpickle development, are there any known issues with recent changes? Any other thoughts or concerns?
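For anyone not following the JIRA, here is a quick standalone sketch of the kind of round trip PySpark depends on cloudpickle for. It is only illustrative context, not the exact reproduction from SPARK-22674, and it assumes a stock cloudpickle install rather than the vendored copy in pyspark:

    import pickle
    import cloudpickle

    # Plain pickle refuses lambdas and other dynamically defined functions,
    # which is the main reason PySpark bundles cloudpickle at all: it lets us
    # serialize arbitrary user functions and ship them to executors.
    square = lambda x: x * x

    payload = cloudpickle.dumps(square)   # serialize the function by value
    restored = pickle.loads(payload)      # any pickle-compatible loader works
    assert restored(4) == 16

The workarounds I'm referring to are the extra code paths we keep in PySpark to paper over cases the older vendored copy doesn't handle well; a newer upstream version should (as far as I can tell) make some of them unnecessary.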
Thanks, Bryan