Re: Make cloudpickle the default library in Beam 2.65.0

2025-07-22 Claudius van der Merwe
Hey all, I wrote up a quick update on the status of replacing dill: https://docs.google.com/document/d/1XypNkB0ujc-U2hy9PuJNYj6asY3tZyGoKvfFDbRux_Y/edit?usp=sharing . There is one remaining blocker (by default, dill is used to deterministically encode special types), which I discuss further in https:…

2025-04-30 Valentyn Tymofieiev via dev
Ah yes, and no more saving the main session :)

> FWIW - I noticed that the DataFlow Options documentation[1] for setting the pickling library and the Beam documentation

Thanks for bringing it up. The doc is outdated; the issue was fixed in https://github.com/apache/beam/issues/21615 .

On Wed, Ap…

2025-04-30 Joey Tran
Wow, this is fantastic! I tested it out and it worked great for my runner. I am also excited about this change now and will eagerly set `cloudpickle` as the default pickler for our code. FWIW - I noticed that the DataFlow Options documentation[1] for setting the pickling library and the Beam document…

2025-04-29 Robert Bradshaw
On Tue, Apr 29, 2025 at 7:51 PM Joey Tran wrote:
>
> Does cloudpickle make --save_main_session unnecessary? As in, will more
> transforms defined in __main__ "just work"?

Yes. Or at least it "just works" much more often. (There may still be corner cases, but I haven't run into them...) I, for o…

2025-04-29 Joey Tran
Does cloudpickle make --save_main_session unnecessary? As in, will more transforms defined in __main__ "just work"? If so, I can see why that's worthwhile. I've had a _ton_ of issues with this, especially with new users of Beam at my company. Explaining the main session and why random things throw unp…
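For context on the main-session problem, here is a minimal, standard-library-only sketch of why functions defined in `__main__` can fail on workers. Standard `pickle` stores a module-level function as a reference (module name plus qualified name), not as code, so unpickling only works if the same definition exists in the receiving process's copy of that module. The module name `user_main` and the function `double` are hypothetical stand-ins for a user's `__main__` script; a separate module name is used so the sketch does not touch the real `__main__`. cloudpickle, by contrast, pickles functions defined in `__main__` by value (their code and the globals they reference), which is why it can make --save_main_session unnecessary in most cases.

```python
import pickle
import sys
import types


def pickle_by_reference_demo() -> tuple[bytes, str]:
    """Show that stdlib pickle stores a function as a (module, name) reference.

    Returns the pickled payload and the name of the exception raised when the
    payload is loaded where the module lacks the function's definition.
    """
    # Hypothetical stand-in for the user's __main__ on the launching machine.
    launcher = types.ModuleType("user_main")
    exec("def double(x):\n    return x * 2\n", launcher.__dict__)
    sys.modules["user_main"] = launcher

    # Only the module name and qualified name are written to the payload,
    # not the function's bytecode.
    payload = pickle.dumps(launcher.double)

    # Simulate a worker: a module with the same name exists there,
    # but the function was never defined in it.
    sys.modules["user_main"] = types.ModuleType("user_main")
    try:
        pickle.loads(payload)
        error = "none"
    except (AttributeError, ImportError) as exc:
        error = type(exc).__name__
    finally:
        # Clean up the fake module so the sketch leaves no global state behind.
        del sys.modules["user_main"]
    return payload, error


payload, error = pickle_by_reference_demo()
print(b"user_main" in payload, b"double" in payload, error)
```

The lookup fails on the simulated worker because the payload names `user_main.double` without carrying its definition; this is the gap that saving the main session (or pickling by value) is meant to fill.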

2025-04-29 Valentyn Tymofieiev via dev
There are several reasons:
- wide adoption in the data processing community (see initial discussion: [1]);
- expectations of cloudpickle having a larger number of maintainers and contributors;
- new releases of dill had breaking changes [2], which made adopting a new version challenging;
- cloudpi…

2025-04-28 Valentyn Tymofieiev via dev
Thanks Claude! Great to see a lot of progress on this effort. The dependency on an old version of dill has been a persistent pain point for many users. Please call out this change in the release notes, so that customers can provide feedback and find instructions on how to unblock themselves. It c…

2025-04-28 Joey Tran
Naive question, but why is Beam upgrading to cloudpickle? I saw this doc: https://docs.google.com/document/d/1G5Q0ckX5sKQRQD1yEkLCPQL7N6B-AL9Cb1p0zlOOfQU/edit?tab=t.0 Is the main reason that cloudpickle is more actively maintained?

On Mon, Apr 28, 2025 at 6:51 PM Claudius van der Merwe wrot…