[ https://issues.apache.org/jira/browse/BEAM-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542645#comment-17542645 ]
Ryan Thompson edited comment on BEAM-14514 at 5/26/22 7:33 PM:
---------------------------------------------------------------

[~dctelus] There is a known issue: the use of cloudpickle must be specified from the very beginning of execution. We are looking to fix this, but in the meantime you can work around it by adding this line at the beginning of your code.

```
beam.internal.pickler.set_library('cloudpickle')
```

Also, still set the pipeline option, as that will propagate to the workers.

was (Author: ryan.thompson):

[~dctelus] There is a known issue: the use of cloudpickle must be specified from the very beginning of execution. We are looking to fix this, but in the meantime you can work around it by adding this line at the beginning of your code.

```
beam.internal.pickler.set_library('cloudpickle')
```

> Beam python SDK ignores pickle_library option in pipeline.run()
> ---------------------------------------------------------------
>
>                 Key: BEAM-14514
>                 URL: https://issues.apache.org/jira/browse/BEAM-14514
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.38.0
>            Reporter: dctelus
>            Assignee: Ryan Thompson
>            Priority: P2
>
> Context:
> In the Python SDK, you can specify the pipeline argument --pickle_library,
> which dictates which library is used to pickle variables so they can be sent
> from the executing machine to the workers (when save_main_session is True).
> Issue:
> The pickle_library option is ignored in the pipeline.run() function, which
> reverts to using dill (the default).
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L570
> Reproduce:
> Add --pickle_library cloudpickle to the pipeline options and notice that dill
> is used for the session dump, even though cloudpickle was requested.
>
> I found this out because dill throws an exception for my use case, but
> cloudpickle doesn't.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
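
For reference, the workaround from the comment might look roughly like this in a complete script. This is only a sketch under assumptions, not the project's recommended layout: the names `PIPELINE_FLAGS` and `run` and the toy Create/Map transforms are hypothetical and not from the issue. The `set_library` call and the `--pickle_library` and `--save_main_session` options come from the comment and the issue description.

```python
# Still pass the option, as the comment advises, so it propagates to the workers.
PIPELINE_FLAGS = ["--pickle_library=cloudpickle", "--save_main_session"]


def run(extra_args=None):
    """Hypothetical entry point: select cloudpickle first, then build the pipeline."""
    # Beam imports are deferred into this function so the module can be
    # imported without Beam; the flag list above has no Beam dependency.
    import apache_beam as beam
    from apache_beam.internal import pickler
    from apache_beam.options.pipeline_options import PipelineOptions

    # Workaround from the comment: choose cloudpickle before any pipeline
    # object is constructed, since run() may otherwise fall back to dill.
    pickler.set_library("cloudpickle")

    options = PipelineOptions(PIPELINE_FLAGS + list(extra_args or []))
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create([1, 2, 3])        # toy input, illustration only
            | "Double" >> beam.Map(lambda x: x * 2)
            | "Print" >> beam.Map(print)
        )
```

Calling `run()` from the script's entry point executes the pipeline on the default runner; runner-specific flags can be passed through `extra_args`.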