[ https://issues.apache.org/jira/browse/BEAM-8397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959139#comment-16959139 ]
Valentyn Tymofieiev commented on BEAM-8397:
-------------------------------------------
Re: why the test fails when we move SpecialParDo to be module-level:
When SpecialParDo is defined inside the function, it is not picklable, and Beam
catches that in [1]. If we add some logging there, we can see that we catch an
exception like "Cannot pickle/unpickle AppliedPTransform(Do, SpecialParDo),
caught Exception: maximum recursion depth exceeded while calling a Python
object". Beam then decides that the pipeline is not Runner API-compatible and
skips the Runner API round trip, as per [2].
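A minimal sketch of the kind of check described above, using only the standard `pickle` module rather than dill or Beam's `apache_beam.internal.pickler`. The helper `is_picklable` and the classes `ModuleLevelDoFn`, `Runaway` and `LocalDoFn` are hypothetical; Beam's actual check is the code at [1]. It shows that a broad `except Exception` treats both a locally defined class and runaway recursion (a `RecursionError`, the "maximum recursion depth exceeded" case in the log above) as "not picklable", so the caller can quietly fall back instead of crashing.

```python
import pickle


class ModuleLevelDoFn:
    """Defined at module level, so pickle can locate it by its qualified name."""


class Runaway:
    """Hypothetical object whose reduction never terminates."""

    def __reduce__(self):
        # Each reduction asks pickle to save a brand-new Runaway, so the memo
        # never short-circuits and pickling recurses until RecursionError.
        return (Runaway, (Runaway(),))


def make_local_class():
    class LocalDoFn:  # defined inside a function, like SpecialParDo in the test
        pass
    return LocalDoFn


def is_picklable(obj):
    """Return True if obj survives a pickle round trip (hypothetical helper)."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:  # RecursionError, AttributeError and PicklingError all land here
        return False


print(is_picklable(ModuleLevelDoFn()))     # True
print(is_picklable(make_local_class()()))  # False: a local class cannot be found by name
print(is_picklable(Runaway()))             # False: the RecursionError was caught
```

Note that with dill, local classes are pickled by value instead of failing outright, which is presumably why the failure here surfaced as deep recursion rather than a name lookup error.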
When we make SpecialParDo picklable, the pipeline becomes Runner API-compatible,
and the round trip changes the pipeline in some way that causes the test to
fail. Perhaps SpecialParDo becomes a regular ParDo, or something like that. If
we set test_runner_api=False [3] and move SpecialParDo to module level, the
test passes.
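A toy model of the hypothesis above. Everything here is hypothetical: `ParDo`, `SpecialParDo`, `to_proto`, `from_proto`, `runner_api_compatible` and `prepare` are stand-ins, not Beam's classes or its Runner API code at [2]; the keyword `test_runner_api` only mirrors the name of the flag at [3]. The round trip is modeled as a lossy conversion that keeps a generic kind and a label, so a subclass comes back as its base class, while an unpicklable class or `test_runner_api=False` leaves the original object untouched.

```python
import pickle


class ParDo:
    """Stand-in for a generic transform (not apache_beam.ParDo)."""

    def __init__(self, label):
        self.label = label


class SpecialParDo(ParDo):
    """Stand-in for the test's subclass; the test relies on this exact type."""


def to_proto(transform):
    # Hypothetical lossy encoding: only the generic kind and the label survive.
    return {"kind": "ParDo", "label": transform.label}


def from_proto(proto):
    # Decoding rebuilds the generic type, so subclass identity is lost.
    return ParDo(proto["label"])


def runner_api_compatible(transforms):
    # Any pickling failure marks the whole pipeline as incompatible.
    try:
        for t in transforms:
            pickle.loads(pickle.dumps(t))
    except Exception:
        return False
    return True


def prepare(transforms, test_runner_api=True):
    # Round trip only when requested and when every transform pickles cleanly.
    if test_runner_api and runner_api_compatible(transforms):
        return [from_proto(to_proto(t)) for t in transforms]
    return list(transforms)


def make_local_special():
    class LocalSpecialParDo(ParDo):  # defined inside a function, so not picklable
        pass
    return LocalSpecialParDo("Do")


print(type(prepare([SpecialParDo("Do")])[0]).__name__)                         # ParDo
print(type(prepare([SpecialParDo("Do")], test_runner_api=False)[0]).__name__)  # SpecialParDo
print(type(prepare([make_local_special()])[0]).__name__)                       # LocalSpecialParDo
```

Whether Beam's real round trip drops the subclass in this way is exactly what the comment leaves open; the model only shows why both workarounds would hide such a change.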
[1] https://github.com/apache/beam/blob/f21f41724f1d9bb07ffffb49644499240f7e9bfe/sdks/python/apache_beam/pipeline.py#L631
[2] https://github.com/apache/beam/blob/f21f41724f1d9bb07ffffb49644499240f7e9bfe/sdks/python/apache_beam/pipeline.py#L432
[3] https://github.com/apache/beam/blob/f21f41724f1d9bb07ffffb49644499240f7e9bfe/sdks/python/apache_beam/pipeline.py#L402
> DataflowRunnerTest.test_remote_runner_display_data fails due to infinite
> recursion during pickling.
> ---------------------------------------------------------------------------------------------------
>
> Key: BEAM-8397
> URL: https://issues.apache.org/jira/browse/BEAM-8397
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Valentyn Tymofieiev
> Priority: Major
>
> `python ./setup.py test -s
> apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
> passes.
> `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam
> depends on dill==0.3.1.1.
> `python ./setup.py nosetests --tests
> 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data'`
> currently fails when run on master.
> The failure indicates infinite recursion during pickling:
> {noformat}
> test_remote_runner_display_data (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ...
> Fatal Python error: Cannot recover from stack overflow.
> Current thread 0x00007f9d700ed740 (most recent call first):
> File "/usr/lib/python3.7/pickle.py", line 479 in get
> File "/usr/lib/python3.7/pickle.py", line 497 in save
> File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
> File "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", line 1394 in save_function
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
> File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
> File "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", line 910 in save_module_dict
> File "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py", line 198 in new_save_module_dict
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
> File "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py", line 114 in wrapper
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
> File "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", line 1137 in save_cell
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
> File "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", line 1394 in save_function
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
> File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
> File "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", line 910 in save_module_dict
> File "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py", line 198 in new_save_module_dict
> ...
> {noformat}
> cc: [~yoshiki.obata]