Benoit Clennett-Sirois created BEAM-13905:
---------------------------------------------
Summary: Apache Beam Python: Dataframe Transforms break when the
option runtime_type_check is enabled.
Key: BEAM-13905
URL: https://issues.apache.org/jira/browse/BEAM-13905
Project: Beam
Issue Type: Bug
Components: runner-core
Affects Versions: 2.35.0
Environment: OS: Linux
Python 3.8.12
Reporter: Benoit Clennett-Sirois
We have discovered a potential bug: when you execute a pipeline that
contains a DataframeTransform with the "runtime_type_check" option set to
True, Apache Beam's type checking raises a cryptic error.
Simple example to reproduce the bug:
{code:python}
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam import Pipeline, Create, Row
from apache_beam.dataframe.transforms import DataframeTransform

# Enable runtime type checking; the failure only appears with this option.
pipeline = Pipeline(options=PipelineOptions(runtime_type_check=True))
# An identity DataframeTransform over a single Row is enough to trigger it.
pipeline | Create([Row(val1=1)]) | DataframeTransform(lambda df: df)
pipeline.run()
{code}
This raises an apache_beam.typehints.decorators.TypeCheckError:
{code:java}
File ".....lib/python3.8/site-packages/apache_beam/typehints/typehints.py", line 416, in check_constraint
raise SimpleTypeHintError
apache_beam.typehints.decorators.TypeCheckError: According to type-hint expected output should be of type <class 'apache_beam.typehints.schemas.BeamSchema_118086df_671f_4643_a929_ba65de48e7e8'>. Instead, received 'BeamSchema_118086df_671f_4643_a929_ba65de48e7e8(val1=1)', an instance of type <class 'apache_beam.typehints.schemas.BeamSchema_118086df_671f_4643_a929_ba65de48e7e8'>.
[while running 'DataframeTransform/Unbatch 'placeholder_DataFrame_140623617251840'/ParDo(_UnbatchNoIndex)']
{code}
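The error is cryptic because it names the same class as both the expected and the received type. One way such a message can arise in plain Python, offered here only as an unverified guess and not as a diagnosis of Beam's internals, is when two separately created class objects share a name: an isinstance check compares identity, not names. The sketch below uses only the standard library; make_schema_type and BeamSchema_example are hypothetical names invented for the illustration.

```python
from collections import namedtuple


def make_schema_type():
    # Hypothetical stand-in for a generated schema class (not Beam code).
    # Each call builds a brand-new class object with the same name and fields.
    return namedtuple("BeamSchema_example", ["val1"])


ExpectedType = make_schema_type()  # the class a type hint would refer to
ActualType = make_schema_type()    # an identical-looking class created separately

row = ActualType(val1=1)

# The names match, so a message built from them looks self-contradictory.
same_name = ExpectedType.__name__ == type(row).__name__
# The class objects differ, so an isinstance-based check rejects the value.
passes_check = isinstance(row, ExpectedType)

print(same_name, passes_check)  # prints: True False
```

If something similar is happening here, comparing the two classes with "is" (rather than by name) at the failing check would confirm or rule it out.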
--
This message was sent by Atlassian Jira
(v8.20.1#820001)