[
https://issues.apache.org/jira/browse/BEAM-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209880#comment-17209880
]
Valentyn Tymofieiev commented on BEAM-9154:
-------------------------------------------
Discussed this offline with [~altay], [~yifanmai], [~zhitao]. The purpose of
these tests is to identify regressions in Beam code on TFX-related scenarios.
These tests should use a stable version of TFX and Beam code at head. The TFX
version should be configurable, and it makes sense to switch to a newer
version periodically, but using the latest version of TFX is not a goal for
these benchmarks.
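One way to make the TFX version configurable could look like the sketch below. It is only an illustration: the environment variable name and the helper function are hypothetical and do not exist in Beam or TFX, and the default version is left to the caller rather than assumed here.

```python
import os

# Hypothetical variable name, used only for this sketch.
TFX_VERSION_ENV = "CHICAGO_TAXI_TFX_VERSION"


def tfx_requirement(default_version, environ=None):
    """Build a pip requirement that pins TFX to a configurable version.

    The version comes from the environment when set, otherwise from
    default_version, so the benchmark stays on a known release until
    someone deliberately moves it.
    """
    environ = os.environ if environ is None else environ
    version = environ.get(TFX_VERSION_ENV, default_version).strip()
    if not version:
        raise ValueError("TFX version must not be empty")
    return "tfx==" + version
```

Pinning with `==` rather than a lower bound keeps the TFX side fixed, so a change in benchmark results can be attributed to Beam at head.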
We should fix the current failure in these benchmarks by upgrading to a newer
version of the TFX stack. Recent TFX libraries do not set an upper bound on
Beam. Given that we will likely update TFX later and there were Py3 changes in
TFX code, I suggest trying a newer version of the TFX stack to see whether the
Py3 error is still reproducible.
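When re-running after the upgrade, a small check like the following might make the Py3 failure easier to read than the bare KeyError in the quoted traceback. This is a sketch under assumptions, not a diagnosis of the root cause: the helper names are hypothetical, and it only assumes that the inputs passed to preprocessing_fn behave like a dict keyed by feature name.

```python
def missing_feature_keys(inputs, expected_keys):
    """Return the expected keys that are absent from inputs, in order."""
    return [key for key in expected_keys if key not in inputs]


def require_feature_keys(inputs, expected_keys):
    """Raise a descriptive error listing every missing feature at once."""
    missing = missing_feature_keys(inputs, expected_keys)
    if missing:
        raise KeyError(
            "features missing from preprocessing inputs: %s; available: %s"
            % (missing, sorted(inputs)))
```

Calling `require_feature_keys(inputs, keys)` at the top of preprocessing_fn would show whether 'company' is the only absent feature or one of several, which helps distinguish a schema problem from a Py3-specific one.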
We should migrate the benchmark code to be compatible with TF 2.0. It sounds
like these benchmarks use the Chicago Taxi example. I imagine we have a newer
version of the example that is compatible with TF 2.0.
cc: [~tysonjh]
> Move Chicago Taxi Example to Python 3
> -------------------------------------
>
> Key: BEAM-9154
> URL: https://issues.apache.org/jira/browse/BEAM-9154
> Project: Beam
> Issue Type: Improvement
> Components: testing
> Reporter: Kamil Wasilewski
> Assignee: Kamil Wasilewski
> Priority: P1
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> The Chicago Taxi Example[1] should be moved to the latest version of Python
> supported by Beam (currently it's Python 3.7).
> At the moment, the following error occurs when running the benchmark on
> Python 3.7 (requires further investigation):
> {code:java}
> Traceback (most recent call last):
>   File "preprocess.py", line 259, in <module>
>     main()
>   File "preprocess.py", line 254, in main
>     project=known_args.metric_reporting_project
>   File "preprocess.py", line 155, in transform_data
>     ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 987, in __ror__
>     return self.transform.__ror__(pvalueish, self.label)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 547, in __ror__
>     result = p.apply(self, pvalueish, label)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 532, in apply
>     return self.apply(transform, pvalueish)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 573, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input, options)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 223, in apply_PTransform
>     return transform.expand(input)
>   File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 825, in expand
>     input_metadata))
>   File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 716, in expand
>     output_signature = self._preprocessing_fn(copied_inputs)
>   File "preprocess.py", line 102, in preprocessing_fn
>     _fill_in_missing(inputs[key]),
> KeyError: 'company'
> {code}
> [1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi
--
This message was sent by Atlassian Jira
(v8.3.4#803005)