[ 
https://issues.apache.org/jira/browse/BEAM-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209880#comment-17209880
 ] 

Valentyn Tymofieiev commented on BEAM-9154:
-------------------------------------------

Discussed this offline with [~altay], [~yifanmai], [~zhitao]. The purpose of 
these tests is to identify regressions in Beam code in TFX-related scenarios. 
These tests should use a stable version of TFX and Beam code at head. The TFX 
version should be configurable, and it makes sense to periodically switch to 
a newer version, but using the latest version of TFX is not a goal for these 
benchmarks.

We should fix the current failure in these benchmarks by upgrading to a newer 
version of the TFX stack. Recent TFX libraries do not set an upper bound on 
Beam. Given that we will likely update TFX later and there were Py3 changes in 
TFX code, I suggest trying a newer version of the TFX stack to see whether the 
Py3 error is reproducible. 

We should migrate the benchmark code to be compatible with TF 2.0. It sounds 
like these benchmarks use the Chicago Taxi example. I imagine we have a newer 
version of the example that is compatible with TF 2.0. 

cc: [~tysonjh]


> Move Chicago Taxi Example to Python 3
> -------------------------------------
>
>                 Key: BEAM-9154
>                 URL: https://issues.apache.org/jira/browse/BEAM-9154
>             Project: Beam
>          Issue Type: Improvement
>          Components: testing
>            Reporter: Kamil Wasilewski
>            Assignee: Kamil Wasilewski
>            Priority: P1
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The Chicago Taxi Example[1] should be moved to the latest version of Python 
> supported by Beam (currently it's Python 3.7).
> At the moment, the following error occurs when running the benchmark on 
> Python 3.7 (requires further investigation):
> {code:java}
> Traceback (most recent call last):
>   File "preprocess.py", line 259, in <module>
>     main()
>   File "preprocess.py", line 254, in main
>     project=known_args.metric_reporting_project
>   File "preprocess.py", line 155, in transform_data
>     ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 987, in __ror__
>     return self.transform.__ror__(pvalueish, self.label)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 547, in __ror__
>     result = p.apply(self, pvalueish, label)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 532, in apply
>     return self.apply(transform, pvalueish)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 573, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input, options)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 223, in apply_PTransform
>     return transform.expand(input)
>   File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 825, in expand
>     input_metadata))
>   File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 716, in expand
>     output_signature = self._preprocessing_fn(copied_inputs)
>   File "preprocess.py", line 102, in preprocessing_fn
>     _fill_in_missing(inputs[key]),
> KeyError: 'company'
> {code}
> [1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi
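
The last frame of the traceback above fails on `inputs[key]` with 
`KeyError: 'company'`. The root cause is not established in this issue; the 
sketch below only illustrates the general failure pattern, where a 
preprocessing function indexes every expected feature and one of them is 
absent from the inputs it receives. It is plain Python with no TFX or Beam 
dependency. The feature list and all function names are hypothetical and are 
not taken from preprocess.py.

```python
# Hypothetical sketch: names and feature keys are illustrative only and are
# not copied from the benchmark's preprocess.py.
EXPECTED_KEYS = ["company", "trip_miles", "fare"]  # assumed feature names


def fill_in_missing(value, default=""):
    """Stand-in for a helper that replaces a missing value with a default."""
    return default if value is None else value


def preprocessing_fn(inputs):
    """Indexes every expected key; raises KeyError if a key is absent."""
    return {key: fill_in_missing(inputs[key]) for key in EXPECTED_KEYS}


def missing_keys(inputs):
    """Lists expected keys that are absent, so the gap can be reported early."""
    return [key for key in EXPECTED_KEYS if key not in inputs]
```

Running a check like `missing_keys` on the input dictionary before the 
transform would show whether 'company' is missing from the parsed inputs 
or is lost at a later step.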



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
