Beam High Priority Issue Report (38)
This is your daily summary of Beam's current high priority issues that may need attention. See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/25675 [Bug]:
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError ContainerFetchException
https://github.com/apache/beam/issues/24367 [Bug]: workflow.tar.gz cannot be passed to flink runner
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/24267 [Failing Test]: Timeout waiting to lock gradle
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22969 Discrepancy in behavior of `DoFn.process()` when `yield` is combined with `return` statement, or vice versa
https://github.com/apache/beam/issues/22961 [Bug]: WriteToBigQuery silently skips most of records without job fail
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22115 [Bug]: apache_beam.runners.portability.portable_runner_test.PortableRunnerTestWithSubprocesses is flaky
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output to Failed Inserts PCollection
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/21121 apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey
https://github.com/apache/beam/issues/21104 Flaky: apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20689 Kafka commitOffsetsInFinalize OOM on Flink
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower in-use IP address quota footprint.
https://github.com/apache/beam/issues/19241 Python Dataflow integration tests should export the pipeline Job ID and console output to Jenkins Test Result section

P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/25412 [Feature Request]: Google Cloud Bigtable Change Stream Connector
https://github.com/apache/beam/issues/23875 [Bug]: beam.Row.__eq__ returns true for unequal rows
https://github.com/apache/beam/issues/23848 Support for Python 3.11
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/2164
Beam Dependency Check Report (2023-03-02)
Re: Dependabot questions
I do not find the dependency reports to have a useful signal-to-noise ratio, personally. I do like Dependabot, if we could make our project more legible to it, if that is an issue.

Kenn

On Tue, Feb 28, 2023 at 7:56 AM Danny McCormick via dev wrote:

> AFAIK Dependabot doesn't have a great replacement for this. I'm not sure why the dependency reports stopped, but we could probably try to fix them - looks like they stopped working in October -
> https://lists.apache.org/list?dev@beam.apache.org:2021-10:dependency%20report.
> We still have the job which generates the empty reports -
> https://github.com/apache/beam/blob/fed35133ee1cb9eb0c5ec8a1b13a7c75835a1510/.test-infra/jenkins/job_Dependency_Check.groovy#L43
>
> > Also, I noticed that some dependencies are outdated, yet not updated by Dependabot. Possibly, because a prior update PR was silenced. Is it possible to see the state of which dependencies are currently opted out?
>
> There's not an awesome view of this - looking through logs at
> https://github.com/apache/beam/network/updates/615364619 is the best I'm aware of, though it was promised a year and a half ago -
> https://github.com/dependabot/dependabot-core/issues/2255#issuecomment-838622025
>
> On Mon, Feb 27, 2023 at 8:37 PM Valentyn Tymofieiev via dev <dev@beam.apache.org> wrote:
>
>> I noticed that human-readable dependency reports are not being generated. Can this functionality be replaced with Dependabot?
>>
>> Does Dependabot provide a view of what is currently outdated from its standpoint?
>>
>> Also, I noticed that some dependencies are outdated, yet not updated by Dependabot. Possibly, because a prior update PR was silenced. Is it possible to see the state of which dependencies are currently opted out?
>>
>> Thanks!
Re: Beam SQL Alias issue while using With Clause
Hi Talat,

I managed to turn your test case into something against Calcite. It looks like there is a bug affecting tables that contain one or more single-element structs and no multi-element structs. I've sent the details to the Calcite mailing list here: https://lists.apache.org/thread/tlr9hsmx09by79h91nwp2d4nv8jfwsto

I'm experimenting with ideas on how to work around this, but a fix will likely require a Calcite upgrade, which is not something I'd have time to help with. (I'm not on the Google Beam team anymore.)

Andrew

On Wed, Feb 22, 2023 at 12:18 PM Talat Uyarer wrote:
>
> Hi @Andrew Pilloud
>
> Sorry for the late response. Yes, your test is working fine. I changed the test input structure to match our input structure. Now this test also has the same exception.
>
> Feb 21, 2023 2:02:28 PM org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner convertToBeamRel
> INFO: SQL:
> WITH `tempTable` AS (SELECT `panwRowTestTable`.`user_info`, `panwRowTestTable`.`id`, `panwRowTestTable`.`value`
> FROM `beam`.`panwRowTestTable` AS `panwRowTestTable`
> WHERE `panwRowTestTable`.`user_info`.`name` = 'innerStr') (SELECT `tempTable`.`user_info`, `tempTable`.`id`, `tempTable`.`value`
> FROM `tempTable` AS `tempTable`)
> Feb 21, 2023 2:02:28 PM org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner convertToBeamRel
> INFO: SQLPlan>
> LogicalProject(user_info=[ROW($0)], id=[$1], value=[$2])
>   LogicalFilter(condition=[=($0.name, 'innerStr')])
>     LogicalProject(name=[$0.name], id=[$1], value=[$2])
>       BeamIOSourceRel(table=[[beam, panwRowTestTable]])
>
> fieldList must not be null, type = VARCHAR
> java.lang.AssertionError: fieldList must not be null, type = VARCHAR
>
> I don't know what is different from yours. I am sharing my version of the test also.
>
> Index: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===
> diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
> --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java (revision fd383fae1adc545b6b6a22b274902cda956fec49)
> +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java (date 1677017032324)
> @@ -54,6 +54,9 @@
>    private static final Schema innerRowSchema =
>        Schema.builder().addStringField("string_field").addInt64Field("long_field").build();
>
> +  private static final Schema innerPanwRowSchema =
> +      Schema.builder().addStringField("name").build();
> +
>    private static final Schema innerRowWithArraySchema =
>        Schema.builder()
>            .addStringField("string_field")
> @@ -127,8 +130,12 @@
>                    .build()))
>            .put(
>                "basicRowTestTable",
> -              TestBoundedTable.of(FieldType.row(innerRowSchema), "col")
> -                  .addRows(Row.withSchema(innerRowSchema).addValues("innerStr", 1L).build()))
> +              TestBoundedTable.of(FieldType.row(innerRowSchema), "col", FieldType.INT64, "field")
> +                  .addRows(Row.withSchema(innerRowSchema).addValues("innerStr", 1L).build(), 1L))
> +          .put(
> +              "panwRowTestTable",
> +              TestBoundedTable.of(FieldType.row(innerPanwRowSchema), "user_info", FieldType.INT64, "id", FieldType.STRING, "value")
> +                  .addRows(Row.withSchema(innerRowSchema).addValues("name", 1L).build(), 1L, "some_value"))
>            .put(
>                "rowWithArrayTestTable",
>                TestBoundedTable.of(FieldType.row(rowWithArraySchema), "col")
> @@ -219,6 +226,21 @@
>                  .build());
>      pipeline.run().waitUntilFinish(Duration.standardMinutes(2));
>    }
> +
> +  @Test
> +  public void testBasicRowWhereField() {
> +    BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(readOnlyTableProvider);
> +    PCollection stream =
> +        BeamSqlRelUtils.toPCollection(
> +            pipeline, sqlEnv.parseQuery("WITH tempTable AS (SELECT * FROM panwRowTestTable WHERE panwRowTestTable.`user_info`.`name` = 'innerStr') SELECT * FROM tempTable"));
> +    Schema outputSchema = Schema.builder().addRowField("col", innerRowSchema).addInt64Field("field").build();
> +    PAssert.that(stream)
> +        .containsInAnyOrder(
> +            Row.withSchema(outputSchema)
> +                .addValues(Row.withSchema(innerRowSchema).addValues("name", 1L).build(), 1L)
> +                .build());
> +    pipeline.run().waitUntilFinish(Duration.standardMinutes(2));
> +  }
>
>    @Test
[VOTE] Release 2.46.0, release candidate #1
Hi everyone,

Please review and vote on release candidate #1 for the version 2.46.0, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

Reviewers are encouraged to test their own use cases with the release candidate, and vote +1 if no issues are found.

The complete staging area is available for your review, which includes:
* GitHub Release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2], which is signed with the key with fingerprint FC383FCDE7D7E86699954EF2509872C8031C4DFB [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.46.0-RC1" [5],
* website pull request listing the release [6], the blog post [6], and publishing the API reference manual [7].
* Java artifacts were built with Gradle GRADLE_VERSION and OpenJDK/Oracle JDK JDK_VERSION.
* Python artifacts are deployed along with the source release to dist.apache.org [2] and PyPI [8].
* Go artifacts and documentation are available at pkg.go.dev [9].
* Validation sheet with a tab for the 2.46.0 release to help with validation [10].
* Docker images published to Docker Hub [11].
* PR to run tests against the release branch [12].

The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes.

For guidelines on how to try the release in your projects, check out our blog post at /blog/validate-beam-release/.
Thanks,
Danny

[1] https://github.com/apache/beam/milestone/9
[2] https://dist.apache.org/repos/dist/dev/beam/2.46.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1306/
[5] https://github.com/apache/beam/tree/v2.46.0-RC1
[6] https://github.com/apache/beam/pull/25693
[7] https://github.com/apache/beam-site/pull/641
[8] https://pypi.org/project/apache-beam/2.46.0rc1/
[9] https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.46.0-RC1/go/pkg/beam
[10] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=247587190
[11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
[12] https://github.com/apache/beam/pull/25600
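For anyone validating locally, the steps in the vote email above can be sketched as a short dry-run script. This is a minimal sketch, not an official checklist: the values come from the email, but the commands themselves are illustrative assumptions, and the `echo` prefixes keep the script side-effect-free (remove them to actually run the commands).

```shell
# Dry-run sketch of validating the 2.46.0 RC1 artifacts named above.
RC_VERSION="2.46.0rc1"                                   # PyPI pre-release [8]
TAG="v2.46.0-RC1"                                        # source code tag [5]
FINGERPRINT="FC383FCDE7D7E86699954EF2509872C8031C4DFB"   # signing key [3]

# Install the Python RC and check out the tagged source (dry run).
echo "pip install apache-beam==${RC_VERSION}"
echo "git clone --branch ${TAG} https://github.com/apache/beam.git"

# Import the project KEYS file and confirm the signing key is present (dry run).
echo "curl -s https://dist.apache.org/repos/dist/release/beam/KEYS | gpg --import"
echo "gpg --fingerprint ${FINGERPRINT}"
```

Each reviewer's actual validation will differ (SDK, runner, and use case); the point is only that the tag, PyPI version, and key fingerprint above are the pieces a +1 vote should have checked.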
Re: Beam SQL Alias issue while using With Clause
Hi Andrew,

Thank you so much for your help. Sorry to hear you changed teams :( I can handle the Calcite upgrade if there is a fix. I was working on a Calcite upgrade, but then we started having so many issues; that's why I stopped doing it.

Talat

On Thu, Mar 2, 2023 at 11:56 AM Andrew Pilloud wrote:

> Hi Talat,
>
> I managed to turn your test case into something against Calcite. It
> looks like there is a bug affecting tables that contain one or more
> single element structs and no multi element structs. I've sent the
> details to the Calcite mailing list here.
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.apache.org_thread_tlr9hsmx09by79h91nwp2d4nv8jfwsto&d=DwIFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=zJaLiteP9qPsCpsYH_nZTe5CX525Dz56whg44LRafjvy3wE_-_eJrOOM9OtOuoVr&s=g36wnBGvi7DQG7gvljaG08vXIhROyCoz5vWBBRS43Ag&e=
>
> I'm experimenting with ideas on how to work around this but a fix will
> likely require a Calcite upgrade, which is not something I'd have time
> to help with. (I'm not on the Google Beam team anymore.)
>
> Andrew
>
> On Wed, Feb 22, 2023 at 12:18 PM Talat Uyarer wrote:
> >
> > Hi @Andrew Pilloud
> >
> > Sorry for the late response. Yes your test is working fine. I changed the test input structure like our input structure. Now this test also has the same exception.
> >
> > Feb 21, 2023 2:02:28 PM org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner convertToBeamRel
> > INFO: SQL:
> > WITH `tempTable` AS (SELECT `panwRowTestTable`.`user_info`, `panwRowTestTable`.`id`, `panwRowTestTable`.`value`
> > FROM `beam`.`panwRowTestTable` AS `panwRowTestTable`
> > WHERE `panwRowTestTable`.`user_info`.`name` = 'innerStr') (SELECT `tempTable`.`user_info`, `tempTable`.`id`, `tempTable`.`value`
> > FROM `tempTable` AS `tempTable`)
> > Feb 21, 2023 2:02:28 PM org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner convertToBeamRel
> > INFO: SQLPlan>
> > LogicalProject(user_info=[ROW($0)], id=[$1], value=[$2])
> >   LogicalFilter(condition=[=($0.name, 'innerStr')])
> >     LogicalProject(name=[$0.name], id=[$1], value=[$2])
> >       BeamIOSourceRel(table=[[beam, panwRowTestTable]])
> >
> > fieldList must not be null, type = VARCHAR
> > java.lang.AssertionError: fieldList must not be null, type = VARCHAR
> >
> > I dont know what is different from yours. I am sharing my version of the test also.
> >
> > Index: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
> > IDEA additional info:
> > Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> > <+>UTF-8
> > ===
> > diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
> > --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java (revision fd383fae1adc545b6b6a22b274902cda956fec49)
> > +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java (date 1677017032324)
> > @@ -54,6 +54,9 @@
> >    private static final Schema innerRowSchema =
> >        Schema.builder().addStringField("string_field").addInt64Field("long_field").build();
> >
> > +  private static final Schema innerPanwRowSchema =
> > +      Schema.builder().addStringField("name").build();
> > +
> >    private static final Schema innerRowWithArraySchema =
> >        Schema.builder()
> >            .addStringField("string_field")
> > @@ -127,8 +130,12 @@
> >                    .build()))
> >            .put(
> >                "basicRowTestTable",
> > -              TestBoundedTable.of(FieldType.row(innerRowSchema), "col")
> > -                  .addRows(Row.withSchema(innerRowSchema).addValues("innerStr", 1L).build()))
> > +              TestBoundedTable.of(FieldType.row(innerRowSchema), "col", FieldType.INT64, "field")
> > +                  .addRows(Row.withSchema(innerRowSchema).addValues("innerStr", 1L).build(), 1L))
> > +          .put(
> > +              "panwRowTestTable",
> > +              TestBoundedTable.of(FieldType.row(innerPanwRowSchema), "user_info", FieldType.INT64, "id", FieldType.STRING, "value")
> > +                  .addRows(Row.withSchema(innerRowSchema).addValues("name", 1L).build(), 1L, "some_value"))
> >            .put(
> >                "rowWithArrayTestTable",
> >                TestBoundedTable.of(FieldType.row(rowWithArraySchema), "col")
> > @@ -219,6 +226,21 @@
> >                  .build());
> >      pipeline.run().waitUntilFinish(Duration.standardMinutes(2));
> >    }
> > +
> > +  @Test
> > +  public void testBasicRowWhereField() {
> > +    BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(readOnlyTableProvider);
> > +    PCollection stream =
> > +        BeamSqlRelUtils.toPCollection(
> > +            pipeline