Beam High Priority Issue Report (38)

2023-03-02 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/25675 [Bug]: 
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24367 [Bug]: workflow.tar.gz cannot be 
passed to flink runner
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/24267 [Failing Test]: Timeout waiting to 
lock gradle
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron 
regularly failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22969 Discrepancy in behavior of 
`DoFn.process()` when `yield` is combined with `return` statement, or vice versa
https://github.com/apache/beam/issues/22961 [Bug]: WriteToBigQuery silently 
skips most of records without job fail
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22115 [Bug]: 
apache_beam.runners.portability.portable_runner_test.PortableRunnerTestWithSubprocesses
 is flaky
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not 
raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 is flaky
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with 
grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20689 Kafka commitOffsetsInFinalize OOM 
on Flink
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.
https://github.com/apache/beam/issues/19241 Python Dataflow integration tests 
should export the pipeline Job ID and console output to Jenkins Test Result 
section
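Several of the issues above hinge on subtle Python semantics. In particular, the
`DoFn.process()` discrepancy in issue 22969 is rooted in how Python itself treats
`return` inside a generator: once a function body contains a `yield`, any
`return value` no longer hands the value to the caller but only terminates
iteration. A minimal, Beam-free sketch of that behavior (the function name is
illustrative, not from the issue):

```python
def mixed_emit():
    """A generator that both yields and returns, mimicking a DoFn.process()
    body that mixes the two styles."""
    yield "emitted"
    # In a generator, `return` does NOT emit this value to the caller;
    # it only stops iteration (the value is stashed on StopIteration.value).
    return "silently dropped"

# Only the yielded element survives iteration.
print(list(mixed_emit()))  # ['emitted']
```

A DoFn author who expects the returned value to become a second output element
loses it silently, which is consistent with the discrepancy the issue title
describes.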


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/25412 [Feature Request]: Google Cloud 
Bigtable Change Stream Connector
https://github.com/apache/beam/issues/23875 [Bug]: beam.Row.__eq__ returns true 
for unequal rows
https://github.com/apache/beam/issues/23848 Support for Python 3.11
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/2164

Beam Dependency Check Report (2023-03-02)

2023-03-02 Thread Apache Jenkins Server


Re: Dependabot questions

2023-03-02 Thread Kenneth Knowles
Personally, I do not find the dependency reports to have a useful
signal-to-noise ratio. I do like Dependabot, if we can make our project more
legible to it (if that is even an issue).

Kenn
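
Making a repository "legible" to Dependabot generally comes down to a
`dependabot.yml` file under `.github/`. As a hedged sketch only (the
ecosystems, directories, and cadence below are assumptions for illustration,
not what Beam actually configures), covering the Gradle, Python, Go, and
GitHub Actions surfaces might look like:

```yaml
# .github/dependabot.yml -- illustrative sketch; directories are guesses
version: 2
updates:
  - package-ecosystem: "gradle"
    directory: "/"
    schedule:
      interval: "weekly"
  - package-ecosystem: "pip"
    directory: "/sdks/python"
    schedule:
      interval: "weekly"
  - package-ecosystem: "gomod"
    directory: "/sdks"
    schedule:
      interval: "weekly"
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
```

Each `updates` entry points Dependabot at one manifest location; a monorepo
like Beam would likely need one entry per manifest directory it should watch.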

On Tue, Feb 28, 2023 at 7:56 AM Danny McCormick via dev 
wrote:

> AFAIK Dependabot doesn't have a great replacement for this. I'm not sure
> why the dependency reports stopped, but we could probably try to fix them -
> looks like they stopped working in October -
> https://lists.apache.org/list?dev@beam.apache.org:2021-10:dependency%20report.
> We still have the job which generates the empty reports -
> https://github.com/apache/beam/blob/fed35133ee1cb9eb0c5ec8a1b13a7c75835a1510/.test-infra/jenkins/job_Dependency_Check.groovy#L43
>
> > Also, I noticed that some dependencies are outdated, yet not updated by
> Dependabot. Possibly, because a prior update PR was silenced. Is it
> possible to see the state of which dependencies are currently opted out?
>
> There's not an awesome view of this - looking through logs at
> https://github.com/apache/beam/network/updates/615364619 is the best I'm
> aware of, though a better view was promised a year and a half ago -
> https://github.com/dependabot/dependabot-core/issues/2255#issuecomment-838622025
>
> On Mon, Feb 27, 2023 at 8:37 PM Valentyn Tymofieiev via dev <
> dev@beam.apache.org> wrote:
>
>> I noticed that human-readable dependency reports are not being generated.
>> Can this functionality be replaced with Dependabot?
>>
>> Does Dependabot provide a view of what is currently outdated from its
>> standpoint?
>>
>> Also, I noticed that some dependencies are outdated, yet not updated by
>> Dependabot. Possibly, because a prior update PR was silenced. Is it
>> possible to see the state of which dependencies are currently opted out?
>>
>>
>> Thanks!
>>
>>
>>


Re: Beam SQL Alias issue while using With Clause

2023-03-02 Thread Andrew Pilloud via dev
Hi Talat,

I managed to turn your test case into something against Calcite. It
looks like there is a bug affecting tables that contain one or more
single element structs and no multi element structs. I've sent the
details to the Calcite mailing list here.
https://lists.apache.org/thread/tlr9hsmx09by79h91nwp2d4nv8jfwsto

I'm experimenting with ideas on how to work around this but a fix will
likely require a Calcite upgrade, which is not something I'd have time
to help with. (I'm not on the Google Beam team anymore.)

Andrew

On Wed, Feb 22, 2023 at 12:18 PM Talat Uyarer
 wrote:
>
> Hi @Andrew Pilloud
>
> Sorry for the late response. Yes, your test is working fine. I changed the 
> test input structure to match our input structure. Now this test also has 
> the same exception.
>
> Feb 21, 2023 2:02:28 PM 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner convertToBeamRel
> INFO: SQL:
> WITH `tempTable` AS (SELECT `panwRowTestTable`.`user_info`, 
> `panwRowTestTable`.`id`, `panwRowTestTable`.`value`
> FROM `beam`.`panwRowTestTable` AS `panwRowTestTable`
> WHERE `panwRowTestTable`.`user_info`.`name` = 'innerStr') (SELECT 
> `tempTable`.`user_info`, `tempTable`.`id`, `tempTable`.`value`
> FROM `tempTable` AS `tempTable`)
> Feb 21, 2023 2:02:28 PM 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner convertToBeamRel
> INFO: SQLPlan>
> LogicalProject(user_info=[ROW($0)], id=[$1], value=[$2])
>   LogicalFilter(condition=[=($0.name, 'innerStr')])
> LogicalProject(name=[$0.name], id=[$1], value=[$2])
>   BeamIOSourceRel(table=[[beam, panwRowTestTable]])
>
>
> fieldList must not be null, type = VARCHAR
> java.lang.AssertionError: fieldList must not be null, type = VARCHAR
>
> I don't know what is different from yours. I am sharing my version of the 
> test as well.
>
>
> Index: 
> sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===
> diff --git 
> a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
>  
> b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
> --- 
> a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
>  (revision fd383fae1adc545b6b6a22b274902cda956fec49)
> +++ 
> b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
>  (date 1677017032324)
> @@ -54,6 +54,9 @@
>private static final Schema innerRowSchema =
>
> Schema.builder().addStringField("string_field").addInt64Field("long_field").build();
>
> +  private static final Schema innerPanwRowSchema =
> +  Schema.builder().addStringField("name").build();
> +
>private static final Schema innerRowWithArraySchema =
>Schema.builder()
>.addStringField("string_field")
> @@ -127,8 +130,12 @@
>.build()))
>.put(
>"basicRowTestTable",
> -  TestBoundedTable.of(FieldType.row(innerRowSchema), "col")
> -  
> .addRows(Row.withSchema(innerRowSchema).addValues("innerStr", 1L).build()))
> +  TestBoundedTable.of(FieldType.row(innerRowSchema), "col", 
> FieldType.INT64, "field")
> +  
> .addRows(Row.withSchema(innerRowSchema).addValues("innerStr", 1L).build(), 
> 1L))
> +.put(
> +  "panwRowTestTable",
> +  TestBoundedTable.of(FieldType.row(innerPanwRowSchema), 
> "user_info", FieldType.INT64, "id", FieldType.STRING, "value")
> +  
> .addRows(Row.withSchema(innerRowSchema).addValues("name", 1L).build(), 1L, 
> "some_value"))
>.put(
>"rowWithArrayTestTable",
>TestBoundedTable.of(FieldType.row(rowWithArraySchema), 
> "col")
> @@ -219,6 +226,21 @@
>  .build());
>  pipeline.run().waitUntilFinish(Duration.standardMinutes(2));
>}
> +
> +  @Test
> +  public void testBasicRowWhereField() {
> +BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(readOnlyTableProvider);
> +PCollection stream =
> +BeamSqlRelUtils.toPCollection(
> +pipeline, sqlEnv.parseQuery("WITH tempTable AS (SELECT * FROM 
> panwRowTestTable WHERE panwRowTestTable.`user_info`.`name` = 'innerStr') 
> SELECT * FROM tempTable"));
> +Schema outputSchema = Schema.builder().addRowField("col", 
> innerRowSchema).addInt64Field("field").build();
> +PAssert.that(stream)
> +.containsInAnyOrder(
> +Row.withSchema(outputSchema)
> +.addValues(Row.withSchema(innerRowSchema).addValues("name", 
> 1L).build(), 1L)
> +.build());
> +pipeline.run().waitUntilFinish(Duration.standardMinutes(2));
> +  }
>
>@Test

[VOTE] Release 2.46.0, release candidate #1

2023-03-02 Thread Danny McCormick via dev
Hi everyone,
Please review and vote on release candidate #1 for the version 2.46.0, as
follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

Reviewers are encouraged to test their own use cases with the release
candidate, and vote +1 if no issues are found. The complete staging area is
available for your review, which includes:

* GitHub Release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
  which is signed with the key with fingerprint
  FC383FCDE7D7E86699954EF2509872C8031C4DFB [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.46.0-RC1" [5],
* website pull request listing the release [6], the blog post [6], and
  publishing the API reference manual [7],
* Java artifacts were built with Gradle GRADLE_VERSION and OpenJDK/Oracle
  JDK JDK_VERSION,
* Python artifacts are deployed along with the source release to
  dist.apache.org [2] and PyPI [8],
* Go artifacts and documentation are available at pkg.go.dev [9],
* Validation sheet with a tab for the 2.46.0 release to help with
  validation [10],
* Docker images published to Docker Hub [11],
* PR to run tests against the release branch [12].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

For guidelines on how to try the release in your projects, check out our
blog post at /blog/validate-beam-release/.

Thanks,
Danny

[1] https://github.com/apache/beam/milestone/9
[2] https://dist.apache.org/repos/dist/dev/beam/2.46.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1306/
[5] https://github.com/apache/beam/tree/v2.46.0-RC1
[6] https://github.com/apache/beam/pull/25693
[7] https://github.com/apache/beam-site/pull/641
[8] https://pypi.org/project/apache-beam/2.46.0rc1/
[9] https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.46.0-RC1/go/pkg/beam
[10] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=247587190
[11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
[12] https://github.com/apache/beam/pull/25600


Re: Beam SQL Alias issue while using With Clause

2023-03-02 Thread Talat Uyarer via dev
Hi Andrew,

Thank you so much for your help. Sorry to hear you changed teams :(  I can
handle Calcite upgrades if there is a fix. I was working on a Calcite upgrade,
but then we started having so many issues; that's why I stopped doing it.

Talat

On Thu, Mar 2, 2023 at 11:56 AM Andrew Pilloud  wrote:

> Hi Talat,
>
> I managed to turn your test case into something against Calcite. It
> looks like there is a bug affecting tables that contain one or more
> single element structs and no multi element structs. I've sent the
> details to the Calcite mailing list here.
>
> https://lists.apache.org/thread/tlr9hsmx09by79h91nwp2d4nv8jfwsto
>
> I'm experimenting with ideas on how to work around this but a fix will
> likely require a Calcite upgrade, which is not something I'd have time
> to help with. (I'm not on the Google Beam team anymore.)
>
> Andrew
>
> On Wed, Feb 22, 2023 at 12:18 PM Talat Uyarer
>  wrote:
> >
> > Hi @Andrew Pilloud
> >
> > Sorry for the late response. Yes, your test is working fine. I changed
> > the test input structure to match our input structure. Now this test also
> > has the same exception.
> >
> > Feb 21, 2023 2:02:28 PM
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner convertToBeamRel
> > INFO: SQL:
> > WITH `tempTable` AS (SELECT `panwRowTestTable`.`user_info`,
> `panwRowTestTable`.`id`, `panwRowTestTable`.`value`
> > FROM `beam`.`panwRowTestTable` AS `panwRowTestTable`
> > WHERE `panwRowTestTable`.`user_info`.`name` = 'innerStr') (SELECT
> `tempTable`.`user_info`, `tempTable`.`id`, `tempTable`.`value`
> > FROM `tempTable` AS `tempTable`)
> > Feb 21, 2023 2:02:28 PM
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner convertToBeamRel
> > INFO: SQLPlan>
> > LogicalProject(user_info=[ROW($0)], id=[$1], value=[$2])
> >   LogicalFilter(condition=[=($0.name, 'innerStr')])
> > LogicalProject(name=[$0.name], id=[$1], value=[$2])
> >   BeamIOSourceRel(table=[[beam, panwRowTestTable]])
> >
> >
> > fieldList must not be null, type = VARCHAR
> > java.lang.AssertionError: fieldList must not be null, type = VARCHAR
> >
> > I don't know what is different from yours. I am sharing my version of the
> > test as well.
> >
> >
> > Index:
> sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
> > IDEA additional info:
> > Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> > <+>UTF-8
> > ===
> > diff --git
> a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
> b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
> > ---
> a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
> (revision fd383fae1adc545b6b6a22b274902cda956fec49)
> > +++
> b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
> (date 1677017032324)
> > @@ -54,6 +54,9 @@
> >private static final Schema innerRowSchema =
> >
> Schema.builder().addStringField("string_field").addInt64Field("long_field").build();
> >
> > +  private static final Schema innerPanwRowSchema =
> > +  Schema.builder().addStringField("name").build();
> > +
> >private static final Schema innerRowWithArraySchema =
> >Schema.builder()
> >.addStringField("string_field")
> > @@ -127,8 +130,12 @@
> >.build()))
> >.put(
> >"basicRowTestTable",
> > -  TestBoundedTable.of(FieldType.row(innerRowSchema),
> "col")
> > -
> .addRows(Row.withSchema(innerRowSchema).addValues("innerStr", 1L).build()))
> > +  TestBoundedTable.of(FieldType.row(innerRowSchema),
> "col", FieldType.INT64, "field")
> > +
> .addRows(Row.withSchema(innerRowSchema).addValues("innerStr", 1L).build(),
> 1L))
> > +.put(
> > +  "panwRowTestTable",
> > +
> TestBoundedTable.of(FieldType.row(innerPanwRowSchema), "user_info",
> FieldType.INT64, "id", FieldType.STRING, "value")
> > +
> .addRows(Row.withSchema(innerRowSchema).addValues("name", 1L).build(), 1L,
> "some_value"))
> >.put(
> >"rowWithArrayTestTable",
> >
> TestBoundedTable.of(FieldType.row(rowWithArraySchema), "col")
> > @@ -219,6 +226,21 @@
> >  .build());
> >  pipeline.run().waitUntilFinish(Duration.standardMinutes(2));
> >}
> > +
> > +  @Test
> > +  public void testBasicRowWhereField() {
> > +BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(readOnlyTableProvider);
> > +PCollection stream =
> > +BeamSqlRelUtils.toPCollection(
> > +pipeline