Great work, Vitaly and your team! Thanks a lot!

On Fri, Apr 11, 2025 at 9:48 AM Vitaly Terentyev via dev <dev@beam.apache.org> wrote:
> Dear Community,
>
> March was a dynamic month for Beam Infrastructure & Health. We began and
> ended the month with a solid health level of 98.38%, but encountered two
> temporary dips due to a combination of emerging issues and system-level
> changes.
>
> Health Trends and Incident Analysis:
>
>    - The first drop was linked to scattered failures across multiple areas
>      of the codebase, including Python, Java, Go, and both Flink and Spark
>      runners. These issues were quickly triaged and mitigated.
>    - The second drop occurred due to a method signature change that
>      required an update to the Dataflow Java container version, alongside a
>      group of failing XVR workflows caused by an integer overflow during
>      varint32 encoding. These were promptly resolved.
>
> Thanks to rapid resolution efforts, the system health recovered to 98.38%
> by the end of March. Please see the attached chart for March's Health
> Status trends.
>
> Key Improvements:
>
>    - Flaky Test Fixes:
>       - PostCommit and PreCommit jobs across Java, Python, SQL, and Go.
>       - XVR workflows and other runner-related jobs.
>       - You can find the full list of 21 closed or fixed issues here
>         <https://github.com/apache/beam/issues?q=is%3Aissue%20state%3Aclosed%20label%3Aflaky_test%20closed%3A%3E2025-03-01%20%20closed%3A%3C2025-03-31%20(involves%3AAmar3tto%20OR%20involves%3Aakashorabek)%20>.
>    - Performance Metrics Update:
>       - Added Performance Metrics for Python ML pipelines.
>       - Updated Performance Metrics graphs on the Beam website
>         <https://beam.apache.org/performance/> using Looker-generated
>         images up to Beam 2.64.0.
>
> Currently failing workflows
>
>    - Core Infrastructure (1)
>       - Publish Beam SDK Snapshots
>         <https://github.com/apache/beam/issues/32161>
>    - Important Signals (2)
>       - PostCommit Python Arm <https://github.com/apache/beam/issues/30760>
>       - PostCommit Python <https://github.com/apache/beam/issues/30513>
>    - Dataflow Java Tests (1)
>       - PostCommit XVR GoUsingJava Dataflow
>         <https://github.com/apache/beam/issues/30519>
>    - Python Runners Tests (1)
>       - Python ValidatesContainer Dataflow ARM
>         <https://github.com/apache/beam/issues/33065>
>    - Misc Tests (2)
>       - IcebergIO Integration Tests
>         <https://github.com/apache/beam/issues/31931>
>       - PostCommit XVR Flink <https://github.com/apache/beam/issues/31418>
>
> Ongoing and Future Work
>
>    - Continue stabilizing newly emerging issues, with particular attention
>      to Python-related workflows.
>    - Investigate and fix instability in the IcebergIO Integration Tests
>      workflow.
>    - Maintain high visibility of flaky and infra issues via our Health
>      Dashboard.
>
> As always, if you notice infrastructure-related issues, feel free to open
> a GitHub issue with the label “infra”
> <https://github.com/apache/beam/issues?q=is%3Aissue%20state%3Aopen%20label%3Ainfra>,
> and our team will triage and handle it.
>
> Your engagement makes a big difference and is always welcome.
>
> Best regards,
> Vitaly Terentyev
> Akvelon Inc.
> Apache Beam Infrastructure Team