Great work, Vitaly and your team! Thanks a lot!

On Fri, Apr 11, 2025 at 9:48 AM Vitaly Terentyev via dev <
dev@beam.apache.org> wrote:

> Dear Community,
>
> March was a dynamic month for Beam Infrastructure & Health. We began and
> ended the month with a solid health level of 98.38%, but encountered two
> temporary dips due to a combination of emerging issues and system-level
> changes.
>
> Health Trends and Incident Analysis:
>
>    - The first drop was linked to scattered failures across multiple areas
>      of the codebase, including Python, Java, Go, and both Flink and Spark
>      runners. These issues were quickly triaged and mitigated.
>    - The second drop occurred due to a method signature change that
>      required an update to the Dataflow Java container version, alongside a
>      group of failing XVR workflows caused by an integer overflow during
>      varint32 encoding. These were promptly resolved.
>
> Thanks to rapid resolution efforts, the system health recovered to 98.38%
> by the end of March. Please see the attached chart for March's Health
> Status trends.
>
> Key Improvements:
>
>    - Flaky Test Fixes:
>       - PostCommit and PreCommit jobs across Java, Python, SQL, and Go.
>       - XVR workflows and other runner-related jobs.
>       - The full list of the 21 closed or fixed issues is available here:
>         <https://github.com/apache/beam/issues?q=is%3Aissue%20state%3Aclosed%20label%3Aflaky_test%20closed%3A%3E2025-03-01%20%20closed%3A%3C2025-03-31%20(involves%3AAmar3tto%20OR%20involves%3Aakashorabek)%20>
>    - Performance Metrics Update:
>       - Added Performance Metrics for Python ML pipelines.
>       - Updated Performance Metrics graphs on the Beam website
>         <https://beam.apache.org/performance/> using Looker-generated
>         images up to Beam 2.64.0.
>
> Currently failing workflows
>
>    - Core Infrastructure (1)
>       - Publish Beam SDK Snapshots
>         <https://github.com/apache/beam/issues/32161>
>    - Important Signals (2)
>       - PostCommit Python Arm <https://github.com/apache/beam/issues/30760>
>       - PostCommit Python <https://github.com/apache/beam/issues/30513>
>    - Dataflow Java Tests (1)
>       - PostCommit XVR GoUsingJava Dataflow
>         <https://github.com/apache/beam/issues/30519>
>    - Python Runners Tests (1)
>       - Python ValidatesContainer Dataflow ARM
>         <https://github.com/apache/beam/issues/33065>
>    - Misc Tests (2)
>       - IcebergIO Integration Tests
>         <https://github.com/apache/beam/issues/31931>
>       - PostCommit XVR Flink <https://github.com/apache/beam/issues/31418>
>
> Ongoing and Future Work
>
>    - Continue stabilizing newly emerging issues, with particular attention
>      to Python-related workflows.
>    - Investigate and fix instability in IcebergIO Integration Tests
>      workflow.
>    - Maintain high visibility of flaky and infra issues via our Health
>      Dashboard.
>
> As always, if you notice infrastructure-related issues, feel free to open
> a GitHub issue with the label “infra
> <https://github.com/apache/beam/issues?q=is%3Aissue%20state%3Aopen%20label%3Ainfra>”,
> and our team will triage and handle it.
>
> Your engagement makes a big difference — and is always welcome.
> Best regards,
> Vitaly Terentyev
> Akvelon Inc.
> Apache Beam Infrastructure Team
>
>
