Dear Pulsar community members,

Here's a report of the flaky tests in Pulsar CI during the observation
period of 2022-05-26 to 2022-06-02.
The full report is available as a Google Sheet,
https://docs.google.com/spreadsheets/d/165FHpHjs5fHccSsmQM4beeg6brn-zfUjcrXf6xAu4yQ/edit?usp=sharing

The report contains a subset of the test failures.
The flaky tests are observed from builds of merged PRs.
The GitHub Actions logs are checked only for builds where the SHA of the
head of the PR matches the SHA that got merged.
This ensures that all exceptions found are real flakes: the failing build
ran on exactly the code that was merged, so no changes were made to the PR
to make the tests pass later.
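
To illustrate the selection rule above, here is a minimal Java sketch. It is
not the tooling used to produce this report. The Build class, its fields, the
mergedShaByPr map, and the sample SHAs and test names are all hypothetical and
exist only for the example. It counts a failure only when the build ran on the
same head SHA that was later merged.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

class FlakeCounter {

    /** One CI build of a PR (hypothetical shape, for illustration only). */
    static final class Build {
        final int prNumber;
        final String headSha;
        final List<String> failedTests;

        Build(int prNumber, String headSha, List<String> failedTests) {
            this.prNumber = prNumber;
            this.headSha = headSha;
            this.failedTests = failedTests;
        }
    }

    /**
     * Counts, per test method, the builds that failed on exactly the commit
     * that was later merged. Builds of other commits are skipped, because a
     * later push to the PR might have fixed a genuine failure.
     */
    static Map<String, Integer> countFlakes(List<Build> builds,
                                            Map<Integer, String> mergedShaByPr) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Build build : builds) {
            String mergedSha = mergedShaByPr.get(build.prNumber);
            if (mergedSha == null || !mergedSha.equals(build.headSha)) {
                continue; // PR not merged, or build ran on a different commit
            }
            // Count each test at most once per build.
            for (String test : new LinkedHashSet<>(build.failedTests)) {
                counts.merge(test, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data: PR 1 was merged at SHA "abc", PR 2 at "def".
        Map<Integer, String> mergedShaByPr = Map.of(1, "abc", 2, "def");
        List<Build> builds = List.of(
                new Build(1, "abc", List.of("ExampleTest.testA")),
                new Build(1, "abc", List.of("ExampleTest.testA", "ExampleTest.testB")),
                new Build(1, "old", List.of("ExampleTest.testC")),  // earlier commit, ignored
                new Build(2, "xyz", List.of("ExampleTest.testB"))); // not the merged SHA, ignored
        countFlakes(builds, mergedShaByPr)
                .forEach((test, count) -> System.out.println(count + "  " + test));
    }
}
```

A failure on an earlier commit of the PR is ignored on purpose, since it cannot
be told apart from a real bug that a later push fixed.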

Here are the most flaky test methods:
Build failures  Test method name
            32  org.apache.pulsar.tests.integration.functions.java.PulsarFunctionsJavaThreadTest.testJavaLoggingFunction
            18  org.apache.pulsar.client.impl.MessageImplTest.testMessageBrokerAndEntryMetadataTimestampMissed
            12  org.apache.pulsar.client.impl.MultiTopicsConsumerImplTest.testParallelSubscribeAsync
            10  org.apache.pulsar.functions.worker.PulsarFunctionTlsTest.tearDown
             8  org.apache.pulsar.client.impl.BinaryProtoLookupServiceTest.maxLookupRedirectsTest1
             8  org.apache.pulsar.tests.integration.functions.python.PulsarFunctionsPythonProcessTest.testPythonExclamationFunction
             7  org.apache.pulsar.broker.admin.PersistentTopicsTest.testTriggerCompactionTopic
             6  org.apache.pulsar.broker.service.RackAwareTest.testRackUpdate
             5  org.apache.pulsar.metadata.bookkeeper.PulsarLedgerIdGeneratorTest.testGenerateLedgerId
             5  org.apache.pulsar.broker.service.RackAwareTest.testPlacement
             5  org.apache.pulsar.client.impl.ClientCnxTest.testClientCnxTimeout
             5  org.apache.pulsar.tests.integration.functions.PulsarStateTest.testPythonWordCountFunction
             4  org.apache.pulsar.metadata.ZKSessionTest.testSessionLost
             4  org.apache.pulsar.tests.integration.io.sources.debezium.PulsarDebeziumOracleSourceTest.testDebeziumOracleDbSource
             4  org.apache.pulsar.client.impl.ConnectionTimeoutTest.testLowTimeout
             3  org.apache.pulsar.client.impl.ProducerCloseTest.brokerCloseTopicTest

Markdown formatted summary reports for each test class can be accessed at
https://github.com/lhotari/pulsar-flakes/tree/master/2022-05-26-to-2022-06-02
The summary report links are now available in the Google Sheet,
https://docs.google.com/spreadsheets/d/165FHpHjs5fHccSsmQM4beeg6brn-zfUjcrXf6xAu4yQ/edit?usp=sharing

We need more help in addressing the flaky tests. Please join the efforts
so that we can get CI to a more stable state.

To coordinate the work,
1) Please search for an existing issue, or browse all flaky test issues,
using "flaky" or the test class name (without package) as the search term:
https://github.com/apache/pulsar/issues?q=is%3Aopen+flaky+sort%3Aupdated-desc
2) If there isn't an issue for a particular flaky test failure that you'd
like to fix, please create an issue using the "Flaky test" template at
https://github.com/apache/pulsar/issues/new/choose
3) Please comment on the issue to say that you are working on it.

We have a few active contributors working on the flaky tests. Thanks for
the contributions.

I'm looking forward to more contributors joining the efforts. Please join
the #testing channel on Slack if you'd like to ask questions or get tips
about reproducing flaky tests locally and fixing them.
Sharing stories about fixing flaky tests is also a valuable way to
contribute, since it spreads knowledge about how flaky tests can be fixed.
Some flaky tests might reveal real bugs in production code, so fixing
a flaky test might also fix a production bug.

Current contributors, please keep up the good work!
New contributors, you are welcome to join the efforts! You will learn about
Pulsar and its internals as a side effect. If you'd love to learn Pulsar
internals and Pulsar OSS development, start by fixing flaky tests. :)

BR, -Lari
