Dear Pulsar community members,

Here's a report of the flaky tests in Pulsar CI during the observation period of 2021-10-04 - 2021-11-01. The full report is also available in a Google Sheet:
https://docs.google.com/spreadsheets/d/165FHpHjs5fHccSsmQM4beeg6brn-zfUjcrXf6xAu4yQ/edit?usp=sharing

Test method name    Number of build failures due to this test
org.apache.pulsar.broker.service.ReplicatorTest.testTopicReplicatedAndProducerCreate    56
org.apache.pulsar.broker.service.ReplicatorRateLimiterTest.testReplicatorRateLimiterDynamicallyChange    50
org.apache.pulsar.broker.service.persistent.SimpleProducerConsumerTestStreamingDispatcherTest.testConcurrentConsumerReceiveWhileReconnect    48
org.apache.pulsar.broker.admin.TopicPoliciesTest.testDisableSubscribeRate    45
org.apache.pulsar.broker.admin.TopicPoliciesTest.testGetDispatchRateApplied    38
org.apache.pulsar.client.api.KeySharedSubscriptionTest.testRemoveFirstConsumer    29
org.apache.pulsar.functions.source.batch.BatchSourceExecutorTest.testLifeCycle    28
org.apache.pulsar.broker.admin.TopicPoliciesTest.testGetSetSubscribeRate    23
org.apache.pulsar.broker.admin.TopicPoliciesTest.testGetSubscribeRateApplied    22
org.apache.pulsar.broker.service.PersistentTopicE2ETest.testBrokerConnectionStats    22
org.apache.pulsar.functions.source.batch.BatchSourceExecutorTest.testPushLifeCycle    22
org.apache.pulsar.broker.stats.PrometheusMetricsTest.testCompaction    18
org.apache.pulsar.broker.service.persistent.PersistentTopicStreamingDispatcherE2ETest.testBrokerConnectionStats    18
org.apache.pulsar.metadata.LocalMemoryMetadataStoreTest.testSharedInstance    16
org.apache.pulsar.broker.service.persistent.PersistentSubscriptionMessageDispatchStreamingDispatcherThrottlingTest.testRelativeMessageRateLimitingThrottling    14
org.apache.pulsar.tests.integration.io.sources.debezium.PulsarDebeziumSourcesTest.testDebeziumMsSqlSource    12
org.apache.pulsar.io.elasticsearch.ElasticSearchClientTests.testBulkRetry    12
org.apache.pulsar.tests.integration.functions.java.PulsarFunctionsJavaProcessTest.testJavaExclamationFunction    8
org.apache.pulsar.tests.integration.transaction.TransactionTest.transferNormalTest    8
org.apache.pulsar.metadata.ZKSessionTest.testReacquireLeadershipAfterSessionLost    7
org.apache.pulsar.broker.service.ReplicatorTest.testConcurrentReplicator    7
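
Several of the methods listed above belong to the same test class, so it can help to group the counts by class when deciding where to start. Below is a minimal sketch in Python, assuming the rows have been copied by hand into a list; the `rows` variable and the `failures_by_class` helper are hypothetical names used only for illustration, and only a few rows are included. Note that the sums count method failures, not distinct builds, since a single build may fail on more than one method of the same class.

```python
from collections import Counter

# A few rows copied from the report above:
# (fully qualified test method name, number of build failures).
rows = [
    ("org.apache.pulsar.broker.service.ReplicatorTest.testTopicReplicatedAndProducerCreate", 56),
    ("org.apache.pulsar.broker.admin.TopicPoliciesTest.testDisableSubscribeRate", 45),
    ("org.apache.pulsar.broker.admin.TopicPoliciesTest.testGetDispatchRateApplied", 38),
    ("org.apache.pulsar.broker.admin.TopicPoliciesTest.testGetSetSubscribeRate", 23),
    ("org.apache.pulsar.broker.admin.TopicPoliciesTest.testGetSubscribeRateApplied", 22),
    ("org.apache.pulsar.broker.service.ReplicatorTest.testConcurrentReplicator", 7),
]


def failures_by_class(rows):
    """Sum the per-method failure counts for each test class (name without package)."""
    totals = Counter()
    for name, count in rows:
        # "pkg.Class.method" -> ["pkg", "Class", "method"]; keep the class part.
        simple_class = name.rsplit(".", 2)[1]
        totals[simple_class] += count
    return totals


totals = failures_by_class(rows)
for cls, n in totals.most_common():
    print(cls, n)
```

With the rows above, TopicPoliciesTest and ReplicatorTest come out as the two classes with the most listed failures, which is also the class name to use when searching for an existing issue.
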

Markdown formatted summary reports for each test class can be accessed at
https://github.com/lhotari/pulsar-flakes/tree/master/2021-10-04-to-2021-11-01
(type 't' to activate search by filename in the GitHub UI, then type the test class name to find the report)

Some of the flaky tests seen in the above report have already been fixed. One such test is LocalMemoryMetadataStoreTest.testSharedInstance (fixed by https://github.com/apache/pulsar/pull/12540).

The test report doesn't capture all test failures, and there might also be other flakiness issues, such as problems where the test environment runs out of memory. One such issue was recently fixed by https://github.com/apache/pulsar/pull/12547.

*We need more help in addressing the flaky tests. Please join the efforts so that we can get CI to a more stable state.*

To coordinate the work,

1) Please search for an existing issue, or browse all flaky test issues, using "flaky" or the test class name (without package) as the search term:
https://github.com/apache/pulsar/issues?q=is%3Aopen+flaky+sort%3Aupdated-desc
2) If there isn't an issue for a particular flaky test failure that you'd like to fix, please create an issue using the "Flaky test" template at https://github.com/apache/pulsar/issues/new/choose
3) Please comment on the issue that you are working on it.

We have a few active contributors working on the flaky tests; thanks for the contributions. I'm looking forward to more contributors joining the efforts. Please join the #testing channel on Slack if you'd like to ask questions or get tips about reproducing flaky tests locally and fixing them.

Sharing stories about fixing flaky tests also helps spread the knowledge of how flaky tests can be fixed. That's a valuable way to contribute too.

*Some of the flaky tests might be caused by real production code bugs. Fixing the flaky test might result in fixing a real production code bug.*

*Current contributors, please keep up the good work!*

*New contributors, you are welcome to join the efforts! You will learn about Pulsar and its internals as a side effect.*

*If you'd love to learn Pulsar internals and Pulsar OSS development, start by fixing flaky tests.*

BR,
-Lari