Dear Pulsar community members,

Thanks for picking up the work so quickly! I noticed that at least Renkai and Michael already pushed pull requests to fix the flaky tests that were mentioned in the previous email. Some of the PRs have already been merged.

Here are 3 more flaky tests with links to a lot of example failures:
https://github.com/apache/pulsar/issues/9407
https://github.com/apache/pulsar/issues/9408
https://github.com/apache/pulsar/issues/9409

I'll report more flaky tests tomorrow. Today I was working on some tooling to mine the logs and gather some statistics. I parsed the logs of the last few days, and these are the test methods that fail the most (failure count, then method):

273 org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete
102 org.apache.pulsar.compaction.CompactionTest.cleanup
81 org.apache.pulsar.admin.cli.PulsarAdminToolTest.topics
51 org.apache.pulsar.broker.loadbalance.LoadBalancerTest.testLeaderElection
45 org.apache.pulsar.io.PulsarFunctionE2ETest.shutdown
40 org.apache.pulsar.broker.service.ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions
36 org.apache.pulsar.websocket.proxy.ProxyPublishConsumeTest.cleanup
30 org.apache.pulsar.functions.worker.PulsarFunctionLocalRunTest.shutdown
30 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationTopicPatternFunction
29 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationFunction
27 org.apache.pulsar.client.api.v1.V1_ProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
26 org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3
22 org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
22 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaLoggingFunction
21 org.apache.pulsar.tests.integration.SmokeTest.setup
20 org.apache.pulsar.client.impl.MessageIdTest.testChecksumReconnection
20 org.apache.pulsar.client.impl.MessageIdTest.testChecksumVersionComptability
19 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionLocalRun
19 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAutoSchemaFunction
14 org.apache.pulsar.client.api.SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
14 org.apache.pulsar.broker.service.MessagePublishBufferThrottleTest.testBlockByPublishRateLimiting
14 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testSlidingCountWindowTest
13 org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2.testResetCursorCompatibility
12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonPublishFunction
12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationTopicPatternFunction
12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunctionWithExtraDeps
12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationZipFunction
12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionNegAck
12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunction
12 org.apache.pulsar.compaction.CompactorTest.cleanup
12 org.apache.pulsar.broker.service.BrokerServiceAutoSubscriptionCreationTest.cleanupTest
12 org.apache.pulsar.websocket.proxy.ProxyAuthenticationTest.cleanup
12 org.apache.pulsar.websocket.proxy.v1.V1_ProxyAuthenticationTest.cleanup
12 org.apache.pulsar.client.impl.BatchMessageIndexAckTest.testBatchMessageIndexAckForSharedSubscription
11 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaPublishFunction
11 org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup

I'll report more flaky tests after I have checked that my tooling is producing correct results.
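
For anyone who wants to experiment with similar statistics, here is a minimal Python sketch of this kind of log mining. It is not the tooling mentioned above, only an illustration. It assumes that failures show up in the build log as Surefire-style lines such as "[ERROR] testFoo(org.example.FooTest)  Time elapsed: 1.2 s  <<< FAILURE!"; that line shape, the regular expression, and the names count_failures and format_report are all assumptions made for this example and would need to be adapted to the real log output.

```python
import re
from collections import Counter

# Assumed line shape (hypothetical, not taken from the actual Pulsar logs):
#   [ERROR] testFoo(org.example.FooTest)  Time elapsed: 1.2 s  <<< FAILURE!
FAILURE_LINE = re.compile(
    r"^\[ERROR\]\s+(?P<method>[\w$]+)\((?P<cls>[\w.$]+)\).*<<<\s+(?:FAILURE|ERROR)!"
)


def count_failures(lines):
    """Count failures per fully qualified test method in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = FAILURE_LINE.match(line)
        if match:
            counts[f"{match['cls']}.{match['method']}"] += 1
    return counts


def format_report(counts, limit=None):
    """Return 'count name' strings, highest failure count first."""
    return [f"{n} {name}" for name, n in counts.most_common(limit)]
```

Feeding the lines of every downloaded log file into count_failures and printing format_report(counts) gives a list in the same "count, then method" shape as the one above.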

To help fix flaky tests, please pick one from the reported ones:
https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen

We can all join the #testing channel on Pulsar Slack to share detailed tips and tricks while working on fixing flaky tests.

See you,

BR,
Lari

On Fri, Jan 29, 2021 at 8:26 PM Lari Hotari <lari.hot...@sagire.fi> wrote:

> Dear Pulsar community members,
>
> In order to improve our CI, we will have to fix the flaky tests. In some
> cases it might be necessary to replace an existing test with a redesigned
> test.
>
> The draft PIP "Changes to flaky test handling" document
> <https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing>
> lists
> the top 10 flaky tests. A lot of them have already been addressed by pull
> requests in the past week or so.
>
> This is the list of recent PRs that fix flaky tests from the top 10 flaky
> tests list:
> https://github.com/apache/pulsar/pull/9286
> https://github.com/apache/pulsar/pull/9243
> https://github.com/apache/pulsar/pull/9258
> https://github.com/apache/pulsar/pull/9356
>
> These are the GH issues for the remaining ones in the top 10 flaky tests
> list:
> https://github.com/apache/pulsar/issues/6368
> https://github.com/apache/pulsar/issues/9369
> https://github.com/apache/pulsar/issues/9368
>
> If you would like to help to fix flaky tests you can pick one of the open
> issues above. Just add a comment on the issue when you start working on it
> so that we can coordinate activities.
>
> It is also helpful to report a flaky test when you encounter one. I've
> been using this type of template for reporting a flaky test:
> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 . The
> issues #9368 and #9369 have been reported using this template.
> Search for the test name before reporting so that we don't end up with
> duplicates.
>
> The issues #6368, #9369 and #9368 are the 3 next important issues to fix.
> I'm planning to create a more extensive list of the flaky failures so that
> we can target the most flaky ones when we continue fixing the flaky tests.
> I have some scripts in development to assist in mining the Pulsar Github
> Action workflow run logs.
>
> This is a search to find flaky issues in Pulsar GH issues:
>
> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>
> Looking forward to the contributions for fixing flaky tests,
>
> BR,
>
> Lari
>