The good progress continues! One way to see the issue and PR activity where "flaky" is mentioned:
https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc

Thank you to the contributors and PR reviewers!

Here's the next flaky test for someone to fix:
https://github.com/apache/pulsar/issues/6646
(reported a long time ago; I added some examples of recent failures)

It's about PulsarFunctionsTest. This test class contributes to a lot of failures. I have uploaded a list of failures to
https://gist.github.com/lhotari/9bae3e16674c297a6bbc2b4831515a74 .
I haven't validated that all failures are from flaky test runs. It's possible that some are from a build which broke the test.

1) Who could pick up fixing the multiple issues in PulsarFunctionsTest, https://github.com/apache/pulsar/issues/6646 ? You can comment directly on issue #6646 and start working on it if you wish. It would be a really important fix to have.

2) Another one: https://github.com/apache/pulsar/issues/9431

3) The 3rd one might be a quick fix, it's an NPE in cleanup: https://github.com/apache/pulsar/issues/9432

I'm looking forward to the sprint continuing. It seems that the issues get fixed sooner than I can report more of them. :)

BR, Lari

On Mon, Feb 1, 2021 at 8:18 PM Lari Hotari <lari.hot...@sagire.fi> wrote:

> Dear Pulsar community members,
>
> Thanks for picking up the work so quickly! I noticed that at least Renkai
> and Michael already pushed pull requests to fix the flaky tests that were
> mentioned in the previous email. Some of the PRs have already been merged.
>
> Here are 3 more flaky tests with links to a lot of example failures:
> https://github.com/apache/pulsar/issues/9407
> https://github.com/apache/pulsar/issues/9408
> https://github.com/apache/pulsar/issues/9409
>
> I'll report more flaky tests tomorrow. Today I was working on some tooling
> to mine the logs and gather some statistics.
>
> I parsed the logs of the last few days and these are the test methods that
> fail the most:
>
>     273 org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete
>     102 org.apache.pulsar.compaction.CompactionTest.cleanup
>      81 org.apache.pulsar.admin.cli.PulsarAdminToolTest.topics
>      51 org.apache.pulsar.broker.loadbalance.LoadBalancerTest.testLeaderElection
>      45 org.apache.pulsar.io.PulsarFunctionE2ETest.shutdown
>      40 org.apache.pulsar.broker.service.ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions
>      36 org.apache.pulsar.websocket.proxy.ProxyPublishConsumeTest.cleanup
>      30 org.apache.pulsar.functions.worker.PulsarFunctionLocalRunTest.shutdown
>      30 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationTopicPatternFunction
>      29 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationFunction
>      27 org.apache.pulsar.client.api.v1.V1_ProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
>      26 org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3
>      22 org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
>      22 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaLoggingFunction
>      21 org.apache.pulsar.tests.integration.SmokeTest.setup
>      20 org.apache.pulsar.client.impl.MessageIdTest.testChecksumReconnection
>      20 org.apache.pulsar.client.impl.MessageIdTest.testChecksumVersionComptability
>      19 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionLocalRun
>      19 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAutoSchemaFunction
>      14 org.apache.pulsar.client.api.SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
>      14 org.apache.pulsar.broker.service.MessagePublishBufferThrottleTest.testBlockByPublishRateLimiting
>      14 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testSlidingCountWindowTest
>      13 org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2.testResetCursorCompatibility
>      12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonPublishFunction
>      12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationTopicPatternFunction
>      12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunctionWithExtraDeps
>      12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationZipFunction
>      12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionNegAck
>      12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunction
>      12 org.apache.pulsar.compaction.CompactorTest.cleanup
>      12 org.apache.pulsar.broker.service.BrokerServiceAutoSubscriptionCreationTest.cleanupTest
>      12 org.apache.pulsar.websocket.proxy.ProxyAuthenticationTest.cleanup
>      12 org.apache.pulsar.websocket.proxy.v1.V1_ProxyAuthenticationTest.cleanup
>      12 org.apache.pulsar.client.impl.BatchMessageIndexAckTest.testBatchMessageIndexAckForSharedSubscription
>      11 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaPublishFunction
>      11 org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup
>
> I'll report more flaky tests after I have checked that my tooling is
> producing correct results.
>
> To contribute, please pick a flaky test to fix from the reported ones:
>
> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>
> We can all join the #testing channel on Pulsar Slack to share detailed
> tips and tricks while working on fixing flaky tests.
>
> See you,
>
> BR, Lari
>
>
> On Fri, Jan 29, 2021 at 8:26 PM Lari Hotari <lari.hot...@sagire.fi> wrote:
>
>> Dear Pulsar community members,
>>
>> In order to improve our CI, we will have to fix the flaky tests. In some
>> cases it might be necessary to replace an existing test with a redesigned
>> test.
>>
>> The draft PIP "Changes to flaky test handling" document
>> <https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing>
>> lists the top 10 flaky tests. A lot of them have already been addressed by
>> pull requests in the past week or so.
>>
>> This is the list of recent PRs that fix flaky tests from the top 10 flaky
>> tests list:
>> https://github.com/apache/pulsar/pull/9286
>> https://github.com/apache/pulsar/pull/9243
>> https://github.com/apache/pulsar/pull/9258
>> https://github.com/apache/pulsar/pull/9356
>>
>> These are the GH issues for the remaining ones in the top 10 flaky tests
>> list:
>> https://github.com/apache/pulsar/issues/6368
>> https://github.com/apache/pulsar/issues/9369
>> https://github.com/apache/pulsar/issues/9368
>>
>> If you would like to help fix flaky tests, you can pick one of the open
>> issues above. Just add a comment on the issue when you start working on it
>> so that we can coordinate activities.
>>
>> It is also helpful to report a flaky test when you encounter one. I've
>> been using this type of template for reporting a flaky test:
>> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 . The
>> issues #9368 and #9369 have been reported using this template.
>> Search for the test name before reporting so that we don't end up with
>> duplicates.
>>
>> The issues #6368, #9369 and #9368 are the next 3 important issues to fix.
>> I'm planning to create a more extensive list of the flaky failures so that
>> we can target the most flaky ones when we continue fixing the flaky tests.
>> I have some scripts in development to assist in mining the Pulsar GitHub
>> Actions workflow run logs.
>>
>> This is a search to find flaky issues in Pulsar GH issues:
>>
>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>>
>> Looking forward to the contributions for fixing flaky tests,
>>
>> BR,
>>
>> Lari
>>
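
For anyone who wants to produce a similar "count per test method" tally from downloaded CI logs, here is a minimal sketch. It is not the tooling mentioned in the thread, which was not shared here. It assumes, purely for illustration, that each failure appears in the log as a line shaped like `[ERROR] testMethod(fully.qualified.ClassName)  Time elapsed: ... <<< FAILURE!` (or `<<< ERROR!`); the real logs may look different, so the regular expression would need adjusting. The names `FAILURE_LINE`, `count_failures` and `format_report` are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape (hypothetical; adjust to what the real logs contain):
#   [ERROR] testMethod(org.example.SomeTest)  Time elapsed: 1.2 s  <<< FAILURE!
FAILURE_LINE = re.compile(
    r"^\[ERROR\]\s+(?P<method>[\w$]+)\((?P<cls>[\w.$]+)\).*<<<\s+(?:FAILURE|ERROR)!"
)


def count_failures(lines):
    """Tally failures per fully qualified test method name."""
    counts = Counter()
    for line in lines:
        match = FAILURE_LINE.match(line.strip())
        if match:
            counts[f"{match['cls']}.{match['method']}"] += 1
    return counts


def format_report(counts):
    """Render 'count name' rows, most frequent first, ties broken by name."""
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{count:>4} {name}" for name, count in rows]
```

To use it, pass any iterable of log lines (for example an open file) to `count_failures` and print the rows returned by `format_report`. Lines that do not match the assumed shape are silently skipped, so a low total is a hint that the pattern does not fit the logs.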