The good progress continues!
One way to see the issue & PR activity where "flaky" is mentioned:
https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc
Thank you to the contributors and PR reviewers!

Here's the next flaky test for someone to fix:
https://github.com/apache/pulsar/issues/6646 (reported a long time ago; I
added some examples of recent failures)
It's about PulsarFunctionsTest. This test class contributes to a lot of
failures. I have uploaded a list of failures to
https://gist.github.com/lhotari/9bae3e16674c297a6bbc2b4831515a74 .
I haven't validated that all failures are from flaky test runs. It's
possible that some are from a build which broke the test.
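
For anyone who wants to build a similar tally from their own build logs,
here is a minimal sketch. It is not the tooling used to produce the gist. The
log line shape it parses is an assumption (a Surefire-like
"[ERROR] method(Class) ... <<< FAILURE!" line), and the names count_failures
and format_report are made up for illustration.

```python
import re
from collections import Counter

# Assumed (hypothetical) failure line shape, similar to Maven Surefire output:
#   [ERROR] testFoo(org.example.FooTest)  Time elapsed: 1.2 s  <<< FAILURE!
FAILURE_LINE = re.compile(
    r"^\[ERROR\]\s+(?P<method>[\w$]+)\((?P<cls>[\w.$]+)\).*<<< (?:FAILURE|ERROR)!"
)


def count_failures(lines):
    """Return a Counter mapping 'Class.method' to its number of failures."""
    counts = Counter()
    for line in lines:
        match = FAILURE_LINE.match(line)
        if match:
            counts[f"{match['cls']}.{match['method']}"] += 1
    return counts


def format_report(counts):
    """Render 'count<TAB>name' rows, most frequent first, ties sorted by name."""
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{n}\t{name}" for name, n in rows)
```

Feeding it the lines of several workflow run logs and printing
format_report(count_failures(lines)) gives a "count, test method" listing.
Such a count alone cannot tell a flaky failure from a build that genuinely
broke the test, so the results still need manual validation.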

1) Who could pick up fixing the multiple issues in PulsarFunctionsTest,
https://github.com/apache/pulsar/issues/6646 ? You can comment directly on
issue #6646 and start working on it if you wish. It would be a really
important fix to have.

2) Another one: https://github.com/apache/pulsar/issues/9431

3) The third one might be a quick fix. It's an NPE in cleanup:
https://github.com/apache/pulsar/issues/9432

I'm looking forward to the sprint continuing. It seems that the issues get
fixed sooner than I can report more of them. :)

BR, Lari


On Mon, Feb 1, 2021 at 8:18 PM Lari Hotari <lari.hot...@sagire.fi> wrote:

> Dear Pulsar community members,
>
> Thanks for picking up the work so quickly! I noticed that at least Renkai
> and Michael already pushed pull requests to fix the flaky tests that were
> mentioned in the previous email. Some of the PRs have already been merged.
>
> Here are 3 more flaky tests with links to a lot of example failures:
> https://github.com/apache/pulsar/issues/9407
> https://github.com/apache/pulsar/issues/9408
> https://github.com/apache/pulsar/issues/9409
>
> I'll report more flaky tests tomorrow. Today I was working on some tooling
> to mine the logs and gather some statistics.
>
> I parsed the logs of the last few days, and these are the test methods that
> fail the most:
>
> 273     org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete
> 102     org.apache.pulsar.compaction.CompactionTest.cleanup
> 81      org.apache.pulsar.admin.cli.PulsarAdminToolTest.topics
> 51      org.apache.pulsar.broker.loadbalance.LoadBalancerTest.testLeaderElection
> 45      org.apache.pulsar.io.PulsarFunctionE2ETest.shutdown
> 40      org.apache.pulsar.broker.service.ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions
> 36      org.apache.pulsar.websocket.proxy.ProxyPublishConsumeTest.cleanup
> 30      org.apache.pulsar.functions.worker.PulsarFunctionLocalRunTest.shutdown
> 30      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationTopicPatternFunction
> 29      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationFunction
> 27      org.apache.pulsar.client.api.v1.V1_ProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
> 26      org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3
> 22      org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
> 22      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaLoggingFunction
> 21      org.apache.pulsar.tests.integration.SmokeTest.setup
> 20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumReconnection
> 20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumVersionComptability
> 19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionLocalRun
> 19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAutoSchemaFunction
> 14      org.apache.pulsar.client.api.SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
> 14      org.apache.pulsar.broker.service.MessagePublishBufferThrottleTest.testBlockByPublishRateLimiting
> 14      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testSlidingCountWindowTest
> 13      org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2.testResetCursorCompatibility
> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonPublishFunction
> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationTopicPatternFunction
> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunctionWithExtraDeps
> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationZipFunction
> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionNegAck
> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunction
> 12      org.apache.pulsar.compaction.CompactorTest.cleanup
> 12      org.apache.pulsar.broker.service.BrokerServiceAutoSubscriptionCreationTest.cleanupTest
> 12      org.apache.pulsar.websocket.proxy.ProxyAuthenticationTest.cleanup
> 12      org.apache.pulsar.websocket.proxy.v1.V1_ProxyAuthenticationTest.cleanup
> 12      org.apache.pulsar.client.impl.BatchMessageIndexAckTest.testBatchMessageIndexAckForSharedSubscription
> 11      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaPublishFunction
> 11      org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup
>
> I'll report more flaky tests after I have checked that my tooling is
> producing correct results.
>
> To contribute, please pick one of the reported flaky tests to fix:
>
> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>
> We can all join the #testing channel on Pulsar Slack to share detailed
> tips and tricks while working on fixing flaky tests.
>
> See you,
>
> BR, Lari
>
>
> On Fri, Jan 29, 2021 at 8:26 PM Lari Hotari <lari.hot...@sagire.fi> wrote:
>
>> Dear Pulsar community members,
>>
>> In order to improve our CI, we will have to fix the flaky tests. In some
>> cases it might be necessary to replace an existing test with a redesigned
>> test.
>>
>> The draft PIP "Changes to flaky test handling" document
>> <https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing>
>> lists the top 10 flaky tests. A lot of them have already been addressed by
>> pull requests in the past week or so.
>>
>> This is the list of recent PRs that fix flaky tests from the top 10 flaky
>> tests list:
>> https://github.com/apache/pulsar/pull/9286
>> https://github.com/apache/pulsar/pull/9243
>> https://github.com/apache/pulsar/pull/9258
>> https://github.com/apache/pulsar/pull/9356
>>
>> These are the GH issues for the remaining ones in the top 10 flaky tests
>> list:
>> https://github.com/apache/pulsar/issues/6368
>> https://github.com/apache/pulsar/issues/9369
>> https://github.com/apache/pulsar/issues/9368
>>
>> If you would like to help to fix flaky tests you can pick one of the open
>> issues above. Just add a comment on the issue when you start working on it
>> so that we can coordinate activities.
>>
>> It is also helpful to report a flaky test when you encounter one. I've
>> been using this type of template for reporting a flaky test:
>> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 . The
>> issues #9368 and #9369 have been reported using this template.
>> Search for the test name before reporting so that we don't end up with
>> duplicates.
>>
>> The issues #6368, #9369 and #9368 are the 3 next important issues to fix.
>> I'm planning to create a more extensive list of the flaky failures so that
>> we can target the most flaky ones when we continue fixing the flaky tests.
>> I have some scripts in development to assist in mining the Pulsar GitHub
>> Actions workflow run logs.
>>
>> This is a search to find flaky issues in Pulsar GH issues:
>>
>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>>
>> Looking forward to the contributions for fixing flaky tests,
>>
>> BR,
>>
>> Lari
>>
>
