Dear Pulsar community members,

Thanks for picking up the work so quickly! I noticed that at least Renkai
and Michael already pushed pull requests to fix the flaky tests that were
mentioned in the previous email. Some of the PRs have already been merged.

Here are 3 more flaky tests with links to a lot of example failures:
https://github.com/apache/pulsar/issues/9407
https://github.com/apache/pulsar/issues/9408
https://github.com/apache/pulsar/issues/9409

I'll report more flaky tests tomorrow. Today I was working on some tooling
to mine the logs and gather some statistics.

I parsed the logs of the last few days, and these are the test methods that
fail the most:

273     org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete
102     org.apache.pulsar.compaction.CompactionTest.cleanup
81      org.apache.pulsar.admin.cli.PulsarAdminToolTest.topics
51      org.apache.pulsar.broker.loadbalance.LoadBalancerTest.testLeaderElection
45      org.apache.pulsar.io.PulsarFunctionE2ETest.shutdown
40      org.apache.pulsar.broker.service.ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions
36      org.apache.pulsar.websocket.proxy.ProxyPublishConsumeTest.cleanup
30      org.apache.pulsar.functions.worker.PulsarFunctionLocalRunTest.shutdown
30      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationTopicPatternFunction
29      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationFunction
27      org.apache.pulsar.client.api.v1.V1_ProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
26      org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3
22      org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
22      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaLoggingFunction
21      org.apache.pulsar.tests.integration.SmokeTest.setup
20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumReconnection
20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumVersionComptability
19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionLocalRun
19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAutoSchemaFunction
14      org.apache.pulsar.client.api.SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
14      org.apache.pulsar.broker.service.MessagePublishBufferThrottleTest.testBlockByPublishRateLimiting
14      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testSlidingCountWindowTest
13      org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2.testResetCursorCompatibility
12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonPublishFunction
12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationTopicPatternFunction
12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunctionWithExtraDeps
12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationZipFunction
12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionNegAck
12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunction
12      org.apache.pulsar.compaction.CompactorTest.cleanup
12      org.apache.pulsar.broker.service.BrokerServiceAutoSubscriptionCreationTest.cleanupTest
12      org.apache.pulsar.websocket.proxy.ProxyAuthenticationTest.cleanup
12      org.apache.pulsar.websocket.proxy.v1.V1_ProxyAuthenticationTest.cleanup
12      org.apache.pulsar.client.impl.BatchMessageIndexAckTest.testBatchMessageIndexAckForSharedSubscription
11      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaPublishFunction
11      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAntiAffinityPlaceholder

I'll report more flaky tests after I have checked that my tooling is
producing correct results.
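
For anyone who wants to try this kind of log mining themselves, here is a
minimal Python sketch of counting failures per test method. It is only an
illustration and not the actual tooling: the log line shape it matches, the
regular expression, and the helper names count_failures and format_report are
all assumptions made for the example, so the pattern would need adjusting to
whatever the real workflow logs contain.

```python
import re
from collections import Counter

# Assumed (hypothetical) failure line shape, one line per failed test method:
#   [ERROR] testName(org.example.SomeTest)  Time elapsed: 1.2 s  <<< FAILURE!
# Real CI logs may look different; adapt the pattern accordingly.
FAILURE_LINE = re.compile(
    r"^\[ERROR\]\s+(?P<method>[\w$]+)\((?P<cls>[\w.$]+)\)"
    r"\s+Time elapsed:.*<<< (?:FAILURE|ERROR)!"
)


def count_failures(lines):
    """Count how often each fully qualified test method fails."""
    counts = Counter()
    for line in lines:
        match = FAILURE_LINE.match(line.strip())
        if match:
            counts[f"{match['cls']}.{match['method']}"] += 1
    return counts


def format_report(counts, limit=40):
    """Render 'count  method' rows, most frequent failures first."""
    return "\n".join(
        f"{n:<7} {name}" for name, n in counts.most_common(limit)
    )
```

Feeding the lines of the downloaded log files to count_failures and printing
format_report(counts) would give a table in the same count-then-method layout
as the list above.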

If you would like to help fix flaky tests, please pick one of the reported
ones:
https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen

We can all join the #testing channel on Pulsar Slack to share detailed tips
and tricks while working on fixing flaky tests.

See you,

BR, Lari


On Fri, Jan 29, 2021 at 8:26 PM Lari Hotari <lari.hot...@sagire.fi> wrote:

> Dear Pulsar community members,
>
> In order to improve our CI, we will have to fix the flaky tests. In some
> cases it might be necessary to replace an existing test with a redesigned
> test.
>
> The draft PIP "Changes to flaky test handling" document
> <https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing>
> lists the top 10 flaky tests. A lot of them have already been addressed by
> pull requests in the past week or so.
>
> This is the list of recent PRs that fix flaky tests from the top 10 flaky
> tests list:
> https://github.com/apache/pulsar/pull/9286
> https://github.com/apache/pulsar/pull/9243
> https://github.com/apache/pulsar/pull/9258
> https://github.com/apache/pulsar/pull/9356
>
> These are the GH issues for the remaining ones in the top 10 flaky tests
> list:
> https://github.com/apache/pulsar/issues/6368
> https://github.com/apache/pulsar/issues/9369
> https://github.com/apache/pulsar/issues/9368
>
> If you would like to help to fix flaky tests you can pick one of the open
> issues above. Just add a comment on the issue when you start working on it
> so that we can coordinate activities.
>
> It is also helpful to report a flaky test when you encounter one. I've
> been using this type of template for reporting a flaky test:
> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298
> The issues #9368 and #9369 have been reported using this template.
> Search for the test name before reporting so that we don't end up with
> duplicates.
>
> The issues #6368, #9369 and #9368 are the next 3 important issues to fix.
> I'm planning to create a more extensive list of the flaky failures so that
> we can target the most flaky ones when we continue fixing the flaky tests.
> I have some scripts in development to assist in mining the Pulsar GitHub
> Actions workflow run logs.
>
> This is a search to find flaky issues in Pulsar GH issues:
>
> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>
> Looking forward to the contributions for fixing flaky tests,
>
> BR,
>
> Lari
>
