[ 
https://issues.apache.org/jira/browse/KAFKA-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski updated KAFKA-7266:
---------------------------------------
    Description: 
The test `kafka.api.MetricsTest.testMetrics` has been failing intermittently in 
Kafka builds (recent example: 
https://github.com/apache/kafka/pull/5436#issuecomment-409683955).
The particular failure is in the `MessageConversionsTimeMs` metric assertion:
{code}
java.lang.AssertionError: Message conversion time not recorded 0.0
{code}

Previous work (https://github.com/apache/kafka/pull/4681) reduced the 
flakiness of the test, but it still fails occasionally.

h3. Solution
On my machine, the test failed in 5 out of 25 runs. Increasing the record 
size and using compression should slow down message conversion enough for it 
to take more than 1 ms. With those changes, the test has not failed locally in 
200 runs and counting.

  was:
The test `kafka.api.MetricsTest.testMetrics` has been failing intermittently in 
Kafka builds (recent example: 
https://github.com/apache/kafka/pull/5436#issuecomment-409683955).
The particular failure is in the `MessageConversionsTimeMs` metric assertion:
{code}
java.lang.AssertionError: Message conversion time not recorded 0.0
{code}

Previous work (https://github.com/apache/kafka/pull/4681) reduced the 
flakiness of the test, but it still fails occasionally.

h3. Solution
On my machine, the test failed in 5 out of 25 runs. I suspect the solution 
is to increase the record batch size so that the conversion takes more than 
1 ms and is therefore recorded by the metric. Increasing the maximum batch 
size from 1 MB to 8 MB made the test fail locally once in 100 runs. Setting 
it to 16 MB seems to fix the problem: I have run the test 300 times with a 
16 MB batch size and have not seen a failure.


> Fix MetricsTest test flakiness
> ------------------------------
>
>                 Key: KAFKA-7266
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7266
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Stanislav Kozlovski
>            Assignee: Stanislav Kozlovski
>            Priority: Minor
>             Fix For: 2.0.1, 2.1.0
>
>
> The test `kafka.api.MetricsTest.testMetrics` has been failing intermittently 
> in Kafka builds (recent example: 
> https://github.com/apache/kafka/pull/5436#issuecomment-409683955).
> The particular failure is in the `MessageConversionsTimeMs` metric assertion:
> {code}
> java.lang.AssertionError: Message conversion time not recorded 0.0
> {code}
> Previous work (https://github.com/apache/kafka/pull/4681) reduced the 
> flakiness of the test, but it still fails occasionally.
> h3. Solution
> On my machine, the test failed in 5 out of 25 runs. Increasing the record 
> size and using compression should slow down message conversion enough for it 
> to take more than 1 ms. With those changes, the test has not failed locally 
> in 200 runs and counting.
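
The Solution reasoning above depends on the conversion taking more than 1 ms before the metric shows a non-zero value. A minimal sketch of that effect follows. It assumes the elapsed time is truncated to whole milliseconds before being recorded, which the description implies but does not state. The class `ConversionTimeSketch` and the helper `recordedMillis` are hypothetical names for illustration and are not Kafka code.

```java
import java.util.concurrent.TimeUnit;

public class ConversionTimeSketch {

    // Hypothetical helper (not part of Kafka): converts an elapsed time in
    // nanoseconds to whole milliseconds. TimeUnit.toMillis truncates, so any
    // duration shorter than 1 ms becomes 0.
    static double recordedMillis(long elapsedNanos) {
        return (double) TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
    }

    public static void main(String[] args) {
        // Illustrative durations only; these are not measurements from the test.
        long fastConversion = TimeUnit.MICROSECONDS.toNanos(400);   // 0.4 ms
        long slowConversion = TimeUnit.MICROSECONDS.toNanos(2500);  // 2.5 ms

        System.out.println(recordedMillis(fastConversion)); // prints 0.0
        System.out.println(recordedMillis(slowConversion)); // prints 2.0
    }
}
```

Under that assumption, a conversion that finishes in under 1 ms yields the `0.0` seen in the assertion message, which is why making the conversion slower (larger records, compression) makes the assertion pass more reliably.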



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
