[ 
https://issues.apache.org/jira/browse/KAFKA-17371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875170#comment-17875170
 ] 

Ao Li edited comment on KAFKA-17371 at 8/20/24 12:35 PM:
---------------------------------------------------------

[~frankvicky] Thanks for taking care of this! Making both 
`DefaultTaskExecutor::currentTask` and `DefaultTaskExecutor::runOnce` 
synchronized is a quick solution. However, I'm not sure if this will introduce 
any performance regression. 

[~chia7712] I'm currently running a concurrency testing tool that reruns all 
Kafka tests with different thread schedules. This has helped me find many bugs. 
While I can reproduce these failures and (sometimes) identify the root cause, I 
don't have enough context to fix these issues since I am not familiar with the 
codebase. Please let me know if you want me to report these bugs through 
another channel. 


was (Author: JIRAUSER306156):
[~frankvicky] Thanks for taking care of this! Making both 
`DefaultTaskExecutor::currentTask` and `DefaultTaskExecutor::runOnce` 
synchronized a quick solution. However, I'm not sure if this will introduce any 
performance regression. 

[~chia7712] I'm currently running a concurrency testing tool that reruns all 
Kafka tests with different thread schedules. This helped me to find many bugs. 
While I can reproduce these failures and (sometimes) identify the root cause, I 
don't have enough context to fix these issues since I am not familiar with the 
codebase. Please let me know if you want me to report these bugs through 
another channel. 

> Flaky test in DefaultTaskExecutorTest.shouldUnassignTaskWhenRequired
> --------------------------------------------------------------------
>
>                 Key: KAFKA-17371
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17371
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ao Li
>            Assignee: TengYao Chi
>            Priority: Minor
>
> Please see this fork https://github.com/aoli-al/kafka/tree/KAFKA-251 for a 
> deterministic reproduction.  
> The test failed with 
> {code}
> expected: not <null>
> org.opentest4j.AssertionFailedError: expected: not <null>
>       at 
> org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:152)
>       at 
> org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>       at org.junit.jupiter.api.AssertNotNull.failNull(AssertNotNull.java:49)
>       at 
> org.junit.jupiter.api.AssertNotNull.assertNotNull(AssertNotNull.java:35)
>       at 
> org.junit.jupiter.api.AssertNotNull.assertNotNull(AssertNotNull.java:30)
>       at org.junit.jupiter.api.Assertions.assertNotNull(Assertions.java:304)
>       at 
> org.apache.kafka.streams.processor.internals.tasks.DefaultTaskExecutorTest.shouldUnassignTaskWhenRequired(DefaultTaskExecutorTest.java:233)
>       at java.base/java.lang.reflect.Method.invoke(Method.java:580)
>       at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
>       at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
> {code}
> The root cause of the failure is that `currentTask = 
> taskManager.assignNextTask(DefaultTaskExecutor.this);` is not an atomic 
> operation: the call to `taskManager.assignNextTask` is made, and recorded by 
> the mock, before its result is stored in `currentTask`. The call alone is 
> enough to unblock the `verify(taskManager, 
> timeout(VERIFICATION_TIMEOUT)).assignNextTask(taskExecutor);` statement in 
> the test method. 
> If `assertNotNull(taskExecutor.currentTask());` is executed before the 
> assignment `currentTask = [...]` completes, the test fails. 
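
The interleaving described above can be forced deterministically in a small standalone sketch. Everything in it is a hypothetical stand-in rather than the real Kafka or Mockito code: `RaceSketch`, its `assignNextTask` helper, and the two latches are assumptions made for illustration. One latch stands in for `verify(...)` succeeding, the other holds the executor thread between the call and the store.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-ins, not the real Kafka classes.
class RaceSketch {
    private volatile String currentTask;   // stands in for DefaultTaskExecutor.currentTask
    private final CountDownLatch called = new CountDownLatch(1);    // "assignNextTask was invoked"
    private final CountDownLatch observed = new CountDownLatch(1);  // "the test thread has looked"

    // Stands in for taskManager.assignNextTask(...): the invocation becomes
    // visible to the test thread before the return value is stored.
    private String assignNextTask() throws InterruptedException {
        called.countDown();
        observed.await();   // force the unlucky schedule: pause before returning
        return "task-0";
    }

    private void runOnce() throws InterruptedException {
        currentTask = assignNextTask();   // call and store are two separate steps
    }

    // Returns { value seen right after the call, value seen after the thread finished }.
    static String[] demo() {
        RaceSketch sketch = new RaceSketch();
        Thread executor = new Thread(() -> {
            try {
                sketch.runOnce();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        executor.start();
        try {
            sketch.called.await();               // like verify(...).assignNextTask(...) passing
            String early = sketch.currentTask;   // like taskExecutor.currentTask() in the assertion
            sketch.observed.countDown();         // let the executor thread finish the store
            executor.join();
            return new String[] { early, sketch.currentTask };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("right after the call: " + result[0]);
        System.out.println("after the store:      " + result[1]);
    }
}
```

With this schedule the first read is `null`, which is the value `assertNotNull` would reject, and the second read is the assigned task.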



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
