[ https://issues.apache.org/jira/browse/KAFKA-9632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajini Sivaram resolved KAFKA-9632.
-----------------------------------
Fix Version/s: 2.6.0
Reviewer: Manikumar
Resolution: Fixed
> Transient test failure: PartitionLockTest.testAppendReplicaFetchWithUpdateIsr
> -----------------------------------------------------------------------------
>
> Key: KAFKA-9632
> URL: https://issues.apache.org/jira/browse/KAFKA-9632
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 2.5.0
> Reporter: Rajini Sivaram
> Assignee: Rajini Sivaram
> Priority: Major
> Fix For: 2.6.0
>
>
> When this test is run with _numRecordsPerProducer=500_, it fails
> intermittently. The test uses MockTime and runs concurrent log operations.
> This can cause problems when a segment is rolled, because Log and
> MockScheduler don't work well together: MockScheduler currently runs tasks
> while holding the MockScheduler lock. This can cause a deadlock if a thread
> attempts to schedule a task while holding a lock that is also acquired
> within a scheduled task.
> The deadlock in this test occurs when these two operations run concurrently:
> 1) LogManager.cleanupLogs is a scheduled task that acquires the Log lock.
> When run with MockScheduler, the thread holds the MockScheduler lock and
> then attempts to acquire the Log lock.
> 2) Partition.appendRecordsToLeader holds the Log lock and attempts to
> acquire the MockScheduler lock in order to schedule a roll().
> Since the locking order in 1) is the reverse of that in 2), the two threads
> can deadlock.
> The test itself can easily be fixed by avoiding roll(), but it would be
> better to fix MockScheduler so that it can be used in this case.
>
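The lock-order inversion described in the issue (a scheduled task such as LogManager.cleanupLogs taking the MockScheduler lock and then the Log lock, while the leader append path takes the Log lock and then the MockScheduler lock to schedule a roll) can be reproduced in isolation. The following is a minimal Java sketch that does not use any Kafka classes: `schedulerLock` and `logLock` are hypothetical stand-ins, latches force the problematic interleaving, and `tryLock` with a timeout replaces `lock()` so the demo reports the deadlock instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reproduction of the lock-order inversion; no Kafka classes are used.
class LockOrderInversionDemo {

    // Starts a thread that takes `first`, waits until the other thread holds its
    // own first lock, then tries `second` with a timeout instead of blocking forever.
    private static Thread lockThenTry(ReentrantLock first, ReentrantLock second,
                                      CountDownLatch holdingFirst, CountDownLatch triedSecond,
                                      AtomicBoolean gotSecond) {
        Thread t = new Thread(() -> {
            first.lock();
            try {
                holdingFirst.countDown();
                holdingFirst.await();            // both threads now hold their first lock
                boolean ok = second.tryLock(200, TimeUnit.MILLISECONDS);
                if (ok) {
                    second.unlock();
                }
                gotSecond.set(ok);
                triedSecond.countDown();
                triedSecond.await();             // keep holding `first` until both have tried
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                first.unlock();
            }
        });
        t.start();
        return t;
    }

    // Returns {cleanup acquired the Log lock, append acquired the scheduler lock}.
    static boolean[] run() {
        ReentrantLock schedulerLock = new ReentrantLock(); // stand-in for the MockScheduler lock
        ReentrantLock logLock = new ReentrantLock();       // stand-in for the Log lock
        CountDownLatch holdingFirst = new CountDownLatch(2);
        CountDownLatch triedSecond = new CountDownLatch(2);
        AtomicBoolean cleanupGotLogLock = new AtomicBoolean();
        AtomicBoolean appendGotSchedulerLock = new AtomicBoolean();

        // 1) Scheduled cleanup task under MockScheduler: scheduler lock, then Log lock.
        Thread cleanup = lockThenTry(schedulerLock, logLock, holdingFirst, triedSecond, cleanupGotLogLock);
        // 2) Leader append that rolls a segment: Log lock, then scheduler lock.
        Thread append = lockThenTry(logLock, schedulerLock, holdingFirst, triedSecond, appendGotSchedulerLock);
        try {
            cleanup.join();
            append.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] {cleanupGotLogLock.get(), appendGotSchedulerLock.get()};
    }
}
```

Calling `LockOrderInversionDemo.run()` returns `{false, false}` after roughly 200 ms: each thread gives up on its second lock because the other thread is still holding it. With plain `lock()` calls, both threads would block forever, which corresponds to the hang seen in the test.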
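One way to make a mock scheduler usable in this situation, sketched here as an assumption rather than the change actually made for this issue, is to manipulate the task queue under the scheduler lock but run each due task only after releasing it. A task can then take other locks (such as a Log lock) without establishing a scheduler-lock-then-Log-lock ordering. `SaferMockScheduler`, `tick`, and `holdsSchedulerLock` are hypothetical names, not Kafka's MockScheduler API.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical mock scheduler that never runs a task while holding its own lock.
class SaferMockScheduler {
    private static final class Task {
        final long dueMs;
        final Runnable fn;
        Task(long dueMs, Runnable fn) { this.dueMs = dueMs; this.fn = fn; }
    }

    private final Object lock = new Object();
    private final PriorityQueue<Task> tasks =
        new PriorityQueue<>(Comparator.comparingLong((Task t) -> t.dueMs));
    private long nowMs = 0;

    // Only queue manipulation happens under the scheduler lock.
    void schedule(Runnable fn, long delayMs) {
        synchronized (lock) {
            tasks.add(new Task(nowMs + delayMs, fn));
        }
    }

    // Advances mock time, then runs due tasks one at a time, each outside the lock.
    void tick(long ms) {
        synchronized (lock) {
            nowMs += ms;
        }
        while (true) {
            Task next;
            synchronized (lock) {
                if (tasks.isEmpty() || tasks.peek().dueMs > nowMs) {
                    return;
                }
                next = tasks.poll();
            }
            // The scheduler lock is released here, so the task may acquire other
            // locks and may call schedule() without risking lock-order inversion.
            next.fn.run();
        }
    }

    // Exposed only so the behavior can be checked from a test.
    boolean holdsSchedulerLock() {
        return Thread.holdsLock(lock);
    }
}
```

Because tasks are polled one at a time, a task that schedules another task already due within the same tick will see it run in that tick. A trade-off of this design is that two threads calling `tick()` concurrently could interleave task execution, which a real fix would need to take into account.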
--
This message was sent by Atlassian Jira
(v8.3.4#803005)