Rajini Sivaram created KAFKA-9632:
-------------------------------------
Summary: Transient test failure: PartitionLockTest.testAppendReplicaFetchWithUpdateIsr
Key: KAFKA-9632
URL: https://issues.apache.org/jira/browse/KAFKA-9632
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.5.0
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram
When running this test with numRecordsPerProducer = 500, the test fails
intermittently. The test uses MockTime and runs concurrent log operations.
This can cause issues when attempting to roll a segment since Log and
MockScheduler don't work well together. MockScheduler currently runs tasks
while holding the MockScheduler lock. This can cause a deadlock if a thread
attempts to schedule a task while holding a lock which is also acquired within
a scheduled task.

The issue in this test occurs when these two operations happen concurrently:
1) LogManager.cleanupLogs is a scheduled task that acquires Log lock. When run
with MockScheduler, the thread holds MockScheduler lock and then attempts to
acquire Log lock.
2) Partition.appendLogsToLeader holds Log lock and attempts to acquire
MockScheduler lock in order to schedule a roll().
Since locking order is reversed in 1) and 2), this causes a deadlock.
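
The reversed lock order can be reproduced in isolation. The following is a minimal sketch, not Kafka code: LockOrderDemo, schedulerLock and logLock are hypothetical stand-ins for the MockScheduler lock and the Log lock, and tryLock with a timeout is used instead of lock() so that the demo terminates rather than hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the two code paths in the report; not Kafka code.
final class LockOrderDemo {

    /** Returns how many threads timed out waiting for their second lock. */
    static int blockedThreads(long timeoutMs) {
        ReentrantLock schedulerLock = new ReentrantLock(); // stands in for the MockScheduler lock
        ReentrantLock logLock = new ReentrantLock();       // stands in for the Log lock
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // 1) cleanup path: scheduler lock first, then log lock
        Thread cleanup = new Thread(() ->
            acquireInOrder(schedulerLock, logLock, bothHoldFirst, bothTried, blocked, timeoutMs));
        // 2) append path: log lock first, then scheduler lock
        Thread append = new Thread(() ->
            acquireInOrder(logLock, schedulerLock, bothHoldFirst, bothTried, blocked, timeoutMs));

        cleanup.start();
        append.start();
        try {
            cleanup.join();
            append.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked.get();
    }

    private static void acquireInOrder(ReentrantLock first, ReentrantLock second,
                                       CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                       AtomicInteger blocked, long timeoutMs) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await(); // wait until the other thread holds its first lock
            if (second.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet(); // with lock() this thread would wait forever
            }
            bothTried.countDown();
            bothTried.await(); // keep the first lock until both attempts have finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

Calling LockOrderDemo.blockedThreads(100) reports that both threads failed to obtain their second lock, which is the situation that becomes a deadlock when plain blocking acquisition is used.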
The test itself can be easily fixed by avoiding roll() in the test. However, it
would be good to fix MockScheduler so that it can be used in this case.
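
One possible direction for such a fix, sketched under assumptions: the scheduler holds its own lock only while inspecting and updating the task queue, and releases it before running each task. SketchScheduler, schedule, tick and pending are hypothetical names for illustration and do not mirror the actual MockScheduler API or the change that was eventually made.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch; names do not mirror Kafka's MockScheduler.
final class SketchScheduler {

    private static final class Task {
        final long dueMs;
        final long seq; // tie-breaker so tasks with equal due times run in submission order
        final Runnable body;

        Task(long dueMs, long seq, Runnable body) {
            this.dueMs = dueMs;
            this.seq = seq;
            this.body = body;
        }
    }

    private final PriorityQueue<Task> queue = new PriorityQueue<>(
        Comparator.comparingLong((Task t) -> t.dueMs).thenComparingLong(t -> t.seq));
    private long nextSeq = 0;

    /** Queues a task; the scheduler lock is held only for the queue update. */
    void schedule(long dueMs, Runnable body) {
        synchronized (this) {
            queue.add(new Task(dueMs, nextSeq++, body));
        }
    }

    /** Runs every task due at or before nowMs, never while holding the scheduler lock. */
    void tick(long nowMs) {
        while (true) {
            Task next;
            synchronized (this) {
                Task head = queue.peek();
                if (head == null || head.dueMs > nowMs) {
                    return;
                }
                next = queue.poll();
            }
            // Lock released here: the task may take other locks or call schedule()
            // without creating a scheduler-lock ordering dependency.
            next.body.run();
        }
    }

    synchronized int pending() {
        return queue.size();
    }
}
```

With this shape, a scheduled task that acquires another lock no longer does so while the scheduler lock is held, so a thread that holds that other lock can still call schedule() without blocking on the running task.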
--
This message was sent by Atlassian Jira
(v8.3.4#803005)