Onur Karaman created KAFKA-5099:
-----------------------------------
Summary: Replica Deletion Regression from KIP-101
Key: KAFKA-5099
URL: https://issues.apache.org/jira/browse/KAFKA-5099
Project: Kafka
Issue Type: Bug
Reporter: Onur Karaman
Assignee: Onur Karaman
It appears that replica deletion regressed with KIP-101. Replica deletion
happens when a broker receives a StopReplicaRequest with delete=true. Ever
since KAFKA-1911, replica deletion has been asynchronous: the broker responds
with a StopReplicaResponse immediately after marking the replica directory as
staged for deletion. The marking is done by renaming the data log directory
and its contents, e.g. from /tmp/kafka-logs1/t1-0 to a marked directory like
/tmp/kafka-logs1/t1-0.8c9c4c0c61c44cc59ebeb00075a2a07f-delete, which acts as a
soft delete. A scheduled thread later actually deletes the data. The
regression shows up when the scheduled thread attempts the actual deletion, by
which point the controller already considers operations such as partition
reassignment and topic deletion complete. But if you look at the log4j logs
and the data logs, you'll find that the soft-deleted data logs never actually
get deleted. Restarting the broker does allow the soft-deleted directories to
be deleted.
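As a rough illustration of the soft-delete scheme described above (a sketch
only; the class and method names here are hypothetical, the real logic lives
in kafka.log.LogManager):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.UUID;

// Sketch of async replica deletion: rename the log directory to
// <name>.<uuid>-delete as a soft delete; a scheduled thread is expected
// to remove the renamed directory later.
public class SoftDeleteSketch {
    static File softDelete(File logDir) throws IOException {
        String marker = UUID.randomUUID().toString().replace("-", "");
        File renamed = new File(logDir.getParent(),
                logDir.getName() + "." + marker + "-delete");
        // The rename is the only work done before StopReplicaResponse is sent.
        Files.move(logDir.toPath(), renamed.toPath());
        return renamed;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("t1-0").toFile();
        new File(dir, "00000000000000000000.log").createNewFile();
        File marked = softDelete(dir);
        System.out.println(marked.getName().endsWith("-delete")); // true
        System.out.println(dir.exists());                         // false
    }
}
```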
Here's the setup:
{code}
> ./bin/zookeeper-server-start.sh config/zookeeper.properties
> export LOG_DIR=logs0 && ./bin/kafka-server-start.sh config/server0.properties
> export LOG_DIR=logs1 && ./bin/kafka-server-start.sh config/server1.properties
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic t0 --replica-assignment 1:0
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic t1 --replica-assignment 1:0
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic t0
> cat p.txt
{"partitions":
  [
    {"topic": "t1", "partition": 0, "replicas": [0] }
  ],
  "version":1
}
> ./bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file p.txt --execute
{code}
Here are sample logs:
{code}
[2017-04-20 17:46:54,801] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions t0-0 (kafka.server.ReplicaFetcherManager)
[2017-04-20 17:46:54,814] INFO Log for partition t0-0 is renamed to /tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete and is scheduled for deletion (kafka.log.LogManager)
[2017-04-20 17:47:27,585] INFO Deleting index /tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete/00000000000000000000.index (kafka.log.OffsetIndex)
[2017-04-20 17:47:27,586] INFO Deleting index /tmp/kafka-logs1/t0-0/00000000000000000000.timeindex (kafka.log.TimeIndex)
[2017-04-20 17:47:27,587] ERROR Exception in deleting Log(/tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete). Moving it to the end of the queue. (kafka.log.LogManager)
java.io.FileNotFoundException: /tmp/kafka-logs1/t0-0/leader-epoch-checkpoint.tmp (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
    at kafka.server.checkpoints.CheckpointFile.write(CheckpointFile.scala:41)
    at kafka.server.checkpoints.LeaderEpochCheckpointFile.write(LeaderEpochCheckpointFile.scala:61)
    at kafka.server.epoch.LeaderEpochFileCache.kafka$server$epoch$LeaderEpochFileCache$$flush(LeaderEpochFileCache.scala:178)
    at kafka.server.epoch.LeaderEpochFileCache$$anonfun$clear$1.apply$mcV$sp(LeaderEpochFileCache.scala:161)
    at kafka.server.epoch.LeaderEpochFileCache$$anonfun$clear$1.apply(LeaderEpochFileCache.scala:159)
    at kafka.server.epoch.LeaderEpochFileCache$$anonfun$clear$1.apply(LeaderEpochFileCache.scala:159)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
    at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:221)
    at kafka.server.epoch.LeaderEpochFileCache.clear(LeaderEpochFileCache.scala:159)
    at kafka.log.Log.delete(Log.scala:1051)
    at kafka.log.LogManager.kafka$log$LogManager$$deleteLogs(LogManager.scala:442)
    at kafka.log.LogManager$$anonfun$startup$5.apply$mcV$sp(LogManager.scala:241)
    at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
{code}
The same failure appears on broker 1 for t1-0 after partition reassignment
moves the replica off broker 1.
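Note the stack trace writes to /tmp/kafka-logs1/t0-0/leader-epoch-checkpoint.tmp,
i.e. the original directory, even though the log had already been renamed to the
-delete directory. That suggests the flush goes through a file path captured
before the rename. A minimal sketch of that suspected failure mode (hypothetical
names, not Kafka's actual code):

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;

// A checkpoint path is cached under the original log directory, the directory
// is then renamed for soft deletion, and a later flush through the stale path
// fails with FileNotFoundException, matching the stack trace above.
public class StaleCheckpointPath {
    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("t0-0").toFile();
        // Path cached before the rename, analogous to leader-epoch-checkpoint.tmp.
        File checkpoint = new File(dir, "leader-epoch-checkpoint.tmp");
        File renamed = new File(dir.getParent(), dir.getName() + "-delete");
        Files.move(dir.toPath(), renamed.toPath());
        try (FileOutputStream out = new FileOutputStream(checkpoint)) {
            System.out.println("unexpected: write through stale path succeeded");
        } catch (FileNotFoundException e) {
            // The parent directory no longer exists, so the open fails.
            System.out.println("FileNotFoundException: " + e.getMessage());
        }
    }
}
```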
Here's what the data logs show:
{code}
> l /tmp/kafka-logs*/t*
/tmp/kafka-logs0/t0-0.166137ca8c28475d8c9fbf6f423a937e-delete:
total 0
-rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 00000000000000000000.timeindex
-rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 leader-epoch-checkpoint
drwxr-xr-x 2 okaraman wheel 68 Apr 20 17:46 pid-mapping
/tmp/kafka-logs0/t1-0:
total 40960
-rw-r--r-- 1 okaraman wheel 10485760 Apr 20 17:46 00000000000000000000.index
-rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 00000000000000000000.log
-rw-r--r-- 1 okaraman wheel 10485756 Apr 20 17:46 00000000000000000000.timeindex
-rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 leader-epoch-checkpoint
drwxr-xr-x 2 okaraman wheel 68 Apr 20 17:46 pid-mapping
/tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete:
total 0
-rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 00000000000000000000.timeindex
-rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 leader-epoch-checkpoint
drwxr-xr-x 2 okaraman wheel 68 Apr 20 17:46 pid-mapping
/tmp/kafka-logs1/t1-0.8c9c4c0c61c44cc59ebeb00075a2a07f-delete:
total 0
-rw-r--r-- 1 okaraman wheel 0 Apr 20 17:48 00000000000000000000.timeindex
-rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 leader-epoch-checkpoint
drwxr-xr-x 2 okaraman wheel 68 Apr 20 17:46 pid-mapping
{code}
I ran the above test on the commit immediately preceding KIP-101 and it worked just fine.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)