[ https://issues.apache.org/jira/browse/KAFKA-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999104#comment-15999104 ]
Jun Rao commented on KAFKA-5099:
--------------------------------

[~onurkaraman], thanks for reporting this. It seems the issue is that when log.delete is called, we try to flush the leader epoch file, which requires creating a tmp file in the original log dir. Since the original log dir has been renamed, the creation will throw a FileNotFoundException.

One way to fix this is to rename LeaderEpochCache.clear to LeaderEpochCache.clearAndFlush, and introduce a new method LeaderEpochCache.clear that just clears without flushing. Log.delete will call LeaderEpochCache.clear; all other cases will call LeaderEpochCache.clearAndFlush.
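A minimal sketch of the shape of that change; the lock helper, in-memory epoch list, and checkpoint writer below are stand-ins inferred from the stack trace further down, not the actual patch:

{code}
// Sketch of the proposed clear/clearAndFlush split. Field and helper names
// are stand-ins for the real ones in kafka.server.epoch.LeaderEpochFileCache.
import java.util.concurrent.locks.ReentrantReadWriteLock
import scala.collection.mutable.ListBuffer

class LeaderEpochCacheSketch(writeCheckpoint: Seq[(Int, Long)] => Unit) {
  private val lock = new ReentrantReadWriteLock()
  private val epochs = ListBuffer.empty[(Int, Long)] // (epoch, startOffset)

  // Existing behavior: clear the in-memory cache AND rewrite the checkpoint
  // file. The flush is what fails once the log dir has been renamed.
  def clearAndFlush(): Unit = inWriteLock {
    epochs.clear()
    flush()
  }

  // New method for Log.delete: drop the in-memory state only. The checkpoint
  // file is about to be deleted along with the rest of the log dir, so there
  // is nothing worth flushing.
  def clear(): Unit = inWriteLock {
    epochs.clear()
  }

  private def flush(): Unit = writeCheckpoint(epochs.toList)

  private def inWriteLock[T](body: => T): T = {
    lock.writeLock().lock()
    try body finally lock.writeLock().unlock()
  }
}
{code}

With that split, the delete path skips the checkpoint write entirely, while all other callers keep the flush-on-clear behavior.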
> Replica Deletion Regression from KIP-101
> ----------------------------------------
>
>                 Key: KAFKA-5099
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5099
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Onur Karaman
>            Assignee: Onur Karaman
>            Priority: Blocker
>             Fix For: 0.11.0.0
>
>
> It appears that replica deletion regressed from KIP-101. Replica deletion happens when a broker receives a StopReplicaRequest with delete=true.
> Ever since KAFKA-1911, replica deletion has been async, meaning the broker responds with a StopReplicaResponse simply after marking the replica directory as staged for deletion.
> This marking happens by moving a data log directory and its contents such as /tmp/kafka-logs1/t1-0 to a marked directory like /tmp/kafka-logs1/t1-0.8c9c4c0c61c44cc59ebeb00075a2a07f-delete, acting as a soft-delete. A scheduled thread later actually deletes the data.
> It appears that the regression occurs when the scheduled thread actually tries to delete the data, by which point the controller already considers operations such as partition reassignment and topic deletion complete.
> But if you look at the log4j logs and data logs, you'll find that the soft-deleted data logs don't actually get deleted. It seems that restarting the broker allows the soft-deleted directories to get deleted.
> Here's the setup:
> {code}
> > ./bin/zookeeper-server-start.sh config/zookeeper.properties
> > export LOG_DIR=logs0 && ./bin/kafka-server-start.sh config/server0.properties
> > export LOG_DIR=logs1 && ./bin/kafka-server-start.sh config/server1.properties
> > ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic t0 --replica-assignment 1:0
> > ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic t1 --replica-assignment 1:0
> > ./bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic t0
> > cat p.txt
> {"partitions":
>   [
>     {"topic": "t1", "partition": 0, "replicas": [0] }
>   ],
>   "version":1
> }
> > ./bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file p.txt --execute
> {code}
> Here are sample logs:
> {code}
> [2017-04-20 17:46:54,801] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions t0-0 (kafka.server.ReplicaFetcherManager)
> [2017-04-20 17:46:54,814] INFO Log for partition t0-0 is renamed to /tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete and is scheduled for deletion (kafka.log.LogManager)
> [2017-04-20 17:47:27,585] INFO Deleting index /tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete/00000000000000000000.index (kafka.log.OffsetIndex)
> [2017-04-20 17:47:27,586] INFO Deleting index /tmp/kafka-logs1/t0-0/00000000000000000000.timeindex (kafka.log.TimeIndex)
> [2017-04-20 17:47:27,587] ERROR Exception in deleting Log(/tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete). Moving it to the end of the queue. (kafka.log.LogManager)
> java.io.FileNotFoundException: /tmp/kafka-logs1/t0-0/leader-epoch-checkpoint.tmp (No such file or directory)
>     at java.io.FileOutputStream.open0(Native Method)
>     at java.io.FileOutputStream.open(FileOutputStream.java:270)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>     at kafka.server.checkpoints.CheckpointFile.write(CheckpointFile.scala:41)
>     at kafka.server.checkpoints.LeaderEpochCheckpointFile.write(LeaderEpochCheckpointFile.scala:61)
>     at kafka.server.epoch.LeaderEpochFileCache.kafka$server$epoch$LeaderEpochFileCache$$flush(LeaderEpochFileCache.scala:178)
>     at kafka.server.epoch.LeaderEpochFileCache$$anonfun$clear$1.apply$mcV$sp(LeaderEpochFileCache.scala:161)
>     at kafka.server.epoch.LeaderEpochFileCache$$anonfun$clear$1.apply(LeaderEpochFileCache.scala:159)
>     at kafka.server.epoch.LeaderEpochFileCache$$anonfun$clear$1.apply(LeaderEpochFileCache.scala:159)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
>     at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:221)
>     at kafka.server.epoch.LeaderEpochFileCache.clear(LeaderEpochFileCache.scala:159)
>     at kafka.log.Log.delete(Log.scala:1051)
>     at kafka.log.LogManager.kafka$log$LogManager$$deleteLogs(LogManager.scala:442)
>     at kafka.log.LogManager$$anonfun$startup$5.apply$mcV$sp(LogManager.scala:241)
>     at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
>     at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> The same appears on broker 1 for t1-0 after partition reassignment moves the replica off broker 1.
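For illustration, a stand-alone sketch (not Kafka code; the paths and names here are made up) of why the checkpoint write in the stack trace above fails: the leader epoch checkpoint still holds the pre-rename path, so its .tmp file is created under a directory that no longer exists.

{code}
// Editorial illustration: creating a file under a directory that has since
// been renamed fails with the same FileNotFoundException as in the log above.
import java.io.{File, FileNotFoundException, FileOutputStream}
import java.nio.file.Files

object RenamedLogDirDemo extends App {
  val logDir  = Files.createTempDirectory("t0-0").toFile                        // stands in for /tmp/kafka-logs1/t0-0
  val renamed = new File(logDir.getParentFile, logDir.getName + ".uuid-delete") // soft-delete rename target
  require(logDir.renameTo(renamed), "rename failed")

  // A writer that still holds the old path tries to create the checkpoint's
  // .tmp file there, just like CheckpointFile.write does via FileOutputStream.
  val staleTmp = new File(logDir, "leader-epoch-checkpoint.tmp")
  try {
    new FileOutputStream(staleTmp).close()
  } catch {
    case e: FileNotFoundException =>
      println(s"FileNotFoundException, as in the log above: ${e.getMessage}")
  }
}
{code}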
> Here's what the data logs show:
> {code}
> > l /tmp/kafka-logs*/t*
> /tmp/kafka-logs0/t0-0.166137ca8c28475d8c9fbf6f423a937e-delete:
> total 0
> -rw-r--r--  1 okaraman  wheel   0 Apr 20 17:46 00000000000000000000.timeindex
> -rw-r--r--  1 okaraman  wheel   0 Apr 20 17:46 leader-epoch-checkpoint
> drwxr-xr-x  2 okaraman  wheel  68 Apr 20 17:46 pid-mapping
>
> /tmp/kafka-logs0/t1-0:
> total 40960
> -rw-r--r--  1 okaraman  wheel  10485760 Apr 20 17:46 00000000000000000000.index
> -rw-r--r--  1 okaraman  wheel         0 Apr 20 17:46 00000000000000000000.log
> -rw-r--r--  1 okaraman  wheel  10485756 Apr 20 17:46 00000000000000000000.timeindex
> -rw-r--r--  1 okaraman  wheel         0 Apr 20 17:46 leader-epoch-checkpoint
> drwxr-xr-x  2 okaraman  wheel        68 Apr 20 17:46 pid-mapping
>
> /tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete:
> total 0
> -rw-r--r--  1 okaraman  wheel   0 Apr 20 17:46 00000000000000000000.timeindex
> -rw-r--r--  1 okaraman  wheel   0 Apr 20 17:46 leader-epoch-checkpoint
> drwxr-xr-x  2 okaraman  wheel  68 Apr 20 17:46 pid-mapping
>
> /tmp/kafka-logs1/t1-0.8c9c4c0c61c44cc59ebeb00075a2a07f-delete:
> total 0
> -rw-r--r--  1 okaraman  wheel   0 Apr 20 17:48 00000000000000000000.timeindex
> -rw-r--r--  1 okaraman  wheel   0 Apr 20 17:46 leader-epoch-checkpoint
> drwxr-xr-x  2 okaraman  wheel  68 Apr 20 17:46 pid-mapping
> {code}
> I ran the above test on the checkin preceding KIP-101 and it worked just fine.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)