[
https://issues.apache.org/jira/browse/KAFKA-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977902#comment-15977902
]
Onur Karaman commented on KAFKA-5099:
-------------------------------------
My initial guess is that the new epoch checkpoint file isn't being moved to the
soft-deleted directory correctly.
> Replica Deletion Regression from KIP-101
> ----------------------------------------
>
> Key: KAFKA-5099
> URL: https://issues.apache.org/jira/browse/KAFKA-5099
> Project: Kafka
> Issue Type: Bug
> Reporter: Onur Karaman
> Assignee: Onur Karaman
>
> It appears that replica deletion regressed with KIP-101. Replica deletion
> happens when a broker receives a StopReplicaRequest with delete=true. Ever
> since KAFKA-1911, replica deletion has been async: the broker responds with a
> StopReplicaResponse as soon as it has marked the replica directory as staged
> for deletion. The marking works by renaming the data log directory and its
> contents, e.g. from /tmp/kafka-logs1/t1-0 to a marker directory like
> /tmp/kafka-logs1/t1-0.8c9c4c0c61c44cc59ebeb00075a2a07f-delete, which acts as
> a soft delete. A scheduled thread later deletes the data for real.
> The regression shows up when that scheduled thread actually tries to delete
> the data. By that point the controller already considers operations such as
> partition reassignment and topic deletion complete, but if you look at the
> log4j logs and the data logs, you'll find that the soft-deleted data logs
> never actually get deleted. Restarting the broker does allow the soft-deleted
> directories to get deleted.
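The rename-then-schedule soft delete described above can be sketched in isolation. This is a minimal illustration with hypothetical names (`SoftDeleteDemo`, `softDelete`), not Kafka's actual `LogManager` code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class SoftDeleteDemo {
    // Mimic the async-delete marking: rename the log directory to a
    // "<name>.<uuid>-delete" marker. The broker can respond immediately;
    // a scheduled thread is expected to remove the staged directory later.
    static Path softDelete(Path logDir) throws IOException {
        String marker = logDir.getFileName() + "."
                + UUID.randomUUID().toString().replace("-", "") + "-delete";
        Path staged = logDir.resolveSibling(marker);
        return Files.move(logDir, staged); // soft delete: a cheap rename
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("kafka-logs");
        Path t10 = Files.createDirectory(base.resolve("t1-0"));
        Files.createFile(t10.resolve("leader-epoch-checkpoint"));

        Path staged = softDelete(t10);
        System.out.println(Files.exists(t10));  // false: original path is gone
        System.out.println(staged.getFileName().toString().endsWith("-delete"));  // true
    }
}
```

The rename is what makes the response fast; the actual data removal is deferred, which is exactly the window where the regression occurs.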
> Here's the setup:
> {code}
> > ./bin/zookeeper-server-start.sh config/zookeeper.properties
> > export LOG_DIR=logs0 && ./bin/kafka-server-start.sh
> > config/server0.properties
> > export LOG_DIR=logs1 && ./bin/kafka-server-start.sh
> > config/server1.properties
> > ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic t0
> > --replica-assignment 1:0
> > ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic t1
> > --replica-assignment 1:0
> > ./bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic t0
> > cat p.txt
> {"partitions":
> [
> {"topic": "t1", "partition": 0, "replicas": [0] }
> ],
> "version":1
> }
> > ./bin/kafka-reassign-partitions.sh --zookeeper localhost:2181
> > --reassignment-json-file p.txt --execute
> {code}
> Here are sample logs:
> {code}
> [2017-04-20 17:46:54,801] INFO [ReplicaFetcherManager on broker 1] Removed
> fetcher for partitions t0-0 (kafka.server.ReplicaFetcherManager)
> [2017-04-20 17:46:54,814] INFO Log for partition t0-0 is renamed to
> /tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete and is
> scheduled for deletion (kafka.log.LogManager)
> [2017-04-20 17:47:27,585] INFO Deleting index
> /tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete/00000000000000000000.index
> (kafka.log.OffsetIndex)
> [2017-04-20 17:47:27,586] INFO Deleting index
> /tmp/kafka-logs1/t0-0/00000000000000000000.timeindex (kafka.log.TimeIndex)
> [2017-04-20 17:47:27,587] ERROR Exception in deleting
> Log(/tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete). Moving it
> to the end of the queue. (kafka.log.LogManager)
> java.io.FileNotFoundException:
> /tmp/kafka-logs1/t0-0/leader-epoch-checkpoint.tmp (No such file or directory)
> at java.io.FileOutputStream.open0(Native Method)
> at java.io.FileOutputStream.open(FileOutputStream.java:270)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
> at kafka.server.checkpoints.CheckpointFile.write(CheckpointFile.scala:41)
> at
> kafka.server.checkpoints.LeaderEpochCheckpointFile.write(LeaderEpochCheckpointFile.scala:61)
> at
> kafka.server.epoch.LeaderEpochFileCache.kafka$server$epoch$LeaderEpochFileCache$$flush(LeaderEpochFileCache.scala:178)
> at
> kafka.server.epoch.LeaderEpochFileCache$$anonfun$clear$1.apply$mcV$sp(LeaderEpochFileCache.scala:161)
> at
> kafka.server.epoch.LeaderEpochFileCache$$anonfun$clear$1.apply(LeaderEpochFileCache.scala:159)
> at
> kafka.server.epoch.LeaderEpochFileCache$$anonfun$clear$1.apply(LeaderEpochFileCache.scala:159)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
> at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:221)
> at
> kafka.server.epoch.LeaderEpochFileCache.clear(LeaderEpochFileCache.scala:159)
> at kafka.log.Log.delete(Log.scala:1051)
> at
> kafka.log.LogManager.kafka$log$LogManager$$deleteLogs(LogManager.scala:442)
> at
> kafka.log.LogManager$$anonfun$startup$5.apply$mcV$sp(LogManager.scala:241)
> at
> kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The same error appears on broker 1 for t1-0 after partition reassignment
> moves the replica off broker 1.
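Note that the FileNotFoundException names the pre-rename path (/tmp/kafka-logs1/t0-0/leader-epoch-checkpoint.tmp) even though the directory has already been renamed to its -delete marker, which suggests the epoch checkpoint writer is holding on to a stale path. That failure mode can be reproduced outside Kafka; this is a standalone sketch with hypothetical names (`StalePathDemo`, `flushFails`), not the actual `LeaderEpochCheckpointFile` code:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StalePathDemo {
    // Attempt a checkpoint-style write through a previously captured path.
    // Returns true iff the write fails with FileNotFoundException, as in
    // the stack trace above.
    static boolean flushFails(File checkpointTmp) {
        try (FileOutputStream out = new FileOutputStream(checkpointTmp)) {
            out.write(0);
            return false;
        } catch (FileNotFoundException e) {
            return true; // parent directory no longer exists under this path
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("t1-0");
        // The writer captures the checkpoint path at log-open time...
        File checkpointTmp = dir.resolve("leader-epoch-checkpoint.tmp").toFile();
        // ...then the log directory is renamed for soft deletion...
        Files.move(dir, dir.resolveSibling(dir.getFileName() + "-delete"));
        // ...so a later flush through the stale path fails.
        System.out.println(flushFails(checkpointTmp));  // true
    }
}
```

If this is indeed the mechanism, each retry of the scheduled delete would hit the same exception, which would explain why the staged directories linger until a broker restart re-opens the logs with fresh paths.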
> Here's what the data logs show:
> {code}
> > l /tmp/kafka-logs*/t*
> /tmp/kafka-logs0/t0-0.166137ca8c28475d8c9fbf6f423a937e-delete:
> total 0
> -rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 00000000000000000000.timeindex
> -rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 leader-epoch-checkpoint
> drwxr-xr-x 2 okaraman wheel 68 Apr 20 17:46 pid-mapping
> /tmp/kafka-logs0/t1-0:
> total 40960
> -rw-r--r-- 1 okaraman wheel 10485760 Apr 20 17:46
> 00000000000000000000.index
> -rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 00000000000000000000.log
> -rw-r--r-- 1 okaraman wheel 10485756 Apr 20 17:46
> 00000000000000000000.timeindex
> -rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 leader-epoch-checkpoint
> drwxr-xr-x 2 okaraman wheel 68 Apr 20 17:46 pid-mapping
> /tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete:
> total 0
> -rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 00000000000000000000.timeindex
> -rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 leader-epoch-checkpoint
> drwxr-xr-x 2 okaraman wheel 68 Apr 20 17:46 pid-mapping
> /tmp/kafka-logs1/t1-0.8c9c4c0c61c44cc59ebeb00075a2a07f-delete:
> total 0
> -rw-r--r-- 1 okaraman wheel 0 Apr 20 17:48 00000000000000000000.timeindex
> -rw-r--r-- 1 okaraman wheel 0 Apr 20 17:46 leader-epoch-checkpoint
> drwxr-xr-x 2 okaraman wheel 68 Apr 20 17:46 pid-mapping
> {code}
> I ran the above test on the checkin preceding KIP-101 and it worked just fine.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)