[ 
https://issues.apache.org/jira/browse/KAFKA-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999104#comment-15999104
 ] 

Jun Rao commented on KAFKA-5099:
--------------------------------

[~onurkaraman], thanks for reporting this. It seems the issue is that when 
Log.delete is called, we try to flush the leader epoch file, which requires 
creating a tmp file in the original log dir. Since the original log dir has 
been renamed, the creation will throw a FileNotFoundException. One way to fix 
this is to rename LeaderEpochCache.clear to LeaderEpochCache.clearAndFlush, and 
introduce a new LeaderEpochCache.clear method that just clears without 
flushing. Log.delete will call LeaderEpochCache.clear; all other callers will 
use LeaderEpochCache.clearAndFlush.
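
A minimal sketch of that split (toy code, not Kafka's actual implementation; the class and method names other than clear/clearAndFlush are made up for illustration): clear only drops the in-memory entries, so the deletion path does no file I/O and cannot hit the FileNotFoundException once the log dir has been renamed, while clearAndFlush keeps today's behavior for every other caller.

```scala
import java.io.{File, PrintWriter}
import scala.collection.mutable.ListBuffer

// ToyLeaderEpochCache is a hypothetical stand-in for the real cache.
class ToyLeaderEpochCache(checkpointFile: File) {
  private val epochs = ListBuffer[(Int, Long)]() // (epoch, startOffset) pairs

  def assign(epoch: Int, startOffset: Long): Unit =
    epochs += ((epoch, startOffset))

  // For Log.delete: clears in-memory state only, no file I/O,
  // so a renamed/deleted log dir is harmless.
  def clear(): Unit = epochs.clear()

  // For all other callers: clears and persists the (now empty) state.
  def clearAndFlush(): Unit = { epochs.clear(); flush() }

  private def flush(): Unit = {
    // The real cache writes via a .tmp file inside the log dir, which is
    // what threw FileNotFoundException after the dir was renamed *-delete.
    val writer = new PrintWriter(checkpointFile)
    try epochs.foreach { case (e, o) => writer.println(s"$e $o") }
    finally writer.close()
  }

  def size: Int = epochs.size
}
```

With this split, only callers that still own a live log dir pay the cost (and risk) of the flush.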

> Replica Deletion Regression from KIP-101
> ----------------------------------------
>
>                 Key: KAFKA-5099
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5099
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Onur Karaman
>            Assignee: Onur Karaman
>            Priority: Blocker
>             Fix For: 0.11.0.0
>
>
> It appears that replica deletion regressed from KIP-101. Replica deletion 
> happens when a broker receives a StopReplicaRequest with delete=true. Ever 
> since KAFKA-1911, replica deletion has been async, meaning the broker 
> responds with a StopReplicaResponse simply after marking the replica 
> directory as staged for deletion. This marking happens by moving a data log 
> directory and its contents such as /tmp/kafka-logs1/t1-0 to a marked 
> directory like /tmp/kafka-logs1/t1-0.8c9c4c0c61c44cc59ebeb00075a2a07f-delete, 
> acting as a soft-delete. A scheduled thread later actually deletes the data. 
> It appears that the regression occurs while the scheduled thread is actually 
> trying to delete the data, which means the controller considers operations 
> such as partition reassignment and topic deletion complete. But if you look 
> at the log4j logs and data logs, you'll find that the soft-deleted data logs 
> haven't actually been deleted. It seems that restarting the broker 
> actually allows the soft-deleted directories to get deleted.
> Here's the setup:
> {code}
> > ./bin/zookeeper-server-start.sh config/zookeeper.properties
> > export LOG_DIR=logs0 && ./bin/kafka-server-start.sh config/server0.properties
> > export LOG_DIR=logs1 && ./bin/kafka-server-start.sh config/server1.properties
> > ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic t0 --replica-assignment 1:0
> > ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic t1 --replica-assignment 1:0
> > ./bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic t0
> > cat p.txt
> {"partitions":
>  [
>   {"topic": "t1", "partition": 0, "replicas": [0] }
>  ],
> "version":1
> }
> > ./bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file p.txt --execute
> {code}
> Here are sample logs:
> {code}
> [2017-04-20 17:46:54,801] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions t0-0 (kafka.server.ReplicaFetcherManager)
> [2017-04-20 17:46:54,814] INFO Log for partition t0-0 is renamed to /tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete and is scheduled for deletion (kafka.log.LogManager)
> [2017-04-20 17:47:27,585] INFO Deleting index /tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete/00000000000000000000.index (kafka.log.OffsetIndex)
> [2017-04-20 17:47:27,586] INFO Deleting index /tmp/kafka-logs1/t0-0/00000000000000000000.timeindex (kafka.log.TimeIndex)
> [2017-04-20 17:47:27,587] ERROR Exception in deleting Log(/tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete). Moving it to the end of the queue. (kafka.log.LogManager)
> java.io.FileNotFoundException: /tmp/kafka-logs1/t0-0/leader-epoch-checkpoint.tmp (No such file or directory)
>   at java.io.FileOutputStream.open0(Native Method)
>   at java.io.FileOutputStream.open(FileOutputStream.java:270)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>   at kafka.server.checkpoints.CheckpointFile.write(CheckpointFile.scala:41)
>   at kafka.server.checkpoints.LeaderEpochCheckpointFile.write(LeaderEpochCheckpointFile.scala:61)
>   at kafka.server.epoch.LeaderEpochFileCache.kafka$server$epoch$LeaderEpochFileCache$$flush(LeaderEpochFileCache.scala:178)
>   at kafka.server.epoch.LeaderEpochFileCache$$anonfun$clear$1.apply$mcV$sp(LeaderEpochFileCache.scala:161)
>   at kafka.server.epoch.LeaderEpochFileCache$$anonfun$clear$1.apply(LeaderEpochFileCache.scala:159)
>   at kafka.server.epoch.LeaderEpochFileCache$$anonfun$clear$1.apply(LeaderEpochFileCache.scala:159)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
>   at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:221)
>   at kafka.server.epoch.LeaderEpochFileCache.clear(LeaderEpochFileCache.scala:159)
>   at kafka.log.Log.delete(Log.scala:1051)
>   at kafka.log.LogManager.kafka$log$LogManager$$deleteLogs(LogManager.scala:442)
>   at kafka.log.LogManager$$anonfun$startup$5.apply$mcV$sp(LogManager.scala:241)
>   at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
>   at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> The same appears on broker 1 for t1-0 after partition reassignment moves the 
> replica off broker 1.
> Here's what the data logs show:
> {code}
> > l /tmp/kafka-logs*/t*
> /tmp/kafka-logs0/t0-0.166137ca8c28475d8c9fbf6f423a937e-delete:
> total 0
> -rw-r--r--  1 okaraman  wheel   0 Apr 20 17:46 00000000000000000000.timeindex
> -rw-r--r--  1 okaraman  wheel   0 Apr 20 17:46 leader-epoch-checkpoint
> drwxr-xr-x  2 okaraman  wheel  68 Apr 20 17:46 pid-mapping
> /tmp/kafka-logs0/t1-0:
> total 40960
> -rw-r--r--  1 okaraman  wheel  10485760 Apr 20 17:46 00000000000000000000.index
> -rw-r--r--  1 okaraman  wheel         0 Apr 20 17:46 00000000000000000000.log
> -rw-r--r--  1 okaraman  wheel  10485756 Apr 20 17:46 00000000000000000000.timeindex
> -rw-r--r--  1 okaraman  wheel         0 Apr 20 17:46 leader-epoch-checkpoint
> drwxr-xr-x  2 okaraman  wheel        68 Apr 20 17:46 pid-mapping
> /tmp/kafka-logs1/t0-0.bbc8fa126e3e4ff787f6b68d158ab771-delete:
> total 0
> -rw-r--r--  1 okaraman  wheel   0 Apr 20 17:46 00000000000000000000.timeindex
> -rw-r--r--  1 okaraman  wheel   0 Apr 20 17:46 leader-epoch-checkpoint
> drwxr-xr-x  2 okaraman  wheel  68 Apr 20 17:46 pid-mapping
> /tmp/kafka-logs1/t1-0.8c9c4c0c61c44cc59ebeb00075a2a07f-delete:
> total 0
> -rw-r--r--  1 okaraman  wheel   0 Apr 20 17:48 00000000000000000000.timeindex
> -rw-r--r--  1 okaraman  wheel   0 Apr 20 17:46 leader-epoch-checkpoint
> drwxr-xr-x  2 okaraman  wheel  68 Apr 20 17:46 pid-mapping
> {code}
> I ran the above test on the checkin preceding KIP-101 and it worked just fine.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
