[
https://issues.apache.org/jira/browse/KAFKA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922366#comment-16922366
]
dingsainan edited comment on KAFKA-8738 at 9/4/19 10:41 AM:
------------------------------------------------------------
Sorry for my late reply.
Below are the details of this case.
h2. 3.1 Command
{panel}
./kafka-reassign-partitions.sh --zookeeper 127.0.1.1:2181/local-cluster
--bootstrap-server 127.0.1.1:9092 --reassignment-json-file
/mnt/storage00/Nora/reassignment211.json --execute
{panel}
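If needed, the progress of a movement submitted this way can be checked with the tool's --verify mode (same JSON file, --verify instead of --execute), e.g.:
{panel}
./kafka-reassign-partitions.sh --zookeeper 127.0.1.1:2181/local-cluster
--bootstrap-server 127.0.1.1:9092 --reassignment-json-file
/mnt/storage00/Nora/reassignment211.json --verify
{panel}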
h2. 3.2 Content of the first reassignment
{code:java}
{"partitions":
[{"topic": "lancer_ops_billions_all_log_json_billions",
"partition": 1,
"replicas": [6,15],
"log_dirs": ["any","/data/mnt/storage02/datum/kafka_data"]}]
}
{code}
h2. 3.3 Content of the second reassignment
{code:java}
{"partitions":
[{"topic": "lancer_ops_billions_all_log_json_billions",
"partition": 1,
"replicas": [6,15],
"log_dirs": ["any","/data/mnt/storage03/datum/kafka_data"]}]
}
// observed log cleaner state when the second reassignment was submitted:
Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(1))
{code}
h2. 3.4 Result
The log cleaner no longer runs for
lancer_ops_billions_all_log_json_billions-1, so log retention for this
partition no longer takes effect.
h2. 3.5 Log cleaner state transitions
The cleaning state for this topic-partition changes as follows:
{code:java}
None
-> Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(1)) // first task submitted
-> Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(2)) // second task submitted
-> Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(1)) // single resume when the second task completes
// it then stays in LogCleaningPaused(1) forever
{code}
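For reference, here is a minimal, self-contained sketch (my own illustration, *not* the actual LogCleanerManager code) of the pause counting behind these transitions: each abortAndPauseCleaning() increments the pause count, each resumeCleaning() decrements it, and cleaning can only restart once the count drops back to zero. Two pauses followed by a single resume therefore leave the partition stuck at LogCleaningPaused(1):
{code:scala}
object PauseCountingSketch {
  sealed trait State
  case object NotPaused extends State // shown as "None" in the states above
  final case class LogCleaningPaused(count: Int) extends State

  // submitting a movement pauses cleaning, stacking on any existing pause
  def abortAndPauseCleaning(s: State): State = s match {
    case NotPaused            => LogCleaningPaused(1)
    case LogCleaningPaused(n) => LogCleaningPaused(n + 1)
  }

  // resuming undoes one pause; cleaning only restarts at zero
  def resumeCleaning(s: State): State = s match {
    case LogCleaningPaused(1)          => NotPaused
    case LogCleaningPaused(n) if n > 1 => LogCleaningPaused(n - 1)
    case other                         => other
  }

  def main(args: Array[String]): Unit = {
    var s: State = NotPaused
    s = abortAndPauseCleaning(s) // first reassignment  -> LogCleaningPaused(1)
    s = abortAndPauseCleaning(s) // second reassignment -> LogCleaningPaused(2)
    s = resumeCleaning(s)        // only one resume runs when the second task completes
    println(s)                   // LogCleaningPaused(1): cleaning never restarts
  }
}
{code}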
h2. 3.6 Code causing the issue
{code:scala}
// the resume operation is only executed when the removed log is not a future log
if (cleaner != null && !isFuture) {
  trace(s"the cleaner is not null and isFuture is false")
  cleaner.abortCleaning(topicPartition)
  cleaner.updateCheckpoints(removedLog.dir.getParentFile)
}
{code}
h2. 3.7 Proposed fix
Also resume cleaning for the topic-partition when the removed replica is a
future log (see the sketch below and [^migrationCase.pdf]).
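As a rough illustration of that direction (a sketch only, assuming the cleaner.resumeCleaning(...) hook available in this code base; not the committed patch), the delete path could undo the leftover pause when the replica being removed is a future log:
{code:scala}
// sketch: also resume cleaning when the removed replica is a future log, so
// the pause taken when the aborted first movement started does not leak
if (cleaner != null) {
  cleaner.abortCleaning(topicPartition)
  if (isFuture)
    cleaner.resumeCleaning(Seq(topicPartition)) // undo the leftover pause
  else
    cleaner.updateCheckpoints(removedLog.dir.getParentFile)
}
{code}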
> Cleaning thread blocked when more than one ALTER_REPLICA_LOG_DIRS requests
> sent
> --------------------------------------------------------------------------------
>
> Key: KAFKA-8738
> URL: https://issues.apache.org/jira/browse/KAFKA-8738
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.1.1
> Reporter: dingsainan
> Priority: Major
> Attachments: migrationCase.pdf
>
>
> Hi,
>
> I am experiencing a situation where the log cleaner does not work for the
> affected topic-partition after running the kafka-reassign-partitions.sh tool
> (v2.1.1) more than once in quick succession.
>
> My operation:
> first, I submit a task that migrates a replica between log directories on
> the same broker; while that task is still in progress, I submit a new task
> for the same topic-partition.
>
> {code:java}
> // the first task:
> {"partitions":
> [{"topic": "lancer_ops_billions_all_log_json_billions",
> "partition": 1,
> "replicas": [6,15],
> "log_dirs": ["any","/data/mnt/storage02/datum/kafka_data"]}]
> }
> // the second task
> {"partitions":
> [{"topic": "lancer_ops_billions_all_log_json_billions",
> "partition": 1,
> "replicas": [6,15],
> "log_dirs": ["any","/data/mnt/storage03/datum/kafka_data"]}]
> }
>
> {code}
>
> My analysis:
> Kafka executes abortAndPauseCleaning() when the first task is submitted;
> shortly afterwards, another task is submitted for the same topic-partition,
> so the cleaning state becomes {color:#ff0000}LogCleaningPaused(2){color}.
> When the second task completes, cleaning is resumed only once for this
> topic-partition. In my case, the first task is killed directly and no
> resumeCleaning() is ever executed for it, so after the second task
> completes, the cleaning state is still
> {color:#ff0000}LogCleaningPaused(1){color}, which blocks the cleaner for
> this topic-partition.
>
> _That's all of my analysis; please confirm._
>
> _Thanks_
> _Nora_