[
https://issues.apache.org/jira/browse/KAFKA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922366#comment-16922366
]
dingsainan edited comment on KAFKA-8738 at 9/4/19 10:41 AM:
------------------------------------------------------------
Sorry for my late reply.
Below are the details of this case.
h2. 3.1 Command
{panel}
./kafka-reassign-partitions.sh --zookeeper 127.0.1.1:2181/local-cluster
--bootstrap-server 127.0.1.1:9092 --reassignment-json-file
/mnt/storage00/Nora/reassignment211.json --execute
{panel}
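If needed, the progress of a movement submitted this way can be checked with the tool's --verify mode (same JSON file, --verify instead of --execute), e.g.:
{panel}
./kafka-reassign-partitions.sh --zookeeper 127.0.1.1:2181/local-cluster
--bootstrap-server 127.0.1.1:9092 --reassignment-json-file
/mnt/storage00/Nora/reassignment211.json --verify
{panel}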
h2. 3.2 Content of the first reassignment
{code:java}
{"partitions":
[{"topic": "lancer_ops_billions_all_log_json_billions",
"partition": 1,
"replicas": [6,15],
"log_dirs": ["any","/data/mnt/storage02/datum/kafka_data"]}]
}
{code}
h2. 3.3 Content of the second reassignment
{code:java}
{"partitions":
[{"topic": "lancer_ops_billions_all_log_json_billions",
"partition": 1,
"replicas": [6,15],
"log_dirs": ["any","/data/mnt/storage03/datum/kafka_data"]}]
}
// observed log cleaner state when the second reassignment was submitted:
Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(1))
{code}
h2. 3.4 Result
The log cleaner no longer runs for
lancer_ops_billions_all_log_json_billions-1, so log retention for this
partition no longer takes effect.
h2. 3.5 Log cleaner state transitions
The cleaning state for this topic-partition changes as follows:
{code:java}
None
-> Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(1)) // first task submitted
-> Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(2)) // second task submitted
-> Map(lancer_ops_billions_all_log_json_billions-1 -> LogCleaningPaused(1)) // single resume when the second task completes
// it then stays in LogCleaningPaused(1) forever
{code}
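For reference, here is a minimal, self-contained sketch (my own illustration, *not* the actual LogCleanerManager code) of the pause counting behind these transitions: each abortAndPauseCleaning() increments the pause count, each resumeCleaning() decrements it, and cleaning can only restart once the count drops back to zero. Two pauses followed by a single resume therefore leave the partition stuck at LogCleaningPaused(1):
{code:scala}
object PauseCountingSketch {
  sealed trait State
  case object NotPaused extends State // shown as "None" in the states above
  final case class LogCleaningPaused(count: Int) extends State

  // submitting a movement pauses cleaning, stacking on any existing pause
  def abortAndPauseCleaning(s: State): State = s match {
    case NotPaused            => LogCleaningPaused(1)
    case LogCleaningPaused(n) => LogCleaningPaused(n + 1)
  }

  // resuming undoes one pause; cleaning only restarts at zero
  def resumeCleaning(s: State): State = s match {
    case LogCleaningPaused(1)          => NotPaused
    case LogCleaningPaused(n) if n > 1 => LogCleaningPaused(n - 1)
    case other                         => other
  }

  def main(args: Array[String]): Unit = {
    var s: State = NotPaused
    s = abortAndPauseCleaning(s) // first reassignment  -> LogCleaningPaused(1)
    s = abortAndPauseCleaning(s) // second reassignment -> LogCleaningPaused(2)
    s = resumeCleaning(s)        // only one resume runs when the second task completes
    println(s)                   // LogCleaningPaused(1): cleaning never restarts
  }
}
{code}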
h2. 3.6 Code causing the issue
{code:scala}
// the resume operation is only executed when the removed log is not a future log
if (cleaner != null && !isFuture) {
  trace(s"the cleaner is not null and isFuture is false")
  cleaner.abortCleaning(topicPartition)
  cleaner.updateCheckpoints(removedLog.dir.getParentFile)
}
{code}
h2. 3.7 Proposed fix
Also resume cleaning for the topic-partition when the removed replica is a
future log (see the sketch below and [^migrationCase.pdf]).
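As a rough illustration of that direction (a sketch only, assuming the cleaner.resumeCleaning(...) hook available in this code base; not the committed patch), the delete path could undo the leftover pause when the replica being removed is a future log:
{code:scala}
// sketch: also resume cleaning when the removed replica is a future log, so
// the pause taken when the aborted first movement started does not leak
if (cleaner != null) {
  cleaner.abortCleaning(topicPartition)
  if (isFuture)
    cleaner.resumeCleaning(Seq(topicPartition)) // undo the leftover pause
  else
    cleaner.updateCheckpoints(removedLog.dir.getParentFile)
}
{code}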
> Cleaning thread blocked when more than one ALTER_REPLICA_LOG_DIRS requests
> sent
> --------------------------------------------------------------------------------
>
> Key: KAFKA-8738
> URL: https://issues.apache.org/jira/browse/KAFKA-8738
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.1.1
> Reporter: dingsainan
> Priority: Major
> Attachments: migrationCase.pdf
>
>
> Hi,
>
> I am experiencing a situation where the log cleaner does not work for the
> affected topic-partition after running the kafka-reassign-partitions.sh tool
> (v2.1.1) more than once in quick succession.
>
> My operation:
> first, I submit a task that migrates a replica between log directories on
> the same broker; while that task is still in progress, I submit a new task
> for the same topic-partition.
>
> {code:java}
> // the first task:
> {"partitions":
> [{"topic": "lancer_ops_billions_all_log_json_billions",
> "partition": 1,
> "replicas": [6,15],
> "log_dirs": ["any","/data/mnt/storage02/datum/kafka_data"]}]
> }
> // the second task
> {"partitions":
> [{"topic": "lancer_ops_billions_all_log_json_billions",
> "partition": 1,
> "replicas": [6,15],
> "log_dirs": ["any","/data/mnt/storage03/datum/kafka_data"]}]
> }
>
> {code}
>
> My analysis:
> Kafka executes abortAndPauseCleaning() when the first task is submitted;
> shortly afterwards, another task is submitted for the same topic-partition,
> so the cleaning state becomes {color:#ff0000}LogCleaningPaused(2){color}.
> When the second task completes, cleaning is resumed only once for this
> topic-partition. In my case, the first task is killed directly and no
> resumeCleaning() is ever executed for it, so after the second task
> completes, the cleaning state is still
> {color:#ff0000}LogCleaningPaused(1){color}, which blocks the cleaner for
> this topic-partition.
>
> _That's all of my analysis; please confirm._
>
> _Thanks_
> _Nora_