[jira] [Commented] (KAFKA-6134) High memory usage on controller during partition reassignment

ASF GitHub Bot (JIRA) Thu, 26 Oct 2017 19:29:22 -0700

    [ https://issues.apache.org/jira/browse/KAFKA-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221596#comment-16221596 ]


ASF GitHub Bot commented on KAFKA-6134:
---------------------------------------

GitHub user hachikuji opened a pull request:

    https://github.com/apache/kafka/pull/4141

    KAFKA-6134: Read partition reassignment lazily on event handling

    This patch prevents an O(n^2) increase in memory utilization during 
partition reassignment. Instead of storing the reassigned partitions in the 
`PartitionReassignment` object (which is added to the controller event queue 
after every partition reassignment), we read the data fresh from ZK when 
processing the event.
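
The idea of reading lazily can be illustrated with a minimal sketch. It is 
not the code from the pull request: `FakeZk`, `process`, and the payload-free 
`PartitionReassignment` event below are hypothetical stand-ins, and ZooKeeper 
is replaced by an in-memory map. The point shown is only that an event with 
no payload retains no snapshot, and the handler sees whatever state exists at 
the time it runs.

```scala
object LazyReassignmentSketch {
  // Hypothetical model of the reassignment data: partition name -> new replicas.
  type Reassignment = Map[String, Seq[Int]]

  // Hypothetical in-memory stand-in for the reassignment znode in ZooKeeper.
  final class FakeZk(private var data: Reassignment) {
    def read(): Reassignment = data
    def write(updated: Reassignment): Unit = { data = updated }
  }

  sealed trait Event
  // The event carries no partition data, so a queued instance retains nothing.
  case object PartitionReassignment extends Event

  // The handler reads the current state only when the event is processed.
  def process(event: Event, zk: FakeZk): Reassignment = event match {
    case PartitionReassignment => zk.read()
  }

  def main(args: Array[String]): Unit = {
    val zk = new FakeZk(Map("t-0" -> Seq(1, 2), "t-1" -> Seq(2, 3)))
    val event: Event = PartitionReassignment // queued while two partitions are pending
    zk.write(zk.read() - "t-0")              // state changes before the event is handled
    println(process(event, zk).keySet)       // the handler sees only the remaining partition
  }
}
```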

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hachikuji/kafka KAFKA-6134

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/4141.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4141
    
----
commit 5131bb19f6fe7fc1939035c48ead052a0ac967a4
Author: Jason Gustafson <ja...@confluent.io>
Date:   2017-10-27T02:01:05Z

    KAFKA-6134: Read partition reassignment lazily on event handling

----


> High memory usage on controller during partition reassignment
> -------------------------------------------------------------
>
>                 Key: KAFKA-6134
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6134
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.11.0.0, 0.11.0.1
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Critical
>              Labels: regression
>             Fix For: 1.0.0, 0.11.0.2
>
>         Attachments: Screen Shot 2017-10-26 at 3.05.40 PM.png
>
>
> We've had a couple of users report spikes in memory usage when the controller 
> is performing partition reassignment in 0.11. After investigation, we found 
> that the controller event queue was using most of the retained memory. In 
> particular, we found several thousand {{PartitionReassignment}} objects, each 
> one containing one fewer partition than the previous one (see the attached 
> image).
> From the code, it seems clear why this is happening. We have a watch on the 
> partition reassignment path which adds the {{PartitionReassignment}} object 
> to the event queue:
> {code}
>   override def handleDataChange(dataPath: String, data: Any): Unit = {
>     val partitionReassignment = ZkUtils.parsePartitionReassignmentData(data.toString)
>     eventManager.put(controller.PartitionReassignment(partitionReassignment))
>   }
> {code}
> In the {{PartitionReassignment}} event handler, we iterate through all of the 
> partitions in the reassignment. After we complete reassignment for each 
> partition, we remove that partition and update the node in ZooKeeper.
> {code}
>     // remove this partition from that list
>     val updatedPartitionsBeingReassigned = partitionsBeingReassigned - topicAndPartition
>     // write the new list to zookeeper
>     zkUtils.updatePartitionReassignmentData(updatedPartitionsBeingReassigned.mapValues(_.newReplicas))
> {code}
> This triggers the handler above, which adds a new event to the queue. The 
> queue ends up holding events with n, n-1, ..., 1 partitions, so what you get 
> is an n^2 increase in memory, where n is the number of partitions being 
> reassigned.
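
The growth described in the issue can be checked with a small simulation. 
This is a sketch under stated assumptions, not controller code: 
`SnapshotEvent` and `retainedEntries` are hypothetical names, partitions are 
modeled as integers, and no event is assumed to be dequeued while the first 
one is still being handled. It counts partition entries held by queued events 
(not bytes) and compares the total with the closed form n(n+1)/2.

```scala
import scala.collection.mutable

object QuadraticQueueSketch {
  // Hypothetical event that carries a full snapshot of the pending partitions.
  final case class SnapshotEvent(partitions: Set[Int])

  // Total partition entries held by queued events after reassigning n partitions.
  def retainedEntries(n: Int): Long = {
    val queue = mutable.ArrayBuffer[SnapshotEvent]()
    var pending: Set[Int] = (0 until n).toSet
    // The initial write to the reassignment path fires the watch once.
    if (pending.nonEmpty) queue += SnapshotEvent(pending)
    while (pending.nonEmpty) {
      // One partition completes and the shortened list is written back.
      pending = pending - pending.head
      // Each write fires the watch again, queueing another snapshot.
      if (pending.nonEmpty) queue += SnapshotEvent(pending)
    }
    queue.map(_.partitions.size.toLong).sum
  }

  def main(args: Array[String]): Unit = {
    val n = 1000
    println(retainedEntries(n))     // simulated count
    println(n.toLong * (n + 1) / 2) // closed form: n + (n-1) + ... + 1
  }
}
```

For n = 1000 both lines print 500500, so a reassignment of a few thousand 
partitions leaves millions of entries reachable from the queue in this model.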



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
