[
https://issues.apache.org/jira/browse/KAFKA-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157991#comment-16157991
]
Ted Yu commented on KAFKA-5857:
-------------------------------
bq. After step 5
You listed 4 steps. Did you mean 'after step 4'?
> Excessive heap usage on controller node during reassignment
> -----------------------------------------------------------
>
> Key: KAFKA-5857
> URL: https://issues.apache.org/jira/browse/KAFKA-5857
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 0.11.0.0
> Environment: CentOs 7, Java 1.8
> Reporter: Raoufeh Hashemian
> Attachments: CPU.png, disk_write_x.png, memory.png,
> reassignment_plan.txt
>
>
> I was trying to expand our Kafka cluster from 6 broker nodes to 12 broker
> nodes.
> Before the expansion, we had a single topic with 960 partitions and a
> replication factor of 3, so each node held 480 partition replicas
> (960 partitions x 3 replicas / 6 nodes). The size of the data on each node
> was 3 TB.
> To do the expansion, I submitted a partition reassignment plan (see the
> attached file for the current/new assignments). The plan was optimized to
> minimize data movement and to be rack-aware.
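> For context, a minimal sketch of how such a plan is typically submitted and
> verified with the stock tooling that ships with Kafka 0.11; the ZooKeeper
> connect string, chroot, and plan file name below are assumptions, not the
> exact commands used:
>
>   # Each entry in the plan JSON names a topic partition and its new replica set,
>   # e.g. {"version":1,"partitions":[{"topic":"t","partition":0,"replicas":[1,2,3]}]}
>
>   # Kick off the reassignment
>   bin/kafka-reassign-partitions.sh --zookeeper zk1:2181/kafka \
>       --reassignment-json-file reassignment_plan.txt --execute
>
>   # Check which partitions have completed so far
>   bin/kafka-reassign-partitions.sh --zookeeper zk1:2181/kafka \
>       --reassignment-json-file reassignment_plan.txt --verify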
> When I submitted the plan, moving the data from the old to the new nodes took
> approximately 3 hours to complete. After that, it started deleting the source
> partitions (I say this based on the number of file descriptors) and
> rebalancing leaders, which was not successful. Meanwhile, heap usage on the
> controller node started to climb steeply (along with long GC pauses); it took
> 5 hours for the controller to run out of memory, and then another controller
> showed the same behaviour for another 4 hours. At that point ZooKeeper ran
> out of disk space and the service stopped.
> To recover from this condition:
> 1) Removed zk logs to free up disk space and restarted all 3 zk nodes.
> 2) Deleted the /kafka/admin/reassign_partitions node from zk (a command
> sketch follows this list).
> 3) Had to do unclean restarts of the Kafka service on the OOM controller
> nodes, which took 3 hours to complete. After this stage there were still 676
> under-replicated partitions.
> 4) Did a clean restart on all 12 broker nodes.
> After step 5, the number of under-replicated partitions went to 0.
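> Regarding step 2, a sketch of removing the znode with the ZooKeeper shell
> that ships with Kafka; the host name is a placeholder, and the /kafka chroot
> is taken from the path above:
>
>   bin/zookeeper-shell.sh zk1:2181 delete /kafka/admin/reassign_partitions
>
> Deleting this znode clears the pending reassignment that the controller
> tracks, which is why it was followed by the broker restarts.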
> So I was wondering: is this memory footprint on the controller expected for
> ~1k partitions? Did we do something wrong, or is it a bug?
> Attached are some resource usage graphs covering this 30-hour event, and the
> reassignment plan. I'll try to add log files as well.