[ https://issues.apache.org/jira/browse/KAFKA-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907672#comment-16907672 ]
David Jacot commented on KAFKA-8796:
------------------------------------

[~rmarou] Have you tried experimenting with throttling the replication traffic? There is documentation available here: [https://kafka.apache.org/documentation/#rep-throttle]

> A broker joining the cluster should be able to replicate without impacting the cluster
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8796
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8796
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Marouane RAJI
>            Priority: Major
>     Attachments: image-2019-08-13-10-26-19-282.png, image-2019-08-13-10-28-42-337.png
>
> Hi,
> We run a cluster of 50 brokers on AWS, peaking at 1.4M msgs/sec. We were using m4.2xlarge instances and are now moving to m5.2xlarge. Every time we replace a broker from scratch (EBS volumes are tied to the EC2 instance), the bytes sent on the replaced broker increase significantly, and that seems to impact the cluster, increasing the produce time and fetch time.
> This is our configuration per broker:
>
> {code:java}
> broker.id=11
> ############################# Socket Server Settings #############################
> # The port the socket server listens on
> port=9092
> advertised.host.name=ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com
> # The number of threads handling network requests
> num.network.threads=32
> # The number of threads doing disk I/O
> num.io.threads=16
> # Socket server buffers
> socket.receive.buffer.bytes=1048576
> socket.request.max.bytes=104857600
> # The max time a connection can be idle
> connections.max.idle.ms=60000
> num.partitions=2
> default.replication.factor=2
> auto.leader.rebalance.enable=true
> delete.topic.enable=true
> compression.type=producer
> log.message.format.version=0.9.0.1
> message.max.bytes=8000000
> # The minimum age of a log file to be eligible for deletion
> log.retention.hours=48
> log.retention.bytes=3000000000
> log.segment.bytes=268435456
> log.retention.check.interval.ms=60000
> log.cleaner.enable=true
> log.cleaner.dedupe.buffer.size=268435456
> replica.fetch.max.bytes=8388608
> replica.fetch.wait.max.ms=500
> replica.lag.time.max.ms=10000
> num.replica.fetchers=3
> # Auto creation of topics on the server
> auto.create.topics.enable=true
> controlled.shutdown.enable=true
> inter.broker.protocol.version=0.10.2
> unclean.leader.election.enabled=True
> {code}
>
> This is what we notice on replication: a high increase in bytes received on the replaced broker.
>
> !image-2019-08-13-10-26-19-282.png!
> !image-2019-08-13-10-28-42-337.png!
>
> You can't see it in the graph above, but the increase in produce time stayed high for 20 minutes.
> We didn't see anything out of the ordinary in the logs.
> Please let us know if there is anything wrong in our config, or if it is a potential issue that needs fixing in Kafka.
> Thanks.

-- This message was sent by Atlassian JIRA (v7.6.14#76016)
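For reference, the replication throttle suggested above is applied with the `kafka-configs.sh` tool per the linked documentation. The sketch below is illustrative only: the ZooKeeper address, topic name, and the 10 MB/s rate are assumptions, not values from this thread (broker id 11 is taken from the posted config).

```shell
# Cap replication traffic on the rebuilt broker (id 11, from the config above).
# 10000000 bytes/sec (~10 MB/s) is an assumed rate; tune to your hardware.
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type brokers --entity-name 11 \
  --add-config 'leader.replication.throttled.rate=10000000,follower.replication.throttled.rate=10000000'

# Mark which replicas the throttle applies to; '*' throttles all replicas
# of the topic ("my-topic" is a hypothetical name).
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config 'leader.replication.throttled.replicas=*,follower.replication.throttled.replicas=*'

# Important: remove the throttle once the broker has caught up, otherwise
# normal replication stays rate-limited.
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type brokers --entity-name 11 \
  --delete-config 'leader.replication.throttled.rate,follower.replication.throttled.rate'
```

Note that on Kafka 1.1 the tool still targets ZooKeeper; newer releases use `--bootstrap-server` instead.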