[
https://issues.apache.org/jira/browse/KAFKA-6343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589131#comment-16589131
]
ASF GitHub Bot commented on KAFKA-6343:
---------------------------------------
junrao closed pull request #4358: KAFKA-6343 Documentation update in OS-level
tuning section: add vm.ma…
URL: https://github.com/apache/kafka/pull/4358
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:
diff --git a/docs/ops.html b/docs/ops.html
index b9e3a4b32a1..c97d31f978d 100644
--- a/docs/ops.html
+++ b/docs/ops.html
@@ -668,10 +668,11 @@ <h4><a id="os" href="#os">OS</a></h4>
<p>
We have seen a few issues running on Windows and Windows is not currently a
well supported platform though we would be happy to change that.
<p>
- It is unlikely to require much OS-level tuning, but there are two
potentially important OS-level configurations:
+ It is unlikely to require much OS-level tuning, but there are three
potentially important OS-level configurations:
<ul>
- <li>File descriptor limits: Kafka uses file descriptors for log segments
and open connections. If a broker hosts many partitions, consider that the
broker needs at least (number_of_partitions)*(partition_size/segment_size) to
track all log segments in addition to the number of connections the broker
makes. We recommend at least 100000 allowed file descriptors for the broker
processes as a starting point.
+ <li>File descriptor limits: Kafka uses file descriptors for log segments
and open connections. If a broker hosts many partitions, consider that the
broker needs at least (number_of_partitions)*(partition_size/segment_size) to
track all log segments in addition to the number of connections the broker
makes. We recommend at least 100000 allowed file descriptors for the broker
processes as a starting point. Note: the mmap() function adds an extra
reference to the file associated with the file descriptor, and this reference
is not removed by a subsequent close() on that descriptor. The reference is
removed only when there are no more mappings to the file.
<li>Max socket buffer size: can be increased to enable high-performance
data transfer between data centers as <a
href="http://www.psc.edu/index.php/networking/641-tcp-tune">described here</a>.
+ <li>Maximum number of memory map areas a process may have (aka
vm.max_map_count). <a
href="http://kernel.org/doc/Documentation/sysctl/vm.txt">See the Linux kernel
documentation</a>. You should keep an eye on this OS-level property when
considering the maximum number of partitions a broker may have. By default, on
a number of Linux systems, the value of vm.max_map_count is somewhere around
65535. Each log segment, allocated per partition, requires a pair of
index/timeindex files, and each of these files consumes 1 map area. In other
words, each log segment uses 2 map areas. Thus, each partition requires a
minimum of 2 map areas, as long as it hosts a single log segment. That is to
say, creating 50000 partitions on a broker will result in the allocation of
100000 map areas and will likely cause a broker crash with OutOfMemoryError
(Map failed) on a system with the default vm.max_map_count. Keep in mind that
the number of log segments per partition varies depending on the segment size,
load intensity, and retention policy and, generally, tends to be more than one.
</ul>
<p>
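The arithmetic in the new vm.max_map_count bullet can be sanity-checked with a short sketch (Python here purely for illustration; the 50000-partition figure and the ~65535 default are taken from the doc text above):

```python
# Estimate of memory map areas a broker needs, per the doc text above.
DEFAULT_VM_MAX_MAP_COUNT = 65535  # typical Linux default, per the text

def map_areas_needed(partitions, segments_per_partition=1):
    # Each log segment maps an index and a timeindex file: 2 map areas.
    return partitions * segments_per_partition * 2

needed = map_areas_needed(50000)
print(needed)                             # 100000
print(needed > DEFAULT_VM_MAX_MAP_COUNT)  # True: exceeds the default
```

With more than one segment per partition (the common case), the required map areas grow proportionally, so the default is exhausted with even fewer partitions.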
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> OOM as the result of creation of 5k topics
> ------------------------------------------
>
> Key: KAFKA-6343
> URL: https://issues.apache.org/jira/browse/KAFKA-6343
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.10.1.1, 0.10.2.0, 0.10.2.1, 0.11.0.1, 0.11.0.2, 1.0.0
> Environment: RHEL 7, RAM 755GB per host
> Reporter: Alex Dunayevsky
> Priority: Major
>
> *Reproducing*: Create 5k topics *from the code* quickly, without any delays.
> Wait until the brokers finish loading them. This will actually never happen:
> all brokers will go down one by one after approximately 10-15 minutes or
> more, depending on the hardware.
> *Heap*: -Xmx/-Xms: 5G, 10G, 50G, 256G, 512G
>
> *Topology*: 3 brokers, 3 zk.
> *Code for 5k topic creation:*
> {code:java}
> package kafka
> import kafka.admin.AdminUtils
> import kafka.utils.{Logging, ZkUtils}
> object TestCreateTopics extends App with Logging {
>   val zkConnect = "grid978:2185"
>   var zkUtils = ZkUtils(zkConnect, 6000, 6000, isZkSecurityEnabled = false)
>   for (topic <- 1 to 5000) {
>     AdminUtils.createTopic(
>       topic = s"${topic.toString}",
>       partitions = 10,
>       replicationFactor = 2,
>       zkUtils = zkUtils
>     )
>     logger.info(s"Created topic ${topic.toString}")
>   }
> }
> {code}
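A rough back-of-the-envelope check (Python for illustration; the even spread of replicas across the 3 brokers is an assumption, not stated in the issue) shows why this repro runs into the vm.max_map_count limit described in the PR diff above:

```python
# Back-of-the-envelope: map areas per broker for the repro above.
topics = 5000
partitions_per_topic = 10
replication_factor = 2
brokers = 3  # assumption: replicas spread evenly across the 3 brokers

total_replicas = topics * partitions_per_topic * replication_factor  # 100000
replicas_per_broker = total_replicas // brokers                      # 33333
# Each partition replica needs at least 2 map areas (index + timeindex),
# and more once a partition rolls additional segments.
map_areas_per_broker = replicas_per_broker * 2                       # 66666

print(map_areas_per_broker > 65535)  # True: exceeds a typical default
```

So even with a single segment per partition, each broker needs more map areas than the common ~65535 default allows, matching the "Map failed" OutOfMemoryError below.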
> *Cause of death:*
> {code:java}
> java.io.IOException: Map failed
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:920)
> at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:61)
> at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:52)
> at kafka.log.LogSegment.<init>(LogSegment.scala:67)
> at kafka.log.Log.loadSegments(Log.scala:255)
> at kafka.log.Log.<init>(Log.scala:108)
> at kafka.log.LogManager.createLog(LogManager.scala:362)
> at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:94)
> at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
> at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:174)
> at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:168)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:242)
> at kafka.cluster.Partition.makeLeader(Partition.scala:168)
> at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:758)
> at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:757)
> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
> at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:757)
> at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:703)
> at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:82)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: Map failed
> at sun.nio.ch.FileChannelImpl.map0(Native Method)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:917)
> ... 28 more
> {code}
> Restarting a broker results in the same OOM issue: none of the brokers are
> able to start again.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)