[
https://issues.apache.org/jira/browse/KAFKA-6343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589131#comment-16589131
]
ASF GitHub Bot commented on KAFKA-6343:
---------------------------------------
junrao closed pull request #4358: KAFKA-6343 Documentation update in OS-level
tuning section: add vm.ma…
URL: https://github.com/apache/kafka/pull/4358
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:
diff --git a/docs/ops.html b/docs/ops.html
index b9e3a4b32a1..c97d31f978d 100644
--- a/docs/ops.html
+++ b/docs/ops.html
@@ -668,10 +668,11 @@ <h4><a id="os" href="#os">OS</a></h4>
<p>
We have seen a few issues running on Windows and Windows is not currently a
well supported platform though we would be happy to change that.
<p>
- It is unlikely to require much OS-level tuning, but there are two
potentially important OS-level configurations:
+ It is unlikely to require much OS-level tuning, but there are three
potentially important OS-level configurations:
<ul>
- <li>File descriptor limits: Kafka uses file descriptors for log segments
and open connections. If a broker hosts many partitions, consider that the
broker needs at least (number_of_partitions)*(partition_size/segment_size) to
track all log segments in addition to the number of connections the broker
makes. We recommend at least 100000 allowed file descriptors for the broker
processes as a starting point.
+ <li>File descriptor limits: Kafka uses file descriptors for log segments
and open connections. If a broker hosts many partitions, consider that the
broker needs at least (number_of_partitions)*(partition_size/segment_size) to
track all log segments in addition to the number of connections the broker
makes. We recommend at least 100000 allowed file descriptors for the broker
processes as a starting point. Note: the mmap() function adds an extra
reference to the file associated with the file descriptor, and this reference
is not removed by a subsequent close() on that descriptor. The reference is
removed only when there are no more mappings to the file.
<li>Max socket buffer size: can be increased to enable high-performance
data transfer between data centers as <a
href="http://www.psc.edu/index.php/networking/641-tcp-tune">described here</a>.
+ <li>Maximum number of memory map areas a process may have (aka
vm.max_map_count). <a
href="http://kernel.org/doc/Documentation/sysctl/vm.txt">See the Linux kernel
documentation</a>. You should keep an eye on this OS-level property when
considering the maximum number of partitions a broker may have. By default, on
a number of Linux systems, the value of vm.max_map_count is somewhere around
65535. Each log segment, allocated per partition, requires a pair of
index/timeindex files, and each of these files consumes 1 map area. In other
words, each log segment uses 2 map areas. Thus, each partition requires a
minimum of 2 map areas, as long as it hosts a single log segment. That is to
say, creating 50000 partitions on a broker will result in the allocation of
100000 map areas and will likely cause a broker crash with OutOfMemoryError
(Map failed) on a system with the default vm.max_map_count. Keep in mind that
the number of log segments per partition varies depending on the segment size,
load intensity, and retention policy and, generally, tends to be more than one.
</ul>
<p>
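The arithmetic in the new vm.max_map_count bullet can be sanity-checked with a short sketch (Python here purely for illustration; the 50000-partition figure and the ~65535 default are taken from the doc text above):

```python
# Estimate of memory map areas a broker needs, per the doc text above.
DEFAULT_VM_MAX_MAP_COUNT = 65535  # typical Linux default, per the text

def map_areas_needed(partitions, segments_per_partition=1):
    # Each log segment maps an index and a timeindex file: 2 map areas.
    return partitions * segments_per_partition * 2

needed = map_areas_needed(50000)
print(needed)                             # 100000
print(needed > DEFAULT_VM_MAX_MAP_COUNT)  # True: exceeds the default
```

With more than one segment per partition (the common case), the required map areas grow proportionally, so the default is exhausted with even fewer partitions.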
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> OOM as the result of creation of 5k topics
> ------------------------------------------
>
> Key: KAFKA-6343
> URL: https://issues.apache.org/jira/browse/KAFKA-6343
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.10.1.1, 0.10.2.0, 0.10.2.1, 0.11.0.1, 0.11.0.2, 1.0.0
> Environment: RHEL 7, RAM 755GB per host
> Reporter: Alex Dunayevsky
> Priority: Major
>
> *Reproducing*: Create 5k topics *from the code* quickly, without any delays.
> Wait until the brokers finish loading them. This will actually never happen:
> all brokers will go down one by one after approximately 10-15 minutes or
> more, depending on the hardware.
> *Heap*: -Xmx/-Xms: 5G, 10G, 50G, 256G, 512G
>
> *Topology*: 3 brokers, 3 zk.
> *Code for 5k topic creation:*
> {code:java}
> package kafka
> import kafka.admin.AdminUtils
> import kafka.utils.{Logging, ZkUtils}
> object TestCreateTopics extends App with Logging {
>   val zkConnect = "grid978:2185"
>   var zkUtils = ZkUtils(zkConnect, 6000, 6000, isZkSecurityEnabled = false)
>   for (topic <- 1 to 5000) {
>     AdminUtils.createTopic(
>       topic = s"${topic.toString}",
>       partitions = 10,
>       replicationFactor = 2,
>       zkUtils = zkUtils
>     )
>     logger.info(s"Created topic ${topic.toString}")
>   }
> }
> {code}
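A rough back-of-the-envelope check (Python for illustration; the even spread of replicas across the 3 brokers is an assumption, not stated in the issue) shows why this repro runs into the vm.max_map_count limit described in the PR diff above:

```python
# Back-of-the-envelope: map areas per broker for the repro above.
topics = 5000
partitions_per_topic = 10
replication_factor = 2
brokers = 3  # assumption: replicas spread evenly across the 3 brokers

total_replicas = topics * partitions_per_topic * replication_factor  # 100000
replicas_per_broker = total_replicas // brokers                      # 33333
# Each partition replica needs at least 2 map areas (index + timeindex),
# and more once a partition rolls additional segments.
map_areas_per_broker = replicas_per_broker * 2                       # 66666

print(map_areas_per_broker > 65535)  # True: exceeds a typical default
```

So even with a single segment per partition, each broker needs more map areas than the common ~65535 default allows, matching the "Map failed" OutOfMemoryError below.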
> *Cause of death:*
> {code:java}
> java.io.IOException: Map failed
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:920)
> at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:61)
> at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:52)
> at kafka.log.LogSegment.<init>(LogSegment.scala:67)
> at kafka.log.Log.loadSegments(Log.scala:255)
> at kafka.log.Log.<init>(Log.scala:108)
> at kafka.log.LogManager.createLog(LogManager.scala:362)
> at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:94)
> at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
> at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:174)
> at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:168)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:242)
> at kafka.cluster.Partition.makeLeader(Partition.scala:168)
> at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:758)
> at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:757)
> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
> at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:757)
> at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:703)
> at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:82)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: Map failed
> at sun.nio.ch.FileChannelImpl.map0(Native Method)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:917)
> ... 28 more
> {code}
> Restarting a broker results in the same OOM issue: none of the brokers are
> able to start again.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)