[ https://issues.apache.org/jira/browse/FLINK-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966691#comment-14966691 ]
Ufuk Celebi commented on FLINK-2865:
------------------------------------

OK, sorry. "There is no limit anymore" was confusing me. Stephan was confused as well I guess. ;)

> OutOfMemory error (Direct buffer memory)
> ----------------------------------------
>
> Key: FLINK-2865
> URL: https://issues.apache.org/jira/browse/FLINK-2865
> Project: Flink
> Issue Type: Bug
> Components: Distributed Runtime
> Affects Versions: 0.10
> Reporter: Greg Hogan
> Assignee: Maximilian Michels
> Fix For: 0.10
>
>
> I see the following TaskManager error when using off-heap memory and a
> relatively high number of network buffers. Setting
> {{taskmanager.memory.off-heap: false}} or halving the number of network
> buffers (6 GB instead of 12 GB) results in a successful start.
> {noformat}
> 18:17:25,912 WARN org.apache.hadoop.util.NativeCodeLoader
> - Unable to load native-hadoop library for your platform... using
> builtin-java classes where applicable
> 18:17:26,024 INFO org.apache.flink.runtime.taskmanager.TaskManager
> -
> --------------------------------------------------------------------------------
> 18:17:26,024 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Starting TaskManager (Version: 0.10-SNAPSHOT, Rev:d047ddb,
> Date:18.10.2015 @ 08:54:59 UTC)
> 18:17:26,025 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Current user: ec2-user
> 18:17:26,025 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation -
> 1.8/25.60-b23
> 18:17:26,025 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Maximum heap size: 5104 MiBytes
> 18:17:26,025 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - JAVA_HOME: /usr/java/latest
> 18:17:26,026 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Hadoop version: 2.3.0
> 18:17:26,026 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - JVM Options:
> 18:17:26,026 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - -Xms5325M
> 18:17:26,026 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - -Xmx5325M
> 18:17:26,026 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - -XX:MaxDirectMemorySize=53248M
> 18:17:26,026 INFO org.apache.flink.runtime.taskmanager.TaskManager
> -
> -Dlog.file=/home/ec2-user/flink/log/flink-ec2-user-taskmanager-0-ip-10-0-98-3.log
> 18:17:26,027 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - -Dlog4j.configuration=file:/home/ec2-user/flink/conf/log4j.properties
> 18:17:26,027 INFO org.apache.flink.runtime.taskmanager.TaskManager
> -
> -Dlogback.configurationFile=file:/home/ec2-user/flink/conf/logback.xml
> 18:17:26,027 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Program Arguments:
> 18:17:26,027 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - --configDir
> 18:17:26,027 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - /home/ec2-user/flink/conf
> 18:17:26,027 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - --streamingMode
> 18:17:26,027 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - batch
> 18:17:26,027 INFO org.apache.flink.runtime.taskmanager.TaskManager
> -
> --------------------------------------------------------------------------------
> 18:17:26,033 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Maximum number of open file descriptors is 1048576
> 18:17:26,051 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Loading configuration from /home/ec2-user/flink/conf
> 18:17:26,079 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Security is not enabled. Starting non-authenticated TaskManager.
> 18:17:26,094 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils
> - Trying to select the network interface and address to use by connecting
> to the leading JobManager.
> 18:17:26,094 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils
> - TaskManager will try to connect for 10000 milliseconds before falling
> back to heuristics
> 18:17:26,097 INFO org.apache.flink.runtime.net.ConnectionUtils
> - Retrieved new target address /127.0.0.1:6123.
> 18:17:26,461 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - TaskManager will use hostname/address 'ip-10-0-98-3' (10.0.98.3) for
> communication.
> 18:17:26,462 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Starting TaskManager in streaming mode BATCH_ONLY
> 18:17:26,462 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Starting TaskManager actor system at 10.0.98.3:0
> 18:17:26,735 INFO akka.event.slf4j.Slf4jLogger
> - Slf4jLogger started
> 18:17:26,767 INFO Remoting
> - Starting remoting
> 18:17:26,877 INFO Remoting
> - Remoting started; listening on addresses
> :[akka.tcp://flink@10.0.98.3:47484]
> 18:17:26,881 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Starting TaskManager actor
> 18:17:26,925 INFO org.apache.flink.runtime.io.network.netty.NettyConfig
> - NettyConfig [server address: ip-10-0-98-3/10.0.98.3, server port: 45728,
> memory segment size (bytes): 32768, transport type: NIO, number of server
> threads: 0 (use Netty's default), number of client threads: 0 (use Netty's
> default), server connect backlog: 0 (use Netty's default), client connect
> timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
> 18:17:26,927 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Messages between TaskManager and JobManager have a max timeout of 100000
> milliseconds
> 18:17:26,931 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Temporary file directory '/volumes/xvdb/tmp': total 319 GB, usable 319
> GB (100.00% usable)
> 18:17:26,931 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Temporary file directory '/volumes/xvdc/tmp': total 319 GB, usable 319
> GB (100.00% usable)
> 18:17:32,194 INFO
> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated
> 12288 MB for network buffer pool (number of memory segments: 393216, bytes
> per segment: 32768).
> 18:17:32,195 INFO org.apache.flink.runtime.taskmanager.TaskManager
> - Using 0.9 of the maximum memory size for Flink managed off-heap memory
> (45940 MB).
> 18:17:50,371 ERROR org.apache.flink.runtime.taskmanager.TaskManager
> - Error while starting up taskManager
> java.lang.Exception: OutOfMemory error (Direct buffer memory) while
> allocating the TaskManager off-heap memory (48172092966 bytes). Try
> increasing the maximum direct memory (-XX:MaxDirectMemorySize)
> at
> org.apache.flink.runtime.taskmanager.TaskManager$.startTaskManagerComponentsAndActor(TaskManager.scala:1633)
> at
> org.apache.flink.runtime.taskmanager.TaskManager$.runTaskManager(TaskManager.scala:1460)
> at
> org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1325)
> at
> org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1235)
> at
> org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala)
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:658)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at
> org.apache.flink.runtime.memory.MemoryManager$HybridOffHeapMemoryPool.<init>(MemoryManager.java:661)
> at
> org.apache.flink.runtime.memory.MemoryManager.<init>(MemoryManager.java:166)
> at
> org.apache.flink.runtime.taskmanager.TaskManager$.startTaskManagerComponentsAndActor(TaskManager.scala:1618)
> ... 4 more
> 18:17:50,374 ERROR org.apache.flink.runtime.taskmanager.TaskManager
> - Failed to run TaskManager.
> java.lang.Exception: OutOfMemory error (Direct buffer memory) while
> allocating the TaskManager off-heap memory (48172092966 bytes). Try
> increasing the maximum direct memory (-XX:MaxDirectMemorySize)
> at
> org.apache.flink.runtime.taskmanager.TaskManager$.startTaskManagerComponentsAndActor(TaskManager.scala:1633)
> at
> org.apache.flink.runtime.taskmanager.TaskManager$.runTaskManager(TaskManager.scala:1460)
> at
> org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1325)
> at
> org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1235)
> at
> org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala)
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:658)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at
> org.apache.flink.runtime.memory.MemoryManager$HybridOffHeapMemoryPool.<init>(MemoryManager.java:661)
> at
> org.apache.flink.runtime.memory.MemoryManager.<init>(MemoryManager.java:166)
> at
> org.apache.flink.runtime.taskmanager.TaskManager$.startTaskManagerComponentsAndActor(TaskManager.scala:1618)
> ... 4 more
> {noformat}
> {noformat}
> ################################################################################
> # Licensed to the Apache Software Foundation (ASF) under one
> # or more contributor license agreements. See the NOTICE file
> # distributed with this work for additional information
> # regarding copyright ownership. The ASF licenses this file
> # to you under the Apache License, Version 2.0 (the
> # "License"); you may not use this file except in compliance
> # with the License. You may obtain a copy of the License at
> #
> # http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> ################################################################################
> jobmanager.web.history: 50
> taskmanager.debug.memory.startLogThread: true
> taskmanager.debug.memory.logIntervalMs: 1000
> taskmanager.memory.fraction: 0.9
> taskmanager.memory.off-heap: true
> taskmanager.runtime.hashjoin-bloom-filters: true
> taskmanager.runtime.max-fan: 1024
> #==============================================================================
> # Common
> #==============================================================================
> # The host on which the JobManager runs. Only used in non-high-availability
> mode.
> # The JobManager process will use this hostname to bind the listening servers
> to.
> # The TaskManagers will try to connect to the JobManager on that host.
> jobmanager.rpc.address: localhost
> # The port where the JobManager's main actor system listens for messages.
> jobmanager.rpc.port: 6123
> # The heap size for the JobManager JVM
> jobmanager.heap.mb: 1024
> # The heap size for the TaskManager JVM
> taskmanager.heap.mb: 53248
> # The number of task slots that each TaskManager offers. Each slot runs one
> parallel pipeline.
> taskmanager.numberOfTaskSlots: 32
> # The parallelism used for programs that did not specify and other
> parallelism.
> parallelism.default: 32
> #==============================================================================
> # Web Frontend
> #==============================================================================
> # The port under which the web-based runtime monitor listens.
> # A value of -1 deactivates the web server.
> jobmanager.web.port: 8081
> # The port uder which the standalone web client
> # (for job upload and submit) listens.
> webclient.port: 8080
> # Temporary: Uncomment this to be able to use the new web frontend
> jobmanager.new-web-frontend: true
> #==============================================================================
> # Streaming state checkpointing
> #==============================================================================
> # The backend that will be used to store operator state checkpoints if
> # checkpointing is enabled.
> #
> # Supported backends: jobmanager, filesystem
> state.backend: jobmanager
> # Directory for storing checkpoints in a flink supported filesystem
> # Note: State backend must be accessible from the JobManager, use file://
> # only for local setups.
> #
> # state.backend.fs.checkpointdir: hdfs://checkpoints
> #==============================================================================
> # Advanced
> #==============================================================================
> # The number of buffers for the network stack.
> taskmanager.network.numberOfBuffers: 393216
> # Directories for temporary files.
> #
> # Add a delimited list for multiple directories, using the system directory
> # delimiter (colon ':' on unix) or a comma, e.g.:
> # /data1/tmp:/data2/tmp:/data3/tmp
> #
> # Note: Each directory entry is read from and written to by a different I/O
> # thread. You can include the same directory multiple times in order to create
> # multiple I/O threads against that directory. This is for example relevant
> for
> # high-throughput RAIDs.
> #
> # If not specified, the system-specific Java temporary directory
> (java.io.tmpdir
> # property) is taken.
> taskmanager.tmp.dirs: /volumes/xvdb/tmp:/volumes/xvdc/tmp
> # Path to the Hadoop configuration directory.
> #
> # This configuration is used when writing into HDFS. Unless specified
> otherwise,
> # HDFS file creation will use HDFS default settings with respect to
> block-size,
> # replication factor, etc.
> #
> # You can also directly specify the paths to hdfs-default.xml and
> hdfs-site.xml
> # via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
> #
> # fs.hdfs.hadoopconf: /path/to/hadoop/conf/
> #==============================================================================
> # High Availability
> #==============================================================================
> # The list of ZooKepper quorum peers that coordinate the high-availability
> # setup. This must be a list of the form
> # "host_1[:peerPort[:leaderPort]],host_2[:peerPort[:leaderPort]],..."
> #
> # recovery.mode: zookeeper
> #
> # ha.zookeeper.quorum: localhost
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
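
For reference, the numbers in the quoted log can be added up to see how far the startup allocations overshoot the direct memory limit. The following is only a rough sketch in Python, assuming that both the network buffer pool and the Flink managed off-heap memory count against -XX:MaxDirectMemorySize; that accounting is an assumption, not something the log states. The function and variable names are made up for illustration.

```python
# Figures copied from the quoted TaskManager log and flink-conf.yaml.
MIB = 1024 * 1024

max_direct_mib = 53248                 # -XX:MaxDirectMemorySize=53248M
num_network_buffers = 393216           # taskmanager.network.numberOfBuffers
segment_size_bytes = 32768             # "bytes per segment" in the log
managed_off_heap_bytes = 48172092966   # size named in the OutOfMemory message


def direct_memory_demand_mib(num_buffers, segment_bytes, managed_bytes):
    """Return (network, managed, total) in MiB.

    Assumes both pools are allocated as direct memory (hypothetical accounting).
    """
    network_mib = num_buffers * segment_bytes / MIB
    managed_mib = managed_bytes / MIB
    return network_mib, managed_mib, network_mib + managed_mib


network_mib, managed_mib, total_mib = direct_memory_demand_mib(
    num_network_buffers, segment_size_bytes, managed_off_heap_bytes
)

print(f"network buffers : {network_mib:.0f} MiB")
print(f"managed off-heap: {managed_mib:.0f} MiB")
print(f"total requested : {total_mib:.0f} MiB (limit {max_direct_mib} MiB)")
print(f"over the limit  : {total_mib - max_direct_mib:.0f} MiB")
```

Under that assumption the two pools together ask for roughly 58228 MiB against a limit of 53248 MiB. If the managed size stayed the same, halving the network buffers to 6144 MiB would bring the total under the limit, which would be consistent with the successful start mentioned in the description.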