[ https://issues.apache.org/jira/browse/FLINK-36348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuannan Su resolved FLINK-36348.
--------------------------------
    Resolution: Fixed

> Netty shuffle direct memory consumption end-to-end test failed due to direct memory OOM
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-36348
>                 URL: https://issues.apache.org/jira/browse/FLINK-36348
>             Project: Flink
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 2.0-preview
>            Reporter: Weijie Guo
>            Assignee: Xuannan Su
>            Priority: Major
>
> Found the root cause from downloaded artifacts.
> {code:java}
> org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: Direct buffer memory (connection to 'localhost/127.0.0.1:45889 [localhost:42633-cbcb9d]')
>     at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.exceptionCaught(CreditBasedPartitionRequestClientHandler.java:175) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:325) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:317) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:143) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:265) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:238) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:231) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1398) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:258) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:238) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:895) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.fulfillConnectPromise(AbstractEpollChannel.java:658) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:691) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory. The direct out-of-memory error has occurred. This can mean two things: either job(s) require(s) a larger size of JVM direct memory or there is a direct memory leak. The direct memory can be allocated by user code or some of its dependencies. In this case 'taskmanager.memory.task.off-heap.size' configuration option should be increased. Flink framework and its dependencies also consume the direct memory, mostly for network communication. The most of network memory is managed by Flink and should not result in out-of-memory error. In certain special cases, in particular for jobs with high parallelism, the framework may require more direct memory which is not managed by Flink. In this case 'taskmanager.memory.framework.off-heap.size' configuration option should be increased. If the error persists then there is probably a direct memory leak in user code or some of its dependencies which has to be investigated and fixed. The task executor has to be shutdown...
>     at java.nio.Bits.reserveMemory(Bits.java:175) ~[?:?]
>     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) ~[?:?]
>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317) ~[?:?]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:717) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:692) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:215) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.tcacheAllocateSmall(PoolArena.java:180) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.allocate(PoolArena.java:137) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.allocate(PoolArena.java:129) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:395) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.runtime.io.network.netty.BufferResponseDecoder.onChannelActive(BufferResponseDecoder.java:54) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelActive(NettyMessageClientDecoderDelegate.java:74) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:262) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     ... 14 more
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62343&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=10d6732b-d79a-5c68-62a5-668516de5313&l=13005

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
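
A note for anyone trying to reproduce the failure above: the allocation that fails in the trace goes through `java.nio.ByteBuffer.allocateDirect`, so it is counted against the JVM's direct buffer pool. The following is a minimal sketch, using only the JDK, of how that pool can be read from inside a JVM through `BufferPoolMXBean`. The class name `DirectMemoryProbe` and the 4 MiB allocation size are illustrative assumptions; neither is taken from the Flink test or from the fix for this issue.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Flink: reads the JVM's "direct" buffer pool.
public class DirectMemoryProbe {

    /** Returns the bytes the JVM reports as used by direct buffers, or -1 if the pool is not exposed. */
    public static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();

        // Arbitrary size chosen for illustration; this is the same JDK call
        // that appears at the bottom of the "Caused by" section of the trace.
        ByteBuffer chunk = ByteBuffer.allocateDirect(4 * 1024 * 1024);

        long after = directMemoryUsed();
        System.out.println("direct memory before: " + before + " bytes");
        System.out.println("direct memory after:  " + after + " bytes");
        System.out.println("allocated buffer capacity: " + chunk.capacity() + " bytes");
    }
}
```

Logging this value periodically in a test process may help distinguish a budget that is simply too small from usage that keeps growing, which are the two cases the error message describes. Direct memory obtained by other means than `ByteBuffer.allocateDirect` would not necessarily show up in this pool.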