Hi,

Thanks for your answers. I see the point that this might be an issue with the profiler, although I cannot see the epoll function in the stack trace. It seems that the NioEventLoop does not use this function but rather works around it (cf. NioEventLoop:220). However, are there any other profiling tools you could recommend? Then I could tell you whether they confirm this issue.
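One rough cross-check that does not depend on RUNNABLE-state sampling is to read per-thread CPU time from the JVM's standard ThreadMXBean and rank threads by how much CPU they actually consumed over an interval. This is only a sketch under assumptions, not a Flink or Netty facility: the class ThreadCpuTop and its method sample are hypothetical names, it needs a JVM that supports per-thread CPU time, and it only inspects the JVM it runs in (a TaskManager would need equivalent code in-process or access over remote JMX, which is not shown).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks the threads of the current JVM by CPU time used.
public class ThreadCpuTop {

    /** CPU time one thread consumed during the sampling interval. */
    public static final class ThreadCpu {
        public final String name;
        public final long cpuNanos;

        ThreadCpu(String name, long cpuNanos) {
            this.name = name;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Returns per-thread CPU time consumed over intervalMillis, highest first. */
    public static List<ThreadCpu> sample(long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time not supported");
        }
        mx.setThreadCpuTimeEnabled(true);

        // Snapshot the cumulative CPU time (ns) of every live thread.
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }
        try {
            Thread.sleep(intervalMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the flag; report what we have
        }

        List<ThreadCpu> result = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            long start = before.getOrDefault(id, 0L); // threads started mid-interval count from 0
            long end = mx.getThreadCpuTime(id);      // -1 if the thread has died
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread has died
            if (start < 0 || end < 0 || info == null) {
                continue;
            }
            result.add(new ThreadCpu(info.getThreadName(), end - start));
        }
        result.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos));
        return result;
    }

    public static void main(String[] args) {
        for (ThreadCpu t : sample(1000)) {
            System.out.printf("%10.1f ms  %s%n", t.cpuNanos / 1e6, t.name);
        }
    }
}
```

The measurement is per thread, not per method, so it cannot attribute time to select() itself. If the Netty event-loop threads ranked low here while VisualVM reported 50% to 70% self-time in select(), that would support the profiler-artifact explanation.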
To answer some of your questions regarding my setup: I observed this behavior in both 0.8.1 and the current master. The exact method is io.netty.channel.nio.NioEventLoop.select(). Interestingly, VisualVM displays the SortMergerReading threads as the most time-consuming ones. I would say that my job is rather data-heavy, as I am trying to keep my UDFs as efficient as possible. ;) However, the CPUs of the slaves are fully loaded and, if I can trust VisualVM, most of that load goes to networking and serialization.

Cheers,
Sebastian

-----Original Message-----
From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] On Behalf Of Stephan Ewen
Sent: Wednesday, 6 May 2015 12:25
To: dev@flink.apache.org
Subject: Re: NioEventLoop consumes most of the CPU

Ufuk has a good point: the NIO epoll wait method leaves the thread in state RUNNABLE. That may explain things.

Still, it would be good to have more information on your setup.

Stephan

On 06.05.2015 10:15, "Ufuk Celebi" <u...@apache.org> wrote:

> I agree with Stephan's points. Thanks for reporting, and let's
> investigate this further.
>
> To keep in mind: I think VisualVM is using hprof for CPU sampling,
> which has some known issues
> (http://www.brendangregg.com/blog/2014-06-09/java-cpu-sampling-using-hprof.html).
> For one thing, it profiles Java's RUNNABLE state, which does not
> necessarily correspond to a running thread (in OS terms) consuming
> CPU. The select call (like epollWait()) keeps the thread in this state.
>
>
> On Tue, May 5, 2015 at 9:23 PM, Stephan Ewen <se...@apache.org> wrote:
>
> > Hi!
> >
> > That does not sound right, I agree. Can you tell us a bit more?
> >
> > - What version of Flink are you using?
> >
> > - I assume the NIO loop is executed by a Netty thread. Can you tell
> > us whether it is from an "io.netty.*" thread or an "org.jboss.netty.*" thread?
> > The former is from Flink's data network thread, the latter from Akka.
> >
> > - Is your job data-heavy (data transfer is in progress most of the time),
> > or is it compute-heavy (the network is not fully utilized)?
> >
> > Thanks for your help!
> > Stephan
> >
> > On 05.05.2015 16:52, "Kruse, Sebastian" <sebastian.kr...@hpi.de> wrote:
> >
> > > Hi everyone,
> > >
> > > Every time I run jvisualvm on one of the machines in our cluster
> > > during a Flink job, I see that NioEventLoop.select() is taking 50%
> > > to 70% CPU self-time. I wonder how severe this is. It might be
> > > busy-waiting time that cannot be filled otherwise, but I wanted to ask
> > > whether you have also faced this issue and/or know its cause.
> > >
> > > Cheers,
> > > Sebastian
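The explanation given in the thread (a thread blocked in select() is RUNNABLE as far as the JVM is concerned, even though it burns no CPU) can be checked with a small standalone program. This is a sketch under assumptions: SelectorRunnableDemo and observe are hypothetical names, it uses a plain java.nio Selector rather than Netty's NioEventLoop, and it assumes a JVM that supports per-thread CPU time. On Linux the JDK's default Selector is normally epoll-based, so the blocked thread waits inside the native epoll call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.nio.channels.Selector;

// Hypothetical demo: compares the Java thread state of a thread blocked in
// Selector.select() with the CPU time that thread actually consumes.
public class SelectorRunnableDemo {

    /** What a state-based sampler sees vs. what the thread really costs. */
    public static final class Observation {
        public final Thread.State state; // Java-level thread state
        public final long cpuNanos;      // CPU time consumed during the window
        public final long wallNanos;     // length of the observation window

        Observation(Thread.State state, long cpuNanos, long wallNanos) {
            this.state = state;
            this.cpuNanos = cpuNanos;
            this.wallNanos = wallNanos;
        }
    }

    public static Observation observe(long windowMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time not supported");
        }
        mx.setThreadCpuTimeEnabled(true);
        try (Selector selector = Selector.open()) {
            // No channels are registered, so select() blocks until wakeup() is called.
            Thread looper = new Thread(() -> {
                try {
                    selector.select();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, "selector-demo");
            looper.start();
            Thread.sleep(200); // give the thread time to enter the native wait

            long cpuStart = mx.getThreadCpuTime(looper.getId());
            long wallStart = System.nanoTime();
            Thread.sleep(windowMillis);
            Thread.State state = looper.getState(); // what a RUNNABLE-based sampler sees
            long cpuEnd = mx.getThreadCpuTime(looper.getId());
            long wallEnd = System.nanoTime();

            selector.wakeup(); // unblock select() so the thread can exit
            looper.join();
            return new Observation(state, cpuEnd - cpuStart, wallEnd - wallStart);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing", e);
        }
    }

    public static void main(String[] args) {
        Observation o = observe(1000);
        System.out.printf("state=%s, cpu=%.1f ms over %.1f ms wall%n",
                o.state, o.cpuNanos / 1e6, o.wallNanos / 1e6);
    }
}
```

One would expect the reported state to be RUNNABLE while the measured CPU time is only a small fraction of the wall-clock window. That mismatch is exactly what lets a sampler that counts RUNNABLE stacks attribute a large share of "CPU" to select().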