Hi,

Thanks for your answers. I see the point that this might be an issue with the 
profiler, although I cannot see the epoll function in the stack trace. It seems 
that the NioEventLoop is not using this function but rather seems to work 
around it (cf. NioEventLoop:220). However, are you using any other profiling 
tools that you could recommend? Then I could tell you whether other tools are 
confirming this issue.

To answer some of your questions regarding my setup:
I observed this behavior in both 0.8.1 and the current master.

The exact method is io.netty.channel.nio.NioEventLoop.select(). Interestingly, 
VisualVM displays SortMergerReading threads as the most time-consuming ones.

I would say that my job is rather data-heavy as I am trying to keep my UDFs as 
efficient as possible. ;) However, the CPUs of the slaves are fully loaded and 
- if I can trust VisualVM - most of it is for networking and serialization.
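
One way to cross-check the sampler without a second tool might be the JVM's own per-thread CPU accounting. The following is only a sketch: the class name ThreadCpuSampler and the method cpuNanosByThread are made up for illustration, and it measures the JVM it runs in, so using it against a running TaskManager (in-process or over JMX) is an assumption on my side.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink or Netty.
class ThreadCpuSampler {

    /** Returns CPU nanoseconds consumed per thread name during the given window. */
    static Map<String, Long> cpuNanosByThread(long windowMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            throw new UnsupportedOperationException("thread CPU time is not available on this JVM");
        }

        // Snapshot the CPU time of every live thread at the start of the window.
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }

        Thread.sleep(windowMillis);

        // Compute the delta for every thread that was alive at both points in time.
        Map<String, Long> result = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id);
            ThreadInfo info = mx.getThreadInfo(id);
            if (start == null || start < 0 || end < 0 || info == null) {
                continue; // thread started or died inside the window
            }
            result.merge(info.getThreadName(), end - start, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        cpuNanosByThread(1000).forEach(
                (name, nanos) -> System.out.println(name + ": " + nanos / 1_000_000 + " ms CPU"));
    }
}
```

Unlike a state-based sampler, these numbers only grow while a thread is actually scheduled on a CPU, so a thread that merely waits in select() should show up with a small value here.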

Cheers,
Sebastian

-----Original Message-----
From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] On Behalf Of Stephan 
Ewen
Sent: Wednesday, 6 May 2015 12:25
To: dev@flink.apache.org
Subject: Re: NioEventLoop consumes most of the CPU

Ufuk has a good point.
The NIO epoll wait method leaves the thread in state RUNNABLE. That may explain 
things.
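
A minimal sketch of that effect, assuming a HotSpot JVM. The class name SelectStateDemo, the method observeIdleSelect and the thread name are invented for this example. It parks a thread in Selector.select() with no channels registered, then reads both its Java thread state and the CPU time it consumed while waiting.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Selector;

// Hypothetical demo class, not part of Flink or Netty.
class SelectStateDemo {

    /** What was observed for the thread that sits in select(). */
    static final class Observation {
        final Thread.State state; // Java-level thread state while blocked
        final long cpuNanos;      // CPU time used in the window, or -1 if unavailable

        Observation(Thread.State state, long cpuNanos) {
            this.state = state;
            this.cpuNanos = cpuNanos;
        }
    }

    static Observation observeIdleSelect(long windowMillis)
            throws IOException, InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean cpuTimeUsable = mx.isThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled();

        try (Selector selector = Selector.open()) {
            Thread idle = new Thread(() -> {
                try {
                    selector.select(); // no channels registered: blocks until wakeup()
                } catch (IOException | ClosedSelectorException e) {
                    // irrelevant for this demo
                }
            }, "idle-selector");
            idle.setDaemon(true);
            idle.start();
            Thread.sleep(200); // give the thread time to enter select()

            long before = cpuTimeUsable ? mx.getThreadCpuTime(idle.getId()) : -1;
            Thread.sleep(windowMillis);
            Thread.State state = idle.getState();
            long after = cpuTimeUsable ? mx.getThreadCpuTime(idle.getId()) : -1;

            selector.wakeup(); // release the blocked thread before closing the selector
            idle.join(1000);

            long delta = (before < 0 || after < 0) ? -1 : after - before;
            return new Observation(state, delta);
        }
    }

    public static void main(String[] args) throws Exception {
        Observation o = observeIdleSelect(1000);
        System.out.println("state while blocked in select(): " + o.state);
        System.out.println("CPU time used while waiting (ns): " + o.cpuNanos);
    }
}
```

If the reported state is RUNNABLE while the CPU time stays close to zero, a sampler that counts RUNNABLE samples would attribute the whole waiting period to select() as self-time.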

Still, would be good to have more information on your setup.

Stephan
On 06.05.2015 10:15, "Ufuk Celebi" <u...@apache.org> wrote:

> I agree with Stephan's points. Thanks for reporting and let's 
> investigate this further.
>
> To keep in mind: I think VisualVM is using hprof for CPU sampling, 
> which has some known issues (
> http://www.brendangregg.com/blog/2014-06-09/java-cpu-sampling-using-hprof.html
> ).
> For one thing, it's profiling Java's RUNNABLE state, which does not 
> necessarily correspond to a running Thread (in OS terms) consuming 
> CPU. The select call (like epollWait()) keeps the Thread in this state.
>
>
> On Tue, May 5, 2015 at 9:23 PM, Stephan Ewen <se...@apache.org> wrote:
>
> > Hi!
> >
> > That does not sound right, I agree. Can you tell us a bit more?
> >
> > - What version of Flink are you using?
> >
> > - I assume the NIO loop is executed by a Netty thread. Can you tell
> > us whether it is from a "io.netty.*" thread, or a "org.jboss.netty.*"
> > thread? The former is from Flink's data network thread, the latter
> > from Akka.
> >
> > - Is your job data heavy (data transfer is in progress most of the
> > time), or is it compute heavy (network is not fully utilized)?
> >
> > Thanks for your help!
> > Stephan
> > On 05.05.2015 16:52, "Kruse, Sebastian" <sebastian.kr...@hpi.de> wrote:
> >
> > > Hi everyone,
> > >
> > > Every time I run jvisualvm on one of the machines in our cluster
> > > during a Flink job, I see that NioEventLoop.select() is taking 50%
> > > to 70% CPU self-time. I wonder how severe this is. It might be
> > > busy-waiting time that cannot be filled otherwise, but I wanted to
> > > ask whether you have also faced this issue and/or know its cause.
> > >
> > > Cheers,
> > > Sebastian
> > >
> >
>
