On 2019/10/05 11:12:46, Rémy Maucherat <r...@apache.org> wrote:
> On Fri, Oct 4, 2019 at 10:38 PM Emmanuel Lecharny <elecha...@apache.org>
> wrote:
>
> > Hi Rémy,
> >
> > On 2019/10/04 15:37:36, Rémy Maucherat <r...@apache.org> wrote:
> > > On Fri, Oct 4, 2019 at 3:40 PM Emmanuel Lecharny <elecha...@apache.org>
> > > wrote:
> > >
> > > > Hi !
> > > >
> > > > I filed a ticket yesterday about a problem we face with many NIO
> > > > frameworks, which I think could hit Tomcat too (see
> > > > https://bz.apache.org/bugzilla/show_bug.cgi?id=63802). Actually, I think
> > > > I'm facing this problem on a project I'm working on at the moment.
> > > >
> > > > Remy suggested we discuss it on this mailing list.
> > > >
> > > > Bottom line: under some poorly defined circumstances, the call to
> > > > select() can end up in an infinite loop that eats all the CPU (select()
> > > > returns 0, so select() is immediately called again, and we loop).
> > > >
> > > > In various NIO frameworks (being a MINA committer, I implemented the
> > > > workaround discussed here), we handle this situation by breaking the
> > > > infinite loop this way:
> > > > - if the select() call returns 0
> > > > - then if we have called select() more than N times in less than M ms
> > > > (N=10, M=100 in MINA)
> > > > - then we create a new Selector, re-register all the SelectionKeys that
> > > > were registered on the broken selector, and ditch the old selector.
> > > >
> > > > This workaround does not cost much when the selector works as designed,
> > > > as a select() call should never return 0.
> > > >
> > >
> > > There's actually a very similar hack for APR that I put in place a long
> > > time ago [
> > > https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/AprEndpoint.java#L1410
> > > ]. I don't even know if it's actually useful, and it's certainly not
> > > testable. Overall, what it does is pretty terrible :(
> > >
> > > Personally, I would like to know more about this "long lived bug either in
> > > the JDK or even in Linux epoll implementation", such as actual platform
> > > details and the JVM versions used, since I had never heard about it before.
> >
> > For the record, I had a discussion yesterday with a close friend who was
> > my co-worker back in the 90's. He clearly remembers that, while he was
> > working on the SUN TCP stack, such a problem occurred back then. Yes, 25
> > years ago... OK, that was just for fun; it's likely to be perfectly
> > unrelated ;-)
> >
> > At MINA, we were hit by this bug in 2009 (see
> > https://issues.apache.org/jira/browse/DIRMINA-678), and it was linked to
> > a bug reported on Jetty (
> > http://jetty.4.x6.nabble.com/jira-Created-JETTY-937-SelectChannelConnector-100-CPU-usage-on-Linux-td36385.html),
> > itself related to some JDK bugs, supposedly fixed since then.
> >
> > I had a long conversation with Jean-François Arcand around that time,
> > and he suggested we adopt the same workaround he applied to Grizzly.
> > We also had a conversation with Alan Bateman during a JavaOne in SF, but
> > nothing specific came out of it, except that, AFAICR, he acknowledged there
> > was an issue.
> >
> > So this problem started with JDK 6, though I can't guarantee it wasn't
> > already present in JDK 5 or 4. It showed up on Linux, and not on any other
> > OS such as Windows or Mac OS X. It's not exactly fresh in my mind, because
> > that was already 10 years ago.
> >
>
> NIO support was added in Tomcat 6.0, which supported Java 5+; it wasn't very
> good then. It's only with Java 6 that NIO started getting epoll support, and
> I'm pretty sure the original issue did not actually survive. Despite the
> popularity of the NIO connector, this was not reported for Tomcat. It would
> be more logical if we had received the report at the same time as the
> others, so something is different here.
> https://github.com/netty/netty/issues/327 has details, but I'm still not
> very convinced. You should give details on your platform and everything
> else, since it's obvious at this point that this is far less common with
> Tomcat.
There is not much more I can tell about this issue besides what I already
said. I can only stress that for a few users of MINA, this was a real burden,
and the very same goes for Netty, Grizzly and Jetty. I would be *very*
surprised if those four different projects, all based on NIO, faced such an
issue while Tomcat was immune to it.
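
For reference, here is a minimal sketch of the workaround quoted above (count
the select() calls that return 0, and rebuild the selector when more than N of
them happen in less than M ms). The class name SpinGuard and its method names
are hypothetical and made up for this mail; this is not the actual MINA, Netty
or Tomcat code, only an illustration under the N=10, M=100 values mentioned
for MINA:

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Hypothetical helper illustrating the "rebuild the selector" workaround. */
final class SpinGuard {
    static final int MAX_EMPTY_SELECTS = 10;  // N, value quoted for MINA
    static final long WINDOW_MILLIS = 100;    // M, value quoted for MINA

    private int emptySelects;   // select() calls that returned 0 in the current window
    private long windowStart;   // timestamp of the first empty select() in the window

    /**
     * Records the result of one select() call.
     * Returns true when more than N empty selects happened within M ms.
     */
    boolean looksBroken(int selected, long nowMillis) {
        if (selected != 0) {
            emptySelects = 0;               // the selector did real work, start over
            return false;
        }
        if (emptySelects == 0 || nowMillis - windowStart > WINDOW_MILLIS) {
            windowStart = nowMillis;        // open a new observation window
            emptySelects = 1;
            return false;
        }
        emptySelects++;
        if (emptySelects > MAX_EMPTY_SELECTS) {
            emptySelects = 0;
            return true;
        }
        return false;
    }

    /** Moves every valid registration to a new Selector and closes the old one. */
    static Selector rebuild(Selector broken) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : broken.keys()) {
            if (!key.isValid()) {
                continue;                   // channel closed or key already cancelled
            }
            SelectableChannel channel = key.channel();
            int ops = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();                   // drop the registration on the old selector
            channel.register(fresh, ops, attachment);
        }
        broken.close();
        return fresh;
    }
}
```

In a select loop, this would be used along the lines of
"if (guard.looksBroken(selector.select(), System.currentTimeMillis()))
selector = SpinGuard.rebuild(selector);". The time window matters because a
select() with a timeout, or one interrupted by wakeup(), can legitimately
return 0 once in a while; only a burst of empty returns is treated as a
broken selector.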
> You should try the NIO2 connector first.
I'll do that right away. If it fixes the 100% CPU usage I see from time to
time, then I would consider the issue resolved (there is no point in working
around something in the NIO code if NIO2 solves it...)
Thanks !