On Mon, Oct 7, 2019 at 11:15 AM Emmanuel Lecharny <elecha...@apache.org>
wrote:

>
>
> On 2019/10/05 11:12:46, Rémy Maucherat <r...@apache.org> wrote:
> > On Fri, Oct 4, 2019 at 10:38 PM Emmanuel Lecharny <elecha...@apache.org>
> > wrote:
> >
> > > > Hi Rémy,
> > >
> > > On 2019/10/04 15:37:36, Rémy Maucherat <r...@apache.org> wrote:
> > > > On Fri, Oct 4, 2019 at 3:40 PM Emmanuel Lecharny <
> elecha...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi !
> > > > >
> > > > > > I filed a ticket yesterday about a problem we face with many NIO
> > > > > > frameworks, which I think could hit Tomcat too (see
> > > > > > https://bz.apache.org/bugzilla/show_bug.cgi?id=63802). Actually, I
> > > > > > think I'm facing this problem on a project I'm working on atm.
> > > > >
> > > > > Remy suggested we discuss it on this mailing list.
> > > > >
> > > > > > Bottom line, what happens is that, under some circumstances that
> > > > > > are not well defined, the call to select() might end up in an
> > > > > > infinite loop eating all the CPU (select() returns 0, so select()
> > > > > > is immediately called again, and we loop).
> > > > >
> > > > > > In various NIO frameworks (being a MINA committer, I have
> > > > > > implemented the workaround discussed here), we handle this
> > > > > > situation by breaking the infinite loop this way:
> > > > > > - if the select() call returns 0
> > > > > > - then if we have called select() more than N times in less than M
> > > > > > ms (N=10, M=100 in MINA)
> > > > > > - then we create a new Selector, register all the SelectionKeys
> > > > > > that were registered on the broken selector, and ditch the old
> > > > > > selector.
> > > > >
> > > > > > This workaround does not cost a lot when the selector works as
> > > > > > designed, as a select() call should never return 0.
> > > > >
> > > >
> > > > There's actually a very similar hack for APR that I added a long
> > > > time ago [
> > > > https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/AprEndpoint.java#L1410
> > > > ]. I don't even know if it's actually useful, and it's certainly not
> > > > testable. Overall, what it does is pretty terrible :(
> > > >
> > > > Personally, I would like to know more about this "long lived bug
> > > > either in the JDK or even in Linux epoll implementation", like actual
> > > > platform details and JVM versions used, since I've never heard about
> > > > it in the first place.
> > >
> > > For the record, I had a discussion yesterday with one of my close
> > > friends, a co-worker back in the 90's. He clearly remembers that, while
> > > he was working on the Sun TCP stack, such a problem occurred back then.
> > > Yes, 25 years ago... OK, that was just for fun; it is most likely
> > > unrelated ;-)
> > >
> > > At MINA, we were hit by this bug in 2009 (see
> > > https://issues.apache.org/jira/browse/DIRMINA-678), and it was linked
> > > to a bug reported on Jetty (
> > > http://jetty.4.x6.nabble.com/jira-Created-JETTY-937-SelectChannelConnector-100-CPU-usage-on-Linux-td36385.html
> > > ), itself related to some JDK bugs, supposedly fixed since then.
> > >
> > > I had a long conversation with Jean-François Arcand somewhere around
> > > that date, and he suggested we adopt the same workaround he applied to
> > > Grizzly. We also had a conversation with Alan Bateman during a JavaOne
> > > in SF, but nothing specific resulted from it, except that, AFAICR, he
> > > acknowledged there is an issue.
> > >
> > > So this problem started with JDK 6, but I can't guarantee it wasn't
> > > already present in JDK 5 or 4, on Linux, and not on any other OS like
> > > Windows or Mac OS X. It's not exactly fresh in my mind, because it was
> > > already 10 years ago.
> > >
> >
> > NIO support was added in Tomcat 6.0, supporting Java 5+, and it wasn't
> > very good then. It's only with Java 6 that NIO started getting epoll
> > support, and I'm pretty sure the original issue did not actually survive.
> > Despite the popularity of the NIO connector, this was not reported for
> > Tomcat. If we had gotten the report at the same time as the others, it
> > would be more logical, so something is different here.
> > https://github.com/netty/netty/issues/327 has details, but I'm still not
> > very convinced. You should give details on your platform and everything
> > else, since it's obvious at this point that this is far less common with
> > Tomcat.
>
> There is not much I can tell about this issue, besides what I already said.
> I can just stress that for a few users of MINA, this was a real burden,
> and the very same for Netty, Grizzly and Jetty. I would be *very* surprised
> if those four different projects, all based on NIO, were facing such an
> issue while Tomcat is immune to it.
>

One person on the Netty issue I linked reported it on Tomcat; that's the
only one I could find, so it's far less common. It could still be useful to
give info on the platform (was Java 11 and a recent Linux like RHEL8/Fedora
tested?) and usage pattern. If the issue still happens, I think this needs
to be reported to OpenJDK (with details, since it needs to be reproducible
...).


>
> > You should try the NIO2 connector first.
>
> I'll do that right away. If it fixes the 100% CPU usage I see from time to
> time, then I would consider the issue resolved (there is no point in
> working around something in the NIO code if NIO2 solves it...)
>

Well, the main point is to know the behavior of NIO2, and that's it; what
happens with NIO is independent.

Rémy


>
> Thanks !
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>
>
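For readers following the thread, the workaround described at the top (count
select() calls that return 0, and if more than N happen in less than M ms,
move every registration to a fresh Selector) could look roughly like the
sketch below. This is only an illustration under assumptions: the class name
SpinGuard and its methods record() and rebuild() are hypothetical, it is not
the actual MINA, Grizzly, Netty or Tomcat code, and it uses a simple
restarting time window rather than whatever bookkeeping those projects use.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Hypothetical helper illustrating the "rebuild the selector" workaround. */
final class SpinGuard {
    private final int maxEmptySelects;  // N in the description (10 in MINA)
    private final long windowNanos;     // M in the description (100 ms in MINA)
    private long windowStart;           // time of the first empty select in the window
    private int emptySelects;           // empty selects seen in the current window

    SpinGuard(int maxEmptySelects, long windowMillis) {
        this.maxEmptySelects = maxEmptySelects;
        this.windowNanos = windowMillis * 1_000_000L;
    }

    /**
     * Records the result of one select() call made at time nowNanos.
     * Returns true when more than N empty selects happened within the window,
     * i.e. when the caller should replace its selector.
     */
    boolean record(int selected, long nowNanos) {
        if (selected != 0) {
            emptySelects = 0;           // the selector did real work, start over
            return false;
        }
        if (emptySelects == 0 || nowNanos - windowStart > windowNanos) {
            windowStart = nowNanos;     // first empty select, or the window expired
            emptySelects = 1;
            return false;
        }
        emptySelects++;
        if (emptySelects > maxEmptySelects) {
            emptySelects = 0;
            return true;
        }
        return false;
    }

    /** Re-registers every valid key on a new Selector and closes the old one. */
    static Selector rebuild(Selector broken) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : broken.keys()) {
            if (key.isValid()) {
                // Keep the same interest set and attachment on the new selector.
                key.channel().register(fresh, key.interestOps(), key.attachment());
            }
        }
        broken.close();                 // invalidates the keys of the old selector
        return fresh;
    }
}
```

In a poller loop this would be used along the lines of
"if (guard.record(selector.select(timeout), System.nanoTime())) selector =
SpinGuard.rebuild(selector);". Note that select() can also return 0 for
ordinary reasons (a timeout or a wakeup() call), so the thresholds have to be
chosen so that normal traffic does not trigger a rebuild.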
