On Mon, Oct 7, 2019 at 11:15 AM Emmanuel Lecharny <elecha...@apache.org> wrote:
>
> On 2019/10/05 11:12:46, Rémy Maucherat <r...@apache.org> wrote:
> > On Fri, Oct 4, 2019 at 10:38 PM Emmanuel Lecharny <elecha...@apache.org>
> > wrote:
> >
> > > Hi Remy,
> > >
> > > On 2019/10/04 15:37:36, Rémy Maucherat <r...@apache.org> wrote:
> > > > On Fri, Oct 4, 2019 at 3:40 PM Emmanuel Lecharny
> > > > <elecha...@apache.org> wrote:
> > > >
> > > > > Hi!
> > > > >
> > > > > I filed a ticket yesterday about a problem we face with many NIO
> > > > > frameworks, which I think could hit Tomcat too (see
> > > > > https://bz.apache.org/bugzilla/show_bug.cgi?id=63802). Actually, I
> > > > > think I'm facing this problem on a project I'm working on at the
> > > > > moment.
> > > > >
> > > > > Remy suggested we discuss it on this mailing list.
> > > > >
> > > > > Bottom line, what happens is that under some circumstances that
> > > > > are not well defined, the call to select() might end up in an
> > > > > infinite loop eating all the CPU (select() returns 0, so select()
> > > > > is immediately called again, and we loop).
> > > > >
> > > > > In various NIO frameworks (as a MINA committer, I implemented the
> > > > > workaround discussed here), we control this situation by breaking
> > > > > the infinite loop this way:
> > > > > - if the select() call returns 0
> > > > > - then if we have called select() more than N times in less than
> > > > >   M ms (N=10, M=100 in MINA)
> > > > > - then we create a new Selector, register all the SelectionKeys
> > > > >   that were registered on the broken selector, and ditch the old
> > > > >   selector.
> > > > >
> > > > > This workaround does not cost a lot when the selector works as
> > > > > designed, as a select() call should never return 0.
> > > > >
> > > >
> > > > There's actually a very similar hack for APR that I put in place a
> > > > long time ago [
> > > > https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/AprEndpoint.java#L1410
> > > > ]. I don't even know if it's actually useful and it's certainly not
> > > > testable. Overall what it does is pretty terrible :(
> > > >
> > > > Personally I would like to know more about this "long lived bug
> > > > either in the JDK or even in Linux epoll implementation", like actual
> > > > platform details and JVM versions used, since I've never heard about
> > > > it in the first place.
> > >
> > > For the record, I had a discussion yesterday with one of my close
> > > friends, a co-worker back in the 90's. He clearly remembers, while
> > > working on the Sun TCP stack, that such a problem occurred back then.
> > > Yes, 25 years ago... OK, that was just for fun, it's likely to be
> > > perfectly unrelated ;-)
> > >
> > > At MINA, we were hit by this bug in 2009 (see
> > > https://issues.apache.org/jira/browse/DIRMINA-678), and it was linked
> > > to a bug reported on Jetty (
> > > http://jetty.4.x6.nabble.com/jira-Created-JETTY-937-SelectChannelConnector-100-CPU-usage-on-Linux-td36385.html
> > > ), itself related to some JDK bugs, supposedly fixed since then.
> > >
> > > I had a long conversation with Jean-François Arcand somewhere around
> > > that date, and he suggested we adopt the same workaround he applied to
> > > Grizzly. We also had a conversation with Alan Bateman during a JavaOne
> > > in SF, but nothing specific resulted from it, except that, as far as I
> > > can remember, he acknowledged there is an issue.
> > >
> > > So this problem started with JDK 6 (though I can't guarantee it wasn't
> > > already present in JDK 5 or 4), on Linux, and not on any other OS like
> > > Windows or Mac OS X. It's not exactly fresh in my mind, because it was
> > > already 10 years ago.
> > >
> >
> > NIO support was added in Tomcat 6.0, supporting Java 5+, and it wasn't
> > very good then. It's only with Java 6 that NIO started getting epoll
> > support, and I'm pretty sure the original issue did not actually
> > survive. Despite the popularity of the NIO connector, this was not
> > reported for Tomcat. If we had got the report at the same time as the
> > others it would be more logical, so something is different here.
> > https://github.com/netty/netty/issues/327 has details but I'm still not
> > very convinced. You should give details on your platform and everything
> > else, since it's obvious at this point that this is far less common
> > with Tomcat.
>
> There is not much I can tell about this issue, besides what I already
> said. I can just stress that for a few users of MINA this was a real
> burden, and the very same for Netty, Grizzly and Jetty. I would be *very*
> surprised if those four different projects, all based on NIO, were facing
> such an issue while Tomcat is immune to it.
>

One person on the Netty issue I linked reported it on Tomcat; that's the
only one I could find, so it's far less common. It could still be useful to
give info on the platform (was Java 11 and a recent Linux like RHEL8/Fedora
tested?) and usage pattern. If the issue still happens, I think this needs
to be reported to OpenJDK (with details, since it needs to be
reproducible...).

> > You should try the NIO2 connector first.
>
> I'll do that right away. If it fixes the 100% CPU usage I see from time
> to time, then I would consider the issue resolved (there is no point in
> working around something in the NIO code if NIO2 solves it...)
>

Well, the main point is to know the behavior of NIO2 and that's it; what
happens with NIO is independent.

Rémy

> Thanks !
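
For readers following the thread, here is a minimal sketch of the workaround quoted above (count select() calls that return 0, and rebuild the Selector when more than N of them happen within M ms). It is not the MINA, Netty or Tomcat code: the class name SpinGuard and its methods are hypothetical, and only the thresholds N=10 and M=100 come from the message. Note that select() can also return 0 legitimately, for example after wakeup() or a timeout, which is presumably why the check counts zero returns inside a short time window rather than reacting to a single one.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the workaround; not taken from any framework. */
final class SpinGuard {
    private final int maxZeroSelects;  // N in the message (10 in MINA)
    private final long windowMillis;   // M in the message (100 in MINA)
    private int zeroSelects;           // zero returns seen in the current window
    private long windowStart;          // time of the first zero return in the window

    SpinGuard(int maxZeroSelects, long windowMillis) {
        this.maxZeroSelects = maxZeroSelects;
        this.windowMillis = windowMillis;
    }

    /**
     * Records the result of one select() call.
     * Returns true when more than maxZeroSelects zero returns were seen
     * in less than windowMillis, i.e. the selector looks broken.
     */
    boolean record(int selected, long nowMillis) {
        if (selected != 0) {
            zeroSelects = 0;           // the selector did real work, start over
            return false;
        }
        if (zeroSelects == 0 || nowMillis - windowStart >= windowMillis) {
            windowStart = nowMillis;   // first zero return, or the old window expired
            zeroSelects = 1;
            return false;
        }
        zeroSelects++;
        if (zeroSelects > maxZeroSelects) {
            zeroSelects = 0;
            return true;
        }
        return false;
    }

    /** Opens a new Selector, moves every valid registration to it, closes the old one. */
    static Selector rebuild(Selector broken) throws IOException {
        Selector fresh = Selector.open();
        // Copy the key set first so we never iterate a set owned by the old selector
        // while changing its registrations.
        List<SelectionKey> keys = new ArrayList<>(broken.keys());
        for (SelectionKey key : keys) {
            if (!key.isValid()) {
                continue;
            }
            SelectableChannel channel = key.channel();
            int ops = key.interestOps();          // must be read before cancel()
            Object attachment = key.attachment();
            key.cancel();
            channel.register(fresh, ops, attachment);
        }
        broken.close();
        return fresh;
    }
}
```

In a poller loop this would be used roughly as: call guard.record(selector.select(), System.currentTimeMillis()) after each select(), and when it returns true, replace the selector with SpinGuard.rebuild(selector). Keeping the counting logic separate from the rebuild makes the threshold behaviour checkable without having to reproduce the actual spin.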