On Fri, Oct 29, 2021 at 2:02 PM Rémy Maucherat <r...@apache.org> wrote:
>
> On Fri, Oct 29, 2021 at 9:28 AM Suvendu Sekhar Mondal <suv3...@gmail.com>
> wrote:
> >
> > Hello Chris,
> >
> > On Fri, Oct 29, 2021 at 2:46 AM Christopher Schultz
> > <ch...@christopherschultz.net> wrote:
> > >
> > > Suvendu,
> > >
> > > On 10/28/21 12:55, Suvendu Sekhar Mondal wrote:
> > > > Hello Everyone,
> > > >
> > > > I was investigating one thread pool exhaustion issue. Thread dump
> > > > analysis showed that all HTTP threads were waiting for a ReentrantLock
> > > > object. Object address 0x000000066d727f28 was the same for all of the
> > > > waiting threads:
> > > >
> > > > "http-nio-18100-exec-86" #32808 daemon prio=5 os_prio=0 tid=0x0000000051835800 nid=0x29bc waiting on condition [0x000000007a5be000]
> > > >    java.lang.Thread.State: WAITING (parking)
> > > >     at sun.misc.Unsafe.park(Native Method)
> > > >     - parking to wait for <0x000000066d727f28> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> > > >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> > > >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> > > >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> > > >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> > > >     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> > > >     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> > > >     at org.apache.catalina.realm.JNDIRealm.get(JNDIRealm.java:2385)
> > > >     at org.apache.catalina.realm.JNDIRealm.authenticate(JNDIRealm.java:1274)
> > > >
> > > > There was no hint in the thread dump about which thread was owning the
> > > > lock. Luckily, one heap dump was taken before generating thread dump.
> > > > When I queried the heap dump for that ReentrantLock object, I saw that
> > > > another thread (http-nio-18100-exec-4) was holding the
> > > > lock (exclusiveOwnerThread). There was NO trace of the
> > > > http-nio-18100-exec-4 thread in any of the thread dumps! So it was a
> > > > "lock without an owner" case.
> > >
> > > I think you are looking at several pieces of evidence that may or may
> > > not correlate to each other at all. The fact that the thread wasn't in
> > > the thread dump indicates that the thread (or even the whole JVM) had
> > > terminated between the time you took the heap-dump and the thread dump.
> > > Most likely, the monitor was owned by another thread when you took your
> > > thread-dump. Try using other tools which *do* reveal the lock-holder's
> > > identity.
> > >
> >
> > This issue has happened a few times. "Busy Thread Count" was high
> > during the problem period. JVM was up and running when I collected
> > heap and thread dumps - pid was not changed in-between. I used jstack,
> > visualvm, jcmd - nothing revealed owning thread details. Only heap
> > dumps had some information on that object and which thread was holding
> > onto it. Here is a snap: https://pasteboard.co/D7dV3jej6zId.jpg
> >
> > I can simulate similar blocking without Tomcat with dummy code. There
> > also nothing reveals the owner's identity except the heap dump. Here
> > is a sample: https://gist.github.com/suv3ndu/2ec9fe660d2b833996817ed62186eac2
> >
> > > > After glancing through the Tomcat’s JNDIRealm.get() code and
> > > > beyond[1], I can see the lock is being acquired on singleConnectionLock.
> > > > That lock is getting released either in the close() or release()
> > > > method. So, if something bad happens to the thread which is trying to
> > > > establish a connection, then the lock will be held without a proper owner
> > > > and a thread blocking situation will be created. Am I interpreting the
> > > > code correctly? Should we not handle any failure inside get()?
> > > >
> > > > Also, I still have not got the reason why the thread got terminated.
> > > > Any suggestions on how I can enable any specific logging?
> > > >
> > > > My setup is:
> > > > Tomcat version: 9.0.39
> > > > Connector: NIO
> > > > JDK: AdoptOpenJDK: 1.8.192
> > > > OS: Windows 2016
> > >
> > > Looks like you need a whole bunch of upgrades. Search the Tomcat 9.x
> > > changelog for "JNDIRealm" and you'll see there have been changes since
> > > 9.0.39 that may have already resolved this issue. Are you able to
> > > re-test with Tomcat 9.0.54?
> > >
> >
> > It will not be easy for me to upgrade it and test it. Lots of approval
> > is required to get that done. :(
> >
> > > > [1]
> > > https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2553
> > >
> > > Note that you are looking at the current version of JNDIRealm.java. The
> > > version you are running is 17 commits behind that.
> > >
> > > The line of code calling ReentrantLock.lock in your code would be
> > > https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2385
> > > which is "return null" indicating that there is a version mismatch
> > > between the code you are running and the code you are reading.
> > >
> >
> > Yeah, that's correct. Sorry for the confusion. Our version is running:
> > https://github.com/apache/tomcat/blob/95658dfd868216db0773c38aad8eebf544024b09/java/org/apache/catalina/realm/JNDIRealm.java#L2385
> > That get() has not changed since then. That's why I asked about
> > handling failure inside get().
>
> This should be handled. So I recommend you update.
> Nearly always, you'll get a NamingException in authenticate, this works now:
> https://github.com/apache/tomcat/blob/main/java/org/apache/catalina/realm/JNDIRealm.java#L1235
> goes:
> https://github.com/apache/tomcat/blob/main/java/org/apache/catalina/realm/JNDIRealm.java#L1288
>
> After review, I can see Naming exceptions are caught, which should be
> enough, but it would be safer to catch Exception. Also getPassword was
> missing handling for an exception. I will fix these, but I doubt you
> were affected. I'll tighten this up.
>
> Rémy
>
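To make the failure mode discussed above concrete, here is a minimal
sketch of what I believe is happening. It is not Tomcat's code: only the
name singleConnectionLock comes from JNDIRealm, and the class and method
names (OrphanedLockSketch, unsafeGet, saferGet, runOnWorker,
othersCanAcquire) are hypothetical. It assumes the failure inside get()
is an unchecked exception that is not a NamingException, which is only my
guess about the trigger.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class OrphanedLockSketch {

    // Stand-in for JNDIRealm's singleConnectionLock (illustrative only).
    private final ReentrantLock singleConnectionLock = new ReentrantLock();

    // Unsafe pattern: the lock is taken here and is expected to be released
    // later by a separate release()/close() call. If this method throws,
    // that later call never happens and the lock stays held.
    public void unsafeGet(boolean fail) {
        singleConnectionLock.lock();
        if (fail) {
            throw new IllegalStateException("simulated failure after lock()");
        }
    }

    // Safer pattern: on any failure, unlock before propagating, so the lock
    // cannot outlive the thread that acquired it.
    public void saferGet(boolean fail) {
        singleConnectionLock.lock();
        try {
            if (fail) {
                throw new IllegalStateException("simulated failure after lock()");
            }
        } catch (RuntimeException | Error e) {
            singleConnectionLock.unlock();
            throw e;
        }
    }

    // Runs the task on a short-lived thread and waits for it to terminate.
    public void runOnWorker(Runnable task) throws InterruptedException {
        Thread worker = new Thread(task, "exec-demo");
        // Swallow the simulated exception so it does not clutter stderr.
        worker.setUncaughtExceptionHandler((t, e) -> { });
        worker.start();
        worker.join();
    }

    // Bounded attempt to take the lock from the calling thread.
    public boolean othersCanAcquire() throws InterruptedException {
        if (singleConnectionLock.tryLock(200, TimeUnit.MILLISECONDS)) {
            singleConnectionLock.unlock();
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        OrphanedLockSketch safer = new OrphanedLockSketch();
        safer.runOnWorker(() -> safer.saferGet(true));
        System.out.println("safer get:  others can acquire = " + safer.othersCanAcquire());

        OrphanedLockSketch unsafe = new OrphanedLockSketch();
        unsafe.runOnWorker(() -> unsafe.unsafeGet(true));
        System.out.println("unsafe get: others can acquire = " + unsafe.othersCanAcquire());
    }
}
```

In the unsafe case the worker thread has already terminated, so a thread
dump has no thread to attribute the lock to, while the lock object in a
heap dump still points at it through exclusiveOwnerThread. That matches
what I am seeing.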
Thanks Rémy! Any suggestions on how I can see the actual error that
might have triggered this? Is there any way to increase the logging level
for the JNDIRealm module?

> >
> > I am also trying to find why it's failing in the first place. We might
> > be having some intermittent connection problems which might be
> > triggering this. Is there any way to get more info about the failure
> > from Tomcat? Please share your thoughts.
> >
> > > -chris
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> > > For additional commands, e-mail: users-h...@tomcat.apache.org