Hello Chris, On Fri, Oct 29, 2021 at 2:46 AM Christopher Schultz <ch...@christopherschultz.net> wrote: > > Suvendu, > > On 10/28/21 12:55, Suvendu Sekhar Mondal wrote: > > Hello Everyone, > > > > I was investigating one thread pool exhaustion issue. Thread dump > > analysis showed that all HTTP threads were waiting for a ReentrantLock > > object. Object address 0x000000066d727f28 were same for all of the > > waiting threads: > > > > "http-nio-18100-exec-86" #32808 daemon prio=5 os_prio=0 > > tid=0x0000000051835800 nid=0x29bc waiting on condition > > [0x000000007a5be000] > > java.lang.Thread.State: WAITING (parking) > > at sun.misc.Unsafe.park(Native Method) > > - parking to wait for <0x000000066d727f28> (a > > java.util.concurrent.locks.ReentrantLock$NonfairSync) > > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > > at > > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > > at > > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > > at > > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > > at > > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) > > at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) > > at org.apache.catalina.realm.JNDIRealm.get(JNDIRealm.java:2385) > > at org.apache.catalina.realm.JNDIRealm.authenticate(JNDIRealm.java:1274) > > > > There was no hint in the thread dump about which thread was owning the > > lock. Luckily, one heap dump was taken before generating thread dump. > > When I queried the heap dump for that ReentrantLock object, I saw that > > another thread(http-nio-18100-exec-4) was holding the > > lock(exclusiveOwnerThread). There was NO trace of > > http-nio-18100-exec-4 thread in any of the thread dumps! So it was a > > "lock without an owner" case. > > I think you are looking at several pieces of evidence that may or may > not correlate to each other at all. The fact that the thread wasn't in > the thread dump indicates that the thread (or even the whole JVM) had > terminated between the time you took the heap-dump and the thread dump. > Most likely, the monitor was owned by another thread when you took your > thread-dump. Try using other tools which *do* reveal the lock-holders > identity. >
This issue has happened a few times. "Busy Thread Count" was high during the problem period. JVM was up and running when I collected heap and thread dumps - pid was not changed in-between. I used jstack, visualvm, jcmd - nothing revealed owing thread details. Only heap dumps had some information on that object and which thread was holding onto it. Here is a snap: https://pasteboard.co/D7dV3jej6zId.jpg I can simulate similar blocking without Tomcat with dummy code. There also nothing reveals the owner's identity except the heap dump. Here is sample: https://gist.github.com/suv3ndu/2ec9fe660d2b833996817ed62186eac2 > > After glancing through the Tomcat’s JNDIRealm.get() code and > > beyond[1], I can see lock is being acquired on singleConnectionLock. > > That lock is getting released either in the close() or release() > > method. So, if something bad happens to the thread which is trying to > > establish a connection, then lock will be held without a proper owner > > and a thread blocking situation will be created. Am I interpreting the > > code correctly? Should we not handle any failure inside get()? > > > > Also, I still have not got the reason why the thread got terminated. > > Any suggestions on how I can enable any specific logging? > > > > My setup is: > > Tomcat version: 9.0.39 > > Connector: NIO > > JDK: AdoptOpenJDK: 1.8.192 > > OS: Windows 2016 > > Looks like you need a whole bunch of upgrades. Search the Tomcat 9.x > changelog for "JNDIRealm" and you'll see there have been changes since > 9.0.39 that may have already resolved this issue. Are you able to > re-test with Tomcat 9.0.54? > It will not be easy for me to upgrade it and test it. Lots of approval is required to get that done. :( > > [1] > https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2553 > > Note that you are looking at the current version of JNDIRealm.java. The > version you are running is 17 commits behind that. > > The line of code calling ReentrantLock.lock in your code would be > https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2385 > which is "return null" indicating that there is a version mismatch > between the code you are running and the code you are reading. > Yeah, that's correct. Sorry for the confusion. Our version is running: https://github.com/apache/tomcat/blob/95658dfd868216db0773c38aad8eebf544024b09/java/org/apache/catalina/realm/JNDIRealm.java#L2385 That get() has not changed since then. That's why I asked about handling failure inside get(). I am also trying to find why it's failing in the first place. We might be having some intermittent connection problems which might be triggering this. Is there any way to get more info about the failure from Tomcat? Please share your thoughts. > -chris > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > For additional commands, e-mail: users-h...@tomcat.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org