On Fri, Oct 29, 2021 at 2:02 PM Rémy Maucherat <r...@apache.org> wrote:
>
> On Fri, Oct 29, 2021 at 9:28 AM Suvendu Sekhar Mondal <suv3...@gmail.com> 
> wrote:
> >
> > Hello Chris,
> >
> > On Fri, Oct 29, 2021 at 2:46 AM Christopher Schultz
> > <ch...@christopherschultz.net> wrote:
> > >
> > > Suvendu,
> > >
> > > On 10/28/21 12:55, Suvendu Sekhar Mondal wrote:
> > > > Hello Everyone,
> > > >
> > > > I was investigating a thread pool exhaustion issue. Thread dump
> > > > analysis showed that all HTTP threads were waiting for a ReentrantLock
> > > > object. The object address 0x000000066d727f28 was the same for all of
> > > > the waiting threads:
> > > >
> > > > "http-nio-18100-exec-86" #32808 daemon prio=5 os_prio=0 tid=0x0000000051835800 nid=0x29bc waiting on condition [0x000000007a5be000]
> > > >    java.lang.Thread.State: WAITING (parking)
> > > >     at sun.misc.Unsafe.park(Native Method)
> > > >     - parking to wait for  <0x000000066d727f28> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> > > >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> > > >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> > > >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> > > >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> > > >     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> > > >     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> > > >     at org.apache.catalina.realm.JNDIRealm.get(JNDIRealm.java:2385)
> > > >     at org.apache.catalina.realm.JNDIRealm.authenticate(JNDIRealm.java:1274)
> > > >
> > > > There was no hint in the thread dump about which thread owned the
> > > > lock. Luckily, a heap dump had been taken before the thread dump was
> > > > generated. When I queried the heap dump for that ReentrantLock object,
> > > > I saw that another thread (http-nio-18100-exec-4) was holding the lock
> > > > (exclusiveOwnerThread). There was NO trace of the http-nio-18100-exec-4
> > > > thread in any of the thread dumps! So it was a "lock without an owner"
> > > > case.
> > >
> > > I think you are looking at several pieces of evidence that may or may
> > > not correlate to each other at all. The fact that the thread wasn't in
> > > the thread dump indicates that the thread (or even the whole JVM) had
> > > terminated between the time you took the heap-dump and the thread dump.
> > > Most likely, the monitor was owned by another thread when you took your
> > > thread-dump. Try using other tools which *do* reveal the lock holder's
> > > identity.
> > >
> >
> > This issue has happened a few times. "Busy Thread Count" was high
> > during the problem period. The JVM was up and running when I collected
> > the heap and thread dumps; the pid did not change in between. I used
> > jstack, visualvm and jcmd, and none of them revealed the owning thread.
> > Only the heap dumps had some information on that object and which thread
> > was holding onto it. Here is a snapshot: https://pasteboard.co/D7dV3jej6zId.jpg
> >
> > I can simulate similar blocking without Tomcat, using dummy code. There,
> > too, nothing reveals the owner's identity except the heap dump. Here
> > is a sample: https://gist.github.com/suv3ndu/2ec9fe660d2b833996817ed62186eac2
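
A minimal sketch of this kind of reproduction, for illustration only. It is not the code from the gist above, and the class and thread names are made up. A worker thread takes a ReentrantLock and exits without unlocking. ReentrantLock.getOwner() is protected, so a small subclass exposes it.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class OrphanedLockDemo {

    /** ReentrantLock.getOwner() is protected, so expose it through a subclass. */
    static final class InspectableLock extends ReentrantLock {
        Thread owner() {
            return getOwner();
        }
    }

    /** What a second thread observes once the owning thread has exited. */
    static final class Result {
        final boolean locked;
        final boolean ownerAlive;
        final boolean acquiredByOther;
        final String ownerName;

        Result(boolean locked, boolean ownerAlive, boolean acquiredByOther, String ownerName) {
            this.locked = locked;
            this.ownerAlive = ownerAlive;
            this.acquiredByOther = acquiredByOther;
            this.ownerName = ownerName;
        }
    }

    static Result run() {
        InspectableLock lock = new InspectableLock();

        // Hypothetical worker: acquires the lock and returns without unlocking.
        Thread worker = new Thread(lock::lock, "worker-that-never-unlocks");
        worker.start();
        try {
            worker.join();

            Thread owner = lock.owner();          // still recorded, although the thread is gone
            boolean locked = lock.isLocked();
            // A bounded wait avoids parking forever the way lock() would.
            boolean acquired = lock.tryLock(200, TimeUnit.MILLISECONDS);
            return new Result(locked,
                    owner != null && owner.isAlive(),
                    acquired,
                    owner == null ? null : owner.getName());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the demo", e);
        }
    }

    public static void main(String[] args) {
        Result r = run();
        System.out.println("locked=" + r.locked
                + " ownerAlive=" + r.ownerAlive
                + " acquiredByOther=" + r.acquiredByOther
                + " owner=" + r.ownerName);
    }
}
```

In this sketch the lock still reports as held while its recorded owner is no longer alive, which is the same picture the heap dump gave through exclusiveOwnerThread. A thread dump only lists live threads, so it would have no entry for the owner.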
> >
> > > > After glancing through Tomcat’s JNDIRealm.get() code and
> > > > beyond[1], I can see that the lock is acquired on singleConnectionLock
> > > > and released in either the close() or the release() method. So, if
> > > > something bad happens to the thread that is trying to establish a
> > > > connection, the lock stays held without a live owner and every other
> > > > thread blocks on it. Am I interpreting the code correctly? Should we
> > > > not handle any failure inside get()?
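
One way a failure inside get() could be handled, sketched under assumptions. This is not Tomcat's JNDIRealm code: the class GuardedConnectionHolder, the Opener interface and isLocked() are hypothetical, and only the singleConnectionLock name and the get()/release() pairing are taken from the discussion. The idea is that get() keeps the lock only when it returns normally.

```java
import java.util.concurrent.locks.ReentrantLock;

class GuardedConnectionHolder {

    /** Hypothetical stand-in for whatever opens the directory connection. */
    interface Opener {
        Object open() throws Exception;
    }

    private final ReentrantLock singleConnectionLock = new ReentrantLock();
    private final Opener opener;
    private Object connection;

    GuardedConnectionHolder(Opener opener) {
        this.opener = opener;
    }

    /**
     * Acquires the lock and returns the shared connection.
     * On success the caller owns the lock and must call release().
     * On any failure the lock is released before the exception propagates.
     */
    Object get() throws Exception {
        singleConnectionLock.lock();
        boolean success = false;
        try {
            if (connection == null) {
                connection = opener.open();
            }
            success = true;
            return connection;
        } finally {
            if (!success) {
                // Covers checked exceptions, runtime exceptions and errors alike.
                singleConnectionLock.unlock();
            }
        }
    }

    /** Releases the lock if the calling thread holds it. */
    void release() {
        if (singleConnectionLock.isHeldByCurrentThread()) {
            singleConnectionLock.unlock();
        }
    }

    /** Exposed only so the sketch can be checked. */
    boolean isLocked() {
        return singleConnectionLock.isLocked();
    }
}
```

The success flag in the finally block is used instead of catching a specific exception type, so the lock is released no matter what is thrown while the connection is being opened.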
> > > >
> > > > Also, I still have not found out why the thread was terminated.
> > > > Any suggestions on how I can enable any specific logging?
> > > >
> > > > My setup is:
> > > > Tomcat version: 9.0.39
> > > > Connector: NIO
> > > > JDK: AdoptOpenJDK: 1.8.192
> > > > OS: Windows 2016
> > >
> > > Looks like you need a whole bunch of upgrades. Search the Tomcat 9.x
> > > changelog for "JNDIRealm" and you'll see there have been changes since
> > > 9.0.39 that may have already resolved this issue. Are you able to
> > > re-test with Tomcat 9.0.54?
> > >
> >
> > It will not be easy for me to upgrade and test it. Lots of approvals
> > are required to get that done. :(
> >
> > >  > [1]
> > > https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2553
> > >
> > > Note that you are looking at the current version of JNDIRealm.java. The
> > > version you are running is 17 commits behind that.
> > >
> > > The line of code calling ReentrantLock.lock in your code would be
> > > https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2385
> > > which is "return null" indicating that there is a version mismatch
> > > between the code you are running and the code you are reading.
> > >
> >
> > Yeah, that's correct. Sorry for the confusion. The version we are running is:
> > https://github.com/apache/tomcat/blob/95658dfd868216db0773c38aad8eebf544024b09/java/org/apache/catalina/realm/JNDIRealm.java#L2385
> > That get() has not changed since then. That's why I asked about
> > handling failure inside get().
>
> This should be handled, so I recommend you update.
> Nearly always you'll get a NamingException in authenticate, and that case works now:
> https://github.com/apache/tomcat/blob/main/java/org/apache/catalina/realm/JNDIRealm.java#L1235
> goes to:
> https://github.com/apache/tomcat/blob/main/java/org/apache/catalina/realm/JNDIRealm.java#L1288
>
> After review, I can see Naming exceptions are caught, which should be
> enough, but it would be safer to catch Exception. Also getPassword was
> missing handling for an exception. I will fix these, but I doubt you
> were affected. I'll tighten this up.
>
> Rémy
>

Thanks Rémy! Any suggestions on how I can see the actual error that
might have triggered this? Is there any way to increase the logging level
for the JNDIRealm module?

> >
> > I am also trying to find why it's failing in the first place. We might
> > be having some intermittent connection problems which might be
> > triggering this. Is there any way to get more info about the failure
> > from Tomcat? Please share your thoughts.
> >
> > > -chris
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> > > For additional commands, e-mail: users-h...@tomcat.apache.org
> > >
> >
>
