> -----Original Message----- > From: Christopher Schultz <ch...@christopherschultz.net> > Sent: Wednesday, December 2, 2020 10:21 AM > To: users@tomcat.apache.org > Subject: Re: Weirdest Tomcat Behavior Ever? > > Mark, > > On 11/26/20 05:14, Mark Thomas wrote: > > On 26/11/2020 04:57, Christopher Schultz wrote: > > > > <snip/> > > > >>> After a normal clean-up the parent then calls close on the two file > >>> descriptors associated with the pipe for a second time." > >> > >> So the child cleans them up AND the parent cleans them up? Or the > >> parent cleans when up twice? The child should be able to call close() > >> as many times as it wants and only poison itself. Does the child > >> process ever exit()? > > > > With the caveat that some of the below is educated guess work because > > the strace was configured to look at the events we were interested in > > so I am having to fill in some of the gaps. > > > > The parent "process" is a Java thread currently in native code in a > > 3rd party library. > > > > The parent creates a pipe which comes with two file descriptors. One > > for the read end, one for the write end. > > > > The parent process then forks. The child process now has copies of the > > two file descriptors. (see man fork for more details). > > > > The parent closes its fd for the write end of the pipe. The child > > closes its fd for the read end of the pipe. > > > > The child writes to the pipe and the parent reads from it. > > > > The child exits and closes its fd for the write end of the pipe. > > > > The parent closes its fd for the read end of the pipe. > > > > At this point all is good. All the closes completely cleanly. > > Everything has been shutdown properly. > > +1 > > > The two fds allocated to the parent are back in the pool any may be > > reused by other threads in the JVM. > > > > The parent then attempts to close the fds associated with the pipe > > again. For each fd, if it has not been reallocated an EBADF error > > occurs. If it has been reallocated, it is closed thereby breaking > > whatever was using it legitimately. > > Thanks for clarifying this. I was confused and thinking you were saying that > the child process was the one breaking things, but it's the parent process. > Since the parent is the JVM (the long-running process), all hell breaks loose. > > >> The parent process must be the JVM process, right? And the parent > >> process (this native library, running within the JVM process) > >> double-closes file descriptors, with some measurable delay? > > > > Correct. In the instance where I did most of the log analysis the > > delay was about 0.1 seconds. In other logs I did observe longer delays > > with what looked like a very different failure mode. > > > >> That's the > >> only way this could make sense. And of course it mess mess everything > >> up in really *really* unpredictable ways. > > > > Yep. > > Fascinating. > > Thanks for the wild ride, Eric and Mark :) > > -chris >
I case anyone thought I had forgotten about all this... sorry, no such luck. You’re stuck with me! Things have been quieter for the past several days because, at Mark's suggestion, we changed the nginx proxy and tomcat configurations to make them more tolerant of the underlying condition. Specifically, we configured nginx to use HTTP/1.1 instead of 1.0, enabled keepalives, and set maxKeepAliveRequests="-1" in server.xml. This reduced the frequency of the issue. The vendor was unable to dispute the quality of the analysis, so they accepted that the third-party component (which I can now name: PDFTron) could be the root cause. They disabled use of the component, and that seems to have quieting things down a bit more. We are still seeing occasional session disconnects, so it is possible that the component is leveraged for more than one function in their software and it was only disabled for a subset of them. The big difference now is that, instead of seeing a GET request from the proxy followed by a FIN from the upstream, now it’s a GET followed by an RST. We'll begin the packet captures and straces again on Monday. Mark, besides network and fd tracing, is there anything else you want from strace to make the analysis more certain? -Eric Disclaimer : This email and any files transmitted with it are confidential and intended solely for intended recipients. If you are not the named addressee you should not disseminate, distribute, copy or alter this email. Any views or opinions presented in this email are solely those of the author and might not represent those of Physician Select Management. Warning: Although Physician Select Management has taken reasonable precautions to ensure no viruses are present in this email, the company cannot accept responsibility for any loss or damage arising from the use of this email or attachments. --------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org