> -----Original Message-----
> From: Stefan Mayr <ste...@mayr-stefan.de>
> Sent: Thursday, November 5, 2020 4:24 PM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 03.11.2020 at 16:05, Eric Robinson wrote:
> >> -----Original Message-----
> >> From: Eric Robinson <eric.robin...@psmnv.com>
> >> Sent: Tuesday, November 3, 2020 8:21 AM
> >> To: Tomcat Users List <users@tomcat.apache.org>
> >> Subject: RE: Weirdest Tomcat Behavior Ever?
> >>
> >>> From: Mark Thomas <ma...@apache.org>
> >>> Sent: Tuesday, November 3, 2020 2:06 AM
> >>> To: Tomcat Users List <users@tomcat.apache.org>
> >>> Subject: Re: Weirdest Tomcat Behavior Ever?
> >>>
> >>> On 02/11/2020 12:16, Eric Robinson wrote:
> >>>
> >>> <snip/>
> >>>
> >>>> Gotcha, thanks for the clarification. Let's see what happens when the
> >>>> users start hitting it at 8:00 am Eastern.
> >>>
> >>> Progress. The first attempt to write to the socket triggers the
> >>> following
> >>> exception:
> >>>
> >>> 02-Nov-2020 14:33:54.083 FINE [http-bio-3016-exec-13] org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write [301361476]
> >>>  java.net.SocketException: Bad file descriptor (Write failed)
> >>>     at java.net.SocketOutputStream.socketWrite0(Native Method)
> >>>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> >>>     at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> >>>     at ...
>
> >>> Because this is an instance of an IOException, Tomcat assumes it has
> >>> been caused by the client dropping the connection and silently
> >>> swallows it. I'll be changing that later today so the exception is
> >>> logged as DEBUG level for all new Tomcat versions.
> >>>
> >>> Possible causes of "java.net.SocketException: Bad file descriptor"
> >>> I've been able to find are:
> >>>
> >>> 1. OS running out of file descriptors.
> >>>
> >>> 2. Trying to use a closed socket.
> >>>
> >>> I want to review the source code to see if there are any others.
> >>>
> >>> I don't think we are seeing 2 as there is no indication of the
> >>> Socket, InputStream or OutputStream being closed in the logs.
> >>>
> >>> That leaves 1. Possible causes here are a file descriptor leak or
> >>> normal operations occasionally needing more than the current limit.
> >>> I don't think it is a leak as I'd expect to see many more errors of
> >>> this type after the first and we aren't seeing that. That leaves the
> >>> possibility of the current limit being a little too low.
> >>>
> >>> My recommendation at this point is to increase the limit for file descriptors.
> >>> Meanwhile, I'll look at the JRE source to see if there are any other
> >>> possible triggers for this exception.
> >>>
> >>> Mark
> >>>
> >>>
> >>
> >> On the tomcat server, max open file descriptors is currently 2853957
> >>
> >> [root@001app01a ~]# sysctl fs.file-max
> >> fs.file-max = 2853957
> >>
> >> Most of the time, the number of open files appears to run about 600,000.
> >>
> >>  What do you think of watching the open file count and seeing if the
> >> number gets up around the ceiling when the socket write failure
> >> occurs? Something like...
> >>
> >> [root@001app01a ~]# while [ TRUE ]; do FILES=$(lsof | wc -l); echo "$(date +%H:%M:%S) $FILES"; done
> >> 09:11:15 591671
> >> 09:11:35 627347
> >> 09:11:54 626851
> >> 09:12:11 626429
> >> 09:12:26 545748
> >> 09:12:42 548578
> >> 09:12:58 551487
> >> 09:13:14 516700
> >> 09:13:30 513312
> >> 09:13:45 512830
> >> 09:14:02 533338
> >> 09:14:18 568233
> >> 09:14:35 570158
> >> 09:14:51 566269
> >> 09:15:07 547389
> >> 09:15:23 544203
> >> 09:15:38 546395
> >>
> >> It's not ideal, as it seems to take 15-20 seconds to count them using lsof.
> >>
> >>
> >>
> >
> > Wait, never mind. I realized the per-process limits are what matters. I
> > checked, and nofile was set to 4096 for the relevant java process.
> >
> > I did...
> >
> > # prlimit --pid 8730 --nofile=16384:16384
> >
> > That should give java some extra breathing room if the issue is max open
> > files, right?
>
> From my experience you should see a different exception if you hit the
> NOFILE limit: java.net.SocketException: Too many open files
>
> But I've only seen that when opening or accepting a new connection, never
> later when something is written to an already-open socket.
>
> To me a bad file descriptor sounds more like a closed socket. This reminds me
> of a database or http-client connection pool that handed out connections
> with already closed sockets. I think this could be suspicious because Mark
> wrote this happens on the first write to the socket.
>
> - Stefan
>

It's unclear whether increasing the max file descriptors had a positive effect.
The error occurred only once yesterday, but that may have been because of low
user activity.
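
To tell whether the higher limit is actually helping, I plan to watch the
per-process descriptor count directly rather than the global lsof total. A
rough sketch of what I have in mind, assuming the java process is still PID
8730 from the prlimit command above (the PID changes if Tomcat restarts):

[root@001app01a ~]# grep 'open files' /proc/8730/limits
[root@001app01a ~]# ls /proc/8730/fd | wc -l
[root@001app01a ~]# while true; do echo "$(date +%H:%M:%S) $(ls /proc/8730/fd | wc -l)"; sleep 30; done

The first command confirms the new soft/hard nofile values took effect, the
second counts only the descriptors held by that process (which is what the
nofile limit actually constrains), and the loop logs the count every 30
seconds so it can be lined up against the next occurrence of the exception.
Reading /proc/8730/fd is near-instant, unlike a full lsof.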

We may have a new clue, though. Until yesterday, every instance of the error
was accompanied by a premature disconnection. In other words, the proxy would
send a GET request, but Tomcat would FIN the connection without sending a
response (even though Tomcat recorded an HTTP 200 in the localhost_access log).
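
(For reference, correlating a proxy-side error with the access log is just a
timestamp grep. A hypothetical example, assuming Tomcat's default
AccessLogValve file naming under /opt/tomcat/logs; our actual path and log
pattern may differ:

[root@001app01a ~]# grep '02/Nov/2020:14:33:5' /opt/tomcat/logs/localhost_access_log.2020-11-02.txt

The timestamp prefix matches the 14:33:54 exception from Mark's excerpt and
should show the corresponding request line and the 200 status.)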

Yesterday's error was different. In that case, Tomcat apparently RESET (RST) the
connection instead of FINing it. The nginx proxy recorded the following in its
logs...

2020/11/05 08:35:59 [error] 951#951: *22627061 recv() failed (104: Connection 
reset by peer) while reading response header from upstream,

2020/11/05 08:35:59 [warn] 951#951: *22627061 upstream server temporarily 
disabled while reading response header from upstream,

Unfortunately, I did not capture the whole exchange with tcpdump, so I am 
hoping to catch it in the act again today.
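
My plan is to leave a rotating capture running on the connector port so the
next occurrence lands on disk. A sketch of the idea, with port 3016 inferred
from the http-bio-3016 thread name in the log excerpt above; the -C/-W ring
buffer keeps 20 files of roughly 100 MB each, overwriting the oldest, so the
capture can run indefinitely without filling the disk:

[root@001app01a ~]# tcpdump -i any -s 0 -C 100 -W 20 -w /tmp/tomcat-3016.pcap port 3016

Once the error shows up in the nginx log again, I can stop the capture and
pull the conversation for that client port out of the most recent files.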

--Eric
