
David,

On 8/27/20 18:14, David wrote:
>> I used HTTP on port 8080 in order to read the Tomcat web manager
>> stats.  I originally had issues with the JVM being too small:
>> running out of memory, CPU spiking, threads maxing out, and
>> whole-system instability.  Getting more machine memory and upping
>> the JVM allocation has remedied all of that except, apparently,
>> the thread issue.
What is the memory size of the server and of the JVM?
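
If it helps to confirm what a JVM actually got (versus what was
passed on the command line), the Runtime API reports the effective
heap figures.  A minimal sketch; run the same calls inside Tomcat's
JVM (e.g. from a quick JSP) to see Tomcat's numbers rather than a
standalone JVM's:

    public class HeapCheck {
        public static void main(String[] args) {
            // Rough view of the heap this JVM will actually use (~ -Xmx).
            long mb = 1024 * 1024;
            Runtime rt = Runtime.getRuntime();
            System.out.printf("max=%dMB total=%dMB free=%dMB%n",
                    rt.maxMemory() / mb,
                    rt.totalMemory() / mb,
                    rt.freeMemory() / mb);
        }
    }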

>> I'm unsure whether they were aging at that time, as I couldn't get
>> into anything, but with no room for GC to take place it would
>> make sense that the threads would not be released.

That's not usually an issue, unless the application is using a
significant amount of memory during a request and then releasing it
after the request has completed.

>> My intention was to restart Tomcat nightly to lessen the chance
>> of an occurrence until I could find a way to restart Tomcat based
>> on the thread count and script a thread dump at the same time,
>> (likely through Solarwinds).  Now that you've explained that the
>> NIO threads are a part of the system threads, I may be able to
>> script something like that directly on the system, with a
>> crontab to check count; if > 295 contains NIO, dump threads to a
>> file / systemctl stop-start tomcat.

I wouldn't do that. Just because the threads exist does not mean they
are stuck. They may be doing useful work or otherwise running just
fine. I would look for other ways to detect problems.
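
If you do want a signal to act on, the connector's thread pool is
visible over JMX, so you can watch it without blindly bouncing the
service.  A minimal sketch, assuming you've enabled remote JMX on a
hypothetical port 9010 and that your connector really is named
"http-nio-8080":

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class BusyThreadCheck {
        public static void main(String[] args) throws Exception {
            // Requires Tomcat started with remote JMX enabled, e.g.
            // -Dcom.sun.management.jmxremote.port=9010 (port is made up).
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
            try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                // Thread pool for the NIO connector; the name follows your port.
                ObjectName pool = new ObjectName(
                    "Catalina:type=ThreadPool,name=\"http-nio-8080\"");
                int busy = (Integer) mbs.getAttribute(pool, "currentThreadsBusy");
                int max  = (Integer) mbs.getAttribute(pool, "maxThreads");
                System.out.printf("busy=%d max=%d%n", busy, max);
                // Alert (or capture a thread dump) only when busy stays near
                // max across several samples; a single high reading just
                // means the server is busy, not that it is stuck.
            }
        }
    }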

>> That's very encouraging, as it seems a viable way to get the data
>> I need without much impact to users.   Your explanation of
>> threads leads me to believe that the nightly restart may be
>> rather moot, as it could likely be exhaustion on the downstream
>> causing the backup on the front end.  I didn't see these
>> connected in this way and assumed they were asynchronous and
>> independent processes.  There are timeouts configured for all the
>> DB2 backend connections, and I was in the mindset that the
>> shortest timeout would kill all connections upstream/downstream
>> by presenting the application with a "forcibly closed by remote
>> host" or a timeout.

If you can suffer through a few more incidents, you can probably get a
LOT more information about the root problem and maybe even get it
solved, instead of just trying to stop the bleeding.

>> I greatly appreciate the assistance.  In looking through various
>> articles, none of this was really discussed, because either
>> everyone knows it, or maybe it was discussed on a level where I
>> couldn't understand it.  There certainly don't seem to be any
>> other instances of connections being open for 18-45 minutes, or
>> if there are, it's not an issue for them.

If you have a load-balancer (which you do), then I'd expect HTTP
keep-alive to keep those connections open literally all day long,
only maybe expiring when you have configured them to expire "just in
case" or maybe after some amount of inactivity. For an lb environment,
I'd want those keep-alive timeouts to be fairly high so you don't
waste any time re-constructing sockets between the lb and the app
server.

When an lb is NOT in the mix, you generally want /low/ keep-alive
timeouts because you can't rely on clients sticking around for very
long and you want to get them off your doorstep ASAP.
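
For reference, those knobs live on the <Connector> in server.xml.
Something along these lines (the numbers are only illustrative, and
keepAliveTimeout defaults to connectionTimeout when it isn't set):

    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               keepAliveTimeout="300000"
               maxKeepAliveRequests="-1" />

keepAliveTimeout is in milliseconds, and maxKeepAliveRequests="-1"
means "no limit", which is the sort of setting you only want behind a
trusted lb, not facing the open internet.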

>> During a normal glance at the manager page, there are no
>> connections and maybe like 5 empty lines in a "Ready" stage.
>> Even if I spam the server's logon landing page, I can never see a
>> persistent connection, so it baffled me as to how connections
>> could hang and build up.  I'm thinking something was perhaps
>> messed up with the backend.

If by "backend" you mean like databasse, etc. then that is probably
the issue. The login page is (realtively) static, so it's very
difficult to put Tomcat under such load that it's hosing just giving
you that same page over and over again.

I don't know what your "spamming" strategy is, but you might want to
use a real load-generating tool like ApacheBench (ab) or, even better,
JMeter, which can actually swarm across several machines to basically
DDoS your internal servers; that can be useful sometimes for
stress-testing. But your tests really do have to comprise a realistic
scenario, not just hammering on the login page all day.
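
Purely as a sketch of the shape of such a test (the URL and the
numbers here are made up; ab or JMeter will give you far better
reporting and pacing), a few dozen concurrent clients from plain Java
looks like this:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class MiniLoad {
        public static void main(String[] args) throws Exception {
            // Hypothetical target: point this at a page that exercises the
            // backend, not just the (relatively static) login page.
            URI target = URI.create("http://localhost:8080/yourapp/some/real/page");
            int concurrency = 50;
            int requests = 2000;

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest req = HttpRequest.newBuilder(target).GET().build();
            ExecutorService pool = Executors.newFixedThreadPool(concurrency);
            CountDownLatch done = new CountDownLatch(requests);

            for (int i = 0; i < requests; i++) {
                pool.submit(() -> {
                    try {
                        HttpResponse<Void> resp =
                            client.send(req, HttpResponse.BodyHandlers.discarding());
                        if (resp.statusCode() >= 500) {
                            System.err.println("server error: " + resp.statusCode());
                        }
                    } catch (Exception e) {
                        System.err.println("request failed: " + e);
                    } finally {
                        done.countDown();
                    }
                });
            }
            done.await();
            pool.shutdown();
        }
    }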

>> The webapp names/URLs for the oldest connections didn't coincide
>> between the two outages, so I kind of brushed it off as being
>> application-specific; however, it may still be.
>>
>> I need it to occur again and get some dumps!

Unfortunately, yes.
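
When it does recur, a few jstack <pid> snapshots (or kill -3 <pid>,
whose output typically ends up in catalina.out) taken 10-20 seconds
apart are usually the most telling, because you can see which threads
are parked in the same spot across samples.  If you'd rather capture
it from inside the JVM (say, from a small admin-only servlet; purely
a sketch), ThreadMXBean gives you the same data:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class DumpThreads {
        public static void main(String[] args) {
            // Dumps the threads of *this* JVM; to see Tomcat's threads, run
            // the same calls inside the webapp or just use jstack/jcmd.
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
                System.out.println("\"" + info.getThreadName()
                        + "\" state=" + info.getThreadState());
                for (StackTraceElement frame : info.getStackTrace()) {
                    System.out.println("    at " + frame);
                }
                System.out.println();
            }
        }
    }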

-chris
