Mladen,
On 11/26/19 11:13, Mladen Adamović wrote:
> I dig into Tomcat source code and from what I've seen this is what
> happens:
>
> Tomcat "worker thread", tries the polls the work and if it doesn't
> succeed it invokes sun.misc.Unsafe.park (that's waiting which can
> be interrupted by another thread), internally in Linux goes through
> kernel function FUTEX_WAIT (futex definition is "The futex()
> system call provides a method for waiting until a certain condition
> becomes true."

This is usually the underlying JVM implementation doing all of this. I
don't believe Tomcat calls Unsafe.park() directly. Theoretically,
park() is a blocking call, so even if the thread reports that it is
RUNNABLE it is really (effectively) BLOCKED, although it might be doing
a slight busy-wait [1].

> However, these futex_wait functions sums to top CPU load, although
> I don't find it's actually load here, as per infamous Linux change
> of 29 Oct 1993 by Matthias Urlichs:
> http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html

Well, CPU load average doesn't tell the whole story. The real question
is how much work is being done per unit time. If you are observing a
performance *problem* then maybe we can look into that. But a high CPU
load average by itself doesn't represent anything worrying, IMHO. If
you have a sustained high CPU load average, you probably need to be
looking at scaling-out anyway, but you might not have a performance
problem ... yet.

I have a medium-busy service running 2-3 distinct web applications on
two application servers (~1k peak requests per minute in aggregate;
that includes static resources being served by web servers without
hitting Tomcat). The load-average on any given application server is
usually well under the number of CPUs. We are running a 4.9.0 kernel
and I'm not sure if the load-average is being scaled by the number of
CPUs or not, but my 1/5/15 load averages are all under 1.0 right now,
and the 13:00 hour (/now/ in my timezone) is when we start approaching
our peak daily load.

We have a very simple connector configuration which isn't much
different from the default:

    <Connector port="8015"
               redirectPort="443"
               protocol="org.apache.coyote.ajp.AjpNioProtocol"
               URIEncoding="UTF-8"
               packetSize="65536"
               executor="tomcatThreadPool"
               />

    <Connector port="8017"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               address="127.0.0.1"
               secure="true"
               URIEncoding="UTF-8"
               executor="tomcatThreadPool"
               />

We are using NIO (and not e.g. APR) for both connectors. The one with
the most traffic is the AJP connector, which maintains a bunch of
(nominally) persistent connections from the small number of web servers.
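Coming back to the load-average question above for a moment: if you
want to sanity-check the load average against the CPU count from
inside the JVM (rather than eyeballing top), something like the
following works. This is only a minimal sketch using the standard
java.lang.management platform MXBean; the class name is mine and
purely illustrative, and nothing here is Tomcat-specific.

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    public class LoadCheck {
        public static void main(String[] args) {
            OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();

            int cpus = os.getAvailableProcessors();

            // 1-minute system load average; negative if the platform
            // doesn't report one.
            double load = os.getSystemLoadAverage();

            System.out.printf("cpus=%d load(1m)=%.2f load-per-cpu=%.2f%n",
                              cpus, load, load / cpus);
        }
    }

If load-per-cpu stays comfortably under 1.0, the machine is not
saturated no matter how busy the raw load number looks.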
I have many threads that are blocked on I/O, such as the client poller
for the HTTP connector:

    "http-nio-127.0.0.1-8017-ClientPoller-0" #49 daemon prio=5 os_prio=0 tid=0x00007fc3901c4000 nid=0x74f runnable [0x00007fc359325000]
       java.lang.Thread.State: RUNNABLE
            at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
            at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
            at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
            at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
            - locked <0x00000000d915c318> (a sun.nio.ch.Util$3)
            - locked <0x00000000d915c308> (a java.util.Collections$UnmodifiableSet)
            - locked <0x00000000d915c0c0> (a sun.nio.ch.EPollSelectorImpl)
            at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
            at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:825)
            at java.lang.Thread.run(Thread.java:748)

Note that the thread state says "runnable" (and "RUNNABLE") but it's
essentially blocked. The request-processing threads waiting for work
are in a similar state:

    "catalina-exec-25" #44 daemon prio=5 os_prio=0 tid=0x00007fc3901c2000 nid=0x74a waiting on condition [0x00007fc35982a000]
       java.lang.Thread.State: WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for <0x00000000d915d3a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
            at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
            at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:103)
            at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:31)
            at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
            at java.lang.Thread.run(Thread.java:748)

... although this one says "parking"/"WAITING", which is a more
appropriate description of what's happening. The above thread states
can both be thought of as "asleep".
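If it's useful, here is a minimal, self-contained sketch (my own toy
code, not anything from Tomcat) that reproduces the same WAITING
(parking) state: a "worker" thread blocked in LinkedBlockingQueue.take()
parks in sun.misc.Unsafe.park, which on Linux is where the FUTEX_WAIT
calls you traced come from. If you jstack it while it's idle, you'll
see essentially the same frames as the catalina-exec-25 dump above,
and the thread consumes no CPU while it waits.

    import java.util.concurrent.LinkedBlockingQueue;

    public class IdleWorkerDemo {
        public static void main(String[] args) throws Exception {
            LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

            Thread worker = new Thread(() -> {
                try {
                    // Blocks (parks) here until work arrives; no CPU is
                    // consumed while parked.
                    Runnable task = queue.take();
                    task.run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "demo-exec-1");
            worker.start();

            Thread.sleep(1000);
            // Reported as WAITING (not BLOCKED), just like catalina-exec-*
            System.out.println("demo-exec-1 is " + worker.getState());

            queue.put(() -> System.out.println("work done"));
            worker.join();
        }
    }

That take() is essentially what the executor's TaskQueue.take() is
doing in the dump above.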
So perhaps the "problem" isn't so much that you are seeing lots of
threads in these states as that you have a lot of requests actually
being processed, and they are being processed *quickly*. That's a good
thing. :)

Similar to the way that high memory usage isn't a bad thing (memory is
there /precisely/ to be used!), high CPU usage shouldn't necessarily be
considered a big problem.

You never actually said what your definition of "high CPU usage" was.
Are you looking at load-average? %-CPU usage from "top"?

Do you have the option of comparing your CPU usage when you use APR
versus NIO? I would expect your CPU usage to go *up* by switching from
APR to NIO due to two factors:

1. NIO is slightly less CPU-efficient than APR when it comes to
   polling and data transfer (per performance testing by myself and
   jfclere; see Tomcat's "presentations" page for data and conclusions)

2. NIO uses JSSE, which may or may not be using OpenSSL. Ideally,
   you'll configure it to use OpenSSL; the pure-Java cryptographic
   provider has terrible performance relative to OpenSSL.

Do you have any scope to separate out TLS termination from the
application server? That is, use a reverse-proxy for TLS termination
and proxy to another server that only handles the application logic?
You may find that TLS is just "expensive" in terms of CPU which, well,
is simply the truth.

(Note that my servers, which are not configured to terminate TLS, are
running stunnel as a separate process to proxy those AJP connections
through an encrypted tunnel, so while the configuration does not show
it, my server is still performing all that necessary cryptographic
work on the same set of CPUs as my Tomcat instances are using.)

Hope that helps,
-chris

[1] https://en.wikipedia.org/wiki/Busy_waiting

> On Tue, Nov 26, 2019 at 4:50 PM Christopher Schultz
> <ch...@christopherschultz.net> wrote:
>
> Mladen,
>
> On 11/25/19 14:36, Mladen Adamović wrote:
>>>> On Mon, Nov 25, 2019 at 5:57 PM Christopher Schultz
>>>> <ch...@christopherschultz.net> wrote:
>>>>
>>>>>> We certainly want to be able to serve 10000 hits per
>>>>>> second (!), while some connections might be stalled.
>>>>>
>>>>> What might stall a connection? The network, or the
>>>>> application (or database, etc.)?
>>>>
>>>> Underlying (synchronized) monitors could stall every thread,
>>>> the network, whatever.
>>>>
>>>> The network itself demands a large number of connection,
>>>> i.e. current situation at the server (displaying only remote
>>>> connections):
>>>>
>>>> root@condor1796 ~ # netstat -tnp | grep -v "127.0.0" | wc -l
>>>> 1220

Note this is every connection, bound port, and cleanup connection the
kernel knows about; not just established/active connections to your
application specifically (see the quick example at the end of this
message).

>>>> If we now have 1220, we definitely need at least 10000
>>>> active connections for Tomcat and I don't see that setting
>>>> this to 50000 is a bad idea.

Okay. I think you need a reverse proxy and more servers if you think
50000 is going to be your peak load.

>>>>> For real DDOS protection, you need a provider who can
>>>>> handle lots of traffic and respond quickly by black-holing
>>>>> that kind of traffic as
>>>>
>>>> Depending on how large server farm they use (hypothetically).
>>>> We want to be able to survive some DDoS attacks. If we limit
>>>> the number of concurrent connections by IP address and the
>>>> number of connections per second, that's some DoS
>>>> protection.

But honestly, this is better done at another layer of the network; not
at the host level.

>>>> Regarding network delays, out of currently 1220 active
>>>> remote connections, most of them are in TIME_WAIT state.
>>>> Lowering TIME_WAIT settings in Linux are not recommended.

Hmm. Lots of TIME_WAIT connections isn't good. I actually don't know
if they count "against" your 50000 limit in the Java process.
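One more thought on that netstat output: "netstat -tnp | wc -l" counts
sockets in every state (plus the header lines), so it overstates what
Tomcat is actually holding open. If you only want the connections that
are really established to the connector, something like the following
with ss should do it; 8015 here is just the AJP port from my own
configuration above, so substitute your own:

    # Established connections to the AJP connector only
    # (one header line is included in the count):
    ss -tn state established '( sport = :8015 )' | wc -l

    # Sockets merely lingering in TIME_WAIT:
    ss -tn state time-wait | wc -l

Comparing those two numbers should tell you quickly whether your 1220
is mostly real traffic or mostly TIME_WAIT leftovers.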