Mark,

The difference between after_start and after_load is the sockets below, which are just a sample from the repeated list; the ports are random. How do I find out what these connections are related to?
java 5021 tomcat8 3162u IPv6 98361 0t0 TCP localhost:http-alt->localhost:51746 (ESTABLISHED)
java 5021 tomcat8 3163u IPv6 98362 0t0 TCP localhost:http-alt->localhost:51748 (ESTABLISHED)
java 5021 tomcat8 3164u IPv6 98363 0t0 TCP localhost:http-alt->localhost:51750 (ESTABLISHED)
java 5021 tomcat8 3165u IPv6 98364 0t0 TCP localhost:http-alt->localhost:51752 (ESTABLISHED)
java 5021 tomcat8 3166u IPv6 25334 0t0 TCP localhost:http-alt->localhost:51754 (ESTABLISHED)
java 5021 tomcat8 3167u IPv6 25335 0t0 TCP localhost:http-alt->localhost:51756 (ESTABLISHED)
java 5021 tomcat8 3168u IPv6 25336 0t0 TCP localhost:http-alt->localhost:51758 (ESTABLISHED)
java 5021 tomcat8 3169u IPv6 25337 0t0 TCP localhost:http-alt->localhost:51760 (ESTABLISHED)
java 5021 tomcat8 3170u IPv6 25338 0t0 TCP localhost:http-alt->localhost:51762 (ESTABLISHED)
java 5021 tomcat8 3171u IPv6 25339 0t0 TCP localhost:http-alt->localhost:51764 (ESTABLISHED)
java 5021 tomcat8 3172u IPv6 25340 0t0 TCP localhost:http-alt->localhost:51766 (ESTABLISHED)
java 5021 tomcat8 3173u IPv6 25341 0t0 TCP localhost:http-alt->localhost:51768 (ESTABLISHED)
java 5021 tomcat8 3174u IPv6 25342 0t0 TCP localhost:http-alt->localhost:51770 (ESTABLISHED)
java 5021 tomcat8 3175u IPv6 25343 0t0 TCP localhost:http-alt->localhost:51772 (ESTABLISHED)
java 5021 tomcat8 3176u IPv6 25344 0t0 TCP localhost:http-alt->localhost:51774 (ESTABLISHED)
java 5021 tomcat8 3177u IPv6 25345 0t0 TCP localhost:http-alt->localhost:51776 (ESTABLISHED)
java 5021 tomcat8 3178u IPv6 25346 0t0 TCP localhost:http-alt->localhost:51778 (ESTABLISHED)
java 5021 tomcat8 3179u IPv6 25347 0t0 TCP localhost:http-alt->localhost:51780 (ESTABLISHED)
java 5021 tomcat8 3180u IPv6 25348 0t0 TCP localhost:http-alt->localhost:51782 (ESTABLISHED)
java 5021 tomcat8 3181u IPv6 25349 0t0 TCP localhost:http-alt->localhost:51784 (ESTABLISHED)
java 5021 tomcat8 3182u IPv6 25350 0t0 TCP localhost:http-alt->localhost:51786 (ESTABLISHED)
java 5021 tomcat8 3183u IPv6 25351 0t0 TCP
localhost:http-alt->localhost:51788 (ESTABLISHED)

On Thu, Nov 12, 2020 at 4:05 PM Martin Grigorov <mgrigo...@apache.org> wrote:

> On Thu, Nov 12, 2020 at 2:40 PM Ayub Khan <ayub...@gmail.com> wrote:
>
> > Martin,
> >
> > Could you provide me a command which you want me to run and provide you
> > the results which might help you to debug this issue ?
>
> 1) start your app and click around to load the usual FDs
> 2) lsof -p `cat /var/run/tomcat8.pid` > after_start.txt
> 3) load your app
> 4) lsof -p `cat /var/run/tomcat8.pid` > after_load.txt
>
> you can analyze the differences in the files yourself before sending them
> to us :-)
>
> > On Thu, Nov 12, 2020 at 1:36 PM Martin Grigorov <mgrigo...@apache.org> wrote:
> >
> > > On Thu, Nov 12, 2020 at 10:37 AM Ayub Khan <ayub...@gmail.com> wrote:
> > >
> > > > Martin,
> > > >
> > > > These are file descriptors, some are related to the jar files which
> > > > are included in the web application and some are related to the
> > > > sockets from nginx to tomcat and some are related to database
> > > > connections. I use the below command to count the open file descriptors
> > >
> > > which type of connections increase ?
> > > the sockets ? the DB ones ?
> > >
> > > > watch "sudo ls /proc/`cat /var/run/tomcat8.pid`/fd/ | wc -l"
> > >
> > > you can also use lsof command
> > >
> > > > On Thu, Nov 12, 2020 at 10:56 AM Martin Grigorov <mgrigo...@apache.org> wrote:
> > > >
> > > > > On Wed, Nov 11, 2020 at 11:17 PM Ayub Khan <ayub...@gmail.com> wrote:
> > > > >
> > > > > > Chris,
> > > > > >
> > > > > > I was load testing using the ec2 load balancer dns. I have
> > > > > > increased the connector timeout to 6000 and also gave 32gig to
> > > > > > the JVM of tomcat. I am not seeing connection timeout in nginx
> > > > > > logs now.
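The two snapshots from steps 2) and 4) can be compared mechanically. A minimal bash sketch (file names as in the steps above; `ss` comes from iproute2; that the peer process turns out to be nginx is only the expectation for this setup, not a given):

```shell
# diff_snapshots <before> <after>: print the lsof lines present only in
# the second snapshot, i.e. the file descriptors opened between the two
# captures. comm(1) needs sorted input; <( ) requires bash.
diff_snapshots() {
  comm -13 <(sort "$1") <(sort "$2")
}

# peer_ports <file>: pull the client-side ephemeral ports out of
# "localhost:http-alt->localhost:NNNNN" lines like the ones above.
peer_ports() {
  sed -n 's/.*->localhost:\([0-9]*\).*/\1/p' "$1"
}

# Usage with the files produced in steps 2) and 4):
# diff_snapshots after_start.txt after_load.txt > new_fds.txt
# awk '{print $5}' new_fds.txt | sort | uniq -c | sort -rn  # tally by lsof TYPE
# peer_ports new_fds.txt | while read -r p; do
#   sudo ss -tnp "sport = :$p"   # which local process owns the other end?
# done
```

The `ss` lookup answers the "what are these connections related to" question directly: each `localhost:http-alt->localhost:NNNNN` line is an accepted loopback connection into Tomcat's port 8080 (`http-alt`), and `ss` names the process holding the `NNNNN` end.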
> > > > > > No errors in kernel.log. I am not seeing any errors in tomcat catalina.out.
> > > > > > During regular operations when the request count is between 4 to 6k
> > > > > > requests per minute, the open files count for the tomcat process is
> > > > > > between 200 to 350. Responses from tomcat are within 5 seconds.
> > > > > > If the request count goes beyond 6.5k, open files slowly move up to
> > > > > > 2300 to 3000 and the request responses from tomcat become slow.
> > > > > >
> > > > > > I am not concerned about high open files as I do not see any errors
> > > > > > related to open files. The only side effect of open files going above
> > > > > > 700 is that the response from tomcat is slow. I checked if this is
> > > > > > caused by elastic search; aws cloud watch shows elastic search
> > > > > > response is within 5 milliseconds.
> > > > > >
> > > > > > What might be the reason that when the open files go beyond 600, it
> > > > > > slows down the response time for tomcat? I tried with tomcat 9 and
> > > > > > it's the same behavior.
> > > > >
> > > > > Do you know what kind of files are being opened ?
> > > > >
> > > > > > On Tue, Nov 3, 2020 at 9:40 PM Christopher Schultz <ch...@christopherschultz.net> wrote:
> > > > > >
> > > > > > > Ayub,
> > > > > > >
> > > > > > > On 11/3/20 10:56, Ayub Khan wrote:
> > > > > > > > *I'm curious about why you are using all of cloudflare and ALB
> > > > > > > > and nginx. Seems like any one of those could provide what you
> > > > > > > > are getting from all 3 of them. *
> > > > > > > >
> > > > > > > > Cloudflare is doing just the DNS and nginx is doing ssl termination
> > > > > > >
> > > > > > > What do you mean "Cloudflare is doing just the DNS?"
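One way to answer Martin's "what kind of files are being opened" without guessing: every entry in /proc/&lt;pid&gt;/fd is a symlink whose target names its kind. A bash sketch (the tomcat8 pidfile path is the one used elsewhere in this thread; adjust if yours differs):

```shell
# classify_fds: read /proc/<pid>/fd symlink targets on stdin and tally
# them by kind. Sockets appear as "socket:[inode]", pipes as
# "pipe:[inode]", eventfds etc. as "anon_inode:...", and plain files
# (jars, logs) as absolute paths.
classify_fds() {
  sed -e 's/^socket:.*/socket/' \
      -e 's/^pipe:.*/pipe/' \
      -e 's/^anon_inode:.*/anon_inode/' \
      -e 's#^/.*#file#' \
    | sort | uniq -c | sort -rn
}

# Against the live process (same pidfile and privileges as the watch
# command quoted above); "NR > 1" skips ls's "total 0" header line:
# sudo ls -l "/proc/$(cat /var/run/tomcat8.pid)/fd" \
#   | awk 'NR > 1 {print $NF}' | classify_fds
```

If the count that climbs from ~350 to ~3000 under load is almost entirely `socket`, the growth is connections (nginx-to-Tomcat and/or outbound), not jars or other files.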
> > > > > > > So what is ALB doing, then?
> > > > > > >
> > > > > > > > *What is the maximum number of simultaneous requests that one
> > > > > > > > nginx instance will accept? What is the maximum number of
> > > > > > > > simultaneous proxied requests one nginx instance will make to a
> > > > > > > > back-end Tomcat node? How many nginx nodes do you have? How many
> > > > > > > > Tomcat nodes? *
> > > > > > > >
> > > > > > > > We have 4 vms each having nginx and tomcat running on them and
> > > > > > > > each tomcat has nginx in front of them to proxy the requests. So
> > > > > > > > it's one Nginx proxying to a dedicated tomcat on the same VM.
> > > > > > >
> > > > > > > Okay.
> > > > > > >
> > > > > > > > below is the tomcat connector configuration
> > > > > > > >
> > > > > > > > <Connector port="8080"
> > > > > > > >     connectionTimeout="60000" maxThreads="2000"
> > > > > > > >     protocol="org.apache.coyote.http11.Http11NioProtocol"
> > > > > > > >     URIEncoding="UTF-8"
> > > > > > > >     redirectPort="8443" />
> > > > > > >
> > > > > > > 60 seconds is a *long* time for a connection timeout.
> > > > > > >
> > > > > > > Do you actually need 2000 threads? That's a lot, though not insane.
> > > > > > > 2000 threads means you expect to handle 2000 concurrent (non-async,
> > > > > > > non-WebSocket) requests. Do you need that (per node)? Are you
> > > > > > > expecting 8000 concurrent requests? Does your load-balancer
> > > > > > > understand the topography and current-load on any given node?
> > > > > > > > When I am doing a load test of 2000 concurrent users I see the
> > > > > > > > open files increase to 10,320 and when I take a thread dump I see
> > > > > > > > the threads are in a waiting state. Slowly, as the requests are
> > > > > > > > completed, I see the open files come down to normal levels.
> > > > > > >
> > > > > > > Are you performing your load-test against the CF/ALB/nginx/Tomcat
> > > > > > > stack, or just hitting Tomcat (or nginx) directly?
> > > > > > >
> > > > > > > Are you using HTTP keepalive in your load-test (from the client to
> > > > > > > whichever server is being contacted)?
> > > > > > >
> > > > > > > > The output of the below command is
> > > > > > > > sudo cat /proc/sys/kernel/pid_max
> > > > > > > > 131072
> > > > > > > >
> > > > > > > > I am testing this on a c4.8xlarge VM in AWS.
> > > > > > > >
> > > > > > > > below is the config I changed in nginx.conf file
> > > > > > > >
> > > > > > > > events {
> > > > > > > >     worker_connections 50000;
> > > > > > > >     # multi_accept on;
> > > > > > > > }
> > > > > > >
> > > > > > > This will allow 50k incoming connections, and Tomcat will accept an
> > > > > > > unbounded number of connections (for NIO connector). So limiting
> > > > > > > your threads to 2000 only means that the work of each request will
> > > > > > > be done in groups of 2000.
> > > > > > >
> > > > > > > > worker_rlimit_nofile 30000;
> > > > > > >
> > > > > > > I'm not sure how many connections are handled by a single nginx
> > > > > > > worker. If you accept 50k connections and only allow 30k file
> > > > > > > handles, you may have a problem if that's all being done by a
> > > > > > > single worker.
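Whether a limit like worker_rlimit_nofile actually binds can be checked on the running processes: the limit that applies is the process's own, readable from /proc, not the login shell's ulimit. A bash sketch (the pidfile paths are assumptions; adjust to the actual setup):

```shell
# fd_headroom <pid>: show how many FDs the process currently has open
# versus its soft/hard "open files" rlimit, both read from /proc --
# the limit that matters is the running process's own, not the current
# shell's `ulimit -n`.
fd_headroom() {
  printf 'open: %s\n' "$(ls "/proc/$1/fd" 2>/dev/null | wc -l)"
  grep 'Max open files' "/proc/$1/limits"
}

# For the nginx worker and Tomcat processes (run as root to read
# another user's /proc entries; pidfile paths are assumptions):
# fd_headroom "$(cat /run/nginx.pid)"
# fd_headroom "$(cat /var/run/tomcat8.pid)"
```

If `open` approaches the soft limit under load, accept() and connect() start failing or stalling before any error appears in the application logs.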
> > > > > > > > What would be the ideal config for tomcat and Nginx so this
> > > > > > > > setup on c4.8xlarge vm could serve at least 5k or 10k requests
> > > > > > > > simultaneously without causing the open files to spike to 10K.
> > > > > > >
> > > > > > > You will never be able to serve 10k simultaneous requests without
> > > > > > > having 10k open files on the server. If you mean 10k requests
> > > > > > > across the whole 4-node environment, then I'd expect 10k requests
> > > > > > > to open (roughly) 2500 open files on each server. And of course,
> > > > > > > you need all kinds of other files open as well, from JAR files to
> > > > > > > DB connections or other network connections.
> > > > > > >
> > > > > > > But each connection needs a file descriptor, full stop. If you
> > > > > > > need to handle 10k connections, then you will need to make it
> > > > > > > possible to open 10k file handles /just for incoming network
> > > > > > > connections/ for that process. There is no way around it.
> > > > > > >
> > > > > > > Are you trying to hit a performance target or are you actively
> > > > > > > getting errors with a particular configuration? Your subject says
> > > > > > > "Connection Timed Out". Is it nginx that is reporting the
> > > > > > > connection timeout? Have you checked on the Tomcat side what is
> > > > > > > happening with those requests?
> > > > > > >
> > > > > > > -chris
> > > > > > >
> > > > > > > > On Thu, Oct 29, 2020 at 10:29 PM Christopher Schultz <ch...@christopherschultz.net> wrote:
> > > > > > > >
> > > > > > > >> Ayub,
> > > > > > > >>
> > > > > > > >> On 10/28/20 23:28, Ayub Khan wrote:
> > > > > > > >>> During high load of 16k requests per minute, we notice below
> > > > > > > >>> error in log.
> > > > > > > >>> [error] 2437#2437: *13335389 upstream timed out (110: Connection timed out) while reading response header from upstream, server: jahez.net, request: "GET /serviceContext/ServiceName?callback= HTTP/1.1", upstream: "http://127.0.0.1:8080/serviceContext/ServiceName
> > > > > > > >>>
> > > > > > > >>> Below is the flow of requests:
> > > > > > > >>>
> > > > > > > >>> cloudflare-->AWS ALB--> NGINX--> Tomcat-->Elastic-search
> > > > > > > >>
> > > > > > > >> I'm curious about why you are using all of cloudflare and ALB
> > > > > > > >> and nginx. Seems like any one of those could provide what you
> > > > > > > >> are getting from all 3 of them.
> > > > > > > >>
> > > > > > > >>> In NGINX we have the below config
> > > > > > > >>>
> > > > > > > >>> location /serviceContext/ServiceName {
> > > > > > > >>>     proxy_pass http://localhost:8080/serviceContext/ServiceName;
> > > > > > > >>>     proxy_http_version 1.1;
> > > > > > > >>>     proxy_set_header Connection $connection_upgrade;
> > > > > > > >>>     proxy_set_header Upgrade $http_upgrade;
> > > > > > > >>>     proxy_set_header Host $host;
> > > > > > > >>>     proxy_set_header X-Real-IP $remote_addr;
> > > > > > > >>>     proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
> > > > > > > >>>
> > > > > > > >>>     proxy_buffers 16 16k;
> > > > > > > >>>     proxy_buffer_size 32k;
> > > > > > > >>> }
> > > > > > > >>
> > > > > > > >> What is the maximum number of simultaneous requests that one
> > > > > > > >> nginx instance will accept? What is the maximum number of
> > > > > > > >> simultaneous proxied requests one nginx instance will make to a
> > > > > > > >> back-end Tomcat node? How many nginx nodes do you have?
> > > > > > > >> How many Tomcat nodes?
> > > > > > > >>
> > > > > > > >>> below is tomcat connector config
> > > > > > > >>>
> > > > > > > >>> <Connector port="8080"
> > > > > > > >>>     protocol="org.apache.coyote.http11.Http11NioProtocol"
> > > > > > > >>>     connectionTimeout="200" maxThreads="50000"
> > > > > > > >>>     URIEncoding="UTF-8"
> > > > > > > >>>     redirectPort="8443" />
> > > > > > > >>
> > > > > > > >> 50,000 threads is a LOT of threads.
> > > > > > > >>
> > > > > > > >>> We monitor the open files using *watch "sudo ls /proc/`cat
> > > > > > > >>> /var/run/tomcat8.pid`/fd/ | wc -l"* the number of tomcat open
> > > > > > > >>> files keeps increasing, slowing the responses. The only option
> > > > > > > >>> to recover from this is to restart tomcat.
> > > > > > > >>
> > > > > > > >> So this looks like Linux (/proc filesystem). Linux kernels have
> > > > > > > >> a 16-bit pid space which means a theoretical max pid of 65535.
> > > > > > > >> In practice, the max pid is actually to be found here:
> > > > > > > >>
> > > > > > > >> $ cat /proc/sys/kernel/pid_max
> > > > > > > >> 32768
> > > > > > > >>
> > > > > > > >> (on my Debian Linux system, 4.9.0-era kernel)
> > > > > > > >>
> > > > > > > >> Each thread takes a pid. 50k threads means more than the maximum
> > > > > > > >> allowed on the OS. So you will eventually hit some kind of
> > > > > > > >> serious problem with that many threads.
> > > > > > > >>
> > > > > > > >> How many fds do you get in the process before Tomcat grinds to a
> > > > > > > >> halt? What does the CPU usage look like? The process I/O? Disk
> > > > > > > >> usage? What does a thread dump look like (if you have the disk
> > > > > > > >> space to dump it!)?
> > > > > > > >>
> > > > > > > >> Why do you need that many threads?
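The thread-versus-pid arithmetic above can be checked on a live process: every Java thread is a Linux task and consumes a pid, and ps exposes the task count as nlwp. A bash sketch (the tomcat8 pidfile path is the one used in this thread):

```shell
# thread_headroom <pid>: print the process's live thread count (ps's
# nlwp, "number of lightweight processes") next to the kernel-wide pid
# ceiling, since every thread consumes a pid from that pool.
thread_headroom() {
  printf 'threads=%s pid_max=%s\n' \
    "$(ps -o nlwp= -p "$1" | tr -d ' ')" \
    "$(cat /proc/sys/kernel/pid_max)"
}

# For the Tomcat process from this thread:
# thread_headroom "$(cat /var/run/tomcat8.pid)"
```

With maxThreads="50000" and pid_max at 131072 the JVM alone could claim well over a third of the machine's task slots, which is one concrete way to see why that setting is oversized.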
> > > > > > > >> -chris
> > > > > > > >>
> > > > > > > >> ---------------------------------------------------------------------
> > > > > > > >> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> > > > > > > >> For additional commands, e-mail: users-h...@tomcat.apache.org
--
--------------------------------------------------------------------
Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
----------------------------------------------------------------------
It is proved that Hard Work and knowledge will get you close but attitude
will get you there. However, it's the Love of God that will put you over the top!!