Ayub,

On 11/3/20 10:56, Ayub Khan wrote:
*I'm curious about why you are using all of cloudflare and ALB and
nginx. Seems like any one of those could provide what you are getting from
all 3 of them. *

Cloudflare is doing just the DNS and nginx is doing SSL termination.

What do you mean "Cloudflare is doing just the DNS?"

So what is ALB doing, then?

*What is the maximum number of simultaneous requests that one nginx instance
will accept? What is the maximum number of simultaneous proxied requests one
nginx instance will make to a back-end Tomcat node? How many nginx nodes do
you have? How many Tomcat nodes? *

We have 4 VMs, each running nginx and Tomcat, with nginx in front of
Tomcat to proxy the requests. So it's one nginx proxying to a dedicated
Tomcat on the same VM.

Okay.

Below is the Tomcat connector configuration:

<Connector port="8080"
                connectionTimeout="60000" maxThreads="2000"
                protocol="org.apache.coyote.http11.Http11NioProtocol"
                URIEncoding="UTF-8"
                redirectPort="8443" />

60 seconds is a *long* time for a connection timeout.

Do you actually need 2000 threads? That's a lot, though not insane. 2000 threads means you expect to handle 2000 concurrent (non-async, non-WebSocket) requests. Do you need that (per node)? Are you expecting 8000 concurrent requests across the four nodes? Does your load-balancer understand the topology and current load on any given node?

When I am doing a load test with 2000 concurrent users, I see the open
files increase to 10,320, and when I take a thread dump I see the
threads are in a waiting state. Slowly, as the requests complete, I see
the open files come down to normal levels.

Are you performing your load-test against the CF/ALB/nginx/Tomcat stack, or just hitting Tomcat (or nginx) directly?

Are you using HTTP keepalive in your load-test (from the client to whichever server is being contacted)?
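
Relatedly, on the nginx-to-Tomcat side: unless you configure an upstream block with keepalive, nginx opens a fresh local connection for every proxied request, and each of those is another open file on both processes. A hedged sketch of connection reuse to the local Tomcat (the upstream name is made up):

upstream local_tomcat {
    server 127.0.0.1:8080;
    keepalive 64;                     # idle connections kept open for reuse
}

# and in the location block:
#     proxy_pass          http://local_tomcat;
#     proxy_http_version  1.1;
#     proxy_set_header    Connection "";   # needed for upstream keepalive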

The output of the command below is:
sudo cat /proc/sys/kernel/pid_max
131072

I am testing this on a c4.8xlarge VM in AWS.

Below is the config I changed in the nginx.conf file:

events {
         worker_connections 50000;
         # multi_accept on;
}

This will allow 50k incoming connections, and Tomcat will accept an unbounded number of connections (for the NIO connector). So limiting your threads to 2000 only means that the work of those requests gets done in groups of 2000 at a time.
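
If you would rather have Tomcat enforce a ceiling itself, the NIO connector also takes maxConnections and acceptCount. A rough sketch; the numbers below are placeholders, not recommendations:

<!-- maxConnections caps accepted (including keep-alive) connections;
     acceptCount is the OS accept backlog once that cap is reached. -->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000"
           maxThreads="2000"
           maxConnections="10000"
           acceptCount="100"
           URIEncoding="UTF-8"
           redirectPort="8443" />

Once maxConnections is reached, further connections wait in the accept backlog instead of being handed to the JVM, which keeps the fd count in the Tomcat process bounded.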

worker_rlimit_nofile 30000;

I'm not sure how many connections are handled by a single nginx worker. If you accept 50k connections and only allow 30k file handles, you may have a problem if that's all being done by a single worker.
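
As I read the nginx docs, worker_connections is a per-worker limit, and each proxied request holds two descriptors (one to the client, one to the upstream), so worker_rlimit_nofile should sit comfortably above worker_connections. A sketch with placeholder numbers:

worker_processes auto;

# each proxied request uses ~2 fds (client side + upstream side),
# so keep the per-worker fd limit well above worker_connections
worker_rlimit_nofile 100000;

events {
    worker_connections 50000;
}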

What would be the ideal config for Tomcat and nginx so this setup on a
c4.8xlarge VM could serve at least 5k or 10k requests simultaneously
without causing the open files to spike to 10k?

You will never be able to serve 10k simultaneous requests without having 10k open files on the server. If you mean 10k requests across the whole 4-node environment, then I'd expect 10k requests to open (roughly) 2500 open files on each server. And of course, you need all kinds of other files open as well, from JAR files to DB connections or other network connections.

But each connection needs a file descriptor, full stop. If you need to handle 10k connections, then you will need to make it possible to open 10k file handles /just for incoming network connections/ for that process. There is no way around it.
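
If that limit is what is pinching, it has to be raised for the Tomcat process itself. Assuming Tomcat runs as a systemd service on these VMs (an assumption on my part; the unit name below is a guess), something like:

# what the running process is actually allowed right now
grep "open files" /proc/$(cat /var/run/tomcat8.pid)/limits

# drop-in at /etc/systemd/system/tomcat8.service.d/limits.conf
[Service]
LimitNOFILE=65536

# then: systemctl daemon-reload && systemctl restart tomcat8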

Are you trying to hit a performance target or are you actively getting errors with a particular configuration? Your subject says "Connection Timed Out". Is it nginx that is reporting the connection timeout? Have you checked on the Tomcat side what is happening with those requests?
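
For what it's worth, "upstream timed out (110: Connection timed out) while reading response header from upstream" in your earlier log is usually nginx's proxy_read_timeout (60s by default) expiring while Tomcat is still working on the request, rather than a refused connection. You can make those timeouts explicit in the location block to rule that in or out; the values here are placeholders:

# inside the existing location /serviceContext/ServiceName block
proxy_connect_timeout  5s;    # establishing the TCP connection to Tomcat
proxy_send_timeout     60s;   # sending the request to Tomcat
proxy_read_timeout     60s;   # waiting between reads of Tomcat's response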

-chris

On Thu, Oct 29, 2020 at 10:29 PM Christopher Schultz <
ch...@christopherschultz.net> wrote:

Ayub,

On 10/28/20 23:28, Ayub Khan wrote:
During high load of 16k requests per minute, we notice the below error
in the log.

   [error] 2437#2437: *13335389 upstream timed out (110: Connection timed
out) while reading response header from upstream, server: jahez.net,
request: "GET /serviceContext/ServiceName?callback= HTTP/1.1",
upstream: "http://127.0.0.1:8080/serviceContext/ServiceName"

Below is the flow of requests:

Cloudflare --> AWS ALB --> nginx --> Tomcat --> Elasticsearch

I'm curious about why you are using all of cloudflare and ALB and nginx.
Seems like any one of those could provide what you are getting from all
3 of them.

In NGINX we have the below config

location /serviceContext/ServiceName {

    proxy_pass          http://localhost:8080/serviceContext/ServiceName;
    proxy_http_version  1.1;
    proxy_set_header    Connection       $connection_upgrade;
    proxy_set_header    Upgrade          $http_upgrade;
    proxy_set_header    Host             $host;
    proxy_set_header    X-Real-IP        $remote_addr;
    proxy_set_header    X-Forwarded-For  $proxy_add_x_forwarded_for;

    proxy_buffers       16 16k;
    proxy_buffer_size   32k;
}

What is the maximum number of simultaneous requests that one nginx
instance will accept? What is the maximum number of simultaneous proxied
requests one nginx instance will make to a back-end Tomcat node? How
many nginx nodes do you have? How many Tomcat nodes?

Below is the Tomcat connector config:

<Connector port="8080"
                 protocol="org.apache.coyote.http11.Http11NioProtocol"
                 connectionTimeout="200" maxThreads="50000"
                 URIEncoding="UTF-8"
                 redirectPort="8443" />

50,000 threads is a LOT of threads.

We monitor the open files using *watch "sudo ls /proc/`cat
/var/run/tomcat8.pid`/fd/ | wc -l"*. The number of Tomcat open files
keeps increasing, slowing the responses. The only option to recover
from this is to restart Tomcat.

So this looks like Linux (/proc filesystem). Linux kernels have a 16-bit
pid space which means a theoretical max pid of 65535. In practice, the
max pid is actually to be found here:

$ cat /proc/sys/kernel/pid_max
32768

(on my Debian Linux system, 4.9.0-era kernel)

Each thread takes a pid. 50k threads means more than the maximum allowed
on the OS. So you will eventually hit some kind of serious problem with
that many threads.
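
If you want to see how close you actually get, something like this (reusing the pid-file path from your watch command above) shows the live thread count against the kernel ceilings:

# threads currently owned by the Tomcat JVM
ls /proc/$(cat /var/run/tomcat8.pid)/task | wc -l

# kernel ceilings
cat /proc/sys/kernel/pid_max
cat /proc/sys/kernel/threads-max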

How many fds do you get in the process before Tomcat grinds to a halt?
What does the CPU usage look like? The process I/O? Disk usage? What
does a thread dump look like (if you have the disk space to dump it!)?

Why do you need that many threads?

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



