Martin,

Could you give me a command you would like me to run? I will send you the
results, which might help you debug this issue.


On Thu, Nov 12, 2020 at 1:36 PM Martin Grigorov <mgrigo...@apache.org>
wrote:

> On Thu, Nov 12, 2020 at 10:37 AM Ayub Khan <ayub...@gmail.com> wrote:
>
> > Martin,
> >
> > These are file descriptors: some are related to the JAR files included
> > in the web application, some to the sockets from nginx to Tomcat, and
> > some to database connections. I use the command below to count the open
> > file descriptors:
> >
>
> Which type of connections increases?
> The sockets? The DB ones?
>
>
> >
> > watch "sudo ls /proc/`cat /var/run/tomcat8.pid`/fd/ | wc -l"
> >
>
> You can also use the lsof command.
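>
> For illustration (reusing the pid-file path from your watch command;
> adjust as needed), the descriptors could be broken down by type with:
>
>   sudo lsof -p "$(cat /var/run/tomcat8.pid)" | awk '{print $5}' | sort | uniq -c | sort -rn
>
> The TYPE column (REG for regular files, IPv4/IPv6 for sockets, etc.)
> should show whether the growth comes from sockets or from regular files.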
>
>
> >
> >
> >
> > On Thu, Nov 12, 2020 at 10:56 AM Martin Grigorov <mgrigo...@apache.org>
> > wrote:
> >
> > > On Wed, Nov 11, 2020 at 11:17 PM Ayub Khan <ayub...@gmail.com> wrote:
> > >
> > > > Chris,
> > > >
> > > > I was load testing using the EC2 load balancer DNS. I have increased
> > > > the connector timeout to 6000 and also gave 32 GB to the JVM of
> > > > Tomcat. I am not seeing connection timeouts in the nginx logs now, no
> > > > errors in kernel.log, and no errors in Tomcat's catalina.out.
> > > > During regular operation, when the request count is between 4k and 6k
> > > > requests per minute, the open file count for the Tomcat process stays
> > > > between 200 and 350, and responses from Tomcat arrive within 5
> > > > seconds.
> > > > If the request count goes beyond 6.5k, the open file count slowly
> > > > climbs to 2300-3000 and responses from Tomcat become slow.
> > > >
> > > > I am not concerned about the high open file count as such, since I do
> > > > not see any errors related to open files. The only side effect of the
> > > > open file count going above 700 is that responses from Tomcat become
> > > > slow. I checked whether this is caused by Elasticsearch; AWS
> > > > CloudWatch shows the Elasticsearch response time is within 5
> > > > milliseconds.
> > > >
> > > > What might be the reason that when the open file count goes beyond
> > > > 600 it slows down Tomcat's response time? I tried with Tomcat 9 and
> > > > it's the same behavior.
> > > >
> > >
> > > Do you know what kind of files are being opened?
> > >
> > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Nov 3, 2020 at 9:40 PM Christopher Schultz <
> > > > ch...@christopherschultz.net> wrote:
> > > >
> > > > > Ayub,
> > > > >
> > > > > On 11/3/20 10:56, Ayub Khan wrote:
> > > > > > *I'm curious about why you are using all of Cloudflare and ALB
> > > > > > and nginx. Seems like any one of those could provide what you are
> > > > > > getting from all 3 of them.*
> > > > > >
> > > > > > Cloudflare is doing just the DNS and nginx is doing SSL
> > > > > > termination.
> > > > >
> > > > > What do you mean "Cloudflare is doing just the DNS?"
> > > > >
> > > > > So what is ALB doing, then?
> > > > >
> > > > > > *What is the maximum number of simultaneous requests that one
> > > > > > nginx instance will accept? What is the maximum number of
> > > > > > simultaneous proxied requests one nginx instance will make to a
> > > > > > back-end Tomcat node? How many nginx nodes do you have? How many
> > > > > > Tomcat nodes?*
> > > > > >
> > > > > > We have 4 VMs, each running nginx and Tomcat, and each Tomcat has
> > > > > > nginx in front of it to proxy the requests. So it's one nginx
> > > > > > proxying to a dedicated Tomcat on the same VM.
> > > > >
> > > > > Okay.
> > > > >
> > > > > > below is the tomcat connector configuration
> > > > > >
> > > > > > <Connector port="8080"
> > > > > >                 connectionTimeout="60000" maxThreads="2000"
> > > > > >                 protocol="org.apache.coyote.http11.Http11NioProtocol"
> > > > > >                 URIEncoding="UTF-8"
> > > > > >                 redirectPort="8443" />
> > > > >
> > > > > 60 seconds is a *long* time for a connection timeout.
> > > > >
> > > > > Do you actually need 2000 threads? That's a lot, though not insane.
> > > > > 2000 threads means you expect to handle 2000 concurrent (non-async,
> > > > > non-WebSocket) requests. Do you need that (per node)? Are you
> > > > > expecting 8000 concurrent requests? Does your load-balancer
> > > > > understand the topography and current load on any given node?
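> > > > >
> > > > > As a rough illustration only (the numbers below are assumptions,
> > > > > not a recommendation for your workload), a more conservative
> > > > > connector might look like:
> > > > >
> > > > >   <Connector port="8080"
> > > > >              protocol="org.apache.coyote.http11.Http11NioProtocol"
> > > > >              connectionTimeout="20000"
> > > > >              maxThreads="400"
> > > > >              maxConnections="10000"
> > > > >              acceptCount="200"
> > > > >              URIEncoding="UTF-8"
> > > > >              redirectPort="8443" />
> > > > >
> > > > > Here maxConnections caps how many sockets the NIO connector keeps
> > > > > open at once, and acceptCount bounds the OS accept queue once that
> > > > > cap is reached.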
> > > > >
> > > > > > When I am doing a load test with 2000 concurrent users I see the
> > > > > > open files increase to 10,320, and when I take a thread dump I see
> > > > > > the threads are in a waiting state. Slowly, as the requests
> > > > > > complete, the open file count comes back down to normal levels.
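> > > > > >
> > > > > > (For reference, such a dump can be captured with jstack <pid>, or
> > > > > > with kill -3 <pid>, which writes it to catalina.out.)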
> > > > >
> > > > > Are you performing your load-test against the CF/ALB/nginx/Tomcat
> > > > > stack, or just hitting Tomcat (or nginx) directly?
> > > > >
> > > > > Are you using HTTP keepalive in your load-test (from the client to
> > > > > whichever server is being contacted)?
> > > > >
> > > > > > The output of the below command is
> > > > > > sudo cat /proc/sys/kernel/pid_max
> > > > > > 131072
> > > > > >
> > > > > > I am testing this on a c4.8xlarge VM in AWS.
> > > > > >
> > > > > > below is the config I changed in nginx.conf file
> > > > > >
> > > > > > events {
> > > > > >          worker_connections 50000;
> > > > > >          # multi_accept on;
> > > > > > }
> > > > >
> > > > > This will allow 50k incoming connections, and Tomcat will accept an
> > > > > unbounded number of connections (for the NIO connector). So limiting
> > > > > your threads to 2000 only means that the work of each request will
> > > > > be done in groups of 2000.
> > > > >
> > > > > > worker_rlimit_nofile 30000;
> > > > >
> > > > > I'm not sure how many connections are handled by a single nginx
> > > > > worker. If you accept 50k connections and only allow 30k file
> > > > > handles, you may have a problem if that's all being done by a single
> > > > > worker.
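> > > > >
> > > > > For example (the values here are illustrative assumptions, not a
> > > > > recommendation), keeping those limits consistent might look like:
> > > > >
> > > > >   worker_processes auto;
> > > > >   worker_rlimit_nofile 65535;
> > > > >
> > > > >   events {
> > > > >       worker_connections 10000;
> > > > >   }
> > > > >
> > > > > worker_rlimit_nofile applies per worker process, and each proxied
> > > > > request uses at least two descriptors (the client socket plus the
> > > > > upstream socket to Tomcat), so the file-handle limit should
> > > > > comfortably exceed twice worker_connections.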
> > > > >
> > > > > > What would be the ideal config for Tomcat and nginx so this setup
> > > > > > on a c4.8xlarge VM could serve at least 5k or 10k requests
> > > > > > simultaneously without causing the open files to spike to 10k?
> > > > >
> > > > > You will never be able to serve 10k simultaneous requests without
> > > > > having 10k open files on the server. If you mean 10k requests across
> > > > > the whole 4-node environment, then I'd expect 10k requests to open
> > > > > (roughly) 2500 files on each server. And of course, you need all
> > > > > kinds of other files open as well, from JAR files to DB connections
> > > > > or other network connections.
> > > > >
> > > > > But each connection needs a file descriptor, full stop. If you need
> > > > > to handle 10k connections, then you will need to make it possible to
> > > > > open 10k file handles /just for incoming network connections/ for
> > > > > that process. There is no way around it.
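> > > > >
> > > > > A quick way to check what the running process is actually allowed
> > > > > (pid-file path assumed from your earlier watch command):
> > > > >
> > > > >   grep "Max open files" /proc/$(cat /var/run/tomcat8.pid)/limits
> > > > >
> > > > > If Tomcat is started by systemd, that limit is typically raised with
> > > > > LimitNOFILE= in the service unit; with a classic init script, via
> > > > > ulimit -n or /etc/security/limits.conf.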
> > > > >
> > > > > Are you trying to hit a performance target or are you actively
> > > > > getting errors with a particular configuration? Your subject says
> > > > > "Connection Timed Out". Is it nginx that is reporting the connection
> > > > > timeout? Have you checked on the Tomcat side what is happening with
> > > > > those requests?
> > > > >
> > > > > -chris
> > > > >
> > > > > > On Thu, Oct 29, 2020 at 10:29 PM Christopher Schultz <
> > > > > > ch...@christopherschultz.net> wrote:
> > > > > >
> > > > > >> Ayub,
> > > > > >>
> > > > > >> On 10/28/20 23:28, Ayub Khan wrote:
> > > > > >>> During high load of 16k requests per minute, we notice the below
> > > > > >>> error in the log.
> > > > > >>>
> > > > > >>>   [error] 2437#2437: *13335389 upstream timed out (110:
> > > > > >>>   Connection timed out) while reading response header from
> > > > > >>>   upstream, server: jahez.net, request:
> > > > > >>>   "GET /serviceContext/ServiceName?callback= HTTP/1.1",
> > > > > >>>   upstream: "http://127.0.0.1:8080/serviceContext/ServiceName"
> > > > > >>>
> > > > > >>> Below is the flow of requests:
> > > > > >>>
> > > > > >>> cloudflare-->AWS ALB--> NGINX--> Tomcat-->Elastic-search
> > > > > >>
> > > > > >> I'm curious about why you are using all of Cloudflare and ALB and
> > > > > >> nginx. Seems like any one of those could provide what you are
> > > > > >> getting from all 3 of them.
> > > > > >>
> > > > > >>> In NGINX we have the below config
> > > > > >>>
> > > > > >>> location /serviceContext/ServiceName {
> > > > > >>>
> > > > > >>>     proxy_pass          http://localhost:8080/serviceContext/ServiceName;
> > > > > >>>     proxy_http_version  1.1;
> > > > > >>>     proxy_set_header    Connection       $connection_upgrade;
> > > > > >>>     proxy_set_header    Upgrade          $http_upgrade;
> > > > > >>>     proxy_set_header    Host             $host;
> > > > > >>>     proxy_set_header    X-Real-IP        $remote_addr;
> > > > > >>>     proxy_set_header    X-Forwarded-For  $proxy_add_x_forwarded_for;
> > > > > >>>
> > > > > >>>     proxy_buffers     16 16k;
> > > > > >>>     proxy_buffer_size 32k;
> > > > > >>> }
> > > > > >>
> > > > > >> What is the maximum number of simultaneous requests that one
> > > > > >> nginx instance will accept? What is the maximum number of
> > > > > >> simultaneous proxied requests one nginx instance will make to a
> > > > > >> back-end Tomcat node? How many nginx nodes do you have? How many
> > > > > >> Tomcat nodes?
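> > > > > >>
> > > > > >> Purely as an illustration (the values below are assumptions), the
> > > > > >> timeouts that govern that "upstream timed out ... while reading
> > > > > >> response header" message are usually set next to the proxy_pass,
> > > > > >> e.g.:
> > > > > >>
> > > > > >>   proxy_connect_timeout 5s;
> > > > > >>   proxy_read_timeout    60s;
> > > > > >>
> > > > > >> proxy_read_timeout defaults to 60s, so that error means Tomcat
> > > > > >> took longer than that to start responding.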
> > > > > >>
> > > > > >>> below is tomcat connector config
> > > > > >>>
> > > > > >>> <Connector port="8080"
> > > > > >>>                  protocol="org.apache.coyote.http11.Http11NioProtocol"
> > > > > >>>                  connectionTimeout="200" maxThreads="50000"
> > > > > >>>                  URIEncoding="UTF-8"
> > > > > >>>                  redirectPort="8443" />
> > > > > >>
> > > > > >> 50,000 threads is a LOT of threads.
> > > > > >>
> > > > > >>> We monitor the open files using *watch "sudo ls /proc/`cat
> > > > > >>> /var/run/tomcat8.pid`/fd/ | wc -l"*. The number of Tomcat open
> > > > > >>> files keeps increasing, slowing the responses. The only option to
> > > > > >>> recover from this is to restart Tomcat.
> > > > > >>
> > > > > >> So this looks like Linux (/proc filesystem). Linux kernels have a
> > > > > >> 16-bit pid space, which means a theoretical max pid of 65535. In
> > > > > >> practice, the max pid is actually to be found here:
> > > > > >>
> > > > > >> $ cat /proc/sys/kernel/pid_max
> > > > > >> 32768
> > > > > >>
> > > > > >> (on my Debian Linux system, 4.9.0-era kernel)
> > > > > >>
> > > > > >> Each thread takes a pid. 50k threads means more than the maximum
> > > > > >> allowed on the OS. So you will eventually hit some kind of serious
> > > > > >> problem with that many threads.
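> > > > > >>
> > > > > >> For what it's worth (pid-file path assumed from the watch command
> > > > > >> above), the live thread count can be compared against those
> > > > > >> ceilings with:
> > > > > >>
> > > > > >>   grep Threads /proc/$(cat /var/run/tomcat8.pid)/status
> > > > > >>   cat /proc/sys/kernel/pid_max /proc/sys/kernel/threads-max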
> > > > > >>
> > > > > >> How many fds do you get in the process before Tomcat grinds to a
> > > > > >> halt? What does the CPU usage look like? The process I/O? Disk
> > > > > >> usage? What does a thread dump look like (if you have the disk
> > > > > >> space to dump it!)?
> > > > > >>
> > > > > >> Why do you need that many threads?
> > > > > >>
> > > > > >> -chris
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> >
> >
>


-- 
--------------------------------------------------------------------
Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
----------------------------------------------------------------------
It is proved that Hard Work and knowledge will get you close but attitude
will get you there. However, it's the Love
of God that will put you over the top!!
