Hi Clovis,

Thanks for your answers.

The open files limit on our production servers is 1024.
Here is the complete output of ulimit -a:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
max nice                        (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 36352
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
max rt priority                 (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 36352
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Our production servers are connected with gigabit Ethernet. However, the servers used to reproduce the problem only have 100 Mbps Ethernet. The problem occurs in both the test and production environments.

Our TCP keepalive settings are (read as shown below):
tcp_keepalive_time = 7200
tcp_keepalive_intvl = 75
tcp_keepalive_probes = 9
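
They were read straight from /proc (the paths you mention below); a quick way to dump all three at once:

for f in tcp_keepalive_time tcp_keepalive_intvl tcp_keepalive_probes; do
    echo -n "$f = "; cat /proc/sys/net/ipv4/$f
done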

I am monitoring thread call stacks using JProfiler, which displays the stack trace shown in my initial mail.

I've tried Filip Hanik's suggestion (maxKeepAliveRequests="1" on my Tomcat connector), but I was still able to reproduce the problem.

Then I switched to the NIO connector (I was previously using the default blocking HTTP/1.1 connector). I was not able to reproduce the problem after hours of testing (it usually happens after 10-20 minutes of heavy load), so I pushed the configuration change to one of our 4 production servers to monitor how well it holds up.
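
For anyone who wants to try the same change: it boils down to pointing the Connector's protocol attribute at the NIO handler in conf/server.xml. Assuming your Connector element literally declares protocol="HTTP/1.1" (and with CATALINA_HOME adjusted to your install), something like this does it:

cp $CATALINA_HOME/conf/server.xml $CATALINA_HOME/conf/server.xml.bak
sed -i 's|protocol="HTTP/1.1"|protocol="org.apache.coyote.http11.Http11NioProtocol"|' $CATALINA_HOME/conf/server.xml

Then restart Tomcat so the new connector is picked up.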

So far, after 5 days in production, we haven't had any problem on the server running the NIO connector...

Christophe.




----- Original Message ----- From: "Clovis Wichoski" <[EMAIL PROTECTED]>
To: "Tomcat Users List" <users@tomcat.apache.org>
Sent: Friday, July 04, 2008 4:17 AM
Subject: Re: Tomcat bottleneck on InternalInputBuffer.parseRequestLine


Hi Christophe,

Well, I still haven't found the root cause of my problem, but a few things have helped me keep it from occurring so frequently.

I checked the open files limit on Linux; you can check yours with ulimit -a. Here I raised it to 4096.
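
For example, to raise it for the user that runs Tomcat (the "tomcat" user name below is just an assumption; use whatever account you start Tomcat with), add to /etc/security/limits.conf (takes effect on the next login session):

tomcat soft nofile 4096
tomcat hard nofile 4096

or, just for the current shell before starting Tomcat:

ulimit -n 4096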

How are the machines connected? Is it gigabit Ethernet?

Please show us all your configuration under /proc/sys/net/ipv4/; the most
important is:

cat /proc/sys/net/ipv4/tcp_keepalive_time

But what had the biggest impact on performance was getting the JDBC driver configuration right in the pool. I use MaxDB, and the driver has a problem: when opening new physical connections it goes through a singleton, so we can't get connections truly in parallel (really parallel, on multi-core processors), and when such parallel connection attempts happen we get stuck threads. Note, though, that maybe the problem isn't the driver; it's just a suspect, since I don't have a solution for this yet.
Another suspect is that, for some strange reason, the socket still exists on the Java side while the socket on Linux (the inode) no longer exists, and until Java learns this the system is stuck, until the timeout. I can't figure out how to simulate this, since it's really a rare case; it happened to me only once and I have no way to prove it. I'm trying to investigate my problem using the following script:

#!/bin/bash
# capture a point-in-time snapshot of the Tomcat JVM: Java and native stacks,
# open files, and the process file descriptor table
today=`date +%Y%m%d%H%M%S`
psId=`/opt/java/jdk1.6.0_06/bin/jps | grep Bootstrap | cut -d' ' -f1`
/opt/java/jdk1.6.0_06/bin/jstack -l $psId > /mnt/logs/stack/stack${today}.txt
echo "--- pstack ---" >> /mnt/logs/stack/stack${today}.txt
pstack $psId >> /mnt/logs/stack/stack${today}.txt
echo "--- lsof ---" >> /mnt/logs/stack/stack${today}.txt
lsof >> /mnt/logs/stack/stack${today}.txt
echo "--- ls -l /proc/${psId}/fd/ ---" >> /mnt/logs/stack/stack${today}.txt
ls -l /proc/${psId}/fd/ >> /mnt/logs/stack/stack${today}.txt
echo "stack of process $psId saved to /mnt/logs/stack/stack${today}.txt"
When users report a hang, I run this script manually, ten times, then compare the outputs. IBM has a tool that makes the jstack output easier to read; I don't remember the name right now, but tomorrow I will post the link here.
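
A small wrapper can take the ten snapshots automatically; the script path and the 30-second interval here are just examples:

for i in $(seq 1 10); do
    /root/dump-stack.sh    # the snapshot script above, saved wherever you keep it
    sleep 30
done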

Let's see if we can share knowledge to win this fight ;)

regards

Clóvis

On Tue, Jul 1, 2008 at 12:23 PM, Christophe Fondacci <[EMAIL PROTECTED]> wrote:

Hello all,

We have a problem with Tomcat on our production server.
This problem may be related to the one described here:
http://grokbase.com/profile/id:hNxqA0ZEdnD-6GYFRNs-iIkKEvF907FNWdczKYQ719Q

Here it is:
- We have 2 Tomcat servers on 2 distinct machines.
- One server runs our application (let's call it A for Application server).
- The other server hosts Solr (let's call it S for Solr server).
- Both servers are Tomcat 6.0.14 running on JDK 1.6.0_02-b05 on Linux (Fedora Core 6).
- Server A performs HTTP requests to server S in 2 ways (see the curl sketch after this list):
   > An HTTP GET (using Apache Commons HttpClient) with a URL like
http://S:8080/solr/select/?q=cityuri%3AXEABDBFDDACCXmaidenheadXEABDBFDDACCX&facet=true&facet.field=price&fl=id&facet.sort=false&facet.mincount=1&facet.limit=-1
   > An HTTP POST to http://S:8080/solr/select/ with a set of 12 NameValuePairs
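
For reference, the two calls can be reproduced from a shell roughly like this (the POST parameter names below are placeholders, not our real ones; only the count of 12 name/value pairs matters):

curl -s 'http://S:8080/solr/select/?q=cityuri%3AXEABDBFDDACCXmaidenheadXEABDBFDDACCX&facet=true&facet.field=price&fl=id&facet.sort=false&facet.mincount=1&facet.limit=-1'
curl -s 'http://S:8080/solr/select/' --data 'param1=value1&param2=value2'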

When traffic is light on our server A, everything works great.
When traffic is high on our server A (simulation of 40 simultaneous users
with JMeter), some requests to our server S take more than 200 seconds. It
happens randomly and we couldn't isolate a URL pattern: a URL can return
in less than 500ms and the exact same URL can take 300s before returning...

We performed deep JVM analysis (using JProfiler) to observe what was going
on on the Solr server. When the problem occurs, we can see threads that are
stuck with the following call stack:

at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:700)
at org.apache.coyote.http11.InternalInputBuffer.parseRequestLine(InternalInputBuffer.java:366)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:805)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)

Requests that return in 200s+ seem to spend almost all their time reading
this input stream...
The Javadoc says parseRequestLine is used to parse the HTTP request line. As I
stated above, our URLs are quite small, so I can't understand why this happens.
The response from server S is very small as well.

We are able to reproduce the problem with fewer than 40 threads, but it is
more difficult to reproduce.
As I said at the beginning, I found a user who had a similar problem,
but that mailing list thread does not give any solution...

Does anyone have an idea of what is going on? Are there settings we can use to
avoid this problem?
I am out of ideas on what to try to fix this...

Any help would be highly appreciated...thank you very much.
Christophe.


---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



