Re: Deadlock situation detected/avoided with jk_log_lock

Rainer Jung Sat, 07 Feb 2009 09:10:27 -0800

On 06.02.2009 20:40, fredk2 wrote:

Do I understand you correctly that when Mr. Orton said to never use pthread
nor posixsem mutex (http://marc.info/?l=apr-dev&m=108720968023158&w=2), that
is now obsolete news, and that Solaris has perfected pthread mutex support since?

Joe Orton is always very careful with his statements, precise and correct. My personal experience with pthread mutexes on Solaris was fine, but I must confess that I didn't do specialized tests to determine behaviour in crash situations.

I now did some searching and it turns out that the implementation of pthread mutexes for Solaris 10 has very recently changed quite a bit. So all speculations about improved pthread mutex behaviour (especially for "robust" mutexes) in the last years might have become obsolete.

The new implementation is contained in Solaris kernel patch 137137-09 and most likely also in Solaris 10 Update 6 (10/08). I didn't check whether that update simply contains the kernel patch or the fix is included independently.


Some detail is logged in Sunsolve under the bug IDs

6296770
2160259
6664275
6697344
6729759
6564706

You mention that mod_jk uses pthread; is that the same as the httpd itself?

mod_jk uses a global mutex provided by the APR libraries for access to the log file. It gets a default mutex, i.e. it lets APR decide which type of mutex to use (APR_LOCK_DEFAULT, for Solaris it should be fcntl). You can't configure it the way you can httpd's accept or ssl mutex.

mod_jk uses a couple more locks, none of which are provided by APR; instead they are coded directly against pthreads. All of those mutexes are only thread mutexes, so they are used locally in each process and not shared between processes. They won't have a problem with crashing processes.


They are:

- one mutex for each AJP worker, synchronizing access to the connection pool, which exists per process

- one mutex for each lb worker

- a mutex for access to the shared memory when changing or reading configuration parameters. That might be a little unsafe, because it actually should be a global mutex, not a process-local one, but those config changes are only made through interaction with the status worker, so there's very little chance of unwanted concurrency here. All dynamic runtime data are already marked as volatile.

- a mutex used during dynamic update of uriworkermap.properties to prevent concurrent updates. Updates are done per process.

- a mutex to prevent concurrent execution of the process-local internal maintenance task
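
To illustrate what a process-local thread mutex of this kind looks like, here is a minimal sketch of a pthread mutex guarding a per-process connection pool. All names (conn_pool, pool_acquire, pool_release, run_pool_demo) are hypothetical and chosen for illustration; this is not mod_jk's actual code, only the general pattern under the assumption of default (process-private) mutex attributes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-process connection pool; not mod_jk's real structure. */
typedef struct {
    pthread_mutex_t lock;  /* thread mutex, visible only inside this process */
    int free_slots;        /* connections currently available */
    int capacity;          /* total connections in the pool */
} conn_pool;

void pool_init(conn_pool *p, int capacity)
{
    /* NULL attributes give a process-private mutex by default. */
    int rc = pthread_mutex_init(&p->lock, NULL);
    assert(rc == 0);
    p->free_slots = capacity;
    p->capacity = capacity;
}

/* Returns 1 if a slot was taken, 0 if the pool is exhausted. */
int pool_acquire(conn_pool *p)
{
    int got = 0;
    pthread_mutex_lock(&p->lock);
    if (p->free_slots > 0) {
        p->free_slots--;
        got = 1;
    }
    pthread_mutex_unlock(&p->lock);
    return got;
}

void pool_release(conn_pool *p)
{
    pthread_mutex_lock(&p->lock);
    if (p->free_slots < p->capacity)
        p->free_slots++;
    pthread_mutex_unlock(&p->lock);
}

typedef struct {
    conn_pool *pool;
    int rounds;
} worker_arg;

/* Each worker repeatedly borrows a slot and hands it straight back. */
static void *worker(void *arg)
{
    worker_arg *w = arg;
    for (int i = 0; i < w->rounds; i++) {
        if (pool_acquire(w->pool))
            pool_release(w->pool);
    }
    return NULL;
}

/* Runs `threads` workers against one pool; returns the free slots left,
 * or -1 if the thread count is out of range. */
int run_pool_demo(int threads, int capacity, int rounds)
{
    enum { MAX_THREADS = 16 };
    pthread_t tids[MAX_THREADS];
    conn_pool pool;
    worker_arg arg;
    int started = 0;

    if (threads < 1 || threads > MAX_THREADS)
        return -1;
    pool_init(&pool, capacity);
    arg.pool = &pool;
    arg.rounds = rounds;
    for (int i = 0; i < threads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &arg) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    int left = pool.free_slots;
    pthread_mutex_destroy(&pool.lock);
    return left;
}
```

Because the mutex lives in ordinary process memory, a crash of one httpd child takes its mutexes with it and cannot leave a lock held for other processes, which is why these locks are not affected by crashing processes.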

Some fellow at Covalent, back in the early Apache 2.0 days, posted a white
paper about his various mutex testing, but it does not appear to be
available anymore. It would be interesting to know how it was tested and how
it would play out today.

Lots of the Covalent people are still around in various projects, like William (Bill) A. Rowe and Jim Jagielski. You could post at apr-dev, because Apache httpd uses the mutex implementations coming from the APR libraries.

Rainer Jung-3 wrote:

On 06.02.2009 18:13, fredk2 wrote:

I was doing some stress tests (with Apache ab, 100 users, 100K requests) to
compare the Apache prefork and worker MPMs. The test URL is a simple hello
servlet on Tomcat 6.0.x via mod_jk. On my Sparc Solaris 10 server, only with
Apache set to the worker MPM, I see the following error messages in my jk log:

Apache/2.2.11 (Unix) with mod_jk/1.2.26 on Solaris 10.
. . .
[Thu Jan 08 11:42:28 2009] [error] (45)Deadlock situation detected/avoided: apr_global_mutex_lock(jk_log_lock) failed
. . .
[Thu Jan 08 11:42:29 2009] [emerg] (45)Deadlock situation detected/avoided: apr_proc_mutex_lock failed. Attempting to shutdown process gracefully.
[Thu Jan 08 11:42:29 2009] [error] (45)Deadlock situation detected/avoided: apr_global_mutex_lock(jk_log_lock) failed
. . .

These errors do not appear to impact the test results and the jk log file
seems complete.

I can suppress the errors by choosing another mutex in the Apache directive
AcceptMutex, such as sysvsem or pthread. For Solaris 10 the default mutex
for worker MPM is fcntl. Setting the mutex to sysvsem (also the default on
Linux) marginally improves the request time.

Can someone explain what exactly these errors mean? When do they occur?
I would almost have expected a "detected/avoided" to be a [warn] instead of
an [error].

I have seen the thread http://markmail.org/message/dedqpmrrkpa224ns but I'd
like to hear updated experiences that people have with sysvsem mutexes on
Solaris 10 - what is the better mutex? sysvsem, posixsem, pthread?

Any comments will be appreciated.

I experienced this too a couple of times and once wrote a small C
program to reproduce the problem. On Solaris the algorithm to detect a
possible deadlock is very careful and returns EDEADLK (errno 45, the
"(45)" in the log lines above) even in situations where you can
mathematically prove that a deadlock is not possible. This happens in a
multi-threaded environment when more than one mutex is used.
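
For readers unfamiliar with where this error surfaces, here is a minimal sketch of a file-based mutex built on fcntl() record locks, which is the general mechanism behind an "fcntl" type mutex. The function names (file_mutex_lock, file_mutex_unlock, file_mutex_demo) and the scratch file path are hypothetical; this is not APR's or mod_jk's code. The point is only that a blocking F_SETLKW request can legitimately return EDEADLK when the kernel's detection logic decides that waiting might deadlock. One plausible contributing factor, stated as an assumption, is that POSIX record locks are owned by the process rather than by the individual thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Block until the whole file is write-locked by this process.
 * Returns 0 on success or the errno value on failure. A return of
 * EDEADLK means the kernel refused to wait because it believes that
 * waiting could deadlock. */
int file_mutex_lock(int fd)
{
    struct flock fl = {0};
    int rc;

    assert(fd >= 0);
    fl.l_type = F_WRLCK;     /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "to end of file" */
    do {
        rc = fcntl(fd, F_SETLKW, &fl);   /* blocking request */
    } while (rc < 0 && errno == EINTR);  /* retry if a signal interrupted us */
    return rc < 0 ? errno : 0;
}

/* Release the lock taken by file_mutex_lock(). */
int file_mutex_unlock(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLK, &fl) < 0 ? errno : 0;
}

/* Lock and unlock a scratch file once; returns 0 if both steps succeed. */
int file_mutex_demo(void)
{
    char path[] = "/tmp/fcntl_mutex_XXXXXX";  /* hypothetical scratch file */
    int fd = mkstemp(path);
    int rc;

    if (fd < 0)
        return errno;
    unlink(path);            /* file disappears once fd is closed */
    rc = file_mutex_lock(fd);
    if (rc == 0)
        rc = file_mutex_unlock(fd);
    close(fd);
    return rc;
}
```

A caller that treats every non-zero return as a hard failure will log an error like the ones quoted above even when no real deadlock exists.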

Apache httpd and mod_jk use such a mutex, and so does SSL (so you can
observe the same warnings without mod_jk, just by using SSL with httpd
and doing stress tests).

In older JK versions this could lead to a hang, but we worked around
that a couple of versions ago. I generally recommend the pthread mutex
for Solaris which doesn't have the problem and seems to be robust
despite warnings about pthread mutexes in very old versions of Solaris.

We even once had a discussion about changing the default httpd mutex on
Solaris, but I think that discussion didn't come to a conclusion.

Regards,

Rainer


