Thanks for the patch. However, there are still a few big problems and we really need your help ( even if you solve your problem ). First, I can't reproduce it - so it's blind debugging.
I don't think select() is available on all platforms ( for jk2 we could
use apr select ), so I doubt we can just check in your fix.

Second, this adds a certain overhead ( we double the number of system
calls ).

The real issue is why tomcat doesn't send the data. Could you try with
tomcat4.1 ( or the new coyote-based ajp connector ) ?

Is it really a deadlock ( tomcat and mod_jk both waiting for input, i.e.
locked in read ) ? Or is it that tomcat for some reason doesn't send
the 'END' message ?

Of course, there is the issue of detecting timeouts - but that's extremely
tricky, as some requests may take a long time to process, and waiting
3 seconds ( or any other timeout ) is not a good solution. It is the java
side that should send the END message when the request ends.

Can you try more debugging, also on the java side ? Maybe the ethereal AJP
plugin can help :-)

BTW, even if you solved the deadlock you may run into other problems, as
requests longer than 3 secs will fail.

Costin

On 6 Jun 2002, Jean-Francois Nadeau wrote:

> Hi.
>
> The lock/unlock fix may help but it doesn't fix the problem. I patched
> my tree with the jk_mt.h modification and I investigated the bug even
> deeper.
>
> The problem was in jk_connect.c, jk_tcp_socket_recvfull, recv call. It
> seems that Tomcat 4.03 (I didn't try with CVS head version...) sometimes
> doesn't send all the data required. So, mod_jk blocks in recv forever,
> causing a deadlock.
>
> I patched my tree with the following:
>
> -- jk_connect.c, jk_tcp_socket_recvfull
> -- after while(rdlen < len) {
>
>     int this_time, select_ret;
>     fd_set set;
>     struct timeval timeout;
>
>     FD_ZERO(&set);
>     FD_SET(sd, &set);
>
>     timeout.tv_sec = 3;
>     timeout.tv_usec = 0;
>
>     select_ret = select(sd+1, &set, NULL, NULL, &timeout);
>
>     if (-1 == select_ret) {
>         return -1;
>     }
>
>     if (0 == select_ret) {
>         return -1;
>     }
>
> -- before this_time = recv(sd,
>
> The deadlock is gone and I'm very happy!
> :)
>
> Thanks,
>
> jeff
>
> On Wed, 2002-06-05 at 21:25, [EMAIL PROTECTED] wrote:
> > Hi,
> >
> > I found the problem, it seems the lock/unlock were in the wrong order.
> >
> > Please checkout from head and try again, and let me know if it still
> > fails.
> >
> > ( thanks for reporting it )
> >
> > Costin
> >
> > On 5 Jun 2002, Jean-Francois Nadeau wrote:
> >
> > > Hi.
> > >
> > > I started to load / stress test our web application. It is running under
> > > Apache 1.3.22 and Tomcat 4.03 and the mod_jk binary that came with it.
> > > The OS is Linux 2.4.7, RedHat 7.2 without any updates.
> > >
> > > I discovered that httpd processes deadlock after a certain amount of
> > > huge requests.
> > >
> > > I decided to investigate the issue by looking at the source code. The
> > > jk_handler function does not terminate. In fact, the call to
> > > end->done($end, l) (just before the jk_close_pool) deadlock (not always
> > > however). That function calls pthread mutex lock/unlock for connection
> > > reuse.
> > >
> > > I tried to comment all connection reuse code. (in jk_ajp_done,
> > > jk_ajp_service, jk_ajp_getendpoint). The deadlock is not gone, but it
> > > appears later.
> > >
> > > Have you ever encountered this problem before? I'd like to fix it. May
> > > it be a kernel bug, glibc bug? (The problem seems to come from pthread
> > > mutexes...)
> > >
> > > Thanks a lot,
> > >
> > > jeff
> > >
> > > --
> > > To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> > > For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
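
For anyone following the thread, here is a rough standalone sketch of
what a select()-guarded receive loop like the one in the quoted patch
could look like. It is not the mod_jk code: the function name
recv_full_timeout, the timeout_sec parameter and the RECV_* return codes
are all made up for illustration, and it assumes a POSIX system where
select() works on sockets. The one difference from the patch is that a
timeout, a closed connection and a hard error get separate return codes,
which may help when debugging whether the END message is missing or just
late.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical return codes for this sketch, not taken from mod_jk. */
#define RECV_ERROR   (-1)
#define RECV_TIMEOUT (-2)
#define RECV_CLOSED  (-3)

/*
 * Read exactly len bytes from sd, waiting at most timeout_sec seconds
 * for each chunk. Returns len on success or one of the codes above.
 */
int recv_full_timeout(int sd, void *buf, size_t len, int timeout_sec)
{
    char *p = buf;
    size_t rdlen = 0;

    while (rdlen < len) {
        fd_set set;
        struct timeval timeout;
        int select_ret;
        ssize_t this_time;

        /* select() may modify both arguments, so rebuild them each pass. */
        FD_ZERO(&set);
        FD_SET(sd, &set);
        timeout.tv_sec = timeout_sec;
        timeout.tv_usec = 0;

        select_ret = select(sd + 1, &set, NULL, NULL, &timeout);
        if (select_ret == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return RECV_ERROR;
        }
        if (select_ret == 0)
            return RECV_TIMEOUT;    /* nothing arrived within the timeout */

        this_time = recv(sd, p + rdlen, len - rdlen, 0);
        if (this_time == -1) {
            if (errno == EINTR || errno == EAGAIN)
                continue;           /* transient condition: retry */
            return RECV_ERROR;
        }
        if (this_time == 0)
            return RECV_CLOSED;     /* peer closed before len bytes arrived */

        rdlen += (size_t)this_time;
    }
    return (int)rdlen;
}
```

Calling recv_full_timeout(sd, buf, len, 3) would match the 3 second wait
in the patch. It still costs one extra system call per chunk, and any
request that takes longer than the timeout to produce data would come
back as RECV_TIMEOUT, so the concerns above still apply.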