We have four Windows Servers running Apache 2.4.27 acting as load balancers for 
our application server cluster, which is running Tomcat. Recently, we have 
started to experience a high number of crashes with the web servers. Within the 
Apache error logs we see the following:

[Mon Jan 15 15:12:08.271099 2018] [mpm_winnt:notice] [pid 1696:tid 432] 
AH00428: Parent: child process 38240 exited with status 3221225477 -- 
Restarting.
[Mon Jan 15 15:12:08.944108 2018] [mpm_winnt:notice] [pid 1696:tid 432] 
AH00455: Apache/2.4.27 (Win64) OpenSSL/1.0.2l configured -- resuming normal 
operations
[Mon Jan 15 15:12:08.944108 2018] [mpm_winnt:notice] [pid 1696:tid 432] 
AH00456: Apache Lounge VC11 Server built: Jul 10 2017 14:15:02
[Mon Jan 15 15:12:08.957110 2018] [mpm_winnt:notice] [pid 1696:tid 432] 
AH00418: Parent: Created child process 43540
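
For reference, the exit status in the AH00428 line is the decimal form of a Windows NTSTATUS code. A minimal sketch of the conversion follows; the helper name and the small lookup table are our own illustration, not part of Apache or any Windows tool:

```python
# Hypothetical helper: render a Windows process exit status as an NTSTATUS hex code.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}

def decode_exit_status(status: int) -> str:
    """Return the status as 0x-prefixed hex, with a name if it is in our small table."""
    code = status & 0xFFFFFFFF  # keep only the low 32 bits
    name = KNOWN_STATUS.get(code, "unknown")
    return f"0x{code:08X} ({name})"

print(decode_exit_status(3221225477))
```

The value logged here decodes to 0xC0000005, the same access violation code that appears in the crash dump.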

Between the four web servers, we often see over a dozen such crashes a day - 
sometimes more, sometimes less. In some cases Apache crashes again only 5 
minutes after the child process was restarted. The number of crashes goes down 
significantly at night and on weekends, but they still happen. As far as we 
can tell, we have not made any major changes to the configuration recently and 
have only started to experience this in the past few weeks.
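
To put numbers on that pattern, something like the following could tally crashes per hour from the error log. This is only a rough sketch: it assumes each log entry sits on a single line in the file (the entries above are wrapped by the mail client), and the function name and regular expression are our own:

```python
import re
from collections import Counter
from datetime import datetime

# Matches the AH00428 "child process ... exited with status N" notice.
CRASH_RE = re.compile(
    r"^\[([^\]]+)\] \[mpm_winnt:notice\] .*AH00428: .*exited with status (\d+)"
)

def crashes_per_hour(lines):
    """Count AH00428 child-exit notices, grouped by date and hour."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.match(line)
        if not match:
            continue
        # Timestamp format as it appears in the log, e.g. "Mon Jan 15 15:12:08.271099 2018"
        stamp = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S.%f %Y")
        counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Passing an open handle to each server's error log and comparing the resulting counts against request volume might show whether the crashes simply track load.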

We were able to get a core dump from one of the web servers as it was crashing. 
The following are some pieces extracted from it:
FAULTING_IP:
libaprutil_1!apr_brigade_writev+37a
00000000`6f8f21da 488908          mov     qword ptr [rax],rcx

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 000000006f8f21da 
(libaprutil_1!apr_brigade_writev+0x000000000000037a)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000001
   Parameter[1]: 0000000000000000
Attempt to write to address 0000000000000000

STACK_TEXT:
libaprutil_1!apr_brigade_writev+0x37a
libapr_1!apr_pool_destroy+0x6e
libaprutil_1!apr_brigade_cleanup+0x43
mod_ssl!ssl_run_init_server+0x2ddf
mod_ssl!ssl_run_init_server+0x1cf5
libhttpd!ap_process_request_after_handler+0x5c
libhttpd!ap_process_request+0x17
libhttpd!ap_sys_privileges_handlers+0x3953
libhttpd!ap_run_process_connection+0x35
libhttpd!ap_process_connection+0x45
libhttpd!ap_regkey_value_set+0x21f3
kernel32!BaseThreadInitThunk+0x22
ntdll!RtlUserThreadStart+0x34

Looking at the Windows Event Viewer, we see modules "libaprutil-1" and 
"libapr-1" as the faulting modules when the crashes occur. On some rarer 
occasions, we will see "ntdll" and "libhttpd" as the faulting modules.

We have tried increasing the thread stack size (based on similar reports 
online) but that has not helped. We've enabled forensic logging, trying to 
determine if there was some sort of rogue request that could be knocking us 
over, but nothing seemed really out of place.

Is there anything we can do to determine what the root cause is?

Thanks
-Tim
