On Fri, Apr 12, 2013 at 1:22 PM, Techienote com <techienote....@gmail.com> wrote:
> On Fri, Apr 12, 2013 at 10:12 PM, Jeff Trawick <traw...@gmail.com> wrote:
>
>> On Fri, Apr 12, 2013 at 12:10 PM, Techienote com <techienote....@gmail.com> wrote:
>>
>>> On Fri, Apr 12, 2013 at 4:59 PM, Jeff Trawick <traw...@gmail.com> wrote:
>>>
>>>> On Fri, Apr 12, 2013 at 2:51 AM, Techienote com <techienote....@gmail.com> wrote:
>>>>
>>>>> Hi Folks,
>>>>>
>>>>> Recently we are facing core dump in Oracle HTTP Server which is build
>>>>> on Apache 1.3
>>>>>
>>>>> Following is the output of httpd -V command
>>>>>
>>>>> ------------------------------------------------------------
>>>>> Server version: Oracle-Application-Server-10g/10.1.3.1.0 Oracle-HTTP-Server
>>>>> Server built: Sep 22 2006 04:35:27
>>>>> Server's Module Magic Number: 19990320:18
>>>>> Server compiled with....
>>>>>  -D EAPI
>>>>>  -D EAPI_MM
>>>>>  -D EAPI_MM_CORE_PATH="logs/mm"
>>>>>  -D HAVE_MMAP
>>>>>  -D USE_MMAP_SCOREBOARD
>>>>>  -D USE_MMAP_FILES
>>>>>  -D HAVE_FCNTL_SERIALIZED_ACCEPT
>>>>>  -D HAVE_SYSVSEM_SERIALIZED_ACCEPT
>>>>>  -D HAVE_PTHREAD_SERIALIZED_ACCEPT
>>>>>  -D DYNAMIC_MODULE_LIMIT=64
>>>>>  -D HARD_SERVER_LIMIT=8192
>>>>>  -D HTTPD_ROOT="/tmp/apache"
>>>>>  -D SUEXEC_BIN="/tmp/apache/bin/suexec"
>>>>>  -D DEFAULT_PIDLOG="logs/httpd.pid"
>>>>>  -D DEFAULT_SCOREBOARD="logs/httpd.scoreboard"
>>>>>  -D DEFAULT_LOCKFILE="logs/httpd.lock"
>>>>>  -D DEFAULT_ERRORLOG="logs/error_log"
>>>>>  -D TYPES_CONFIG_FILE="conf/mime.types"
>>>>>  -D SERVER_CONFIG_FILE="conf/httpd.conf"
>>>>>  -D ACCESS_CONFIG_FILE="conf/access.conf"
>>>>>  -D RESOURCE_CONFIG_FILE="conf/srm.conf"
>>>>> ------------------------------------------------------------
>>>>>
>>>>> I have tried to run the same using pstack command.
>>>>> Following is the output of the pstack command
>>>>>
>>>>> ------------------------------------------------------------
>>>>> core 'core' of 13893: /ora10gas/OracleAS/Apache/Apache/bin/httpd -d /ora10gas/OracleAS/Apach
>>>>> ----------------- lwp# 1 / thread# 1 --------------------
>>>>> ff091a28 memcpy (ffbfc998, fddb2838, ffbff1e4, ffbff1f4, fe0d3d64, fe0d3d84) + 104c
>>>>> fe063bbc shmcb_retrieve_session (259f40, fddb2838, ffbff270, 6b63feff, 80808080, 1010101) + 118
>>>>> fe063044 ssl_scache_shmcb_retrieve (259f40, ffbff2e0, 20683c, ffbff60c, 468740, 468e74) + 7c
>>>>> fe061430 ssl_scache_retrieve (259f40, ffbff360, 0, 0, 46895c, ffbff3a0) + f4
>>>>> fe05e8f4 ssl_callback_GetSessionCacheEntry (2000, 33f498, ffffffff, ffffffff, ffbff6f8, 4683c4) + 88
>>>>>
>>>>
>>>> SSLSessionCache none (or something) will avoid this code/crash, but
>>>> you'll likely encounter noticeable performance degradation (client response
>>>> time and/or server CPU). Unless the crash is happening very frequently
>>>> (i.e., severely affecting service) you probably don't want to do that.
>>>>
>>>> You need to get assistance from Oracle. This is a proprietary SSL
>>>> toolkit and proprietary patches to old levels of open source.
>>>>
>>>> Can you please let me know why you are suspecting SSLSessionCache?
>>>
>>
>> Because the stack traces I responded to above are for looking up session
>> cache entries...
>>
> I have gone through the link
> http://publib.boulder.ibm.com/httpserv/ihsdiag/get_backtrace.html
> As per this following is my pflag output of core dump
>
> ------------------------------------------------------------
> core 'core' of 13893: /ora10gas/OracleAS/Apache/Apache/bin/httpd -d /ora10gas/OracleAS/Apach
> data model = _ILP32 flags = MSACCT|MSFORK
> /1: flags = STOPPED
>     why = PR_SUSPENDED
>     lwppend = 0x00000400,0x00000000
> /2: flags = DETACH
>     sigmask = 0xfffffefd,0x0000ffff cursig = SIGSEGV
> /3: flags = DETACH|STOPPED lwp_park(0x4,0xfdafbe08,0x0)
>     why = PR_SUSPENDED
>     sigmask = 0x0000e001,0x00000000
> ------------------------------------------------------------
>
> Note that thread 2 has cursig = SIGSEGV next to it. That is the flag that
> Solaris thinks did the dirty deed. Thread 2 output is as follows
>
> ----------------- lwp# 2 / thread# 2 --------------------
> ff16e298 __pollsys (fdc1be68, 0, fdc1bed0, 0, 0, 0) + 8
> ff109abc pselect (fdc1be68, ff1e6790, ff1e6790, 0, fdc1bed0, 0) + 1c8
> ff109e34 select (0, 0, 0, 0, fdc1bf38, fe002394) + a0
> fe0035d8 swwwcsl_Sleep (ea60, 7, fe012d30, 3645, fdb00200, 1) + 40
> fe00470c wwccuctp_CleanupThreadProc (42bca0, fdc1c000, 0, 0, fe00465c, 1) + b0
> ff16a9c8 _lwp_start (0, 0, 0, 0, 0, 0)
>
> So i want to understand why you are suspecting thread 1
>
>>>>> fe1730bc nzospGetSession (ffbff448, ffbff450, 4683cc, 468740, 468740, 45b2cb) + 24
>>>>> f87fdde0 ssl_Hshk_GetSessionID (20, 468909, ffbff534, 4, 468710, 468740) + a8
>>>>> f8884274 ssl_Hshk_Priv_GetSessionDBRecord (468710, ffbff53f, ffbff534, 468964, 0, 1) + 74
>>>>> f8883f04 ssl_Hshk_Priv_ProcessClientHello (300, 300, 469190, 468710, 0, ffbff5b0) + 174
>>>>> f8879d24 STM_ExecuteLine (455238, f906374c, 1001, 469190, 0,
>>>>> 45525c) + 40
>>>>> f8879a94 STM_DoOneCycle (455238, ffbff6dc, 20683c, ffbff60c, 468740, 468e74) + 148
>>>>> f88798fc STM_Operate (455238, ffbff6dc, f8882b74, 468710, 46895c, 20683c) + 14
>>>>> f87f3f90 ssl_Hshk_HandshakeProceed (468710, 0, 0, 4000, ffbff6f8, 4683c4) + b0
>>>>> f87f2a1c ssl_Handshake (468710, 810a0038, 810d0013, 810d0000, ffbff780, 4683c4) + 30
>>>>> fe16d80c nzos_Handshake (4683b8, 33f4cc, 434478, fffffff8, 0, 4364b0) + b0
>>>>> fe05ca90 SSL_new_server_side (fe100818, 33f4cc, fe0ec98c, 2400, 2664, 4314b0) + 13c
>>>>> fe05c8a0 ssl_hook_NewConnection (431440, 91314, 9b3f0, ffbff8d8, 8c4c0, 902ec) + 14c
>>>>> 00030538 new_connection (a59e0, 933a0, a5a18, ffbff994, ffbff984, 2) + 12c
>>>>> 00031ee4 child_main (90ad4, d8c, e20, a7c, 1800, c00) + 95c
>>>>> 00032278 make_child (933a0, 2, 516551db, 10, 1cf4, ff1e8140) + 16c
>>>>> 00032358 startup_children (5, 14, 869d8, 1b840, 0, 21cc8) + 8c
>>>>> 00032be8 standalone_main (800, 878, c00, d64, 1800, 1a44c) + 28c
>>>>> 00033820 main (c00, dec, 1800, 1a20, 1800, 19ec) + 568
>>>>> 000193c0 _start (0, 0, 0, 0, 0, 0) + 108
>>>>> ----------------- lwp# 2 / thread# 2 --------------------
>>>>> ff16e298 __pollsys (fdc1be68, 0, fdc1bed0, 0, 0, 0) + 8
>>>>> ff109abc pselect (fdc1be68, ff1e6790, ff1e6790, 0, fdc1bed0, 0) + 1c8
>>>>> ff109e34 select (0, 0, 0, 0, fdc1bf38, fe002394) + a0
>>>>> fe0035d8 swwwcsl_Sleep (ea60, 7, fe012d30, 3645, fdb00200, 1) + 40
>>>>> fe00470c wwccuctp_CleanupThreadProc (42bca0, fdc1c000, 0, 0, fe00465c, 1) + b0
>>>>> ff16a9c8 _lwp_start (0, 0, 0, 0, 0, 0)
>>>>> ----------------- lwp# 3 / thread# 3 --------------------
>>>>> ff16aa6c __lwp_park (1, 42be50, fdafbe08, 0, 7dc18, 0) + 14
>>>>> ff164ab0 cond_wait_queue (42bdc8, 42be50, fdafbe08, 0, 0, 0) + 4c
>>>>> ff164ef4 cond_wait_common (42bdc8, 42be50, fdafbe08, 0, 0, 0) + 294
>>>>> ff165088 _cond_timedwait (42bdc8, 42be50, fdafbed0, 0, 0, 0) + 34
>>>>> ff16517c cond_timedwait
>>>>> (42bdc8, 42be50, fdafbed0, 0, 0, fdafbed8) + 14
>>>>> f8df683c sltspctimewait (21e658, 3f99e0, 3f99e4, 493e0, 42be50, ff163f48) + d4
>>>>> fe0034ac swwwctwe_TimedWaitForEvent (3f99e0, 493e0, fe0121ec, 8, f4240, 0) + 40
>>>>> fe000afc wwchmctp_CHMThreadProc (18f4, 1800, 1800, 1800, 1800, 1a3c) + 118
>>>>> ff16a9c8 _lwp_start (0, 0, 0, 0, 0, 0)
>>>>> ------------------------------------------------------------
>>>>>
>>>>> Need your help to analyze further is this issue. Need to understand
>>>>> the cause of core dump and how we can fix it. Simultaneously I am
>>>>> also raising this with Oracle support but not receiving any proper reply.
>>>>>

Sadly, pflags is not always correct. (I think it may be related to
sig_coredump() blocking the synchronous signal upon entry, but in my
experience only customers^H^H^H^H...users can duplicate this, so I
haven't played with it.)

A way to double check this is to look at what thread 2 is doing -- a
blocking syscall (__pollsys, under select). Threads sitting in a
blocking syscall don't crash. A best-effort way to double check that
thread 1 really could have crashed is to look at what it was doing --
memcpy. What about all the other threads? Thread 3 is also blocking
(__lwp_park), which leaves thread 1 as the only candidate.

--
Born in Roswell... married an alien...
http://emptyhammock.com/
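
P.S. The double check described above (ignore any thread whose innermost frame is a blocking wait, then see what is left) can be sketched in a few lines of Python. This is only a rough illustration: the set of "blocking" function names is an assumption taken from this one trace, not a complete list, and the helper names (top_frames, crash_candidates) are made up for the example. It expects raw pstack output, so strip the mail quote markers first.

```python
import re

# ASSUMPTION: innermost frames that mean "parked in a blocking wait".
# Only the names seen in this thread's trace; not an authoritative list.
BLOCKING_TOP_FRAMES = {"__pollsys", "__lwp_park"}

# Matches pstack thread separators such as
# "----------------- lwp# 1 / thread# 1 --------------------"
THREAD_HEADER = re.compile(r"-+\s*lwp#\s*(\d+)\s*/\s*thread#\s*(\d+)\s*-+")
# Matches a frame line such as "ff091a28 memcpy (ffbfc998, ...) + 104c"
FRAME = re.compile(r"^\s*[0-9a-f]+\s+(\S+)\s*\(")


def top_frames(pstack_text):
    """Map each lwp id to the name of its innermost (first listed) frame."""
    tops = {}
    current = None
    for line in pstack_text.splitlines():
        header = THREAD_HEADER.search(line)
        if header:
            current = int(header.group(1))
            continue
        frame = FRAME.match(line)
        if frame and current is not None and current not in tops:
            tops[current] = frame.group(1)
    return tops


def crash_candidates(pstack_text):
    """Return lwp ids whose innermost frame is not a known blocking wait."""
    return sorted(
        lwp
        for lwp, func in top_frames(pstack_text).items()
        if func not in BLOCKING_TOP_FRAMES
    )


# First frames of each thread, copied from the pstack output in this thread.
SAMPLE = """\
----------------- lwp# 1 / thread# 1 --------------------
 ff091a28 memcpy (ffbfc998, fddb2838, ffbff1e4, ffbff1f4, fe0d3d64, fe0d3d84) + 104c
 fe063bbc shmcb_retrieve_session (259f40, fddb2838, ffbff270, 6b63feff, 80808080, 1010101) + 118
----------------- lwp# 2 / thread# 2 --------------------
 ff16e298 __pollsys (fdc1be68, 0, fdc1bed0, 0, 0, 0) + 8
 ff109abc pselect (fdc1be68, ff1e6790, ff1e6790, 0, fdc1bed0, 0) + 1c8
----------------- lwp# 3 / thread# 3 --------------------
 ff16aa6c __lwp_park (1, 42be50, fdafbe08, 0, 7dc18, 0) + 14
 ff164ab0 cond_wait_queue (42bdc8, 42be50, fdafbe08, 0, 0, 0) + 4c
"""

if __name__ == "__main__":
    print(top_frames(SAMPLE))
    print(crash_candidates(SAMPLE))
```

This is a heuristic for narrowing things down, nothing more: a thread left in the candidate list merely could have crashed, and a debugger on the core is still the way to confirm it.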