We saw the problem with async_notify again (See thread with subject "Postgres 7.4.6 hang in async_notify" in first message to this list dated "Mon, 25 Apr 2005 15:42:35 -0400") in a production setting. Since our last instance, we converted to compiling postgres with debugging, so we have a stack trace. Looking at it, the problem appears at first blush like it might be pretty obvious: an ill-timed signal which arrives during a malloc while malloc has some data-structure locked, and one of the extensive operations that Async_NotifyHandler did probably involved getting the same lock. That doesn't explain everything, since the stack trace looks like it was munged a little bit and it is not completely obvious to me how we got from tag_hash to posix_memalign.
For the record, while this postgres should be (of two) generating notifies out of triggers, we do not believe it should be listening for any, and indeed examination of pg_listener suggests it does not. If it matters, the notify should probably have been on the "wwww_aaaaa" relname. This was on a 64bit Opteron multiprocessor running Fedora Core 3. As you see from the stack trace, we have SSL enabled for the connection from the postgres client. We have a coredump, binary, and associated build tree if anyone wants it. -Seth Robertson ---------------------------------------------------------------------- sysd=> select * from pg_locks; relation | database | transaction | pid | mode | granted ----------+----------+-------------+-------+-----------------+--------- | | 1452163 | 5043 | ExclusiveLock | t | | 1408041 | 29980 | ExclusiveLock | t 16759 | 17212 | | 5043 | AccessShareLock | t ---------------------------------------------------------------------- 5043 is the above psql. 29980 is the postgres stuck in async_notify: ---------------------------------------------------------------------- sysd=> select * from pg_listener ; relname | listenerpid | notification ------------------------------+-------------+-------------- kludge_28041 | 28055 | 0 antura_sysd_jobs | 25754 | 0 antura_sysd_files | 25754 | 0 wwww_ssssssss_eeeeeeee | 30219 | 0 sensor_rrrrrrrr_policy | 30219 | 0 kludge_30217 | 30219 | 0 antura_sysd_job_cron_actions | 25754 | 0 wwww_aaaaa | 30219 | 0 Pending | 28055 | 0 (9 rows) ---------------------------------------------------------------------- ---------------------------------------------------------------------- 29980 ? S 158:06 postgres: sysd sysd 127.0.0.1 async_notify ---------------------------------------------------------------------- ---------------------------------------------------------------------- Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1". Attaching to program: /usr/counterstorm/postgres/bin/postgres, process 29980 Reading symbols from /lib64/libssl.so.4...done. Loaded symbols for /lib64/libssl.so.4 Reading symbols from /lib64/libcrypto.so.4...done. Loaded symbols for /lib64/libcrypto.so.4 Reading symbols from /usr/lib64/libz.so.1...done. Loaded symbols for /usr/lib64/libz.so.1 Reading symbols from /usr/lib64/libreadline.so.4...done. Loaded symbols for /usr/lib64/libreadline.so.4 Reading symbols from /lib64/libtermcap.so.2...done. Loaded symbols for /lib64/libtermcap.so.2 Reading symbols from /lib64/libcrypt.so.1...done. Loaded symbols for /lib64/libcrypt.so.1 Reading symbols from /lib64/libresolv.so.2...done. Loaded symbols for /lib64/libresolv.so.2 Reading symbols from /lib64/libnsl.so.1...done. Loaded symbols for /lib64/libnsl.so.1 Reading symbols from /lib64/libdl.so.2...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/tls/libm.so.6...done. Loaded symbols for /lib64/tls/libm.so.6 Reading symbols from /lib64/tls/libc.so.6...done. Loaded symbols for /lib64/tls/libc.so.6 Reading symbols from /usr/lib64/libgssapi_krb5.so.2...done. Loaded symbols for /usr/lib64/libgssapi_krb5.so.2 Reading symbols from /usr/lib64/libkrb5.so.3...done. Loaded symbols for /usr/lib64/libkrb5.so.3 Reading symbols from /lib64/libcom_err.so.2...done. Loaded symbols for /lib64/libcom_err.so.2 Reading symbols from /usr/lib64/libk5crypto.so.3...done. Loaded symbols for /usr/lib64/libk5crypto.so.3 Reading symbols from /lib64/ld-linux-x86-64.so.2...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /lib64/libnss_files.so.2...done. Loaded symbols for /lib64/libnss_files.so.2 Reading symbols from /usr/counterstorm/postgres/lib/plpgsql.so...done. Loaded symbols for /usr/counterstorm/postgres/lib/plpgsql.so Reading symbols from /usr/counterstorm/postgres/lib/array_sz.so...done. Loaded symbols for /usr/counterstorm/postgres/lib/array_sz.so Reading symbols from /usr/counterstorm/postgres/lib/hostinfo.so...done. Loaded symbols for /usr/counterstorm/postgres/lib/hostinfo.so Reading symbols from /usr/counterstorm/postgres/lib/pending.so...done. Loaded symbols for /usr/counterstorm/postgres/lib/pending.so 0x00000036dacd09ab in __lll_mutex_lock_wait () from /lib64/tls/libc.so.6 (gdb) where #0 0x00000036dacd09ab in __lll_mutex_lock_wait () from /lib64/tls/libc.so.6 #1 0x00000000185d6840 in ?? () #2 0x000000000000000d in ?? () #3 0x00000036dac6d181 in posix_memalign () from /lib64/tls/libc.so.6 #4 0x0000000000000001 in ?? () #5 0x00000000006cd449 in tag_hash (key=0x36dae2e620, keysize=0) at hashfn.c:40 #6 0x00000000006cd076 in hash_search (hashp=0x2aa221da00, keyPtr=0x7fbfffc6a0, action=20, foundPtr=0x6ccf5c "\205uׁ'A#t$\034\230fff\220ff\220H\213-6\"") at dynahash.c:647 #7 0x00000000005ce55e in mdclose (reln=0x1e0) at md.c:384 #8 0x00000000005d0eec in smgrclose (which=-6624, reln=0x2aa221da00) at smgr.c:292 #9 0x00000000006b9dcc in RelationIdInvalidateRelationCacheByRelationId (relationId=2002738) at relcache.c:1748 #10 0x00000000006b5628 in LocalExecuteInvalidationMessage (msg=0x7fbfffc6e0) at inval.c:440 #11 0x00000000005c0388 in ReceiveSharedInvalidMessages (invalFunction=0x6b5550 <LocalExecuteInvalidationMessage>, resetFunction=0x6b5670 <InvalidateSystemCaches>) at sinval.c:125 #12 0x00000000004684d6 in StartTransactionCommand () at xact.c:446 #13 0x00000000004d3a84 in Async_NotifyHandler (postgres_signal_arg=-622664160) at async.c:814 #14 <signal handler called> #15 0x00000036dac69094 in _int_malloc () from /lib64/tls/libc.so.6 #16 0x00000036dac6ab12 in malloc () from /lib64/tls/libc.so.6 #17 0x00000036dcb4d05b in CRYPTO_malloc () from /lib64/libcrypto.so.4 #18 0x00000036dcb99dc1 in EVP_MD_CTX_copy_ex () from /lib64/libcrypto.so.4 #19 0x00000036dcb585cc in HMAC_Final () from /lib64/libcrypto.so.4 #20 0x00000036db91fe10 in tls1_mac () from /lib64/libssl.so.4 #21 0x00000036db91bcbf in ssl3_read_bytes () from /lib64/libssl.so.4 #22 0x00000036db919a9f in ssl3_renegotiate_check () from /lib64/libssl.so.4 #23 0x000000000053492c in secure_read (port=0x929e80, ptr=0x8a7740, len=8192) at be-secure.c:261 #24 0x000000000053ba47 in pq_getbyte () at pqcomm.c:685 #25 0x00000000005d759d in PostgresMain (argc=9992288, argv=0x52, username=0x7fbfffd660 "") at postgres.c:275 #26 0x00000000005911e4 in PostmasterMain (argc=9470864, argv=0x924a80) at postmaster.c:2561 #27 0x000000000053e0ef in main (argc=4, argv=0x7fbffff6b8) at main.c:214 ---------------------------------------------------------------------- ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq