On Sun, Oct 15, 2017 at 02:44:58PM +0200, Tomas Vondra wrote: > Thanks, but I'm not sure that'll help, at this point. We already know > what happened (corrupted memory), we don't know "how". And core files > are mostly just "snapshots" so are not very useful in answering that :-(
Is there anything I should be saving for these or hints how else to debug? I just got a segfault while looking for logs from an segfault from 2hr earlier...but it was a duplicate from the first stacktrace... < 2017-10-16 23:21:23.172 -04 >LOG: server process (PID 31543) was terminated by signal 11: Segmentation fault < 2017-10-16 23:21:23.172 -04 >DETAIL: Failed process was running: autovacuum: BRIN summarize public.gtt 162175 Oct 16 23:21:22 database kernel: postmaster[31543] general protection ip:4bd467 sp:7ffe08a94890 error:0 in postgres[400000+692000] ... Oct 17 01:58:36 database kernel: postmaster[8646]: segfault at 8 ip 000000000084a177 sp 00007ffe08a94a88 error 4 in postgres[400000+692000] Oct 17 01:58:38 database abrt[9192]: Saved core dump of pid 8646 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-17-01:58:38-8646 (7692288 bytes) Oct 17 01:58:38 database abrtd: Directory 'ccpp-2017-10-17-01:58:38-8646' creation detected Oct 17 01:58:38 database abrt[9192]: Saved core dump of pid 8646 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-17-01:58:38-8646 (7692288 bytes) Oct 17 01:58:38 database abrtd: Directory 'ccpp-2017-10-17-01:58:38-8646' creation detected Oct 17 01:59:18 database abrtd: Generating core_backtrace Oct 17 01:59:18 database abrtd: Duplicate: core backtrace Oct 17 01:59:18 database abrtd: DUP_OF_DIR: /var/spool/abrt/ccpp-2017-10-14-18:05:35-26500 Oct 17 01:59:18 database abrtd: Deleting problem directory ccpp-2017-10-17-01:58:38-8646 (dup of ccpp-2017-10-14-18:05:35-26500) postgres=# SELECT * FROM postgres_log_2017_10_16_2300 WHERE pid=31543; -[ RECORD 1 ]----------+------------------------------------------------------------------------------------------------- log_time | 2017-10-16 23:21:22.021-04 pid | 31543 session_id | 59e57677.7b37 session_line | 2 session_start_time | 2017-10-16 23:18:15-04 virtual_transaction_id | 22/54376 transaction_id | 0 error_severity | ERROR sql_state_code | 57014 message | canceling autovacuum task context | automatic analyze of table "gtt.public.cdrs_eric_egsnpdprecord_2017_10_15" -[ RECORD 2 ]----------+------------------------------------------------------------------------------------------------- log_time | 2017-10-16 23:21:22.021-04 pid | 31543 session_id | 59e57677.7b37 session_line | 3 session_start_time | 2017-10-16 23:18:15-04 virtual_transaction_id | 22/54377 transaction_id | 0 error_severity | ERROR sql_state_code | 57014 message | canceling autovacuum task context | processing work entry for relation "gtt.public.cdrs_eric_msc_voice_2017_10_16_startofcharge_idx" -[ RECORD 3 ]----------+------------------------------------------------------------------------------------------------- log_time | 2017-10-16 23:21:21.003-04 pid | 31543 session_id | 59e57677.7b37 session_line | 1 session_start_time | 2017-10-16 23:18:15-04 virtual_transaction_id | 22/54374 transaction_id | 0 error_severity | ERROR sql_state_code | 57014 message | canceling autovacuum task context | automatic analyze of table "gtt.public.cdrs_eric_ggsnpdprecord_2017_10_15" Note, the most recent LOCK TABLE I could find beofre 23:21:22 was actually ~10min earlier at 2017-10-16 23:12:16.519-04. Core was generated by `postgres: autovacuum worker process gtt '. Program terminated with signal 11, Segmentation fault. #0 index_close (relation=0x324647603246466, lockmode=1) at indexam.c:178 178 LockRelId relid = relation->rd_lockInfo.lockRelId; Missing separate debuginfos, use: debuginfo-install audit-libs-2.4.5-6.el6.x86_64 cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-23.el6.x86_64 libgcc-4.4.7-18.el6.x86_64 libicu-4.2.1-14.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 libstdc++-4.4.7-18.el6.x86_64 nspr-4.13.1-1.el6.x86_64 nss-3.28.4-1.el6_9.x86_64 nss-softokn-freebl-3.14.3-23.3.el6_8.x86_64 nss-util-3.28.4-1.el6_9.x86_64 openldap-2.4.40-16.el6.x86_64 openssl-1.0.1e-57.el6.x86_64 pam-1.1.1-24.el6.x86_64 zlib-1.2.3-29.el6.x86_64 (gdb) p relation $1 = (struct RelationData *) 0x324647603246466 (gdb) p relation->rd_lockInfo Cannot access memory at address 0x3246476032464aa (gdb) bt #0 index_close (relation=0x324647603246466, lockmode=1) at indexam.c:178 #1 0x00000000004bc3c2 in systable_endscan (sysscan=0x145b3c8) at genam.c:504 #2 0x000000000080f266 in SearchCatCache (cache=0x1427168, v1=<value optimized out>, v2=<value optimized out>, v3=<value optimized out>, v4=<value optimized out>) at catcache.c:1190 #3 0x000000000081cd7b in get_rel_name (relid=<value optimized out>) at lsyscache.c:1730 #4 0x00000000006a5299 in perform_work_item (workitem=0x7fd7c7f94860) at autovacuum.c:2609 #5 0x00000000006a6fd9 in do_autovacuum () at autovacuum.c:2533 #6 0x00000000006a7dc8 in AutoVacWorkerMain (argc=<value optimized out>, argv=<value optimized out>) at autovacuum.c:1694 #7 0x00000000006a7e96 in StartAutoVacWorker () at autovacuum.c:1499 #8 0x00000000006b595a in StartAutovacuumWorker (postgres_signal_arg=<value optimized out>) at postmaster.c:5414 #9 sigusr1_handler (postgres_signal_arg=<value optimized out>) at postmaster.c:5111 #10 <signal handler called> #11 0x0000003bd74e1603 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82 #12 0x00000000006b6c96 in ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1717 #13 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1361 #14 0x0000000000638450 in main (argc=3, argv=0x13cd3a0) at main.c:228 #0 index_close (relation=0x324647603246466, lockmode=1) at indexam.c:178 relid = {relId = 21345368, dbId = 0} #1 0x00000000004bc3c2 in systable_endscan (sysscan=0x145b3c8) at genam.c:504 No locals. #2 0x000000000080f266 in SearchCatCache (cache=0x1427168, v1=<value optimized out>, v2=<value optimized out>, v3=<value optimized out>, v4=<value optimized out>) at catcache.c:1190 cur_skey = {{sk_flags = 0, sk_attno = 1, sk_strategy = 3, sk_subtype = 0, sk_collation = 0, sk_func = {fn_addr = 0x7a8d50 <oideq>, fn_oid = 184, fn_nargs = 2, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2 '\002', fn_extra = 0x0, fn_mcxt = 0x13cf3d8, fn_expr = 0x0}, sk_argument = 618293329}, {sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_collation = 0, sk_func = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000', fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}, {sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_collation = 0, sk_func = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000', fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}, {sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_collation = 0, sk_func = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000', fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}} hashValue = 884019 hashIndex = 51 iter = {cur = <value optimized out>, end = 0x14276a8} bucket = 0x14276a8 ct = <value optimized out> relation = 0x7fd6b8788e58 scandesc = 0x145b3c8 ntp = <value optimized out> #3 0x000000000081cd7b in get_rel_name (relid=<value optimized out>) at lsyscache.c:1730 tp = <value optimized out> #4 0x00000000006a5299 in perform_work_item (workitem=0x7fd7c7f94860) at autovacuum.c:2609 cur_datname = 0x0 cur_nspname = 0x0 cur_relname = 0x0 __func__ = "perform_work_item" #5 0x00000000006a6fd9 in do_autovacuum () at autovacuum.c:2533 workitem = 0x7fd7c7f94860 classRel = 0x7fd6b8788e58 tuple = <value optimized out> relScan = <value optimized out> dbForm = <value optimized out> table_oids = <value optimized out> orphan_oids = 0x0 ctl = {num_partitions = 0, ssize = 0, dsize = 0, max_dsize = 0, ffactor = 0, keysize = 4, entrysize = 80, hash = 0, match = 0, keycopy = 0, alloc = 0, hcxt = 0x0, hctl = 0x0} table_toast_map = 0x14d5228 cell = 0x0 shared = 0x1499eb8 dbentry = 0x149a140 bstrategy = 0x156ef38 key = {sk_flags = 0, sk_attno = 16, sk_strategy = 3, sk_subtype = 0, sk_collation = 100, sk_func = {fn_addr = 0x750430 <chareq>, fn_oid = 61, fn_nargs = 2, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2 '\002', fn_extra = 0x0, fn_mcxt = 0x1400838, fn_expr = 0x0}, sk_argument = 116} pg_class_desc = 0x14c3748 effective_multixact_freeze_max_age = 400000000 did_vacuum = <value optimized out> found_concurrent_worker = <value optimized out> i = <value optimized out> __func__ = "do_autovacuum" #6 0x00000000006a7dc8 in AutoVacWorkerMain (argc=<value optimized out>, argv=<value optimized out>) at autovacuum.c:1694 dbname = "gtt\000\000\000\000\000\325I\001\327;\000\000\000\067{\000\000\000\000\000\000E\352\200\327;\000\000\000\001\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\374\024\316", '\000' <repeats 12 times> local_sigjmp_buf = {{__jmpbuf = {0, -6625136415178345871, 20928464, 1508210280, 31539, 1508209622, 6625596434918027889, -6625136601089035663}, __mask_was_saved = 1, __saved_mask = {__val = { 18446744066192964103, 13184024, 1, 13182664, 0, 31539, 1508209622, 0, 257010229664, 1, 0, 140729043734800, 257014425560, 4, 0, 4}}}} dbid = 16400 __func__ = "AutoVacWorkerMain" #7 0x00000000006a7e96 in StartAutoVacWorker () at autovacuum.c:1499 worker_pid = 0 __func__ = "StartAutoVacWorker" #8 0x00000000006b595a in StartAutovacuumWorker (postgres_signal_arg=<value optimized out>) at postmaster.c:5414 bn = 0x13f57d0 #9 sigusr1_handler (postgres_signal_arg=<value optimized out>) at postmaster.c:5111 save_errno = 4 __func__ = "sigusr1_handler" #10 <signal handler called> No symbol table info available. #11 0x0000003bd74e1603 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82 ---Type <return> to continue, or q <return> to quit--- No locals. #12 0x00000000006b6c96 in ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1717 timeout = {tv_sec = 59, tv_usec = 981940} rmask = {fds_bits = {120, 0 <repeats 15 times>}} selres = <value optimized out> now = <value optimized out> readmask = {fds_bits = {120, 0 <repeats 15 times>}} nSockets = 7 last_lockfile_recheck_time = 1508210280 last_touch_time = 1508209622 #13 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1361 opt = <value optimized out> status = <value optimized out> userDoption = <value optimized out> listen_addr_saved = <value optimized out> i = <value optimized out> output_config_variable = <value optimized out> __func__ = "PostmasterMain" #14 0x0000000000638450 in main (argc=3, argv=0x13cd3a0) at main.c:228 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers