[HACKERS] SIGSEGV in BRIN autosummarize

Justin Pryzby Fri, 13 Oct 2017 20:58:58 -0700

I upgraded one of our customers to PG10 Tuesday night, and Wednesday replaced
an BTREE index with BRIN index (WITH autosummarize).


Today I see:
< 2017-10-13 17:22:47.839 -04  >LOG:  server process (PID 32127) was terminated 
by signal 11: Segmentation fault
< 2017-10-13 17:22:47.839 -04  >DETAIL:  Failed process was running: 
autovacuum: BRIN summarize public.gtt 747263

postmaster[32127] general protection ip:4bd467 sp:7ffd9b349990 error:0 in 
postgres[400000+692000]

[pryzbyj@database ~]$ rpm -qa postgresql10
postgresql10-10.0-1PGDG.rhel6.x86_64

Oct 13 17:22:45 database kernel: postmaster[32127] general protection ip:4bd467 
sp:7ffd9b349990 error:0 in postgres[400000+692000]
Oct 13 17:22:47 database abrtd: Directory 'ccpp-2017-10-13-17:22:47-32127' 
creation detected
Oct 13 17:22:47 database abrt[32387]: Saved core dump of pid 32127 
(/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-13-17:22:47-32127 
(15040512 bytes)

..unfortunately:
Oct 13 17:22:47 database abrtd: Package 'postgresql10-server' isn't signed with 
proper key
Oct 13 17:22:47 database abrtd: 'post-create' on 
'/var/spool/abrt/ccpp-2017-10-13-17:22:47-32127' exited with 1
Oct 13 17:22:47 database abrtd: DELETING PROBLEM DIRECTORY 
'/var/spool/abrt/ccpp-2017-10-13-17:22:47-32127'

postgres=# SELECT * FROM bak_postgres_log_2017_10_13_1700 WHERE pid=32127 ORDER 
BY log_time DESC LIMIT 9;
-[ RECORD 1 
]----------+---------------------------------------------------------------------------------------------------------
log_time               | 2017-10-13 17:22:45.56-04
pid                    | 32127
session_id             | 59e12e67.7d7f
session_line           | 2
command_tag            | 
session_start_time     | 2017-10-13 17:21:43-04
error_severity         | ERROR
sql_state_code         | 57014
message                | canceling autovacuum task
context                | processing work entry for relation 
"gtt.public.cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx"
-[ RECORD 2 
]----------+---------------------------------------------------------------------------------------------------------
log_time               | 2017-10-13 17:22:44.557-04
pid                    | 32127
session_id             | 59e12e67.7d7f
session_line           | 1
session_start_time     | 2017-10-13 17:21:43-04
error_severity         | ERROR
sql_state_code         | 57014
message                | canceling autovacuum task
context                | automatic analyze of table 
"gtt.public.cdrs_huawei_sgsnpdprecord_2017_10_13"

Time: 375.552 ms

It looks like this table was being inserted into simultaneously by a python
program using multiprocessing.  It looks like each subprocess was INSERTing
into several tables, each of which has one BRIN index on timestamp column.

gtt=# \dt+ cdrs_eric_egsnpdprecord_2017_10_13
 public | cdrs_eric_egsnpdprecord_2017_10_13 | table | gtt   | 5841 MB | 

gtt=# \di+ cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx 
 public | cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx | index | 
gtt   | cdrs_eric_egsnpdprecord_2017_10_13 | 136 kB | 

I don't have any reason to believe there's memory issue on the server, So I
suppose this is just a "heads up" to early adopters until/in case it happens
again and I can at least provide a stack trace.

Justin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] SIGSEGV in BRIN autosummarize

Reply via email to