2018-03-08 19:16 GMT+01:00 Blair Boadway <bboad...@abebooks.com>:

> Hi Pavel,
>
> I don’t have a core yet, the only way I have now is to intentionally crash
> the prod system a couple of times. Haven’t resorted to that yet.

It is hard to help without a backtrace, and for that you need a core dump.

> Interesting you mentioned pgaudit. It is installed on this system because
> that is our standard installation, but on this particular system we
> haven’t yet needed audits, so the audit role is ‘empty’. (And on a
> different system with the same installation and heavy use of audit we’ve
> seen no segfaults.)

The other extensions are simple, unrelated to DDL, or well known. So pgaudit
is the best candidate, but the error can be anywhere.

Regards

Pavel

> On this system
>
> pgaudit.role = 'auditor'
> pgaudit.log_parameter = off
> pgaudit.log_catalog = off
> pgaudit.log_statement_once = on
> pgaudit.log_level = log
>
> select * from information_schema.role_table_grants where grantee = 'auditor';
>
> (0 rows)
>
> thanks, Blair
>
> *From: *Pavel Stehule <pavel.steh...@gmail.com>
> *Date: *Thursday, March 8, 2018 at 9:49 AM
> *To: *Blair Boadway <bboad...@abebooks.com>
> *Cc: *"pgsql-gene...@postgresql.org" <pgsql-gene...@postgresql.org>
> *Subject: *Re: Troubleshooting a segfault and instance crash
>
> Hi
>
> 2018-03-08 18:40 GMT+01:00 Blair Boadway <bboad...@abebooks.com>:
>
> Hello,
>
> We’re seeing an occasional segfault on a particular database
>
> Mar 7 14:46:35 pgprod2 kernel:postgres[29351]: segfault at 0 ip 000000302f32868a sp 00007ffcf1547498 error 4 in libc-2.12.so[302f200000+18a000]
>
> Mar 7 14:46:35 pgprod2 POSTGRES[21262]: [5] user=,db=,app=client= LOG: server process (PID 29351) was terminated by signal 11: Segmentation fault
>
> It crashes the database, though it starts again on its own without any
> apparent issues. This has happened 3 times in 2 months and each time the
> segfault error and memory address are the same. We’ve only seen it on one
> database, though we’ve seen it on both hosts of the primary/standby setup:
> we switched the primary over to the other host and got a segfault there,
> which seems to eliminate a hardware issue.
> Oddly the database has no issues for normal
> DML workloads (it is a moderately busy prod OLTP system) but the segfault
> has happened very shortly after DDL changes are made. Most recently it
> happened while running a series of grants for new db users we were
> deploying (i.e. running a SQL script from psql on the primary host)
>
> grant usage on schema app to app_user1;
> grant usage on schema app to app_user2;
> ...
>
> Our set up is
>
> RHEL 6.9 - 2.6.32-696.16.1.el6.x86_64
>
> PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit
>
> Extensions - pg_cron,repmgr_funcs,pgaudit,pg_stat_statements,pg_hint_plan,pglogical
>
> So far we can’t reproduce it on a test system. We have just added some OS
> config to collect a core from the OS but haven’t collected one yet. There
> isn’t any particular config change or extension that we can link to the
> problem; this is a system that has run for months without problems since
> the last config changes. Appreciate any ideas.
>
> Can you get a core dump? Could it be a pgaudit bug? It is a complex
> extension.
>
> Regards
>
> Pavel
>
> Regards,
>
> Blair
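
Until a core dump is available, the kernel log line quoted in the report already carries some information. The following is a minimal sketch, not part of the original thread, that decodes that line: it assumes the usual x86 Linux format (`segfault at <addr> ip <ip> sp <sp> error <code> in <object>[<base>+<size>]`, all numbers in hex) and the standard x86 page-fault error-code bits. The function name `decode_segfault` and the dictionary keys are hypothetical, chosen only for illustration.

```python
import re

# Kernel log line copied from the report in this thread.
LINE = ("postgres[29351]: segfault at 0 ip 000000302f32868a "
        "sp 00007ffcf1547498 error 4 in libc-2.12.so[302f200000+18a000]")

# Assumed x86 Linux segfault message layout; every number is hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: split a segfault line into its fields."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    err = int(m.group("err"), 16)
    return {
        "fault_address": int(m.group("addr"), 16),  # 0 means a NULL pointer
        "object": m.group("obj"),
        "offset_in_object": ip - base,              # ip relative to the mapping
        "inside_mapping": 0 <= ip - base < size,
        "page_present": bool(err & 1),              # bit 0: protection fault
        "write_access": bool(err & 2),              # bit 1: write (else read)
        "user_mode": bool(err & 4),                 # bit 2: fault in user mode
    }


info = decode_segfault(LINE)
print(info["object"], hex(info["offset_in_object"]))
```

Read this way, the line describes a user-mode read of address 0 (a NULL pointer dereference) from code inside libc. The offset could be passed to a tool such as `addr2line` or `gdb` together with the exact same libc build and its debug symbols to name the libc function, but that still does not show which PostgreSQL or extension code made the call, so a backtrace from a core dump remains necessary.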