Moving to -hackers, hopefully it doesn't confuse the list scripts too much.
On Mon, Feb 04, 2019 at 08:52:17AM +0100, Jakub Glapa wrote:
> I see the error showing up every night on 2 different servers. But it's a
> bit of a heisenbug because If I go there now it won't be reproducible.

Do you have query logging enabled?  If not, could you consider it on at least
one of those servers?  I'm interested to know what ELSE is running at the time
that query failed.

Perhaps you could enable query logging JUST for the interval of time during
which the server usually errors?  The CSV logs can be imported to postgres for
analysis.  You might do something like:

SELECT left(message,99), COUNT(1), max(session_id)
FROM postgres_log
WHERE log_time BETWEEN .. AND ..
GROUP BY 1 ORDER BY 2;

And just maybe there'd be a query there that only runs once per day, which
would allow reproducing the error at will.  Or a utility command like vacuum..

I think ideally you'd set:

log_statement = all
log_min_messages = info
log_destination = 'stderr,csvlog'
# stderr isn't important for this purpose, but I keep it set to capture crash messages, too

You should set these to something that works well at your site:

log_rotation_age = '2min'
log_rotation_size = '32MB'

I would normally set these, and I don't see any reason why you wouldn't set
them too:

log_checkpoints = on
log_lock_waits = on
log_min_error_statement = notice
log_temp_files = 0
log_min_duration_statement = '9sec'
log_autovacuum_min_duration = '999sec'

And I would set these too, but maybe you'd prefer to do something else:

log_directory = '/var/log/postgresql'
log_file_mode = 0640
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'

Justin
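
PS. In case it saves some typing, here's a rough sketch of the two SQL steps
implied above: turning statement logging on only around the failure window, and
loading a CSV log into a table for the query I suggested.  A few assumptions:
ALTER SYSTEM needs 9.4 or later and a superuser; csvlog output is only written
when logging_collector is on; the column list is the one I remember from the
docs for releases before 13 (newer releases add columns, so check the docs for
your version); and the file path is just a placeholder.

```sql
-- Shortly before the usual failure window: log every statement.
ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();

-- After the window: drop the override and fall back to postgresql.conf.
ALTER SYSTEM RESET log_statement;
SELECT pg_reload_conf();

-- Table matching the csvlog column layout (pre-13 releases; verify for yours).
CREATE TABLE postgres_log
(
  log_time timestamp(3) with time zone,
  user_name text,
  database_name text,
  process_id integer,
  connection_from text,
  session_id text,
  session_line_num bigint,
  command_tag text,
  session_start_time timestamp with time zone,
  virtual_transaction_id text,
  transaction_id bigint,
  error_severity text,
  sql_state_code text,
  message text,
  detail text,
  hint text,
  internal_query text,
  internal_query_pos integer,
  context text,
  query text,
  query_pos integer,
  location text,
  application_name text,
  PRIMARY KEY (session_id, session_line_num)
);

-- Placeholder path: point this at one of the rotated .csv files.
-- Server-side COPY reads the file as the server user; psql's \copy is the
-- client-side alternative if the file is not on the database host.
COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
```

With short rotation intervals you'd end up running the COPY once per file.  The
primary key is there so that loading the same file twice fails rather than
silently duplicating rows.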