On Tue, Sep 24, 2019 at 10:41 AM Michael Paquier <mich...@paquier.xyz> wrote:
>
> On Mon, Sep 23, 2019 at 01:45:14PM +0200, Tomas Vondra wrote:
> > On Mon, Sep 23, 2019 at 03:48:50PM +0800, Thunder wrote:
> >> Is this an issue?
> >> Can we fix like this?
> >> Thanks!
> >>
> >
> > I do think it is a valid issue. No opinion on the fix yet, though.
> > The report was sent on saturday, so patience ;-)
>
> And for some others it was even a longer weekend. Anyway, the problem
> can be reproduced if you apply the attached which introduces a failure
> point, and then if you run the following commands:
> create table aa as select 1;
> delete from aa;
> \! touch /tmp/truncate_flag
> vacuum aa;
> \! rm /tmp/truncate_flag
> vacuum aa; -- panic on standby
>
> This also points out that there are other things to worry about than
> interruptions, as for example DropRelFileNodeLocalBuffers() could lead
> to an ERROR, and this happens before the physical truncation is done
> but after the WAL record is replayed on the standby, so any failures
> happening at the truncation phase before the work is done would be a
> problem. However we are talking about failures which should not
> happen and these are elog() calls. It would be tempting to add a
> critical section here, but we could still have problems if we have a
> failure after the WAL record has been flushed, which means that it
> would be replayed on the standby, and the surrounding comments are
> clear about that.

Could you elaborate on what problem adding a critical section there would cause?

Regards,

--
Fujii Masao