On Wed, Jul 22, 2020 at 6:03 PM Soumyadeep Chakraborty <soumyadeep2...@gmail.com> wrote: > So if we are not going to address those cases, we should change the > syntax and remove the notion of read-only. It could be: > > ALTER SYSTEM SET wal_writes TO off|on; > or > ALTER SYSTEM SET prohibit_wal TO off|on;
This doesn't really work because of the considerations mentioned in http://postgr.es/m/ca+tgmoakctzozr0xeqalfimbcje2rgcbazf4eybpxjtnetp...@mail.gmail.com > If we are going to try to make it truly read-only, and cater to the > other use cases, we have to: > > Perform a checkpoint before declaring the system read-only (i.e. before > the command returns). This may be expensive of course, as Andres has > pointed out in this thread, but it is a price that has to be paid. If we > do this checkpoint, then we can avoid an additional shutdown checkpoint > and an end-of-recovery checkpoint (if we restart the primary after a > crash while in read-only mode). Also, we would have to prevent any > operation that touches control files, which I am not sure we do today in > the current patch. It's basically impossible to create a system for fast failover that involves a checkpoint. See my comments at http://postgr.es/m/ca+tgmoye8ucgtyfgfnv3vwpztygsdksu2f4mniqhkar_ukb...@mail.gmail.com - you can't achieve five nines or even four nines of availability if you have to wait for a checkpoint that might take twenty minutes. I have nothing against a feature that does what you're describing, but this feature is designed to make fast failover easier to accomplish, and it's not going to succeed if it involves a checkpoint. > Why not have the best of both worlds? Consider: > > ALTER SYSTEM SET read_only to {off, on, wal}; > > -- on: wal writes off + no writes to disk > -- off: default > -- wal: only wal writes off > > Of course, there can probably be better syntax for the above. There are a few things you can can imagine doing here: 1. Freeze WAL writes but allow dirty buffers to be flushed afterward. This is the most useful thing for fast failover, I would argue, because it's quick and the fact that some dirty buffers may not be written doesn't matter. 2. Freeze WAL writes except a final checkpoint which will flush dirty buffers along the way. This is like shutting the system down cleanly and bringing it back up as a standby, except without performing a shutdown. 3. Freeze WAL writes and write out all dirty buffers without actually checkpointing. This is sort of a hybrid of #1 and #2. It's probably not much faster than #2 but it avoids generating any more WAL. 4. Freeze WAL writes and just keep all the dirty buffers cached, without writing them out. This seems like a bad idea for the reasons mentioned in Amul's reply. The system might not be able to respond even to read-only queries any more if shared_buffers is full of unevictable dirty buffers. Either #2 or #3 is sufficient to take a filesystem level snapshot of the cluster while it's running, but I'm not sure why that's interesting. You can already do that sort of thing by using pg_basebackup or by running pg_start_backup() and pg_stop_backup() and copying the directory in the middle, and you can do all of that while the cluster is accepting writes, which seems like it will usually be more convenient. If you do want this, you have several options, like running a checkpoint immediately followed by ALTER SYSTEM READ ONLY (so that the amount of WAL generated during the backup is small but maybe not none); or shutting down the system cleanly and restarting it as a standby; or maybe using the proposed pg_ctl demote feature mentioned on a separate thread. Contrary to what you write, I don't think either #2 or #3 is sufficient to enable checksums, at least not without some more engineering, because the server would cache the state from the control file, and a bunch of blocks from the database. I guess it would work if you did a server restart afterward, but I think there are better ways of supporting online checksum enabling that don't require shutting down the server, or even making it read-only; and there's been significant work done on those already. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company