On Mon, Dec 9, 2019 at 10:42 AM Robert Haas <robertmh...@gmail.com> wrote:
> > Hm. In my mental model it would be useful for barrier "processors" to
> > not acknowledge the state change at certain points. Imagine e.g. needing
> > to efficiently wait till all backends have processed a config file
> > reload - since we don't reload while a query is being processed, we
> > should be able to not ack the barrier at that point.
>
> Yeah, you mentioned this before, but I think we could leave it for
> later. I think it's going to be complex. I suppose the way to do it
> would be to add an argument to ProcessProcSignalBarrier() that lets
> you specify which barrier events you're OK with processing from the
> current point in the code. However, that will mean that whenever
> somebody adds a new barrier type, they have got to go through all of
> those callers and think about whether they should add their new
> barrier type into the flags or not. If we try to do the specific thing
> you're talking about with config-file processing, it will also mean
> that we could be waiting MUCH longer for acknowledgements, which I
> think might have pretty far-reaching ramifications. You might get
> stuck waiting for that barrier to be absorbed for hours, and that
> would also impact later barriers, since you won't see the generation
> bump to 42 until it goes through 40 and 41. That sounds fairly awful.
> You could try to work around it by looking at which barrier flags are
> set, but I think that has ABA problems.
I might have rejected this idea too hastily. At any rate, I have a new view on it, having studied the issue a bit more.

In the case of ALTER SYSTEM READ ONLY, the big problem is that we can't afford to find out that we're unable to emit WAL after we've already entered a critical section, because "ERROR: system is read only" or similar will get promoted to PANIC due to the critical section, and that will be sad. (Before someone asks, not promoting the ERROR to PANIC would be unsafe, even for this specific case.) Some WAL records don't use a critical section, though it's recently been proposed[1] that we add a critical section in at least some of the cases that currently lack one. For those, I think we can just upgrade the existing test-and-elog cases in XLogInsert() and XLogBeginInsert() to ereport() and call it good. But the cases that do have a critical section are harder. Currently, I see only the following two options:

(1) Check just before entering a critical section whether or not the system is read only, and if so, error out. Once we've entered a critical section, ProcessInterrupts() will not do anything, so at that point we're safe. This could be done either by making START_CRIT_SECTION() itself do it or by finding every critical section that is used for inserting WAL and adding a call just beforehand. The latter is probably better, as some critical sections appear to be used for other purposes; see PGSTAT_BEGIN_WRITE_ACTIVITY in particular. But it means changing things in a bunch of places, adding an if-test. That wouldn't add *much* overhead, but it's not quite zero.

(2) Accept barrier events for read-only/read-write changes only at command boundaries. In that case, we only need to check for a read-only state around the places where we currently do PreventCommandIfReadOnly and similar, and the additional overhead should be basically nil. However, this seems pretty unappealing to me, because it means that if you try to make the system READ ONLY, it might run for hours before actually going read only. If you're trying to make the system read only because the master has become isolated from the rest of your network, that's not a fast enough response.

In general, I think the problem here is to avoid TOCTTOU problems. I think the checksums patch has this problem too. For instance, the changes to basebackup.c in that patch check DataChecksumsNeedVerify() only once per file, but what is to say that a call to ProcessInterrupts() couldn't happen in the middle of sending a file, changing the prevailing value? If someone sets checksums off, and everyone acknowledges that they're off, then backends may begin writing blocks without setting checksums, and then this code will potentially try to verify checksums in a block written without them. That patch also modifies the definition of XLogHintBitIsNeeded(), so e.g. log_heap_visible is wrong if a CHECK_FOR_INTERRUPTS() can happen in XLogRegisterBuffer(). I don't think it can currently, but it seems like a heck of a fragile assumption, and I don't see how we can be sure that there's no test for XLogHintBitIsNeeded() that does CHECK_FOR_INTERRUPTS() between where it's tested and where it does something critical with the result of that test.

At the moment, the ALTER SYSTEM READ ONLY problem appears to me to be the lesser of the two (although I may be wrong). Assuming people are OK with inserting an additional check before each xlog-writing critical section, we can just go do that and then I think the problem is pretty much solved.
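To make (1) a bit more concrete, here's a rough sketch of the kind of thing I'm imagining - note that CheckWALWritable() and LocalSystemReadOnly are names I've made up for illustration, not anything that exists today. Each backend would keep a local copy of the read-only state, updated when it absorbs the barrier, and callers would test it immediately before entering an xlog-writing critical section:

/*
 * Hypothetical backend-local copy of the read-only state, updated
 * whenever this backend absorbs the read-only/read-write barrier.
 */
static bool LocalSystemReadOnly = false;

/*
 * Hypothetical check, to be called just before entering a critical
 * section that is going to insert WAL.  It must run before
 * START_CRIT_SECTION(), since otherwise the ERROR would be promoted
 * to PANIC.
 */
static void
CheckWALWritable(void)
{
    if (LocalSystemReadOnly)
        ereport(ERROR,
                (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
                 errmsg("system is read only")));
}

/* and then, at each xlog-writing call site: */
    CheckWALWritable();
    START_CRIT_SECTION();
    /* ... modify buffers, XLogBeginInsert(), XLogInsert(), etc. ... */
    END_CRIT_SECTION();

The point is just that the test happens outside the critical section, so the ERROR is still just an ERROR; and once we're inside the critical section, ProcessInterrupts() won't absorb the barrier, so the value can't change under us.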
The only way there would be a residual problem, I think, is if something could change between the time XLogInsert() checks whether xlog writing is allowed and the time when it does the insert. Even if that's an issue, it can be fixed with a HOLD_INTERRUPTS/RESUME_INTERRUPTS pair in just that function. The meaning of READ ONLY is just that xlog insertion is prohibited, so the only use of the value is in deciding whether XLogInsert() can go ahead or not. But for checksums, there's not such a clear definition of what it means to use the value, nor does it get used in only one place, so the situation seems less clear.

In short, I am still not sure that we need the facility Andres was asking for here, but I do now understand better why Andres was worried about having this capability, and I do think that patches using it are going to need to be very carefully studied for TOCTTOU problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://postgr.es/m/20191207001232.klidxnm756wqx...@alap3.anarazel.de