On Wed, Oct 31, 2018 at 8:43 PM Michael Paquier <mich...@paquier.xyz> wrote: > Okay, but likely we would not want to signal the postmaster > unnecessarily, no? FALLBACK_PROMOTE_SIGNAL_FILE gets discarded if > promotion is triggered more than once, but that does not like a sane > thing to do if not necessary.
Uh, what in the world does that have to do with the topic at hand? Parallel-safety has nothing to do with preventing functions from being called "unnecessarily" or "more than once". I don't like the fact that Tom keeps arguing for marking things parallel-unsafe without any real justification. During the development of parallel query, he argued that no such concept should exist, because everything should be work in parallel mode just like it does otherwise. I eventually convinced him (or so it seemed) that we had to have the concept because it was impractical to eliminate all the corner cases where things were parallel-unsafe in the short/medium term. However, in the long term, the fact that we have parallel-restricted and parallel-unsafe is an implementation restriction that we should be trying to eliminate. Making the list of parallel-unsafe things longer makes that goal harder to achieve, especially when the marking is applied not out of any real justification based on what the code does, as was my intent, but based on an ill-defined feeling of unease. Eventually, after some future round of infrastructure improvements, we're going to end up having discussions about changing things that are marked parallel-unsafe or parallel-restricted to parallel-safe, and when somebody can't find any reason why the existing markings are like that in the first place, they're going to use that as justification for not changing them ("there must've been a reason!"). Parallel-safety shouldn't be some kind of quasi-religious thing where markings bleed from a function that is properly marked to one with a similar name, or irrational values are applied in the name of preventing unspecified or only-vaguely-connected evils. It should be based on things that can be determined scientifically, like "hey, does this code access any global variables other than those which are automatically synchronized between the master and the workers?" and "hey, does this code ever attempt heap_insert/heap_update/heap_delete?" and "hey, does this ever call other user-defined code that might itself be parallel-unsafe?". The right way to figure out the appropriate parallel-safety marking for a function is to read the code and see what it does. Now you might miss something, as with any code-reading exercise, but if you don't, then you should come up with the right answer. In the case of pg_promote, it calls out to the following functions: RecoveryInProgress - checks shared memory state. equally available to a worker as to the leader. AllocateFile/FreeFile - no problem here; these work fine in workers just like any other backend. kill, unlink - same thing ResetLatch, WaitLatch, CHECK_FOR_INTERRUPTS() - all fine in a worker; in fact, used as part of the worker machinery ereport - works anywhere, worker machinery propagates errors back to leader That's all. There are no obvious references to global variables or anything here. So, it looks to be parallel-safe. If you do the same analysis for pg_start_backup(), you'll immediately notice that it calls get_backup_status(), and if you look at that function, you'll see that it returns the value of a global variable. If you check parallel.c, you'll find that that global variable is not one of the special ones that gets synchronized between a leader and the workers. Therefore, if you ran this function in a worker, it might do the wrong thing, because it might get the wrong value of that global variable. So it's at least (and in fact exactly) parallel-restricted. It doesn't need to be parallel-restricted because it's scary in some ill-defined way or any of these other reasons advanced on this thread -- it needs to be parallel-restricted because it *relies on a global variable that is not synchronized to parallel workers*. If somebody writes code to move that state out of a global variables and into the main shared memory segment or to a DSM shared between the leader and the workers, then it can be marked parallel-safe (unless, for example, it also depends on OTHER global variables that are NOT moved into shared storage). Otherwise not. I hope I'm making sense here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company