Andres Freund <and...@2ndquadrant.com> writes:
>> How would postmaster know when to restart a worker that stopped?
>
> I had imagined we would assign some return codes special
> meaning. Currently 0 basically means "restart immediately", 1 means
> "crashed, wait for some time", everything else results in a postmaster
> restart. It seems we can just assign returncode 2 as "done", probably
> with some enum or such hiding the numbers.

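
To make the exit-code convention quoted above concrete, here is a rough
sketch in Python rather than postmaster C. The enum and function names are
made up for illustration; only the numeric meanings (0, 1, the proposed 2,
and "everything else") come from the quoted text.

```python
from enum import Enum, IntEnum


class WorkerExit(IntEnum):
    # Hypothetical names; only the numbers and their meanings are from the thread.
    RESTART_NOW = 0   # "restart immediately"
    CRASHED = 1       # "crashed, wait for some time"
    DONE = 2          # proposed: worker is finished, do not restart it


class Action(Enum):
    RESTART_IMMEDIATELY = "restart immediately"
    RESTART_AFTER_DELAY = "restart after delay"
    DO_NOT_RESTART = "do not restart"
    RESTART_POSTMASTER = "restart postmaster"


def action_for_exit_code(code: int) -> Action:
    """Map a worker's exit code to the action the supervisor would take."""
    if code == WorkerExit.RESTART_NOW:
        return Action.RESTART_IMMEDIATELY
    if code == WorkerExit.CRASHED:
        return Action.RESTART_AFTER_DELAY
    if code == WorkerExit.DONE:
        return Action.DO_NOT_RESTART
    # Any other exit code results in a postmaster restart.
    return Action.RESTART_POSTMASTER
```

The enum is only there to hide the numbers, as suggested in the quote.
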
In Erlang, the library that takes care of such things is called OTP, and
it proposes a supervisor model that knows when to restart a worker. The
specs for the restart behaviour are:

  Restart = permanent | transient | temporary

  Restart defines when a terminated child process should be restarted.

  - A permanent child process is always restarted.

  - A temporary child process is never restarted (not even when the
    supervisor's restart strategy is rest_for_one or one_for_all and a
    sibling's death causes the temporary process to be terminated).

  - A transient child process is restarted only if it terminates
    abnormally, i.e. with another exit reason than normal, shutdown or
    {shutdown,Term}.

Then about restart frequency, what they have is:

  The supervisors have a built-in mechanism to limit the number of
  restarts which can occur in a given time interval. This is determined
  by the values of the two parameters MaxR and MaxT in the start
  specification returned by the callback function [ ... ]

  If more than MaxR number of restarts occur in the last MaxT seconds,
  then the supervisor terminates all the child processes and then
  itself.

You can read the whole thing here:

  http://www.erlang.org/doc/design_principles/sup_princ.html#id71215

I think we should get some inspiration from them here.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
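
As a rough illustration of the OTP rules quoted above, here is a small
Python sketch of the two decisions a supervisor makes: whether a child
should be restarted given its restart type and exit reason, and whether
the MaxR/MaxT limit has been exceeded. This is not OTP code and not a
proposal for the postmaster API; the function and class names are
hypothetical, exit reasons are modelled as plain strings or tuples, and
the handling of a restart that is exactly MaxT seconds old is my own
assumption.

```python
import time
from collections import deque

NORMAL_REASONS = ("normal", "shutdown")


def is_normal_exit(reason):
    """True for normal, shutdown, or a ("shutdown", term) tuple."""
    if isinstance(reason, tuple):
        return len(reason) == 2 and reason[0] == "shutdown"
    return reason in NORMAL_REASONS


def should_restart(policy, reason):
    """Apply the permanent / temporary / transient rule to one exit."""
    if policy == "permanent":
        return True
    if policy == "temporary":
        return False
    if policy == "transient":
        return not is_normal_exit(reason)
    raise ValueError(f"unknown restart policy: {policy!r}")


class RestartLimiter:
    """Allow at most max_r restarts within the last max_t seconds."""

    def __init__(self, max_r, max_t, clock=time.monotonic):
        self.max_r = max_r
        self.max_t = max_t
        self.clock = clock          # injectable so the logic can be tested
        self.restarts = deque()     # timestamps of recent restarts

    def record_restart(self):
        """Record a restart; return False once the limit is exceeded."""
        now = self.clock()
        self.restarts.append(now)
        # Forget restarts that have fallen out of the MaxT window.
        while self.restarts and now - self.restarts[0] > self.max_t:
            self.restarts.popleft()
        return len(self.restarts) <= self.max_r
```

When record_restart() returns False, an OTP-style supervisor would
terminate all its children and then itself; what the postmaster should do
at that point is a separate question.
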