On Mon, Apr 29, 2013 at 10:04 AM, Ben Pfaff <b...@nicira.com> wrote: > Thank you for figuring this out! It is a subtle problem. More > commentary below. > > On Mon, Apr 29, 2013 at 08:16:36AM -0700, Gurucharan Shetty wrote: > > Currently we are creating the worker after creation of the pidfile. > > This means two things for ovs-vswitchd. One, the pidfile's file > descriptor > > is open in both main process and the worker process and closing of the > file > > descriptor in either of the process means we will loose the lock on the > > pidfile. > > I don't think that that is the case, because file locks are not > inherited across fork. (See fcntl(2) and > http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html.) I > believe that it is only the main process closing the lockfile that will > cause us to lose the lock. > I see. I will clean up the commit message.
> > > Two, the responsibility of deleting the pidfile after process > > termination rests with the worker process. > > I see that it does, but that was unintentional and I think it is a > mistake. I think that the main process should delete the pidfile, > because the main process has the actual responsibility for everything > that ovs-vswitchd does. The worker process only does what it is told to > do. > > > When we restart openvswitch using the startup scripts, we SIGTERM the > main > > process and once it is cleaned up, we start ovs-vswitchd again. This > results > > in a race condition. The new ovs-vswitchd will create a pidfile because > it is > > unlocked. But, if the old worker process exits after the start of new > > ovs-vswitchd, it will simply delete the pidfile underneath the new > ovs-vswitchd. > > This will eventually result in multiple ovs-vswitchd daemons. > > > > This patch moves make_pidfile to daemonize_complete. This way, we create > the > > pidfile after the creation of worker process. This means the > responsibility of > > deleting it lies with the main process and also as a side-effect, only > one > > process owns the file descriptor of the pidfile. > > > > Bug #16669. > > Signed-off-by: Gurucharan Shetty <gshe...@nicira.com> > > I think that it is best to try to create the pidfile as early as we can. > If there is a conflicting pidfile, then it is best to recognize this > early on and avoid doing other initialization work. > > After looking at our code to handle forking and its aftermath, though, > it seems like fixing this code would be more complicated and thus higher > risk than just moving the make_pidfile() call. Do we need to backport > this fix to branch-1.10? If so, then I think that the change you have > here is the appropriate one. We can do better later, probably reverting > this change as part of the improvements. > > You are correct in that it is not a good idea to make_pidfile so late. I feel it may raise another set of issues. I just realized that my code introduces a undesirable behavior. Now, even with a valid pidfile, if someone manually starts a second ovs-vswitchd (not with startup script) , it will not terminate because it checks for the database lock before it checks for pidfile. For the short-term, does it make sense to do something like this? (I will send a refined patch later) diff --git a/lib/daemon.c b/lib/daemon.c index e7ee56c..6e2cb90 100644 --- a/lib/daemon.c +++ b/lib/daemon.c @@ -163,6 +163,22 @@ daemon_save_fd(int fd) save_fds[fd] = true; } +void +remove_pidfile_from_unlink(void) +{ + if (pidfile) { + fatal_signal_remove_file_to_unlink(pidfile); + } +} + +void +add_pidfile_to_unlink(void) +{ + if (pidfile) { + fatal_signal_add_file_to_unlink(pidfile); + } +} + /* If a pidfile has been configured, creates it and stores the running * process's pid in it. Ensures that the pidfile will be deleted when the * process exits. */ @@ -240,8 +256,6 @@ make_pidfile(void) pidfile_dev = s.st_dev; pidfile_ino = s.st_ino; free(tmpfile); - free(pidfile); - pidfile = NULL; } /* If configured with set_pidfile() or set_detach(), creates the pid file and @@ -529,6 +543,11 @@ daemonize_start(void) void daemonize_complete(void) { + if (pidfile) { + free(pidfile); + pidfile = NULL; + } + if (!detached) { detached = true; diff --git a/lib/daemon.h b/lib/daemon.h index 8cbcfaf..14436f3 100644 --- a/lib/daemon.h +++ b/lib/daemon.h @@ -65,6 +65,8 @@ void set_detach(void); bool get_detach(void); void daemon_set_monitor(void); void daemon_save_fd(int fd); +void remove_pidfile_from_unlink(void); +void add_pidfile_to_unlink(void); void daemonize(void); void daemonize_start(void); void daemonize_complete(void); diff --git a/lib/worker.c b/lib/worker.c index ce4a53b..897f618 100644 --- a/lib/worker.c +++ b/lib/worker.c @@ -101,6 +101,7 @@ worker_start(void) xset_nonblocking(work_fds[0]); xset_nonblocking(work_fds[1]); + remove_pidfile_from_unlink(); if (!fork_and_clean_up()) { /* In child (worker) process. */ daemonize_post_detach(); @@ -110,6 +111,7 @@ worker_start(void) } /* In parent (main) process. */ + add_pidfile_to_unlink(); close(work_fds[1]); client_sock = work_fds[0]; rxbuf_init(&client_rx); Thanks, Guru > Thanks, > > Ben. >
_______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev