On Thu, Oct 24, 2019 at 6:07 PM Burakov, Anatoly <anatoly.bura...@intel.com> wrote: > > On 12-Aug-19 11:03 AM, David Marchand wrote: > > On Wed, Jul 24, 2019 at 6:08 PM Anatoly Burakov > > <anatoly.bura...@intel.com> wrote: > >> > >> Currently, primary process holds an exclusive lock on the config > >> file, thereby preventing other primaries from spinning up. However, > >> when the primary dies, the lock is no longer being held, even though > >> there might be other secondary processes still running. > >> > >> The fix is two-fold. First of all, downgrade the primary process's > >> exclusive lock to a shared lock once we have it. Second of all, > >> also take out shared locks on the config from the secondaries. We > >> are using fcntl() locks, which get dropped when the file handle is > >> closed, so also remove the closure of config file handle. > >> > >> Fixes: af75078fece3 ("first public release") > >> Cc: sta...@dpdk.org > >> > >> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com> > >> --- > >> > > Hi David, > > I've been investigating how to improve this patch, and i've hit a dead end. > > The problems here are two-fold. Using fcntl() and flock() locks together > is not advisable, so both primary-secondary detection and > rte_eal_primary_proc_alive() (as per Harry's point) have to use the same > method for checking locks. > > Using flock() would work for this purpose. Unfortunately, on FreeBSD, > converting exclusive lock into a shared lock involves unlocking first > [1] (creating a race). On Linux it doesn't specifically say that it does > that, but it does mention that it is not guaranteed to be atomic [2]. > So, we can't use flock() here. > > It seems that fcntl() lock conversions are atomic, however fcntl() locks > on both Linux and FreeBSD are implemented in a very stupid way in that > /any/ closure of a file descriptor drops /all/ locks on that file. > Meaning, the moment secondary does the check in primary_proc_alive() and > closes the config file fd, the process-wide lock drops. Mind you, > primary_proc_alive() is implemented using lockf() rather than fcntl(), > which is an issue in itself, but on Linux, lockf() is implemented on top > of fcntl() locks and thus suffers from the same issue. > > So, unless you have better ideas, i think this patch can be marked as > rejected. > > [1] https://www.freebsd.org/cgi/man.cgi?query=flock&sektion=2 > [2] https://linux.die.net/man/2/flock
Sorry, hard to digest, I would need to look at this again later. If you have no easy solution, let's revisit after 19.11. Thanks Anatoly. -- David Marchand