On 12-Aug-19 11:03 AM, David Marchand wrote:
On Wed, Jul 24, 2019 at 6:08 PM Anatoly Burakov
<anatoly.bura...@intel.com> wrote:
Currently, primary process holds an exclusive lock on the config
file, thereby preventing other primaries from spinning up. However,
when the primary dies, the lock is no longer being held, even though
there might be other secondary processes still running.
The fix is two-fold. First of all, downgrade the primary process's
exclusive lock to a shared lock once we have it. Second of all,
also take out shared locks on the config from the secondaries. We
are using fcntl() locks, which get dropped when the file handle is
closed, so also remove the closure of config file handle.
Fixes: af75078fece3 ("first public release")
Cc: sta...@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
---
Hi David,
I've been investigating how to improve this patch, and i've hit a dead end.
The problems here are two-fold. Using fcntl() and flock() locks together
is not advisable, so both primary-secondary detection and
rte_eal_primary_proc_alive() (as per Harry's point) have to use the same
method for checking locks.
Using flock() would work for this purpose. Unfortunately, on FreeBSD,
converting exclusive lock into a shared lock involves unlocking first
[1] (creating a race). On Linux it doesn't specifically say that it does
that, but it does mention that it is not guaranteed to be atomic [2].
So, we can't use flock() here.
It seems that fcntl() lock conversions are atomic, however fcntl() locks
on both Linux and FreeBSD are implemented in a very stupid way in that
/any/ closure of a file descriptor drops /all/ locks on that file.
Meaning, the moment secondary does the check in primary_proc_alive() and
closes the config file fd, the process-wide lock drops. Mind you,
primary_proc_alive() is implemented using lockf() rather than fcntl(),
which is an issue in itself, but on Linux, lockf() is implemented on top
of fcntl() locks and thus suffers from the same issue.
So, unless you have better ideas, i think this patch can be marked as
rejected.
[1] https://www.freebsd.org/cgi/man.cgi?query=flock&sektion=2
[2] https://linux.die.net/man/2/flock
--
Thanks,
Anatoly