On 30.09.20 13:21, Fabian Grünbichler wrote:
> cpg_mcast_joined (and transitively, cpg_join/leave) are not thread-safe.
> pmxcfs triggers such operations via FUSE and CPG dispatch callbacks,
> which are running in concurrent threads.
>
> accordingly, we need to protect these operations with a mutex, otherwise
> they might return CS_OK without actually doing what they were supposed
> to do (which in turn can lead to the dfsm taking a wrong turn and
> getting stuck in a supposedly short-lived state, blocking access via
> FUSE and getting whole clusters fenced).
>
> huge thanks to Alexandre Derumier for providing the initial bug report
> and quite a lot of test runs while debugging this issue.
>
> Signed-off-by: Fabian Grünbichler <f.gruenbich...@proxmox.com>
> ---
>
> Notes:
>     we could recycle sync_mutex, but that makes it harder to reason
>     about securing all code paths. it also protects non CPG operations
>     as part of the sync message queue handling, so mixing those up is
>     non-ideal.
>
>     @Alexandre: this is a slightly different approach compared to the test
>     build from yesterday, so if you want to test this as well it would
>     be very welcome :)
>
>  data/src/dfsm.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
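
For readers following along, the pattern described above (serializing a non-thread-safe call behind a dedicated mutex) can be sketched roughly as below. This is not the applied patch: unsafe_send() is a hypothetical stand-in for a call like cpg_mcast_joined, and send_mutex, locked_send(), sender() and run_senders() are illustrative names chosen here. The sketch uses plain pthreads and no corosync API.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-thread-safe call such as
 * cpg_mcast_joined(): an unsynchronized read-modify-write on shared
 * state, which would lose updates if two threads ran it concurrently. */
static long messages_sent = 0;

static int unsafe_send(void)
{
    long seen = messages_sent;   /* read shared state */
    messages_sent = seen + 1;    /* write it back: racy without a lock */
    return 0;                    /* reports success either way */
}

/* Dedicated mutex used only for the wrapped call (illustrative name). */
static pthread_mutex_t send_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Every caller goes through this wrapper, so calls never overlap. */
static int locked_send(void)
{
    pthread_mutex_lock(&send_mutex);
    int result = unsafe_send();
    pthread_mutex_unlock(&send_mutex);
    return result;
}

enum { SENDER_THREADS = 4, SENDS_PER_THREAD = 100000 };

/* Thread body: stands in for a FUSE or dispatch callback thread. */
static void *sender(void *arg)
{
    (void)arg;
    for (int i = 0; i < SENDS_PER_THREAD; i++)
        locked_send();
    return NULL;
}

/* Runs the senders concurrently and returns the number of recorded
 * sends, or -1 if a thread could not be started. */
long run_senders(void)
{
    pthread_t threads[SENDER_THREADS];
    int started = 0;

    messages_sent = 0;
    for (; started < SENDER_THREADS; started++)
        if (pthread_create(&threads[started], NULL, sender, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return started == SENDER_THREADS ? messages_sent : -1;
}
```

With the wrapper in place, run_senders() should report SENDER_THREADS * SENDS_PER_THREAD sends. Calling unsafe_send() directly from the threads may silently drop some of them while still returning success, which is the same shape of failure as the CS_OK case in the commit message. Using a separate mutex rather than reusing an existing one mirrors the reasoning in the notes: it keeps the set of protected code paths easy to audit.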
applied, much thanks to all involved!

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel