https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to AlphaOmegaProgrammer from comment #0)
> The problem with the current implementation is that it relies on a hash of
> the lower 12 bits of the pointer address to index an array of mutexes, and
> because the index can wrap around the array, 2 threads trying to perform an
> atomic operation on data requiring more than half of the locks in the array
> will have potential overlap in the range of locks they attempt to acquire.

This can only occur for atomic operations on huge objects though, right? They
should work without deadlock, but it's not a very important use case.

> Because of the wrap-around, the competing threads may start acquiring locks
> at the end of the other thread's range, which causes a deadlock since
> neither thread would be able to acquire all of the locks it needs.
> 
> Any locking scheme that attempts to be performant by using multiple mutexes
> can not be threadsafe because of race conditions,

That's not true; std::lock() uses an algorithm that works for that case.
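
For illustration, std::scoped_lock (which uses the std::lock algorithm when
given more than one mutex) lets two threads name the same mutexes in opposite
orders without deadlocking; this is just an example, not the proposed fix for
libatomic:

#include <mutex>
#include <thread>

std::mutex m1, m2;

void worker_a() {
    std::scoped_lock lk(m1, m2);   // C++17; uses the std::lock algorithm internally
    // ... touch shared data ...
}

void worker_b() {
    std::scoped_lock lk(m2, m1);   // opposite order, still deadlock-free
    // ... touch shared data ...
}

int main() {
    std::thread t1(worker_a), t2(worker_b);
    t1.join();
    t2.join();
}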

I don't know why libatomic locks multiple mutexes for a single object, but it
seems that the deadlock could be avoided by always locking the mutexes in the
same order, i.e. always starting from the lowest index needed in the array,
rather than starting at a higher index and then wrapping around to a lower one.
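
Something along these lines, perhaps. This is only a rough sketch, not
libatomic's actual code; lock_table, NLOCKS and addr_to_index are made-up
names. It keeps the per-address lock table but takes the needed locks in a
single global (ascending index) order, so the wrapped part of a range is
locked before the high part:

#include <mutex>
#include <cstdint>
#include <cstddef>
#include <algorithm>

constexpr std::size_t NLOCKS = 64;
std::mutex lock_table[NLOCKS];

std::size_t addr_to_index(const void* p) {
    // hash some low bits of the address, as the current scheme does
    return (reinterpret_cast<std::uintptr_t>(p) >> 4) % NLOCKS;
}

void lock_object(const void* p, std::size_t nlocks_needed) {
    std::size_t n = std::min(nlocks_needed, NLOCKS);  // never more than every lock
    std::size_t start = addr_to_index(p);
    // The needed indices are start, start+1, ..., start+n-1 (mod NLOCKS).
    // Instead of locking them in that wrapped order, take them in ascending
    // index order so every thread agrees on one global ordering.
    if (start + n <= NLOCKS) {
        for (std::size_t i = start; i < start + n; ++i)
            lock_table[i].lock();
    } else {
        std::size_t wrapped = (start + n) - NLOCKS;
        for (std::size_t i = 0; i < wrapped; ++i)      // low (wrapped) part first
            lock_table[i].lock();
        for (std::size_t i = start; i < NLOCKS; ++i)   // then the high part
            lock_table[i].lock();
    }
}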

There are more efficient algorithms, as used by std::lock in libstdc++. See
https://howardhinnant.github.io/dining_philosophers.html for details on the
problem.

Using a single global mutex for all atomics scales very poorly. For the vast
majority of atomic operations, only a single mutex is locked and no deadlock
can occur. Pessimizing those operations by serializing them all because of a
bug when using atomics on huge objects is not going to be acceptable.
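
To make the common case concrete (again just a sketch with hypothetical names,
not libatomic's real code): a small object maps to exactly one entry in the
lock table, so only one mutex is taken, there is no ordering to get wrong, and
atomics on unrelated addresses don't contend. A single global mutex would make
every such operation contend with every other one:

#include <mutex>
#include <cstdint>
#include <cstddef>

std::mutex lock_table[64];   // hypothetical per-address lock table

void atomic_op_small(void* obj) {
    // Common case: the object is covered by a single lock-table entry.
    std::size_t i = (reinterpret_cast<std::uintptr_t>(obj) >> 4) % 64;
    std::lock_guard<std::mutex> g(lock_table[i]);
    // ... perform the operation on *obj under the lock ...
}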
