http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59305

--- Comment #19 from Iain Sandoe <iains at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #18)
> sparc-sun-solaris2.10 is a primary arch, making P1 for now.  As sparc
> implements
> the hook Joseph mentions is this merely a testsuite issue (sparc being
> "slow")?

In Darwin's case, I don't believe it is (simply) a test-suite issue.

Rather, it is connected with the implementation of pthread-based locking in
libatomic when entities larger than those natively supported are used.

So, for example, if libatomic is configured for a machine supporting
cmpxchg16b, then test time drops from 50 min to 1 min (cf. configuring without
cmpxchg16b).

Probing the stalled cases shows that they are stuck in mutex code.

I started looking at the (default) posix implementation of the locking in
libatomic (partly to see if there was a more BSD-esque way to do it).  However,
I'm out of time for the next couple of weeks.

Two things (in the posix libatomic implementation) that might bear more
examination:

1) adjacent entities that happen to fall within one cache line (which
would apply to two 32-byte numbers stored consecutively, for x86) get the same
hash ID.  I wonder if that can introduce a vulnerability.

2) If the alignment of an entity is < its size, AFAICT the entity could span
two hash IDs without this being detected [the evaluation is carried out modulo
size without considering alignment].
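
To make the two points above concrete, here is a minimal sketch of an
address-to-lock hash keyed on cache lines.  It is not copied from libatomic:
the 64-byte line size, the 64 lock slots, and the names addr_hash() and
lines_spanned() are assumptions chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define WATCH_SIZE 64u   /* assumed cache-line size in bytes */
#define NLOCKS     64u   /* assumed number of lock slots */

/* Map an address to a lock slot: one slot per WATCH_SIZE-byte line. */
static uintptr_t addr_hash(uintptr_t addr)
{
    return (addr / WATCH_SIZE) % NLOCKS;
}

/* Count the cache lines touched by the byte range [addr, addr + size). */
static size_t lines_spanned(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first = addr / WATCH_SIZE;
    uintptr_t last = (addr + size - 1) / WATCH_SIZE;
    return (size_t)(last - first + 1);
}
```

Under these assumptions, two 32-byte objects at 0x1000 and 0x1020 hash to the
same slot (point 1), while a 32-byte object at 0x1030, whose alignment is
smaller than its size, covers two lines and therefore two slots (point 2).  A
scheme that hashes only the start address would take just one of those locks.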

===

On darwin it's possible to resolve the issue by replacing the
pthread_mutex_lock()s with
while ((err = pthread_mutex_trylock(…)) != 0)
 if (err == …) abort();

.. which might indicate an underlying issue with the implementation of pthreads
(or it might simply modify the behaviour enough to cause some other
vulnerability to be hidden).
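
For reference, the try-lock loop could be written out roughly as below.  This
is only a sketch: the helper name spin_lock() is hypothetical, and the choice
to retry on EBUSY and abort on every other error is my assumption, since the
error value is left unspecified in the fragment above.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Acquire m by polling pthread_mutex_trylock() instead of blocking in
   pthread_mutex_lock().  Hypothetical helper, not libatomic code. */
static void spin_lock(pthread_mutex_t *m)
{
    int err;

    while ((err = pthread_mutex_trylock(m)) != 0) {
        /* Assumption: EBUSY means the mutex is held elsewhere, so retry;
           any other error is treated as fatal. */
        if (err != EBUSY)
            abort();
    }
}
```

The caller still releases the mutex with pthread_mutex_unlock() as usual.  Note
that this loop burns CPU while waiting, so it changes timing noticeably
compared with a blocking lock.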

--

I don't know if the same approach (spinning on try lock) would resolve the
issue on Solaris, or (particularly) how to interpret the findings yet.
