On 10/18/2011 06:25 PM, Andrew MacLeod wrote:
It's impossible to implement a weak compare and swap unless you return
both values (the success flag and the value read from memory) in one
operation.
The compare_exchange matches the atomic interface for C++, and if we
can't resolve it with a lock-free instruction sequence, we have to
leave an external call with this format to a library, so that's why I
provide this built-in.
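To illustrate why a weak compare and swap needs both results at once, here is a minimal sketch of a retry loop. It assumes the `__atomic_compare_exchange_n` form documented in released GCC, which may differ from the interface as it stood when this was written; `atomic_fetch_max` is a hypothetical helper name used only for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper: atomically store max(*mem, candidate) and return
   the value that was in *mem before the operation.  */
static int
atomic_fetch_max (int *mem, int candidate)
{
  int observed = __atomic_load_n (mem, __ATOMIC_RELAXED);

  /* A weak CAS may fail spuriously, so "value read == expected" does not
     imply success.  The loop needs the boolean to know whether the store
     happened, and the refreshed 'observed' to decide whether to retry.  */
  while (observed < candidate
         && !__atomic_compare_exchange_n (mem, &observed, candidate,
                                          true /* weak */,
                                          __ATOMIC_SEQ_CST,
                                          __ATOMIC_RELAXED))
    {
      /* On failure the builtin has written the current contents of *mem
         into 'observed'; the loop condition re-checks it.  */
    }

  return observed;
}
```

Note that `observed` has its address taken here, which is the addressable parameter this interface requires.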
Neither rth nor I like the addressable parameter, so that's why I left
the RTL pattern for weak and strong compare and swap without that
addressable argument, and let this builtin generate wrapper code
around it.
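As a rough picture of what such wrapper code does, the sketch below builds the pointer-taking interface on top of a value-returning primitive that has no addressable argument. It uses the existing `__sync_val_compare_and_swap` builtin as a stand-in for the RTL pattern, so it only models the strong variant; `compare_exchange_wrapper` is a hypothetical name, and the real wrapper is emitted by the compiler rather than written in C.

```c
#include <stdbool.h>

/* Hypothetical wrapper: presents the compare_exchange-style interface
   (pointer to 'expected', boolean result) around a CAS primitive that
   takes and returns plain values.  Strong semantics only.  */
static inline bool
compare_exchange_wrapper (int *mem, int *expected, int desired)
{
  int wanted = *expected;

  /* The underlying primitive: no addressable 'expected' argument.  */
  int old = __sync_val_compare_and_swap (mem, wanted, desired);

  if (old == wanted)
    return true;        /* Swap happened; *expected is left unchanged.  */

  *expected = old;      /* Report the value actually found in memory.  */
  return false;
}
```

For a strong CAS the boolean can be derived from the returned value like this. For a weak CAS it cannot, since a spurious failure can return a value equal to `wanted`.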
You could provide an __atomic version of the bool and val routines
with a memory model. We toyed with making the compare_and_swap return
both values so we could implement a weak version, but getting 2 return
values is not pretty. It could be done with 2 separate built-ins that
relate to each other, but that's not great either.
I've thought about various longer-term schemes, but haven't been able to
settle on one I really like. Ideally, if we know we can generate
lock-free instructions, we expose the wrapper code to the tree optimizers.
I've considered:
1) Adding tree support for a CAS primitive which has 2 results. That's
pretty invasive, but has nice features.
2) A 2-part built-in: one which returns a value, and a second one which
takes that value and then returns the boolean, i.e.
    val = __atomic_compare_and_swap (&mem, expected, desired, model);
    if (__atomic_compare_and_swap_success (val))
      ...
and during expansion from SSA to RTL, you look for uses of the result of
__atomic_compare_and_swap in __atomic_compare_and_swap_success, and you
can decide what RTL pattern to use and 'merge' the 2 builtins into one
pattern. You can optimize the RTL pattern used based on what, if any,
other uses there are of the 2 results. I think this should work OK...
3) Waiting for a flash of brilliance (may never come) or <insert your
suggestion> :-)
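For comparison, the "two results" idea behind options 1 and 2 can be modelled in plain C by returning a small struct. This is only a sketch under assumptions: `struct cas_result` and `cas_two_results` are hypothetical names, it relies on `__atomic_compare_exchange_n` as documented in released GCC, and it says nothing about how a tree code or paired built-ins would actually be implemented.

```c
#include <stdbool.h>

/* Hypothetical model of a CAS primitive with two results.  */
struct cas_result
{
  int old;        /* Value observed in memory.  */
  bool success;   /* Whether the store of 'desired' happened.  */
};

static inline struct cas_result
cas_two_results (int *mem, int expected, int desired, bool weak)
{
  struct cas_result r;

  r.old = expected;
  /* On failure the builtin overwrites r.old with the current contents
     of *mem; on success r.old keeps the expected (and thus old) value.  */
  r.success = __atomic_compare_exchange_n (mem, &r.old, desired, weak,
                                           __ATOMIC_SEQ_CST,
                                           __ATOMIC_SEQ_CST);
  return r;
}
```

A caller that only looks at `r.success` or only at `r.old` leaves the other field dead, which is the kind of information the expander could use when choosing a pattern.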
I decided for the moment to punt on exploring those and give us more
time to think about the best way to do it. We do need to be able to
call this specific interface for external library calls, but we are not
locked into this for inline expansion of lock-free instructions. In
c-common.c where we turn __atomic_compare_exchange into
__atomic_compare_exchange_{1,2,4,8,16}, we can instead turn it into a
code sequence using a new __atomic_compare_and_swap builtin or tree
code. The wrapper code that is currently emitted as RTL could then be
emitted as tree expressions before the SSA optimizers see anything. So
no later than the next release of GCC, I would expect to have a fully
fleshed-out solution that gives us all the nice inlines and removes
addr-taken flags and such. I just don't feel like there is time at
the moment to make the correct decision while I'm trying to get the
library ABI right.
If I find some time, I may experiment with #2 next week.
Andrew