On 10/18/2011 06:25 PM, Andrew MacLeod wrote:

It's impossible to implement a weak compare-and-swap unless you return both results in one operation.

The compare_exchange built-in matches the atomic interface for C++, and if we can't resolve it with a lock-free instruction sequence, we have to leave an external call in this format to a library; that's why I provide this built-in.

Neither rth nor I like the addressable parameter, so that's why I left the RTL patterns for weak and strong compare-and-swap without that addressable argument, and let this built-in generate wrapper code around them.

You could provide an __atomic version of the bool and val routines with a memory model... We toyed with making compare_and_swap return both values so we could implement a weak version, but getting two return values is not pretty. It could be done with two separate built-ins that relate to each other, but that's not great either.

I've thought about various longer-term schemes, but haven't been able to settle on one I really like. Ideally, if we know we can generate lock-free instructions, we expose the wrapper code to the tree optimizers.

I've considered:
1) Adding tree support for a CAS primitive which has two results. That's pretty invasive, but has nice features.

2) A two-part built-in: one which returns a value, and a second which takes that value and returns the boolean, i.e.
   val = __atomic_compare_and_swap (&mem, expected, desired, model)
   if (__atomic_compare_and_swap_success (val))
     ...

During expansion from SSA to RTL, you look for uses of the result of __atomic_compare_and_swap in __atomic_compare_and_swap_success, decide what RTL pattern to use, and 'merge' the two built-ins into one pattern. You can then optimize the RTL pattern based on what other uses, if any, there are of the two results. I think this should work OK...

3) Waiting for a flash of brilliance (may never come) or <insert your suggestion> :-)

I decided for the moment to punt on exploring those and give us more time to think about the best way to do it. We do need to be able to call this specific interface for external library calls, but we are not locked into it for inline expansion of lock-free instructions. In c-common.c, where we turn __atomic_compare_exchange into __atomic_compare_exchange_{1,2,4,8,16}, we could instead turn it into a code sequence using a new __atomic_compare_and_swap built-in or tree code. The wrapper code that is currently emitted as RTL could then be emitted as tree expressions before the SSA optimizers see anything. So no later than the next release of GCC, I would expect to have a fully fleshed-out solution that gives us all the nice inlining and removes addr-taken flags and such. I just don't feel there is time at the moment to make the correct decision while I'm trying to get the library ABI right.

If I find some time, I may experiment with #2 next week.

Andrew

