On 04/04/2012 10:33 AM, Richard Guenther wrote:
On Wed, Apr 4, 2012 at 3:28 PM, Andrew MacLeod<amacl...@redhat.com>  wrote:
This is a WIP... that fntype field is there for simplicity... and no...
you can do a 1 byte atomic operation on a full word object if you want by

Oh, so you rather need a size or a mode specified, not a "fntype"?

Yes, poorly named perhaps, as I created things... it's just a type node at the moment that indicates the size being operated on, which I collected from the builtin function.


In the example you only ever use address operands (not memory operands)
to the GIMPLE_ATOMIC - is that true in all cases?  Is the result always
non-memory?
The atomic address can be any arbitrary memory location... I haven't gotten to that yet. It's commonly just an address, so I'm working with that first as a proof of concept. When it gets something else it'll trap and I'll know :-)

Results are always non-memory, other than the side effects of the atomic contents changing and having to update the second parameter to the compare_exchange routine. The generic routines for arbitrary structures (not added in yet) actually just work with blocks of memory, but they are all handled by addresses and the functions themselves are typically void. I was planning on folding them right into the existing atomic_kinds as well... I can recognize from the type that it won't map to an integral type. I needed separate builtins in 4.7 for them since the parameter list was different.
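
For readers unfamiliar with the "second parameter" side effect mentioned above, here is a minimal sketch using the standard C++11 std::atomic interface rather than anything GIMPLE-level. The helper name try_swap is hypothetical and exists only for illustration.

```cpp
#include <atomic>

// Hypothetical helper: store `desired` into `obj` only if it currently
// holds `expected`. The boolean result is a plain (non-memory) value.
// On failure, compare_exchange_strong overwrites `expected` with the
// value it actually found, which is the side effect on the second
// parameter described in the text.
bool try_swap(std::atomic<int>& obj, int& expected, int desired) {
    return obj.compare_exchange_strong(expected, desired);
}
```

A failed call leaves the atomic object untouched and rewrites the caller's expected value, so a retry loop can simply call the helper again with the refreshed value.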
I suppose the GIMPLE_ATOMICs are still optimization barriers for all
memory, not just that possibly referenced by them?

Yes, depending on the memory model used. It can force synchronization with other CPUs/threads, which will have the appearance of changing any shared memory location. Various guarantees are made about whether those changes are visible to this thread after an atomic operation, so we can't reuse shared values in those cases. Various guarantees are also made about which changes this thread has made are visible to other CPUs/threads at an atomic call, so that precludes moving stores downward in some models.
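
The visibility guarantees described above can be sketched with the usual release/acquire publication idiom in standard C++11. The names payload, ready, publish and try_consume are assumptions made up for this illustration, not anything from the patch.

```cpp
#include <atomic>

int payload = 0;                  // ordinary (non-atomic) shared data
std::atomic<bool> ready{false};   // flag that publishes the payload

// Writer side: the release store keeps the earlier plain store to
// `payload` from being moved below it.
void publish(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Reader side: the acquire load keeps the later plain load of `payload`
// from being hoisted above it, and the compiler may not reuse a value
// of `payload` cached from before the load.
bool try_consume(int& out) {
    if (!ready.load(std::memory_order_acquire))
        return false;
    out = payload;
    return true;
}
```

If one thread calls publish and another later sees try_consume succeed, the reader is guaranteed to observe the published payload. That is the kind of constraint that makes these operations barriers for other shared memory accesses.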

and during expansion to RTL, can trivially see that cmpxchg.2_4 is not used,
and generate the really efficient compare and swap pattern which only
produces a boolean result.
I suppose gimple stmt folding could transform it as well?
It could if I provided gimple statements for the 3 different forms of C&S. I was planning to just leave it this way since it's the interface being forced by C++11 as well as C11... and then just emit the appropriate RTL for this one C&S type. The RTL patterns are already defined for the 2 easy cases for the __sync routines. The third one was added for __atomic. It's possible that the process of integrating the __sync routines with GIMPLE_ATOMIC will indicate it's better to add those forms as atomic_kinds, and then gimple_fold_stmt could take care of it as well. Maybe that is just a good idea anyway... I'll keep it in mind.
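
As a source-level illustration of the three C&S forms mentioned above, the sketch below wraps the corresponding GCC builtins (also accepted by Clang). The wrapper names cas_bool, cas_val and cas_both are hypothetical. Which RTL pattern each call ends up using is not shown here and depends on the target.

```cpp
// Form 1: only the success flag is produced (__sync flavour).
bool cas_bool(int* p, int oldv, int newv) {
    return __sync_bool_compare_and_swap(p, oldv, newv);
}

// Form 2: only the previous contents are produced (__sync flavour).
int cas_val(int* p, int oldv, int newv) {
    return __sync_val_compare_and_swap(p, oldv, newv);
}

// Form 3: both results (__atomic flavour). The flag is returned and, on
// failure, *expected is updated with the value that was found.
bool cas_both(int* p, int* expected, int newv) {
    return __atomic_compare_exchange_n(p, expected, newv, /*weak=*/false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

When a caller of the third form ignores one of the two results, the operation carries no more information than one of the first two forms.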


   if only cmpxchg.2_4 were used, we can generate
the C&S pattern which only returns the result.  Only if we see both are
actually used do we have to fall back to the much uglier pattern we have
that produces both results.  Currently we always generate this pattern.

Next, we have the C11 atomic type qualifier which needs to be implemented.
  Every reference to this variable is going to have to be expanded into one
or more atomic operations of some sort.  Yes, I probably could do that by
emitting built-in functions, but they are a bit more unwieldy, it's far
simpler to just create gimple_statements.
As I understand you first generate builtins anyway and then lower them?
Or are you planning on emitting those for GENERIC as well?  Remember
GENERIC is not GIMPLE, so you'd need new tree codes anyway ;)
Or do you plan to make __atomic integral part of GENERIC and thus
do this lowering during gimplification?
I was actually thinking about doing it during gimplification... I hadn't gotten as far as figuring out what to do with the functions from the front end yet. I don't know that code well, but I was in fact hoping there was a way to 'recognize' the function names easily and avoid built-in functions completely...

The C parser is going to have to understand the set of C11 routine names for all these anyway... I figured there was something in there that could be done.


I also hope that when done, I can also remove all the ugly built-in overload
code that was created for __sync and continues to be used by __atomic.
But the builtins will stay for our users' consumption and libstdc++ use, no?

Well, the names must remain exposed and recognizable since they are 'out there'. Maybe under the covers I can just leave them as normal calls and then during gimplification simply recognize the names and generate GIMPLE_ATOMIC statements directly from the CALL_EXPR. That would be ideal. That way there are no builtins any more.


So bottom line, a GIMPLE_ATOMIC statement is just an object that is much
easier to work with.
Yes, I see that it is easier to work with for you.  All other statements will
see GIMPLE_ATOMICs as blockers for their work though, even if they
already deal with calls just fine - that's why I most of the time suggest
to use builtins (or internal fns) to do things (I bet you forgot to update
enough predicates ...).  Can GIMPLE_ATOMICs throw with -fnon-call-exceptions?
I suppose yes.  One thing you missed at least ;)

Not that I am aware of, they are 'noexcept'. But I'm sure I've missed more than a few things so far. I'm pretty early in the process :-)

  It cleans up both initial creation and rtl generation,
as well as being easier to manipulate. It also encompasses an entire class
of operations that are becoming more integral *if* we can make them
efficient, and I hope to actually do some optimizations on them eventually.
  I had a discussion last fall with Linus about what we needed to be able to
do to them in order for the kernel to use __atomic instead of their
home-rolled solutions.  Could I do everything with builtins? Sure... it's
just more awkward and this approach seems cleaner to me.
Cleaner if you look at it in isolation - messy if you consider that not only
things working with atomics need to (not) deal with this new stmt kind.

They can affect shared memory in some ways like a call, but don't have many of the other attributes of a call. They are really more like an assignment or other operation with arbitrary shared memory side effects. I do hope to be able to teach the optimizers the directionality of the memory model restrictions, i.e., ACQUIRE is only a barrier to hoisting shared memory code... stores can be moved downward past this mode. RELEASE is only a barrier to sinking code. RELAXED is no barrier at all to code motion. In fact, a relaxed store is barely different from a real store... but there is a slight difference so we can't make it a normal store :-P.

By teaching the other parts of the compiler about a GIMPLE_ATOMIC, we could hopefully lessen their impact eventually.
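
To make the RELAXED case concrete, here is a small sketch in standard C++11. The names hits, record_hit and read_hits are invented for this example. The operations are still indivisible, which is the part a plain store or increment does not promise, but they impose no ordering on surrounding memory accesses.

```cpp
#include <atomic>

std::atomic<long> hits{0};   // hypothetical shared event counter

// Relaxed read-modify-write: atomic, but other loads and stores may be
// freely moved across it by the compiler or the hardware.
void record_hit() {
    hits.fetch_add(1, std::memory_order_relaxed);
}

// Relaxed load: returns some value the counter actually held, with no
// synchronization implied for any other shared data.
long read_hits() {
    return hits.load(std::memory_order_relaxed);
}
```

A statistics counter like this is a typical use, since only the count itself matters and nothing else is published through it.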

Andrew
