On 04/04/2012 10:33 AM, Richard Guenther wrote:
On Wed, Apr 4, 2012 at 3:28 PM, Andrew MacLeod<amacl...@redhat.com> wrote:
This is a WIP... that fntype field is there for simplicity... and no...
you can do a 1 byte atomic operation on a full word object if you want by
Oh, so you rather need a size or a mode specified, not a "fntype"?
Yes, poorly named perhaps, as I created things... it's just a type node at
the moment that indicates the size being operated on, which I collected
from the built-in function.
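To illustrate the point that the operation size is separate from the object's type, here is a minimal sketch using the existing __atomic built-ins. The helper names (set_first_byte, count_ff_bytes, byte_atomic_check) are hypothetical and only for illustration; the access size is taken from the pointer type handed to the built-in, and the sketch assumes a target that supports 1-byte atomics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: atomically OR 0xFF into the first byte (in memory
   order) of a 32-bit word.  The pointer type makes this a 1-byte atomic
   operation even though the underlying object is a full word. */
uint32_t set_first_byte(uint32_t word)
{
    unsigned char *p = (unsigned char *)&word;
    __atomic_fetch_or(p, 0xFF, __ATOMIC_SEQ_CST);
    return word;
}

/* Hypothetical helper: count how many bytes of the word equal 0xFF. */
int count_ff_bytes(uint32_t word)
{
    unsigned char bytes[sizeof word];
    int n = 0;
    memcpy(bytes, &word, sizeof word);
    for (size_t i = 0; i < sizeof word; i++)
        n += (bytes[i] == 0xFF);
    return n;
}

/* Exactly one byte of the word changes, whatever the endianness. */
void byte_atomic_check(void)
{
    assert(count_ff_bytes(set_first_byte(0)) == 1);
}
```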
In the example you only ever use address operands (not memory operands)
to the GIMPLE_ATOMIC - is that true in all cases? Is the result always
non-memory?
The atomic address can be any arbitrary memory location... I haven't
gotten to that yet. It's commonly just an address, so I'm working with
that first as a proof of concept. When it gets something else it'll trap
and I'll know :-)
Results are always non-memory, other than the side effects of the atomic
contents changing and having to update the second parameter to the
compare_exchange routine. The generic routines for arbitrary structures
(not added in yet) actually just work with blocks of memory, but they
are all handled by addresses and the functions themselves are typically
void. I was planning on folding them right into the existing
atomic_kinds as well... I can recognize from the type that it won't map
to an integral type. I needed separate builtins in 4.7 for them since
the parameter list was different.
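The "update the second parameter" side effect is easiest to see in a retry loop. The following is a rough sketch, not anything from the patch: atomic_store_max and store_max_check are hypothetical names, and the loop relies only on the documented behaviour of __atomic_compare_exchange_n, which writes the value it found into *expected when the exchange fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: raise *obj to candidate if candidate is larger.
   Returns the value observed just before our update (or the value that
   was already >= candidate). */
int atomic_store_max(int *obj, int candidate)
{
    int seen = __atomic_load_n(obj, __ATOMIC_RELAXED);

    while (seen < candidate &&
           !__atomic_compare_exchange_n(obj, &seen, candidate,
                                        true /* weak */,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        ;   /* on failure, seen was overwritten with the current contents,
               so no separate reload is needed before retrying */

    return seen;
}

void store_max_check(void)
{
    int v = 3;
    assert(atomic_store_max(&v, 10) == 3 && v == 10);
    assert(atomic_store_max(&v, 4) == 10 && v == 10);
}
```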
I suppose the GIMPLE_ATOMICs are still optimization barriers for all
memory, not just that possibly referenced by them?
Yes, depending on the memory model used. It can force synchronization
with other CPUs/threads, which will have the appearance of changing any
shared memory location. Various guarantees are made about whether those
changes are visible to this thread after an atomic operation, so we can't
reuse shared values in those cases. Various guarantees are also made about
which changes this thread has made are visible to other CPUs/threads at
an atomic call, so that precludes moving stores downward in some
models.
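A small sketch of why these act as barriers for other shared memory, assuming the usual release/acquire publication idiom. The names payload, ready, publish, try_consume and publish_check are made up for illustration, and the check runs on a single thread purely to show the calls.

```c
#include <assert.h>
#include <stdbool.h>

int payload;   /* ordinary shared data, not atomic itself */
int ready;     /* flag accessed only through __atomic built-ins */

/* Writer side: the store to payload must not be moved below the
   release store, or a reader could see ready == 1 with a stale payload. */
void publish(int value)
{
    payload = value;
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

/* Reader side: the load of payload must not be hoisted above the
   acquire load, and a copy of payload read earlier cannot be reused. */
bool try_consume(int *out)
{
    if (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
        return false;
    *out = payload;
    return true;
}

void publish_check(void)
{
    int out = 0;
    publish(42);
    assert(try_consume(&out) && out == 42);
}
```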
and during expansion to RTL, can trivially see that cmpxchg.2_4 is not used,
and generate the really efficient compare and swap pattern which only
produces a boolean result.
I suppose gimple stmt folding could transform it as well?
It could if I provided gimple statements for the 3 different forms of
C&S. I was planning to just leave it this way since it's the interface
being forced by C++11 as well as C11... and then just emit the
appropriate RTL for this one C&S type. The RTL patterns are already
defined for the 2 easy cases for the __sync routines. The third one was
added for __atomic. It's possible that the process of integrating the
__sync routines with GIMPLE_ATOMIC will indicate it's better to add those
forms as atomic_kinds, and then gimple_fold_stmt could take care of it as
well. Maybe that is just a good idea anyway... I'll keep it in mind.
if only cmpxchg.2_4 were used, we can generate
the C&S pattern which only returns the result. Only if we see both are
actually used do we have to fall back to the much uglier pattern we have
that produces both results. Currently we always generate this pattern.
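For reference, the three C&S forms being discussed look roughly like this at the source level. This is only a sketch: cas_bool, cas_val, cas_both, cas_result and cas_forms_check are hypothetical wrappers, all written on top of the single __atomic_compare_exchange_n interface, and which RTL pattern the compiler actually picks for each is not something the sketch can show.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { bool swapped; int found; } cas_result;

/* Form 1: only the boolean result is used
   (like __sync_bool_compare_and_swap). */
bool cas_bool(int *p, int oldv, int newv)
{
    return __atomic_compare_exchange_n(p, &oldv, newv, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* Form 2: only the value found in memory is used
   (like __sync_val_compare_and_swap). */
int cas_val(int *p, int oldv, int newv)
{
    __atomic_compare_exchange_n(p, &oldv, newv, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return oldv;   /* unchanged on success, updated on failure */
}

/* Form 3: both results are used, the case that needs the uglier pattern. */
cas_result cas_both(int *p, int oldv, int newv)
{
    cas_result r;
    r.swapped = __atomic_compare_exchange_n(p, &oldv, newv, false,
                                            __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    r.found = oldv;
    return r;
}

void cas_forms_check(void)
{
    int v = 1;
    assert(cas_bool(&v, 1, 2) && v == 2);
    assert(cas_val(&v, 1, 3) == 2 && v == 2);   /* mismatch: no swap */
}
```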
Next, we have the C11 atomic type qualifier which needs to be implemented.
Every reference to this variable is going to have to be expanded into one
or more atomic operations of some sort. Yes, I probably could do that by
emitting built-in functions, but they are a bit more unwieldy; it's far
simpler to just create gimple_statements.
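As a concrete picture of what "every reference gets expanded" means, here is a sketch in standard C11 using <stdatomic.h>. The function and variable names are hypothetical; the explicit version spells out the sequentially consistent operations that the plain expressions on an _Atomic object are defined to perform.

```c
#include <assert.h>
#include <stdatomic.h>

_Atomic int a_implicit;   /* accessed with ordinary expressions */
_Atomic int a_explicit;   /* accessed with explicit atomic calls */

/* Each reference below is an atomic operation, even though the
   source looks like ordinary integer code. */
int implicit_ops(void)
{
    a_implicit = 1;       /* atomic store */
    a_implicit++;         /* atomic read-modify-write */
    a_implicit += 2;      /* atomic read-modify-write */
    return a_implicit;    /* atomic load */
}

/* The same sequence written out by hand. */
int explicit_ops(void)
{
    atomic_store(&a_explicit, 1);
    atomic_fetch_add(&a_explicit, 1);
    atomic_fetch_add(&a_explicit, 2);
    return atomic_load(&a_explicit);
}

void atomic_qualifier_check(void)
{
    assert(implicit_ops() == explicit_ops());
}
```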
As I understand you first generate builtins anyway and then lower them?
Or are you planning on emitting those for GENERIC as well? Remember
GENERIC is not GIMPLE, so you'd need new tree codes anyway ;)
Or do you plan to make __atomic integral part of GENERIC and thus
do this lowering during gimplification?
I was actually thinking about doing it during gimplification... I hadn't
gotten as far as figuring out what to do with the functions from the
front end yet. I don't know that code well, but I was in fact hoping
there was a way to 'recognize' the function names easily and avoid
built-in functions completely...
The C parser is going to have to understand the set of C11 routine names
for all these anyway.. I figured there was something in there that could
be done.
I also hope that when done, I can also remove all the ugly built-in overload
code that was created for __sync and continues to be used by __atomic.
But the builtins will stay for our users' consumption and libstdc++ use, no?
Well, the names must remain exposed and recognizable since they are 'out
there'. Maybe under the covers I can just leave them as normal calls
and then during gimplification simply recognize the names and generate
GIMPLE_ATOMIC statements directly from the CALL_EXPR. That would be
ideal. That way there are no builtins any more.
So bottom line, a GIMPLE_ATOMIC statement is just an object that is much
easier to work with.
Yes, I see that it is easier to work with for you. All other statements will
see GIMPLE_ATOMICs as blockers for their work though, even if they
already deal with calls just fine - that's why I most of the time suggest
to use builtins (or internal fns) to do things (I bet you forgot to update
enough predicates ...). Can GIMPLE_ATOMICs throw with -fnon-call-exceptions?
I suppose yes. One thing you missed at least ;)
Not that I am aware of, they are 'noexcept'. But I'm sure I've missed
more than a few things so far. I'm pretty early in the process :-)
It cleans up both initial creation and rtl generation,
as well as being easier to manipulate. It also encompasses an entire class
of operations that are becoming more integral *if* we can make them
efficient, and I hope to actually do some optimizations on them eventually.
I had a discussion last fall with Linus about what we needed to be able to
do to them in order for the kernel to use __atomic instead of their
home-rolled solutions. Could I do everything with builtins? Sure... it's
just more awkward and this approach seems cleaner to me.
Cleaner if you look at it in isolation - messy if you consider that not only
things working with atomics need to (not) deal with this new stmt kind.
They can affect shared memory in some ways like a call, but don't have
many of the other attributes of call. They are really more like an
assignment or other operation with arbitrary shared memory side
effects. I do hope to be able to teach the optimizers the
directionality of the memory model restrictions, i.e., ACQUIRE is only a
barrier to hoisting shared memory code... stores can be moved downward
past this mode. RELEASE is only a barrier to sinking code. RELAXED is
no barrier at all to code motion. In fact, a relaxed store is barely
different from a real store... but there is a slight difference, so we
can't make it a normal store :-P.
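The one-way nature of ACQUIRE and RELEASE is what a simple spinlock relies on. The sketch below is only an illustration under that assumption: spin_lock, spin_unlock, guarded_increment and spinlock_check are hypothetical names, built on the existing __atomic_test_and_set and __atomic_clear built-ins.

```c
#include <assert.h>
#include <stdbool.h>

bool lock_flag;   /* false = unlocked */

/* ACQUIRE: code after the lock may not be hoisted above it, but
   earlier code is free to sink into the critical section. */
void spin_lock(void)
{
    while (__atomic_test_and_set(&lock_flag, __ATOMIC_ACQUIRE))
        ;   /* previous value was "set": someone else holds the lock */
}

/* RELEASE: code before the unlock may not sink below it, but
   later code is free to be hoisted into the critical section. */
void spin_unlock(void)
{
    __atomic_clear(&lock_flag, __ATOMIC_RELEASE);
}

/* The increment is pinned between the two one-way barriers. */
int guarded_increment(int *shared)
{
    int v;
    spin_lock();
    v = ++*shared;
    spin_unlock();
    return v;
}

void spinlock_check(void)
{
    int x = 0;
    assert(guarded_increment(&x) == 1);
    assert(guarded_increment(&x) == 2);   /* would spin forever if unlock failed */
}
```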
By teaching the other parts of the compiler about a GIMPLE_ATOMIC, we
could hopefully lessen their impact eventually.
Andrew