On Wed, Apr 4, 2012 at 5:50 PM, Andrew MacLeod <amacl...@redhat.com> wrote:
> On 04/04/2012 10:33 AM, Richard Guenther wrote:
>>
>> On Wed, Apr 4, 2012 at 3:28 PM, Andrew MacLeod <amacl...@redhat.com>
>> wrote:
>>> This is a WIP... that fntype field is there for simplicity... and
>>> no... you can do a 1-byte atomic operation on a full word object if
>>> you want by
>>
>> Oh, so you rather need a size or a mode specified, not a "fntype"?
>
> Yes, poorly named perhaps as I created things... it's just a type node
> at the moment that indicates the size being operated on, which I
> collected from the built-in function.
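For illustration, the size-vs-type point is already visible with the
existing 4.7 __atomic builtins, which select the operation by access
size alone:

  long word;
  /* A 1-byte atomic load of the first byte of a word-sized object;
     only the access size matters to the operation.  */
  char b = __atomic_load_n ((char *) &word, __ATOMIC_SEQ_CST);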
Ok.  Remember that you should use non-tree things if you can in GIMPLE
land.  This probably means that both the size and the memmodel "operands"
should be

+ struct GTY(()) gimple_statement_atomic
+ {
+   /* [ WORD 1-8 ] */
+   struct gimple_statement_with_memory_ops_base membase;
+
+   /* [ WORD 9 ] */
+   enum gimple_atomic_kind kind;

  enum gimple_atomic_memmodel memmodel;
  unsigned size;

and not be trees in the ops array.  Even more, both kind and memmodel
should probably go in membase.base.subcode (a possible shape for this is
sketched below, after the quoted text).

>> In the example you only ever use address operands (not memory operands)
>> to the GIMPLE_ATOMIC - is that true in all cases?  Is the result always
>> non-memory?
>
> The atomic address can be any arbitrary memory location... I haven't
> gotten to that yet.  It's commonly just an address, so I'm working with
> that first as proof of concept.  When it gets something else it'll trap
> and I'll know :-)

Ok.  Please try to avoid memory operands and stick to address operands ;)
You can make tree-ssa-operands.c not consider ADDR_EXPRs in the address
operands, similar to the ADDR_EXPRs in MEM_REF operand zero.

> Results are always non-memory, other than the side effects of the
> atomic contents changing and having to update the second parameter to
> the compare_exchange routine.  The generic routines for arbitrary
> structures (not added in yet) actually just work with blocks of memory,
> but they are all handled by addresses and the functions themselves are
> typically void.  I was planning on folding them right into the existing
> atomic_kinds as well... I can recognize from the type that it won't map
> to an integral type.  I needed separate builtins in 4.7 for them since
> the parameter list was different.
>
>> I suppose the GIMPLE_ATOMICs are still optimization barriers for all
>> memory, not just that possibly referenced by them?
>
> Yes, depending on the memory model used.  It can force synchronization
> with other CPUs/threads, which will have the appearance of changing any
> shared memory location.  Various guarantees are made about whether
> those changes are visible to this thread after an atomic operation, so
> we can't reuse shared values in those cases.  Various guarantees are
> also made about which of this thread's changes are visible to other
> CPUs/threads at an atomic call, so that precludes moving stores
> downward in some models.
>
>>> and during expansion to RTL, can trivially see that cmpxchg.2_4 is
>>> not used, and generate the really efficient compare and swap pattern
>>> which only produces a boolean result.
>>
>> I suppose gimple stmt folding could transform it as well?
>
> It could if I provided gimple statements for the 3 different forms of
> C&S.  I was planning to just leave it this way since it's the interface
> being forced by C++11 as well as C11... and then just emit the
> appropriate RTL for this one C&S type.  The RTL patterns are already
> defined for the 2 easy cases for the __sync routines; the third one was
> added for __atomic.  It's possible that the process of integrating the
> __sync routines with GIMPLE_ATOMIC will indicate it's better to add
> those forms as atomic_kinds, and then gimple_fold_stmt could take care
> of it as well.  Maybe that is just a good idea anyway... I'll keep it
> in mind.
>
>>> if only cmpxchg.2_4 were used, we can generate the C&S pattern which
>>> only returns the result.  Only if we see both are actually used do we
>>> have to fall back to the much uglier pattern we have that produces
>>> both results.  Currently we always generate this pattern.
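A possible shape for the subcode variant (an editorial sketch modeled on
the patch's accessors, not patch code):

  /* With the atomic kind stored in gsbase.subcode there is no separate
     structure member and no tree operand for it.  */
  static inline enum gimple_atomic_kind
  gimple_atomic_kind (const_gimple gs)
  {
    GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
    return (enum gimple_atomic_kind) gs->gsbase.subcode;
  }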
>>> Next, we have the C11 atomic type qualifier which needs to be
>>> implemented.  Every reference to this variable is going to have to be
>>> expanded into one or more atomic operations of some sort.  Yes, I
>>> probably could do that by emitting built-in functions, but they are a
>>> bit more unwieldy; it's far simpler to just create gimple statements.
>>
>> As I understand you first generate builtins anyway and then lower them?
>> Or are you planning on emitting those for GENERIC as well?  Remember
>> GENERIC is not GIMPLE, so you'd need new tree codes anyway ;)
>> Or do you plan to make __atomic an integral part of GENERIC and thus
>> do this lowering during gimplification?
>
> I was actually thinking about doing it during gimplification... I
> hadn't gotten as far as figuring out what to do with the functions from
> the front end yet.  I don't know that code well, but I was in fact
> hoping there was a way to 'recognize' the function names easily and
> avoid built-in functions completely...

Heh ... you'd still need a GENERIC representation then.  Possibly an
ATOMIC_OP tree may do.

> The C parser is going to have to understand the set of C11 routine
> names for all these anyway... I figured there was something in there
> that could be done.
>
>>> I also hope that when done, I can also remove all the ugly built-in
>>> overload code that was created for __sync and continues to be used by
>>> __atomic.
>>
>> But the builtins will stay for our users' consumption and libstdc++
>> use, no?
>
> Well, the names must remain exposed and recognizable since they are
> 'out there'.  Maybe under the covers I can just leave them as normal
> calls and then during gimplification simply recognize the names and
> generate GIMPLE_ATOMIC statements directly from the CALL_EXPR.  That
> would be ideal.  That way there are no builtins any more.

I suppose my question was whether the frontends need to do something
about the __atomic keyword or if that is simply translated to some type
flag - or does that keyword apply to operations, not to objects or
types?

>>> So bottom line, a GIMPLE_ATOMIC statement is just an object that is
>>> much easier to work with.
>>
>> Yes, I see that it is easier to work with for you.  All other
>> statements will see GIMPLE_ATOMICs as blockers for their work though,
>> even if they already deal with calls just fine - that's why I most of
>> the time suggest to use builtins (or internal fns) to do things (I bet
>> you forgot to update enough predicates ...).  Can GIMPLE_ATOMICs throw
>> with -fnon-call-exceptions?  I suppose yes.  One thing you missed at
>> least ;)
>
> Not that I am aware of; they are 'noexcept'.  But I'm sure I've missed
> more than a few things so far.  I'm pretty early in the process :-)

What about compare-exchange on a pointer dereference?  The pointer
dereference surely can trap, so it can throw with -fnon-call-exceptions.
No?
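Returning to the recognize-calls-at-gimplification idea above, a rough
sketch of what that could look like (editorial only; the builder is the
one from the patch, but gimple_atomic_set_lhs is a hypothetical helper):

  /* Replace a call to __atomic_load_4 at *GSI with a GIMPLE_ATOMIC_LOAD
     statement.  Returns true if a replacement was made.  */
  static bool
  lower_atomic_call (gimple_stmt_iterator *gsi)
  {
    gimple call = gsi_stmt (*gsi);
    tree fndecl = gimple_call_fndecl (call);
    tree lhs, target, order;
    gimple load;

    if (!fndecl || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
      return false;
    if (DECL_FUNCTION_CODE (fndecl) != BUILT_IN_ATOMIC_LOAD_4)
      return false;

    lhs = gimple_call_lhs (call);
    /* A load whose result is unused; leave it for later folding.  */
    if (!lhs)
      return false;

    target = gimple_call_arg (call, 0);
    order = gimple_call_arg (call, 1);
    load = gimple_build_atomic_load (TREE_TYPE (lhs), target, order);
    /* gimple_atomic_set_lhs is hypothetical; the patch shows
       gimple_atomic_lhs (stmt, i) accessors but no setter.  */
    gimple_atomic_set_lhs (load, 0, lhs);
    gsi_replace (gsi, load, true);
    return true;
  }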
>>> It cleans up both initial creation and RTL generation, as well as
>>> being easier to manipulate.  It also encompasses an entire class of
>>> operations that are becoming more integral *if* we can make them
>>> efficient, and I hope to actually do some optimizations on them
>>> eventually.  I had a discussion last fall with Linus about what we
>>> needed to be able to do to them in order for the kernel to use
>>> __atomic instead of their home-rolled solutions.  Could I do
>>> everything with builtins?  Sure... it's just more awkward and this
>>> approach seems cleaner to me.
>>
>> Cleaner if you look at it in isolation - messy if you consider that
>> not only things working with atomics need to (not) deal with this new
>> stmt kind.
>
> They can affect shared memory in some ways like a call, but don't have
> many of the other attributes of a call.  They are really more like an
> assignment or other operation with arbitrary shared memory side
> effects.  I do hope to be able to teach the optimizers the
> directionality of the memory model restrictions.  I.e., ACQUIRE is only
> a barrier to hoisting shared memory code... stores can be moved
> downward past this mode.  RELEASE is only a barrier to sinking code.
> RELAXED is no barrier at all to code motion.  In fact, a relaxed store
> is barely different from a real store... but there is a slight
> difference, so we can't make it a normal store :-P.
>
> By teaching the other parts of the compiler about a GIMPLE_ATOMIC, we
> could hopefully lessen their impact eventually.

Ok.  I suppose having a GIMPLE_ATOMIC is fine then.

+ /* In tree-atomic.c.  */
+ extern bool expand_gimple_atomic_load (gimple);

err, gimple-atomic.c please ;)

+ /* Return the expression field of atomic operation GS.  */
+
+ static inline tree
+ gimple_atomic_expr (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_expr (gs));
+   return gimple_op (gs, 2);
+ }

err - what's "expression" in this context?  I hope it's not an arbitrary
tcc_expression tree?!

+ static inline bool
+ gimple_atomic_has_fail_order (const_gimple gs)
+ {
+   return gimple_atomic_kind (gs) == GIMPLE_ATOMIC_COMPARE_EXCHANGE;
+ }

btw, these kinds of predicates look superfluous to me - if they are true
for exactly one atomic kind then users should test for that specific
atomic kind, not for some random field presence.

+ /* Return the arithmetic operation tree code for atomic operation
+    GS.  */
+
+ static inline enum tree_code
+ gimple_atomic_op_code (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_kind (gs) == GIMPLE_ATOMIC_FETCH_OP ||
+               gimple_atomic_kind (gs) == GIMPLE_ATOMIC_OP_FETCH);
+   return (enum tree_code) gs->gsbase.subcode;
+ }

now, what was it with this "expression" thing again?  Btw, ||s go to the
next line.  subcode should be the atomic kind - it seems that you glob
too much into GIMPLE_ATOMIC and that you maybe should sub-class
GIMPLE_ATOMIC properly via the subcode field.  You'd have a
gimple_atomic_base which fences could use, for example.

All asserts in inline functions should be gcc_checking_asserts.

+ gimple
+ gimple_build_atomic_load (tree type, tree target, tree order)
+ {
+   gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, 3);
+   gimple_atomic_set_kind (s, GIMPLE_ATOMIC_LOAD);
+   gimple_atomic_set_order (s, order);
+   gimple_atomic_set_target (s, target);
+   gimple_atomic_set_type (s, type);
+   gimple_set_has_volatile_ops (s, true);

you should have a gimple_build_atomic_raw function that takes all
operands and the atomic kind, avoiding the need for all the repeated
calls of gimple_atomic_set_* as well as avoiding all the repeated
checking this causes.
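Something along these lines, perhaps (an editorial sketch built from the
patch's own setters; the operand count stays a parameter since each
atomic kind carries a different number of trees):

  /* Build a GIMPLE_ATOMIC statement of kind KIND with NUM_OPS operands,
     setting the fields common to all atomic kinds exactly once.  */
  static gimple
  gimple_build_atomic_raw (enum gimple_atomic_kind kind, tree type,
                           tree target, tree order, unsigned num_ops)
  {
    gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, num_ops);
    gimple_atomic_set_kind (s, kind);
    gimple_atomic_set_order (s, order);
    gimple_atomic_set_target (s, target);
    gimple_atomic_set_type (s, type);
    gimple_set_has_volatile_ops (s, true);
    return s;
  }

  gimple
  gimple_build_atomic_load (tree type, tree target, tree order)
  {
    return gimple_build_atomic_raw (GIMPLE_ATOMIC_LOAD, type, target,
                                    order, 3);
  }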
+ else if (is_gimple_atomic (stmt))
+   {
+     tree t;
+     if (visit_store)
+       {
+         for (i = 0; i < gimple_atomic_num_lhs (stmt); i++)
+           {
+             t = gimple_atomic_lhs (stmt, i);
+             if (t)
+               {
+                 t = get_base_loadstore (t);
+                 if (t)
+                   ret |= visit_store (stmt, t, data);
+               }

I thought results are always registers?  The
walk_stmt_load_store_addr_ops handling looks wrong to me.  As the
loads/stores are implicit (you only have addresses), you have to adjust
all callers of walk_stmt_load_store_addr_ops to handle atomics
specially, as they expect to come along all loads/stores that way.
Those also handle calls and asms (for the "memory" clobber) specially.

+ case GIMPLE_ATOMIC:
+   /* Atomic operations are memory barriers in both directions for
+      now.  */
+   add_virtual_operand (stmt, opf_def | opf_use);

It's surely not relevant that this is "in both directions for now"; the
relevant point is that "atomic operations have side-effects on memory."

+   for (n = 0; n < gimple_atomic_num_lhs (stmt); n++)
+     get_expr_operands (stmt, gimple_atomic_lhs_ptr (stmt, n), opf_def);
+   for (n = 0; n < gimple_atomic_num_rhs (stmt); n++)
+     get_expr_operands (stmt, gimple_op_ptr (stmt, n), opf_use);
+   break;

Do address-takens in operands make the addresses escape?  If not, you
should pass opf_non_addressable as well.

Index: tree-ssa-alias.c
===================================================================
*** tree-ssa-alias.c    (revision 186098)
--- tree-ssa-alias.c    (working copy)
*************** ref_maybe_used_by_stmt_p (gimple stmt, t
*** 1440,1445 ****
--- 1440,1447 ----
      }
    else if (is_gimple_call (stmt))
      return ref_maybe_used_by_call_p (stmt, ref);
+   else if (is_gimple_atomic (stmt))
+     return true;

please add a comment before these atomic handlings that we assume they
are using/clobbering all refs because they are considered memory
optimization barriers.  Btw, does this apply to non-address-taken
automatic references?  I suppose not.  Consider:

  int foo()
  {
    struct { int a; } s;
    atomic_fence();
    s.a = 1;
    atomic_fence();
    return s.a;
  }

we should still be able to optimize this to return 1, no?  At least SRA
will happily do similar things in a non-flow-sensitive way.  Please add
a FIXME to the alias predicates at least, or even better fix this missed
optimization.  There is no need to establish "backwards
missed-optimization compatibility" just like we do for asms.

*************** stmt_kills_ref_p_1 (gimple stmt, ao_ref
*** 1814,1819 ****
--- 1818,1825 ----
        }
      }
+   if (is_gimple_atomic (stmt))
+     return true;

that's for sure wrong ;)  It should return false.

Index: tree-ssa-sink.c
===================================================================
*** tree-ssa-sink.c     (revision 186098)
--- tree-ssa-sink.c     (working copy)
*************** is_hidden_global_store (gimple stmt)
*** 145,150 ****
--- 145,154 ----
  {
    tree lhs;

+   /* Don't optimize across an atomic operation.  */
+   if (is_gimple_atomic (stmt))
+     return true;
+

that's bogus, too (really, all uses of is_hidden_global_store should go
away).  Please look into the few callers of this function and handle
atomics in a correct way explicitly.

+ else if (is_gimple_atomic (stmt))
+   {
+     unsigned n;
+
+     /* We may be able to lessen this with more relaxed memory
+        models, but for now, its a full barrier.  */
+     mark_all_reaching_defs_necessary (stmt);
+
+     for (n = 0; n < gimple_atomic_num_rhs (stmt); n++)
+       {
+         tree t = gimple_op (stmt, n);
+         if (TREE_CODE (t) != SSA_NAME &&
+             TREE_CODE (t) != INTEGER_CST &&
+             !is_gimple_min_invariant (t) &&
+             !ref_may_be_aliased (t))
+           mark_aliased_reaching_defs_necessary (stmt, t);
+       }
+   }

for sure atomics do not make non-aliased automatic variable stores
necessary.  At least I hope so.  As there are only addresses in the
atomic ops, the code looks wrong to me (and &&s go to the next line
...).  As you are marking everything needed anyway, you can just remove
the bogus loop completely.
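Concretely, that reduces the hunk to something like (an editorial sketch
of the suggested simplification, not patch code):

  else if (is_gimple_atomic (stmt))
    {
      /* Atomics are full optimization barriers for now; more relaxed
         memory models may allow this to be loosened later.  Everything
         reaching the statement is necessary, so no per-operand walk is
         needed.  */
      mark_all_reaching_defs_necessary (stmt);
    }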
+ case GIMPLE_ATOMIC:
+   /* Treat this like a call for now, it may expand into a call.  */
+   if (gimple_atomic_kind (stmt) != GIMPLE_ATOMIC_FENCE)
+     cost = gimple_num_ops (stmt) *
+            estimate_move_cost (TREE_TYPE (gimple_atomic_target (stmt)));
+   else
+     cost = 1;
+   break;

for sure this counts way too many "ops".  There is at most a single
memory operand, as far as I understand.  A size/speed cost of 1 for a
FENCE is also too low.

I miss handling of GIMPLE_ATOMICs in tree-ssa-structalias.c.

Richard.

> Andrew
>