On Tue, Jan 31, 2017 at 10:32 AM, Richard Henderson <r...@redhat.com> wrote:
> On 01/30/2017 05:10 PM, Andrew Waterman wrote:
>>>> +(define_expand "clear_cache"
>>>> +  [(match_operand 0 "pmode_register_operand")
>>>> +   (match_operand 1 "pmode_register_operand")]
>>>> +  ""
>>>> +  "
>>>> +{
>>>> +  emit_insn (gen_fence_i ());
>>>> +  DONE;
>>>> +}")
>>>
>>>
>>> Do you need a FENCE before the FENCE.I?
>>
>> It's actually not clear to me what the semantics of clear_cache are
>> for multiprocessors.  Can you shed some light?
>>
>> If thread A modifies code and then sets a flag, and thread B then reads
>> the flag and executes a FENCE.I, then thread A needs a FENCE before
>> setting the flag and thread B needs a FENCE before the FENCE.I.  But
>> is it not the software's responsibility to insert both fences, rather
>> than assuming one of the fences is folded into clear_cache?
>
> Your introduction of "flag" confuses the issue.
>
> Having re-read the description in section 2.7, I see that FENCE.I is
> thread-local and is all that is required for a single thread to sync its
> own I and D caches.  I think perhaps I'd mis-read or mis-remembered before.
>
> Practically speaking, I'm not sure we have put any real thought into what
> needs to happen for threads using on-stack trampolines.  Certainly no other
> gcc port attempts to broadcast the need for an icache flush to other cpus.
>
> So just leave as-is -- __builtin_clear_cache works properly for the local
> thread.
Sorry, I jumped right to the assumption that you were suggesting a FENCE
for multiprocessor synchronization.  Indeed, it's not necessary for
enforcing ordering between local stores and FENCE.I.  (For reference,
I've spelled out the cross-thread protocol in a sketch at the bottom of
this message.)

>
>
>>>> +(define_insn "call_value_multiple_internal"
>>>> +  [(set (match_operand 0 "register_operand" "")
>>>> +        (call (mem:SI (match_operand 1 "call_insn_operand" "l,S"))
>>>> +              (match_operand 2 "" "")))
>>>> +   (set (match_operand 3 "register_operand" "")
>>>> +        (call (mem:SI (match_dup 1))
>>>> +              (match_dup 2)))
>>>
>>>
>>> Any reason for this?  Your return value registers are sequential.  The
>>> normal thing to do is just use e.g. (reg:TI 10).
>>
>> I think we'd need different patterns for mixed int/FP struct returns
>> (which use a0 and fa0) if we took that approach.
>
> Ah.  Other targets get away with using a PARALLEL.
>
> From sparc.c, function_arg_record_value:
>
>   data.ret = gen_rtx_PARALLEL (mode, rtvec_alloc (data.stack + nregs));
>
> and sparc.md:
>
> (define_insn "*call_value_symbolic_sp64"
>   [(set (match_operand 0 "" "")
>         (call (mem:DI (match_operand:DI 1 "symbolic_operand" "s"))
>               (match_operand 2 "" "")))
>    (clobber (reg:DI O7_REG))]
>
> So you wind up with things like
>
>   (set (parallel [
>          (reg:DI o0)
>          (reg:DF fp0)
>          (reg:DI o1)
>          (reg:DF fp1)
>        ])
>        (call (mem:DI (symbol_ref:DI "function"))
>              (const_int 0)))
>
> We do the same for x86_64 -- in function_value_64:
>
>   ret = construct_container (mode, orig_mode, valtype, 1,
>                              X86_64_REGPARM_MAX, X86_64_SSE_REGPARM_MAX,
>                              x86_64_int_return_registers, 0);

Thanks--these pointers were quite helpful.  (A rough sketch of the
PARALLEL I have in mind for the a0/fa0 case is also at the bottom.)

>
>
> r~
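
For reference, here is the cross-thread protocol I was describing, spelled
out.  This is only a sketch of what application code would have to do
itself for cross-modifying code; the flag handshake is just my earlier
example and none of it is emitted by the port:

  /* Illustrative only: the flag-based handshake from the example earlier
     in this thread.  Nothing here is generated by the compiler.  */

  #include <stddef.h>
  #include <stdint.h>

  static volatile int code_ready;

  /* Thread A: write the new instructions, then publish them.  */
  void
  publish_code (uint32_t *dst, const uint32_t *src, size_t n)
  {
    for (size_t i = 0; i < n; i++)
      dst[i] = src[i];

    /* FENCE orders the instruction stores before the flag store.  */
    __asm__ volatile ("fence" ::: "memory");
    code_ready = 1;
  }

  /* Thread B: wait for the flag, then sync its own instruction fetches.  */
  void
  run_code (void (*fn) (void))
  {
    while (!code_ready)
      ;

    /* B's own FENCE before FENCE.I, per the protocol above.  FENCE.I only
       synchronizes the executing hart, which is what clear_cache expands
       to on this port.  */
    __asm__ volatile ("fence" ::: "memory");
    __asm__ volatile ("fence.i" ::: "memory");
    fn ();
  }

Both fences and B's FENCE.I stay in software; __builtin_clear_cache only
covers the last step, for the local thread, which matches what the
define_expand above emits.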
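
And for the mixed int/FP return, roughly the shape I have in mind,
following the sparc/x86_64 examples.  A rough sketch only: GP_RETURN and
FP_RETURN stand in for the a0 and fa0 hard register numbers, the struct
and byte offsets assume RV64 and a hypothetical struct { long x; double
y; }, and the wiring into TARGET_FUNCTION_VALUE is omitted:

  /* Sketch for riscv.c (after the usual config.h/system.h/rtl.h includes),
     not the final code.  */

  static rtx
  riscv_mixed_struct_value (machine_mode mode)
  {
    rtx ret = gen_rtx_PARALLEL (mode, rtvec_alloc (2));

    /* First field lives in a0 at byte offset 0.  */
    XVECEXP (ret, 0, 0)
      = gen_rtx_EXPR_LIST (VOIDmode, gen_rtx_REG (DImode, GP_RETURN),
                           const0_rtx);

    /* Second field lives in fa0 at byte offset 8.  */
    XVECEXP (ret, 0, 1)
      = gen_rtx_EXPR_LIST (VOIDmode, gen_rtx_REG (DFmode, FP_RETURN),
                           GEN_INT (8));

    return ret;
  }

If that works out, the extra (set ...) in call_value_multiple_internal
shouldn't be needed, as you suggest.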