On Tue, Jan 31, 2017 at 10:32 AM, Richard Henderson <r...@redhat.com> wrote:
> On 01/30/2017 05:10 PM, Andrew Waterman wrote:
>>>> +(define_expand "clear_cache"
>>>> +  [(match_operand 0 "pmode_register_operand")
>>>> +   (match_operand 1 "pmode_register_operand")]
>>>> +  ""
>>>> +  "
>>>> +{
>>>> +  emit_insn (gen_fence_i ());
>>>> +  DONE;
>>>> +}")
>>>
>>>
>>> Do you need a FENCE before the FENCE.I?
>>
>> It's actually not clear to me what the semantics of clear_cache are
>> for multiprocessors.  Can you shed some light?
>>
>> If thread A modifies code and then sets a flag, and thread B then reads
>> the flag and executes a FENCE.I, then thread A needs a FENCE before
>> setting the flag and thread B needs a FENCE before the FENCE.I.  But
>> is it not the software's responsibility to insert both fences, rather
>> than assuming one of the fences is folded into clear_cache?
>
> Your introduction of "flag" confuses the issue.
>
> Having re-read the description in section 2.7, I see that FENCE.I is
> thread-local and is all that is required for a single thread to sync its
> own I and D caches.  I think perhaps I'd mis-read or mis-remembered before.
>
> Practically speaking, I'm not sure we have put any real thought into what
> needs to happen for threads using on-stack trampolines.  Certainly no other
> gcc port attempts to broadcast the need for an icache flush to other cpus.
>
> So just leave as-is -- __builtin_clear_cache works properly for the local
> thread.
Sorry, I jumped right to the assumption that you were suggesting a FENCE
for multiprocessor synchronization.  Indeed, it's not necessary for
enforcing ordering between local stores and FENCE.I.  (For reference,
I've spelled out the cross-thread protocol in a sketch at the bottom of
this message.)

>
>
>>>> +(define_insn "call_value_multiple_internal"
>>>> +  [(set (match_operand 0 "register_operand" "")
>>>> +        (call (mem:SI (match_operand 1 "call_insn_operand" "l,S"))
>>>> +              (match_operand 2 "" "")))
>>>> +   (set (match_operand 3 "register_operand" "")
>>>> +        (call (mem:SI (match_dup 1))
>>>> +              (match_dup 2)))
>>>
>>>
>>> Any reason for this?  Your return value registers are sequential.  The
>>> normal thing to do is just use e.g. (reg:TI 10).
>>
>> I think we'd need different patterns for mixed int/FP struct returns
>> (which use a0 and fa0) if we took that approach.
>
> Ah.  Other targets get away with using a PARALLEL.
>
> From sparc.c, function_arg_record_value:
>
>   data.ret = gen_rtx_PARALLEL (mode, rtvec_alloc (data.stack + nregs));
>
> and sparc.md:
>
> (define_insn "*call_value_symbolic_sp64"
>   [(set (match_operand 0 "" "")
>         (call (mem:DI (match_operand:DI 1 "symbolic_operand" "s"))
>               (match_operand 2 "" "")))
>    (clobber (reg:DI O7_REG))]
>
> So you wind up with things like
>
>   (set (parallel [
>          (reg:DI o0)
>          (reg:DF fp0)
>          (reg:DI o1)
>          (reg:DF fp1)
>        ])
>        (call (mem:DI (symbol_ref:DI "function"))
>              (const_int 0)))
>
> We do the same for x86_64 -- in function_value_64:
>
>   ret = construct_container (mode, orig_mode, valtype, 1,
>                              X86_64_REGPARM_MAX, X86_64_SSE_REGPARM_MAX,
>                              x86_64_int_return_registers, 0);

Thanks--these pointers were quite helpful.  (A rough sketch of the
PARALLEL I have in mind for the a0/fa0 case is also at the bottom.)

>
>
> r~
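
For reference, here is the cross-thread protocol I was describing, spelled
out.  This is only a sketch of what application code would have to do
itself for cross-modifying code; the flag handshake is just my earlier
example and none of it is emitted by the port:

  /* Illustrative only: the flag-based handshake from the example earlier
     in this thread.  Nothing here is generated by the compiler.  */

  #include <stddef.h>
  #include <stdint.h>

  static volatile int code_ready;

  /* Thread A: write the new instructions, then publish them.  */
  void
  publish_code (uint32_t *dst, const uint32_t *src, size_t n)
  {
    for (size_t i = 0; i < n; i++)
      dst[i] = src[i];

    /* FENCE orders the instruction stores before the flag store.  */
    __asm__ volatile ("fence" ::: "memory");
    code_ready = 1;
  }

  /* Thread B: wait for the flag, then sync its own instruction fetches.  */
  void
  run_code (void (*fn) (void))
  {
    while (!code_ready)
      ;

    /* B's own FENCE before FENCE.I, per the protocol above.  FENCE.I only
       synchronizes the executing hart, which is what clear_cache expands
       to on this port.  */
    __asm__ volatile ("fence" ::: "memory");
    __asm__ volatile ("fence.i" ::: "memory");
    fn ();
  }

Both fences and B's FENCE.I stay in software; __builtin_clear_cache only
covers the last step, for the local thread, which matches what the
define_expand above emits.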
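
And for the mixed int/FP return, roughly the shape I have in mind,
following the sparc/x86_64 examples.  A rough sketch only: GP_RETURN and
FP_RETURN stand in for the a0 and fa0 hard register numbers, the struct
and byte offsets assume RV64 and a hypothetical struct { long x; double
y; }, and the wiring into TARGET_FUNCTION_VALUE is omitted:

  /* Sketch for riscv.c (after the usual config.h/system.h/rtl.h includes),
     not the final code.  */

  static rtx
  riscv_mixed_struct_value (machine_mode mode)
  {
    rtx ret = gen_rtx_PARALLEL (mode, rtvec_alloc (2));

    /* First field lives in a0 at byte offset 0.  */
    XVECEXP (ret, 0, 0)
      = gen_rtx_EXPR_LIST (VOIDmode, gen_rtx_REG (DImode, GP_RETURN),
                           const0_rtx);

    /* Second field lives in fa0 at byte offset 8.  */
    XVECEXP (ret, 0, 1)
      = gen_rtx_EXPR_LIST (VOIDmode, gen_rtx_REG (DFmode, FP_RETURN),
                           GEN_INT (8));

    return ret;
  }

If that works out, the extra (set ...) in call_value_multiple_internal
shouldn't be needed, as you suggest.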