> On Sep 15, 2020, at 4:11 AM, Richard Sandiford <richard.sandif...@arm.com>
> wrote:
>
> Qing Zhao <qing.z...@oracle.com <mailto:qing.z...@oracle.com>> writes:
>>> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandif...@arm.com>
>>> wrote:
>>>
>>> Qing Zhao <qing.z...@oracle.com <mailto:qing.z...@oracle.com>> writes:
>>>>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford
>>>>> <richard.sandif...@arm.com> wrote:
>>>>>
>>>>> Qing Zhao <qing.z...@oracle.com> writes:
>>>>>>> Like I mentioned earlier though, passes that run after
>>>>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>>>>> weren't previously used. For example, on x86_64, the function might
>>>>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>>>>> but pass_regrename might later introduce a new use of %r8. AIUI,
>>>>>>> the “used” version of the new command-line option is supposed to clear
>>>>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>>>>> collected at the point that the return is generated.
>>>>>>
>>>>>> Thanks for the information.
>>>>>>
>>>>>>>
>>>>>>> That's why I think it's more robust to do this later (at the beginning
>>>>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>>>>> already exist.
>>>>>>
>>>>>> Yes, looks like it’s not correct to insert the zeroing at the time when
>>>>>> prologue, epilogue and return are generated.
>>>>>> As I also checked, “return” might be also generated as late as pass
>>>>>> “pass_delay_slots”, So, shall we move the
>>>>>> New pass as late as possible?
>>>>>
>>>>> If we insert the zeroing before pass_delay_slots and describe the
>>>>> result correctly, pass_delay_slots should do the right thing.
>>>>>
>>>>> Describing the result correctly includes ensuring that the cleared
>>>>> registers are treated as live on exit from the function, so that the
>>>>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>>>>
>>>> In the current implementation for x86, when we generating a zeroing insn
>>>> as the following:
>>>>
>>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>> (const_int 0 [0])) "t10.c":11:1 -1
>>>> (nil))
>>>> (insn 19 18 20 2 (unspec_volatile [
>>>> (reg:SI 1 dx)
>>>> ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>> (nil))
>>>>
>>>> i.e, after each zeroing insn, the register that is zeroed is marked as
>>>> “UNSPECV_PRO_EPILOGUE_USE”,
>>>> By doing this, we can avoid this zeroing insn from being deleted or
>>>> skipped.
>>>>
>>>> Is doing this enough to describe the result correctly?
>>>> Is there other thing we need to do in addition to this?
>>>
>>> I guess that works, but I think it would be better to abstract
>>> EPILOGUE_USES into a new target-independent wrapper function that
>>> (a) returns true if EPILOGUE_USES itself returns true and (b) returns
>>> true for registers that need to be zero on return, if the zeroing
>>> instructions have already been inserted. The places that currently
>>> test EPILOGUE_USES should then test this new wrapper function instead.
>>
>> Okay, I see.
>> Looks like that EPILOGUE_USES is used in df-scan.c to compute the data flow
>> information. If EPILOUGE_USES return true
>> for registers that need to be zeroed on return, those registers will be
>> included in the data flow information, as a result, later
>> passes will not be able to delete them.
>>
>> This sounds to be a cleaner approach than the current one that marks the
>> registers “UNSPECV_PRO_EPILOGUE_USE”.
>>
>> A more detailed implementation question on this:
>> Where should I put this new target-independent wrapper function in? Which
>> header file will be a proper place to hold this new function?
>
> Not a strong opinion, but: maybe df.h and df-scan.c, since this is
> really a DF query.
Okay.
>
>>> After inserting the zeroing instructions, the pass should recompute the
>>> live-out sets based on this.
>
> Sorry, I was wrong here. It should *cause* the sets to be recomputed
> where necessary (rather than recompute them directly), but see below.
>
>> Is only computing the live-out sets of the block that including the return
>> insn enough? Or we should re-compute the whole procedure?
>>
>> Which utility routine I should use to recompute the live-out sets?
>
> Inserting the instructions will cause the containing block to be marked
> dirty, via df_set_bb_dirty. I think the pass should also call
> df_set_bb_dirty on the exit block itself, to indicate that the
> wrapper around EPILOGUE_USES has changed behaviour, but that might
> not be necessary.
>
> This gives the df machinery enough information to work out what has changed.
> It will then propagate those changes throughout the function. (I don't
> think any propagation would be necessary here, but if I'm wrong about that,
> then the df machinery will do whatever propagation is necessary.)
>
> However, the convention is for a pass that uses the df machinery to call
> df_analyze first. This call to df_analyze updates any stale df information.
>
> So unlike what I said yesterday, the pass itself doesn't need to make sure
> that the df information is up-to-date. It just needs to indicate what
> has changed, as above.
>
> In the case of pass_delay_slots, pass_free_cfg has:
>
> /* The resource.c machinery uses DF but the CFG isn't guaranteed to be
> valid at that point so it would be too late to call df_analyze. */
> if (DELAY_SLOTS && optimize > 0 && flag_delayed_branch)
> {
> df_note_add_problem ();
> df_analyze ();
> }
>
> Any other machine-specific passes that use df already need to call
> df_analyze (if they use the df machinery). So simply marking what
> has changed is enough (by design).
So, in this new pass, I need:
1. Call “df_analyze” in the beginning to get the up-to-data df information;
2. After generating the zero insns, mark the containing block with
“df_set_bb_dirty”.
3. mark the exit block with “df_set_bb_dirty” to indicate the wrapper around
EPILOGUE_USES changed
Behavior. (This might not need since “df_analyze” in the next pass will
call EPILOGUE_USES automatically? )
Is the above enough for DF?
(BTW, how expensive to call “df_analyze”?)
>
>> My understanding is:
>>
>> In our new pass that is put in the beginning of the pass_late_compilation,
>> I,e pass_zero_call_used_regs;
>>
>> PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
>> ++++ NEXT_PASS (pass_zero_call_used_regs);
>> NEXT_PASS (pass_compute_alignments);
>> NEXT_PASS (pass_variable_tracking);
>> NEXT_PASS (pass_free_cfg);
>> NEXT_PASS (pass_machine_reorg);
>> NEXT_PASS (pass_cleanup_barriers);
>> NEXT_PASS (pass_delay_slots);
>>
>> When we scan the EXIT BLOCK of the routine, all the return insns have
>> already been there.
>> The later passes including “pass_delay_slots” will not generate additional
>> returns anymore, they might just call “target.gen_return” or
>> “target.gen_simple_return() to replace
>> “ret_rtx” or “simple_ret_rtx” ?
>
> Kind-of. pass_delay_slots can also duplicate code, so it's not always a
> straight replacement. But the point is that returns don't appear out of
> nowhere. There has to be a semantic reason for them to exist. The
> behaviour of the function after pass_delay_slots has to be the same
> as it was before the pass (disregarding undefined behaviour). Once we've
> added clearing of the zero registers to all return paths, that clearing
> becomes part of the behaviour of the function, and so will be part of
> the behaviour after pass_delay_slots as well.
>
> So I don't think the problem is with passes generating new returns.
> It's more whether they could use new registers that then need to be
> cleared, which is the main justification for running the new pass
> so late in the pipeline.
agreed.
>
> In principle, there's nothing stopping pass_delay_slots allocating
> new registers (like pass_regrename does), and in principle that could
> introduce the need to do more clearing. But I don't think the current
> pass does that. The pass is also very much legacy code at this point,
> so the chances of new optimisations being added to it are fairly low.
> If that did happen, I think it would be reasonable to expect the pass
> to work within the set of registers that have already been allocated,
> at least when your new option is in effect.
Okay, thanks for the information.
Qing
>
> Thanks,
> Richard