Hello,

On Fri, Jan 10 2025, Martin Jambor wrote:
> Hello,
>
> On Wed, Dec 11 2024, Martin Jambor wrote:
>> Hello,
>>
>> even though it is not my work, I would like to ping this patch.  Having
>> it upstream would really help us a lot.
>>
>
> Please, pretty please, consider reviewing this in time for GCC 15,
> having it upstream would really help us a lot and from what I can tell,
> it should no longer be controversial.

Even though we are already in stage4, I believe the risk of accepting
the patch are tiny compared to the benefits it adds.  Can I therefore
still ping it, please?

Thanks,

Martin


>
> Thank you very much in advance,
>
> Martin
>
>>
>> On Wed, Nov 13 2024, Michael Matz wrote:
>>> Hello,
>>>
>>> this is essentially 
>>>
>>>   https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html
>>>
>>> from Kewen in functionality.  When discussing this with Segher at the 
>>> Cauldron he expressed reservations about changing the default 
>>> implementation of -fpatchable-function-entry.  So, to move forward, let's 
>>> move it under a new target option -msplit-patch-nops (expressing the 
>>> important deviation from the default behaviour, namely that all the 
>>> patching nops form a consecutive sequence normally).
>>>
>>> Regstrapping on power9 ppc64le in progress.  Okay if that passes?
>>>
>>>
>>> Ciao,
>>> Michael.
>>>
>>> ---
>>>
>>> as the bug report details some uses of -fpatchable-function-entry
>>> aren't happy with the "before" NOPs being inserted between global and
>>> local entry point on powerpc.  We want the before NOPs be in front
>>> of the global entry point.  That means that the patching NOPs aren't
>>> consecutive for dual entry point functions, but for these usecases
>>> that's not the problem.  But let us support both under the control
>>> of a new target option: -msplit-patch-nops.
>>>
>>>     gcc/
>>>
>>>         PR target/112980
>>>         * config/rs6000/rs6000.opt (msplit-patch-nops): New option.
>>>         * doc/invoke.texi (RS/6000 and PowerPC Options): Document it.
>>>         * config/rs6000/rs6000.h (machine_function.stop_patch_area_print):
>>>         New member.
>>>         * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
>>>         Emit split nops under control of that one.
>>>         * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
>>>         Add handling of split patch nops.
>>> ---
>>>  gcc/config/rs6000/rs6000-logue.cc | 15 +++++++++------
>>>  gcc/config/rs6000/rs6000.cc       | 27 +++++++++++++++++++++++----
>>>  gcc/config/rs6000/rs6000.h        |  6 ++++++
>>>  gcc/config/rs6000/rs6000.opt      |  4 ++++
>>>  gcc/doc/invoke.texi               | 17 +++++++++++++++--
>>>  5 files changed, 57 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
>>> b/gcc/config/rs6000/rs6000-logue.cc
>>> index c87058b435e..aa1e0442f2b 100644
>>> --- a/gcc/config/rs6000/rs6000-logue.cc
>>> +++ b/gcc/config/rs6000/rs6000-logue.cc
>>> @@ -4005,8 +4005,8 @@ rs6000_output_function_prologue (FILE *file)
>>>  
>>>        unsigned short patch_area_size = crtl->patch_area_size;
>>>        unsigned short patch_area_entry = crtl->patch_area_entry;
>>> -      /* Need to emit the patching area.  */
>>> -      if (patch_area_size > 0)
>>> +      /* Emit non-split patching area now.  */
>>> +      if (!TARGET_SPLIT_PATCH_NOPS && patch_area_size > 0)
>>>     {
>>>       cfun->machine->global_entry_emitted = true;
>>>       /* As ELFv2 ABI shows, the allowable bytes between the global
>>> @@ -4027,7 +4027,6 @@ rs6000_output_function_prologue (FILE *file)
>>>                    patch_area_entry);
>>>           rs6000_print_patchable_function_entry (file, patch_area_entry,
>>>                                                  true);
>>> -         patch_area_size -= patch_area_entry;
>>>         }
>>>     }
>>>  
>>> @@ -4037,9 +4036,13 @@ rs6000_output_function_prologue (FILE *file)
>>>        assemble_name (file, name);
>>>        fputs ("\n", file);
>>>        /* Emit the nops after local entry.  */
>>> -      if (patch_area_size > 0)
>>> -   rs6000_print_patchable_function_entry (file, patch_area_size,
>>> -                                          patch_area_entry == 0);
>>> +      if (patch_area_size > patch_area_entry)
>>> +   {
>>> +     patch_area_size -= patch_area_entry;
>>> +     cfun->machine->stop_patch_area_print = false;
>>> +     rs6000_print_patchable_function_entry (file, patch_area_size,
>>> +                                            patch_area_entry == 0);
>>> +   }
>>>      }
>>>  
>>>    else if (rs6000_pcrel_p ())
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 950fd947fda..6427e6913ba 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -15226,11 +15226,25 @@ rs6000_print_patchable_function_entry (FILE *file,
>>>  {
>>>    bool global_entry_needed_p = rs6000_global_entry_point_prologue_needed_p 
>>> ();
>>>    /* For a function which needs global entry point, we will emit the
>>> -     patchable area before and after local entry point under the control of
>>> -     cfun->machine->global_entry_emitted, see the handling in function
>>> -     rs6000_output_function_prologue.  */
>>> -  if (!global_entry_needed_p || cfun->machine->global_entry_emitted)
>>> +     patchable area when it isn't split before and after local entry point
>>> +     under the control of cfun->machine->global_entry_emitted, see the
>>> +     handling in function rs6000_output_function_prologue.  */
>>> +  if (!TARGET_SPLIT_PATCH_NOPS
>>> +      && (!global_entry_needed_p || cfun->machine->global_entry_emitted))
>>>      default_print_patchable_function_entry (file, patch_area_size, 
>>> record_p);
>>> +
>>> +  /* For split patch nops we emit the before nops (from generic code)
>>> +     in front of the global entry point and after the local entry point,
>>> +     under the control of cfun->machine->stop_patch_area_print, see
>>> +     rs6000_output_function_prologue and rs6000_elf_declare_function_name. 
>>>  */
>>> +  if (TARGET_SPLIT_PATCH_NOPS)
>>> +    {
>>> +      if (!cfun->machine->stop_patch_area_print)
>>> +   default_print_patchable_function_entry (file, patch_area_size,
>>> +                                           record_p);
>>> +      else
>>> +   gcc_assert (global_entry_needed_p);
>>> +    }
>>>  }
>>>  
>>>  enum rtx_code
>>> @@ -21423,6 +21437,11 @@ rs6000_elf_declare_function_name (FILE *file, 
>>> const char *name, tree decl)
>>>        fprintf (file, "\t.previous\n");
>>>      }
>>>    ASM_OUTPUT_FUNCTION_LABEL (file, name, decl);
>>> +  /* At this time, the "before" NOPs have been already emitted.
>>> +     For split nops stop generic code from printing the "after" NOPs and
>>> +     emit them just after local entry ourself later.  */
>>> +  if (rs6000_global_entry_point_prologue_needed_p ())
>>> +    cfun->machine->stop_patch_area_print = true;
>>>  }
>>>  
>>>  static void rs6000_elf_file_end (void) ATTRIBUTE_UNUSED;
>>> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
>>> index d460eb06544..2bd56f8d377 100644
>>> --- a/gcc/config/rs6000/rs6000.h
>>> +++ b/gcc/config/rs6000/rs6000.h
>>> @@ -2424,6 +2424,12 @@ typedef struct GTY(()) machine_function
>>>       global entry.  It helps to control the patchable area before and after
>>>       local entry.  */
>>>    bool global_entry_emitted;
>>> +  /* With ELFv2 ABI dual entry points being adopted, generic framework
>>> +     targetm.asm_out.print_patchable_function_entry would generate "after"
>>> +     NOPs before local entry, which is wrong.  This flag is to stop it from
>>> +     printing patch area before local entry, it is only useful when the
>>> +     function requires dual entry points.  */
>>> +  bool stop_patch_area_print;
>>>  } machine_function;
>>>  #endif
>>>  
>>> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
>>> index 94323bd1db2..27a165ef0ee 100644
>>> --- a/gcc/config/rs6000/rs6000.opt
>>> +++ b/gcc/config/rs6000/rs6000.opt
>>> @@ -300,6 +300,10 @@ mfull-toc
>>>  Target
>>>  Put everything in the regular TOC.
>>>  
>>> +msplit-patch-nops
>>> +Target Var(TARGET_SPLIT_PATCH_NOPS) Init(0)
>>> +Emit NOPs before global and after local entry point for 
>>> -fpatchable-function-entry.
>>> +
>>>  mvrsave
>>>  Target Var(TARGET_ALTIVEC_VRSAVE) Save
>>>  Generate VRSAVE instructions when generating AltiVec code.
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index 93e9af3791c..f3d37a48fd0 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -1318,6 +1318,7 @@ See RS/6000 and PowerPC Options.
>>>  -mtraceback=@var{traceback_type}
>>>  -maix-struct-return  -msvr4-struct-return
>>>  -mabi=@var{abi-type}  -msecure-plt  -mbss-plt
>>> +-msplit-patch-nops
>>>  -mlongcall  -mno-longcall  -mpltseq  -mno-pltseq
>>>  -mblock-move-inline-limit=@var{num}
>>>  -mblock-compare-inline-limit=@var{num}
>>> @@ -18569,11 +18570,12 @@ If @code{N=0}, no pad location is recorded.
>>>  The NOP instructions are inserted at---and maybe before, depending on
>>>  @var{M}---the function entry address, even before the prologue.  On
>>>  PowerPC with the ELFv2 ABI, for a function with dual entry points,
>>> -the local entry point is this function entry address.
>>> +the local entry point is this function entry address by default.  See
>>> +the @option{-msplit-patch-nops} option to change this.
>>>  
>>>  The maximum value of @var{N} and @var{M} is 65535.  On PowerPC with the
>>>  ELFv2 ABI, for a function with dual entry points, the supported values
>>> -for @var{M} are 0, 2, 6 and 14.
>>> +for @var{M} are 0, 2, 6 and 14 when not using @option{-msplit-patch-nops}.
>>>  @end table
>>>  
>>>  
>>> @@ -31658,6 +31660,17 @@ requires @code{.plt} and @code{.got}
>>>  sections that are both writable and executable.
>>>  This is a PowerPC 32-bit SYSV ABI option.
>>>  
>>> +@opindex msplit-patch-nops
>>> +@item -msplit-patch-nops
>>> +When adding NOPs for a patchable area via the
>>> +@option{-fpatchable-function-entry} option emit the "before" NOPs in front
>>> +of the global entry point and the "after" NOPs after the local entry point.
>>> +This makes the sequence of NOPs not consecutive when a global entry point
>>> +is generated.  Without this option the NOPs are emitted directly before and
>>> +after the local entry point, making them consecutive but moving global and
>>> +local entry point further apart.  If only a single entry point is generated
>>> +this option has no effect.
>>> +
>>>  @opindex misel
>>>  @opindex mno-isel
>>>  @item -misel
>>> -- 
>>> 2.35.3

Reply via email to