Hello, On Fri, Jan 10 2025, Martin Jambor wrote: > Hello, > > On Wed, Dec 11 2024, Martin Jambor wrote: >> Hello, >> >> even though it is not my work, I would like to ping this patch. Having >> it upstream would really help us a lot. >> > > Please, pretty please, consider reviewing this in time for GCC 15, > having it upstream would really help us a lot and from what I can tell, > it should no longer be controversial.
Even though we are already in stage4, I believe the risk of accepting the patch are tiny compared to the benefits it adds. Can I therefore still ping it, please? Thanks, Martin > > Thank you very much in advance, > > Martin > >> >> On Wed, Nov 13 2024, Michael Matz wrote: >>> Hello, >>> >>> this is essentially >>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html >>> >>> from Kewen in functionality. When discussing this with Segher at the >>> Cauldron he expressed reservations about changing the default >>> implementation of -fpatchable-function-entry. So, to move forward, let's >>> move it under a new target option -msplit-patch-nops (expressing the >>> important deviation from the default behaviour, namely that all the >>> patching nops form a consecutive sequence normally). >>> >>> Regstrapping on power9 ppc64le in progress. Okay if that passes? >>> >>> >>> Ciao, >>> Michael. >>> >>> --- >>> >>> as the bug report details some uses of -fpatchable-function-entry >>> aren't happy with the "before" NOPs being inserted between global and >>> local entry point on powerpc. We want the before NOPs be in front >>> of the global entry point. That means that the patching NOPs aren't >>> consecutive for dual entry point functions, but for these usecases >>> that's not the problem. But let us support both under the control >>> of a new target option: -msplit-patch-nops. >>> >>> gcc/ >>> >>> PR target/112980 >>> * config/rs6000/rs6000.opt (msplit-patch-nops): New option. >>> * doc/invoke.texi (RS/6000 and PowerPC Options): Document it. >>> * config/rs6000/rs6000.h (machine_function.stop_patch_area_print): >>> New member. >>> * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry): >>> Emit split nops under control of that one. >>> * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue): >>> Add handling of split patch nops. >>> --- >>> gcc/config/rs6000/rs6000-logue.cc | 15 +++++++++------ >>> gcc/config/rs6000/rs6000.cc | 27 +++++++++++++++++++++++---- >>> gcc/config/rs6000/rs6000.h | 6 ++++++ >>> gcc/config/rs6000/rs6000.opt | 4 ++++ >>> gcc/doc/invoke.texi | 17 +++++++++++++++-- >>> 5 files changed, 57 insertions(+), 12 deletions(-) >>> >>> diff --git a/gcc/config/rs6000/rs6000-logue.cc >>> b/gcc/config/rs6000/rs6000-logue.cc >>> index c87058b435e..aa1e0442f2b 100644 >>> --- a/gcc/config/rs6000/rs6000-logue.cc >>> +++ b/gcc/config/rs6000/rs6000-logue.cc >>> @@ -4005,8 +4005,8 @@ rs6000_output_function_prologue (FILE *file) >>> >>> unsigned short patch_area_size = crtl->patch_area_size; >>> unsigned short patch_area_entry = crtl->patch_area_entry; >>> - /* Need to emit the patching area. */ >>> - if (patch_area_size > 0) >>> + /* Emit non-split patching area now. */ >>> + if (!TARGET_SPLIT_PATCH_NOPS && patch_area_size > 0) >>> { >>> cfun->machine->global_entry_emitted = true; >>> /* As ELFv2 ABI shows, the allowable bytes between the global >>> @@ -4027,7 +4027,6 @@ rs6000_output_function_prologue (FILE *file) >>> patch_area_entry); >>> rs6000_print_patchable_function_entry (file, patch_area_entry, >>> true); >>> - patch_area_size -= patch_area_entry; >>> } >>> } >>> >>> @@ -4037,9 +4036,13 @@ rs6000_output_function_prologue (FILE *file) >>> assemble_name (file, name); >>> fputs ("\n", file); >>> /* Emit the nops after local entry. */ >>> - if (patch_area_size > 0) >>> - rs6000_print_patchable_function_entry (file, patch_area_size, >>> - patch_area_entry == 0); >>> + if (patch_area_size > patch_area_entry) >>> + { >>> + patch_area_size -= patch_area_entry; >>> + cfun->machine->stop_patch_area_print = false; >>> + rs6000_print_patchable_function_entry (file, patch_area_size, >>> + patch_area_entry == 0); >>> + } >>> } >>> >>> else if (rs6000_pcrel_p ()) >>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >>> index 950fd947fda..6427e6913ba 100644 >>> --- a/gcc/config/rs6000/rs6000.cc >>> +++ b/gcc/config/rs6000/rs6000.cc >>> @@ -15226,11 +15226,25 @@ rs6000_print_patchable_function_entry (FILE *file, >>> { >>> bool global_entry_needed_p = rs6000_global_entry_point_prologue_needed_p >>> (); >>> /* For a function which needs global entry point, we will emit the >>> - patchable area before and after local entry point under the control of >>> - cfun->machine->global_entry_emitted, see the handling in function >>> - rs6000_output_function_prologue. */ >>> - if (!global_entry_needed_p || cfun->machine->global_entry_emitted) >>> + patchable area when it isn't split before and after local entry point >>> + under the control of cfun->machine->global_entry_emitted, see the >>> + handling in function rs6000_output_function_prologue. */ >>> + if (!TARGET_SPLIT_PATCH_NOPS >>> + && (!global_entry_needed_p || cfun->machine->global_entry_emitted)) >>> default_print_patchable_function_entry (file, patch_area_size, >>> record_p); >>> + >>> + /* For split patch nops we emit the before nops (from generic code) >>> + in front of the global entry point and after the local entry point, >>> + under the control of cfun->machine->stop_patch_area_print, see >>> + rs6000_output_function_prologue and rs6000_elf_declare_function_name. >>> */ >>> + if (TARGET_SPLIT_PATCH_NOPS) >>> + { >>> + if (!cfun->machine->stop_patch_area_print) >>> + default_print_patchable_function_entry (file, patch_area_size, >>> + record_p); >>> + else >>> + gcc_assert (global_entry_needed_p); >>> + } >>> } >>> >>> enum rtx_code >>> @@ -21423,6 +21437,11 @@ rs6000_elf_declare_function_name (FILE *file, >>> const char *name, tree decl) >>> fprintf (file, "\t.previous\n"); >>> } >>> ASM_OUTPUT_FUNCTION_LABEL (file, name, decl); >>> + /* At this time, the "before" NOPs have been already emitted. >>> + For split nops stop generic code from printing the "after" NOPs and >>> + emit them just after local entry ourself later. */ >>> + if (rs6000_global_entry_point_prologue_needed_p ()) >>> + cfun->machine->stop_patch_area_print = true; >>> } >>> >>> static void rs6000_elf_file_end (void) ATTRIBUTE_UNUSED; >>> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h >>> index d460eb06544..2bd56f8d377 100644 >>> --- a/gcc/config/rs6000/rs6000.h >>> +++ b/gcc/config/rs6000/rs6000.h >>> @@ -2424,6 +2424,12 @@ typedef struct GTY(()) machine_function >>> global entry. It helps to control the patchable area before and after >>> local entry. */ >>> bool global_entry_emitted; >>> + /* With ELFv2 ABI dual entry points being adopted, generic framework >>> + targetm.asm_out.print_patchable_function_entry would generate "after" >>> + NOPs before local entry, which is wrong. This flag is to stop it from >>> + printing patch area before local entry, it is only useful when the >>> + function requires dual entry points. */ >>> + bool stop_patch_area_print; >>> } machine_function; >>> #endif >>> >>> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt >>> index 94323bd1db2..27a165ef0ee 100644 >>> --- a/gcc/config/rs6000/rs6000.opt >>> +++ b/gcc/config/rs6000/rs6000.opt >>> @@ -300,6 +300,10 @@ mfull-toc >>> Target >>> Put everything in the regular TOC. >>> >>> +msplit-patch-nops >>> +Target Var(TARGET_SPLIT_PATCH_NOPS) Init(0) >>> +Emit NOPs before global and after local entry point for >>> -fpatchable-function-entry. >>> + >>> mvrsave >>> Target Var(TARGET_ALTIVEC_VRSAVE) Save >>> Generate VRSAVE instructions when generating AltiVec code. >>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi >>> index 93e9af3791c..f3d37a48fd0 100644 >>> --- a/gcc/doc/invoke.texi >>> +++ b/gcc/doc/invoke.texi >>> @@ -1318,6 +1318,7 @@ See RS/6000 and PowerPC Options. >>> -mtraceback=@var{traceback_type} >>> -maix-struct-return -msvr4-struct-return >>> -mabi=@var{abi-type} -msecure-plt -mbss-plt >>> +-msplit-patch-nops >>> -mlongcall -mno-longcall -mpltseq -mno-pltseq >>> -mblock-move-inline-limit=@var{num} >>> -mblock-compare-inline-limit=@var{num} >>> @@ -18569,11 +18570,12 @@ If @code{N=0}, no pad location is recorded. >>> The NOP instructions are inserted at---and maybe before, depending on >>> @var{M}---the function entry address, even before the prologue. On >>> PowerPC with the ELFv2 ABI, for a function with dual entry points, >>> -the local entry point is this function entry address. >>> +the local entry point is this function entry address by default. See >>> +the @option{-msplit-patch-nops} option to change this. >>> >>> The maximum value of @var{N} and @var{M} is 65535. On PowerPC with the >>> ELFv2 ABI, for a function with dual entry points, the supported values >>> -for @var{M} are 0, 2, 6 and 14. >>> +for @var{M} are 0, 2, 6 and 14 when not using @option{-msplit-patch-nops}. >>> @end table >>> >>> >>> @@ -31658,6 +31660,17 @@ requires @code{.plt} and @code{.got} >>> sections that are both writable and executable. >>> This is a PowerPC 32-bit SYSV ABI option. >>> >>> +@opindex msplit-patch-nops >>> +@item -msplit-patch-nops >>> +When adding NOPs for a patchable area via the >>> +@option{-fpatchable-function-entry} option emit the "before" NOPs in front >>> +of the global entry point and the "after" NOPs after the local entry point. >>> +This makes the sequence of NOPs not consecutive when a global entry point >>> +is generated. Without this option the NOPs are emitted directly before and >>> +after the local entry point, making them consecutive but moving global and >>> +local entry point further apart. If only a single entry point is generated >>> +this option has no effect. >>> + >>> @opindex misel >>> @opindex mno-isel >>> @item -misel >>> -- >>> 2.35.3