On Wed, Jan 4, 2023 at 9:54 AM Jose E. Marchesi via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > > ping. > Would this be a good approach for fixing the issue?
adding the is_libcall bit enlarges rtx_def by 8 bytes - there's no room for more bits here. I really wonder how other targets avoid the issue you are pointing out? Do their assemblers prune unused (extern) .global? > > Hi Jakub. > > > >> On Thu, Dec 08, 2022 at 02:02:36PM +0100, Jose E. Marchesi wrote: > >>> So, I guess the right fix would be to call assemble_external_libcall > >>> during final? The `.global FOO' directive would be generated > >>> immediately before the call sequence, but I guess that would be ok. > >> > >> During final only if all the targets can deal with the effects of > >> assemble_external_libcall being done in the middle of emitting assembly > >> for the function. > >> > >> Otherwise, it could be e.g. done in the first loop of shorten_branches. > >> > >> Note, in calls.cc it is done only for emit_library_call_value_1 > >> and not for emit_call_1, so if we do it late, we need to be able to find > >> out what call is to a libcall and what is to a normal call. If there is > >> no way to differentiate it right now, perhaps we need some flag somewhere, > >> say on a SYMBOL_REF. And then assemble_external_libcall either only > >> if such a SYMBOL_REF appears in CALL_INSN or sibcall JUMP_INSN, or > >> perhaps anywhere in the function and its constant pool. > > > > Allright, the quick-and-dirty patch below seems to DTRT with simple > > examples. > > > > First, when libcalls are generated. Note only one .global is generated > > for all calls, and actually it is around the same position than before: > > > > $ cat foo.c > > int foo(unsigned int len, int flag) > > { > > if (flag) > > return (((long)len) * 234 / 5); > > return (((long)len) * 2 / 5); > > } > > $ cc1 -O2 foo.c > > $ cat foo.c > > .file "foo.c" > > .text > > .global __divdi3 > > .align 3 > > .global foo > > .type foo, @function > > foo: > > mov32 %r1,%r1 > > lsh %r2,32 > > jne %r2,0,.L5 > > mov %r2,5 > > lsh %r1,1 > > call __divdi3 > > lsh %r0,32 > > arsh %r0,32 > > exit > > .L5: > > mov %r2,5 > > mul %r1,234 > > call __divdi3 > > lsh %r0,32 > > arsh %r0,32 > > exit > > .size foo, .-foo > > .ident "GCC: (GNU) 13.0.0 20221207 (experimental)" > > > > Second, when libcalls are tried by expand_moddiv in a sequence, but then > > discarded and not linked in the main sequence: > > > > $ cat foo.c > > int foo(unsigned int len, int flag) > > { > > if (flag) > > return (((long)len) * 234 / 5); > > return (((long)len) * 2 / 5); > > } > > $ cc1 -O2 foo.c > > $ cat foo.c > > .file "foo.c" > > .text > > .align 3 > > .global foo > > .type foo, @function > > foo: > > mov32 %r0,%r1 > > lsh %r2,32 > > jne %r2,0,.L5 > > add %r0,%r0 > > div %r0,5 > > lsh %r0,32 > > arsh %r0,32 > > exit > > .L5: > > mul %r0,234 > > div %r0,5 > > lsh %r0,32 > > arsh %r0,32 > > exit > > .size foo, .-foo > > .ident "GCC: (GNU) 13.0.0 20221207 (experimental)" > > > > Note the .global now is not generated, as desired. > > > > As you can see below, I am adding a new RTX flag `is_libcall', with > > written form "/l". > > > > Before I get into serious testing etc, can you please confirm whether > > this is the right approach or not? > > > > In particular, I am a little bit concerned about the expectation I am > > using that the target of the `call' instruction emitted by emit_call_1 > > is always a (MEM (SYMBOL_REF ...)) when it is passed a SYMBOL_REF as the > > first argument (`fun' in emit_library_call_value_1). > > > > Thanks. > > > > diff --git a/gcc/calls.cc b/gcc/calls.cc > > index 6dd6f73e978..6c4a3725272 100644 > > --- a/gcc/calls.cc > > +++ b/gcc/calls.cc > > @@ -4370,10 +4370,6 @@ emit_library_call_value_1 (int retval, rtx orgfun, > > rtx value, > > || argvec[i].partial != 0) > > update_stack_alignment_for_call (&argvec[i].locate); > > > > - /* If this machine requires an external definition for library > > - functions, write one out. */ > > - assemble_external_libcall (fun); > > - > > original_args_size = args_size; > > args_size.constant = (aligned_upper_bound (args_size.constant > > + stack_pointer_delta, > > @@ -4717,6 +4713,9 @@ emit_library_call_value_1 (int retval, rtx orgfun, > > rtx value, > > valreg, > > old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far); > > > > + /* Mark the emitted call as a libcall with the new flag. */ > > + RTL_LIBCALL_P (last_call_insn ()) = 1; > > + > > if (flag_ipa_ra) > > { > > rtx datum = orgfun; > > diff --git a/gcc/final.cc b/gcc/final.cc > > index eea572238f6..df57de5afd0 100644 > > --- a/gcc/final.cc > > +++ b/gcc/final.cc > > @@ -815,6 +815,8 @@ make_pass_compute_alignments (gcc::context *ctxt) > > reorg.cc, since the branch splitting exposes new instructions with delay > > slots. */ > > > > +static rtx call_from_call_insn (rtx_call_insn *insn); > > + > > void > > shorten_branches (rtx_insn *first) > > { > > @@ -850,6 +852,24 @@ shorten_branches (rtx_insn *first) > > for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn)) > > { > > INSN_SHUID (insn) = i++; > > + > > + /* If this is a `call' instruction that implements a libcall, > > + and this machine requires an external definition for library > > + functions, write one out. */ > > + if (CALL_P (insn) && RTL_LIBCALL_P (insn)) > > + { > > + rtx nested_call = call_from_call_insn (safe_as_a > > <rtx_call_insn*> (insn)); > > + rtx mem = XEXP (nested_call, 0); > > + gcc_assert (GET_CODE (mem) == MEM); > > + rtx fun = XEXP (mem, 0); > > + gcc_assert (GET_CODE (fun) == SYMBOL_REF); > > + assemble_external_libcall (fun); > > + > > + /* Clear the LIBCALL flag to make sure we don't assemble the > > + external definition more than once. */ > > + RTL_LIBCALL_P (insn) = 0; > > + } > > + > > if (INSN_P (insn)) > > continue; > > > > diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc > > index e115f987173..26a06511619 100644 > > --- a/gcc/print-rtl.cc > > +++ b/gcc/print-rtl.cc > > @@ -882,6 +882,9 @@ rtx_writer::print_rtx (const_rtx in_rtx) > > if (RTX_FLAG (in_rtx, return_val)) > > fputs ("/i", m_outfile); > > > > + if (RTX_FLAG (in_rtx, is_libcall)) > > + fputs ("/l", m_outfile); > > + > > /* Print REG_NOTE names for EXPR_LIST and INSN_LIST. */ > > if ((GET_CODE (in_rtx) == EXPR_LIST > > || GET_CODE (in_rtx) == INSN_LIST > > diff --git a/gcc/read-rtl.cc b/gcc/read-rtl.cc > > index 62c7895af60..eb0ae150a5b 100644 > > --- a/gcc/read-rtl.cc > > +++ b/gcc/read-rtl.cc > > @@ -1543,6 +1543,9 @@ read_flags (rtx return_rtx) > > case 'i': > > RTX_FLAG (return_rtx, return_val) = 1; > > break; > > + case 'l': > > + RTX_FLAG (return_rtx, is_libcall) = 1; > > + break; > > default: > > fatal_with_file_and_line ("unrecognized flag: `%c'", flag_char); > > } > > diff --git a/gcc/rtl.h b/gcc/rtl.h > > index 7a8c4709257..92c802d3876 100644 > > --- a/gcc/rtl.h > > +++ b/gcc/rtl.h > > @@ -401,6 +401,10 @@ struct GTY((desc("0"), tag("0"), > > Dumped as "/i" in RTL dumps. */ > > unsigned return_val : 1; > > > > + /* 1 in a CALL_INSN if it is a libcall. > > + Dumped as "/l" in RTL dumps. */ > > + unsigned is_libcall : 1; > > + > > union { > > /* The final union field is aligned to 64 bits on LP64 hosts, > > giving a 32-bit gap after the fields above. We optimize the > > @@ -1578,6 +1582,10 @@ jump_table_for_label (const rtx_code_label *label) > > #define RTL_PURE_CALL_P(RTX) \ > > (RTL_FLAG_CHECK1 ("RTL_PURE_CALL_P", (RTX), CALL_INSN)->return_val) > > > > +/* 1 if RTX is a libcall. */ > > +#define RTL_LIBCALL_P(RTX) \ > > + (RTL_FLAG_CHECK1 ("RTL_LIBCALL_P", (RTX), CALL_INSN)->is_libcall) > > + > > /* 1 if RTX is a call to a const or pure function. */ > > #define RTL_CONST_OR_PURE_CALL_P(RTX) \ > > (RTL_CONST_CALL_P (RTX) || RTL_PURE_CALL_P (RTX))