ping. Would this be a good approach for fixing the issue?
> Hi Jakub. > >> On Thu, Dec 08, 2022 at 02:02:36PM +0100, Jose E. Marchesi wrote: >>> So, I guess the right fix would be to call assemble_external_libcall >>> during final? The `.global FOO' directive would be generated >>> immediately before the call sequence, but I guess that would be ok. >> >> During final only if all the targets can deal with the effects of >> assemble_external_libcall being done in the middle of emitting assembly >> for the function. >> >> Otherwise, it could be e.g. done in the first loop of shorten_branches. >> >> Note, in calls.cc it is done only for emit_library_call_value_1 >> and not for emit_call_1, so if we do it late, we need to be able to find >> out what call is to a libcall and what is to a normal call. If there is >> no way to differentiate it right now, perhaps we need some flag somewhere, >> say on a SYMBOL_REF. And then assemble_external_libcall either only >> if such a SYMBOL_REF appears in CALL_INSN or sibcall JUMP_INSN, or >> perhaps anywhere in the function and its constant pool. > > Allright, the quick-and-dirty patch below seems to DTRT with simple > examples. > > First, when libcalls are generated. Note only one .global is generated > for all calls, and actually it is around the same position than before: > > $ cat foo.c > int foo(unsigned int len, int flag) > { > if (flag) > return (((long)len) * 234 / 5); > return (((long)len) * 2 / 5); > } > $ cc1 -O2 foo.c > $ cat foo.c > .file "foo.c" > .text > .global __divdi3 > .align 3 > .global foo > .type foo, @function > foo: > mov32 %r1,%r1 > lsh %r2,32 > jne %r2,0,.L5 > mov %r2,5 > lsh %r1,1 > call __divdi3 > lsh %r0,32 > arsh %r0,32 > exit > .L5: > mov %r2,5 > mul %r1,234 > call __divdi3 > lsh %r0,32 > arsh %r0,32 > exit > .size foo, .-foo > .ident "GCC: (GNU) 13.0.0 20221207 (experimental)" > > Second, when libcalls are tried by expand_moddiv in a sequence, but then > discarded and not linked in the main sequence: > > $ cat foo.c > int foo(unsigned int len, int flag) > { > if (flag) > return (((long)len) * 234 / 5); > return (((long)len) * 2 / 5); > } > $ cc1 -O2 foo.c > $ cat foo.c > .file "foo.c" > .text > .align 3 > .global foo > .type foo, @function > foo: > mov32 %r0,%r1 > lsh %r2,32 > jne %r2,0,.L5 > add %r0,%r0 > div %r0,5 > lsh %r0,32 > arsh %r0,32 > exit > .L5: > mul %r0,234 > div %r0,5 > lsh %r0,32 > arsh %r0,32 > exit > .size foo, .-foo > .ident "GCC: (GNU) 13.0.0 20221207 (experimental)" > > Note the .global now is not generated, as desired. > > As you can see below, I am adding a new RTX flag `is_libcall', with > written form "/l". > > Before I get into serious testing etc, can you please confirm whether > this is the right approach or not? > > In particular, I am a little bit concerned about the expectation I am > using that the target of the `call' instruction emitted by emit_call_1 > is always a (MEM (SYMBOL_REF ...)) when it is passed a SYMBOL_REF as the > first argument (`fun' in emit_library_call_value_1). > > Thanks. > > diff --git a/gcc/calls.cc b/gcc/calls.cc > index 6dd6f73e978..6c4a3725272 100644 > --- a/gcc/calls.cc > +++ b/gcc/calls.cc > @@ -4370,10 +4370,6 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx > value, > || argvec[i].partial != 0) > update_stack_alignment_for_call (&argvec[i].locate); > > - /* If this machine requires an external definition for library > - functions, write one out. */ > - assemble_external_libcall (fun); > - > original_args_size = args_size; > args_size.constant = (aligned_upper_bound (args_size.constant > + stack_pointer_delta, > @@ -4717,6 +4713,9 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx > value, > valreg, > old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far); > > + /* Mark the emitted call as a libcall with the new flag. */ > + RTL_LIBCALL_P (last_call_insn ()) = 1; > + > if (flag_ipa_ra) > { > rtx datum = orgfun; > diff --git a/gcc/final.cc b/gcc/final.cc > index eea572238f6..df57de5afd0 100644 > --- a/gcc/final.cc > +++ b/gcc/final.cc > @@ -815,6 +815,8 @@ make_pass_compute_alignments (gcc::context *ctxt) > reorg.cc, since the branch splitting exposes new instructions with delay > slots. */ > > +static rtx call_from_call_insn (rtx_call_insn *insn); > + > void > shorten_branches (rtx_insn *first) > { > @@ -850,6 +852,24 @@ shorten_branches (rtx_insn *first) > for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn)) > { > INSN_SHUID (insn) = i++; > + > + /* If this is a `call' instruction that implements a libcall, > + and this machine requires an external definition for library > + functions, write one out. */ > + if (CALL_P (insn) && RTL_LIBCALL_P (insn)) > + { > + rtx nested_call = call_from_call_insn (safe_as_a <rtx_call_insn*> > (insn)); > + rtx mem = XEXP (nested_call, 0); > + gcc_assert (GET_CODE (mem) == MEM); > + rtx fun = XEXP (mem, 0); > + gcc_assert (GET_CODE (fun) == SYMBOL_REF); > + assemble_external_libcall (fun); > + > + /* Clear the LIBCALL flag to make sure we don't assemble the > + external definition more than once. */ > + RTL_LIBCALL_P (insn) = 0; > + } > + > if (INSN_P (insn)) > continue; > > diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc > index e115f987173..26a06511619 100644 > --- a/gcc/print-rtl.cc > +++ b/gcc/print-rtl.cc > @@ -882,6 +882,9 @@ rtx_writer::print_rtx (const_rtx in_rtx) > if (RTX_FLAG (in_rtx, return_val)) > fputs ("/i", m_outfile); > > + if (RTX_FLAG (in_rtx, is_libcall)) > + fputs ("/l", m_outfile); > + > /* Print REG_NOTE names for EXPR_LIST and INSN_LIST. */ > if ((GET_CODE (in_rtx) == EXPR_LIST > || GET_CODE (in_rtx) == INSN_LIST > diff --git a/gcc/read-rtl.cc b/gcc/read-rtl.cc > index 62c7895af60..eb0ae150a5b 100644 > --- a/gcc/read-rtl.cc > +++ b/gcc/read-rtl.cc > @@ -1543,6 +1543,9 @@ read_flags (rtx return_rtx) > case 'i': > RTX_FLAG (return_rtx, return_val) = 1; > break; > + case 'l': > + RTX_FLAG (return_rtx, is_libcall) = 1; > + break; > default: > fatal_with_file_and_line ("unrecognized flag: `%c'", flag_char); > } > diff --git a/gcc/rtl.h b/gcc/rtl.h > index 7a8c4709257..92c802d3876 100644 > --- a/gcc/rtl.h > +++ b/gcc/rtl.h > @@ -401,6 +401,10 @@ struct GTY((desc("0"), tag("0"), > Dumped as "/i" in RTL dumps. */ > unsigned return_val : 1; > > + /* 1 in a CALL_INSN if it is a libcall. > + Dumped as "/l" in RTL dumps. */ > + unsigned is_libcall : 1; > + > union { > /* The final union field is aligned to 64 bits on LP64 hosts, > giving a 32-bit gap after the fields above. We optimize the > @@ -1578,6 +1582,10 @@ jump_table_for_label (const rtx_code_label *label) > #define RTL_PURE_CALL_P(RTX) \ > (RTL_FLAG_CHECK1 ("RTL_PURE_CALL_P", (RTX), CALL_INSN)->return_val) > > +/* 1 if RTX is a libcall. */ > +#define RTL_LIBCALL_P(RTX) \ > + (RTL_FLAG_CHECK1 ("RTL_LIBCALL_P", (RTX), CALL_INSN)->is_libcall) > + > /* 1 if RTX is a call to a const or pure function. */ > #define RTL_CONST_OR_PURE_CALL_P(RTX) \ > (RTL_CONST_CALL_P (RTX) || RTL_PURE_CALL_P (RTX))