Christophe Lyon <christophe.l...@st.com> writes:
> The FDPIC register is hard-coded to r9, as defined in the ABI.
>
> We have to disable tailcall optimizations if we don't know if the
> target function is in the same module. If not, we have to set r9 to
> the value associated with the target module.
>
> When generating a symbol address, we have to take into account whether
> it is a pointer to data or to a function, because different
> relocations are needed.
>
> 2019-XX-XX  Christophe Lyon  <christophe.l...@st.com>
>       Mickaël Guêné <mickael.gu...@st.com>
>
>       * config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro
>       in FDPIC mode.
>       * config/arm/arm-protos.h (arm_load_function_descriptor): Declare
>       new function.
>       * config/arm/arm.c (arm_option_override): Define pic register to
>       FDPIC_REGNUM.
>       (arm_function_ok_for_sibcall): Disable sibcall optimization if we
>       have no decl or go through PLT.
>       (arm_load_pic_register): Handle TARGET_FDPIC.
>       (arm_is_segment_info_known): New function.
>       (arm_pic_static_addr): Add support for FDPIC.
>       (arm_load_function_descriptor): New function.
>       (arm_assemble_integer): Add support for FDPIC.
>       * config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):
>       Define. (FDPIC_REGNUM): New define.
>       * config/arm/arm.md (call): Add support for FDPIC.
>       (call_value): Likewise.
>       (*restore_pic_register_after_call): New pattern.
>       (untyped_call): Disable if FDPIC.
>       (untyped_return): Likewise.
>       * config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.
>
> Change-Id: I8fb1a6b85ace672184013568c5d28fbda2f7fda4
>
> diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
> index 6e256ee..34695fa 100644
> --- a/gcc/config/arm/arm-c.c
> +++ b/gcc/config/arm/arm-c.c
> @@ -203,6 +203,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
>        builtin_define ("__ARM_EABI__");
>      }
>  
> +  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);
> +
>    def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);
>    def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);
>  
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 485bc68..272968a 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -139,6 +139,7 @@ extern int arm_max_const_double_inline_cost (void);
>  extern int arm_const_double_inline_cost (rtx);
>  extern bool arm_const_double_by_parts (rtx);
>  extern bool arm_const_double_by_immediates (rtx);
> +extern rtx arm_load_function_descriptor (rtx funcdesc);
>  extern void arm_emit_call_insn (rtx, rtx, bool);
>  bool detect_cmse_nonsecure_call (tree);
>  extern const char *output_call (rtx *);
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 45abcd8..d9397b5 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3485,6 +3485,15 @@ arm_option_override (void)
>    if (flag_pic && TARGET_VXWORKS_RTP)
>      arm_pic_register = 9;
>  
> +  /* If in FDPIC mode then force arm_pic_register to be r9.  */
> +  if (TARGET_FDPIC)
> +    {
> +      arm_pic_register = FDPIC_REGNUM;
> +      if (! TARGET_ARM && ! TARGET_THUMB2)
> +     sorry ("FDPIC mode is supported on architecture versions that "
> +            "support ARM or Thumb-2 only.");
> +    }
> +
>    if (arm_pic_register_string != NULL)
>      {
>        int pic_register = decode_reg_name (arm_pic_register_string);

Isn't this equivalent to rejecting Thumb-1?  I think that would be
clearer in both the condition and the error message.
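
To back up the equivalence claim, here is a small standalone check.  It
is only a sketch: the three helper functions are stand-ins for the arm.h
macros, written from my understanding of their definitions (TARGET_ARM
as "not Thumb", TARGET_THUMB1/TARGET_THUMB2 as Thumb without/with
arm_arch_thumb2), so treat those definitions as an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the arm.h target macros, expressed as
   functions of the two underlying flags (assumed definitions).  */
static bool
target_arm (bool thumb, bool arch_thumb2)
{
  (void) arch_thumb2;
  return !thumb;
}

static bool
target_thumb1 (bool thumb, bool arch_thumb2)
{
  return thumb && !arch_thumb2;
}

static bool
target_thumb2 (bool thumb, bool arch_thumb2)
{
  return thumb && arch_thumb2;
}

/* Return true if the patch's condition (!TARGET_ARM && !TARGET_THUMB2)
   agrees with TARGET_THUMB1 for every combination of the two flags.  */
static bool
conditions_agree (void)
{
  for (int thumb = 0; thumb <= 1; thumb++)
    for (int arch_thumb2 = 0; arch_thumb2 <= 1; arch_thumb2++)
      {
        bool patch_cond = (!target_arm (thumb, arch_thumb2)
                           && !target_thumb2 (thumb, arch_thumb2));
        if (patch_cond != target_thumb1 (thumb, arch_thumb2))
          return false;
      }
  return true;
}
```

If those definitions hold, the test could simply be "if (TARGET_THUMB1)"
with a message saying that FDPIC is not supported in Thumb-1 mode.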

How does this interact with arm_pic_data_is_text_relative?  Are both
values supported?

> @@ -7295,6 +7304,21 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
>    if (cfun->machine->sibcall_blocked)
>      return false;
>  
> +  if (TARGET_FDPIC)
> +    {
> +      /* In FDPIC, never tailcall something for which we have no decl:
> +      the target function could be in a different module, requiring
> +      a different FDPIC register value.  */
> +      if (decl == NULL)
> +     return false;
> +
> +      /* Don't tailcall if we go through the PLT since the FDPIC
> +      register is then corrupted and we don't restore it after
> +      static function calls.  */
> +      if (!targetm.binds_local_p (decl))
> +     return false;
> +    }
> +
>    /* Never tailcall something if we are generating code for Thumb-1.  */
>    if (TARGET_THUMB1)
>      return false;
> @@ -7711,7 +7735,9 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
>  {
>    rtx l1, labelno, pic_tmp, pic_rtx;
>  
> -  if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
> +  if (crtl->uses_pic_offset_table == 0
> +      || TARGET_SINGLE_PIC_BASE
> +      || TARGET_FDPIC)
>      return;
>  
>    gcc_assert (flag_pic);
> @@ -7780,28 +7806,142 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
>    emit_use (pic_reg);
>  }
>  
> +/* Try to determine whether an object, referenced via ORIG, will be
> +   placed in the text or data segment.  This is used in FDPIC mode, to
> +   decide which relocations to use when accessing ORIG.  IS_READONLY
> +   is set to true if ORIG is a read-only location, false otherwise.
> +   Return true if we could determine the location of ORIG, false
> +   otherwise.  IS_READONLY is valid only when we return true.  */

Maybe *IS_READONLY in both cases?

> +static bool
> +arm_is_segment_info_known (rtx orig, bool *is_readonly)
> +{
> +  bool res = false;
> +
> +  *is_readonly = false;
> +
> +  if (GET_CODE (orig) == LABEL_REF)
> +    {
> +      res = true;
> +      *is_readonly = true;
> +    }

Think this function would be easier to read with early returns.

> +  else if (SYMBOL_REF_P (orig))

...so "if" rather than "else if" here.

> +    {
> +      if (CONSTANT_POOL_ADDRESS_P (orig))
> +     {
> +       res = true;
> +       *is_readonly = true;
> +     }
> +      else if (SYMBOL_REF_LOCAL_P (orig)
> +            && !SYMBOL_REF_EXTERNAL_P (orig)
> +            && SYMBOL_REF_DECL (orig)
> +            && (!DECL_P (SYMBOL_REF_DECL (orig))
> +                || !DECL_COMMON (SYMBOL_REF_DECL (orig))))
> +     {
> +       tree decl = SYMBOL_REF_DECL (orig);
> +       tree init = (TREE_CODE (decl) == VAR_DECL)
> +         ? DECL_INITIAL (decl) : (TREE_CODE (decl) == CONSTRUCTOR)
> +         ? decl : 0;
> +       int reloc = 0;
> +       bool named_section, readonly;
> +
> +       if (init && init != error_mark_node)
> +         reloc = compute_reloc_for_constant (init);
> +
> +       named_section = TREE_CODE (decl) == VAR_DECL
> +         && lookup_attribute ("section", DECL_ATTRIBUTES (decl));

Here too I think it would be better to return false early.

How much variation do you support here for named sections?  E.g. can a
linker script really put SECTION_WRITE sections in the text segment?
Seems like there are some cases that could be handled.

(Just asking, not suggesting you should change anything.)
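
To illustrate the early-return shape suggested above, here is a
standalone model of the function.  It is a sketch only and is not GCC
code: struct ref_model and its fields are hypothetical stand-ins for the
rtx/tree queries named in the comments, and the intent is to preserve
the behaviour of the posted version, not to change it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the properties the real function queries;
   none of these names exist in GCC.  */
enum ref_kind { REF_LABEL, REF_SYMBOL };

struct ref_model
{
  enum ref_kind kind;     /* LABEL_REF or SYMBOL_REF */
  bool constant_pool;     /* CONSTANT_POOL_ADDRESS_P */
  bool local;             /* SYMBOL_REF_LOCAL_P */
  bool external;          /* SYMBOL_REF_EXTERNAL_P */
  bool has_decl;          /* SYMBOL_REF_DECL is non-null */
  bool common;            /* DECL_P && DECL_COMMON */
  bool named_section;     /* VAR_DECL with a "section" attribute */
  bool readonly_section;  /* result of decl_readonly_section */
};

/* Return true if the segment of ORIG is known, setting *IS_READONLY
   accordingly.  *IS_READONLY is false whenever we return false.  */
static bool
segment_info_known (const struct ref_model *orig, bool *is_readonly)
{
  *is_readonly = false;

  if (orig->kind == REF_LABEL)
    {
      *is_readonly = true;
      return true;
    }

  /* Plays the role of gcc_unreachable in the original.  */
  assert (orig->kind == REF_SYMBOL);

  if (orig->constant_pool)
    {
      *is_readonly = true;
      return true;
    }

  if (!orig->local || orig->external || !orig->has_decl || orig->common)
    return false;

  /* We don't know where the linker script will put a named section.  */
  if (orig->named_section)
    return false;

  *is_readonly = orig->readonly_section;
  return true;
}
```

With this shape the "res" variable disappears and each "don't know"
case is a plain "return false".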

> +       readonly = decl_readonly_section (decl, reloc);
> +
> +       /* We don't know where the link script will put a named
> +          section, so return false in such a case.  */
> +       res = !named_section;
> +
> +       if (!named_section)
> +         *is_readonly = readonly;
> +     }
> +      else
> +     {
> +       /* We don't know.  */
> +       res = false;
> +     }
> +    }
> +  else
> +    gcc_unreachable ();
> +
> +  return res;
> +}
> +
>  /* Generate code to load the address of a static var when flag_pic is set.  */
>  static rtx_insn *
>  arm_pic_static_addr (rtx orig, rtx reg)
>  {
>    rtx l1, labelno, offset_rtx;
> +  rtx_insn *insn;
>  
>    gcc_assert (flag_pic);
>  
> -  /* We use an UNSPEC rather than a LABEL_REF because this label
> -     never appears in the code stream.  */
> -  labelno = GEN_INT (pic_labelno++);
> -  l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
> -  l1 = gen_rtx_CONST (VOIDmode, l1);
> +  bool is_readonly = false;
> +  bool info_known = false;
>  
> -  /* On the ARM the PC register contains 'dot + 8' at the time of the
> -     addition, on the Thumb it is 'dot + 4'.  */
> -  offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);
> -  offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),
> -                               UNSPEC_SYMBOL_OFFSET);
> -  offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);
> +  if (TARGET_FDPIC
> +      && SYMBOL_REF_P (orig)
> +      && !SYMBOL_REF_FUNCTION_P (orig))
> +      info_known = arm_is_segment_info_known (orig, &is_readonly);

Excess indentation.  Feels like it might be slightly simpler
to handle SYMBOL_REF_FUNCTION_P in arm_is_segment_info_known,
but I guess the idea is that it might not then be clear whether
the caller is asking about a descriptor or the function itself.

>  
> -  return emit_insn (gen_pic_load_addr_unified (reg, offset_rtx, labelno));
> +  if (TARGET_FDPIC
> +      && SYMBOL_REF_P (orig)
> +      && !SYMBOL_REF_FUNCTION_P (orig)
> +      && !info_known)
> +    {
> +      /* We don't know where orig is stored, so we have be
> +      pessimistic and use a GOT relocation.  */
> +      rtx pat;
> +      rtx mem;
> +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
> +
> +      pat = gen_calculate_pic_address (reg, pic_reg, orig);
> +
> +      /* Make the MEM as close to a constant as possible.  */
> +      mem = SET_SRC (pat);
> +      gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));
> +      MEM_READONLY_P (mem) = 1;
> +      MEM_NOTRAP_P (mem) = 1;
> +
> +      insn = emit_insn (pat);

Think "pat = ..." onwards should be split out into a helper, since it's
a cut-&-paste of the code in legitimize_pic_address.

> +    }
> +  else if (TARGET_FDPIC
> +        && SYMBOL_REF_P (orig)
> +        && (SYMBOL_REF_FUNCTION_P (orig)
> +            || (info_known && !is_readonly)))
> +    {
> +      /* We use the GOTOFF relocation.  */
> +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
> +
> +      rtx l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig), UNSPEC_PIC_SYM);
> +      emit_insn (gen_movsi (reg, l1));
> +      insn = emit_insn (gen_addsi3 (reg, reg, pic_reg));
> +    }
> +  else
> +    {
> +      /* Not FDPIC, not SYMBOL_REF_P or readonly: we can use
> +      PC-relative access.  */
> +      /* We use an UNSPEC rather than a LABEL_REF because this label
> +      never appears in the code stream.  */
> +      labelno = GEN_INT (pic_labelno++);
> +      l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
> +      l1 = gen_rtx_CONST (VOIDmode, l1);
> +
> +      /* On the ARM the PC register contains 'dot + 8' at the time of the
> +      addition, on the Thumb it is 'dot + 4'.  */
> +      offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);
> +      offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),
> +                                UNSPEC_SYMBOL_OFFSET);
> +      offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);
> +
> +      insn = emit_insn (gen_pic_load_addr_unified (reg, offset_rtx,
> +                                                labelno));
> +    }
> +
> +  return insn;
>  }
>  
>  /* Return nonzero if X is valid as an ARM state addressing register.  */
> @@ -16112,9 +16252,36 @@ get_jump_table_size (rtx_jump_table_data *insn)
>    return 0;
>  }
>  
> +/* Emit insns to load the function address from FUNCDESC (an FDPIC
> +   function descriptor) into a register and the GOT address into the
> +   FDPIC register, returning an rtx for the register holding the
> +   function address.  */
> +
> +rtx
> +arm_load_function_descriptor (rtx funcdesc)
> +{
> +  rtx fnaddr_reg = gen_reg_rtx (Pmode);
> +  rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
> +  rtx fnaddr = gen_rtx_MEM (Pmode, funcdesc);
> +  rtx gotaddr = gen_rtx_MEM (Pmode, plus_constant (Pmode, funcdesc, 4));
> +  rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
> +
> +  emit_move_insn (fnaddr_reg, fnaddr);
> +  /* The ABI requires the entry point address to be loaded first, so
> +     prevent the load from being moved after that of the GOT
> +     address.  */

Do you mean that the move insn above has to come before the
pattern below?  If so, I think that should be enforced by making this...

> +  XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,
> +                                     gen_rtvec (2, pic_reg, gotaddr),
> +                                     UNSPEC_PIC_RESTORE);
> +  XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, gotaddr);
> +  XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, pic_reg);
> +  emit_insn (par);
> +
> +  return fnaddr_reg;
> +}
> +

...use fnaddr_reg.

Does the instruction actually use pic_reg?  We only get here for
non-symbolic addresses after all.

It seems simpler to make *restore_pic_register_after_call a named pattern
and use gen_restore_pic_register_after_call instead.

>  /* Return the maximum amount of padding that will be inserted before
>     label LABEL.  */
> -
>  static HOST_WIDE_INT
>  get_label_padding (rtx label)
>  {
> @@ -23069,9 +23236,37 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p)
>                 && (!SYMBOL_REF_LOCAL_P (x)
>                     || (SYMBOL_REF_DECL (x)
>                         ? DECL_WEAK (SYMBOL_REF_DECL (x)) : 0))))
> -         fputs ("(GOT)", asm_out_file);
> +         {
> +           if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))
> +             fputs ("(GOTFUNCDESC)", asm_out_file);
> +           else
> +             fputs ("(GOT)", asm_out_file);
> +         }
>         else
> -         fputs ("(GOTOFF)", asm_out_file);
> +         {
> +           if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))
> +             fputs ("(GOTOFFFUNCDESC)", asm_out_file);
> +           else
> +             {
> +               bool is_readonly;
> +
> +               if (arm_is_segment_info_known (x, &is_readonly))
> +                 fputs ("(GOTOFF)", asm_out_file);
> +               else
> +                 fputs ("(GOT)", asm_out_file);
> +             }
> +         }
> +     }
> +
> +      /* For FDPIC we also have to mark symbol for .data section.  */
> +      if (TARGET_FDPIC
> +       && NEED_GOT_RELOC
> +       && flag_pic
> +       && !making_const_table
> +       && SYMBOL_REF_P (x))
> +     {
> +       if (SYMBOL_REF_FUNCTION_P (x))
> +         fputs ("(FUNCDESC)", asm_out_file);
>       }
>        fputc ('\n', asm_out_file);
>        return true;

Do you expect to reach here for LABEL_REFs with TARGET_FDPIC?  The second
block of code tests for SYMBOL_REF_P but the first tests
SYMBOL_REF_FUNCTION_P without checking SYMBOL_REF_P first.
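
To make the concern concrete, here is a standalone model of the
constant-table suffix selection with the function test guarded by a
symbol check.  It is a sketch under assumptions: TARGET_FDPIC is taken
to be true throughout, and struct ref_model, its fields and the function
name are hypothetical, not GCC names.

```c
#include <stdbool.h>

enum ref_kind { REF_LABEL, REF_SYMBOL };

/* Hypothetical model of the operand being assembled.  */
struct ref_model
{
  enum ref_kind kind;
  bool is_function;       /* SYMBOL_REF_FUNCTION_P; meaningless for labels */
  bool resolves_locally;  /* SYMBOL_REF_LOCAL_P and not DECL_WEAK */
  bool segment_known;     /* arm_is_segment_info_known returned true */
};

/* Return the relocation suffix chosen for a constant-table entry,
   assuming FDPIC mode.  */
static const char *
const_table_reloc_suffix (const struct ref_model *x,
                          bool data_is_text_relative)
{
  /* Only look at the function flag once we know X is a symbol.  */
  bool fdpic_function = x->kind == REF_SYMBOL && x->is_function;
  bool needs_got = (!data_is_text_relative
                    || (x->kind == REF_SYMBOL && !x->resolves_locally));

  if (needs_got)
    return fdpic_function ? "(GOTFUNCDESC)" : "(GOT)";
  if (fdpic_function)
    return "(GOTOFFFUNCDESC)";
  return x->segment_known ? "(GOTOFF)" : "(GOT)";
}
```

In this model a label never takes a FUNCDESC suffix, whatever happens to
be stored in the bits that SYMBOL_REF_FUNCTION_P would read.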

Can NEED_GOT_RELOC or flag_pic be false for TARGET_FDPIC?
Is !flag_pic TARGET_FDPIC supported?

> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 0aecd03..9036255 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -8127,6 +8127,23 @@
>      rtx callee, pat;
>      tree addr = MEM_EXPR (operands[0]);
>      
> +    /* Force FDPIC register (r9) before call.  */
> +    if (TARGET_FDPIC)
> +      {
> +     /* No need to update r9 if calling a static function.
> +        In other words: set r9 for indirect or non-local calls.  */
> +     callee = XEXP (operands[0], 0);
> +     if (!SYMBOL_REF_P (callee)
> +         || !SYMBOL_REF_LOCAL_P (callee)
> +         || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))

IMO it would be better to calculate this once rather than repeat
it below.
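
As a sketch of what "calculate this once" could look like, the condition
can live in one predicate whose result is reused for the reload before
and after the call.  The struct and function names below are
hypothetical stand-ins for the rtx queries in the patch, not existing
GCC interfaces.

```c
#include <stdbool.h>

/* Hypothetical model of the callee operand.  */
struct callee_model
{
  bool is_symbol;  /* SYMBOL_REF_P (callee) */
  bool is_local;   /* SYMBOL_REF_LOCAL_P (callee) */
  bool long_call;  /* arm_is_long_call_p (SYMBOL_REF_DECL (callee)) */
};

/* Return true if the FDPIC register must be set up around a call to
   CALLEE, i.e. for indirect, non-local or long calls.  */
static bool
call_needs_fdpic_reload (const struct callee_model *callee)
{
  return !callee->is_symbol || !callee->is_local || callee->long_call;
}
```

The expander would then evaluate the predicate once, store the result in
a local bool, and test that bool in both places.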

> +       {
> +         emit_insn (gen_blockage ());

Why's the blockage needed?  Seems worth a comment.

> +         rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
> +         emit_move_insn (pic_reg, get_hard_reg_initial_val (Pmode, FDPIC_REGNUM));
> +         emit_insn (gen_rtx_USE (VOIDmode, pic_reg));

Is this use keeping the register live for the call?  If so,
I think it'd be better to attach it to the CALL_INSN_FUNCTION_USAGE
instead.

> +      }
> +      }
> +
>      /* In an untyped call, we can get NULL for operand 2.  */
>      if (operands[2] == NULL_RTX)
>        operands[2] = const0_rtx;
> @@ -8140,6 +8157,13 @@
>       : !REG_P (callee))
>        XEXP (operands[0], 0) = force_reg (Pmode, callee);
>  
> +    if (TARGET_FDPIC && !SYMBOL_REF_P (XEXP (operands[0], 0)))
> +      {
> +     /* Indirect call: set r9 with FDPIC value of callee.  */
> +     XEXP (operands[0], 0)
> +       = arm_load_function_descriptor (XEXP (operands[0], 0));
> +      }
> +
>      if (detect_cmse_nonsecure_call (addr))
>        {
>       pat = gen_nonsecure_call_internal (operands[0], operands[1],

Redundant braces.

> @@ -8151,10 +8175,38 @@
>       pat = gen_call_internal (operands[0], operands[1], operands[2]);
>       arm_emit_call_insn (pat, XEXP (operands[0], 0), false);
>        }
> +
> +    /* Restore FDPIC register (r9) after call.  */
> +    if (TARGET_FDPIC)
> +      {
> +     /* No need to update r9 if calling a static function.  */
> +     if (!SYMBOL_REF_P (callee)
> +         || !SYMBOL_REF_LOCAL_P (callee)
> +         || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))
> +       {
> +         rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
> +         emit_move_insn (pic_reg, get_hard_reg_initial_val (Pmode, FDPIC_REGNUM));
> +         emit_insn (gen_rtx_USE (VOIDmode, pic_reg));
> +         emit_insn (gen_blockage ());
> +       }
> +      }
>      DONE;
>    }"
>  )

What's the general assumption about the validity of r9?  Seems odd that
we need to load this value both before and after the call.

>  
> +(define_insn "*restore_pic_register_after_call"
> +  [(parallel [(unspec [(match_operand:SI 0 "s_register_operand" "=r,r")
> +                    (match_operand:SI 1 "nonimmediate_operand" "r,m")]
> +            UNSPEC_PIC_RESTORE)
> +           (use (match_dup 1))
> +           (clobber (match_dup 0))])
> +  ]
> +  ""
> +  "@
> +  mov\t%0, %1
> +  ldr\t%0, %1"
> +)
> +
>  (define_expand "call_internal"
>    [(parallel [(call (match_operand 0 "memory_operand" "")
>                   (match_operand 1 "general_operand" ""))

Since operand 0 is significant after the instruction, I think this
should be:

(define_insn "*restore_pic_register_after_call"
  [(set (match_operand:SI 0 "s_register_operand" "+r,r")
        (unspec:SI [(match_dup 0)
                    (match_operand:SI 1 "nonimmediate_operand" "r,m")]
                   UNSPEC_PIC_RESTORE))]
  ...

The (use (match_dup 1)) looks redundant, since the unspec itself
uses operand 1.

> @@ -8215,6 +8267,30 @@
>      rtx pat, callee;
>      tree addr = MEM_EXPR (operands[1]);
>      
> +    /* Force FDPIC register (r9) before call.  */
> +    if (TARGET_FDPIC)
> +      {
> +     /* No need to update the FDPIC register (r9) if calling a static function.
> +        In other words: set r9 for indirect or non-local calls.  */
> +     callee = XEXP (operands[1], 0);
> +     if (!SYMBOL_REF_P (callee)
> +         || !SYMBOL_REF_LOCAL_P (callee)
> +         || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))
> +       {
> +         rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
> +         rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
> +         rtx initial_fdpic_reg =
> +             get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);
> +
> +         XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,
> +             gen_rtvec (2, fdpic_reg, initial_fdpic_reg),
> +             UNSPEC_PIC_RESTORE);
> +         XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, initial_fdpic_reg);
> +         XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, fdpic_reg);
> +         emit_insn (par);
> +       }
> +      }
> +

It's not obvious why this code is different from the call-without-value
case above, which doesn't use UNSPEC_PIC_RESTORE.  I think it should be
split out into a helper function that's used for both call and call_value.

I think it would also be good to have more comments about what
conditions the UNSPEC_PIC_RESTORE pattern is enforcing.

Thanks,
Richard
