Hi Claudiu,

> -----Original Message-----
> From: [email protected] <claudiu.zissulescu-
> [email protected]>
> Sent: 09 December 2025 10:58
> To: [email protected]
> Cc: [email protected]; [email protected]; Tamar Christina
> <[email protected]>; Wilco Dijkstra <[email protected]>
> Subject: [PATCH 1/2] aarch64: Add support for memtag-stack sanitizer using MTE insns
> 
> From: Claudiu Zissulescu <[email protected]>
> 
> The MEMTAG sanitizer, which is based on the HWASAN sanitizer, invokes
> target-specific hooks to create a random tag, add a tag to a memory
> address, and finally tag and untag memory.
> 
> Implement the target hooks to emit MTE instructions when the MEMTAG
> sanitizer is in effect.  Continue to use the default target hooks when
> HWASAN is being used.  The following target hooks are implemented:
>    - TARGET_MEMTAG_INSERT_RANDOM_TAG
>    - TARGET_MEMTAG_ADD_TAG
>    - TARGET_MEMTAG_EXTRACT_TAG
> 
> Apart from the target-specific hooks, set the following to the values
> defined by the Memory Tagging Extension (MTE) on aarch64:
>    - TARGET_MEMTAG_TAG_BITSIZE
>    - TARGET_MEMTAG_GRANULE_SIZE
> 
> The following instructions are (re-)defined:
>    - addg/subg: used by the TARGET_MEMTAG_ADD_TAG and
>      TARGET_MEMTAG_COMPOSE_OFFSET_TAG hooks.
>    - stg/st2g: used to tag/untag one or two memory granules.
>    - tag_memory: a target-specific instruction that emits MTE
>      instructions to tag/untag memory of a given size.
>    - compose_tag: a target-specific instruction that computes a tagged
>      address as an offset from a base (tagged) address.
>    - gmi: used when randomizing the inserted tag.
>    - irg: likewise.
> 
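
To make the instrumentation concrete (my own illustration, not from the
patch): for something like

    void use (char *);
    void f (void)
    {
      char buf[32];   /* two 16-byte MTE granules */
      use (buf);
    }

built with -march=armv8.5-a+memtag -fsanitize=memtag-stack, the frame
base gets a random tag (irg), buf's address is derived from it (addg),
and its granules are tagged before use and untagged again on function
exit (stg/st2g).
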
> gcc/
> 
>       * config/aarch64/aarch64.md (addg): Update pattern to use
>       addg/subg instructions.
>       (stg): Update pattern.
>       (st2g): New pattern.
>       (tag_memory): Likewise.
>       (compose_tag): Likewise.
>       (irg): Update pattern to accept the xzr register.
>       (gmi): Likewise.
>       (UNSPECV_TAG_SPACE): Define.
>       * config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
>       Define.
>       (AARCH64_MEMTAG_TAG_BITSIZE): Likewise.
>       (aarch64_override_options_internal): Error out if MTE instructions
>       are not available.
>       (aarch64_post_cfi_startproc): Emit .cfi_mte_tagged_frame.
>       (aarch64_can_tag_addresses): Add MEMTAG specific handling.
>       (aarch64_memtag_tag_bitsize): New function.
>       (aarch64_memtag_granule_size): Likewise.
>       (aarch64_memtag_insert_random_tag): Likewise.
>       (aarch64_memtag_add_tag): Likewise.
>       (aarch64_memtag_extract_tag): Likewise.
>       (aarch64_granule16_memory_address_p): Likewise.
>       (aarch64_emit_stxg_insn): Likewise.
>       (aarch64_tag_memory_via_loop): New definition.
>       (aarch64_expand_tag_memory): Likewise.
>       (aarch64_check_memtag_ops): Likewise.
>       (TARGET_MEMTAG_TAG_BITSIZE): Likewise.
>       (TARGET_MEMTAG_GRANULE_SIZE): Likewise.
>       (TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
>       (TARGET_MEMTAG_ADD_TAG): Likewise.
>       (TARGET_MEMTAG_EXTRACT_TAG): Likewise.
>       * config/aarch64/aarch64-builtins.cc
>       (aarch64_expand_builtin_memtag): Update set tag builtin logic.
>       * config/aarch64/aarch64-linux.h: Pass memtag-stack sanitizer
>       specific options to the linker.
>       * config/aarch64/aarch64-protos.h
>       (aarch64_granule16_memory_address_p): New prototype.
>       (aarch64_check_memtag_ops): Likewise.
>       (aarch64_expand_tag_memory): Likewise.
>       * config/aarch64/constraints.md (Umg): New memory constraint.
>       (Uag): New constraint.
>       (Ung): Likewise.
>       * config/aarch64/predicates.md (aarch64_memtag_tag_offset):
>       Refactor it.
>       (aarch64_granule16_imm6): Rename from aarch64_granule16_uimm6
>       and refactor it.
>       (aarch64_granule16_memory_operand): New predicate.
>       * config/aarch64/iterators.md (MTE_PP): New code iterator to be
>       used for MTE instructions.
>       (stg_ops): New code attribute.
>       (st2g_ops): Likewise.
>       (mte_name): Likewise.
>       * config/aarch64/aarch64.opt (aarch64-tag-memory-loop-threshold):
>       New parameter.
> 
> doc/
>         * invoke.texi: Update documentation.
> 
> gcc/testsuite:
> 
>       * gcc.target/aarch64/acle/memtag_1.c: Update test.
> 
> Co-authored-by: Indu Bhagat <[email protected]>
> Signed-off-by: Claudiu Zissulescu <[email protected]>
> ---
>  gcc/config/aarch64/aarch64-builtins.cc        |   7 +-
>  gcc/config/aarch64/aarch64-linux.h            |   4 +-
>  gcc/config/aarch64/aarch64-protos.h           |   3 +
>  gcc/config/aarch64/aarch64.cc                 | 322 +++++++++++++++++-
>  gcc/config/aarch64/aarch64.md                 | 127 +++++--
>  gcc/config/aarch64/aarch64.opt                |   5 +
>  gcc/config/aarch64/constraints.md             |  21 ++
>  gcc/config/aarch64/iterators.md               |  20 ++
>  gcc/config/aarch64/predicates.md              |  13 +-
>  gcc/doc/invoke.texi                           |  11 +-
>  .../gcc.target/aarch64/acle/memtag_1.c        |   4 +-
>  11 files changed, 493 insertions(+), 44 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 408099a50e8..31431693cf2 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -3680,8 +3680,11 @@ aarch64_expand_builtin_memtag (int fcode, tree exp, rtx target)
>       pat = GEN_FCN (icode) (target, op0, const0_rtx);
>       break;
>        case AARCH64_MEMTAG_BUILTIN_SET_TAG:
> -     pat = GEN_FCN (icode) (op0, op0, const0_rtx);
> -     break;
> +     {
> +       rtx mem = gen_rtx_MEM (TImode, op0);
> +       pat = GEN_FCN (icode) (mem, op0);
> +       break;
> +     }
>        default:
>       gcc_unreachable();
>      }
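
The new MEM-based expansion also lines up with the ACLE entry point; a
minimal sketch (mine, untested):

    #include "arm_acle.h"
    void set_tag (void *p)
    {
      __arm_mte_set_tag (p);   /* should now emit: stg x0, [x0] */
    }

which matches the updated scan pattern in memtag_1.c at the end of the
patch.
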
> diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
> index 116bb4e69f3..4fa78e0b2f5 100644
> --- a/gcc/config/aarch64/aarch64-linux.h
> +++ b/gcc/config/aarch64/aarch64-linux.h
> @@ -48,7 +48,9 @@
>     %{static-pie:-Bstatic -pie --no-dynamic-linker -z text} \
>     -X                                                \
>     %{mbig-endian:-EB} %{mlittle-endian:-EL}     \
> -   -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b}"
> +   -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b} \
> +   %{%:sanitize(memtag-stack):%{!fsanitize-memtag-mode:-z memtag-stack -z memtag-mode=sync}} \
> +   %{%:sanitize(memtag-stack):%{fsanitize-memtag-mode=*:-z memtag-stack -z memtag-mode=%*}}"
> 
> 
>  #define LINK_SPEC LINUX_TARGET_LINK_SPEC AARCH64_ERRATA_LINK_SPEC
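
For the record, my understanding of the spec change: a plain

    gcc -fsanitize=memtag-stack t.c

now passes "-z memtag-stack -z memtag-mode=sync" to the linker by
default, while an explicit -fsanitize-memtag-mode= (presumably
introduced elsewhere in the series) overrides the mode.
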
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index a9e407ba340..a316e6af4aa 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1127,6 +1127,9 @@ void aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx);
> 
>  bool aarch64_prepare_sve_int_fma (rtx *, rtx_code);
>  bool aarch64_prepare_sve_cond_int_fma (rtx *, rtx_code);
> +
> +bool aarch64_granule16_memory_address_p (rtx mem);
> +void aarch64_expand_tag_memory (rtx, rtx, rtx);
>  #endif /* RTX_CODE */
> 
>  bool aarch64_process_target_attr (tree);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 9d2c3431ad3..82005a97380 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -19108,6 +19108,10 @@ aarch64_override_options_internal (struct gcc_options *opts)
>  #endif
>      }
> 
> +  if (flag_sanitize & SANITIZE_MEMTAG_STACK && !TARGET_MEMTAG)
> +    error ("%<-fsanitize=memtag-stack%> requires the ISA extension %qs",
> +        "memtag");
> +
>    aarch64_feature_flags isa_flags = aarch64_get_isa_flags (opts);
>    if ((isa_flags & (AARCH64_FL_SM_ON | AARCH64_FL_ZA_ON))
>        && !(isa_flags & AARCH64_FL_SME))
> @@ -25679,6 +25683,19 @@ aarch64_asm_output_external (FILE *stream, tree decl, const char* name)
>    aarch64_asm_output_variant_pcs (stream, decl, name);
>  }
> 
> +/* Implement TARGET_MEMTAG_CAN_TAG_ADDRESSES.  Here we tell the rest of the
> +   compiler that we automatically ignore the top byte of our pointers, which
> +   allows using -fsanitize=hwaddress.  In the case of -fsanitize=memtag-stack,
> +   we additionally ensure that the target supports MEMTAG insns.  */
> +
> +bool
> +aarch64_can_tag_addresses ()
> +{
> +  if (memtag_sanitize_p ())
> +    return !TARGET_ILP32 && TARGET_MEMTAG;
> +  return !TARGET_ILP32;
> +}
> +
>  /* Triggered after a .cfi_startproc directive is emitted into the assembly file.
>     Used to output the .cfi_b_key_frame directive when signing the current
>     function with the B key.  */
> @@ -25689,6 +25706,10 @@ aarch64_post_cfi_startproc (FILE *f, tree ignored ATTRIBUTE_UNUSED)
>    if (cfun->machine->frame.laid_out && aarch64_return_address_signing_enabled ()
>        && aarch64_ra_sign_key == AARCH64_KEY_B)
>       asm_fprintf (f, "\t.cfi_b_key_frame\n");
> +  if (cfun->machine->frame.laid_out && aarch64_can_tag_addresses ()

NIT: can you move the && to the next line?

Patch is OK with that change.

Thanks for working on this!
Tamar

> +      && memtag_sanitize_p ()
> +      && !known_eq (cfun->machine->frame.frame_size, 0))
> +    asm_fprintf (f, "\t.cfi_mte_tagged_frame\n");
>  }
> 
>  /* Implements TARGET_ASM_FILE_START.  Output the assembly header.  */
> @@ -30365,13 +30386,289 @@ aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
>    return NULL;
>  }
> 
> -/* Implement TARGET_MEMTAG_CAN_TAG_ADDRESSES.  Here we tell the rest of the
> -   compiler that we automatically ignore the top byte of our pointers, which
> -   allows using -fsanitize=hwaddress.  */
> +#define AARCH64_MEMTAG_GRANULE_SIZE  16
> +#define AARCH64_MEMTAG_TAG_BITSIZE    4
> +
> +/* Implement TARGET_MEMTAG_TAG_BITSIZE.  */
> +unsigned char
> +aarch64_memtag_tag_bitsize ()
> +{
> +  if (memtag_sanitize_p ())
> +    return AARCH64_MEMTAG_TAG_BITSIZE;
> +  return default_memtag_tag_bitsize ();
> +}
> +
> +/* Implement TARGET_MEMTAG_GRANULE_SIZE.  */
> +unsigned char
> +aarch64_memtag_granule_size ()
> +{
> +  if (memtag_sanitize_p ())
> +    return AARCH64_MEMTAG_GRANULE_SIZE;
> +  return default_memtag_granule_size ();
> +}
> +
> +/* Implement TARGET_MEMTAG_INSERT_RANDOM_TAG.  In the case of MTE
> +   instructions, make sure the gmi and irg instructions are generated when
> +   -fsanitize=memtag-stack is used.  The first argument UNTAGGED can be a
> +   tagged pointer, and its tag is used in the exclusion set, so TARGET
> +   doesn't receive the same tag.  */
> +rtx
> +aarch64_memtag_insert_random_tag (rtx untagged, rtx target)
> +{
> +  if (memtag_sanitize_p ())
> +    {
> +      insn_code icode = CODE_FOR_gmi;
> +      expand_operand ops_gmi[3];
> +      rtx tmp = gen_reg_rtx (Pmode);
> +      create_output_operand (&ops_gmi[0], tmp, Pmode);
> +      create_input_operand  (&ops_gmi[1], untagged, Pmode);
> +      create_integer_operand  (&ops_gmi[2], 0);
> +      expand_insn (icode, 3, ops_gmi);
> +
> +      icode = CODE_FOR_irg;
> +      expand_operand ops_irg[3];
> +      create_output_operand (&ops_irg[0], target, Pmode);
> +      create_input_operand  (&ops_irg[1], untagged, Pmode);
> +      create_input_operand  (&ops_irg[2], ops_gmi[0].value, Pmode);
> +      expand_insn (icode, 3, ops_irg);
> +      return ops_irg[0].value;
> +    }
> +  else
> +    return default_memtag_insert_random_tag (untagged, target);
> +}
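
Self-check on the expansion above: for a frame base in x0 this should
come out as roughly (my reading, untested)

    gmi  x1, x0, xzr    // exclusion mask from x0's current tag
    irg  x0, x0, x1     // insert a random tag avoiding that mask

so the re-tagged pointer never reuses the incoming tag, as the comment
says.
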
> +
> +/* Implement TARGET_MEMTAG_ADD_TAG.  For the memtag sanitizer, emit addg/subg
> +   instructions; otherwise fall back on the default implementation.  */
> +rtx
> +aarch64_memtag_add_tag (rtx base, poly_int64 offset, uint8_t tag_offset)
> +{
> +  if (memtag_sanitize_p ())
> +    {
> +      rtx target = NULL;
> +      poly_int64 addr_offset = offset;
> +      rtx offset_rtx = gen_int_mode (addr_offset, DImode);
> +
> +      if (!aarch64_granule16_imm6 (offset_rtx, DImode))
> +     {
> +       /* Emit addr arithmetic prior to addg/subg.  */
> +       base = expand_simple_binop (Pmode, PLUS, base, offset_rtx,
> +                                   NULL, true, OPTAB_LIB_WIDEN);
> +       addr_offset = 0;
> +     }
> +
> +      insn_code icode = CODE_FOR_addg;
> +      expand_operand ops[4];
> +      create_output_operand (&ops[0], target, DImode);
> +      create_input_operand (&ops[1], base, DImode);
> +      create_integer_operand (&ops[2], addr_offset);
> +      create_integer_operand (&ops[3], tag_offset);
> +      /* Addr offset and tag offset must be within bounds at this time.  */
> +      gcc_assert (aarch64_memtag_tag_offset (ops[3].value, DImode));
> +
> +      expand_insn (icode, 4, ops);
> +      return ops[0].value;
> +    }
> +  else
> +    return default_memtag_add_tag (base, offset, tag_offset);
> +}
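
And for the add_tag hook, the two paths as I read them (offsets are my
own examples, untested):

    // granule-aligned offset in [-1008, 1008]: a single instruction
    addg  x0, x29, #32, #2     // offset +32, tag offset 2
    subg  x0, x29, #48, #2     // offset -48, via the Ung alternative

    // unencodable offset: split the address arithmetic out first
    add   x0, x29, #4096
    addg  x0, x0, #0, #2
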
> +
> +/* Implement TARGET_MEMTAG_EXTRACT_TAG.  In the case of the memtag
> +   sanitizer, MTE instructions allow us to work with the tag-address tuple,
> +   so there is no need to extract the tag; emit a simple move.  */
> +rtx
> +aarch64_memtag_extract_tag (rtx tagged_pointer, rtx target)
> +{
> +
> +  if (memtag_sanitize_p ())
> +    {
> +      rtx ret = gen_reg_rtx (DImode);
> +      emit_move_insn (ret, gen_lowpart (DImode, tagged_pointer));
> +      return ret;
> +    }
> +  else
> +    return default_memtag_extract_tag (tagged_pointer, target);
> +}
> +
> +/* Return TRUE if X is a valid memory address form for memtag loads and
> +   stores.  */
>  bool
> -aarch64_can_tag_addresses ()
> +aarch64_granule16_memory_address_p (rtx x)
>  {
> -  return !TARGET_ILP32;
> +  struct aarch64_address_info addr;
> +
> +  if (!MEM_P (x)
> +      || !aarch64_classify_address (&addr, XEXP (x, 0), GET_MODE (x), false))
> +    return false;
> +
> +  /* Check that the offset, if any, is encodable as 9-bit immediate.  */
> +  switch (addr.type)
> +    {
> +    case ADDRESS_REG_IMM:
> +      return aarch64_granule16_simm9 (gen_int_mode (addr.const_offset, DImode),
> +                                      DImode);
> +
> +    case ADDRESS_REG_REG:
> +      return addr.shift == 0;
> +
> +    default:
> +      break;
> +    }
> +  return false;
> +}
> +
> +/* Helper to emit either an stg or an st2g instruction.  */
> +static void
> +aarch64_emit_stxg_insn (machine_mode mode, rtx nxt, rtx addr, rtx tagp)
> +{
> +  rtx pat;
> +  rtx mem_addr = gen_rtx_MEM (mode, nxt);
> +  rtvec vec = gen_rtvec (2, gen_rtx_MEM (mode, addr), tagp);
> +  rtx unspec = gen_rtx_UNSPEC_VOLATILE (mode, vec, UNSPECV_TAG_SPACE);
> +
> +  if (!rtx_equal_p (nxt, addr))
> +    {
> +      rtx tmp = gen_rtx_CLOBBER (VOIDmode, addr);
> +      rtvec parv = gen_rtvec (2, gen_rtx_SET (mem_addr, unspec), tmp);
> +      pat = gen_rtx_PARALLEL (VOIDmode, parv);
> +    }
> +  else
> +    {
> +      pat = gen_rtx_SET (mem_addr, unspec);
> +    }
> +  emit_insn (pat);
> +}
> +
> +/* Tag the memory via an explicit loop.  This is used when tag_memory expand
> +   is invoked for:
> +     - non-constant size, or
> +     - constant but not encodable size (!aarch64_granule16_simm9 ()), or
> +     - constant and encodable size (aarch64_granule16_simm9 ()), but over the
> +       unroll threshold (aarch64_tag_memory_loop_threshold).  */
> +
> +static void
> +aarch64_tag_memory_via_loop (rtx base, rtx size, rtx tagged_pointer)
> +{
> +  rtx_code_label *top_label, *bottom_label;
> +  machine_mode iter_mode;
> +  rtx next;
> +
> +  iter_mode = GET_MODE (size);
> +  if (iter_mode == VOIDmode)
> +    iter_mode = word_mode;
> +
> +  /* Prepare the addr operand for tagging memory.  */
> +  rtx addr_reg = gen_reg_rtx (Pmode);
> +  emit_move_insn (addr_reg, base);
> +
> +  rtx size_reg = gen_reg_rtx (iter_mode);
> +  emit_move_insn (size_reg, size);
> +
> +  /*
> +         tbz  size, 4, label1
> +         stg  tag,[addr], #16
> +    label1:
> +   */
> +  auto *label1 = gen_label_rtx ();
> +  auto branch = aarch64_gen_test_and_branch (EQ, size_reg, 4, label1);
> +  auto jump = emit_jump_insn (branch);
> +  JUMP_LABEL (jump) = label1;
> +
> +  next = gen_rtx_POST_INC (Pmode, addr_reg);
> +  aarch64_emit_stxg_insn (TImode, next, addr_reg, tagged_pointer);
> +
> +  emit_label (label1);
> +
> +  /*
> +    asr  iter, size, 5
> +    cbz  iter, label2
> +   */
> +  rtx iter = gen_reg_rtx (iter_mode);
> +  emit_insn (gen_rtx_SET (iter,
> +                          gen_rtx_ASHIFTRT (iter_mode, size_reg,
> +                                            GEN_INT (5))));
> +  bottom_label = gen_label_rtx ();
> +  branch = aarch64_gen_compare_zero_and_branch (EQ, iter, bottom_label);
> +  aarch64_emit_unlikely_jump (branch);
> +
> +  /*
> +    top_label:
> +    st2g  tag, [addr], #32
> +    subs  iter, iter, #1
> +    bne   top_label
> +   */
> +  top_label = gen_label_rtx ();
> +  emit_label (top_label);
> +
> +  /* Tag Memory using post-index st2g.  */
> +  next = gen_rtx_POST_INC (Pmode, addr_reg);
> +  aarch64_emit_stxg_insn (OImode, next, addr_reg, tagged_pointer);
> +
> +  /* Decrement ITER.  */
> +  emit_insn (gen_subdi3_compare1_imm (iter, iter, CONST1_RTX (iter_mode),
> +                                      CONSTM1_RTX (iter_mode)));
> +
> +  rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
> +  rtx x = gen_rtx_fmt_ee (NE, CCmode, cc_reg, const0_rtx);
> +  jump = emit_jump_insn (gen_aarch64_bcond (x, cc_reg, top_label));
> +  JUMP_LABEL (jump) = top_label;
> +
> +  emit_label (bottom_label);
> +}
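
Putting the three fragments together, my reading of the emitted loop
for a non-constant (or over-threshold) SIZE, in outline:

        tbz   size, #4, 1f        // odd 16-byte granule first
        stg   tag, [addr], #16
    1:  asr   iter, size, #5      // remaining 32-byte chunks
        cbz   iter, 3f
    2:  st2g  tag, [addr], #32
        subs  iter, iter, #1
        b.ne  2b
    3:

which matches the per-fragment comments in the function.
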
> +
> +/* Implement expand for tag_memory.  */
> +void
> +aarch64_expand_tag_memory (rtx base, rtx tagged_pointer, rtx size)
> +{
> +  rtx addr;
> +  HOST_WIDE_INT len, offset;
> +  unsigned HOST_WIDE_INT granule_size;
> +  unsigned HOST_WIDE_INT iters = 0;
> +
> +  granule_size = (HOST_WIDE_INT) AARCH64_MEMTAG_GRANULE_SIZE;
> +
> +  if (!REG_P (tagged_pointer))
> +    tagged_pointer = force_reg (Pmode, tagged_pointer);
> +
> +  if (!REG_P (base))
> +    base = force_reg (Pmode, base);
> +
> +  /* If the size is small enough, we can unroll the loop using stg/st2g
> +     instructions.  */
> +  if (CONST_INT_P (size))
> +    {
> +      len = INTVAL (size);
> +      if (len == 0)
> +     return; /* Nothing to do.  */
> +
> +      /* The amount of memory to tag must be aligned to granule size by now.  */
> +      gcc_assert (len % granule_size == 0);
> +
> +      iters = len / granule_size;
> +    }
> +
> +  /* Check predicate on max offset possible: offset (in base rtx) + size.  */
> +  rtx end_addr = simplify_gen_binary (PLUS, Pmode, base, size);
> +  end_addr = gen_rtx_MEM (TImode, end_addr);
> +  if (iters > 0
> +      && iters <= (unsigned HOST_WIDE_INT) aarch64_tag_memory_loop_threshold
> +      && aarch64_granule16_memory_address_p (end_addr))
> +    {
> +      offset = 0;
> +      while (iters)
> +     {
> +       machine_mode mode = TImode;
> +       if (iters / 2)
> +         {
> +           mode = OImode;
> +           iters--;
> +         }
> +       iters--;
> +       addr = plus_constant (Pmode, base, offset);
> +       offset += GET_MODE_SIZE (mode).to_constant ();
> +       aarch64_emit_stxg_insn (mode, addr, addr, tagged_pointer);
> +     }
> +    }
> +  else
> +    aarch64_tag_memory_via_loop (base, size, tagged_pointer);
>  }
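
For the unrolled path, a constant 48-byte block (three granules, under
the default threshold of 10) should expand to something like (untested):

    st2g  x1, [x0]           // granules 0-1
    stg   x1, [x0, #32]      // granule 2
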
> 
>  /* Implement TARGET_ASM_FILE_END for AArch64.  This adds the AArch64 GNU NOTE
> @@ -32806,6 +33103,21 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_MEMTAG_CAN_TAG_ADDRESSES
>  #define TARGET_MEMTAG_CAN_TAG_ADDRESSES aarch64_can_tag_addresses
> 
> +#undef TARGET_MEMTAG_TAG_BITSIZE
> +#define TARGET_MEMTAG_TAG_BITSIZE aarch64_memtag_tag_bitsize
> +
> +#undef TARGET_MEMTAG_GRANULE_SIZE
> +#define TARGET_MEMTAG_GRANULE_SIZE aarch64_memtag_granule_size
> +
> +#undef TARGET_MEMTAG_INSERT_RANDOM_TAG
> +#define TARGET_MEMTAG_INSERT_RANDOM_TAG aarch64_memtag_insert_random_tag
> +
> +#undef TARGET_MEMTAG_ADD_TAG
> +#define TARGET_MEMTAG_ADD_TAG aarch64_memtag_add_tag
> +
> +#undef TARGET_MEMTAG_EXTRACT_TAG
> +#define TARGET_MEMTAG_EXTRACT_TAG aarch64_memtag_extract_tag
> +
>  #if CHECKING_P
>  #undef TARGET_RUN_TARGET_SELFTESTS
>  #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 98c65a74c8e..534c5b766d6 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -412,6 +412,7 @@ (define_c_enum "unspecv" [
>      UNSPECV_GCSPOPM          ; Represent GCSPOPM.
>      UNSPECV_GCSSS1           ; Represent GCSSS1 Xt.
>      UNSPECV_GCSSS2           ; Represent GCSSS2 Xt.
> +    UNSPECV_TAG_SPACE                ; Represent MTE tag memory space.
>      UNSPECV_TSTART           ; Represent transaction start.
>      UNSPECV_TCOMMIT          ; Represent transaction commit.
>      UNSPECV_TCANCEL          ; Represent transaction cancel.
> @@ -8608,46 +8609,48 @@ (define_insn "aarch64_rndrrs"
>  ;; Memory Tagging Extension (MTE) instructions.
> 
>  (define_insn "irg"
> -  [(set (match_operand:DI 0 "register_operand" "=rk")
> +  [(set (match_operand:DI 0 "register_operand")
>       (ior:DI
> -      (and:DI (match_operand:DI 1 "register_operand" "rk")
> +      (and:DI (match_operand:DI 1 "register_operand")
>                (const_int MEMTAG_TAG_MASK))
> -      (ashift:DI (unspec:QI [(match_operand:DI 2 "register_operand" "r")]
> +      (ashift:DI (unspec:QI [(match_operand:DI 2 "aarch64_reg_or_zero")]
>                    UNSPEC_GEN_TAG_RND)
>                   (const_int 56))))]
>    "TARGET_MEMTAG"
> -  "irg\\t%0, %1, %2"
> -  [(set_attr "type" "memtag")]
> +  {@ [ cons: =0, 1, 2 ; attrs: type ]
> +     [ rk      , rk, r  ; memtag ] irg\\t%0, %1, %2
> +     [ rk      , rk, Z  ; memtag ] irg\\t%0, %1
> +  }
>  )
> 
>  (define_insn "gmi"
>    [(set (match_operand:DI 0 "register_operand" "=r")
> -     (ior:DI (ashift:DI
> -              (const_int 1)
> -              (and:QI (lshiftrt:DI
> -                       (match_operand:DI 1 "register_operand" "rk")
> -                       (const_int 56)) (const_int 15)))
> -             (match_operand:DI 2 "register_operand" "r")))]
> +     (ior:DI
> +      (unspec:DI [(match_operand:DI 1 "register_operand" "rk")
> +                  (const_int 0)]
> +                 UNSPEC_GEN_TAG)
> +      (match_operand:DI 2 "aarch64_reg_or_zero" "rZ")))]
>    "TARGET_MEMTAG"
> -  "gmi\\t%0, %1, %2"
> +  "gmi\\t%0, %1, %x2"
>    [(set_attr "type" "memtag")]
>  )
> 
>  (define_insn "addg"
> -  [(set (match_operand:DI 0 "register_operand" "=rk")
> +  [(set (match_operand:DI 0 "register_operand")
>       (ior:DI
> -      (and:DI (plus:DI (match_operand:DI 1 "register_operand" "rk")
> -                       (match_operand:DI 2 "aarch64_granule16_uimm6" "i"))
> -              (const_int -1080863910568919041)) ;; 0xf0ff...
> +      (and:DI (plus:DI (match_operand:DI 1 "register_operand")
> +                       (match_operand:DI 2 "aarch64_granule16_imm6"))
> +              (const_int MEMTAG_TAG_MASK))
>        (ashift:DI
> -       (unspec:QI
> -        [(and:QI (lshiftrt:DI (match_dup 1) (const_int 56)) (const_int 15))
> -         (match_operand:QI 3 "aarch64_memtag_tag_offset" "i")]
> -        UNSPEC_GEN_TAG)
> +           (unspec:DI [(match_dup 1)
> +                       (match_operand:QI 3 "aarch64_memtag_tag_offset")]
> +                       UNSPEC_GEN_TAG)
>         (const_int 56))))]
>    "TARGET_MEMTAG"
> -  "addg\\t%0, %1, #%2, #%3"
> -  [(set_attr "type" "memtag")]
> +  {@ [ cons: =0 , 1  , 2 , 3 ; attrs: type ]
> +     [ rk       , rk , Uag ,  ; memtag   ] addg\t%0, %1, #%2, #%3
> +     [ rk       , rk , Ung ,  ; memtag   ] subg\t%0, %1, #%n2, #%3
> +  }
>  )
> 
>  (define_insn "subp"
> @@ -8681,17 +8684,83 @@ (define_insn "ldg"
>  ;; STG doesn't align the address but aborts with alignment fault
>  ;; when the address is not 16-byte aligned.
>  (define_insn "stg"
> -  [(set (mem:QI (unspec:DI
> -      [(plus:DI (match_operand:DI 1 "register_operand" "rk")
> -                (match_operand:DI 2 "aarch64_granule16_simm9" "i"))]
> -      UNSPEC_TAG_SPACE))
> -     (and:QI (lshiftrt:DI (match_operand:DI 0 "register_operand" "rk")
> -                          (const_int 56)) (const_int 15)))]
> +  [(set (match_operand:TI 0 "aarch64_granule16_memory_operand" "+Umg")
> +      (unspec_volatile:TI
> +     [(match_dup 0)
> +      (match_operand:DI 1 "register_operand" "rk")]
> +     UNSPECV_TAG_SPACE))]
>    "TARGET_MEMTAG"
> -  "stg\\t%0, [%1, #%2]"
> +  "stg\\t%1, %0"
>    [(set_attr "type" "memtag")]
>  )
> 
> +(define_insn "stg_<mte_name>"
> +  [(set (mem:TI (MTE_PP:DI (match_operand:DI 0 "register_operand" "+rk")))
> +     (unspec_volatile:TI
> +      [(mem:TI (match_dup 0))
> +       (match_operand:DI 1 "register_operand" "rk")]
> +      UNSPECV_TAG_SPACE))
> +   (clobber (match_dup 0))]
> +  "TARGET_MEMTAG"
> +  "stg\\t%1, <stg_ops>"
> +  [(set_attr "type" "memtag")]
> +)
> +
> +;; ST2G updates allocation tags for two memory granules (i.e. 32 bytes) at
> +;; once, without zero initialization.
> +(define_insn "st2g"
> +  [(set (match_operand:OI 0 "aarch64_granule16_memory_operand" "+Umg")
> +      (unspec_volatile:OI
> +     [(match_dup 0)
> +      (match_operand:DI 1 "register_operand" "rk")]
> +     UNSPECV_TAG_SPACE))]
> +  "TARGET_MEMTAG"
> +  "st2g\\t%1, %0"
> +  [(set_attr "type" "memtag")]
> +)
> +
> +(define_insn "st2g_<mte_name>"
> +  [(set (mem:OI (MTE_PP:DI (match_operand:DI 0 "register_operand" "+rk")))
> +     (unspec_volatile:OI
> +      [(mem:OI (match_dup 0))
> +       (match_operand:DI 1 "register_operand" "rk")]
> +      UNSPECV_TAG_SPACE))
> +   (clobber (match_dup 0))]
> +  "TARGET_MEMTAG"
> +  "st2g\\t%1, <st2g_ops>"
> +  [(set_attr "type" "memtag")]
> +)
> +
> +(define_expand "tag_memory"
> +  [(match_operand:DI 0 "register_operand" "")
> +   (match_operand:DI 1 "nonmemory_operand" "")
> +   (match_operand:DI 2 "nonmemory_operand" "")]
> +  ""
> +{
> +  aarch64_expand_tag_memory (operands[0], operands[1], operands[2]);
> +  DONE;
> +})
> +
> +(define_expand "compose_tag"
> +  [(set (match_operand:DI 0 "register_operand")
> +     (ior:DI
> +      (and:DI (plus:DI (match_operand:DI 1 "register_operand")
> +                       (const_int 0))
> +              (const_int MEMTAG_TAG_MASK))
> +      (ashift:DI
> +       (unspec:DI [(match_dup 1)
> +                  (match_operand 2 "immediate_operand")]
> +                  UNSPEC_GEN_TAG)
> +       (const_int 56))))]
> +  ""
> +{
> +  if (INTVAL (operands[2]) == 0)
> +    {
> +     emit_move_insn (operands[0], operands[1]);
> +     DONE;
> +    }
> +})
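
compose_tag then degenerates nicely: a zero tag offset becomes a plain
move, and a non-zero one reuses the addg pattern with a zero address
offset, e.g. for tag offset 1 (my reading, untested):

    addg  x0, x1, #0, #1

which is also what the existing memtag_1.c scan pattern expects.
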
> +
>  ;; Load/Store 64-bit (LS64) instructions.
>  (define_insn "ld64b"
>    [(set (match_operand:V8DI 0 "register_operand" "=r")
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 8aae953e60d..135b6753ac5 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -443,6 +443,11 @@ individual writeback accesses where possible.  A value of two means we
>  also try to opportunistically form writeback opportunities by folding in
>  trailing destructive updates of the base register used by a pair.
> 
> +-param=aarch64-tag-memory-loop-threshold=
> +Target Joined UInteger Var(aarch64_tag_memory_loop_threshold) Init(10) IntegerRange(0, 65536) Param
> +Param to control the threshold in number of granules beyond which an
> +explicit loop for tagging a memory block is emitted.
> +
>  Wexperimental-fmv-target
>  Target Var(warn_experimental_fmv) Warning Init(1)
>  This option is deprecated.
> diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
> index 7b9e5583bc7..94d2ff4d847 100644
> --- a/gcc/config/aarch64/constraints.md
> +++ b/gcc/config/aarch64/constraints.md
> @@ -346,6 +346,12 @@ (define_memory_constraint "Ump"
>         (match_test "aarch64_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
>                                                    true, ADDR_QUERY_LDP_STP)")))
> 
> +(define_memory_constraint "Umg"
> +  "@internal
> +  A memory address for MTE load/store tag operation."
> +  (and (match_code "mem")
> +       (match_test "aarch64_granule16_memory_address_p (op)")))
> +
>  ;; Used for storing or loading pairs in an AdvSIMD register using an STP/LDP
>  ;; as a vector-concat.  The address mode uses the same constraints as if it
>  ;; were for a single value.
> @@ -600,6 +606,21 @@ (define_address_constraint "Dp"
>   An address valid for a prefetch instruction."
>   (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
> 
> +(define_constraint "Uag"
> +  "@internal
> +  A constant that can be used as an address offset for an ADDG operation."
> +  (and (match_code "const_int")
> +       (match_test "IN_RANGE (ival, 0, 1008)
> +                 && !(ival & 0xf)")))
> +
> +(define_constraint "Ung"
> +  "@internal
> +  A constant that can be used as an address offset for a SUBG operation
> +  (once negated)."
> +  (and (match_code "const_int")
> +       (match_test "IN_RANGE (ival, -1008, -1)
> +                 && !(ival & 0xf)")))
> +
>  (define_constraint "vgb"
>    "@internal
>     A constraint that matches an immediate offset valid for SVE LD1B
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 332e7ffd2ea..586c3bc3285 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -2887,6 +2887,9 @@ (define_code_iterator SVE_UNPRED_FP_BINARY [plus minus mult])
>  ;; SVE integer comparisons.
>  (define_code_iterator SVE_INT_CMP [lt le eq ne ge gt ltu leu geu gtu])
> 
> +;; Pre/post-{inc,dec} addressing for MTE instructions.
> +(define_code_iterator MTE_PP [post_inc post_dec pre_inc pre_dec])
> +
>  ;; -------------------------------------------------------------------
>  ;; Code Attributes
>  ;; -------------------------------------------------------------------
> @@ -3233,6 +3236,23 @@ (define_code_attr SVE_COND_FP [(plus "UNSPEC_COND_FADD")
>                              (minus "UNSPEC_COND_FSUB")
>                              (mult "UNSPEC_COND_FMUL")])
> 
> +;; Map MTE pre/post to the right asm format
> +(define_code_attr stg_ops [(post_inc "[%0], 16")
> +                        (post_dec "[%0], -16")
> +                        (pre_inc  "[%0, 16]!")
> +                        (pre_dec  "[%0, -16]!")])
> +
> +(define_code_attr st2g_ops [(post_inc "[%0], 32")
> +                         (post_dec "[%0], -32")
> +                         (pre_inc  "[%0, 32]!")
> +                         (pre_dec  "[%0, -32]!")])
> +
> +;; Map MTE pre/post to names
> +(define_code_attr mte_name [(post_inc "postinc")
> +                         (post_dec "postdec")
> +                         (pre_inc "preinc")
> +                         (pre_dec "predec")])
> +
>  ;; -------------------------------------------------------------------
>  ;; Int Iterators.
>  ;; -------------------------------------------------------------------
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index 42304cef439..dca0baf75e0 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -1066,13 +1066,20 @@ (define_predicate "aarch64_bytes_per_sve_vector_operand"
>         (match_test "known_eq (wi::to_poly_wide (op, mode),
>                             BYTES_PER_SVE_VECTOR)")))
> 
> +;; The uimm4 field is a 4-bit field that only accepts immediates in the
> +;; range 0..15.
>  (define_predicate "aarch64_memtag_tag_offset"
>    (and (match_code "const_int")
> -       (match_test "IN_RANGE (INTVAL (op), 0, 15)")))
> +       (match_test "UINTVAL (op) <= 15")))
> +
> +(define_predicate "aarch64_granule16_memory_operand"
> +  (and (match_test "TARGET_MEMTAG")
> +       (match_code "mem")
> +       (match_test "aarch64_granule16_memory_address_p (op)")))
> 
> -(define_predicate "aarch64_granule16_uimm6"
> +(define_predicate "aarch64_granule16_imm6"
>    (and (match_code "const_int")
> -       (match_test "IN_RANGE (INTVAL (op), 0, 1008)
> +       (match_test "IN_RANGE (INTVAL (op), -1008, 1008)
>                   && !(INTVAL (op) & 0xf)")))
> 
>  (define_predicate "aarch64_granule16_simm9"
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 0bc22695931..526c04404da 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17917,6 +17917,11 @@ would be beneficial to unroll the main vectorized loop and by how much.  This
>  parameter set's the upper bound of how much the vectorizer will unroll the main
>  loop.  The default value is four.
> 
> +@item aarch64-tag-memory-loop-threshold
> +Param to control the threshold in number of granules beyond which an
> +explicit loop for tagging a memory block is emitted.  The memory block
> +is tagged using MTE instructions.
> +
>  @end table
> 
>  The following choices of @var{name} are available on GCN targets:
> @@ -18396,8 +18401,10 @@ for a list of supported options.
>  The option cannot be combined with @option{-fsanitize=thread} or
>  @option{-fsanitize=hwaddress}.  Note that the only targets
>  @option{-fsanitize=hwaddress} is currently supported on are x86-64
> -(only with @code{-mlam=u48} or @code{-mlam=u57} options) and AArch64,
> -in both cases only in ABIs with 64-bit pointers.
> +(only with @code{-mlam=u48} or @code{-mlam=u57} options) and AArch64, in both
> +cases only in ABIs with 64-bit pointers.  Similarly,
> +@option{-fsanitize=memtag-stack} is currently only supported on AArch64 ABIs
> +with 64-bit pointers.
> 
>  When compiling with @option{-fsanitize=address}, you should also
>  use @option{-g} to produce more meaningful output.
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c b/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c
> index f8368690032..e94a2220fe3 100644
> --- a/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c
> @@ -54,9 +54,9 @@ test_memtag_6 (void *p)
>    __arm_mte_set_tag (p);
>  }
> 
> -/* { dg-final { scan-assembler-times {irg\tx..?, x..?, x..?\n} 1 } } */
> +/* { dg-final { scan-assembler-times {irg\tx..?, x..?\n} 1 } } */
>  /* { dg-final { scan-assembler-times {gmi\tx..?, x..?, x..?\n} 1 } } */
>  /* { dg-final { scan-assembler-times {subp\tx..?, x..?, x..?\n} 1 } } */
>  /* { dg-final { scan-assembler-times {addg\tx..?, x..?, #0, #1\n} 1 } } */
>  /* { dg-final { scan-assembler-times {ldg\tx..?, \[x..?, #0\]\n} 1 } } */
> -/* { dg-final { scan-assembler-times {stg\tx..?, \[x..?, #0\]\n} 1 } } */
> \ No newline at end of file
> +/* { dg-final { scan-assembler-times {stg\tx..?, \[x..?\]\n} 1 } } */
> --
> 2.52.0
