Matthew Malcomson <matthew.malcom...@arm.com> writes:
> Hi Richard,
>
> I'm sending up the revised patch 5 (introducing stack variable handling)
> without the other changes to other patches.
>
> I figure there's been quite a lot of changes to this patch and I wanted
> to give you time to review them while I worked on finishing the less
> widespread changes in patch 6 and before I ran the more exhaustive (and
> time-consuming) tests in case you didn't like the changes and those
> exhaustive tests would just have to get repeated.

Thanks, the new approach looks good to me.  Most of the comments below
are just minor.

> […]
> @@ -75,6 +89,26 @@ extern hash_set <tree> *asan_used_labels;
>  
>  #define ASAN_USE_AFTER_SCOPE_ATTRIBUTE       "use after scope memory"
>  
> +/* NOTE: The values below and the hooks under targetm.memtag define an ABI and
> +   are hard-coded to these values in libhwasan, hence they can't be changed
> +   independently here.  */
> +/* How many bits are used to store a tag in a pointer.
> +   The default version uses the entire top byte of a pointer (i.e. 8 bits).  */
> +#define HWASAN_TAG_SIZE targetm.memtag.tag_size ()
> +/* Tag Granule of HWASAN shadow stack.
> +   This is the size in real memory that each byte in the shadow memory refers
> +   to.  I.e. if a variable is X bytes long in memory then it's tag in shadow

s/it's/its/

> +   memory will span X / HWASAN_TAG_GRANULE_SIZE bytes.
> +   Most variables will need to be aligned to this amount since two variables
> +   that are neighbours in memory and share a tag granule would need to share

s/neighbours/neighbors/

> +   the same tag (the shared tag granule can only store one tag).  */
> +#define HWASAN_TAG_GRANULE_SIZE targetm.memtag.granule_size ()
> +/* Define the tag for the stack background.
> +   This defines what tag the stack pointer will be and hence what tag all
> +   variables that are not given special tags are (e.g. spilled registers,
> +   and parameters passed on the stack).  */
> +#define HWASAN_STACK_BACKGROUND gen_int_mode (0, QImode)
> +
>  /* Various flags for Asan builtins.  */
>  enum asan_check_flags
>  {
> […]
> @@ -1352,6 +1393,28 @@ asan_redzone_buffer::flush_if_full (void)
>      flush_redzone_payload ();
>  }
>  
> +/* Returns whether we are tagging pointers and checking those tags on memory
> +   access.  */
> +bool
> +hwasan_sanitize_p ()
> +{
> +  return sanitize_flags_p (SANITIZE_HWADDRESS);
> +}
> +
> +/* Are we tagging the stack?  */
> +bool
> +hwasan_sanitize_stack_p ()
> +{
> +  return (hwasan_sanitize_p () && param_hwasan_instrument_stack);
> +}
> +
> +/* Are we protecting alloca objects?  */

Same comment as before about avoiding the word “protect”, both in the
comment and the option name.  Maybe s/protect/sanitize/ or s/protect/tag/.
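
For concreteness, maybe something like the following (the param spelling
"hwasan-instrument-allocas" is only a placeholder; pick whatever name you
prefer):

  /* Are we tagging alloca objects?  */
  bool
  hwasan_sanitize_allocas_p (void)
  {
    return (hwasan_sanitize_stack_p () && param_hwasan_instrument_allocas);
  }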

> +bool
> +hwasan_sanitize_allocas_p (void)
> +{
> +  return (hwasan_sanitize_stack_p () && param_hwasan_protect_allocas);
> +}
> +
>  /* Insert code to protect stack vars.  The prologue sequence should be emitted
>     directly, epilogue sequence returned.  BASE is the register holding the
>     stack base, against which OFFSETS array offsets are relative to, OFFSETS
> […]
> @@ -3702,4 +3772,330 @@ make_pass_asan_O0 (gcc::context *ctxt)
>    return new pass_asan_O0 (ctxt);
>  }
>  
> +/* For stack tagging:
> +
> +   Return the offset from the frame base tag that the "next" expanded object
> +   should have.  */
> +uint8_t
> +hwasan_current_frame_tag ()
> +{
> +  return hwasan_frame_tag_offset;
> +}
> +
> +/* For stack tagging:
> +
> +   Return the 'base pointer' for this function.  If that base pointer has not
> +   yet been created then we create a register to hold it and initialise that
> +   value with a possibly random tag and the value of the
> +   virtual_stack_vars_rtx.  */

As discussed offline, I think the old approach of generating the
initialisation in hwasan_emit_prologue was safer, although I agree
there doesn't seem to be a specific problem with doing things this way.

> +rtx
> +hwasan_frame_base ()
> +{
> +  if (! hwasan_frame_base_ptr)
> +    {
> +      hwasan_frame_base_ptr
> +     = targetm.memtag.insert_random_tag (virtual_stack_vars_rtx);
> +    }

Nit: should be no braces around single statements, even if they span
multiple lines.
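
I.e. just:

  if (! hwasan_frame_base_ptr)
    hwasan_frame_base_ptr
      = targetm.memtag.insert_random_tag (virtual_stack_vars_rtx);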

> +
> +  return hwasan_frame_base_ptr;
> +}
> +
> +/* Record a compile-time constant size stack variable that HWASAN will need to
> +   tag.  This record of the range of a stack variable will be used by
> +   `hwasan_emit_prologue` to emit the RTL at the start of each frame which will
> +   set tags in the shadow memory according to the assigned tag for each object.
> +
> +   The range that the object spans in stack space should be described by the
> +   bounds `untagged_base + nearest` and `untagged_base + farthest`.
> +   `tagged_base` is the base address which contains the "base frame tag" for
> +   this frame, and from which the value to address this object with will be
> +   calculated.
> +
> +   We record the `untagged_base` since the functions in the hwasan library we
> +   use to tag memory take pointers without a tag.  */
> +void
> +hwasan_record_stack_var (rtx untagged_base, rtx tagged_base,
> +                      poly_int64 nearest, poly_int64 farthest)
> +{
> +  hwasan_stack_var *cur_var = new hwasan_stack_var;
> +  cur_var->untagged_base = untagged_base;
> +  cur_var->tagged_base = tagged_base;
> +  cur_var->nearest_offset = nearest;
> +  cur_var->farthest_offset = farthest;
> +  cur_var->tag_offset = hwasan_current_frame_tag ();
> +
> +  hwasan_tagged_stack_vars.safe_push (cur_var);
> +}

Very minor, but it seems more consistent to have the “_offset” on the
parameter names as well as the field names.  Doing that might also make
the comment slightly clearer.

Not sure it's necessary to use separately-allocated structures here.
Making it a vec<hwasan_stack_var> might be simpler.
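
Untested sketch of what I mean (assuming the struct fields are declared in
the same order as the assignments above):

  static vec<hwasan_stack_var> hwasan_tagged_stack_vars;

  void
  hwasan_record_stack_var (rtx untagged_base, rtx tagged_base,
                           poly_int64 nearest_offset,
                           poly_int64 farthest_offset)
  {
    hwasan_stack_var cur_var = {untagged_base, tagged_base, nearest_offset,
                                farthest_offset, hwasan_current_frame_tag ()};
    hwasan_tagged_stack_vars.safe_push (cur_var);
  }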

> +
> +/* Return the RTX representing the farthest extent of the statically allocated
> +   stack objects for this frame.  If hwasan_frame_base_ptr has not been
> +   initialised then we are not storing any static variables on the stack in

s/initialised/initialized/

> +   this frame.  In this case we return NULL_RTX to represent that.
> +
> +   Otherwise simply return virtual_stack_vars_rtx + frame_offset.  */
> +rtx
> +hwasan_get_frame_extent ()
> +{
> +  return hwasan_frame_base_ptr ?
> +    plus_constant (Pmode, virtual_stack_vars_rtx, frame_offset)
> +    : NULL_RTX;

Formatting here should be:

  return (hwasan_frame_base_ptr
          ? plus_constant (Pmode, virtual_stack_vars_rtx, frame_offset)
          : NULL_RTX);

> +}
> +
> +
> +/* For stack tagging:
> +
> +   Increment the tag offset modulo the size a tag can represent.  */

Maybe “the frame tag offset”.

> […]
> +/* Clear internal state for the next function.
> +   This function is called before variables on the stack get expanded, in
> +   `init_vars_expansion`.  */
> +void
> +hwasan_record_frame_init ()
> +{
> +  delete asan_used_labels;
> +  asan_used_labels = NULL;
> +
> +  /* If this isn't the case then some stack variable was recorded *before*
> +     hwasan_record_frame_init is called, yet *after* the hwasan prologue for
> +     the previous frame was emitted.  Such stack variables would not have
> +     their shadow stack filled in.  */
> +  gcc_assert (hwasan_tagged_stack_vars.is_empty ());
> +  hwasan_frame_base_ptr = NULL_RTX;
> +
> +  /* When not using a random frame tag we can avoid the background stack
> +     color which gives the user a little better debug output upon a crash.
> +     Meanwhile, when using a random frame tag it will be nice to avoid adding
> +     tags for the first object since that is unnecessary extra work.
> +     Hence set the initial hwasan_frame_tag_offset to be 0 if using a random
> +     frame tag and 1 otherwise.
> +
> +     As described in hwasan_increment_frame_tag, in the kernel the stack
> +     pointer has the tag 0xff.  That means that to avoid 0xff and 0 (the tag
> +     which the kernel does not check and the background tag respectively) we
> +     start with a tag offset of 2.  */
> +  hwasan_frame_tag_offset = param_hwasan_random_frame_tag
> +    ? 0
> +    : sanitize_flags_p (SANITIZE_KERNEL_HWADDRESS) ? 2 : 1;

Wonder whether it's worth splitting:

+  if (hwasan_frame_tag_offset == 0 && ! param_hwasan_random_frame_tag)
+    hwasan_frame_tag_offset += 1;
+  if (hwasan_frame_tag_offset == 1 && ! param_hwasan_random_frame_tag
+      && sanitize_flags_p (SANITIZE_KERNEL_HWADDRESS))
+    hwasan_frame_tag_offset += 1;

out of hwasan_increment_frame_tag and using it here too, so that we
only need to explain the relatively subtle condition once.
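
I.e. something along these lines (the helper name is just a placeholder):

  /* Skip tag offsets that would give an object the stack background tag
     (and, for the kernel, the tag of the stack pointer).  */
  static void
  hwasan_skip_reserved_tag_offsets ()
  {
    if (hwasan_frame_tag_offset == 0 && ! param_hwasan_random_frame_tag)
      hwasan_frame_tag_offset += 1;
    if (hwasan_frame_tag_offset == 1 && ! param_hwasan_random_frame_tag
        && sanitize_flags_p (SANITIZE_KERNEL_HWADDRESS))
      hwasan_frame_tag_offset += 1;
  }

hwasan_record_frame_init would then just set hwasan_frame_tag_offset to 0
and call the helper, and hwasan_increment_frame_tag would call it after
doing its increment.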

Just a suggestion though.

> +}
> +
> +/* For stack tagging:
> +   (Emits HWASAN equivalent of what is emitted by
> +   `asan_emit_stack_protection`).
> +
> +   Emits the extra prologue code to set the shadow stack as required for HWASAN
> +   stack instrumentation.
> +
> +   BASES is an array containing the tagged base registers for each object.
> +   We map each object to a given base since large aligned objects have a
> +   different base to others and we need to know which objects use which base.
> +
> +   UNTAGGED_BASES contains the same information as above except without tags.
> +   This is needed since libhwasan only accepts untagged pointers in
> +   __hwasan_tag_memory.
> +
> +   OFFSETS is an array with the start and end offsets for each object stored on
> +   the stack in this frame.  This array is hence twice the length of the other
> +   array arguments (given it has two entries for each stack object).
> +
> +   TAGS is an array containing the tag *offset* each object should have from
> +   the tag in its base pointer.
> +
> +   LENGTH contains the length of the OFFSETS array.  */

The comment is out of date: “BASES is …” onwards can be removed.

> +void
> +hwasan_emit_prologue ()
> +{
> +  /* We need untagged base pointers since libhwasan only accepts untagged
> +    pointers in __hwasan_tag_memory.  We need the tagged base pointer to obtain
> +    the base tag for an offset.  */
> +
> +  gcc_assert ((hwasan_frame_base_ptr == NULL_RTX)
> +           == hwasan_tagged_stack_vars.is_empty ());
> +  if (! hwasan_frame_base_ptr)
> +    return;
> +
> +  size_t length = hwasan_tagged_stack_vars.length ();
> +  hwasan_stack_var **vars = hwasan_tagged_stack_vars.address ();
> +
> +  poly_int64 bot = 0, top = 0;
> +  size_t i = 0;
> +  for (i = 0; i < length; i++)
> +    {
> +      hwasan_stack_var *cur = vars[i];
> +      poly_int64 nearest = cur->nearest_offset;
> +      poly_int64 farthest = cur->farthest_offset;
> +
> +      if (known_ge (nearest, farthest))
> +     {
> +       top = nearest;
> +       bot = farthest;
> +     }
> +      else
> +     {
> +       /* Given how these values are calculated, one must be known greater
> +          than the other.  */
> +       gcc_assert (known_le (nearest, farthest));
> +       top = farthest;
> +       bot = nearest;
> +     }
> +      poly_int64 size = (top - bot);
> +
> +      /* Assert the edge of each variable is aligned to the HWASAN tag granule
> +      size.  */
> +      gcc_assert (multiple_p (top, HWASAN_TAG_GRANULE_SIZE));
> +      gcc_assert (multiple_p (bot, HWASAN_TAG_GRANULE_SIZE));
> +      gcc_assert (multiple_p (size, HWASAN_TAG_GRANULE_SIZE));
> +
> +      rtx ret = init_one_libfunc ("__hwasan_tag_memory");
> +      rtx base_tag = targetm.memtag.extract_tag (cur->tagged_base, NULL_RTX);
> +      rtx tag = plus_constant (QImode, base_tag, cur->tag_offset);
> +      tag = hwasan_truncate_to_tag_size (tag, NULL_RTX);
> +
> +      rtx bottom = convert_memory_address (ptr_mode,
> +                                        plus_constant (Pmode,
> +                                                       cur->untagged_base,
> +                                                       bot));
> +      emit_library_call (ret, LCT_NORMAL, VOIDmode,
> +                      bottom, ptr_mode,
> +                      tag, QImode,
> +                      gen_int_mode (size, ptr_mode), ptr_mode);
> +      delete cur;
> +    }
> +  /* Clear the stack vars, we've emitted the prologue for them all now.  */
> +  hwasan_tagged_stack_vars.truncate (0);
> +}
> […]
> +/* For stack tagging:
> +
> +   Truncate `tag` to the number of bits that a tag uses (i.e. to
> +   HWASAN_TAG_SIZE).  Store the result in `target` if it's convenient.  */
> +rtx
> +hwasan_truncate_to_tag_size (rtx tag, rtx target)
> +{
> +  gcc_assert (GET_MODE (tag) == QImode);
> +  if (HWASAN_TAG_SIZE != GET_MODE_PRECISION (QImode))
> +    {
> +      gcc_assert (GET_MODE_PRECISION (QImode) > HWASAN_TAG_SIZE);
> +      tag = expand_simple_binop (QImode, AND, tag,
> +                              gen_int_mode (-HWASAN_TAG_SIZE, QImode),

Should be (1 << HWASAN_TAG_SIZE) - 1 or equivalent.
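
I.e.:

  tag = expand_simple_binop (QImode, AND, tag,
                             gen_int_mode ((1 << HWASAN_TAG_SIZE) - 1,
                                           QImode),
                             target,
                             /* unsignedp = */1, OPTAB_WIDEN);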

> +                              target,
> +                              /* unsignedp = */1, OPTAB_WIDEN);
> +      gcc_assert (tag);
> +    }
> +  return tag;
> +}
> +
>  #include "gt-asan.h"
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index b270a4ddb9db469ba52e42f36a1bc2f02d8f03fc..52a0dd6dfda1868622b88821a6d4abc5f2c7ce03 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -365,19 +365,22 @@ align_local_variable (tree decl, bool really_expand)
>  {
>    unsigned int align;
>  
> -  if (TREE_CODE (decl) == SSA_NAME)
> -    align = TYPE_ALIGN (TREE_TYPE (decl));
> -  else
> -    {
> -      align = LOCAL_DECL_ALIGNMENT (decl);
> -      /* Don't change DECL_ALIGN when called from estimated_stack_frame_size.
> -      That is done before IPA and could bump alignment based on host
> -      backend even for offloaded code which wants different
> -      LOCAL_DECL_ALIGNMENT.  */
> -      if (really_expand)
> -     SET_DECL_ALIGN (decl, align);
> -    }
> -  return align / BITS_PER_UNIT;
> +  align = (TREE_CODE (decl) == SSA_NAME)
> +    ? TYPE_ALIGN (TREE_TYPE (decl)) : LOCAL_DECL_ALIGNMENT (decl);

Formatting:

  align = (TREE_CODE (decl) == SSA_NAME
           ? TYPE_ALIGN (TREE_TYPE (decl))
           : LOCAL_DECL_ALIGNMENT (decl));

> +
> +  if (hwasan_sanitize_stack_p ())
> +    align = MAX (align, ((unsigned)HWASAN_TAG_GRANULE_SIZE) * BITS_PER_UNIT);

Should be formatted as:

    align = MAX (align, (unsigned) HWASAN_TAG_GRANULE_SIZE * BITS_PER_UNIT);

> +  if (TREE_CODE (decl) != SSA_NAME && really_expand)
> +    /* Don't change DECL_ALIGN when called from estimated_stack_frame_size.
> +       That is done before IPA and could bump alignment based on host
> +       backend even for offloaded code which wants different
> +       LOCAL_DECL_ALIGNMENT.  */
> +    SET_DECL_ALIGN (decl, align);
> +
> +  unsigned int ret_align = align / BITS_PER_UNIT;
> +
> +  return ret_align;

Don't think adding ret_align really adds much: the original direct
return of “align / BITS_PER_UNIT” seemed clearer.

>  }
>  
>  /* Align given offset BASE with ALIGN.  Truncate up if ALIGN_UP is true,
> […]
> @@ -1006,7 +1022,8 @@ expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
>        /* Set alignment we actually gave this decl if it isn't an SSA name.
>           If it is we generate stack slots only accidentally so it isn't as
>        important, we'll simply use the alignment that is already set.  */
> -      if (base == virtual_stack_vars_rtx)
> +      if (base == virtual_stack_vars_rtx
> +       || (hwasan_sanitize_stack_p () && base == hwasan_frame_base ()))

I think it would be good to have a side-effect-free version of
hwasan_frame_base for comparisons.  There's then no reason to check
hwasan_sanitize_stack_p () first: the pointer will be null if we're
not sanitising the stack.

Also, it might be good to put this in a helper such as:

  stack_vars_base_reg_p (base)

that returns true for both virtual_stack_vars_rtx and the hwasan
frame base.  I think this is something that other code is likely
to want to test eventually.
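
Rough sketch of what I mean (names are just suggestions):

  /* asan.c: return the tagged frame base if it has already been created,
     otherwise NULL_RTX, without creating it.  */
  rtx
  hwasan_frame_base_if_created ()
  {
    return hwasan_frame_base_ptr;
  }

  /* cfgexpand.c  */
  static bool
  stack_vars_base_reg_p (rtx base)
  {
    return (base == virtual_stack_vars_rtx
            || base == hwasan_frame_base_if_created ());
  }

The check in expand_one_stack_var_at would then just be
"if (stack_vars_base_reg_p (base))".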

> @@ -1123,10 +1140,31 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
>        if (pred && !pred (i))
>       continue;
>  
> +      base = hwasan_sanitize_stack_p ()
> +     ? hwasan_frame_base ()
> +     : virtual_stack_vars_rtx;

Formatting:

      base = (hwasan_sanitize_stack_p ()
              ? hwasan_frame_base ()
              : virtual_stack_vars_rtx);

>        alignb = stack_vars[i].alignb;
>        if (alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
>       {
> -       base = virtual_stack_vars_rtx;
> +       poly_int64 hwasan_orig_offset;
> +       if (hwasan_sanitize_stack_p ())
> +         {
> +           /* There must be no tag granule "shared" between different
> +              objects.  This means that no HWASAN_TAG_GRANULE_SIZE byte
> +              chunk can have more than one object in it.
> +
> +              We ensure this by forcing the end of the last bit of data to
> +              be aligned to HWASAN_TAG_GRANULE_SIZE bytes here, and setting
> +              the start of each variable to be aligned to
> +              HWASAN_TAG_GRANULE_SIZE bytes in `align_local_variable`.
> +
> +              We can't align just one of the start or end, since there are
> +              untagged things stored on the stack that we have no control on
> +              the alignment and these can't share a tag granule with a
> +              tagged variable.  */

“things stored on the stack that we have no control on the alignment”
sounds odd.
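
Maybe something like: "We can't align just one of the start or end, since
other untagged data whose alignment we do not control (e.g. spilled
registers and parameters passed on the stack) can also end up on the stack,
and it must not share a tag granule with a tagged variable."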

> +           hwasan_orig_offset = align_frame_offset (HWASAN_TAG_GRANULE_SIZE);
> +           gcc_assert (stack_vars[i].alignb >= HWASAN_TAG_GRANULE_SIZE);
> +         }
>         /* ASAN description strings don't yet have a syntax for expressing
>            polynomial offsets.  */
>         HOST_WIDE_INT prev_offset;
> […]
> @@ -1321,27 +1387,49 @@ expand_one_stack_var_1 (tree var)
>  {
>    poly_uint64 size;
>    poly_int64 offset;
> +  poly_int64 hwasan_orig_offset;
>    unsigned byte_align;
>  
>    if (TREE_CODE (var) == SSA_NAME)
>      {
>        tree type = TREE_TYPE (var);
>        size = tree_to_poly_uint64 (TYPE_SIZE_UNIT (type));
> -      byte_align = TYPE_ALIGN_UNIT (type);
>      }
>    else
> -    {
> -      size = tree_to_poly_uint64 (DECL_SIZE_UNIT (var));
> -      byte_align = align_local_variable (var, true);
> -    }
> +    size = tree_to_poly_uint64 (DECL_SIZE_UNIT (var));
> +  byte_align = align_local_variable (var, true);
>  
>    /* We handle highly aligned variables in expand_stack_vars.  */
>    gcc_assert (byte_align * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT);
>  
> +  if (hwasan_sanitize_stack_p ())
> +    /* Allocate zero bytes to align the stack.  */
> +    hwasan_orig_offset = align_frame_offset (HWASAN_TAG_GRANULE_SIZE);
>    offset = alloc_stack_frame_space (size, byte_align);
>  
> -  expand_one_stack_var_at (var, virtual_stack_vars_rtx,
> +  rtx base;
> +  if (hwasan_sanitize_stack_p ())
> +    {

I think we need to zero-initialise hwasan_orig_offset to defend
against “maybe used uninitialised” warnings (which wouldn't necessarily
fire for all hosts or option combinations, but might for some).

Alternatively, since the only common code between the two
hwasan_sanitize_stack_p () blocks is the single call:

  offset = alloc_stack_frame_space (size, byte_align);

it might be easier and more readable to duplicate that call instead
of the hwasan_sanitize_stack_p () condition.
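
I.e. something like (untested), which also folds the two
hwasan_sanitize_stack_p () blocks into one:

  rtx base;
  if (hwasan_sanitize_stack_p ())
    {
      /* Allocate zero bytes to align the stack.  */
      poly_int64 hwasan_orig_offset
        = align_frame_offset (HWASAN_TAG_GRANULE_SIZE);
      offset = alloc_stack_frame_space (size, byte_align);
      base = hwasan_frame_base ();
      hwasan_record_stack_var (virtual_stack_vars_rtx, base,
                               hwasan_orig_offset, frame_offset);
    }
  else
    {
      offset = alloc_stack_frame_space (size, byte_align);
      base = virtual_stack_vars_rtx;
    }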

> +      base = hwasan_frame_base ();
> +      /* Use `frame_offset` to automatically account for machines where the
> +      frame grows upwards.
> +
> +      `offset` will always point to the "start" of the stack object, which
> +      will be the smallest address, for ! FRAME_GROWS_DOWNWARD this is *not*
> +      the "furthest" offset from the base delimiting the current stack
> +      object.  `frame_offset` will always delimit the extent that the frame.
> +      */
> +      hwasan_record_stack_var (virtual_stack_vars_rtx, base,
> +                            hwasan_orig_offset, frame_offset);
> +    }
> +  else
> +    base = virtual_stack_vars_rtx;
> +
> +  expand_one_stack_var_at (var, base,
>                          crtl->max_used_stack_slot_alignment, offset);
> +
> +  if (hwasan_sanitize_stack_p ())
> +    hwasan_increment_frame_tag ();
>  }
>  
>  /* Wrapper for expand_one_stack_var_1 that checks SSA_NAMEs are
> […]
> diff --git a/gcc/target.def b/gcc/target.def
> index 07064ea366a2c0bde0afbf45d78e16d7e9e9d13d..26183a42aef5f30aae7abcd474843200dd5206eb 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -6835,10 +6835,73 @@ HOOK_VECTOR (TARGET_MEMTAG_, memtag)
>  
>  DEFHOOK
>  (can_tag_addresses,
> - "True if backend architecture naturally supports ignoring the top byte of\n\
> - pointers.  This feature means that @option{-fsanitize=hwaddress} can work.",
> + "True if backend architecture naturally supports ignoring some region of\n\
> +pointers.  This feature means that @option{-fsanitize=hwaddress} can work.",
>   bool, (), default_memtag_can_tag_addresses)

Looks like this belongs in an earlier patch, but it doesn't really
matter if you're intending to push as a single commit.

> +DEFHOOK
> +(granule_size,
> + "Return the size in real memory that each byte in shadow memory refers 
> to.\n\
> +I.e. if a variable is X bytes long in memory, then this hook should return\n\
> +the value Y such that the tag in shadow memory spans X/Y bytes.\n\
> +\n\
> +Most variables will need to be aligned to this amount since two variables\n\
> +that are neighbours in memory and share a tag granule would need to share\n\

s/neighbours/neighbors/

> +the same tag.\n\
> +\n\
> +The default returns 16.",

Should use texinfo markup, so s/X/@var{x}/g and s/Y/@var{y}/g.
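
So e.g.:

  I.e. if a variable is @var{x} bytes long in memory, then this hook should return\n\
  the value @var{y} such that the tag in shadow memory spans @var{x}/@var{y} bytes.\n\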

> +  uint8_t, (), default_memtag_granule_size)
> +
> +DEFHOOK
> +(insert_random_tag,
> + "Create a new register with the address value of @var{base} and a\n\
> +(possibly) random tag in it.\n\
> +This function is used to generate a tagged base for the current stack frame.",
> +  rtx, (rtx base), default_memtag_insert_random_tag)
> +
> +DEFHOOK
> +(add_tag,
> + "Return an RTX that represents the result of adding @var{addr_offset} to\n\
> +the address pointer @var{base} and @var{tag_offset} to the tag in pointer\n\

IMO “the address in pointer @var{base}” would be easier to understand.

> +@var{base}.\n\
> +The resulting RTX must either be a valid memory address or be able to get\n\
> +put into an operand with force_operand.\n\

@code{force_operand}

> +\n\
> +Unlike other memtag hooks, this must return an expression and not emit any\n\
> +RTL.",
> +  rtx, (rtx base, poly_int64 addr_offset, uint8_t tag_offset),
> +  default_memtag_add_tag)
> +
> +DEFHOOK
> +(set_tag,
> + "Return an RTX representing @var{untagged_base} but with the tag 
> @var{tag}.\n\
> +Try and store this in @var{target} if convenient.\n\
> +@var{untagged_base} is required to have a zero tag when this hook is called.\n\
> +The default of this hook is to set the top byte of @var{untagged_base} to\n\
> +@var{tag}.",
> +  rtx, (rtx untagged_base, rtx tag, rtx target), default_memtag_set_tag)
> +
> +DEFHOOK
> +(extract_tag,
> + "Return an RTX representing the tag stored in @var{tagged_pointer}.\n\
> +Store the result in @var{target} if it is convenient.\n\
> +The default represents the top byte of the original pointer.",
> +  rtx, (rtx tagged_pointer, rtx target), default_memtag_extract_tag)
> +
> +DEFHOOK
> +(untagged_pointer,
> + "Return an RTX representing @var{tagged_pointer} with its tag set to 
> zero.\n\
> +Store the result in @var{target} if convenient.\n\
> +The default clears the top byte of the original pointer.",
> +  rtx, (rtx tagged_pointer, rtx target), default_memtag_untagged_pointer)
> +
>  HOOK_VECTOR_END (memtag)
>  #undef HOOK_PREFIX
>  #define HOOK_PREFIX "TARGET_"

These hooks look like a nice abstraction, thanks.

> @@ -2379,10 +2383,130 @@ default_speculation_safe_value (machine_mode mode ATTRIBUTE_UNUSED,
>    return result;
>  }
>  
> +/* How many bits to shift in order to access the tag bits.
> +   The default is to store the tag in the top 8 bits of a 64 bit pointer, hence
> +   shifting 56 bits will leave just the tag.
> +   We require that the shift is less than 64 for the `const_int_rtx` use here,
> +   that's also a requirement for the tag to actually be stored on any systems
> +   with 64 bit (or smaller) pointers.  */
> +#define HWASAN_SHIFT 56

I think (GET_MODE_PRECISION (Pmode) - 8) would be better.

> +#define HWASAN_SHIFT_RTX const_int_rtx[MAX_SAVED_CONST_INT + HWASAN_SHIFT]

And similarly this can use GEN_INT (HWASAN_SHIFT).  Using GEN_INT rather than
gen_int_mode isn't great, but it's safe in context, and GCC doesn't have
an interface for directly specifying the correct mode for a shift amount.
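
So something like:

  #define HWASAN_SHIFT (GET_MODE_PRECISION (Pmode) - 8)
  #define HWASAN_SHIFT_RTX GEN_INT (HWASAN_SHIFT)

(and the comment about the shift needing to be less than 64 for the
const_int_rtx use can then go).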

> +
>  bool
>  default_memtag_can_tag_addresses ()
>  {
>    return false;
>  }
>  
> +uint8_t
> +default_memtag_tag_size ()
> +{
> +  return 8;
> +}
> +
> +uint8_t
> +default_memtag_granule_size ()
> +{
> +  return 16;
> +}
> +
> +/* The default implementation of TARGET_MEMTAG_INSERT_RANDOM_TAG.  */
> +rtx
> +default_memtag_insert_random_tag (rtx untagged)
> +{
> +  gcc_assert (param_hwasan_instrument_stack);
> +  if (param_hwasan_random_frame_tag)
> +    {
> +      rtx base = gen_reg_rtx (Pmode);
> +      rtx temp = gen_reg_rtx (QImode);
> +      rtx fn = init_one_libfunc ("__hwasan_generate_tag");
> +      rtx new_tag = emit_library_call_value (fn, temp, LCT_NORMAL, QImode);
> +      rtx ret = targetm.memtag.set_tag (untagged, new_tag, base);
> +      if (ret != base)
> +     emit_move_insn (base, ret);
> +      return base;

The idea with “store in target if convenient” interfaces is that the
caller shouldn't (need to) create fresh registers itself.  So this
should just be:

      return targetm.memtag.set_tag (untagged, new_tag, NULL_RTX);

(with no “base” or “ret” variables).  Similarly there's no need
to create “tmp” for passing to emit_library_call_value; you can
just pass NULL_RTX instead.

FWIW, passing in a target to this hook too would be fine with me.
It's just that (as with other interfaces like this) the caller
couldn't assume that the target is actually used.
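
So the whole arm would reduce to something like:

      rtx fn = init_one_libfunc ("__hwasan_generate_tag");
      rtx new_tag = emit_library_call_value (fn, NULL_RTX, LCT_NORMAL, QImode);
      return targetm.memtag.set_tag (untagged, new_tag, NULL_RTX);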

> +    }
> +  else
> +    {
> +      /* NOTE: The kernel API does not have __hwasan_generate_tag exposed.
> +      In the future we may add the option emit random tags with inline
> +      instrumentation instead of function calls.  This would be the same
> +      between the kernel and userland.  */
> +      return untagged;
> +    }
> +}
> +
> +/* The default implementation of TARGET_MEMTAG_ADD_TAG.  */
> +rtx
> +default_memtag_add_tag (rtx base, poly_int64 offset, uint8_t tag_offset)
> +{
> +  /* Need to look into what the most efficient code sequence is.
> +     This is a code sequence that would be emitted *many* times, so we
> +     want it as small as possible.
> +
> +     There are two places where tag overflow is a question:
> +       - Tagging the shadow stack.
> +       (both tagging and untagging).
> +       - Tagging addressable pointers.
> +
> +     We need to ensure both behaviors are the same (i.e. that the tag that
> +     ends up in a pointer after "overflowing" the tag bits with a tag addition
> +     is the same that ends up in the shadow space).
> +
> +     The aim is that the behavior of tag addition should follow modulo
> +     wrapping in both instances.
> +
> +     The libhwasan code doesn't have any path that increments a pointer's tag,
> +     which means it has no opinion on what happens when a tag increment
> +     overflows (and hence we can choose our own behavior).
> +
> +     NOTE:
> +     Here we return an expression which represents the base with the
> +     provided offsets.
> +     This does not have to be a valid operand to anything, since the
> +     `force_operand` machinery in the compiler already handles this.  */
> +
> +  offset += ((uint64_t)tag_offset << HWASAN_SHIFT);
> +  return plus_constant (Pmode, base, offset);
> +}

Maybe this is just me, but the code is so simple and so in line with
what I'd expect that I think it might be clearer to drop the “NOTE: …”
part of the comment.  I thought at first it was describing something
specific to this particular hook implementation, but I think really
it's just restating the interface described in the documentation.

(To be clear: the rest of the comment is good, it's just the NOTE
part that I found confusing.)

Thanks,
Richard
