
In the past weeks I've been prototyping both Spectre V1 (speculative
array bounds bypass) diagnostics and mitigation in an
architecture-independent manner, to assess feasibility and to get some
kind of upper bound on the performance impact one can expect.
https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
an interesting read in this context as well.

For simplicity I have implemented mitigation on GIMPLE right before
RTL expansion and have chosen TLS to do mitigation across function
boundaries.  Diagnostics sit in the same place, but the two do not
depend on each other in any way.

The mitigation strategy chosen is that of tracking speculation
state via a mask that can be used to zero parts of the addresses
that leak the actual data.  That's similar to what aarch64 does
with -mtrack-speculation (but oddly there's no mitigation there).
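
To make the mask idea concrete, here is a minimal C sketch of
function-local tracking.  It is not taken from the patch: update_mask
and guarded_load are made-up names, and at the source level a compiler
is free to fold the re-evaluated condition away, so treat it as a model
of the data flow only.  The branch condition is recomputed as data on
the taken edge, folded into an all-ones/all-zeroes mask, and the index
is ANDed with that mask before the load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a branch condition into the speculation mask without branching:
   the mask stays as-is when COND_HOLDS is nonzero and becomes all-zeroes
   otherwise.  Hypothetical helper, not part of the patch.  */
static inline uintptr_t
update_mask (uintptr_t mask, int cond_holds)
{
  return mask & ((uintptr_t) 0 - (uintptr_t) (cond_holds != 0));
}

/* Hypothetical bounds-checked lookup; returns -1 when IDX is out of range.  */
int
guarded_load (const int *table, size_t len, size_t idx)
{
  assert (table != NULL);
  uintptr_t mask = ~(uintptr_t) 0;  /* all-ones: assumed not mis-speculating */
  if (idx < len)
    {
      /* On the taken edge, recompute the condition as data.  */
      mask = update_mask (mask, idx < len);
      /* Architecturally a no-op; zeroes the index under mis-speculation.  */
      idx &= (size_t) mask;
      return table[idx];
    }
  return -1;
}
```

For an in-range index the AND changes nothing; only a path reached by
mis-speculation would see the index forced to zero.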

I've optimized things to the point that is reasonable when working
target-independently on GIMPLE, but I've only looked at x86 assembly
and performance.  I expect any "final" mitigation, if we choose to
implement and integrate one, would sit after RTL expansion, since
RTL expansion can end up introducing quite a bit of control flow whose
speculation state is not properly tracked by the prototype.

I'm cut&pasting single runs of SPEC INT 2006/2017 here.  The runs
were done with -O2 [-fspectre-v1={2,3}], where =2 is function-local
mitigation and =3 is global mitigation that passes the state
via TLS memory.
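
As a rough model of what =3 adds over =2, the C11 sketch below keeps
the mask in a thread-local variable so it survives calls.  The names
spec_mask_tls, callee and caller are made up for illustration, and the
exact placement of the loads and stores (load at function entry and
after a call, store before a call and before returning) is my
assumption of how "at function boundaries and before and after calls"
plays out, not a copy of what the pass emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLS slot holding the speculation mask (all-ones means
   "not mis-speculating").  */
static _Thread_local uintptr_t spec_mask_tls = ~(uintptr_t) 0;

static int
callee (const int *table, unsigned len, unsigned idx)
{
  assert (table != NULL);
  uintptr_t mask = spec_mask_tls;	/* load at function entry */
  int val = -1;
  if (idx < len)
    {
      mask &= (uintptr_t) 0 - (uintptr_t) (idx < len);
      val = table[idx & (unsigned) mask];
    }
  else
    mask &= (uintptr_t) 0 - (uintptr_t) !(idx < len);
  spec_mask_tls = mask;			/* store before returning */
  return val;
}

int
caller (const int *table, unsigned len, unsigned idx)
{
  uintptr_t mask = spec_mask_tls;	/* load at function entry */
  int res = 0;
  if (idx != 0)
    {
      mask &= (uintptr_t) 0 - (uintptr_t) (idx != 0);
      spec_mask_tls = mask;		/* store before the call */
      res = callee (table, len, idx);
      mask = spec_mask_tls;		/* reload after the call */
    }
  spec_mask_tls = mask;			/* store before returning */
  return res;
}
```

With =2 the same updates would happen on a plain local variable and
every function would start again from an all-ones mask.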

The following was measured on a Haswell desktop CPU:

        -O2 vs. -O2 -fspectre-v1=2

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
400.perlbench    9770        245       39.8 *    9770        452       21.6 *  
401.bzip2        9650        378       25.5 *    9650        726       13.3 *  
403.gcc          8050        236       34.2 *    8050        352       22.8 *  
429.mcf          9120        223       40.9 *    9120        656       13.9 *  
445.gobmk       10490        400       26.2 *   10490        666       15.8 *  
456.hmmer        9330        388       24.1 *    9330        536       17.4 *  
458.sjeng       12100        437       27.7 *   12100        661       18.3 *  
462.libquantum  20720        300       69.1 *   20720        384       53.9 *  
464.h264ref     22130        451       49.1 *   22130        586       37.8 *  
471.omnetpp      6250        291       21.5 *    6250        398       15.7 *  
473.astar        7020        334       21.0 *    7020        522       13.5 *  
483.xalancbmk    6900        182       37.9 *    6900        306       22.6 *  
 Est. SPECint_base2006                   --
 Est. SPECint2006                                                        --

   -O2 -fspectre-v1=3

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
400.perlbench                                    9770        497       19.6 *  
401.bzip2                                        9650        772       12.5 *  
403.gcc                                          8050        427       18.9 *  
429.mcf                                          9120        696       13.1 *  
445.gobmk                                       10490        726       14.4 *  
456.hmmer                                        9330        537       17.4 *  
458.sjeng                                       12100        721       16.8 *  
462.libquantum                                  20720        446       46.4 *  
464.h264ref                                     22130        613       36.1 *  
471.omnetpp                                      6250        471       13.3 *  
473.astar                                        7020        579       12.1 *  
483.xalancbmk                                    6900        350       19.7 *  
 Est. SPECint(R)_base2006           Not Run
 Est. SPECint2006                                                        --

The following was measured on a Zen EPYC server:

-O2 vs -O2 -fspectre-v1=2

                       Estimated                       Estimated
                 Base     Base        Base        Peak     Peak        Peak
Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
--------------- -------  ---------  ---------    -------  ---------  ---------
500.perlbench_r       1        499       3.19  *       1        621       2.56  * 124%
502.gcc_r             1        286       4.95  *       1        392       3.61  * 137%
505.mcf_r             1        331       4.88  *       1        456       3.55  * 138%
520.omnetpp_r         1        454       2.89  *       1        563       2.33  * 124%
523.xalancbmk_r       1        328       3.22  *       1        569       1.86  * 173%
525.x264_r            1        518       3.38  *       1        776       2.26  * 150%
531.deepsjeng_r       1        365       3.14  *       1        448       2.56  * 123%
541.leela_r           1        598       2.77  *       1        729       2.27  * 122%
548.exchange2_r       1        460       5.69  *       1        756       3.46  * 164%
557.xz_r              1        403       2.68  *       1        586       1.84  * 145%
 Est. SPECrate2017_int_base              3.55
 Est. SPECrate2017_int_peak                                               2.56  

-O2 -fspectre-v1=3

                       Estimated                       Estimated
                 Base     Base        Base        Peak     Peak        Peak
Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
--------------- -------  ---------  ---------    -------  ---------  ---------
500.perlbench_r                               NR       1        700       2.27  * 140%
502.gcc_r                                     NR       1        485       2.92  * 170%
505.mcf_r                                     NR       1        596       2.71  * 180%
520.omnetpp_r                                 NR       1        604       2.17  * 133%
523.xalancbmk_r                               NR       1        643       1.64  * 196%
525.x264_r                                    NR       1        797       2.20  * 154%
531.deepsjeng_r                               NR       1        542       2.12  * 149%
541.leela_r                                   NR       1        872       1.90  * 146%
548.exchange2_r                               NR       1        761       3.44  * 165%
557.xz_r                                      NR       1        595       1.81  * 148%
 Est. SPECrate2017_int_base           Not Run
 Est. SPECrate2017_int_peak                                               2.26  

You can see, even though we're comparing apples and oranges, that the
performance impact is quite dependent on the microarchitecture.

Just as interesting as performance is the effect on text size, which is
surprisingly high (the _best_ case is 13 bytes per conditional branch plus
3 bytes per instrumented memory access).

   BASE  -O2
   text    data     bss     dec     hex filename
1117726   20928   12704 1151358  11917e 400.perlbench
  56568    3800    4416   64784    fd10 401.bzip2
3419568    7912  751520 4179000  3fc438 403.gcc
  12212     712   11984   24908    614c 429.mcf
1460694 2081772 2330096 5872562  599bb2 445.gobmk
 284929    5956   82040  372925   5b0bd 456.hmmer
 130782    2152 2576896 2709830  295946 458.sjeng
  41915     764      96   42775    a717 462.libquantum
 505452   11220  372320  888992   d90a0 464.h264ref
 638188    9584   14664  662436   a1ba4 471.omnetpp
  38859     900    5216   44975    afaf 473.astar
4033878  140248   12168 4186294  3fe0b6 483.xalancbmk
   PEAK -O2 -fspectre-v1=2
   text    data     bss     dec     hex filename
1508032   20928   12704 1541664  178620 400.perlbench   135%
  76098    3800    4416   84314   1495a 401.bzip2       135%
4483530    7912  751520 5242962  500052 403.gcc         131%
  16006     712   11984   28702    701e 429.mcf         131%
1647384 2081772 2330096 6059252  5c74f4 445.gobmk       112%
 377259    5956   82040  465255   71967 456.hmmer       132%
 164672    2152 2576896 2743720  29dda8 458.sjeng       126%
  47901     764      96   48761    be79 462.libquantum  114%
 649854   11220  372320 1033394   fc4b2 464.h264ref     129%
 706908    9584   14664  731156   b2814 471.omnetpp     111%
  48493     900    5216   54609    d551 473.astar       125%
4862056  140248   12168 5014472  4c83c8 483.xalancbmk   121%
   PEAK -O2 -fspectre-v1=3
   text    data     bss     dec     hex filename
1742008   20936   12704 1775648  1b1820 400.perlbench   156%
  83338    3808    4416   91562   165aa 401.bzip2       147%
5219850    7920  751520 5979290  5b3c9a 403.gcc         153%
  17422     720   11984   30126    75ae 429.mcf         143%
1801688 2081780 2330096 6213564  5ecfbc 445.gobmk       123%
 431827    5964   82040  519831   7ee97 456.hmmer       152%
 182200    2160 2576896 2761256  2a2228 458.sjeng       139%
  53773     772      96   54641    d571 462.libquantum  128%
 691798   11228  372320 1075346  106892 464.h264ref     137%
 976692    9592   14664 1000948   f45f4 471.omnetpp     153%
  54525     908    5216   60649    ece9 473.astar       140%
5808306  140256   12168 5960730  5af41a 483.xalancbmk   144%

   BASE -O2 -g
   text    data     bss     dec     hex filename
2209713    8576    9080 2227369  21fca9 500.perlbench_r
9295702   37432 1150664 10483798 9ff856 502.gcc_r
  21795     712     744   23251    5ad3 505.mcf_r
2067560    8984   46888 2123432  2066a8 520.omnetpp_r
5763577  142584   20040 5926201  5a6d39 523.xalancbmk_r
 508402    6102   29592  544096   84d60 525.x264_r
  84222     784 12138360 12223366 ba8386 531.deepsjeng_r
 223480    8544   30072  262096   3ffd0 541.leela_r
  70554     864    6384   77802   12fea 548.exchange2_r
 180640     884   17704  199228   30a3c 557.xz_r
   PEAK -fspectre-v1=2
   text    data     bss     dec     hex filename
2991161    8576    9080 3008817  2de931 500.perlbench_r 135%
12244886  37432 1150664 13432982 ccf896 502.gcc_r       132%
  28475     712     744   29931    74eb 505.mcf_r       131%
2397026    8984   46888 2452898  256da2 520.omnetpp_r   116%
6846853  142584   20040 7009477  6af4c5 523.xalancbmk_r 119%
 645730    6102   29592  681424   a65d0 525.x264_r      127%
 111166     784 12138360 12250310 baecc6 531.deepsjeng_r 132%
 260835    8544   30072  299451   491bb 541.leela_r     117%
  96874     864    6384  104122   196ba 548.exchange2_r 137%
 215288     884   17704  233876   39194 557.xz_r        119%
   PEAK -fspectre-v1=3
   text    data     bss     dec     hex filename
3365945    8584    9080 3383609  33a139 500.perlbench_r 152%
14790638  37440 1150664 15978742 f3d0f6 502.gcc_r       159%
  31419     720     744   32883    8073 505.mcf_r       144%
2867893    8992   46888 2923773  2c9cfd 520.omnetpp_r   139%
8183689  142592   20040 8346321  7f5ad1 523.xalancbmk_r 142%
 697434    6110   29592  733136   b2fd0 525.x264_r      137%
 123638     792 12138360 12262790 bb1d86 531.deepsjeng_r 147%
 315347    8552   30072  353971   566b3 541.leela_r     141%
  98578     872    6384  105834   19d6a 548.exchange2_r 140%
 239144     892   17704  257740   3eecc 557.xz_r        133%

The patch relies heavily on RTL optimizations for DCE purposes.  At the
same time we rely on RTL not statically computing the mask (RTL has no
conditional constant propagation).  Full instrumentation of the classic
Spectre V1 testcase

char a[1024];
int b[1024];
int foo (int i, int bound)
{
  if (i < bound)
    return b[a[i]];
}

is the following:

        xorl    %eax, %eax
        cmpl    %esi, %edi
        setge   %al
        subq    $1, %rax
        jne     .L4
        ret
        .p2align 4,,10
        .p2align 3
.L4:
        andl    %eax, %edi
        movslq  %edi, %rdi
        movsbq  a(%rdi), %rax
        movl    b(,%rax,4), %eax
        ret

so the generated GIMPLE was "tuned" for reasonable x86 assembler outcome.
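
Read back into C, the instrumented sequence amounts to roughly the
following.  This is a hand-written transliteration, not compiler
output; foo_instrumented is a made-up name, and the "return 0" on the
not-taken path is my addition since the testcase leaves that path
without a return value.

```c
#include <assert.h>
#include <stdint.h>

char a[1024];
int b[1024];

int
foo_instrumented (int i, int bound)
{
  /* xorl/cmpl/setge/subq: (i >= bound) - 1 is all-ones when i < bound
     and zero otherwise.  */
  uint64_t mask = (uint64_t) (i >= bound) - 1;
  if (mask != 0)			/* jne .L4 */
    {
      /* andl %eax, %edi: AND the index with the low 32 bits of the mask.  */
      uint32_t idx = (uint32_t) i & (uint32_t) mask;
      /* movslq/movsbq/movl: the two dependent loads.  */
      return b[a[(int32_t) idx]];
    }
  return 0;	/* assumption: the testcase has no return on this path */
}
```

The branch tests the very value that is later used as the mask, so a
mispredicted "jne" reaches the loads with a zero mask and the first
load reads a[0] instead of an attacker-chosen element.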

Patch below for reference (and your own testing in case you are curious).
I do not plan to pursue this further at this point.


From 01e4a5a43e266065d32489daa50de0cf2425d5f5 Mon Sep 17 00:00:00 2001
From: Richard Guenther <rguent...@suse.de>
Date: Wed, 5 Dec 2018 13:17:02 +0100
Subject: [PATCH] warn-spectrev1

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7960cace16a..64d472d7fa0 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1334,6 +1334,7 @@ OBJS = \
        gimple-ssa-sprintf.o \
        gimple-ssa-warn-alloca.o \
        gimple-ssa-warn-restrict.o \
+       gimple-ssa-spectrev1.o \
        gimple-streamer-in.o \
        gimple-streamer-out.o \
        gimple-walk.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 45d7f6189e5..1ae7fcfe177 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -702,6 +702,10 @@ Warn when one local variable shadows another local variable or parameter of comp
 Wshadow-compatible-local
 Common Warning Undocumented Alias(Wshadow=compatible-local)

+Wspectre-v1
+Common Var(warn_spectrev1) Warning
+Warn about code susceptible to spectre v1 style attacks.
+
 Wstack-protector
 Common Var(warn_stack_protect) Warning
 Warn when not issuing stack smashing protection for some reason.
@@ -2406,6 +2410,14 @@ fsingle-precision-constant
 Common Report Var(flag_single_precision_constant) Optimization
 Convert floating point constants to single precision constants.

+fspectre-v1
+Common Alias(fspectre-v1=, 2, 0)
+Insert code to mitigate spectre v1 style attacks.
+
+fspectre-v1=
+Common Report RejectNegative Joined UInteger IntegerRange(0, 3) Var(flag_spectrev1) Optimization
+Insert code to mitigate spectre v1 style attacks.
+
 fsplit-ivs-in-unroller
 Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
 Split lifetimes of induction variables when loops are unrolled.
diff --git a/gcc/gimple-ssa-spectrev1.cc b/gcc/gimple-ssa-spectrev1.cc
new file mode 100644
index 00000000000..c2a5dc95324
--- /dev/null
+++ b/gcc/gimple-ssa-spectrev1.cc
@@ -0,0 +1,824 @@
+/* Loop interchange.
+   Copyright (C) 2017-2018 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+This file is part of GCC.
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "is-a.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-pretty-print.h"
+#include "gimple-iterator.h"
+#include "params.h"
+#include "tree-ssa.h"
+#include "cfganal.h"
+#include "gimple-walk.h"
+#include "tree-ssa-loop.h"
+#include "tree-dfa.h"
+#include "tree-cfg.h"
+#include "fold-const.h"
+#include "builtins.h"
+#include "alias.h"
+#include "cfgloop.h"
+#include "varasm.h"
+#include "cgraph.h"
+#include "gimple-fold.h"
+#include "diagnostic.h"
+/* The Spectre V1 situation is as follows:
+      if (attacker_controlled_idx < bound)  // speculated as true but is false
+        {
+         // out-of-bound access, returns value interesting to attacker
+         val = mem[attacker_controlled_idx];
+         // access that causes a cache-line to be brought in - canary
+         ... = attacker_controlled_mem[val];
+       }
+   The last load provides the side-channel.  The pattern can be split
+   into multiple functions or translation units.  Conservatively we'd
+   have to warn about
+      int foo (int *a) {  return *a; }
+   thus any indirect (or indexed) memory access.  That's obviously
+   not useful.
+   The next level would be to warn only when we see load of val as
+   well.  That then misses cases like
+      int foo (int *a, int *b)
+      {
+        int idx = load_it (a);
+       return load_it (&b[idx]);
+      }
+   Still we'd warn about cases like
+      struct Foo { int *a; };
+      int foo (struct Foo *a) { return *a->a; }
+   though dereferencing VAL isn't really an interesting case.  It's
+   hard to exclude this conservatively so the obvious solution is
+   to restrict the kind of loads that produce val, for example based
+   on its type or its number of bits.  It's tempting to do this at
+   the point of the load producing val but in the end what matters
+   is the number of bits that reach the second loads [as index] given
+   there are practical limits on the size of the canary.  For this
+   we have to consider
+      int foo (struct Foo *a, int *b)
+      {
+        int *c = a->a;
+       int idx = *b;
+       return *(c + idx);
+      }
+   where idx has too many bits to be an interesting attack vector(?).
+ */
+/* The pass does two things, first it performs data flow analysis
+   to be able to warn about the second load.  This is controlled
+   via -Wspectre-v1.
+   Second it instruments control flow in the program to track a
+   mask which is all-ones, or all-zeroes if the CPU speculated
+   a branch in the wrong direction.  This mask is then used to
+   mask the address[-part(s)] of loads with non-invariant addresses,
+   effectively mitigating the attack.  This is controlled by
+   -fspectre-v1[=N] where N is default 2 and
+     1  optimistically omit some instrumentations (currently
+        backedge control flow instructions do not update the
+       speculation mask)
+     2  instrument conservatively using a function-local speculation
+        mask
+     3  instrument conservatively using a global (TLS) speculation
+        mask.  This adds TLS loads/stores of the speculation mask
+       at function boundaries and before and after calls.
+ */
+/* We annotate statements whose defs cannot be used to leak data
+   speculatively via loads with SV1_SAFE.  This is used to optimize
+   masking of indices where masked indices (and derived by constant
+   ones) are not masked again.  Note this works only up to the points
+   that possibly change the speculation mask value.  */
+#define SV1_SAFE GF_PLF_1
+namespace {
+const pass_data pass_data_spectrev1 =
+{
+  GIMPLE_PASS, /* type */
+  "spectrev1", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_cfg|PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+class pass_spectrev1 : public gimple_opt_pass
+{
+public:
+  pass_spectrev1 (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_spectrev1, ctxt)
+  {}
+  /* opt_pass methods: */
+  opt_pass * clone () { return new pass_spectrev1 (m_ctxt); }
+  virtual bool gate (function *) { return warn_spectrev1 || flag_spectrev1; }
+  virtual unsigned int execute (function *);
+  static bool stmt_is_indexed_load (gimple *);
+  static bool stmt_mangles_index (gimple *, tree);
+  static bool find_value_dependent_guard (gimple *, tree);
+  static void mark_influencing_outgoing_flow (basic_block, tree);
+  static tree instrument_mem (gimple_stmt_iterator *, tree, tree);
+}; // class pass_spectrev1
+bitmap_head *influencing_outgoing_flow;
+static bool
+call_between (gimple *first, gimple *second)
+{
+  gcc_assert (gimple_bb (first) == gimple_bb (second));
+  /* ???  This is inefficient.  Maybe we can use gimple_uid to assign
+     unique IDs to stmts belonging to groups with the same speculation
+     mask state.  */
+  for (gimple_stmt_iterator gsi = gsi_for_stmt (first);
+       gsi_stmt (gsi) != second; gsi_next (&gsi))
+    if (is_gimple_call (gsi_stmt (gsi)))
+      return true;
+  return false;
+}
+basic_block ctx_bb;
+gimple *ctx_stmt;
+static bool
+gather_indexes (tree, tree *idx, void *data)
+{
+  vec<tree *> *indexes = (vec<tree *> *)data;
+  if (TREE_CODE (*idx) != SSA_NAME)
+    return true;
+  if (!SSA_NAME_IS_DEFAULT_DEF (*idx)
+      && gimple_bb (SSA_NAME_DEF_STMT (*idx)) == ctx_bb
+      && gimple_plf (SSA_NAME_DEF_STMT (*idx), SV1_SAFE)
+      && (flag_spectrev1 < 3
+         || !call_between (SSA_NAME_DEF_STMT (*idx), ctx_stmt)))
+    return true;
+  if (indexes->is_empty ())
+    indexes->safe_push (idx);
+  else if (*(*indexes)[0] == *idx)
+    indexes->safe_push (idx);
+  else
+    return false;
+  return true;
+}
+pass_spectrev1::instrument_mem (gimple_stmt_iterator *gsi, tree mem, tree mask)
+  /* First try to see if we can find a single index we can zero which
+     has the chance of repeating in other loads and also avoids separate
+     LEA and memory references decreasing code size and AGU occupancy.  */
+  auto_vec<tree *, 8> indexes;
+  ctx_bb = gsi_bb (*gsi);
+  ctx_stmt = gsi_stmt (*gsi);
+      && for_each_index (&mem, gather_indexes, (void *)&indexes))
+    {
+      /* All indices are safe.  */
+      if (indexes.is_empty ())
+       return mem;
+      if (TYPE_PRECISION (TREE_TYPE (*indexes[0]))
+         <= TYPE_PRECISION (TREE_TYPE (mask)))
+       {
+         tree idx = *indexes[0];
+         gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (idx))
+                     || POINTER_TYPE_P (TREE_TYPE (idx)));
+         /* Instead of instrumenting IDX directly we could look at
+            definitions with a single SSA use and instrument that
+            instead.  But we have to do some work to make SV1_SAFE
+            propagation updated then - this would really ask to first
+            gather all indexes of all refs we want to instrument and
+            compute some optimal set of instrumentations.  */
+         gimple_seq seq = NULL;
+         tree idx_mask = gimple_convert (&seq, TREE_TYPE (idx), mask);
+         tree masked_idx = gimple_build (&seq, BIT_AND_EXPR,
+                                         TREE_TYPE (idx), idx, idx_mask);
+         /* Mark the instrumentation sequence as visited.  */
+         for (gimple_stmt_iterator si = gsi_start (seq);
+              !gsi_end_p (si); gsi_next (&si))
+           gimple_set_visited (gsi_stmt (si), true);
+         gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+         gimple_set_plf (SSA_NAME_DEF_STMT (masked_idx), SV1_SAFE, true);
+         /* Replace downstream users in the BB which reduces register pressure
+            and allows SV1_SAFE propagation to work (which stops at call/BB
+            boundaries though).
+            ???  This is really reg-pressure vs. dependence chains so not
+            a generally easy thing.  Making the following propagate into
+            all uses dominated by the insert slows down 429.mcf even more.
+            ???  We can actually track SV1_SAFE across PHIs but then we
+            have to propagate into PHIs here.  */
+         gimple *use_stmt;
+         use_operand_p use_p;
+         imm_use_iterator iter;
+         FOR_EACH_IMM_USE_STMT (use_stmt, iter, idx)
+           if (gimple_bb (use_stmt) == gsi_bb (*gsi)
+               && gimple_code (use_stmt) != GIMPLE_PHI
+               && !gimple_visited_p (use_stmt))
+             {
+               FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+                 SET_USE (use_p, masked_idx);
+               update_stmt (use_stmt);
+             }
+         /* Modify MEM in place...  (our stmt is already marked visited).  */
+         for (unsigned i = 0; i < indexes.length (); ++i)
+           *indexes[i] = masked_idx;
+         return mem;
+       }
+    }
+  /* ???  Can we handle TYPE_REVERSE_STORAGE_ORDER at all?  Need to
+     handle BIT_FIELD_REFs.  */
+  /* Strip a bitfield reference to re-apply it at the end.  */
+  tree bitfield = NULL_TREE;
+  tree bitfield_off = NULL_TREE;
+      && DECL_BIT_FIELD (TREE_OPERAND (mem, 1)))
+    {
+      bitfield = TREE_OPERAND (mem, 1);
+      bitfield_off = TREE_OPERAND (mem, 2);
+      mem = TREE_OPERAND (mem, 0);
+    }
+  tree ptr_base = mem;
+  /* VIEW_CONVERT_EXPRs do not change offset, strip them, they get folded
+     into the MEM_REF we create.  */
+  while (TREE_CODE (ptr_base) == VIEW_CONVERT_EXPR)
+    ptr_base = TREE_OPERAND (ptr_base, 0);
+  tree ptr = make_ssa_name (ptr_type_node);
+  gimple *new_stmt = gimple_build_assign (ptr, build_fold_addr_expr 
+  gimple_set_visited (new_stmt, true);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  ptr = make_ssa_name (ptr_type_node);
+  new_stmt = gimple_build_assign (ptr, BIT_AND_EXPR,
+                                 gimple_assign_lhs (new_stmt), mask);
+  gimple_set_visited (new_stmt, true);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  tree type = TREE_TYPE (mem);
+  unsigned align = get_object_alignment (mem);
+  if (align != TYPE_ALIGN (type))
+    type = build_aligned_type (type, align);
+  tree new_mem = build2 (MEM_REF, type, ptr,
+                        build_int_cst (reference_alias_ptr_type (mem), 0));
+  if (bitfield)
+    new_mem = build3 (COMPONENT_REF, TREE_TYPE (bitfield), new_mem,
+                     bitfield, bitfield_off);
+  return new_mem;
+bool
+check_spectrev1_2nd_load (tree, tree *idx, void *data)
+{
+  sbitmap value_from_indexed_load = (sbitmap)data;
+  if (TREE_CODE (*idx) == SSA_NAME
+      && bitmap_bit_p (value_from_indexed_load, SSA_NAME_VERSION (*idx)))
+    return false;
+  return true;
+}
+bool
+check_spectrev1_2nd_load (gimple *, tree, tree ref, void *data)
+{
+  return !for_each_index (&ref, check_spectrev1_2nd_load, data);
+}
+void
+pass_spectrev1::mark_influencing_outgoing_flow (basic_block bb, tree op)
+{
+  if (!bitmap_set_bit (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
+                      bb->index))
+    return;
+  /* Note we deliberately and non-conservatively stop at call and
+     memory boundaries here, expecting earlier optimization to expose
+     value dependences via SSA chains.  */
+  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
+  if (gimple_vuse (def_stmt)
+      || !is_gimple_assign (def_stmt))
+    return;
+  ssa_op_iter i;
+  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, i, SSA_OP_USE)
+    mark_influencing_outgoing_flow (bb, op);
+}
+bool
+pass_spectrev1::find_value_dependent_guard (gimple *stmt, tree op)
+{
+  bitmap_iterator bi;
+  unsigned i;
+  EXECUTE_IF_SET_IN_BITMAP (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
+                           0, i, bi)
+    /* ???  If control-dependent on.
+       ???  Make bits in influencing_outgoing_flow the index of the BB
+       in RPO order so we could walk bits from STMT "upwards" finding
+       the nearest one.  */
+    if (dominated_by_p (CDI_DOMINATORS,
+                       gimple_bb (stmt), BASIC_BLOCK_FOR_FN (cfun, i)))
+      {
+       if (dump_enabled_p ())
+         dump_printf_loc (MSG_NOTE, stmt, "Condition %G in block %d "
+                          "is related to indexes used in %G\n",
+                          last_stmt (BASIC_BLOCK_FOR_FN (cfun, i)),
+                          i, stmt);
+       return true;
+      }
+  /* Note we deliberately and non-conservatively stop at call and
+     memory boundaries here, expecting earlier optimization to expose
+     value dependences via SSA chains.  */
+  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
+  if (gimple_vuse (def_stmt)
+      || !is_gimple_assign (def_stmt))
+    return false;
+  ssa_op_iter it;
+  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, it, SSA_OP_USE)
+    if (find_value_dependent_guard (stmt, op))
+      /* Others may be "nearer".  */
+      return true;
+  return false;
+}
+bool
+pass_spectrev1::stmt_is_indexed_load (gimple *stmt)
+{
+  /* Given we ignore the function boundary for incoming parameters
+     let's ignore return values of calls as well for the purpose
+     of being the first indexed load (also ignore inline-asms).  */
+  if (!gimple_assign_load_p (stmt))
+    return false;
+  /* Exclude esp. pointers from the index load itself (but also floats,
+     vectors, etc. - quite a bit handwaving here).  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
+    return false;
+  /* If we do not have any SSA uses the load cannot be one indexed
+     by an attacker controlled value.  */
+  if (zero_ssa_operands (stmt, SSA_OP_USE))
+    return false;
+  return true;
+}
+/* Return true if the index in the use operand OP in STMT is
+   not transferred to STMT's defs.  */
+bool
+pass_spectrev1::stmt_mangles_index (gimple *stmt, tree op)
+{
+  if (gimple_assign_load_p (stmt))
+    return true;
+  if (gassign *ass = dyn_cast <gassign *> (stmt))
+    {
+      enum tree_code code = gimple_assign_rhs_code (ass);
+      switch (code)
+       {
+       case TRUNC_DIV_EXPR:
+       case CEIL_DIV_EXPR:
+       case FLOOR_DIV_EXPR:
+       case ROUND_DIV_EXPR:
+       case EXACT_DIV_EXPR:
+       case RDIV_EXPR:
+       case TRUNC_MOD_EXPR:
+       case CEIL_MOD_EXPR:
+       case FLOOR_MOD_EXPR:
+       case ROUND_MOD_EXPR:
+       case LSHIFT_EXPR:
+       case RSHIFT_EXPR:
+       case LROTATE_EXPR:
+       case RROTATE_EXPR:
+         /* Division, modulus or shifts by the index do not produce
+            something useful for the attacker.  */
+         if (gimple_assign_rhs2 (ass) == op)
+           return true;
+         break;
+       default:;
+         /* Comparisons do not produce an index value.  */
+         if (TREE_CODE_CLASS (code) == tcc_comparison)
+           return true;
+       }
+    }
+  /* ???  We could handle builtins here.  */
+  return false;
+}
+static GTY(()) tree spectrev1_tls_mask_decl;
+/* Main entry for spectrev1 pass.  */
+unsigned int
+pass_spectrev1::execute (function *fn)
+  calculate_dominance_info (CDI_DOMINATORS);
+  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
+  int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
+  int rpo_num = pre_and_rev_post_order_compute_fn (fn, NULL, rpo, false);
+  /* We track for each SSA name whether its value (may) depend(s) on
+     the result of an indexed load.
+     A set of operations will kill a value (enough).  */
+  auto_sbitmap value_from_indexed_load (num_ssa_names);
+  bitmap_clear (value_from_indexed_load);
+  unsigned orig_num_ssa_names = num_ssa_names;
+  influencing_outgoing_flow = XCNEWVEC (bitmap_head, num_ssa_names);
+  for (unsigned i = 1; i < num_ssa_names; ++i)
+    bitmap_initialize (&influencing_outgoing_flow[i], &bitmap_default_obstack);
+  /* Diagnosis.  */
+  /* Function arguments are not indexed loads unless we want to
+     be conservative to a level no longer useful.  */
+  for (int i = 0; i < rpo_num; ++i)
+    {
+      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
+      for (gphi_iterator gpi = gsi_start_phis (bb);
+          !gsi_end_p (gpi); gsi_next (&gpi))
+       {
+         gphi *phi = gpi.phi ();
+         bool value_from_indexed_load_p = false;
+         use_operand_p arg_p;
+         ssa_op_iter it;
+         FOR_EACH_PHI_ARG (arg_p, phi, it, SSA_OP_USE)
+           {
+             tree arg = USE_FROM_PTR (arg_p);
+             if (TREE_CODE (arg) == SSA_NAME
+                 && bitmap_bit_p (value_from_indexed_load,
+                                  SSA_NAME_VERSION (arg)))
+               value_from_indexed_load_p = true;
+           }
+         if (value_from_indexed_load_p)
+           bitmap_set_bit (value_from_indexed_load,
+                           SSA_NAME_VERSION (PHI_RESULT (phi)));
+       }
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+          !gsi_end_p (gsi); gsi_next (&gsi))
+       {
+         gimple *stmt = gsi_stmt (gsi);
+         if (is_gimple_debug (stmt))
+           continue;
+         if (walk_stmt_load_store_ops (stmt, value_from_indexed_load,
+                                       check_spectrev1_2nd_load,
+                                       check_spectrev1_2nd_load))
+           warning_at (gimple_location (stmt), OPT_Wspectre_v1, "%Gspectrev1",
+                       stmt);
+         bool value_from_indexed_load_p = false;
+         if (stmt_is_indexed_load (stmt))
+           {
+             /* We are interested in indexes to later loads, so ultimately
+                register values, all of which happen to be separate SSA defs.
+                Interesting aggregates will be decomposed by later loads
+                which we then mark as producing an index.  Simply mark
+                all SSA defs as coming from an indexed load.  */
+             /* We are handling a single load in STMT right now.  */
+             ssa_op_iter it;
+             tree op;
+             FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
+               if (find_value_dependent_guard (stmt, op))
+                 {
+                   /* ???  Somehow record the dependence to point to it in
+                      diagnostics.  */
+                   value_from_indexed_load_p = true;
+                   break;
+                 }
+           }
+         tree op;
+         ssa_op_iter it;
+         FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
+           if (bitmap_bit_p (value_from_indexed_load,
+                             SSA_NAME_VERSION (op))
+               && !stmt_mangles_index (stmt, op))
+             {
+               value_from_indexed_load_p = true;
+               break;
+             }
+         if (value_from_indexed_load_p)
+           FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_DEF)
+             /* ???  We could cut off single-bit values from the chain
+                here or pretend that float loads will never be turned
+                into integer indices, etc.  */
+             bitmap_set_bit (value_from_indexed_load,
+                             SSA_NAME_VERSION (op));
+       }
+      if (EDGE_COUNT (bb->succs) > 1)
+       {
+         gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
+         /* ???  What about switches?  What about badly speculated EH?  */
+         if (!stmt)
+           continue;
+         /* We could constrain conditions here to those more likely
+            being "bounds checks".  For example common guards for
+            indirect accesses are NULL pointer checks.
+            ???  This isn't fully safe, but it drops the number of
+            spectre warnings for dwarf2out.i from cc1files from 70 to 16.  */
+         if ((gimple_cond_code (stmt) == EQ_EXPR
+              || gimple_cond_code (stmt) == NE_EXPR)
+             && integer_zerop (gimple_cond_rhs (stmt))
+             && POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt))))
+           ;
+         else
+           {
+             ssa_op_iter it;
+             tree op;
+             FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
+               mark_influencing_outgoing_flow (bb, op);
+           }
+       }
+    }
+  for (unsigned i = 1; i < orig_num_ssa_names; ++i)
+    bitmap_release (&influencing_outgoing_flow[i]);
+  XDELETEVEC (influencing_outgoing_flow);
+  /* Instrumentation.  */
+  if (!flag_spectrev1)
+    return 0;
+  /* Create the default all-ones mask.  When doing IPA instrumentation
+     this should initialize the mask from TLS memory and outgoing edges
+     need to save the mask to TLS memory.  */
+  gimple *new_stmt;
+  if (!spectrev1_tls_mask_decl
+      && flag_spectrev1 >= 3)
+    {
+      /* Use a smaller variable in case sign-extending loads are
+        available?  */
+      spectrev1_tls_mask_decl
+         = build_decl (BUILTINS_LOCATION,
+                       VAR_DECL, NULL_TREE, ptr_type_node);
+      TREE_STATIC (spectrev1_tls_mask_decl) = 1;
+      TREE_PUBLIC (spectrev1_tls_mask_decl) = 1;
+      DECL_VISIBILITY (spectrev1_tls_mask_decl) = VISIBILITY_HIDDEN;
+      DECL_VISIBILITY_SPECIFIED (spectrev1_tls_mask_decl) = 1;
+      DECL_INITIAL (spectrev1_tls_mask_decl)
+         = build_all_ones_cst (ptr_type_node);
+      DECL_NAME (spectrev1_tls_mask_decl) = get_identifier ("__SV1MSK");
+      DECL_ARTIFICIAL (spectrev1_tls_mask_decl) = 1;
+      DECL_IGNORED_P (spectrev1_tls_mask_decl) = 1;
+      varpool_node::finalize_decl (spectrev1_tls_mask_decl);
+      make_decl_one_only (spectrev1_tls_mask_decl,
+                         DECL_ASSEMBLER_NAME (spectrev1_tls_mask_decl));
+      set_decl_tls_model (spectrev1_tls_mask_decl,
+                         decl_default_tls_model (spectrev1_tls_mask_decl));
+    }
+  /* We let the SSA rewriter cope with rewriting mask into SSA and
+     inserting PHI nodes.  */
+  tree mask = create_tmp_reg (ptr_type_node, "spectre_v1_mask");
+  new_stmt = gimple_build_assign (mask,
+                                 flag_spectrev1 >= 3
+                                 ? spectrev1_tls_mask_decl
+                                 : build_all_ones_cst (ptr_type_node));
+  gimple_stmt_iterator gsi
+      = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (fn)));
+  gsi_insert_before (&gsi, new_stmt, GSI_CONTINUE_LINKING);
+  /* We are using the visited flag to track stmts downstream in a BB.  */
+  for (int i = 0; i < rpo_num; ++i)
+    {
+      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
+      for (gphi_iterator gpi = gsi_start_phis (bb);
+          !gsi_end_p (gpi); gsi_next (&gpi))
+       gimple_set_visited (gpi.phi (), false);
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+          !gsi_end_p (gsi); gsi_next (&gsi))
+       gimple_set_visited (gsi_stmt (gsi), false);
+    }
+  for (int i = 0; i < rpo_num; ++i)
+    {
+      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
+      for (gphi_iterator gpi = gsi_start_phis (bb);
+          !gsi_end_p (gpi); gsi_next (&gpi))
+       {
+         gphi *phi = gpi.phi ();
+         /* ???  We can merge SAFE state across BB boundaries in
+            some cases, like when edges are not critical and the
+            state was made SAFE in the tail of the predecessors
+            and not invalidated by calls.   */
+         gimple_set_plf (phi, SV1_SAFE, false);
+       }
+      bool instrumented_call_p = false;
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+          !gsi_end_p (gsi); gsi_next (&gsi))
+       {
+         gimple *stmt = gsi_stmt (gsi);
+         gimple_set_visited (stmt, true);
+         if (is_gimple_debug (stmt))
+           continue;
+         tree op;
+         ssa_op_iter it;
+         bool safe = is_gimple_assign (stmt);
+         if (safe)
+           FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
+             {
+               if (safe
+                   && (SSA_NAME_IS_DEFAULT_DEF (op)
+                       || !gimple_plf (SSA_NAME_DEF_STMT (op), SV1_SAFE)
+                       /* Once mask can have changed we cannot further
+                          propagate safe state.  */
+                       || gimple_bb (SSA_NAME_DEF_STMT (op)) != bb
+                       /* That includes calls if we have instrumented one
+                          in this block.  */
+                       || (instrumented_call_p
+                           && call_between (SSA_NAME_DEF_STMT (op), stmt))))
+                 {
+                   safe = false;
+                   break;
+                 }
+             }
+         gimple_set_plf (stmt, SV1_SAFE, safe);
+         /* Instrument bounded loads.
+            We instrument non-aggregate loads with non-invariant address.
+            The idea is to reliably instrument the bounded load while
+            leaving the canary, be it load or store, aggregate or
+            non-aggregate, alone.  */
+         if (gimple_assign_single_p (stmt)
+             && gimple_vuse (stmt)
+             && !gimple_vdef (stmt)
+             && !zero_ssa_operands (stmt, SSA_OP_USE))
+           {
+             tree new_mem = instrument_mem (&gsi, gimple_assign_rhs1 (stmt),
+                                            mask);
+             gimple_assign_set_rhs1 (stmt, new_mem);
+             update_stmt (stmt);
+             /* The value loaded by a masked load is "safe".  */
+             gimple_set_plf (stmt, SV1_SAFE, true);
+           }
+         /* Instrument return store to TLS mask.  */
+         if (flag_spectrev1 >= 3
+             && gimple_code (stmt) == GIMPLE_RETURN)
+           {
+             new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
+             gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+           }
+         /* Instrument calls with store/load to/from TLS mask.
+            ???  Placement of the stores/loads can be optimized in a LCM
+            way.  */
+         else if (flag_spectrev1 >= 3
+                  && is_gimple_call (stmt)
+                  && gimple_vuse (stmt))
+           {
+             new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
+             gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+             if (!stmt_ends_bb_p (stmt))
+               {
+                 new_stmt = gimple_build_assign (mask,
+                                                 spectrev1_tls_mask_decl);
+                 gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
+               }
+             else
+               {
+                 edge_iterator ei;
+                 edge e;
+                 FOR_EACH_EDGE (e, ei, bb->succs)
+                   {
+                     if (e->flags & EDGE_ABNORMAL)
+                       continue;
+                     new_stmt = gimple_build_assign (mask,
+                                                     spectrev1_tls_mask_decl);
+                     gsi_insert_on_edge (e, new_stmt);
+                   }
+               }
+             instrumented_call_p = true;
+           }
+       }
+      if (EDGE_COUNT (bb->succs) > 1)
+       {
+         gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
+         /* ???  What about switches?  What about badly speculated EH?  */
+         if (!stmt)
+           continue;
+         /* Instrument conditional branches to track mis-speculation
+            via a pointer-sized mask.
+            ???  We could restrict to instrumenting those conditions
+            that control interesting loads or apply simple heuristics
+            like not instrumenting FP compares or equality compares
+            which are unlikely bounds checks.  But we have to instrument
+            bool != 0 because multiple conditions might have been
+            combined.  */
+         edge truee, falsee;
+         extract_true_false_edges_from_block (bb, &truee, &falsee);
+         /* Below -fspectre-v1=2 we do not instrument loop exit tests.  */
+         if (flag_spectrev1 >= 2
+             || !loop_exits_from_bb_p (bb->loop_father, bb))
+           {
+             gimple_stmt_iterator gsi = gsi_last_bb (bb);
+             /* Instrument
+                  if (a_1 > b_2)
+                as
+                  tem_mask_3 = a_1 > b_2 ? -1 : 0;
+                  if (tem_mask_3 != 0)
+                this will result in a
+                  xor %eax, %eax; cmp|test; setCC %al; sub $0x1, %eax; jne
+                sequence which is faster in practice than when retaining
+                the original jump condition.  This is 10 bytes overhead
+                on x86_64 plus 3 bytes for an and on the true path and
+                5 bytes for an and and not on the false path.  */
+             tree tem_mask = make_ssa_name (ptr_type_node);
+             new_stmt = gimple_build_assign (tem_mask, COND_EXPR,
+                                             build2 (gimple_cond_code (stmt),
+                                                     boolean_type_node,
+                                                     gimple_cond_lhs (stmt),
+                                                     gimple_cond_rhs (stmt)),
+                                             build_all_ones_cst (ptr_type_node),
+                                             build_zero_cst (ptr_type_node));
+             gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+             gimple_cond_set_code (stmt, NE_EXPR);
+             gimple_cond_set_lhs (stmt, tem_mask);
+             gimple_cond_set_rhs (stmt, build_zero_cst (ptr_type_node));
+             update_stmt (stmt);
+             /* On the false edge
+                  mask = mask & ~tem_mask_3;  */
+             gimple_seq tems = NULL;
+             tree tem_mask2 = make_ssa_name (ptr_type_node);
+             new_stmt = gimple_build_assign (tem_mask2, BIT_NOT_EXPR,
+                                             tem_mask);
+             gimple_seq_add_stmt_without_update (&tems, new_stmt);
+             new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
+                                             mask, tem_mask2);
+             gimple_seq_add_stmt_without_update (&tems, new_stmt);
+             gsi_insert_seq_on_edge (falsee, tems);
+             /* On the true edge
+                  mask = mask & tem_mask_3;  */
+             new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
+                                             mask, tem_mask);
+             gsi_insert_on_edge (truee, new_stmt);
+           }
+       }
+    }
+  gsi_commit_edge_inserts ();
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_spectrev1 (gcc::context *ctxt)
+{
+  return new pass_spectrev1 (ctxt);
+}
diff --git a/gcc/params.def b/gcc/params.def
index 6f98fccd291..19f7dbf4dad 100644
--- a/gcc/params.def
+++ b/gcc/params.def
         " loops.",
         100, 0, 0)
+        "spectre-v1-max-instrument-indices",
+        "Maximum number of indices to instrument before instrumenting the whole address.",
+        1, 0, 0)
 Local variables:
diff --git a/gcc/passes.def b/gcc/passes.def
index 144df4fa417..2fe0cdcfa7e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -400,6 +400,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_lower_resx);
   NEXT_PASS (pass_nrv);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
+  NEXT_PASS (pass_spectrev1);
   NEXT_PASS (pass_warn_function_noreturn);
   NEXT_PASS (pass_gen_hsail);
diff --git a/gcc/testsuite/gcc.dg/Wspectre-v1-1.c b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
new file mode 100644
index 00000000000..3ac647e72fd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-Wspectre-v1" } */
+unsigned char a[1024];
+int b[256];
+int foo (int i, int bound)
+{
+  if (i < bound)
+    return b[a[i]];  /* { dg-warning "spectrev1" } */
+}
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 9f9d85fdbc3..f5c164f465f 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -625,6 +625,7 @@ extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_spectrev1 (gcc::context *ctxt);
 /* Current optimization pass.  */
 extern opt_pass *current_pass;
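
For readers who want to see the mask tracking outside of GIMPLE, here is a
rough hand-written C sketch of what the instrumentation in the pass does to
a function shaped like the Wspectre-v1-1.c testcase. It is an illustration
only, under several assumptions: the names foo_plain and foo_tracked are made
up, the indices are size_t rather than int to keep the sketch simple, and
masking the index is only a stand-in for instrument_mem, which is not shown
in this part of the patch. Written at the source level like this it is not a
mitigation, since an optimizing compiler can see that the mask is all-ones on
the true path and drop it; the pass runs right before RTL expansion for that
reason.

```c
#include <stddef.h>
#include <stdint.h>

unsigned char a[1024];
int b[256];

/* The unmitigated gadget: a bounds check guarding an indexed load whose
   result indexes a second load.  Callers must keep bound <= 1024.  */
int
foo_plain (size_t i, size_t bound)
{
  if (i < bound)
    return b[a[i]];
  return 0;
}

/* Hypothetical hand-expansion of the instrumented data flow.  */
int
foo_tracked (size_t i, size_t bound)
{
  /* All-ones means "not mis-speculating".  With -fspectre-v1=3 the pass
     initializes this from the TLS mask variable instead.  */
  uintptr_t mask = ~(uintptr_t) 0;

  /* if (i < bound) becomes
       tem_mask = i < bound ? -1 : 0;
       if (tem_mask != 0)  */
  uintptr_t tem_mask = i < bound ? ~(uintptr_t) 0 : 0;
  if (tem_mask != 0)
    {
      /* True edge: mask = mask & tem_mask.  If the branch was entered
	 under mis-speculation, tem_mask is zero and so is mask.  */
      mask &= tem_mask;
      /* Assumption: the index is ANDed with the mask, so a mis-speculated
	 path reads a[0] and b[0] instead of attacker-chosen locations.  */
      size_t idx = a[i & mask];
      return b[idx & mask];
    }

  /* False edge: mask = mask & ~tem_mask.  */
  mask &= ~tem_mask;
  (void) mask;
  return 0;
}
```

Architecturally both functions return the same values for any in-range
bound; the difference is only in what a mis-speculated true path can load.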
