On Wed, Sep 17, 2014 at 10:11 AM, Ilya Enkovich <[email protected]> wrote:
> On 16 Sep 12:02, Uros Bizjak wrote:
>>
>> Hm, can this patch be compiled as part of the series? The expanders
>> refer to various gen_bnd patterns that I don't see. Also, I don't see
>> BND mode introduced.
>
> Hi,
>
> Here is a patch from the series that introduces modes and instructions:
> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00880.html. It needs update in
> bndldx expander as you suggested.
>
>>
>> Anyway, some general observations:
>>
>> > + case IX86_BUILTIN_BNDLDX:
>> > + arg0 = CALL_EXPR_ARG (exp, 0);
>> > + arg1 = CALL_EXPR_ARG (exp, 1);
>> > +
>> > + op0 = expand_normal (arg0);
>> > + op1 = expand_normal (arg1);
>> > +
>> > + op0 = force_reg (Pmode, op0);
>> > + op1 = force_reg (Pmode, op1);
>> > +
>> > +      /* Avoid registers which cannot be used as index.  */
>> > + if (!index_register_operand (op1, Pmode))
>> > + {
>> > + rtx temp = gen_reg_rtx (Pmode);
>> > + emit_move_insn (temp, op1);
>> > + op1 = temp;
>> > + }
>> > +
>> > + /* If op1 was a register originally then it may have
>> > + mode other than Pmode. We need to extend in such
>> > + case because bndldx may work only with Pmode regs. */
>> > + if (GET_MODE (op1) != Pmode)
>> > + op1 = ix86_zero_extend_to_Pmode (op1);
>> > +
>> > + if (REG_P (target))
>> > + emit_insn (TARGET_64BIT
>> > + ? gen_bnd64_ldx (target, op0, op1)
>> > + : gen_bnd32_ldx (target, op0, op1));
>> > + else
>> > + {
>> > + rtx temp = gen_reg_rtx (BNDmode);
>> > + emit_insn (TARGET_64BIT
>> > + ? gen_bnd64_ldx (temp, op0, op1)
>> > + : gen_bnd32_ldx (temp, op0, op1));
>> > + emit_move_insn (target, temp);
>> > + }
>> > + return target;
>>
>> I don't like the way arguments are prepared. For the case above,
>> bnd_ldx should have index_register_operand predicate in its pattern,
>> and this predicate (and its mode) should be checked in the expander
>> code. There are many examples of argument expansion in
>> ix86_expand_builtin function, including how Pmode is handled.
>>
>> Also, please see how target is handled there. Target can be null, so
>> REG_P predicate will crash.
>>
>> You should also select insn patterns depending on BNDmode, not TARGET_64BIT.
>>
>> Please use assign_386_stack_local so stack slots can be shared.
>> SLOT_TEMP is intended for short-lived temporaries, you can introduce
>> new slots if you need more live values at once.
>>
>> Uros.
>
> Thanks for comments! Here is a new version in which I addressed all your
> concerns.
Unfortunately, it doesn't. The patch fixed only one instance of the
target-handling problem, the one I referred to as an example. You still
have an unchecked target, at least in IX86_BUILTIN_BNDMK.
There is also a more general problem in your builtin expansion code, so
please look at how other builtins are handled. E.g.:
  if (optimize || !target
      || GET_MODE (target) != tmode
      || !register_operand (target, tmode))
    target = gen_reg_rtx (tmode);
Also, here is an example of how input operands are prepared:
  op0 = expand_normal (arg0);
  op1 = expand_normal (arg1);
  op2 = expand_normal (arg2);
  if (!register_operand (op0, Pmode))
    op0 = ix86_zero_extend_to_Pmode (op0);
  if (!register_operand (op1, SImode))
    op1 = copy_to_mode_reg (SImode, op1);
  if (!register_operand (op2, SImode))
    op2 = copy_to_mode_reg (SImode, op2);
So, Pmode is handled in a special way, even when x32 is not considered.
BTW: I wonder if word_mode is really needed here; Pmode can be SImode
with an address prefix (x32).
Inside the expanders, please use expand_simple_binop and expand_unop on
RTX, not on tree expressions. Again, please see the many existing examples.
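For instance, the (size - 1) adjustment in the bndmk expansion could stay
entirely on RTL. Something along these lines (an untested sketch against
the internal API; modes may need adjusting):

```c
  /* Untested sketch: op1 holds the already-expanded size argument;
     compute (size - 1) with expand_simple_binop on RTX instead of
     folding a PLUS_EXPR on trees before expansion.  */
  op1 = expand_normal (arg1);
  op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx,
			     NULL_RTX, 1, OPTAB_DIRECT);
```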
Please use emit_move_insn instead of
emit_insn (gen_move_insn (m1, op1));
As a wish: could you perhaps write a generic cmove expander that could
also be used in other places?
> +static void
> +ix86_emit_move_max (rtx dst, rtx src)
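Something like the following, perhaps. This is only a rough, untested
sketch built on emit_conditional_move from optabs (with your existing
compare-and-jump sequence as the fallback), not a drop-in replacement:

```c
/* Untested sketch of a generic unsigned-max move: try a conditional
   move via optabs first, fall back to compare-and-jump.  */
static void
ix86_emit_move_max (rtx dst, rtx src)
{
  /* dst = (dst < src unsigned) ? src : dst.  */
  rtx t = emit_conditional_move (dst, LTU, dst, src, GET_MODE (dst),
				 src, dst, GET_MODE (dst), 1);
  if (t == NULL_RTX)
    {
      rtx nomove = gen_label_rtx ();
      emit_cmp_and_jump_insns (dst, src, GEU, const0_rtx,
			       GET_MODE (dst), 1, nomove);
      emit_move_insn (dst, src);
      emit_label (nomove);
    }
  else if (t != dst)
    emit_move_insn (dst, t);
}
```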
Uros.
> Ilya
> --
> 2014-09-17 Ilya Enkovich <[email protected]>
>
> * config/i386/i386-builtin-types.def (BND): New.
> (ULONG): New.
> (BND_FTYPE_PCVOID_ULONG): New.
> (VOID_FTYPE_BND_PCVOID): New.
> (VOID_FTYPE_PCVOID_PCVOID_BND): New.
> (BND_FTYPE_PCVOID_PCVOID): New.
> (BND_FTYPE_PCVOID): New.
> (BND_FTYPE_BND_BND): New.
> (PVOID_FTYPE_PVOID_PVOID_ULONG): New.
> (PVOID_FTYPE_PCVOID_BND_ULONG): New.
> (ULONG_FTYPE_VOID): New.
> (PVOID_FTYPE_BND): New.
> * config/i386/i386.c: Include tree-chkp.h, rtl-chkp.h.
> (ix86_builtins): Add
> IX86_BUILTIN_BNDMK, IX86_BUILTIN_BNDSTX,
> IX86_BUILTIN_BNDLDX, IX86_BUILTIN_BNDCL,
> IX86_BUILTIN_BNDCU, IX86_BUILTIN_BNDRET,
> IX86_BUILTIN_BNDNARROW, IX86_BUILTIN_BNDINT,
> IX86_BUILTIN_SIZEOF, IX86_BUILTIN_BNDLOWER,
> IX86_BUILTIN_BNDUPPER.
> (builtin_isa): Add leaf_p and nothrow_p fields.
> (def_builtin): Initialize leaf_p and nothrow_p.
> (ix86_add_new_builtins): Handle leaf_p and nothrow_p
> flags.
> (bdesc_mpx): New.
> (bdesc_mpx_const): New.
> (ix86_init_mpx_builtins): New.
> (ix86_init_builtins): Call ix86_init_mpx_builtins.
> (ix86_emit_move_max): New.
> (ix86_expand_builtin): Expand IX86_BUILTIN_BNDMK,
> IX86_BUILTIN_BNDSTX, IX86_BUILTIN_BNDLDX,
> IX86_BUILTIN_BNDCL, IX86_BUILTIN_BNDCU,
> IX86_BUILTIN_BNDRET, IX86_BUILTIN_BNDNARROW,
> IX86_BUILTIN_BNDINT, IX86_BUILTIN_SIZEOF,
> IX86_BUILTIN_BNDLOWER, IX86_BUILTIN_BNDUPPER.
> * config/i386/i386.h (ix86_stack_slot): Added SLOT_BND_STORED.
>
> diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
> index 35c0035..989297a 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -47,6 +47,7 @@ DEF_PRIMITIVE_TYPE (UCHAR, unsigned_char_type_node)
> DEF_PRIMITIVE_TYPE (QI, char_type_node)
> DEF_PRIMITIVE_TYPE (HI, intHI_type_node)
> DEF_PRIMITIVE_TYPE (SI, intSI_type_node)
> +DEF_PRIMITIVE_TYPE (BND, pointer_bounds_type_node)
> # ??? Logically this should be intDI_type_node, but that maps to "long"
> # with 64-bit, and that's not how the emmintrin.h is written. Again,
> # changing this would change name mangling.
> @@ -60,6 +61,7 @@ DEF_PRIMITIVE_TYPE (USHORT, short_unsigned_type_node)
> DEF_PRIMITIVE_TYPE (INT, integer_type_node)
> DEF_PRIMITIVE_TYPE (UINT, unsigned_type_node)
> DEF_PRIMITIVE_TYPE (UNSIGNED, unsigned_type_node)
> +DEF_PRIMITIVE_TYPE (ULONG, long_unsigned_type_node)
> DEF_PRIMITIVE_TYPE (LONGLONG, long_long_integer_type_node)
> DEF_PRIMITIVE_TYPE (ULONGLONG, long_long_unsigned_type_node)
> DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node)
> @@ -806,3 +808,15 @@ DEF_FUNCTION_TYPE_ALIAS (V2DI_FTYPE_V2DI_V2DI, TF)
> DEF_FUNCTION_TYPE_ALIAS (V4SF_FTYPE_V4SF_V4SF, TF)
> DEF_FUNCTION_TYPE_ALIAS (V4SI_FTYPE_V4SI_V4SI, TF)
> DEF_FUNCTION_TYPE_ALIAS (V8HI_FTYPE_V8HI_V8HI, TF)
> +
> +# MPX builtins
> +DEF_FUNCTION_TYPE (BND, PCVOID, ULONG)
> +DEF_FUNCTION_TYPE (VOID, PCVOID, BND)
> +DEF_FUNCTION_TYPE (VOID, PCVOID, BND, PCVOID)
> +DEF_FUNCTION_TYPE (BND, PCVOID, PCVOID)
> +DEF_FUNCTION_TYPE (BND, PCVOID)
> +DEF_FUNCTION_TYPE (BND, BND, BND)
> +DEF_FUNCTION_TYPE (PVOID, PVOID, PVOID, ULONG)
> +DEF_FUNCTION_TYPE (PVOID, PCVOID, BND, ULONG)
> +DEF_FUNCTION_TYPE (ULONG, VOID)
> +DEF_FUNCTION_TYPE (PVOID, BND)
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index d0f58b1..a076450 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -85,6 +85,8 @@ along with GCC; see the file COPYING3. If not see
> #include "tree-vectorizer.h"
> #include "shrink-wrap.h"
> #include "builtins.h"
> +#include "tree-chkp.h"
> +#include "rtl-chkp.h"
>
> static rtx legitimize_dllimport_symbol (rtx, bool);
> static rtx legitimize_pe_coff_extern_decl (rtx, bool);
> @@ -28775,6 +28777,19 @@ enum ix86_builtins
> IX86_BUILTIN_XABORT,
> IX86_BUILTIN_XTEST,
>
> + /* MPX */
> + IX86_BUILTIN_BNDMK,
> + IX86_BUILTIN_BNDSTX,
> + IX86_BUILTIN_BNDLDX,
> + IX86_BUILTIN_BNDCL,
> + IX86_BUILTIN_BNDCU,
> + IX86_BUILTIN_BNDRET,
> + IX86_BUILTIN_BNDNARROW,
> + IX86_BUILTIN_BNDINT,
> + IX86_BUILTIN_SIZEOF,
> + IX86_BUILTIN_BNDLOWER,
> + IX86_BUILTIN_BNDUPPER,
> +
> /* BMI instructions. */
> IX86_BUILTIN_BEXTR32,
> IX86_BUILTIN_BEXTR64,
> @@ -28848,6 +28863,8 @@ struct builtin_isa {
> enum ix86_builtin_func_type tcode; /* type to use in the declaration */
> HOST_WIDE_INT isa; /* isa_flags this builtin is defined for */
> bool const_p; /* true if the declaration is
> constant */
> +  bool leaf_p;     /* true if the declaration has leaf attribute */
> +  bool nothrow_p;  /* true if the declaration has nothrow attribute */
> bool set_and_not_built_p;
> };
>
> @@ -28899,6 +28916,8 @@ def_builtin (HOST_WIDE_INT mask, const char *name,
> ix86_builtins[(int) code] = NULL_TREE;
> ix86_builtins_isa[(int) code].tcode = tcode;
> ix86_builtins_isa[(int) code].name = name;
> + ix86_builtins_isa[(int) code].leaf_p = false;
> + ix86_builtins_isa[(int) code].nothrow_p = false;
> ix86_builtins_isa[(int) code].const_p = false;
> ix86_builtins_isa[(int) code].set_and_not_built_p = true;
> }
> @@ -28949,6 +28968,11 @@ ix86_add_new_builtins (HOST_WIDE_INT isa)
> ix86_builtins[i] = decl;
> if (ix86_builtins_isa[i].const_p)
> TREE_READONLY (decl) = 1;
> + if (ix86_builtins_isa[i].leaf_p)
> + DECL_ATTRIBUTES (decl) = build_tree_list (get_identifier ("leaf"),
> + NULL_TREE);
> + if (ix86_builtins_isa[i].nothrow_p)
> + TREE_NOTHROW (decl) = 1;
> }
> }
> }
> @@ -30402,6 +30426,27 @@ static const struct builtin_description bdesc_round_args[] =
>    { OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_vmrsqrt28v4sf_round, "__builtin_ia32_rsqrt28ss_round", IX86_BUILTIN_RSQRT28SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
> };
>
> +/* Builtins for MPX.  */
> +static const struct builtin_description bdesc_mpx[] =
> +{
> +  { OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndstx", IX86_BUILTIN_BNDSTX, UNKNOWN, (int) VOID_FTYPE_PCVOID_BND_PCVOID },
> +  { OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndcl", IX86_BUILTIN_BNDCL, UNKNOWN, (int) VOID_FTYPE_PCVOID_BND },
> +  { OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndcu", IX86_BUILTIN_BNDCU, UNKNOWN, (int) VOID_FTYPE_PCVOID_BND },
> +};
> +
> +/* Const builtins for MPX.  */
> +static const struct builtin_description bdesc_mpx_const[] =
> +{
> +  { OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndmk", IX86_BUILTIN_BNDMK, UNKNOWN, (int) BND_FTYPE_PCVOID_ULONG },
> +  { OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndldx", IX86_BUILTIN_BNDLDX, UNKNOWN, (int) BND_FTYPE_PCVOID_PCVOID },
> +  { OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_narrow_bounds", IX86_BUILTIN_BNDNARROW, UNKNOWN, (int) PVOID_FTYPE_PCVOID_BND_ULONG },
> +  { OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndint", IX86_BUILTIN_BNDINT, UNKNOWN, (int) BND_FTYPE_BND_BND },
> +  { OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_sizeof", IX86_BUILTIN_SIZEOF, UNKNOWN, (int) ULONG_FTYPE_VOID },
> +  { OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndlower", IX86_BUILTIN_BNDLOWER, UNKNOWN, (int) PVOID_FTYPE_BND },
> +  { OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndupper", IX86_BUILTIN_BNDUPPER, UNKNOWN, (int) PVOID_FTYPE_BND },
> +  { OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndret", IX86_BUILTIN_BNDRET, UNKNOWN, (int) BND_FTYPE_PCVOID },
> +};
> +
> /* FMA4 and XOP. */
> #define MULTI_ARG_4_DF2_DI_I V2DF_FTYPE_V2DF_V2DF_V2DI_INT
> #define MULTI_ARG_4_DF2_DI_I1 V4DF_FTYPE_V4DF_V4DF_V4DI_INT
> @@ -31250,6 +31295,67 @@ ix86_init_mmx_sse_builtins (void)
> }
> }
>
> +static void
> +ix86_init_mpx_builtins ()
> +{
> + const struct builtin_description * d;
> + enum ix86_builtin_func_type ftype;
> + tree decl;
> + size_t i;
> +
> + for (i = 0, d = bdesc_mpx;
> + i < ARRAY_SIZE (bdesc_mpx);
> + i++, d++)
> + {
> + if (d->name == 0)
> + continue;
> +
> + ftype = (enum ix86_builtin_func_type) d->flag;
> + decl = def_builtin (d->mask, d->name, ftype, d->code);
> +
> +      /* Without the leaf and nothrow flags, abnormal edges may
> +         follow calls to MPX builtins when setjmp is present in
> +         the function.  Since we may have a lot of calls to MPX
> +         builtins, this causes lots of useless edges and enormous
> +         PHI nodes.  To avoid this we mark MPX builtins as leaf
> +         and nothrow.  */
> + if (decl)
> + {
> + DECL_ATTRIBUTES (decl) = build_tree_list (get_identifier ("leaf"),
> + NULL_TREE);
> + TREE_NOTHROW (decl) = 1;
> + }
> + else
> + {
> + ix86_builtins_isa[(int)d->code].leaf_p = true;
> + ix86_builtins_isa[(int)d->code].nothrow_p = true;
> + }
> + }
> +
> + for (i = 0, d = bdesc_mpx_const;
> + i < ARRAY_SIZE (bdesc_mpx_const);
> + i++, d++)
> + {
> + if (d->name == 0)
> + continue;
> +
> + ftype = (enum ix86_builtin_func_type) d->flag;
> + decl = def_builtin_const (d->mask, d->name, ftype, d->code);
> +
> + if (decl)
> + {
> + DECL_ATTRIBUTES (decl) = build_tree_list (get_identifier ("leaf"),
> + NULL_TREE);
> + TREE_NOTHROW (decl) = 1;
> + }
> + else
> + {
> + ix86_builtins_isa[(int)d->code].leaf_p = true;
> + ix86_builtins_isa[(int)d->code].nothrow_p = true;
> + }
> + }
> +}
> +
> /* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
> to return a pointer to VERSION_DECL if the outcome of the expression
> formed by PREDICATE_CHAIN is true. This function will be called during
> @@ -32788,6 +32894,7 @@ ix86_init_builtins (void)
>
> ix86_init_tm_builtins ();
> ix86_init_mmx_sse_builtins ();
> + ix86_init_mpx_builtins ();
>
> if (TARGET_LP64)
> ix86_init_builtins_va_builtins_abi ();
> @@ -35053,6 +35160,29 @@ ix86_expand_vec_set_builtin (tree exp)
> return target;
> }
>
> +/* Choose the max of DST and SRC and store it in DST.  */
> +static void
> +ix86_emit_move_max (rtx dst, rtx src)
> +{
> + rtx t;
> +
> + if (TARGET_CMOVE)
> + {
> + t = ix86_expand_compare (LTU, dst, src);
> + emit_insn (gen_rtx_SET (VOIDmode, dst,
> + gen_rtx_IF_THEN_ELSE (GET_MODE (dst), t,
> + src, dst)));
> + }
> + else
> + {
> + rtx nomove = gen_label_rtx ();
> + emit_cmp_and_jump_insns (dst, src, GEU, const0_rtx,
> + GET_MODE (dst), 1, nomove);
> + emit_insn (gen_rtx_SET (VOIDmode, dst, src));
> + emit_label (nomove);
> + }
> +}
> +
> /* Expand an expression EXP that calls a built-in function,
> with result going to TARGET if that's convenient
> (and in mode MODE if that's convenient).
> @@ -35118,6 +35248,311 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
>
> switch (fcode)
> {
> + case IX86_BUILTIN_BNDMK:
> + arg0 = CALL_EXPR_ARG (exp, 0);
> + arg1 = CALL_EXPR_ARG (exp, 1);
> +
> +      /* Builtin arg1 is the size of the block, but the
> +         instruction's op1 should be (size - 1).  */
> + op0 = expand_normal (arg0);
> + op1 = expand_normal (fold_build2 (PLUS_EXPR, TREE_TYPE (arg1),
> + arg1, integer_minus_one_node));
> + op0 = force_reg (Pmode, op0);
> + op1 = force_reg (Pmode, op1);
> +
> + emit_insn (BNDmode == BND64mode
> + ? gen_bnd64_mk (target, op0, op1)
> + : gen_bnd32_mk (target, op0, op1));
> + return target;
> +
> + case IX86_BUILTIN_BNDSTX:
> + arg0 = CALL_EXPR_ARG (exp, 0);
> + arg1 = CALL_EXPR_ARG (exp, 1);
> + arg2 = CALL_EXPR_ARG (exp, 2);
> +
> + op0 = expand_normal (arg0);
> + op1 = expand_normal (arg1);
> + op2 = expand_normal (arg2);
> +
> + op0 = force_reg (Pmode, op0);
> + op1 = force_reg (BNDmode, op1);
> + op2 = force_reg (Pmode, op2);
> +
> + emit_insn (BNDmode == BND64mode
> + ? gen_bnd64_stx (op2, op0, op1)
> + : gen_bnd32_stx (op2, op0, op1));
> + return 0;
> +
> + case IX86_BUILTIN_BNDLDX:
> + if (!target)
> + return 0;
> +
> + arg0 = CALL_EXPR_ARG (exp, 0);
> + arg1 = CALL_EXPR_ARG (exp, 1);
> +
> + op0 = expand_normal (arg0);
> + op1 = expand_normal (arg1);
> +
> + op0 = force_reg (Pmode, op0);
> + op1 = force_reg (Pmode, op1);
> +
> + if (REG_P (target))
> + emit_insn (BNDmode == BND64mode
> + ? gen_bnd64_ldx (target, op0, op1)
> + : gen_bnd32_ldx (target, op0, op1));
> + else
> + {
> + rtx temp = gen_reg_rtx (BNDmode);
> + emit_insn (BNDmode == BND64mode
> + ? gen_bnd64_ldx (temp, op0, op1)
> + : gen_bnd32_ldx (temp, op0, op1));
> + emit_move_insn (target, temp);
> + }
> + return target;
> +
> + case IX86_BUILTIN_BNDCL:
> + arg0 = CALL_EXPR_ARG (exp, 0);
> + arg1 = CALL_EXPR_ARG (exp, 1);
> +
> + op0 = expand_normal (arg0);
> + op1 = expand_normal (arg1);
> +
> + op0 = force_reg (Pmode, op0);
> + op1 = force_reg (BNDmode, op1);
> +
> + emit_insn (BNDmode == BND64mode
> + ? gen_bnd64_cl (op1, op0)
> + : gen_bnd32_cl (op1, op0));
> + return 0;
> +
> + case IX86_BUILTIN_BNDCU:
> + arg0 = CALL_EXPR_ARG (exp, 0);
> + arg1 = CALL_EXPR_ARG (exp, 1);
> +
> + op0 = expand_normal (arg0);
> + op1 = expand_normal (arg1);
> +
> + op0 = force_reg (Pmode, op0);
> + op1 = force_reg (BNDmode, op1);
> +
> + emit_insn (BNDmode == BND64mode
> + ? gen_bnd64_cu (op1, op0)
> + : gen_bnd32_cu (op1, op0));
> + return 0;
> +
> + case IX86_BUILTIN_BNDRET:
> + arg0 = CALL_EXPR_ARG (exp, 0);
> + gcc_assert (TREE_CODE (arg0) == SSA_NAME);
> + target = chkp_get_rtl_bounds (arg0);
> +      /* If no bounds were specified for the returned value,
> +         then use INIT bounds.  This usually happens when some
> +         built-in function is expanded.  */
> + if (!target)
> + {
> + rtx t1 = gen_reg_rtx (Pmode);
> + rtx t2 = gen_reg_rtx (Pmode);
> + target = gen_reg_rtx (BNDmode);
> + emit_move_insn (t1, const0_rtx);
> + emit_move_insn (t2, constm1_rtx);
> + emit_insn (BNDmode == BND64mode
> + ? gen_bnd64_mk (target, t1, t2)
> + : gen_bnd32_mk (target, t1, t2));
> + }
> + gcc_assert (target && REG_P (target));
> + return target;
> +
> + case IX86_BUILTIN_BNDNARROW:
> + {
> + rtx m1, m1h1, m1h2, lb, ub, t1;
> +
> + /* Return value and lb. */
> + arg0 = CALL_EXPR_ARG (exp, 0);
> + /* Bounds. */
> + arg1 = CALL_EXPR_ARG (exp, 1);
> + /* Size. */
> + arg2 = CALL_EXPR_ARG (exp, 2);
> +
> + /* Size was passed but we need to use (size - 1) as for bndmk. */
> + arg2 = fold_build2 (PLUS_EXPR, TREE_TYPE (arg2), arg2,
> + integer_minus_one_node);
> +
> + /* Add LB to size and inverse to get UB. */
> + arg2 = fold_build2 (PLUS_EXPR, TREE_TYPE (arg2), arg2, arg0);
> + arg2 = fold_build1 (BIT_NOT_EXPR, TREE_TYPE (arg2), arg2);
> +
> + op0 = expand_normal (arg0);
> + op1 = expand_normal (arg1);
> + op2 = expand_normal (arg2);
> +
> + lb = force_reg (Pmode, op0);
> + ub = force_reg (Pmode, op2);
> +
> + /* We need to move bounds to memory before any computations. */
> + if (!MEM_P (op1))
> + {
> + m1 = assign_386_stack_local (BNDmode, SLOT_TEMP);
> + emit_insn (gen_move_insn (m1, op1));
> + }
> + else
> + m1 = op1;
> +
> + /* Generate mem expression to be used for access to LB and UB. */
> + m1h1 = gen_rtx_MEM (Pmode, XEXP (m1, 0));
> + m1h2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (m1, 0),
> + GET_MODE_SIZE (Pmode)));
> +
> + t1 = gen_reg_rtx (Pmode);
> +
> + /* Compute LB. */
> + emit_move_insn (t1, m1h1);
> + ix86_emit_move_max (t1, lb);
> + emit_move_insn (m1h1, t1);
> +
> +
> + /* Compute UB. UB are stored in 1's complement form. Therefore
> + we also use max here. */
> + emit_move_insn (t1, m1h2);
> + ix86_emit_move_max (t1, ub);
> + emit_move_insn (m1h2, t1);
> +
> + op2 = gen_reg_rtx (BNDmode);
> + emit_move_insn (op2, m1);
> +
> + return chkp_join_splitted_slot (op0, op2);
> + }
> +
> + case IX86_BUILTIN_BNDINT:
> + {
> + unsigned bndsize = GET_MODE_SIZE (BNDmode);
> + unsigned psize = GET_MODE_SIZE (Pmode);
> + rtx res = assign_stack_local (BNDmode, bndsize, 0);
> + rtx m1, m2, m1h1, m1h2, m2h1, m2h2, t1, t2, rh1, rh2;
> +
> + arg0 = CALL_EXPR_ARG (exp, 0);
> + arg1 = CALL_EXPR_ARG (exp, 1);
> +
> + op0 = expand_normal (arg0);
> + op1 = expand_normal (arg1);
> +
> + /* We need to move bounds to memory before any computations. */
> + if (!MEM_P (op0))
> + {
> + m1 = assign_386_stack_local (BNDmode, SLOT_TEMP);
> + emit_move_insn (m1, op0);
> + }
> + else
> + m1 = op0;
> +
> + if (!MEM_P (op1))
> + {
> + m2 = assign_386_stack_local (BNDmode,
> + MEM_P (op0)
> + ? SLOT_TEMP
> + : SLOT_BND_STORED);
> + emit_move_insn (m2, op1);
> + }
> + else
> + m2 = op1;
> +
> + /* Generate mem expression to be used for access to LB and UB. */
> + m1h1 = gen_rtx_MEM (Pmode, XEXP (m1, 0));
> +      m1h2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (m1, 0), psize));
> +      m2h1 = gen_rtx_MEM (Pmode, XEXP (m2, 0));
> +      m2h2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (m2, 0), psize));
> +      rh1 = gen_rtx_MEM (Pmode, XEXP (res, 0));
> +      rh2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (res, 0), psize));
> +
> + /* Allocate temporaries. */
> + t1 = gen_reg_rtx (Pmode);
> + t2 = gen_reg_rtx (Pmode);
> +
> + /* Compute LB. */
> + emit_move_insn (t1, m1h1);
> + emit_move_insn (t2, m2h1);
> + ix86_emit_move_max (t1, t2);
> + emit_move_insn (rh1, t1);
> +
> + /* Compute UB. UB are stored in 1's complement form. Therefore
> + we also use max here. */
> + emit_move_insn (t1, m1h2);
> + emit_move_insn (t2, m2h2);
> + ix86_emit_move_max (t1, t2);
> + emit_move_insn (rh2, t1);
> +
> + return res;
> + }
> +
> + case IX86_BUILTIN_SIZEOF:
> + {
> + enum machine_mode mode = Pmode;
> + rtx t1, t2;
> +
> + arg0 = CALL_EXPR_ARG (exp, 0);
> + gcc_assert (TREE_CODE (arg0) == VAR_DECL);
> +
> + t1 = gen_reg_rtx (mode);
> +      t2 = gen_rtx_SYMBOL_REF (Pmode,
> +                               IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (arg0)));
> + t2 = gen_rtx_UNSPEC (mode, gen_rtvec (1, t2), UNSPEC_SIZEOF);
> +
> + emit_insn (gen_rtx_SET (VOIDmode, t1, t2));
> +
> + return t1;
> + }
> +
> + case IX86_BUILTIN_BNDLOWER:
> + {
> + rtx mem, hmem;
> +
> + arg0 = CALL_EXPR_ARG (exp, 0);
> + op0 = expand_normal (arg0);
> +
> + /* We need to move bounds to memory first. */
> + if (!MEM_P (op0))
> + {
> + mem = assign_386_stack_local (BNDmode, SLOT_TEMP);
> + emit_move_insn (mem, op0);
> + }
> + else
> + mem = op0;
> +
> + /* Generate mem expression to access LB and load it. */
> + hmem = gen_rtx_MEM (Pmode, XEXP (mem, 0));
> + target = gen_reg_rtx (Pmode);
> + emit_move_insn (target, hmem);
> +
> + return target;
> + }
> +
> + case IX86_BUILTIN_BNDUPPER:
> + {
> + rtx mem, hmem;
> +
> + arg0 = CALL_EXPR_ARG (exp, 0);
> + op0 = expand_normal (arg0);
> +
> + /* We need to move bounds to memory first. */
> + if (!MEM_P (op0))
> + {
> + mem = assign_386_stack_local (BNDmode, SLOT_TEMP);
> + emit_move_insn (mem, op0);
> + }
> + else
> + mem = op0;
> +
> + /* Generate mem expression to access UB and load it. */
> + hmem = gen_rtx_MEM (Pmode,
> + gen_rtx_PLUS (Pmode, XEXP (mem, 0),
> + GEN_INT (GET_MODE_SIZE (Pmode))));
> + target = gen_reg_rtx (Pmode);
> + emit_move_insn (target, hmem);
> +
> +      /* We need to invert all bits of UB.  */
> + emit_insn (gen_rtx_SET (Pmode, target, gen_rtx_NOT (Pmode, target)));
> +
> + return target;
> + }
> +
> case IX86_BUILTIN_MASKMOVQ:
> case IX86_BUILTIN_MASKMOVDQU:
> icode = (fcode == IX86_BUILTIN_MASKMOVQ
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index a38c5d1..ededa67 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2317,6 +2317,7 @@ enum ix86_stack_slot
> SLOT_CW_FLOOR,
> SLOT_CW_CEIL,
> SLOT_CW_MASK_PM,
> + SLOT_BND_STORED,
> MAX_386_STACK_LOCALS
> };
>