On Wed, Sep 17, 2014 at 6:31 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
>> >> I don't like the way arguments are prepared.  For the case above,
>> >> bnd_ldx should have index_register_operand predicate in its pattern,
>> >> and this predicate (and its mode) should be checked in the expander
>> >> code.  There are many examples of argument expansion in
>> >> ix86_expand_builtin function, including how Pmode is handled.
>> >>
>> >> Also, please see how target is handled there.  Target can be null, so
>> >> REG_P predicate will crash.
>> >>
>> >> You should also select insn patterns depending on BNDmode, not
>> >> TARGET_64BIT.
>> >>
>> >> Please use assign_386_stack_local so stack slots can be shared.
>> >> SLOT_TEMP is intended for short-lived temporaries; you can introduce
>> >> new slots if you need more live values at once.
>> >>
>> >> Uros.
>> >
>> > Thanks for the comments!  Here is a new version in which I addressed
>> > all your concerns.
>>
>> Unfortunately, it doesn't.  The patch only fixed one instance w.r.t.
>> target handling, the one I referred to as an example.  You still have
>> an unchecked target, at least in IX86_BUILTIN_BNDMK.
>>
>> However, you have a general problem in your builtin expansion code,
>> so please look at how other builtins are handled.  E.g.:
>>
>>   if (optimize || !target
>>       || GET_MODE (target) != tmode
>>       || !register_operand (target, tmode))
>>     target = gen_reg_rtx (tmode);
>>
>> Also, here is an example of how input operands are prepared:
>>
>>   op0 = expand_normal (arg0);
>>   op1 = expand_normal (arg1);
>>   op2 = expand_normal (arg2);
>>   if (!register_operand (op0, Pmode))
>>     op0 = ix86_zero_extend_to_Pmode (op0);
>>   if (!register_operand (op1, SImode))
>>     op1 = copy_to_mode_reg (SImode, op1);
>>   if (!register_operand (op2, SImode))
>>     op2 = copy_to_mode_reg (SImode, op2);
>>
>> So, Pmode is handled in a special way, even when x32 is not considered.
>>
>> BTW: I wonder if word_mode is needed here; Pmode can be SImode with an
>> address prefix (x32).
>>
>> Inside the expanders, please use expand_simple_binop and expand_unop
>> on RTX, not tree expressions.  Again, please see the many examples.
>
> Thank you for the additional explanations.  Hope this time I answer your
> concerns correctly :)

Yes, this version is MUCH better.  There are further comments down the code.

> 2014-09-17  Ilya Enkovich  <ilya.enkov...@intel.com>
>
>       * config/i386/i386-builtin-types.def (BND): New.
>       (ULONG): New.
>       (BND_FTYPE_PCVOID_ULONG): New.
>       (VOID_FTYPE_BND_PCVOID): New.
>       (VOID_FTYPE_PCVOID_PCVOID_BND): New.
>       (BND_FTYPE_PCVOID_PCVOID): New.
>       (BND_FTYPE_PCVOID): New.
>       (BND_FTYPE_BND_BND): New.
>       (PVOID_FTYPE_PVOID_PVOID_ULONG): New.
>       (PVOID_FTYPE_PCVOID_BND_ULONG): New.
>       (ULONG_FTYPE_VOID): New.
>       (PVOID_FTYPE_BND): New.
>       * config/i386/i386.c: Include tree-chkp.h, rtl-chkp.h.
>       (ix86_builtins): Add IX86_BUILTIN_BNDMK, IX86_BUILTIN_BNDSTX,
>       IX86_BUILTIN_BNDLDX, IX86_BUILTIN_BNDCL,
>       IX86_BUILTIN_BNDCU, IX86_BUILTIN_BNDRET,
>       IX86_BUILTIN_BNDNARROW, IX86_BUILTIN_BNDINT,
>       IX86_BUILTIN_SIZEOF, IX86_BUILTIN_BNDLOWER,
>       IX86_BUILTIN_BNDUPPER.
>       (builtin_isa): Add leaf_p and nothrow_p fields.
>       (def_builtin): Initialize leaf_p and nothrow_p.
>       (ix86_add_new_builtins): Handle leaf_p and nothrow_p flags.
>       (bdesc_mpx): New.
>       (bdesc_mpx_const): New.
>       (ix86_init_mpx_builtins): New.
>       (ix86_init_builtins): Call ix86_init_mpx_builtins.
>       (ix86_emit_cmove): New.
>       (ix86_emit_move_max): New.
>       (ix86_expand_builtin): Expand IX86_BUILTIN_BNDMK,
>       IX86_BUILTIN_BNDSTX, IX86_BUILTIN_BNDLDX,
>       IX86_BUILTIN_BNDCL, IX86_BUILTIN_BNDCU,
>       IX86_BUILTIN_BNDRET, IX86_BUILTIN_BNDNARROW,
>       IX86_BUILTIN_BNDINT, IX86_BUILTIN_SIZEOF,
>       IX86_BUILTIN_BNDLOWER, IX86_BUILTIN_BNDUPPER.
>       * config/i386/i386.h (ix86_stack_slot): Added SLOT_BND_STORED.

..

> +      /* We need to move bounds to memory before any computations.  */
> +      if (!MEM_P (op1))
> +     {
> +       m1 = assign_386_stack_local (BNDmode, SLOT_TEMP);
> +       emit_move_insn (m1, op1);
> +     }
> +      else
> +     m1 = op1;

No negative conditions, please.  Just swap the arms of the if statement;
it is much more readable.

> +
> +      /* Generate mem expression to be used for access to LB and UB.  */
> +      m1h1 = gen_rtx_MEM (Pmode, XEXP (m1, 0));
> +      m1h2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (m1, 0),
> +                                             GET_MODE_SIZE (Pmode)));

Please use adjust_address instead of manually producing MEMs.

> +
> +      t1 = gen_reg_rtx (Pmode);
> +
> +      /* Compute LB.  */
> +      emit_move_insn (t1, m1h1);
> +      ix86_emit_move_max (t1, lb);
> +      emit_move_insn (m1h1, t1);
> +
> +      /* Compute UB.  UB is stored in 1's complement form.  Therefore
> +      we also use max here.  */
> +      emit_move_insn (t1, m1h2);
> +      ix86_emit_move_max (t1, ub);
> +      emit_move_insn (m1h2, t1);
> +
> +      op2 = gen_reg_rtx (BNDmode);
> +      emit_move_insn (op2, m1);
> +
> +      return chkp_join_splitted_slot (lb, op2);
> +      }
> +
> +    case IX86_BUILTIN_BNDINT:

The handling in this builtin looks strange.  I suggest to do it in a
serial way, like this:

  if (MEM_P (op0))
    m0 = op0;
  else
    {
      m0 = stack (SLOT_TEMP);
      move (op0, m0);
    }

  move parts of m0 to temporaries.

  if (MEM_P (op1))
    m1 = op1;
  else
    {
      m1 = stack (SLOT_TEMP);
      move (op1, m1);
    }

  move parts of m1 to other temporaries.

  Process temporaries.

  res = stack (SLOT_BND_STORED);
  move calculated stuff to res.

This will ensure that nobody will clobber your stack slot with the
result, assuming the consumer will use it soon.  SLOT_TEMP is a
short-lived slot and can be reused.

> +      {
> +     unsigned bndsize = GET_MODE_SIZE (BNDmode);
> +     unsigned psize = GET_MODE_SIZE (Pmode);
> +     rtx res, m1, m2, m1h1, m1h2, m2h1, m2h2, t1, t2, rh1, rh2;
> +
> +     arg0 = CALL_EXPR_ARG (exp, 0);
> +     arg1 = CALL_EXPR_ARG (exp, 1);
> +
> +     op0 = expand_normal (arg0);
> +     op1 = expand_normal (arg1);
> +
> +     /* We need to move bounds to memory before any computations.  */
> +     if (!MEM_P (op0))
> +       {
> +         m1 = assign_386_stack_local (BNDmode, SLOT_TEMP);
> +         emit_move_insn (m1, op0);
> +       }
> +     else
> +       m1 = op0;
> +
> +     if (!MEM_P (op1))
> +       {
> +         m2 = assign_386_stack_local (BNDmode,
> +                                      MEM_P (op0)
> +                                      ? SLOT_TEMP
> +                                      : SLOT_BND_STORED);
> +         emit_move_insn (m2, op1);
> +       }
> +     else
> +       m2 = op1;
> +
> +     if (!MEM_P (op0))
> +       res = m1;
> +     else if (!MEM_P (op1))
> +       res = m2;
> +     else
> +       res = assign_386_stack_local (BNDmode, SLOT_TEMP);
> +     /* Generate mem expression to be used for access to LB and UB.  */
> +     m1h1 = gen_rtx_MEM (Pmode, XEXP (m1, 0));
> +     m1h2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (m1, 0), psize));
> +     m2h1 = gen_rtx_MEM (Pmode, XEXP (m2, 0));
> +     m2h2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (m2, 0), psize));
> +     rh1 = gen_rtx_MEM (Pmode, XEXP (res, 0));
> +     rh2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (res, 0), psize));

Please use adjust_address here.

> +
> +     /* Allocate temporaries.  */
> +     t1 = gen_reg_rtx (Pmode);
> +     t2 = gen_reg_rtx (Pmode);
> +
> +     /* Compute LB.  */
> +     emit_move_insn (t1, m1h1);
> +     emit_move_insn (t2, m2h1);
> +     ix86_emit_move_max (t1, t2);
> +     emit_move_insn (rh1, t1);
> +
> +     /* Compute UB.  UB is stored in 1's complement form.  Therefore
> +        we also use max here.  */
> +     emit_move_insn (t1, m1h2);
> +     emit_move_insn (t2, m2h2);
> +     ix86_emit_move_max (t1, t2);
> +     emit_move_insn (rh2, t1);
> +
> +     return res;
> +      }
> +
> +    case IX86_BUILTIN_SIZEOF:
> +      {
> +     enum machine_mode mode = Pmode;

No need for the above temporary...
> +     tree name;
> +     rtx temp;
> +
> +     if (!target
> +         || GET_MODE (target) != Pmode
> +         || !register_operand (target, Pmode))
> +       target = gen_reg_rtx (Pmode);
> +
> +     arg0 = CALL_EXPR_ARG (exp, 0);
> +     gcc_assert (TREE_CODE (arg0) == VAR_DECL);
> +
> +     name = DECL_ASSEMBLER_NAME (arg0);
> +     temp = gen_rtx_SYMBOL_REF (Pmode, IDENTIFIER_POINTER (name));
> +     temp = gen_rtx_UNSPEC (mode, gen_rtvec (1, temp), UNSPEC_SIZEOF);

Call the expander directly, with target as operand 0.  You won't need
the move below.

> +     emit_move_insn (target, temp);
> +
> +     return target;
> +      }
> +
> +    case IX86_BUILTIN_BNDLOWER:
> +      {
> +     rtx mem, hmem;
> +
> +     if (!target
> +         || GET_MODE (target) != Pmode
> +         || !register_operand (target, Pmode))
> +       target = gen_reg_rtx (Pmode);
> +
> +     arg0 = CALL_EXPR_ARG (exp, 0);
> +     op0 = expand_normal (arg0);
> +
> +     /* We need to move bounds to memory first.  */
> +     if (!MEM_P (op0))
> +       {
> +         mem = assign_386_stack_local (BNDmode, SLOT_TEMP);
> +         emit_move_insn (mem, op0);
> +       }
> +     else
> +       mem = op0;

No negative conditions.

> +     /* Generate mem expression to access LB and load it.  */
> +     hmem = gen_rtx_MEM (Pmode, XEXP (mem, 0));

adjust_address again.

> +     emit_move_insn (target, hmem);
> +
> +     return target;
> +      }
> +
> +    case IX86_BUILTIN_BNDUPPER:
> +      {
> +     rtx mem, hmem;
> +
> +     if (!target
> +         || GET_MODE (target) != Pmode
> +         || !register_operand (target, Pmode))
> +       target = gen_reg_rtx (Pmode);
> +
> +     arg0 = CALL_EXPR_ARG (exp, 0);
> +     op0 = expand_normal (arg0);
> +
> +     /* We need to move bounds to memory first.  */
> +     if (!MEM_P (op0))
> +       {
> +         mem = assign_386_stack_local (BNDmode, SLOT_TEMP);
> +         emit_move_insn (mem, op0);
> +       }
> +     else
> +       mem = op0;

No negative conditions.

> +     /* Generate mem expression to access UB and load it.  */
> +     hmem = gen_rtx_MEM (Pmode,
> +                         gen_rtx_PLUS (Pmode, XEXP (mem, 0),
> +                                       GEN_INT (GET_MODE_SIZE (Pmode))));

adjust_address again.
> +     emit_move_insn (target, hmem);
> +
> +     /* We need to invert all bits of UB.  */
> +     emit_move_insn (target, gen_rtx_NOT (Pmode, target));

Use expand_simple_unop here.

> +
> +     return target;
> +      }
> +
>     case IX86_BUILTIN_MASKMOVQ:
>     case IX86_BUILTIN_MASKMOVDQU:
>       icode = (fcode == IX86_BUILTIN_MASKMOVQ
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index a38c5d1..ededa67 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2317,6 +2317,7 @@ enum ix86_stack_slot
>    SLOT_CW_FLOOR,
>    SLOT_CW_CEIL,
>    SLOT_CW_MASK_PM,
> +  SLOT_BND_STORED,
>    MAX_386_STACK_LOCALS
>  };
>

Uros.