On Fri, 6 Jan 2012, Martin Jambor wrote:
> Hi,
>
> I'm trying to teach our expander how to deal with misaligned MEM_REFs
> on strict-alignment targets. We currently generate code which leads
> to bus error signals due to misaligned accesses.
>
> I admit my motivation is not any target in particular but simply being
> able to produce misaligned MEM_REFs in SRA; currently we work around
> that by producing COMPONENT_REFs, which causes quite a few headaches.
> Nevertheless, I started by following Richi's advice and set out to fix
> the following two simple testcases on a strict-alignment platform, a
> sparc64 in the compile farm. If I understood him correctly, Richi
> claimed they have been failing "since forever":
>
> ----- test case 1: -----
>
> extern void abort ();
>
> typedef unsigned int myint __attribute__((aligned(1)));
>
> /* even without the attributes we get bus error */
> unsigned int __attribute__((noinline, noclone))
> foo (myint *p)
> {
>   return *p;
> }
>
> struct blah
> {
>   char c;
>   myint i;
> };
>
> struct blah g;
>
> #define cst 0xdeadbeef
>
> int
> main (int argc, char **argv)
> {
>   int i;
>   g.i = cst;
>   i = foo (&g.i);
>   if (i != cst)
>     abort ();
>   return 0;
> }
>
> ----- test case 2: -----
>
> extern void abort ();
>
> typedef unsigned int myint __attribute__((aligned(1)));
>
> void __attribute__((noinline, noclone))
> foo (myint *p, unsigned int i)
> {
>   *p = i;
> }
>
> struct blah
> {
>   char c;
>   myint i;
> };
>
> struct blah g;
>
> #define cst 0xdeadbeef
>
> int
> main (int argc, char **argv)
> {
>   foo (&g.i, cst);
>   if (g.i != cst)
>     abort ();
>   return 0;
> }
>
> ------------------------
>
> I dug in expr.c and found two places which handle misaligned MEM_REF
> loads and stores respectively, but only if there is a special
> movmisalign_optab operation available for the given mode. My approach
> therefore was to append calls to extract_bit_field and store_bit_field,
> which seem to be the parts of the expander capable of dealing with
> misaligned memory accesses. The patch is below; it fixes both
> testcases on sparc64--linux-gnu.
>
> Is this approach generally the right thing to do? And of course,
> since my knowledge of RTL and the expander is very limited, I expect
> there will be quite a few suggestions about its various particular
> aspects. I have run the C and C++ testsuites with the second hunk in
> place without any problems; the same test of the whole patch is under
> way right now, but it takes quite a lot of time, so most probably I
> won't have the results today. Of course I plan to do a bootstrap and
> at least Fortran checking on this platform too, but that is really
> going to take some time and I'd like to hear any comments before
> that.
The idea is good (well, I suppose I suggested it ... ;))
> One more question: I'd like to be able to handle misaligned loads and
> stores of SSE vectors this way too, but then of course I cannot use
> STRICT_ALIGNMENT as the guard but need a more elaborate predicate. I
> assume it must already exist; which one is it?
There is none :/  STRICT_ALIGNMENT would need to get a mode argument,
but the reality is that non-STRICT_ALIGNMENT targets at the moment
need to have their movmisalign optab trigger for all cases that
will not work when misaligned.
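If we were to add one, I'd imagine something along the lines of the
following (the name is completely made up, this is only meant to
illustrate what a per-mode predicate could look like, nothing like it
exists in the tree today):

  /* Hypothetical: return true if a misaligned access of MODE with
     alignment ALIGN cannot be done with a plain move on this target.  */
  static bool
  misaligned_access_needs_expansion_p (enum machine_mode mode,
                                       unsigned int align)
  {
    if (align >= GET_MODE_ALIGNMENT (mode))
      return false;
    return STRICT_ALIGNMENT || SLOW_UNALIGNED_ACCESS (mode, align);
  }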
> --- 4572,4653 ----
> || TREE_CODE (to) == TARGET_MEM_REF)
> && mode != BLKmode
> && ((align = get_object_or_type_alignment (to))
> ! < GET_MODE_ALIGNMENT (mode)))
> {
> enum machine_mode address_mode;
> ! rtx reg;
>
> reg = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> reg = force_not_mem (reg);
>
> ! if ((icode = optab_handler (movmisalign_optab, mode))
> ! != CODE_FOR_nothing)
> {
> + struct expand_operand ops[2];
> + rtx op0, mem;
> +
> + if (TREE_CODE (to) == MEM_REF)
> + {
> + addr_space_t as
> + = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (to, 1))));
> + tree base = TREE_OPERAND (to, 0);
> + address_mode = targetm.addr_space.address_mode (as);
> + op0 = expand_expr (base, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> + op0 = convert_memory_address_addr_space (address_mode, op0, as);
> + if (!integer_zerop (TREE_OPERAND (to, 1)))
> + {
> + rtx off
> + = immed_double_int_const (mem_ref_offset (to), address_mode);
> + op0 = simplify_gen_binary (PLUS, address_mode, op0, off);
> + }
> + op0 = memory_address_addr_space (mode, op0, as);
> + mem = gen_rtx_MEM (mode, op0);
> + set_mem_attributes (mem, to, 0);
> + set_mem_addr_space (mem, as);
> + }
> + else if (TREE_CODE (to) == TARGET_MEM_REF)
> + {
> + addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (to));
> + struct mem_address addr;
> +
> + get_address_description (to, &addr);
> + op0 = addr_for_mem_ref (&addr, as, true);
> + op0 = memory_address_addr_space (mode, op0, as);
> + mem = gen_rtx_MEM (mode, op0);
> + set_mem_attributes (mem, to, 0);
> + set_mem_addr_space (mem, as);
> + }
> + else
> + gcc_unreachable ();
> + if (TREE_THIS_VOLATILE (to))
> + MEM_VOLATILE_P (mem) = 1;
> +
> + create_fixed_operand (&ops[0], mem);
> + create_input_operand (&ops[1], reg, mode);
> + /* The movmisalign<mode> pattern cannot fail, else the assignment would
> + silently be omitted. */
> + expand_insn (icode, 2, ops);
> + return;
> + }
> + else if (STRICT_ALIGNMENT || SLOW_UNALIGNED_ACCESS (mode, align))
So this can be STRICT_ALIGNMENT only.  I guess then this does not
fix the SSE testcase you had, right?  (Alternatively, just using
SLOW_UNALIGNED_ACCESS is OK as well; we'd just change behavior here
compared to what we have done in the past.)
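That is, either

  else if (STRICT_ALIGNMENT)

or, accepting the behavior change,

  else if (SLOW_UNALIGNED_ACCESS (mode, align))

would do here.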
> + {
> + rtx op0, mem;
> addr_space_t as
> ! = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (to, 1))));
> tree base = TREE_OPERAND (to, 0);
> address_mode = targetm.addr_space.address_mode (as);
> op0 = expand_expr (base, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> op0 = convert_memory_address_addr_space (address_mode, op0, as);
> op0 = memory_address_addr_space (mode, op0, as);
> mem = gen_rtx_MEM (mode, op0);
> set_mem_attributes (mem, to, 0);
> set_mem_addr_space (mem, as);
>
> ! store_bit_field (mem, tree_low_cst (TYPE_SIZE (TREE_TYPE (to)), 1),
> ! tree_low_cst (TREE_OPERAND (to, 1), 0),
> ! 0, 0, mode, reg);
I think you want GET_MODE_BITSIZE (mode) for the bitsize argument,
zero for the bitnum argument, and to handle the offset of the MEM_REF
(you don't handle TARGET_MEM_REFs here, so you'd better assert it is
a MEM_REF only) similar to how it is handled in the movmisalign case.
In fact the MEM_REF/TARGET_MEM_REF conversion to the mem can be
shared between the movmisalign and store_bit_field cases, I think.
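To spell it out (completely untested, and assuming the mem is built
exactly like in the movmisalign branch above, so that the MEM_REF
offset is already folded into the address), I'd expect something like

  gcc_assert (TREE_CODE (to) == MEM_REF);
  store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg);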
> ! return;
> }
> }
>
> /* Assignment of a structure component needs special treatment
> *************** expand_expr_real_1 (tree exp, rtx target
> *** 9356,9376 ****
> if (TREE_THIS_VOLATILE (exp))
> MEM_VOLATILE_P (temp) = 1;
> if (mode != BLKmode
> ! && align < GET_MODE_ALIGNMENT (mode)
> /* If the target does not have special handling for unaligned
> loads of mode then it can use regular moves for them. */
> ! && ((icode = optab_handler (movmisalign_optab, mode))
> ! != CODE_FOR_nothing))
> ! {
> ! struct expand_operand ops[2];
>
> ! /* We've already validated the memory, and we're creating a
> ! new pseudo destination. The predicates really can't fail,
> ! nor can the generator. */
> ! create_output_operand (&ops[0], NULL_RTX, mode);
> ! create_fixed_operand (&ops[1], temp);
> ! expand_insn (icode, 2, ops);
> ! return ops[0].value;
> }
> return temp;
> }
> --- 9379,9409 ----
> if (TREE_THIS_VOLATILE (exp))
> MEM_VOLATILE_P (temp) = 1;
> if (mode != BLKmode
> ! && align < GET_MODE_ALIGNMENT (mode))
> ! {
> /* If the target does not have special handling for unaligned
> loads of mode then it can use regular moves for them. */
> ! if ((icode = optab_handler (movmisalign_optab, mode))
> ! != CODE_FOR_nothing)
> ! {
> ! struct expand_operand ops[2];
>
> ! /* We've already validated the memory, and we're creating a
> ! new pseudo destination. The predicates really can't fail,
> ! nor can the generator. */
> ! create_output_operand (&ops[0], NULL_RTX, mode);
> ! create_fixed_operand (&ops[1], temp);
> ! expand_insn (icode, 2, ops);
> ! return ops[0].value;
> ! }
> ! else if (STRICT_ALIGNMENT || SLOW_UNALIGNED_ACCESS (mode, align))
> ! temp = extract_bit_field (temp,
> ! tree_low_cst (TYPE_SIZE (TREE_TYPE (exp)), 1),
> ! tree_low_cst (TREE_OPERAND (exp, 1), 0),
> ! TYPE_UNSIGNED (TREE_TYPE (exp)), true /* ??? */,
> ! (modifier == EXPAND_STACK_PARM
> ! ? NULL_RTX : target),
> ! mode, mode);
Similarly for the TREE_OPERAND (exp, 1) handling.
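I.e., again untested and with the offset applied to the mem as in the
store path, roughly

  temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode), 0,
                            TYPE_UNSIGNED (TREE_TYPE (exp)), true,
                            (modifier == EXPAND_STACK_PARM
                             ? NULL_RTX : target),
                            mode, mode);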
Eric, do you have any objections in principle to handling
get_object_or_type_alignment () < GET_MODE_ALIGNMENT (mode)
stores/loads this way?
Thanks,
Richard.