Hi Richard,

it has been a while, but I had not found time to continue with this patch until now.
I think I now have a better solution which properly addresses your comments below.

On 3/25/19 9:41 AM, Richard Biener wrote:
> On Fri, 22 Mar 2019, Bernd Edlinger wrote:
>
>> On 3/21/19 12:15 PM, Richard Biener wrote:
>>> On Sun, 10 Mar 2019, Bernd Edlinger wrote:
>>> Finally...
>>>
>>> Index: gcc/function.c
>>> ===================================================================
>>> --- gcc/function.c	(revision 269264)
>>> +++ gcc/function.c	(working copy)
>>> @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
>>>    if (DECL_MODE (decl) == BLKmode)
>>>      return false;
>>>
>>> +  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
>>> +      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
>>> +      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
>>> +	 > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
>>> +    return false;
>>> +
>>>    /* If -ffloat-store specified, don't put explicit float variables
>>>       into registers.  */
>>>    /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
>>>
>>> I wonder if it is necessary to look at DECL_INCOMING_RTL here
>>> and why such RTL may not exist?  That is, iff DECL_INCOMING_RTL
>>> doesn't exist then shouldn't we return false for safety reasons?
>>>

You are right, it is not possible to return different results from
use_register_for_decl before vs. after the incoming RTL is assigned:
that hits an assertion in set_rtl.

This hunk is gone now.  Instead, I changed assign_parm_setup_reg to use
the movmisalign optab and/or extract_bit_field when a misaligned
entry_parm is to be assigned to a register.

I have no test coverage for the movmisalign optab though, so I rely on
your code review for that part.

>>> Similarly the very same issue should exist on x86_64 which is
>>> !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
>>> alignment on the caller side.  So the STRICT_ALIGNMENT check is
>>> a wrong one.
>>>
>>
>> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
>> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN is
>> always 32-bit align on the DImode memory.  The x86_64 vector
>> instructions would look at MEM_ALIGN and do the right thing, yes?
>
> No, they need to use the movmisalign optab and end up with UNSPECs
> for example.

Ah, thanks, now I see.

>> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
>> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the
>> target does not even have to look at MEM_ALIGN except in the
>> movmisalign_optab, right?
>
> Yes, I think we never loosened that.  Note that RTL expansion has to
> fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
> specifies that x86 is strict-align wrt vector modes.
>

Yes, I agree: the code would be incorrect for x86 as well when the
movmisalign_optab is not used.  So I invoke the movmisalign optab if it
is available, and fall back to extract_bit_field if it is not.

As in assign_parm_setup_stack, assign_parm_setup_reg assumes that
data->promoted_mode != data->nominal_mode does not happen with
misaligned stack slots.

Attached is the v3 of my patch.

Boot-strapped and reg-tested on x86_64-pc-linux-gnu and
arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.
2019-07-30  Bernd Edlinger  <bernd.edlin...@hotmail.de>

	PR middle-end/89544
	* function.c (assign_parm_data_one): Remove unused data members.
	(assign_parm_find_stack_rtl): Use larger alignment when possible.
	(assign_parm_adjust_stack_rtl): Revise STRICT_ALIGNMENT check.
	(assign_parm_setup_reg): Handle misaligned stack arguments.

testsuite:
2019-07-30  Bernd Edlinger  <bernd.edlin...@hotmail.de>

	PR middle-end/89544
	* gcc.target/arm/unaligned-argument-1.c: New test.
	* gcc.target/arm/unaligned-argument-2.c: New test.

Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 273767)
+++ gcc/function.c	(working copy)
@@ -2274,8 +2274,6 @@ struct assign_parm_data_one
   int partial;
   BOOL_BITFIELD named_arg : 1;
   BOOL_BITFIELD passed_pointer : 1;
-  BOOL_BITFIELD on_stack : 1;
-  BOOL_BITFIELD loaded_in_reg : 1;
 };
 
 /* A subroutine of assign_parms.  Initialize ALL.  */
@@ -2699,8 +2697,23 @@ assign_parm_find_stack_rtl (tree parm, struct assi
      intentionally forcing upward padding.  Otherwise we have to come
      up with a guess at the alignment based on OFFSET_RTX.  */
   poly_int64 offset;
-  if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
+  if (data->locate.where_pad == PAD_NONE || data->entry_parm)
     align = boundary;
+  else if (data->locate.where_pad == PAD_UPWARD)
+    {
+      align = boundary;
+      /* If the argument offset is actually more aligned than the nominal
+	 stack slot boundary, take advantage of that excess alignment.
+	 Don't make any assumptions if STACK_POINTER_OFFSET is in use.  */
+      if (poly_int_rtx_p (offset_rtx, &offset)
+	  && STACK_POINTER_OFFSET == 0)
+	{
+	  unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
+	  if (offset_align == 0 || offset_align > STACK_BOUNDARY)
+	    offset_align = STACK_BOUNDARY;
+	  align = MAX (align, offset_align);
+	}
+    }
   else if (poly_int_rtx_p (offset_rtx, &offset))
     {
       align = least_bit_hwi (boundary);
@@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
      ultimate type, don't use that slot after entry.  We'll make another
      stack slot, if we need one.  */
   if (stack_parm
-      && ((STRICT_ALIGNMENT
-	   && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
+      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
+	   && targetm.slow_unaligned_access (data->nominal_mode,
+					     MEM_ALIGN (stack_parm)))
	  || (data->nominal_type
	      && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
	      && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
@@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
 
       did_conversion = true;
     }
+  else if (MEM_P (data->entry_parm)
+	   && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+	      > MEM_ALIGN (data->entry_parm)
+	   && targetm.slow_unaligned_access (promoted_nominal_mode,
+					     MEM_ALIGN (data->entry_parm)))
+    {
+      enum insn_code icode = optab_handler (movmisalign_optab,
+					    promoted_nominal_mode);
+
+      if (icode != CODE_FOR_nothing)
+	emit_insn (GEN_FCN (icode) (parmreg, validated_mem));
+      else
+	rtl = parmreg = extract_bit_field (validated_mem,
+			GET_MODE_BITSIZE (promoted_nominal_mode), 0,
+			unsignedp, parmreg,
+			promoted_nominal_mode, VOIDmode, false, NULL);
+    }
   else
     emit_move_insn (parmreg, validated_mem);
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 1 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */
+/* { dg-final { scan-assembler-times "stm" 0 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 0 } } */
+/* { dg-final { scan-assembler-times "strd" 0 } } */
+/* { dg-final { scan-assembler-times "stm" 1 } } */