On 8/15/19 2:54 PM, Richard Biener wrote: > On Thu, 15 Aug 2019, Bernd Edlinger wrote: > >>>>> >>>>> Hmm. So your patch overrides user-alignment here. Woudln't it >>>>> be better to do that more conciously by >>>>> >>>>> if (! DECL_USER_ALIGN (decl) >>>>> || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl)) >>>>> && targetm.slow_unaligned_access (DECL_MODE (decl), align))) >>>>> >> >> ? I don't know why that would be better? >> If the value is underaligned no matter why, pretend it was declared as >> naturally aligned if that causes wrong code otherwise. >> That was the idea here. > > It would be better because then we ignore it and use what we'd use > by default rather than inventing sth new. And your patch suggests > it might be needed to up align even w/o DECL_USER_ALIGN. >
Hmmm, you mean the constant 1.0i should not have DECL_USER_ALIGN set? But it inherits the alignment from the destination variable, apparently. did you mean if (! DECL_USER_ALIGN (decl) && align < GET_MODE_ALIGNMENT (DECL_MODE (decl)) && ... ? I can give it a try. >>>>> IMHO whatever code later fails to properly use unaligned loads >>>>> should be fixed instead rather than ignoring user requested alignment. >>>>> >>>>> Can you quote a short testcase that explains what exactly goes wrong? >>>>> The struct-layout ones are awkward to look at... >>>>> >>>> >>>> Sure, >>>> >>>> $ cat test.c >>>> _Complex float __attribute__((aligned(1))) cf; >>>> >>>> void foo (void) >>>> { >>>> cf = 1.0i; >>>> } >>>> >>>> $ arm-linux-gnueabihf-gcc -S test.c >>>> during RTL pass: expand >>>> test.c: In function 'foo': >>>> test.c:5:6: internal compiler error: in gen_movsf, at >>>> config/arm/arm.md:7003 >>>> 5 | cf = 1.0i; >>>> | ~~~^~~~~~ >>>> 0x7ba475 gen_movsf(rtx_def*, rtx_def*) >>>> ../../gcc-trunk/gcc/config/arm/arm.md:7003 >>>> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const >>>> ../../gcc-trunk/gcc/recog.h:318 >>>> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*) >>>> ../../gcc-trunk/gcc/expr.c:3695 >>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*) >>>> ../../gcc-trunk/gcc/expr.c:3791 >>>> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*) >>>> ../../gcc-trunk/gcc/expr.c:3490 >>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*) >>>> ../../gcc-trunk/gcc/expr.c:3791 >>>> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool) >>>> ../../gcc-trunk/gcc/expr.c:5855 >>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) >>>> ../../gcc-trunk/gcc/expr.c:5441 >>> >>> Huh, so why didn't it trigger >>> >>> /* Handle misaligned stores. */ >>> mode = TYPE_MODE (TREE_TYPE (to)); >>> if ((TREE_CODE (to) == MEM_REF >>> || TREE_CODE (to) == TARGET_MEM_REF) >>> && mode != BLKmode >>> && !mem_ref_refers_to_non_mem_p (to) >>> && ((align = get_object_alignment (to)) >>> < GET_MODE_ALIGNMENT (mode)) >>> && (((icode = optab_handler (movmisalign_optab, mode)) >>> != CODE_FOR_nothing) >>> || targetm.slow_unaligned_access (mode, align))) >>> { >>> >>> ? (_Complex float is 32bit aligned it seems, the DECL_RTL for the >>> var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl >>> 0x2aaaaaad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned. >>> >>> Ah, 'to' is a plain DECL here so the above handling is incomplete. >>> IIRC component refs like __real cf = 0.f should be handled fine >>> again(?). So, does adding || DECL_P (to) fix the case as well? >>> >> >> So I tried this instead of the varasm.c change: >> >> Index: expr.c >> =================================================================== >> --- expr.c (revision 274487) >> +++ expr.c (working copy) >> @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem >> /* Handle misaligned stores. */ >> mode = TYPE_MODE (TREE_TYPE (to)); >> if ((TREE_CODE (to) == MEM_REF >> - || TREE_CODE (to) == TARGET_MEM_REF) >> + || TREE_CODE (to) == TARGET_MEM_REF >> + || DECL_P (to)) >> && mode != BLKmode >> - && !mem_ref_refers_to_non_mem_p (to) >> + && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to)) >> && ((align = get_object_alignment (to)) >> < GET_MODE_ALIGNMENT (mode)) >> && (((icode = optab_handler (movmisalign_optab, mode)) >> >> Result, yes, it fixes this test case >> but then I run all struct-layout-1.exp there are sill cases. where we have >> problems: >> >> In file included from >> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:^M >> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h: >> In function 'test2112':^M >> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:23:10: >> internal compiler error: in gen_movdf, at config/arm/arm.md:7107^M >> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:62:3: >> note: in definition of macro 'TX'^M >> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:1: >> note: in expansion of macro 'TCI'^M >> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:294: >> note: in expansion of macro 'F'^M >> 0x7ba377 gen_movdf(rtx_def*, rtx_def*)^M >> ../../gcc-trunk/gcc/config/arm/arm.md:7107^M >> 0xa494c7 insn_gen_fn::operator()(rtx_def*, rtx_def*) const^M >> ../../gcc-trunk/gcc/recog.h:318^M >> 0xa494c7 emit_move_insn_1(rtx_def*, rtx_def*)^M >> ../../gcc-trunk/gcc/expr.c:3695^M >> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M >> ../../gcc-trunk/gcc/expr.c:3791^M >> 0xa49437 emit_move_complex_parts(rtx_def*, rtx_def*)^M >> ../../gcc-trunk/gcc/expr.c:3490^M >> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M >> ../../gcc-trunk/gcc/expr.c:3791^M >> 0xa50faf store_expr(tree_node*, rtx_def*, int, bool, bool)^M >> ../../gcc-trunk/gcc/expr.c:5856^M >> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M >> ../../gcc-trunk/gcc/expr.c:5302^M >> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M >> ../../gcc-trunk/gcc/expr.c:4983^M >> 0x9338af expand_gimple_stmt_1^M >> ../../gcc-trunk/gcc/cfgexpand.c:3777^M >> 0x9338af expand_gimple_stmt^M >> ../../gcc-trunk/gcc/cfgexpand.c:3875^M >> 0x939221 expand_gimple_basic_block^M >> ../../gcc-trunk/gcc/cfgexpand.c:5915^M >> 0x93af86 execute^M >> ../../gcc-trunk/gcc/cfgexpand.c:6538^M >> Please submit a full bug report,^M >> >> My personal gut feeling this will be more fragile than over-aligning the >> constants. > > As said the constant shouldn't end up under-aligned, the user cannot > specify alignment of literal constants. Not sure what you mean > with "over"-aligning. > Hmm wait a moment, I actually wanted _only_ to change the DECL_ARTIFICIAL that is built by build_constant_desc. It uses align_variable of course, but I totally missed that this also controls the alignment of normal variables, sorry about the confusion here. I mean we should align the constant for the unaligned complex with the natural alignment of the type-mode. That wrong fix made the variables ignore the alignment, which was of course not intended, and instead I would need: Index: expr.c =================================================================== --- expr.c (revision 274531) +++ expr.c (working copy) @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem /* Handle misaligned stores. */ mode = TYPE_MODE (TREE_TYPE (to)); if ((TREE_CODE (to) == MEM_REF - || TREE_CODE (to) == TARGET_MEM_REF) + || TREE_CODE (to) == TARGET_MEM_REF + || DECL_P (to)) && mode != BLKmode - && !mem_ref_refers_to_non_mem_p (to) + && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to)) && ((align = get_object_alignment (to)) < GET_MODE_ALIGNMENT (mode)) && (((icode = optab_handler (movmisalign_optab, mode)) Index: varasm.c =================================================================== --- varasm.c (revision 274531) +++ varasm.c (working copy) @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3. If not see #include "stmt.h" #include "expr.h" #include "expmed.h" +#include "optabs.h" #include "output.h" #include "langhooks.h" #include "debug.h" @@ -3386,7 +3387,15 @@ build_constant_desc (tree exp) if (TREE_CODE (exp) == STRING_CST) SET_DECL_ALIGN (decl, targetm.constant_alignment (exp, DECL_ALIGN (decl))); else - align_variable (decl, 0); + { + align_variable (decl, 0); + if (DECL_ALIGN (decl) < GET_MODE_ALIGNMENT (DECL_MODE (decl)) + && ((optab_handler (movmisalign_optab, DECL_MODE (decl)) + != CODE_FOR_nothing) + || targetm.slow_unaligned_access (DECL_MODE (decl), + DECL_ALIGN (decl)))) + SET_DECL_ALIGN (decl, GET_MODE_ALIGNMENT (DECL_MODE (decl))); + } /* Now construct the SYMBOL_REF and the MEM. */ if (use_object_blocks_p ()) >> >> >>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool) >>>> ../../gcc-trunk/gcc/expr.c:4983 >>>> 0x93396f expand_gimple_stmt_1 >>>> ../../gcc-trunk/gcc/cfgexpand.c:3777 >>>> 0x93396f expand_gimple_stmt >>>> ../../gcc-trunk/gcc/cfgexpand.c:3875 >>>> 0x9392e1 expand_gimple_basic_block >>>> ../../gcc-trunk/gcc/cfgexpand.c:5915 >>>> 0x93b046 execute >>>> ../../gcc-trunk/gcc/cfgexpand.c:6538 >>>> Please submit a full bug report, >>>> with preprocessed source if appropriate. >>>> Please include the complete backtrace with any bug report. >>>> See <https://gcc.gnu.org/bugs/> for instructions. >>>> >>>> Without the hunk in varasm.c of course. >>>> >>>> What happens is that expand_expr_real_2 returns a unaligned mem_ref here: >>>> >>>> case COMPLEX_CST: >>>> /* Handle evaluating a complex constant in a CONCAT target. */ >>>> if (original_target && GET_CODE (original_target) == CONCAT) >>>> { >>>> [... this path not taken ...] >> >> BTW: this code block executes when the other ICE happens. >> >>>> } >>>> >>>> /* fall through */ >>>> >>>> case STRING_CST: >>>> temp = expand_expr_constant (exp, 1, modifier); >>>> >>>> /* temp contains a constant address. >>>> On RISC machines where a constant address isn't valid, >>>> make some insns to get that address into a register. */ >>>> if (modifier != EXPAND_CONST_ADDRESS >>>> && modifier != EXPAND_INITIALIZER >>>> && modifier != EXPAND_SUM >>>> && ! memory_address_addr_space_p (mode, XEXP (temp, 0), >>>> MEM_ADDR_SPACE (temp))) >>>> return replace_equiv_address (temp, >>>> copy_rtx (XEXP (temp, 0))); >>>> return temp; >>>> >>>> The result of expand_expr_real(..., EXPAND_NORMAL) ought to be usable >>>> by emit_move_insn, that is expected just *everywhere* and can't be changed. >>>> >>>> This could probably be fixed in an ugly way in the COMPLEX_CST, handler >>>> but OTOH, I don't see any reason why this constant has to be misaligned >>>> when it can be easily aligned, which avoids the need for a misaligned >>>> access. >>> >>> If the COMPLEX_CST happends to end up in unaligned memory then that's >>> of course a bug (unless the target requests that for all COMPLEX_CSTs). >>> That is, if the unalignment is triggered because the store is to an >>> unaligned decl. >>> >>> But I think the issue is the above one? >>> >> >> yes initially the constant seems to be unaligned. then it is expanded, >> and there is no special handling for unaligned constants in expand_expr_real, >> and then probably expand_assignment or store_expr seem not fully prepared for >> this either. > > With a cross I see the constant has regular aligned _Complex type > so not sure how it can end up unaligned. > Maybe a target configuration issue. Not sure, I have configured mine this way: ../gcc-trunk/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64 --target=arm-linux-gnueabihf --enable-languages=all --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard However it appears now there are two different errors, one is in expand_assignment which you found (I start to wonder if I should add you to the authors section of this patch), and a different one, which I have not yet simplified, but you can easily try that for yourself: make check-gcc-c RUNTESTFLAGS="struct-layout-1.exp=*" it is okay when the test fails to execute but there should no internal compiler errors. >>>> >>>> The problem is that the code that handles this misaligned access >>>> is skipped because the mem_rtx has initially no MEM_ATTRS and therefore >>>> MEM_ALIGN == 32, and therefore the code that handles the unaligned >>>> access is not taken. BUT before the mem_rtx is returned it is >>>> set to MEM_ALIGN = 8 by set_mem_attributes, and we have an assertion, >>>> because the result from expand_expr_real(..., EXPAND_NORMAL) ought to be >>>> usable with emit_move_insn. >>> >>> yes, as said the _access_ determines the address should be aligned >>> so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according >>> to the access type/mode. But we can't trust DECL_ALIGN of >>> FUNCTION_DECLs but we _can_ trust users writing *(int *)fn >>> (maybe for actual accesses we _can_ trust DECL_ALIGN, it's just >>> we may not compute nonzero bits for the actual address because >>> of function pointer mangling) >>> (for accessing function code I'd say this would be premature >>> optimization, but ...) >>> >> >> Not a very nice solution, but it is not worth to spend much effort >> in optimizing undefined behavior, I just want to avoid the ICE >> at this time and would not trust the DECL_ALIGN either. > > So I meant > > Index: gcc/builtins.c > =================================================================== > --- gcc/builtins.c (revision 274534) > +++ gcc/builtins.c (working copy) > @@ -255,7 +255,8 @@ get_object_alignment_2 (tree exp, unsign > > /* Extract alignment information from the innermost object and > possibly adjust bitpos and offset. */ > - if (TREE_CODE (exp) == FUNCTION_DECL) > + if (TREE_CODE (exp) == FUNCTION_DECL > + && addr_p) > { > /* Function addresses can encode extra information besides their > alignment. However, if TARGET_PTRMEMFUNC_VBIT_LOCATION > > so we get at DECL_ALIGN of the FUNCTION_DECL (not sure if we > can trust it). > >>> >>> Still I think you can't simply override STACK_SLOT_ALIGNMENT just because >>> of the mode of an entry param, can you? If you can assume a bigger >>> alignment then STACK_SLOT_ALIGNMENT should return it. >>> >> >> I don't see a real problem here. All target except i386 and gcn (whatever >> that is) >> use the default for STACK_SLOT_ALIGNMENT which simply allows any (large) >> align value >> to rule the effective STACK_SLOT_ALIGNMENT. The user could have simply >> declared >> the local variable with the alignment that results in better code FWIW. >> >> If the stack alignment is too high that is capped in assign_stack_local: >> >> /* Ignore alignment if it exceeds MAX_SUPPORTED_STACK_ALIGNMENT. */ >> if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT) >> { >> alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT; >> alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT; >> } >> >> I for one, would just assume that MAX_SUPPORTED_STACK_ALIGNMENT should >> be sufficient for all modes that need movmisalign_optab and friends. >> If it is not, an ICE would be just fine. > > Hmm. In some way we could better communicate with the user then > and do not allow under-aligning automatic vars? But the you > still have packed structs with BLKmode where the actual field > accesses will carry SImode even when not aligned(?) > Yes, that works also when unaligned. > > Please split it into the parts for the PR and parts making the > asserts not trigger. > Yes, will do. > The PR is already fixed, right? The assign_parm_find_stack_rtl hunk > is merely an optimization? > Hmmmm... You are right, I should have added that to the commit message... Of course the test cases try to verify the optimization. Thanks Bernd.