On Tue, Jan 19, 2016 at 4:13 PM, Christian Bruel <christian.br...@st.com> wrote: > > > On 01/19/2016 04:01 PM, Christian Bruel wrote: >> >> Hi Richard, >> >> thanks for your input, >> >> On 01/18/2016 12:36 PM, Richard Biener wrote: >>> >>> On Fri, Jan 8, 2016 at 2:29 PM, Christian Bruel <christian.br...@st.com> >>> wrote: >>>> >>>> When compiling code with attribute targets on arm or aarch64, >>>> vector_type_mode returns different results (eg Vmode or BLKmode) >>>> depending >>>> on the current simd flags that are not set between functions. >>>> >>>> for example the following code: >>>> >>>> #include <arm_neon.h> >>>> >>>> extern int8x8_t a; >>>> extern int8x8_t b; >>>> >>>> int16x8_t >>>> __attribute__ ((target("fpu=neon"))) >>>> foo(void) >>>> { >>>> return vaddl_s8 (a, b); >>>> } >>>> >>>> Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins >>>> , >>>> because the mismatch and DECL_MODE current's TYPE_MODE used in >>>> expand_builtin for global variables. >>>> >>>> but the best explanation is in the vector_type_mode: >>>> /* Vector types need to re-check the target flags each time we report >>>> the machine mode. We need to do this because attribute target can >>>> change the result of vector_mode_supported_p and have_regs_of_mode >>>> on a per-function basis. Thus the TYPE_MODE of a VECTOR_TYPE can >>>> change on a per-function basis. */ >>>> >>>> I first tried to hack the 2 machine descriptions to insert >>>> convert_to_mode >>>> or relayout_decls here and there, but I found this very fragile. Instead >>>> a >>>> more central relayout the of type while expanding gave good results, as >>>> proposed here. >>>> >>>> bootstraped and tested with no regression for arm, aarch64 and i586. >>>> >>>> Does this look to be the right approach ? >>>> >>>> nb: for testing this patch is complementary with >>>> >>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html >>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html >>>> >>>> thanks for your comments. >>> >>> A x86 specific testcase that ICEs as well: >>> >>> typedef int v8si __attribute__((vector_size(32))); >>> v8si a; >>> v8si __attribute__((target("avx"))) foo() >>> { >>> return a; >>> } >>> >>> in your patch not using the shared DECL_RTL of the global var >>> "fixes" this so I think a conceptually better fix would be to >>> "adjust" DECL_RTL from globals via a adjust_address (or so). >>> >>> Also given that we do >>> >>> /* ... fall through ... */ >>> >>> case FUNCTION_DECL: >>> case RESULT_DECL: >>> decl_rtl = DECL_RTL (exp); >>> expand_decl_rtl: >>> gcc_assert (decl_rtl); >>> decl_rtl = copy_rtx (decl_rtl); >>> >>> thus always "unshare" DECL_RTL anyway it might be not so >>> bad to simply do >>> >>> decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0); >>> >>> instead of that to avoid one copy. >>> >>> Index: expr.c >>> =================================================================== >>> --- expr.c (revision 232496) >>> +++ expr.c (working copy) >>> @@ -9597,7 +9597,10 @@ expand_expr_real_1 (tree exp, rtx target >>> decl_rtl = DECL_RTL (exp); >>> expand_decl_rtl: >>> gcc_assert (decl_rtl); >>> - decl_rtl = copy_rtx (decl_rtl); >>> + if (MEM_P (decl_rtl)) >>> + decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0); >>> + else >>> + decl_rtl = copy_rtx (decl_rtl); >>> /* Record writes to register variables. */ >>> if (modifier == EXPAND_WRITE >>> && REG_P (decl_rtl) >>> >>> untested apart from on the x86_64 testcase (which it fixes). One could >>> guard >>> this further to only apply on vector typed decls with mismatched mode of >>> course. >>> >>> I think that re-layouting globals is not very good design. >>> >>> Richard. >> >> A few other ICEs with this implementation, for instance if the context >> is not in a function, such as >> >> typedef __simd64_int8_t int8x8_t; >> >> extern int8x8_t b; >> int8x8_t *a = &b; >> >> So, to avoid a var re-layout and a copy_rtx (implied by adjust_address >> btw). What about just calling 'change_address' ? like: (very lightly >> tested) >> >> Index: expr.c >> =================================================================== >> --- expr.c (revision 232564) >> +++ expr.c (working copy) >> @@ -9392,7 +9392,8 @@ >> enum expand_modifier modifier, rtx *alt_rtl, >> bool inner_reference_p) >> { >> - rtx op0, op1, temp, decl_rtl; >> + rtx op0, op1, temp; >> + rtx decl_rtl = NULL_RTX; >> tree type; >> int unsignedp; >> machine_mode mode, dmode; >> @@ -9590,11 +9591,22 @@ >> && (TREE_STATIC (exp) || DECL_EXTERNAL (exp))) >> layout_decl (exp, 0); >> >> + decl_rtl = DECL_RTL (exp); >> + >> + if (MEM_P (decl_rtl) >> + && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode)) >> + { >> + if (current_function_decl >> + && (! reload_completed && !reload_in_progress)) >> + decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0); >> + } >> + >> /* ... fall through ... */ >> >> case FUNCTION_DECL: >> case RESULT_DECL: >> - decl_rtl = DECL_RTL (exp); >> + if (! decl_rtl) >> + decl_rtl = DECL_RTL (exp); >> expand_decl_rtl: >> gcc_assert (decl_rtl); >> decl_rtl = copy_rtx (decl_rtl); >> >> I'm not sure that moving the code in the 'expand_decl_rtl' label is >> best, as we'd need to test for exp and the case should only happen for >> global vars (not functions or results) > > > Here is the alternative implementation, shorter after all. testing in > progress. > > Index: expr.c > =================================================================== > --- expr.c (revision 232570) > +++ expr.c (working copy) > @@ -9597,6 +9597,15 @@ > decl_rtl = DECL_RTL (exp); > expand_decl_rtl: > gcc_assert (decl_rtl); > + > + if (exp && code == VAR_DECL && MEM_P (decl_rtl) > + && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode)) > + { > + if (current_function_decl > + && (! reload_completed && !reload_in_progress))
maybe just if (currently_expanding_to_rtl)? But yes, this looks like a safe variant of the fix. Richard. > + decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0); > + } > + > decl_rtl = copy_rtx (decl_rtl); > /* Record writes to register variables. */ > if (modifier == EXPAND_WRITE > >> >> thanks, >> >