On Fri, Jan 8, 2016 at 2:29 PM, Christian Bruel <christian.br...@st.com> wrote: > When compiling code with attribute targets on arm or aarch64, > vector_type_mode returns different results (eg Vmode or BLKmode) depending > on the current simd flags that are not set between functions. > > for example the following code: > > #include <arm_neon.h> > > extern int8x8_t a; > extern int8x8_t b; > > int16x8_t > __attribute__ ((target("fpu=neon"))) > foo(void) > { > return vaddl_s8 (a, b); > } > > Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins , > because the mismatch and DECL_MODE current's TYPE_MODE used in > expand_builtin for global variables. > > but the best explanation is in the vector_type_mode: > /* Vector types need to re-check the target flags each time we report > the machine mode. We need to do this because attribute target can > change the result of vector_mode_supported_p and have_regs_of_mode > on a per-function basis. Thus the TYPE_MODE of a VECTOR_TYPE can > change on a per-function basis. */ > > I first tried to hack the 2 machine descriptions to insert convert_to_mode > or relayout_decls here and there, but I found this very fragile. Instead a > more central relayout the of type while expanding gave good results, as > proposed here. > > bootstraped and tested with no regression for arm, aarch64 and i586. > > Does this look to be the right approach ? > > nb: for testing this patch is complementary with > > https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html > https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html > > thanks for your comments.
A x86 specific testcase that ICEs as well: typedef int v8si __attribute__((vector_size(32))); v8si a; v8si __attribute__((target("avx"))) foo() { return a; } in your patch not using the shared DECL_RTL of the global var "fixes" this so I think a conceptually better fix would be to "adjust" DECL_RTL from globals via a adjust_address (or so). Also given that we do /* ... fall through ... */ case FUNCTION_DECL: case RESULT_DECL: decl_rtl = DECL_RTL (exp); expand_decl_rtl: gcc_assert (decl_rtl); decl_rtl = copy_rtx (decl_rtl); thus always "unshare" DECL_RTL anyway it might be not so bad to simply do decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0); instead of that to avoid one copy. Index: expr.c =================================================================== --- expr.c (revision 232496) +++ expr.c (working copy) @@ -9597,7 +9597,10 @@ expand_expr_real_1 (tree exp, rtx target decl_rtl = DECL_RTL (exp); expand_decl_rtl: gcc_assert (decl_rtl); - decl_rtl = copy_rtx (decl_rtl); + if (MEM_P (decl_rtl)) + decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0); + else + decl_rtl = copy_rtx (decl_rtl); /* Record writes to register variables. */ if (modifier == EXPAND_WRITE && REG_P (decl_rtl) untested apart from on the x86_64 testcase (which it fixes). One could guard this further to only apply on vector typed decls with mismatched mode of course. I think that re-layouting globals is not very good design. Richard. > > > > > >