On Fri, Jan 8, 2016 at 2:29 PM, Christian Bruel <christian.br...@st.com> wrote:
> When compiling code with attribute targets on arm or aarch64,
> vector_type_mode returns different results (eg Vmode or BLKmode) depending
> on the current simd flags that are not set between functions.
>
> for example the following code:
>
> #include <arm_neon.h>
>
> extern int8x8_t a;
> extern int8x8_t b;
>
> int16x8_t
> __attribute__ ((target("fpu=neon")))
> foo(void)
> {
>    return vaddl_s8 (a, b);
> }
>
> Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins ,
> because the mismatch and DECL_MODE current's TYPE_MODE used in
> expand_builtin for global variables.
>
> but the best explanation is in the vector_type_mode:
> /* Vector types need to re-check the target flags each time we report
>     the machine mode.  We need to do this because attribute target can
>     change the result of vector_mode_supported_p and have_regs_of_mode
>     on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
>     change on a per-function basis.  */
>
> I first tried to hack the 2 machine descriptions to insert convert_to_mode
> or relayout_decls here and there, but I found this very fragile. Instead a
> more central relayout the of type while expanding gave good results, as
> proposed here.
>
> bootstraped and tested with no regression for arm, aarch64 and i586.
>
> Does this look to be the right approach ?
>
> nb: for testing this patch is complementary with
>
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html
>
> thanks for your comments.

A x86 specific testcase that ICEs as well:

typedef int v8si __attribute__((vector_size(32)));
v8si a;
v8si __attribute__((target("avx"))) foo()
{
  return a;
}

in your patch not using the shared DECL_RTL of the global var
"fixes" this so I think a conceptually better fix would be to
"adjust" DECL_RTL from globals via a adjust_address (or so).

Also given that we do

      /* ... fall through ...  */

    case FUNCTION_DECL:
    case RESULT_DECL:
      decl_rtl = DECL_RTL (exp);
    expand_decl_rtl:
      gcc_assert (decl_rtl);
      decl_rtl = copy_rtx (decl_rtl);

thus always "unshare" DECL_RTL anyway it might be not so
bad to simply do

     decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);

instead of that to avoid one copy.

Index: expr.c
===================================================================
--- expr.c      (revision 232496)
+++ expr.c      (working copy)
@@ -9597,7 +9597,10 @@ expand_expr_real_1 (tree exp, rtx target
       decl_rtl = DECL_RTL (exp);
     expand_decl_rtl:
       gcc_assert (decl_rtl);
-      decl_rtl = copy_rtx (decl_rtl);
+      if (MEM_P (decl_rtl))
+       decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
+      else
+       decl_rtl = copy_rtx (decl_rtl);
       /* Record writes to register variables.  */
       if (modifier == EXPAND_WRITE
          && REG_P (decl_rtl)

untested apart from on the x86_64 testcase (which it fixes).  One could guard
this further to only apply on vector typed decls with mismatched mode of course.

I think that re-layouting globals is not very good design.

Richard.

>
>
>
>
>
>

Reply via email to