https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005

--- Comment #3 from Joel Holdsworth <joel at airwebreathe dot org.uk> ---
Interesting. Comparing the implementation of _mm_store_si128 to vst1q_s32:

emmintrin.h

extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_store_si128 (__m128i *__P, __m128i __B)
{
  *__P = __B;
}


arm_neon.h

__extension__ extern __inline void
__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
vst1q_s32 (int32_t * __a, int32x4_t __b)
{
  __builtin_neon_vst1v4si ((__builtin_neon_si *) __a, __b);
}


So why is one implemented with a built-in, and the other with a pointer
dereference?

Is there a way of making the optimizer see through __builtin_neon_vst1v4si at
the GIMPLE level? Where would that folding be implemented? Where is the
equivalent implemented for other architectures?
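
To make the question concrete, here is a minimal sketch of what a
pointer-dereference version of the NEON store might look like, written with
GCC's generic vector extensions so that it compiles on any target. The names
v4si, v4si_u, sketch_vst1q_s32, sketch_vld1q_s32 and sketch_roundtrip_lane0 are
hypothetical and are not part of arm_neon.h. The reduced-alignment, may_alias
typedef is an assumption on my part: vst1q_s32 takes a plain int32_t *, so the
store should not assume 16-byte alignment the way _mm_store_si128 does. I have
not checked that this selects the same instruction as the built-in on ARM.

```c
#include <stdint.h>

/* Hypothetical stand-in for int32x4_t, using GCC generic vectors.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Same vector, but only element-aligned and allowed to alias int32_t,
   because the pointer passed in is just an int32_t *.  */
typedef int32_t v4si_u
  __attribute__ ((vector_size (16), aligned (4), may_alias));

/* Store written as a plain memory write instead of a built-in call.  */
static inline void
sketch_vst1q_s32 (int32_t *a, v4si b)
{
  *(v4si_u *) a = b;
}

/* Matching load, also written as a plain memory read.  */
static inline v4si
sketch_vld1q_s32 (const int32_t *a)
{
  return *(const v4si_u *) a;
}

/* Store followed by a reload of the same address.  Both accesses are
   ordinary memory operations, which is the form the optimizer would
   need in order to reason about them.  */
int32_t
sketch_roundtrip_lane0 (v4si v)
{
  int32_t buf[4];
  sketch_vst1q_s32 (buf, v);
  v4si r = sketch_vld1q_s32 (buf);
  return r[0];
}
```

Written this way the store is an ordinary assignment, as in _mm_store_si128,
rather than an opaque call. Whether that alone is enough for the case in this
bug is exactly what I am asking.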
