On Mon, Apr 18, 2016 at 12:13 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Mon, Apr 18, 2016 at 8:40 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> On Sun, Jan 10, 2016 at 11:45 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>> On Sun, Jan 10, 2016 at 11:32 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>> Since *mov<mode>_internal and <avx512>_(load|store)<mode>_mask patterns
>>>> can handle unaligned load and store, we can remove UNSPEC_LOADU and
>>>> UNSPEC_STOREU.  We use function prototypes with pointer to scalar for
>>>> unaligned load/store builtin functions so that memory passed to
>>>> *mov<mode>_internal is unaligned.
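As a minimal sketch of the pointer-to-scalar style described above (the
wrapper name is hypothetical, and the builtin's source/mask arguments
are assumed rather than copied from the patch):

#include <immintrin.h>

/* Hypothetical wrapper, compiled with -mavx512f: the pointer is cast to
   "const double *", so the memory operand handed to the builtin carries
   only scalar (8-byte) alignment and mov<mode>_internal emits an
   unaligned move.  */
static __inline __m512d
my_loadu_pd (void const *__P)
{
  return (__m512d) __builtin_ia32_loadupd512_mask ((const double *) __P,
                                                   (__v8df) _mm512_undefined_pd (),
                                                   (__mmask8) -1);
}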
>>>>
>>>> Tested on x86-64.  Is this OK for trunk in stage 3?
>>>
>>> This patch is not appropriate for stage 3.
>>>
>>> Uros.
>>>
>>>> H.J.
>>>> ----
>>>> gcc/
>>>>
>>>>         PR target/69201
>>>>         * config/i386/avx512bwintrin.h (_mm512_mask_loadu_epi16): Pass
>>>>         const short * to __builtin_ia32_loaddquhi512_mask.
>>>>         (_mm512_maskz_loadu_epi16): Likewise.
>>>>         (_mm512_mask_storeu_epi16): Pass short * to
>>>>         __builtin_ia32_storedquhi512_mask.
>>>>         (_mm512_mask_loadu_epi8): Pass const char * to
>>>>         __builtin_ia32_loaddquqi512_mask.
>>>>         (_mm512_maskz_loadu_epi8): Likewise.
>>>>         (_mm512_mask_storeu_epi8): Pass char * to
>>>>         __builtin_ia32_storedquqi512_mask.
>>>>         * config/i386/avx512fintrin.h (_mm512_loadu_pd): Pass
>>>>         const double * to __builtin_ia32_loadupd512_mask.
>>>>         (_mm512_mask_loadu_pd): Likewise.
>>>>         (_mm512_maskz_loadu_pd): Likewise.
>>>>         (_mm512_storeu_pd): Pass double * to
>>>>         __builtin_ia32_storeupd512_mask.
>>>>         (_mm512_mask_storeu_pd): Likewise.
>>>>         (_mm512_loadu_ps): Pass const float * to
>>>>         __builtin_ia32_loadups512_mask.
>>>>         (_mm512_mask_loadu_ps): Likewise.
>>>>         (_mm512_maskz_loadu_ps): Likewise.
>>>>         (_mm512_storeu_ps): Pass float * to
>>>>         __builtin_ia32_storeups512_mask.
>>>>         (_mm512_mask_storeu_ps): Likewise.
>>>>         (_mm512_mask_loadu_epi64): Pass const long long * to
>>>>         __builtin_ia32_loaddqudi512_mask.
>>>>         (_mm512_maskz_loadu_epi64): Likewise.
>>>>         (_mm512_mask_storeu_epi64): Pass long long *
>>>>         to __builtin_ia32_storedqudi512_mask.
>>>>         (_mm512_loadu_si512): Pass const int * to
>>>>         __builtin_ia32_loaddqusi512_mask.
>>>>         (_mm512_mask_loadu_epi32): Likewise.
>>>>         (_mm512_maskz_loadu_epi32): Likewise.
>>>>         (_mm512_storeu_si512): Pass int * to
>>>>         __builtin_ia32_storedqusi512_mask.
>>>>         (_mm512_mask_storeu_epi32): Likewise.
>>>>         * config/i386/avx512vlbwintrin.h (_mm256_mask_storeu_epi8): Pass
>>>>         char * to __builtin_ia32_storedquqi256_mask.
>>>>         (_mm_mask_storeu_epi8): Likewise.
>>>>         (_mm256_mask_loadu_epi16): Pass const short * to
>>>>         __builtin_ia32_loaddquhi256_mask.
>>>>         (_mm256_maskz_loadu_epi16): Likewise.
>>>>         (_mm_mask_loadu_epi16): Pass const short * to
>>>>         __builtin_ia32_loaddquhi128_mask.
>>>>         (_mm_maskz_loadu_epi16): Likewise.
>>>>         (_mm256_mask_loadu_epi8): Pass const char * to
>>>>         __builtin_ia32_loaddquqi256_mask.
>>>>         (_mm256_maskz_loadu_epi8): Likewise.
>>>>         (_mm_mask_loadu_epi8): Pass const char * to
>>>>         __builtin_ia32_loaddquqi128_mask.
>>>>         (_mm_maskz_loadu_epi8): Likewise.
>>>>         (_mm256_mask_storeu_epi16): Pass short * to
>>>>         __builtin_ia32_storedquhi256_mask.
>>>>         (_mm_mask_storeu_epi16): Pass short * to
>>>>         __builtin_ia32_storedquhi128_mask.
>>>>         * config/i386/avx512vlintrin.h (_mm256_mask_loadu_pd): Pass
>>>>         const double * to __builtin_ia32_loadupd256_mask.
>>>>         (_mm256_maskz_loadu_pd): Likewise.
>>>>         (_mm_mask_loadu_pd): Pass const double * to
>>>>         __builtin_ia32_loadupd128_mask.
>>>>         (_mm_maskz_loadu_pd): Likewise.
>>>>         (_mm256_mask_storeu_pd): Pass double * to
>>>>         __builtin_ia32_storeupd256_mask.
>>>>         (_mm_mask_storeu_pd): Pass double * to
>>>>         __builtin_ia32_storeupd128_mask.
>>>>         (_mm256_mask_loadu_ps): Pass const float * to
>>>>         __builtin_ia32_loadups256_mask.
>>>>         (_mm256_maskz_loadu_ps): Likewise.
>>>>         (_mm_mask_loadu_ps): Pass const float * to
>>>>         __builtin_ia32_loadups128_mask.
>>>>         (_mm_maskz_loadu_ps): Likewise.
>>>>         (_mm256_mask_storeu_ps): Pass float * to
>>>>         __builtin_ia32_storeups256_mask.
>>>>         (_mm_mask_storeu_ps): Pass float * to
>>>>         __builtin_ia32_storeups128_mask.
>>>>         (_mm256_mask_loadu_epi64): Pass const long long * to
>>>>         __builtin_ia32_loaddqudi256_mask.
>>>>         (_mm256_maskz_loadu_epi64): Likewise.
>>>>         (_mm_mask_loadu_epi64): Pass const long long * to
>>>>         __builtin_ia32_loaddqudi128_mask.
>>>>         (_mm_maskz_loadu_epi64): Likewise.
>>>>         (_mm256_mask_storeu_epi64): Pass long long * to
>>>>         __builtin_ia32_storedqudi256_mask.
>>>>         (_mm_mask_storeu_epi64): Pass long long * to
>>>>         __builtin_ia32_storedqudi128_mask.
>>>>         (_mm256_mask_loadu_epi32): Pass const int * to
>>>>         __builtin_ia32_loaddqusi256_mask.
>>>>         (_mm256_maskz_loadu_epi32): Likewise.
>>>>         (_mm_mask_loadu_epi32): Pass const int * to
>>>>         __builtin_ia32_loaddqusi128_mask.
>>>>         (_mm_maskz_loadu_epi32): Likewise.
>>>>         (_mm256_mask_storeu_epi32): Pass int * to
>>>>         __builtin_ia32_storedqusi256_mask.
>>>>         (_mm_mask_storeu_epi32): Pass int * to
>>>>         __builtin_ia32_storedqusi128_mask.
>>>>         * config/i386/i386-builtin-types.def (PCSHORT): New.
>>>>         (PINT64): Likewise.
>>>>         (V64QI_FTYPE_PCCHAR_V64QI_UDI): Likewise.
>>>>         (V32HI_FTYPE_PCSHORT_V32HI_USI): Likewise.
>>>>         (V32QI_FTYPE_PCCHAR_V32QI_USI): Likewise.
>>>>         (V16SF_FTYPE_PCFLOAT_V16SF_UHI): Likewise.
>>>>         (V8DF_FTYPE_PCDOUBLE_V8DF_UQI): Likewise.
>>>>         (V16SI_FTYPE_PCINT_V16SI_UHI): Likewise.
>>>>         (V16HI_FTYPE_PCSHORT_V16HI_UHI): Likewise.
>>>>         (V16QI_FTYPE_PCCHAR_V16QI_UHI): Likewise.
>>>>         (V8SF_FTYPE_PCFLOAT_V8SF_UQI): Likewise.
>>>>         (V8DI_FTYPE_PCINT64_V8DI_UQI): Likewise.
>>>>         (V8SI_FTYPE_PCINT_V8SI_UQI): Likewise.
>>>>         (V8HI_FTYPE_PCSHORT_V8HI_UQI): Likewise.
>>>>         (V4DF_FTYPE_PCDOUBLE_V4DF_UQI): Likewise.
>>>>         (V4SF_FTYPE_PCFLOAT_V4SF_UQI): Likewise.
>>>>         (V4DI_FTYPE_PCINT64_V4DI_UQI): Likewise.
>>>>         (V4SI_FTYPE_PCINT_V4SI_UQI): Likewise.
>>>>         (V2DF_FTYPE_PCDOUBLE_V2DF_UQI): Likewise.
>>>>         (V2DI_FTYPE_PCINT64_V2DI_UQI): Likewise.
>>>>         (VOID_FTYPE_PDOUBLE_V8DF_UQI): Likewise.
>>>>         (VOID_FTYPE_PDOUBLE_V4DF_UQI): Likewise.
>>>>         (VOID_FTYPE_PDOUBLE_V2DF_UQI): Likewise.
>>>>         (VOID_FTYPE_PFLOAT_V16SF_UHI): Likewise.
>>>>         (VOID_FTYPE_PFLOAT_V8SF_UQI): Likewise.
>>>>         (VOID_FTYPE_PFLOAT_V4SF_UQI): Likewise.
>>>>         (VOID_FTYPE_PINT64_V8DI_UQI): Likewise.
>>>>         (VOID_FTYPE_PINT64_V4DI_UQI): Likewise.
>>>>         (VOID_FTYPE_PINT64_V2DI_UQI): Likewise.
>>>>         (VOID_FTYPE_PINT_V16SI_UHI): Likewise.
>>>>         (VOID_FTYPE_PINT_V8SI_UHI): Likewise.
>>>>         (VOID_FTYPE_PINT_V4SI_UHI): Likewise.
>>>>         (VOID_FTYPE_PSHORT_V32HI_USI): Likewise.
>>>>         (VOID_FTYPE_PSHORT_V16HI_UHI): Likewise.
>>>>         (VOID_FTYPE_PSHORT_V8HI_UQI): Likewise.
>>>>         (VOID_FTYPE_PCHAR_V64QI_UDI): Likewise.
>>>>         (VOID_FTYPE_PCHAR_V32QI_USI): Likewise.
>>>>         (VOID_FTYPE_PCHAR_V16QI_UHI): Likewise.
>>>>         (V64QI_FTYPE_PCV64QI_V64QI_UDI): Removed.
>>>>         (V32HI_FTYPE_PCV32HI_V32HI_USI): Likewise.
>>>>         (V32QI_FTYPE_PCV32QI_V32QI_USI): Likewise.
>>>>         (V16HI_FTYPE_PCV16HI_V16HI_UHI): Likewise.
>>>>         (V16QI_FTYPE_PCV16QI_V16QI_UHI): Likewise.
>>>>         (V8HI_FTYPE_PCV8HI_V8HI_UQI): Likewise.
>>>>         (VOID_FTYPE_PV32HI_V32HI_USI): Likewise.
>>>>         (VOID_FTYPE_PV16HI_V16HI_UHI): Likewise.
>>>>         (VOID_FTYPE_PV8HI_V8HI_UQI): Likewise.
>>>>         (VOID_FTYPE_PV64QI_V64QI_UDI): Likewise.
>>>>         (VOID_FTYPE_PV32QI_V32QI_USI): Likewise.
>>>>         (VOID_FTYPE_PV16QI_V16QI_UHI): Likewise.
>>>>         * config/i386/i386.c (ix86_emit_save_reg_using_mov): Don't
>>>>         use UNSPEC_STOREU.
>>>>         (ix86_emit_restore_sse_regs_using_mov): Don't use UNSPEC_LOADU.
>>>>         (ix86_avx256_split_vector_move_misalign): Don't use unaligned
>>>>         load nor store.
>>>>         (ix86_expand_vector_move_misalign): Likewise.
>>>>         (bdesc_special_args): Use CODE_FOR_movvNXY_internal and pointer
>>>>         to scalar function prototype for unaligned load/store builtins.
>>>>         (ix86_expand_special_args_builtin): Updated.
>>>>         * config/i386/sse.md (UNSPEC_LOADU): Removed.
>>>>         (UNSPEC_STOREU): Likewise.
>>>>         (VI_ULOADSTORE_BW_AVX512VL): Likewise.
>>>>         (VI_ULOADSTORE_F_AVX512VL): Likewise.
>>>>         (<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>): Likewise.
>>>>         (*<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>): Likewise.
>>>>         (<sse>_storeu<ssemodesuffix><avxsizesuffix>): Likewise.
>>>>         (<avx512>_storeu<ssemodesuffix><avxsizesuffix>_mask): Likewise.
>>>>         (<sse2_avx_avx512f>_loaddqu<mode><mask_name>): Likewise.
>>>>         (*<sse2_avx_avx512f>_loaddqu<mode><mask_name>"): Likewise.
>>>>         (<sse2_avx_avx512f>_storedqu<mode>): Likewise.
>>>>         (<avx512>_storedqu<mode>_mask): Likewise.
>>>>         (*sse4_2_pcmpestr_unaligned): Likewise.
>>>>         (*sse4_2_pcmpistr_unaligned): Likewise.
>>>>         (*mov<mode>_internal): Renamed to ...
>>>>         (mov<mode>_internal): This.  Remove check of AVX and IAMCU on
>>>>         misaligned operand.  Replace vmovdqu64 with vmovdqu<ssescalarsize>.
>>>>         (movsd/movhpd to movupd peephole): Don't use UNSPEC_LOADU.
>>>>         (movlpd/movhpd to movupd peephole): Don't use UNSPEC_STOREU.
>>>>
>>>> gcc/testsuite/
>>>>
>>>>         PR target/69201
>>>>         * gcc.target/i386/avx256-unaligned-store-1.c (a): Make it
>>>>         extern to force it misaligned.
>>>>         (b): Likewise.
>>>>         (c): Likewise.
>>>>         (d): Likewise.
>>>>         Check vmovups.*movv8sf_internal/3 instead of avx_storeups256.
>>>>         Don't check `*' before movv4sf_internal.
>>>>         * gcc.target/i386/avx256-unaligned-store-2.c: Check
>>>>         vmovups.*movv32qi_internal/3 instead of avx_storeups256.
>>>>         Don't check `*' before movv16qi_internal.
>>>>         * gcc.target/i386/avx256-unaligned-store-3.c (a): Make it
>>>>         extern to force it misaligned.
>>>>         (b): Likewise.
>>>>         (c): Likewise.
>>>>         (d): Likewise.
>>>>         Check vmovups.*movv4df_internal/3 instead of avx_storeupd256.
>>>>         Don't check `*' before movv2df_internal.
>>>>         * gcc.target/i386/avx256-unaligned-store-4.c (a): Make it
>>>>         extern to force it misaligned.
>>>>         (b): Likewise.
>>>>         (c): Likewise.
>>>>         (d): Likewise.
>>>>         Check movv8sf_internal instead of avx_storeups256.
>>>>         Check movups.*movv4sf_internal/3 instead of avx_storeups256.
>>
>>
>> Here is the updated patch for GCC 7.  Tested on x86-64.  OK for
>> trunk?
>
> IIRC from previous discussion, are we sure we won't propagate
> unaligned memory into SSE arithmetic insns?

Yes, that is true, and that is what

(define_special_memory_constraint "Bm"
  "@internal Vector memory operand."
  (match_operand 0 "vector_memory_operand"))

is used for.
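
As a minimal sketch of that behavior (an illustrative example, not a
test from the patch): compiled with plain SSE (no AVX), addps only
accepts 16-byte-aligned memory, so the unaligned load below has to stay
a separate movups rather than being folded into the arithmetic insn's
memory operand; "Bm"/vector_memory_operand is what is assumed to block
that fold.

#include <xmmintrin.h>

__m128
add_unaligned (const float *p, __m128 x)
{
  __m128 v = _mm_loadu_ps (p);  /* unaligned load: expected to emit movups  */
  return _mm_add_ps (v, x);     /* must not become addps with an unaligned
                                   memory operand  */
}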

> Otherwise, the patch is OK, but please wait for Kirill for AVX512 approval.
>
> Thanks,
> Uros.



-- 
H.J.
