Re: [PATCH] Enable GCC support for AMX

2020-09-18 Thread Hongyu Wang via Gcc-patches
Hi Kirill, Thank you very much for your review again. I have just updated the patch to add the XSAVE dependency and to use __builtin_cpu_supports for the runtime test. Rebased on the Sept. 15 trunk and tested with sde. Kindly PING. Hongyu Wang wrote on Sat, Sep 12, 2020 at 1:00 AM: > Hi > > Thanks for your review,

Enable GCC support for Intel Key Locker extension

2020-09-20 Thread Hongyu Wang via Gcc-patches
, Hongyu, Wang From e469649ff6e9c924964912517da69af27921a065 Mon Sep 17 00:00:00 2001 From: liuhongt Date: Thu, 5 Mar 2020 17:36:02 +0800 Subject: [PATCH] Enable GCC to support Intel Key Locker ISA gcc/ChangeLog 2018-12-15 Xuepeng Guo * common/config/i386/cpuinfo.h (get_available_features

Re: [PATCH] Enable GCC support for AMX

2020-09-28 Thread Hongyu Wang via Gcc-patches
Thanks! I'll ask my colleague to help check in the patch. Kirill Yukhin wrote on Mon, Sep 28, 2020 at 7:38 PM: > Hello, > > On 12 Sep 01:00, Hongyu Wang wrote: > > Hi > > > > Thanks for your review, and sorry for the late reply. It took a while > > to finish the runtim

[PATCH] Add Missing FSF copyright notes for some x86 intrinsic headers

2020-09-28 Thread Hongyu Wang via Gcc-patches
/avx512vp2intersectintrin.h: Ditto. * config/i386/avx512vp2intersectvlintrin.h: Ditto. * config/i386/pconfigintrin.h: Ditto. * config/i386/tsxldtrkintrin.h: Ditto. * config/i386/wbnoinvdintrin.h: Ditto. -- Regards, Hongyu, Wang From ec6263ba1d74953721dd274c301bdeeeb71d5e77 Mon Sep 17 00:00:00 2001

Re: [committed] testsuite: Fix up amx* dg-do run tests with older binutils

2020-09-30 Thread Hongyu Wang via Gcc-patches
Thanks for the fix! I forgot that we don't have a builtin check in target-supports.exp. Will update these once we implement AMX with builtins. Jakub Jelinek wrote on Wed, Sep 30, 2020 at 7:51 PM: > On Fri, Sep 18, 2020 at 04:31:55PM +0800, Hongyu Wang via Gcc-patches > wrote: > > Very App

[PATCH] i386: Fix pr104551 testcase for solaris [PR 104726]

2022-03-01 Thread Hongyu Wang via Gcc-patches
Use avx2-check mechanism to avoid illegal instruction on non-avx2 target. Tested by Rainer Orth on Solaris/x86. Pushed to trunk as obvious fix. gcc/testsuite/ChangeLog: PR target/104726 * gcc.target/i386/pr104551.c: Use avx2-check.h. --- gcc/testsuite/gcc.target/i386/pr104551.c |

[PATCH] AVX512FP16: Fix vcvt[u]si2sh runtime tests for Solaris

2022-03-01 Thread Hongyu Wang via Gcc-patches
Use standard C type instead of __int64_t which doesn't work on Solaris. Tested by Rainer Orth on Solaris/x86. Pushed to trunk as obvious fix. gcc/testsuite/ChangeLog: PR target/104724 * gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c: Use long long instead of __int64_t.

[PATCH] AVX512FP16: Fix masm=intel output for vfc?(madd|mul)csh [PR 104977]

2022-03-18 Thread Hongyu Wang via Gcc-patches
Hi, This patch fixes a typo in the subst for the scalar complex mask_round operand. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. Ok for master? gcc/ChangeLog: PR target/104977 * config/i386/sse.md (avx512fp16_fmash_v8hf): Correct round operand for intel

[PATCH] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-18 Thread Hongyu Wang via Gcc-patches
Hi, For complex scalar intrinsics like _mm_mask_fcmadd_sch, the mask should be ANDed with 1 to ensure the mask is bound to the lowest byte. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. Ok for master? gcc/ChangeLog: PR target/104978 * config/i386/sse.md (avx512fp16
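
The masking idea in the message above can be modeled in plain C. This is only an illustrative sketch, not the GCC expander or the real intrinsic: `select_low_masked`, `select_low_unmasked`, and `example_mask_usage` are hypothetical names, and the point is simply that ANDing with 1 keeps only bit 0 of the mask, so stray higher bits cannot select the computed value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a masked scalar operation: the low element takes
   the computed value only when bit 0 of the mask is set. */
static float select_low_masked(uint8_t mask, float computed, float passthrough)
{
    return (mask & 1u) ? computed : passthrough;   /* only bit 0 matters */
}

/* Same selection without the AND: any non-zero mask selects the
   computed value, which is the kind of mistake the AND guards against. */
static float select_low_unmasked(uint8_t mask, float computed, float passthrough)
{
    return mask ? computed : passthrough;
}

/* Usage: a mask of 0x02 has bit 0 clear, so the passthrough must win. */
static void example_mask_usage(void)
{
    assert(select_low_masked(0x02, 1.0f, 5.0f) == 5.0f);
    assert(select_low_unmasked(0x02, 1.0f, 5.0f) == 1.0f);
}
```

With a mask of 0x03 both helpers agree, which is why such a bug may only show up for masks whose higher bits are set while bit 0 is clear.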

Re: [PATCH] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-20 Thread Hongyu Wang via Gcc-patches
Mon, 09:08 wrote: > > On Sat, Mar 19, 2022 at 8:09 AM Hongyu Wang via Gcc-patches > wrote: > > > > Hi, > > > > For complex scalar intrinsics like _mm_mask_fcmadd_sch, the > > mask should be ANDed with 1 to ensure the mask is bound to the lowest byte. > > >

Re: [PATCH] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-20 Thread Hongyu Wang via Gcc-patches
sk8 k, __m128 a, __m128 b) > https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=vmovss&ig_expand=3807,3081,3082,3084,3083,4837,4838 Oh, if this works, the non-avx512vl part could also be adjusted. Will try this, thanks. Hongtao Liu wrote on Mon, Mar 21, 2022 at 09:48: >

[PATCH v2] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-21 Thread Hongyu Wang via Gcc-patches
Hi, For complex scalar intrinsics like _mm_mask_fcmadd_sch, the mask should be ANDed with 1 to ensure the mask is bound to the lowest byte. Use a masked vmovss to perform the same operation, which ignores the higher bits of the mask. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. Ok for master? gcc/ChangeLo

Re: [PATCH v2] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-21 Thread Hongyu Wang via Gcc-patches
here are strictly V8HF operands from builtin input. I suppose there should be no chance to input a different size subreg for the expander, otherwise the (__v8hf) conversion in the builtin would fail first. Hongtao Liu via Gcc-patches wrote on Mon, Mar 21, 2022 at 20:53: > > On Mon, Mar 21, 2022 at 7:52 PM Hongyu Wan

[PATCH v3] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-21 Thread Hongyu Wang via Gcc-patches
Hi, here is the patch with force_reg before lowpart_subreg. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. Ok for master? For complex scalar intrinsics like _mm_mask_fcmadd_sch, the mask should be ANDed with 1 to ensure the mask is bound to the lowest byte. Use masked vmovss to perform same

Re: [PATCH] x86: Use x constraint on KL patterns

2022-03-25 Thread Hongyu Wang via Gcc-patches
Is it possible to create a test case for which gas would throw an error for invalid operands? H.J. Lu via Gcc-patches wrote on Sat, Mar 26, 2022 at 04:50: > > Since KL instructions have no AVX512 version, replace the "v" register > constraint with the "x" register constraint. > > PR target/105058 >

Re: [PATCH] x86: Use x constraint on KL patterns

2022-03-25 Thread Hongyu Wang via Gcc-patches
didn't find it in the documentation. H.J. Lu wrote on Sat, Mar 26, 2022 at 09:22: > > On Fri, Mar 25, 2022 at 6:08 PM Hongyu Wang wrote: > > > > Is it possible to create a test case that gas would throw an error for > > invalid operands? > > You can use -ffix-xmmN to disable XMM0-15.

[PATCH] i386: Fix infinite loop under -mrelax-cmpxchg-loop [PR 103069]

2022-04-13 Thread Hongyu Wang via Gcc-patches
Hi, For -mrelax-cmpxchg-loop, which relaxes atomic_fetch_ loops, there is a missing set to %eax when the compare fails, which would result in an infinite loop in some benchmarks. Add the set to %eax to avoid it. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} Ok for master? gcc/ChangeLog: PR ta

[PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-13 Thread Hongyu Wang via Gcc-patches
Hi, From the -Os point of view, stv converts scalar registers to vector mode, which introduces extra register conversions and increases instruction size. Disabling stv under optimize_size avoids this code size increase, with no need to touch ix86_size_cost, which has not been tuned for a long time. Bootstrap

Re: [PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-14 Thread Hongyu Wang via Gcc-patches
(a) : (b)) +#define min(a,b) (((a) < (b))? (a) : (b)) + +int foo(int x) +{ + return max(x,0); +} + +int bar(int x) +{ + return min(x,0); +} + +unsigned int baz(unsigned int x) +{ + return min(x,1); +} + +/* { dg-final { scan-assembler-not "xmm" } } */ -- 2.18.1 Richard Biener via Gc

Re: [PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-14 Thread Hongyu Wang via Gcc-patches
x) +{ + return min(x,0); +} + +unsigned int baz(unsigned int x) +{ + return min(x,1); +} + +/* { dg-final { scan-assembler-not "xmm" } } */ -- 2.18.1 Richard Biener wrote on Thu, Apr 14, 2022 at 16:06: > > On Thu, Apr 14, 2022 at 9:55 AM Hongyu Wang wrote: >

Re: [PATCH] i386: Disable stv under optimize_size [PR 105034]

2022-04-14 Thread Hongyu Wang via Gcc-patches
onsider insn number under -Os, also for ABS/MIN/MAX it needs a more accurate model to describe the actual insn count. Thanks for your review. Richard Biener wrote on Thu, Apr 14, 2022 at 16:56: > > On Thu, Apr 14, 2022 at 10:31 AM Hongyu Wang wrote: > > > > > >virtual bool gate (fun

[PATCH] i386: Correct target attribute for crc32 intrinsics

2022-04-14 Thread Hongyu Wang via Gcc-patches
Hi, Compiling _mm_crc32_u8/16/32/64 intrinsics with -mcrc32 would hit a target-specific option mismatch. Correct the target pragma to fix it. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}. Ok for master and backport to GCC 11? gcc/ChangeLog: * config/i386/smmintrin.h: Correct target pragma

Re: [PATCH] i386: Correct target attribute for crc32 intrinsics

2022-04-15 Thread Hongyu Wang via Gcc-patches
r, it has been a long term issue for intrinsic diagnostic. So for this test either I change the dg-error message or the call to builtin, otherwise it would fail. Uros Bizjak via Gcc-patches wrote on Fri, Apr 15, 2022 at 15:54: > > On Fri, Apr 15, 2022 at 6:30 AM Hongyu Wang wrote: > > > >

[PATCH] AVX512F: Add missing macro for mask(z?)_scalf_s[sd] [PR 105339]

2022-04-22 Thread Hongyu Wang via Gcc-patches
Hi, Add missing macros under O0 and adjust macro format for scalef intrinsics. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}. Ok for master and backport to GCC 9/10/11? gcc/ChangeLog: PR target/105339 * config/i386/avx512fintrin.h (_mm512_scalef_round_pd): Add pare

Re: [PATCH] AVX512F: Add missing macro for mask(z?)_scalf_s[sd] [PR 105339]

2022-04-22 Thread Hongyu Wang via Gcc-patches
> Please add the corresponding intrinsic test in sse-14.c Sorry for forgetting this part. Updated patch. Thanks. Hongtao Liu via Gcc-patches wrote on Fri, Apr 22, 2022 at 16:49: > > On Fri, Apr 22, 2022 at 4:12 PM Hongyu Wang via Gcc-patches > wrote: > > > > Hi, > > > > A

Re: [PATCH] i386: Fix GLC tuning with -masm=intel [PR104104]

2022-01-18 Thread Wang, Hongyu via Gcc-patches
Sorry for introducing such a failure and thanks for the patch. I suppose it could be treated as an obvious fix? From: Jakub Jelinek Sent: Wednesday, January 19, 2022 8:01 AM To: Hongtao Liu; Uros Bizjak Cc: gcc-patches@gcc.gnu.org; Wang, Hongyu Subject: [PATCH] i386: Fix GLC tuning

[PATCH] i386: Relax cmpxchg instruction under -mrelax-cmpxchg-loop [PR 103069]

2022-02-21 Thread Hongyu Wang via Gcc-patches
Hi, cmpxchg is commonly used in spin loops, and some user code such as pthread directly uses cmpxchg as the loop condition, which causes heavy cache bouncing. This patch extends the previous implementation to relax all cmpxchg instructions under -mrelax-cmpxchg-loop with an extra atomic load, com

[PATCH] AVX512F: Add helper enumeration for ternary logic intrinsics.

2022-02-25 Thread Hongyu Wang via Gcc-patches
Hi, This patch intends to sync with the llvm change in https://reviews.llvm.org/D120307 to add an enumeration and truncate imm to unsigned char, so users could use ~ on immediates. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}. Ok for master? gcc/ChangeLog: * config/i386/avx512fintrin.h
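
Why truncating the immediate to unsigned char lets users write ~ on it can be sketched in plain C. The enumerator names `TERN_A`, `TERN_B`, `TERN_C` and the helpers below are hypothetical stand-ins, not the names added by the patch; the values 0xF0, 0xCC, and 0xAA are the usual truth-table patterns for a three-input bitwise function, and `tern_eval` is a software model written for this example only.

```c
#include <assert.h>

/* Hypothetical truth-table constants for the three inputs. */
enum { TERN_A = 0xF0, TERN_B = 0xCC, TERN_C = 0xAA };

/* ~TERN_A is a negative int (-241); truncating to unsigned char keeps
   only the 8 truth-table bits, giving 0x0F. */
static unsigned char ternlog_imm(int expr)
{
    return (unsigned char)expr;
}

/* Software model: for each bit position, the three input bits form an
   index (a is the most significant) into the 8-bit truth table imm. */
static unsigned tern_eval(unsigned a, unsigned b, unsigned c, unsigned char imm)
{
    unsigned result = 0;
    for (int bit = 0; bit < 32; ++bit) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     |  ((c >> bit) & 1u);
        result |= (((unsigned)imm >> idx) & 1u) << bit;
    }
    return result;
}

/* Usage: "not A" and "A and B" written with ordinary C operators. */
static void example_ternlog_usage(void)
{
    assert(ternlog_imm(~TERN_A) == 0x0F);
    assert(tern_eval(0xFFFF0000u, 0u, 0u, ternlog_imm(~TERN_A)) == 0x0000FFFFu);
    assert(tern_eval(0xF0F0u, 0xFF00u, 0u, ternlog_imm(TERN_A & TERN_B)) == 0xF000u);
}
```

Without the truncation, an expression such as ~TERN_A would fall outside the 0..255 range that an 8-bit immediate accepts.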

[PATCH] i386: Fix V8HF vector init under -mno-avx [PR 104664]

2022-02-28 Thread Hongyu Wang via Gcc-patches
Hi, For V8HFmode vector init with HFmode, do not directly emit a V8HF move with subreg, which may cause reload to assign a general register to the move source. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}. Ok for master? gcc/ChangeLog: PR target/104664 * config/i386/i386-expand.cc

[PATCH] i386: Fix wrong codegen for -mrelax-cmpxchg-loop

2021-11-17 Thread Hongyu Wang via Gcc-patches
Hi Uros, For -mrelax-cmpxchg-loop introduced by PR 103069/r12-5265, it would produce an infinite loop. The correct code should be .L84: movl (%rdi), %ecx movl %eax, %edx orl %esi, %edx cmpl %eax, %ecx jne .L82 lock cmpxchgl %edx, (%r

[PATCH] Support Intel AVX-IFMA

2022-10-18 Thread Hongyu Wang via Gcc-patches
Hi, Here is the updated patch that aligns the implementation with AVX-VNNI and corrects some spelling errors in the AVX512IFMA pattern. Bootstrapped/regtested on x86_64-pc-linux-gnu and sde. Ok for trunk? gcc/ * common/config/i386/i386-common.cc (OPTION_MASK_ISA_AVXIFMA_SET, OPTION_MAS

[PATCH] i386: Enable small loop unrolling for O2

2022-10-25 Thread Hongyu Wang via Gcc-patches
Hi, Inspired by rs6000 and s390 port changes, this patch enables loop unrolling for small loops at O2 by default. The default behavior is to unroll loops with an unknown trip count and fewer than 4 insns once. This improves 548.exchange2 by 3.5% on icelake and 6% on zen3 with 1.2% codesize in

Re: [PATCH] i386: Enable small loop unrolling for O2

2022-10-26 Thread Hongyu Wang via Gcc-patches
c-patches wrote on Wed, Oct 26, 2022 at 14:57: > > On Wed, Oct 26, 2022 at 7:53 AM Hongyu Wang wrote: > > > > Hi, > > > > Inspired by rs6000 and s390 port changes, this patch > > enables loop unrolling for small size loop at O2 by default. > > The default behavior is

Re: [PATCH] i386: Enable small loop unrolling for O2

2022-10-28 Thread Hongyu Wang via Gcc-patches
then the backend can turn it on by default in O2? I don't know if there is a way to turn on a middle-end pass by target-specific flags. Richard Biener via Gcc-patches wrote on Fri, Oct 28, 2022 at 15:33: > > On Wed, Oct 26, 2022 at 7:53 AM Hongyu Wang wrote: > > > > Hi, > > >

[PATCH V2] Enable small loop unrolling for O2

2022-11-01 Thread Hongyu Wang via Gcc-patches
Hi, this is the updated patch of https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604345.html, which uses targetm.loop_unroll_adjust as the gate to enable small loop unrolling. This patch does not change rs6000/s390 since I don't have a machine to test them, but I suppose the default behavior is the

Re: [PATCH] [i386]Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.

2022-05-06 Thread Hongyu Wang via Gcc-patches
> +(define_split > + [(set (reg:CCZ FLAGS_REG) > + (compare:CCZ (unspec:SI > + [(eq:VI1_AVX2 > + (match_operand:VI1_AVX2 0 "vector_operand") > + (match_operand:VI1_AVX2 1 "const0_operand"))] > + UNSPE

Re: [PATCH] Reconstruct i386 testsuite with __builtin_cpu_supports

2022-05-06 Thread Hongyu Wang via Gcc-patches
> I don't think *_os_support calls should be removed. IIRC, > __builtin_cpu_supports function checks if the feature is supported by > CPU, whereas *_os_supports calls check via xgetbv if OS supports > handling of new registers. avx_os_support is like avx_os_support (void) { unsigned int eax, ed
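
For readers unfamiliar with the builtin discussed above, here is a minimal sketch of gating a test body on a runtime feature check. It assumes GCC or Clang on x86 for `__builtin_cpu_init` and `__builtin_cpu_supports`; `cpu_has_avx2`, `run_if_supported`, and `sample_body` are hypothetical names, and the sketch takes no position on whether the builtin also covers the OS-support question raised in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the running CPU reports AVX2 through the compiler builtin.
   On non-x86 targets this sketch conservatively returns 0. */
static int cpu_has_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();   /* safe to call explicitly before querying */
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* Hypothetical driver: run the test body only when the feature is
   present; return 0 to mean "skipped". */
static int run_if_supported(int (*test_body)(void))
{
    assert(test_body != NULL);
    if (!cpu_has_avx2())
        return 0;
    return test_body();
}

/* Stand-in for a real test; returns a non-zero marker when it ran. */
static int sample_body(void)
{
    return 42;
}
```

Calling `run_if_supported(sample_body)` yields 42 on a machine that reports AVX2 and 0 elsewhere, so the test never executes an unsupported instruction.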

[PATCH] i386: Add a constraint for absolute symbolic address [PR 105576]

2022-05-18 Thread Hongyu Wang via Gcc-patches
Hi, This patch adds a constraint "Ws" to allow absolute symbolic address for either function or variable. This also works under -mcmodel=large. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} Ok for master? gcc/ChangeLog: PR target/105576 * config/i386/constraints.md (Ws):

Re: [PATCH] i386: Add a constraint for absolute symbolic address [PR 105576]

2022-05-18 Thread Hongyu Wang via Gcc-patches
Oh, I just found that asm ("%p0" :: "i"(addr)); also works on -mcmodel=large in this case, please ignore this patch. Thanks. Uros Bizjak via Gcc-patches wrote on Wed, May 18, 2022 at 17:46: > > On Wed, May 18, 2022 at 9:32 AM Hongyu Wang wrote: > > > > Hi, > >

Re: [PATCH] i386: Add a constraint for absolute symbolic address [PR 105576]

2022-05-18 Thread Hongyu Wang via Gcc-patches
g. Uros Bizjak wrote on Wed, May 18, 2022 at 18:18: > > On Wed, May 18, 2022 at 12:14 PM Hongyu Wang wrote: > > > > Oh, I just found that asm ("%p0" :: "i"(addr)); also works on > > -mcmodel=large in this case, please ignore this patch. Thanks. > > -fpic will b

[PATCH] x86: Adjust keylocker testcases for fail on darwin

2020-11-09 Thread Hongyu Wang via Gcc-patches
. * gcc.target/i386/keylocker-aesenc256kl.c: New test. -- Regards, Hongyu, Wang From 9009ce97099b3a80fdf61a1927c1fff9c7f5b9bf Mon Sep 17 00:00:00 2001 From: hongyuw1 Date: Fri, 6 Nov 2020 15:08:10 +0800 Subject: [PATCH] Adjust Keylocker regex pattern for darwin, and add missing aesenc256kl test. gcc

Re: [PATCH] x86: Adjust keylocker testcases for fail on darwin

2020-11-09 Thread Hongyu Wang via Gcc-patches
> > Please rewrite scan strings back to using double-quotation marks. > Yes, updated patch. Uros Bizjak wrote on Mon, Nov 9, 2020 at 7:41 PM: > > On Mon, Nov 9, 2020 at 11:50 AM Hongyu Wang wrote: > > > > Hi > > > > According to the discussion in > > https://g

[PATCH][PR target/97770] x86: Add missing popcount2 expander

2020-11-11 Thread Hongyu Wang via Gcc-patches
, Wang From b809052b0bab5d80dd0a1b1ffbd55faa8179a416 Mon Sep 17 00:00:00 2001 From: Hongyu Wang Date: Wed, 11 Nov 2020 09:41:13 +0800 Subject: [PATCH] Add popcount expander to enable popcount auto vectorization under AVX512BITALG/AVX512POPCNTDQ target. gcc/ChangeLog PR target/97770 * gcc/config

Re: [PATCH] Remove redundant builtins for avx512f scalar instructions.

2020-11-12 Thread Hongyu Wang via Gcc-patches
Fri, Nov 13 at 1:43 PM wrote: > > > On 12/23/19 10:31 PM, Hongyu Wang wrote: > > Hi: > For avx512f scalar instructions, current builtin function like > __builtin_ia32_*{sd,ss}_round can be replaced by > __builtin_ia32_*{sd,ss}_mask_round with mask parameter set to -1. This > pat

[PATCH] AVX512FP16: Support cond_op for HFmode

2021-09-23 Thread Hongyu Wang via Gcc-patches
Hi, This patch extends the expanders for cond_op to support vector HF modes. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for master? gcc/ChangeLog: * config/i386/sse.md (cond_): Extend to support vector HFmodes. (cond_mul): Likewise. (cond_div): Lik

Re: [PATCH] AVX512FP16: Support cond_op for HFmode

2021-09-23 Thread Hongyu Wang via Gcc-patches
> >-Original Message- > >From: Wang, Hongyu > >Sent: Thursday, September 23, 2021 5:16 PM > >To: Liu, Hongtao > >Cc: gcc-patches@gcc.gnu.org > >Subject: [PATCH] AVX512FP16: Support cond_op for HFmode > > > >Hi, > > > >This patch extend t

[PATCH] AVX512FP16:support basic 64/32bit vector type and operation.

2021-09-27 Thread Hongyu Wang via Gcc-patches
Hi Uros, This patch intends to support V4HF/V2HF vector type and basic operations. For 32bit target, V4HF vector is parsed same as __m64 type, V2HF is parsed by stack and returned from GPR since it is not specified by ABI. We found for 64bit vector in ia32, when mmx disabled there seems no mov_i

Re: [PATCH] AVX512FP16:support basic 64/32bit vector type and operation.

2021-09-27 Thread Hongyu Wang via Gcc-patches
moving the extra define_insn, and drop V4HFmode from VALID_AVX512FP16_REG_MODE. Now v4hf would behave the same as v8qi. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and sde. OK for master with the updated one? Uros Bizjak via Gcc-patches wrote on Mon, Sep 27, 2021 at 7:35 PM: > > On Mon, Sep

Re: [PATCH] AVX512FP16:support basic 64/32bit vector type and operation.

2021-09-28 Thread Hongyu Wang via Gcc-patches
Tue, Sep 28, 2021 at 6:48 AM Hongyu Wang wrote: > > > > > ia32 ABI declares that __m64 values pass via MMX registers. Due to > > > this, we are not able to fully disable MMX register usage, as is the > > > case with x86_64. So, V4HFmode values will pass to functions

[PATCH] i386: Fix wrong result for AMX-TILE intrinsic when parsing expression.

2021-11-03 Thread Hongyu Wang via Gcc-patches
Hi, The _tile_loadd, _tile_stored, and _tile_streamloadd intrinsics are defined as macros, so the parameters should be wrapped in parentheses to accept expressions. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. OK for master and backport to GCC11 branch? gcc/ChangeLog: * config/i
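
The parenthesization problem described above is a general macro pitfall and can be shown without AMX hardware. The macros below are hypothetical and unrelated to the real intrinsic definitions; they only demonstrate how an unparenthesized parameter changes the result when the caller passes an expression.

```c
#include <assert.h>

/* Hypothetical macros: compute a byte offset from a stride and a row. */
#define ROW_OFFSET_BAD(stride, row)  (stride * row)      /* parameters not wrapped */
#define ROW_OFFSET_GOOD(stride, row) ((stride) * (row))  /* parameters wrapped */

/* Passing "base + 1" as the stride exposes the precedence bug:
   the bad macro expands to (base + 1 * row). */
static int offset_bad(int base, int row)
{
    return ROW_OFFSET_BAD(base + 1, row);
}

static int offset_good(int base, int row)
{
    return ROW_OFFSET_GOOD(base + 1, row);
}

/* Usage: with base = 3 and row = 2 the intended result is (3 + 1) * 2. */
static void example_macro_usage(void)
{
    assert(offset_good(3, 2) == 8);
    assert(offset_bad(3, 2) == 5);
}
```

With plain variables as arguments both macros agree, which is why this class of bug tends to go unnoticed until a caller passes an expression.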

[PATCH] i386: Auto vectorize sdot_prod, usdot_prod with VNNI instruction.

2021-11-03 Thread Hongyu Wang via Gcc-patches
Hi, AVX512VNNI/AVXVNNI has vpdpwssd for HImode and vpdpbusd for QImode, so adjust the HImode sdot_prod expander and add a QImode usdot_prod expander to enhance vectorization for dotprod. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. Ok for master? gcc/ChangeLog: * config/i386/sse.

Re: [PATCH] i386: Fix wrong result for AMX-TILE intrinsic when parsing expression.

2021-11-03 Thread Hongyu Wang via Gcc-patches
> Could you add a testcase for that? Yes, updated patch. Hongtao Liu via Gcc-patches wrote on Thu, Nov 4, 2021 at 10:25 AM: > > On Thu, Nov 4, 2021 at 9:19 AM Hongyu Wang via Gcc-patches > wrote: > > > > Hi, > > > > _tile_loadd, _tile_stored, _tile_streamloadd intrins

[PATCH] PR target/103069: Relax cmpxchg loop for x86 target

2021-11-12 Thread Hongyu Wang via Gcc-patches
Hi, From the CPU's point of view, getting a cache line for writing is more expensive than reading. See Appendix A.2 Spinlock in: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf The full compare and swap will grab the cache line ex
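
The read-before-write idea in the message above can be sketched with C11 atomics. This is a hand-written illustration of the general technique (check with a plain load before attempting the compare-and-swap), not the code GCC generates under -mrelax-cmpxchg-loop; `try_lock_relaxed`, `unlock`, and the 0/1 lock encoding are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Try once to take a flag-style lock: 0 means free, 1 means held.
   A cheap relaxed load comes first, so a contended lock is only read
   and the compare-and-swap is skipped. */
static bool try_lock_relaxed(atomic_int *lock)
{
    assert(lock != NULL);
    if (atomic_load_explicit(lock, memory_order_relaxed) != 0)
        return false;                       /* already held: no cmpxchg */

    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        lock, &expected, 1,
        memory_order_acquire,               /* ordering on success */
        memory_order_relaxed);              /* ordering on failure */
}

/* Release the lock taken by try_lock_relaxed. */
static void unlock(atomic_int *lock)
{
    assert(lock != NULL);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

A spin loop would call `try_lock_relaxed` repeatedly, ideally with a pause or yield between attempts; while the lock is held, waiters then spin on loads instead of issuing locked writes.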

[PATCH] PR libgomp/103068: Optimize gomp_mutex_lock_slow for x86 target

2021-11-13 Thread Hongyu Wang via Gcc-patches
Hi, From the CPU's point of view, getting a cache line for writing is more expensive than reading. See Appendix A.2 Spinlock in: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf The full compare and swap will grab the cache line e

Re: [PATCH] PR target/103069: Relax cmpxchg loop for x86 target

2021-11-15 Thread Hongyu Wang via Gcc-patches
Thanks for your review, this is the patch I'm going to check-in. Uros Bizjak via Gcc-patches wrote on Mon, Nov 15, 2021 at 4:25 PM: > > On Sat, Nov 13, 2021 at 3:34 AM Hongyu Wang wrote: > > > > Hi, > > > > From the CPU's point of view, getting a cache line for wr

[PATCH] AVX512FP16: Adjust builtin for mask complex fma

2021-10-13 Thread Hongyu Wang via Gcc-patches
Hi, The current mask/mask3 implementation for complex fma contains a duplicated parameter in the macro, which may cause errors at -O0. Refactor the macro implementation into builtins to avoid potential errors. For round intrinsics with NO_ROUND as input, ix86_erase_embedded_rounding erases embedded_rounding unspec
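
One general way a duplicated macro parameter can cause trouble is double evaluation of the argument. The sketch below shows that pitfall in isolation; it is an assumption about the class of problem, not a reproduction of the actual intrinsic macros or of the error seen at -O0, and all names in it are hypothetical.

```c
#include <assert.h>

static int calls = 0;

/* Side-effecting argument: counts how many times it is evaluated. */
static int next_value(void)
{
    return ++calls;
}

/* The parameter appears twice, so the argument is evaluated twice. */
#define TWICE_MACRO(x) ((x) + (x))

/* A function evaluates its argument exactly once. */
static int twice_fn(int x)
{
    return x + x;
}

/* Returns how many times the argument was evaluated through the macro. */
static int evaluations_with_macro(void)
{
    calls = 0;
    (void)TWICE_MACRO(next_value());
    return calls;
}

/* Returns how many times the argument was evaluated through the function. */
static int evaluations_with_function(void)
{
    calls = 0;
    (void)twice_fn(next_value());
    return calls;
}
```

Here `evaluations_with_macro()` reports two evaluations and `evaluations_with_function()` reports one, which is one reason to route such intrinsics through a builtin or inline function rather than a macro that repeats its parameter.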

[PATCH] AVX512FP16: Support vector shuffle builtins

2021-10-13 Thread Hongyu Wang via Gcc-patches
Hi, This patch supports HFmode vector shuffle by creating HImode subreg when expanding permutation expr. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde{-m32,} OK for master? gcc/ChangeLog: * config/i386/i386-expand.c (ix86_expand_vec_perm): Convert HFmode input ope

[PATCH] AVX512FP16: Fix testcase for complex intrinsic

2021-10-14 Thread Hongyu Wang via Gcc-patches
Hi, -march=cascadelake which contains -mavx512vl produces unmatched scan for vf[c]maddcsh test, so add -mno-avx512vl to vf[c]maddcsh-1a.c. Also add scan for vblendmps to vf[c]maddcph tests to check correctness. Tested on unix{-m32,} with -march=cascadelake. Pushed to trunk as obvious fix. gcc/

[PATCH] AVX512FP16: Fix ICE for 2 v4hf vector concat

2021-10-14 Thread Hongyu Wang via Gcc-patches
Hi, For V4HFmode, doing a vector concat like __builtin_shufflevector (a, b, {0, 1, 2, 3, 4, 5, 6, 7}) could trigger an ICE since it is not handled in ix86_vector_init (). Handle HFmode like HImode to avoid such an ICE. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde{-m32,} OK for master

Re: [PATCH] AVX512FP16: Support vector shuffle builtins

2021-10-14 Thread Hongyu Wang via Gcc-patches
> This part seems not related to vector shuffle. Yes, have separated this part to another patch and checked-in. Updated patch. Ok for this one? Hongtao Liu via Gcc-patches wrote on Thu, Oct 14, 2021 at 2:33 PM: > > On Thu, Oct 14, 2021 at 10:39 AM Hongyu Wang via Gcc-patches > wrote

Re: [PATCH] AVX512FP16: Support vector shuffle builtins

2021-10-14 Thread Hongyu Wang via Gcc-patches
hanks for pointing it out, didn't realize the difference between these 2 functions. Updated patch. Hongtao Liu wrote on Fri, Oct 15, 2021 at 1:54 PM: > > On Fri, Oct 15, 2021 at 1:37 PM Hongyu Wang wrote: > > > > > This part seems not related to vector shuffle. > > Yes,

[PATCH] i386: Fix wrong codegen for V8HF move without TARGET_AVX512F

2021-10-19 Thread Hongyu Wang via Gcc-patches
Since the _Float16 type is enabled under the sse2 target, returning a V8HFmode vector without the AVX512F target would generate a wrong vmovdqa64 instruction. Adjust ix86_get_ssemov to avoid this. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. OK for master? gcc/ChangeLog: PR target/102812

Re: [PATCH] i386: Fix wrong codegen for V8HF move without TARGET_AVX512F

2021-10-20 Thread Hongyu Wang via Gcc-patches
__vector_size__ (16))); + +v8hf t (_Float16 a) +{ +return (v8hf) {a, 0, 0, 0, 0, 0, 0, 0}; +} -- 2.18.1 Hongtao Liu via Gcc-patches wrote on Thu, Oct 21, 2021 at 1:24 PM: > > On Wed, Oct 20, 2021 at 1:31 PM Hongyu Wang via Gcc-patches > wrote: > > > > Since _Float16 type is ena

Re: [PATCH] i386: Fix wrong codegen for V8HF move without TARGET_AVX512F

2021-10-21 Thread Hongyu Wang via Gcc-patches
Thanks for the reminder, will adjust the testcase since the output for 128/256bit HFmode load has changed. Martin Liška wrote on Thu, Oct 21, 2021 at 8:49 PM: > > On 10/21/21 07:47, Hongyu Wang via Gcc-patches wrote: > > |Yes, updated patch.| > > Note the patch caused the following test

[PATCH] Adjust testcase for 128/256 bit HF vector load/store

2021-10-21 Thread Hongyu Wang via Gcc-patches
Hi, The HF vector move has been updated to align with the HI vector move; adjust the corresponding testcases for _Float16 vector load and store. Tested on x86_64-pc-linux-gnu{-m32,}, pushed as obvious fix. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-13.c: Adjust scan-assembler for xmm/

Re: [PATCH] testsuite: i386: Fix gcc.target/i386/avx512fp16-trunchf.c on Solaris [PR102835]

2021-10-25 Thread Hongyu Wang via Gcc-patches
> uses %ebp instead of the expected %esp. As Hongyu Wang suggested in the > PR, this can be fixed by accepting both forms, which this patch does. > > Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu

[PATCH] AVX512FP16: Optimize _Float16 reciprocal for div and sqrt

2021-10-26 Thread Hongyu Wang via Gcc-patches
Hi, For the _Float16 type, add insns and expanders to optimize x / y to x * rcp (y), and x / sqrt (y) to x * rsqrt (y). As half float has only a minor precision difference between div and mul * rcp, there is no need for Newton-Raphson approximation. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}

RE: [llvm-dev] [PATCH] Add optional _Float16 support

2021-07-12 Thread Wang, Pengfei via Gcc-patches
> Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. Can you please explain the behavior here? Is there a difference between _Float16 and _Complex _Float16 on return? I.e., 1, In which case will _Float16 values return in both %xmm0 and %xmm1? 2, For a single _Float16 value, ar

RE: [llvm-dev] [PATCH] Add optional _Float16 support

2021-07-13 Thread Wang, Pengfei via Gcc-patches
r for _Complex _Float16 on 32 bits target? Thanks Pengfei -Original Message- From: H.J. Lu Sent: Tuesday, July 13, 2021 10:26 PM To: Wang, Pengfei ; llvm-...@lists.llvm.org Cc: Joseph Myers ; GCC Patches ; GNU C Library ; IA32 System V Application Binary Interface Subject: Re: [llv

RE: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

2021-07-14 Thread Wang, Pengfei via Gcc-patches
* Clang for AArch64 promotes each individual operation and rounds immediately afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two fadd operations. It's implemented in the LLVM backend where we can't see what was originally a single expression. Yes, but this is not con

RE: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

2021-07-14 Thread Wang, Pengfei via Gcc-patches
It seems Clang doesn't support -fexcess-precision=xxx: https://github.com/llvm/llvm-project/blob/main/clang/test/Driver/clang_f_opts.c#L403 Thanks Pengfei -Original Message- From: Hongtao Liu Sent: Thursday, July 15, 2021 2:35 PM To: Wang, Pengfei Cc: Craig Topper ; Jakub Je

Re: [PATCH] i386: Avoid fma_chain for -march=alderlake and sapphirerapids.

2022-12-14 Thread Hongyu Wang via Gcc-patches
If there is no objection, I'm going to backport the m_SAPPHIRERAPIDS and m_ALDERLAKE change to GCC 12. Uros Bizjak via Gcc-patches wrote on Wed, Dec 7, 2022 at 15:11: > > On Wed, Dec 7, 2022 at 7:36 AM Hongyu Wang wrote: > > > > For Alderlake there is similar issu

[PATCH] Fix avx512ne2ps2bf16 wrong code [PR 111127]

2023-08-24 Thread Hongyu Wang via Gcc-patches
Hi, For PR111127, the wrong code was caused by a wrong expander for maskz. Correct the parameter order for the avx512ne2ps2bf16_maskz expander. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}. OK for master and backport to GCC13? gcc/ChangeLog: PR target/111127 * config/i386/sse.

[PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument.

2023-08-31 Thread Hongyu Wang via Gcc-patches
Like base_reg_class, INDEX_REG_CLASS also does not support backend insn. Add index_reg_class with insn argument for lra/reload usage. gcc/ChangeLog: * addresses.h (index_reg_class): New wrapper function like base_reg_class. * doc/tm.texi: Document INSN_INDEX_REG_CLASS.

[PATCH 00/13] [RFC] Support Intel APX EGPR

2023-08-31 Thread Hongyu Wang via Gcc-patches
approach for APX implementation for EGPR component. It may still have potential issues or bugs and requires further optimization. Any comments are much appreciated. [1]. https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. Hongyu Wang (2

[PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling In inline asm, we do not know if the insn can use EGPR, so disable EGPR usage by default by mapping the common reg/mem constraints to non-EGPR constraints. Use the flag -mapx-inline-asm-use-gpr32 to enable EGPR usage for inline asm. gcc/ChangeLog: * config/i386/i386.cc

[PATCH 03/13] [APX_EGPR] Initial support for APX_F

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling Add -mapx-features= enumeration to separate subfeatures of APX_F. -mapxf is treated same as previous ISA flag, while it sets -mapx-features=apx_all that enables all subfeatures. gcc/ChangeLog: * common/config/i386/cpuinfo.h (XSTATE_APX_F): New macro. (XCR_APX

[PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns

2023-08-31 Thread Hongyu Wang via Gcc-patches
For vector move insns like vmovdqa/vmovdqu, their evex counterparts require an explicit suffix 64/32/16/8. The usage of these instructions is prohibited under AVX10_1 or AVX512F, so for AVX2+APX_F we select vmovaps/vmovups for vector load/store insns that contain EGPR. gcc/ChangeLog: * con

[PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling Current reload infrastructure does not support selective base_reg_class for backend insn. Add insn argument to base_reg_class for lra/reload usage. gcc/ChangeLog: * addresses.h (base_reg_class): Add insn argument. Pass to MODE_CODE_BASE_REG_CLASS. (r

[PATCH 05/13] [APX EGPR] Add register and memory constraints that disallow EGPR

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling For APX, as we extended the GENERAL_REG_CLASS, new constraints are needed to restrict insns that cannot adopt EGPR either in its reg or memory operands. gcc/ChangeLog: * config/i386/constraints.md (h): New register constraint for GENERAL_GPR16. (Bt):

[PATCH 07/13] [APX EGPR] Add backend hook for base_reg_class/index_reg_class.

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling Add backend helper functions to verify if a rtx_insn can adopt EGPR to its base/index reg of memory operand. The verification rule goes like 1. For asm insn, enable/disable EGPR by ix86_apx_inline_asm_use_gpr32. 2. Disable EGPR for unrecognized insn. 3. If which_alternat

[PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling Disable EGPR usage for below legacy insns in opcode map2/3 that have vex but no evex counterpart. insn list: 1. phminposuw/vphminposuw 2. ptest/vptest 3. roundps/vroundps, roundpd/vroundpd, roundss/vroundss, roundsd/vroundsd 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm 5.

[PATCH 04/13] [APX EGPR] Add 16 new integer general purpose registers

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling Extend GENERAL_REGS with extra r16-r31 registers like REX registers, named as REX2 registers. They will only be enabled under TARGET_APX_EGPR. gcc/ChangeLog: * config/i386/i386-protos.h (x86_extended_rex2reg_mentioned_p): New function prototype. * con

[PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5)

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling These legacy insns in opcode map0/1 only support GPR16 and do not have vex/evex counterparts, so directly adjust constraints and add gpr32 attr to patterns. insn list: 1. xsave/xsave64, xrstor/xrstor64 2. xsaves/xsaves64, xrstors/xrstors64 3. xsavec/xsavec64 4. xsaveopt/xsaveopt6

[PATCH 10/13] [APX EGPR] Handle legacy insns that only support GPR16 (2/5)

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling These legacy insns in opcode map2/3 have vex but no evex counterpart, disable EGPR for them by adjusting alternatives and attr_gpr32. insn list: 1. phaddw/vphaddw, phaddd/vphaddd, phaddsw/vphaddsw 2. phsubw/vphsubw, phsubd/vphsubd, phsubsw/vphsubsw 3. psignb/vpsignb, psignw/v

[PATCH 12/13] [APX_EGPR] Handle legacy insns that only support GPR16 (4/5)

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling The APX enabled hardware should also be AVX10 enabled, thus for map2/3 insns with evex counterpart, we assume auto promotion to EGPR under APX_F if the insn uses GPR32. So for below insns, we disabled EGPR usage for their sse mnemonics, while allowing egpr generation of their

[PATCH 13/13] [APX EGPR] Handle vex insns that only support GPR16 (5/5)

2023-08-31 Thread Hongyu Wang via Gcc-patches
From: Kong Lingling These vex insns may have legacy counterparts that could support EGPR, but they do not have evex counterparts. Split out their vex part from patterns and set the vex part to non-EGPR supported by adjusting constraints and attr_gpr32. insn list: 1. vmovmskpd/vmovmskps 2. vpmovmskb 3

Re: [PATCH 00/13] [RFC] Support Intel APX EGPR

2023-09-01 Thread Hongyu Wang via Gcc-patches
Richard Biener via Gcc-patches wrote on Thu, Aug 31, 2023 at 17:21: > > On Thu, Aug 31, 2023 at 10:22 AM Hongyu Wang via Gcc-patches > wrote: > > > > Intel Advanced performance extension (APX) has been released in [1]. > > It contains several extensions such as extended 16 general p

Re: [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)

2023-09-01 Thread Hongyu Wang via Gcc-patches
Richard Biener via Gcc-patches wrote on Thu, Aug 31, 2023 at 17:31: > > On Thu, Aug 31, 2023 at 11:26 AM Richard Biener > wrote: > > > > On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches > > wrote: > > > > > > From: Kong Lingling > > > >

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-09-01 Thread Hongyu Wang via Gcc-patches
Jakub Jelinek via Gcc-patches wrote on Thu, Aug 31, 2023 at 17:18: > > On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote: > > From: Kong Lingling > > > > In inline asm, we do not know if the insn can use EGPR, so disable EGPR > > usage by default f

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-09-01 Thread Hongyu Wang via Gcc-patches
Uros Bizjak via Gcc-patches wrote on Thu, Aug 31, 2023 at 18:01: > > On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches > wrote: > > > > On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote: > > > From: Kong Lingling > > > > >

Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-01 Thread Hongyu Wang via Gcc-patches
Uros Bizjak via Gcc-patches wrote on Thu, Aug 31, 2023 at 18:16: > > On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang wrote: > > > > From: Kong Lingling > > > > Current reload infrastructure does not support selective base_reg_class > > for backend insn. Add insn argument

Re: [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns

2023-09-01 Thread Hongyu Wang via Gcc-patches
Jakub Jelinek via Gcc-patches wrote on Thu, Aug 31, 2023 at 17:44: > > On Thu, Aug 31, 2023 at 04:20:19PM +0800, Hongyu Wang via Gcc-patches wrote: > > For vector move insns like vmovdqa/vmovdqu, their evex counterparts > > require explicit suffix 64/32/16/8. The usage of these instruction

Re: [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns

2023-09-01 Thread Hongyu Wang via Gcc-patches
Jakub Jelinek wrote on Fri, Sep 1, 2023 at 17:20: > > On Fri, Sep 01, 2023 at 05:07:53PM +0800, Hongyu Wang wrote: > > Jakub Jelinek via Gcc-patches wrote on Thu, Aug 31, 2023 at > > 17:44: > > > > > > On Thu, Aug 31, 2023 at 04:20:19PM +0800, Hongyu Wang via Gcc-patches > >

Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-09 Thread Hongyu Wang via Gcc-patches
Vladimir Makarov via Gcc-patches wrote on Sat, Sep 9, 2023 at 01:04: > > > On 8/31/23 04:20, Hongyu Wang wrote: > > @@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression > > (@code{MEM} for the top level > > of an address, @code{ADDRESS} for something th

Re: [PATCH] Enable GCC support for AMX

2020-08-31 Thread Hongyu Wang via Gcc-patches
PING^3 Hongyu Wang wrote on Tue, Aug 4, 2020 at 11:40 PM: > > Kirill Yukhin wrote on Tue, Aug 4, 2020 at 10:47 PM: > > > > Hello, > > > > On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote: > > > Hi: > > > > > > This patch is about to support Intel Advanced M

Re: [PATCH] Enable GCC support for AMX

2020-09-11 Thread Hongyu Wang via Gcc-patches
} */ > > +/* { dg-require-effective-target amx_bf16 } */ > > +#include"amxbf16-asmintel-1.c" > > I didn't get it. We usually use second testcase to actually execute > it and (well, a little) verify that semantics is ok. E.g. that > operands order is correct. Could

Re: [PATCH] Enable GCC support for AMX

2020-07-16 Thread Hongyu Wang via Gcc-patches
Update for SAPPHIRERAPIDS and PING Hongyu Wang wrote on Tue, Jul 7, 2020 at 11:24 AM: > > Hi Kirill, could you help review this patch? > > Hongyu Wang wrote on Mon, Jul 6, 2020 at 9:58 AM: > > > > Hi: > > > > This patch is about to support Intel Advanced Matrix Extensions (AMX) > >

Re: [PATCH] Enable GCC support for AMX

2020-07-23 Thread Hongyu Wang via Gcc-patches
PING^2 Hongyu Wang wrote on Fri, Jul 17, 2020 at 1:40 PM: > > Update for SAPPHIRERAPIDS and PING > > Hongyu Wang wrote on Tue, Jul 7, 2020 at 11:24 AM: > > > > > Hi Kirill, could you help review this patch? > > > > Hongyu Wang wrote on Mon, Jul 6, 2020 at 9:58 AM: > > > > > > Hi:

Re: [PATCH] Enable GCC support for AMX

2020-08-04 Thread Hongyu Wang via Gcc-patches
PING^3 Hongyu Wang wrote on Fri, Jul 24, 2020 at 1:41 PM: > > PING^2 > > Hongyu Wang wrote on Fri, Jul 17, 2020 at 1:40 PM: > > > > Update for SAPPHIRERAPIDS and PING > > > > Hongyu Wang wrote on Tue, Jul 7, 2020 at 11:24 AM: > > > > > > > > Hi Kirill, could you help revie
