[PATCH 3/4] [PATCH 3/4] x86: Properly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS

2021-09-15 Thread lili.cui--- via Gcc-patches
From: "H.J. Lu" Check TARGET_USE_VECTOR_FP_CONVERTS or TARGET_USE_VECTOR_CONVERTS when handling avx_partial_xmm_update attribute. Don't convert AVX partial XMM register update if vector packed SSE conversion should be used. gcc/ PR target/101900 * config/i386/i386-features.c (r

[PATCH 2/4] [PATCH 2/4] x86: Update memcpy/memset inline strategies for -mtune=tremont

2021-09-15 Thread lili.cui--- via Gcc-patches
From: "H.J. Lu" Simply memcpy and memset inline strategies to avoid branches for -mtune=tremont: 1. Create Tremont cost model from generic cost model. 2. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector load and store for up to 16 * 16 (256) bytes when the data size is fi

[PATCH 4/4] [PATCH 4/4] x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY

2021-09-15 Thread lili.cui--- via Gcc-patches
From: "H.J. Lu" 1. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY in SSE FP to FP splitters. 2. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY in SSE INT to FP splitters. 3. Also check TARGET_SSE_PARTIAL_REG

[PATCH 1/4] [PATCH 1/4] x86: Update -mtune=tremont

2021-09-15 Thread lili.cui--- via Gcc-patches
From: "H.J. Lu" Initial -mtune=tremont update 1. Use Haswell scheduling model. 2. Assume that stack engine allows to execute push&pop instructions in parall. 3. Prepare for scheduling pass as -mtune=generic. 4. Use the same issue rate as -mtune=generic. 5. Enable partial_reg_dependency. 6. Disab

[PATCH 0/4] Update mtune=tremont

2021-09-15 Thread lili.cui--- via Gcc-patches
From: "Cui,Lili" Hi, I have four patches for tremont tuning, With all patches applied, performance impacts on SPEC CPU 2017 are: 500.perlbench_r 1.81% 502.gcc_r 0.57% 505.mcf_r 1.16% 520.omnetpp_r 0.00% 523.xalancbmk_r 0.