Hi Richi,
on 2021/5/10 9:55 PM, Richard Biener wrote:
> On Sat, May 8, 2021 at 10:05 AM Kewen.Lin wrote:
>>
>> Hi Richi,
>>
>> Thanks for the comments!
>>
>> on 2021/5/7 5:43 PM, Richard Biener wrote:
>>> On Fri, May 7, 2021 at 5:30 AM Ke
Hi Richard,
on 2021/5/10 10:08 PM, Richard Sandiford wrote:
> "Kewen.Lin via Gcc-patches" writes:
>> on 2021/5/7 5:43 PM, Richard Biener wrote:
>>> On Fri, May 7, 2021 at 5:30 AM Kewen.Lin via Gcc-patches
>>> wrote:
>>>>
>>>> Hi,
Hi Segher,
on 2021/5/11 4:12 AM, Segher Boessenkool wrote:
> Hi!
>
> On Sat, May 08, 2021 at 04:12:18PM +0800, Kewen.Lin wrote:
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -5234,6 +5234,8 @@ typedef struct _rs6000_cost_data
>>
Hi Richi,
> OTOH we already pass scalar_stmt to individual add_stmt_cost,
> so not sure whether the context really matters. That said,
> the density test looks "interesting" ... the intent was that finish_cost
> might look at gathered data from add_stmt, not that it looks at
>
Hi Richi,
Thanks for the review!
on 2021/5/11 9:26 PM, Richard Biener wrote:
> On Fri, 7 May 2021, Kewen.Lin wrote:
>
>> Hi,
>>
>> This patch is to teach forwprop to optimize some cases where the
>> permutated operands of vector permutation are from two same typ
Hi!
>>> But in the end the vector code shouldn't end up worse than the
>>> scalar code with respect to IVs - the cases where it would should
>>> be already costed. So I wonder if you have specific examples
>>> where things go worse enough for the heuristic to trigger?
>>>
>>
>> One typical case t
on 2021/5/17 4:55 PM, Richard Biener wrote:
> On Thu, May 13, 2021 at 9:04 AM Kewen.Lin wrote:
>>
>> Hi!
>>
>>>>> But in the end the vector code shouldn't end up worse than the
>>>>> scalar code with respect to IVs - the cases where it would
Hi,
This patch is to replace the current hardcoded weight factor 50
for those statements in an inner loop relative to the loop being
vectorized with a specific parameter vect-inner-loop-weight-factor.
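
As a rough illustration of what such a weight factor does, here is a minimal sketch in C. The helper name scaled_stmt_count and its arguments are hypothetical and are not taken from the patch; the sketch only assumes that a statement count is multiplied by the factor when the statement sits in an inner loop.

```c
/* Hypothetical helper, not from the patch: scale a statement count by
   WEIGHT_FACTOR when the statement belongs to an inner loop relative to
   the loop being vectorized.  Otherwise the count is left unchanged.  */
static unsigned
scaled_stmt_count (unsigned count, int in_inner_loop, unsigned weight_factor)
{
  return in_inner_loop ? count * weight_factor : count;
}
```

With the hardcoded value this corresponds to passing 50 as weight_factor; a parameter would simply supply that value instead.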
The motivation behind this change is: if targets want to have one
unique function to gather some
on 2022/12/9 06:04, Michael Meissner wrote:
> On Wed, Dec 07, 2022 at 03:55:41PM +0800, Kewen.Lin wrote:
>> Hi Mike,
>>
>> on 2022/12/7 14:44, Michael Meissner wrote:
>>> On Tue, Dec 06, 2022 at 05:36:54PM +0800, Kewen.Lin wrote:
>>>> Hi Mike,
>>>
on 2022/12/12 11:23, HAO CHEN GUI wrote:
> Hi Kewen,
>
> On 2022/12/8 16:47, Kewen.Lin wrote:
>> This documentation update reminds me that the current prototype of
>> __ieee128
>> variant can be:
>>
>> unsigned int scalar_extract_exp (__ieee128 source);
>
on 2022/12/13 14:14, Michael Meissner wrote:
> On Mon, Dec 12, 2022 at 06:20:14PM +0800, Kewen.Lin wrote:
>> Without or with patch #1, the below ICE in libgcc exists, the ICE should have
>> nothing to do with the special handling for building_libgcc in patch #1. I
>> think
on 2022/12/6 19:27, Kewen.Lin via Gcc-patches wrote:
> Hi Mike,
>
> Thanks for fixing this, some comments are inlined below.
>
> on 2022/11/2 10:42, Michael Meissner wrote:
>> This patch fixes the issue that GCC cannot build when the default long double
>> is IEEE 128
Hi Jeff,
on 2022/12/12 09:44, Jiufu Guo via Gcc-patches wrote:
> Hi,
>
> Compared with the previous patch, this patch is updated according to comments, fixes
> conflicts with trunk, and rechecks bootstrap&regtest.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607333.html
>
> For a complicate 64bi
Hi Jakub,
Thanks for the comments!
on 2022/12/14 17:36, Jakub Jelinek wrote:
> On Wed, Dec 14, 2022 at 04:46:07PM +0800, Kewen.Lin via Gcc-patches wrote:
>> on 2022/12/6 19:27, Kewen.Lin via Gcc-patches wrote:
>>> Hi Mike,
>>>
>>> Thanks for fixing t
Hi Jeff,
on 2022/12/12 09:38, Jiufu Guo via Gcc-patches wrote:
> Hi,
>
> For constant C:
> If '(c & 0x8000ULL) == 0x8000ULL' or say:
> 32(1) || 16(x) || 1(1) || 15(x), using "li; xoris" would be ok.
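
The "li; xoris" case above can be checked with a small model in C. This is only a sketch: the function names are made up for illustration, the two helpers model just the arithmetic effect of the instructions (sign-extended 16-bit load, XOR with an immediate shifted left by 16), and the check follows the bit layout 32(1) || 16(x) || 1(1) || 15(x) as written above.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "li rD, SI": load a 16-bit immediate, sign-extended to 64 bits.  */
static uint64_t
emu_li (uint16_t imm)
{
  return (imm & 0x8000u) ? (0xFFFFFFFFFFFF0000ULL | imm) : (uint64_t) imm;
}

/* Model of "xoris rA, rS, UI": XOR rS with the immediate shifted left by 16.  */
static uint64_t
emu_xoris (uint64_t rs, uint16_t imm)
{
  return rs ^ ((uint64_t) imm << 16);
}

/* Layout 32(1) || 16(x) || 1(1) || 15(x): high 32 bits all ones, bit 15 set.  */
static int
fits_li_xoris (uint64_t c)
{
  return (c >> 32) == 0xFFFFFFFFULL && (c & 0x8000ULL) == 0x8000ULL;
}

/* Build C with the two modelled instructions.  Only valid when
   fits_li_xoris (c) holds: li leaves bits 16..63 all ones, and xoris
   then flips exactly the bits in 16..31 that must be zero in C.  */
static uint64_t
build_li_xoris (uint64_t c)
{
  uint16_t low = (uint16_t) (c & 0xFFFF);
  uint16_t flip = (uint16_t) (~(c >> 16) & 0xFFFF);
  assert (fits_li_xoris (c));
  return emu_xoris (emu_li (low), flip);
}
```

For a constant with this layout, build_li_xoris returns the constant itself, which is why two instructions are enough.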
>
> If '(c & 0x80008000ULL) == 0x8000ULL' or say:
> 32(0) ||
Hi,
As PR106736 shows, it's unexpected to use __vector_quad and
__vector_pair types without MMA support, it would cause ICE
when expanding the corresponding assignment. We can't guard
these built-in types registering under MMA support as Peter
pointed out in that PR, because the registering is gl
Hi,
Gentle ping this series:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html
BR,
Kewen
on 2022/11/24 17:15, Kewen Lin wrote:
> Hi,
>
> Following Segher's suggestion, this patch series is to rework
> function rs6000_emit_vector_compare for vector float and int
> in multiple
Hi,
Gentle ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603350.html
BR,
Kewen
> on 2022/10/12 16:12, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> PR106680 shows that -m32 -mpowerpc64 is different from
>> -mpowerpc64 -m32, this is determined
Hi,
Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607526.html
BR,
Kewen
on 2022/11/30 16:30, Kewen.Lin via Gcc-patches wrote:
> Hi,
>
> As PR104024 shows, the option -mpower10-fusion isn't guarded by
> -mcpu=power10, it causes compiler to fuse for som
Hi,
Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607527.html
BR,
Kewen
on 2022/11/30 16:30, Kewen.Lin via Gcc-patches wrote:
> Hi,
>
> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
> if fun->decl is not null but no cgraph node is availab
>> I bet the above workaround in generic code was added for a reason, it would
>> surprise me if _Float128 worked at all without that hack.
>
> OK, I'll have a look at those nan failures soon.
By investigating the exposed NaN failures, I found the cause is that it wants
to convert _Float128 type c
on 2022/12/14 18:33, Jakub Jelinek wrote:
> On Wed, Dec 14, 2022 at 06:11:26PM +0800, Kewen.Lin wrote:
>>> The hacks with different precisions of powerpc 128-bit floating types are
>>> very unfortunate, it is I assume because the middle-end asserted that scalar
>>
Hi Honza,
Thanks for the comments.
on 2022/12/14 21:22, Jan Hubicka wrote:
>>> PR middle-end/105818
>>>
>>> gcc/ChangeLog:
>>>
>>> * predict.cc (optimize_function_for_size_p): Further check
>>> optimize_size of fun->decl when it is valid but no cgraph node.
>>>
>>> gcc/testsuite/Chang
Hi Segher,
Thanks for the review comments!
on 2022/12/15 06:29, Segher Boessenkool wrote:
> On Wed, Nov 30, 2022 at 04:30:13PM +0800, Kewen.Lin wrote:
>> As PR104024 shows, the option -mpower10-fusion isn't guarded by
>> -mcpu=power10, it causes compiler to fuse for some patt
Hi,
In function fold_convert_const_real_from_real, when the modes of
two types involved in fp conversion are the same, we can simply
take it as copy, rebuild with the exactly same TREE_REAL_CST and
the target type. It is more efficient and helps to avoid possible
unexpected signalling bit clearin
Hi Richi,
Thanks for the comments!
on 2022/12/19 16:49, Richard Biener wrote:
> On Mon, Dec 19, 2022 at 9:12 AM Kewen.Lin wrote:
>>
>> Hi,
>>
>> In function fold_convert_const_real_from_real, when the modes of
>> two types involved in fp conversion are the same,
on 2022/12/20 20:14, Jakub Jelinek wrote:
> On Mon, Dec 19, 2022 at 04:11:59PM +0800, Kewen.Lin wrote:
>> In function fold_convert_const_real_from_real, when the modes of
>> two types involved in fp conversion are the same, we can simply
>> take it as copy, rebuild
on 2022/12/21 02:56, Segher Boessenkool wrote:
> On Wed, Dec 14, 2022 at 07:21:20PM +0800, Kewen.Lin wrote:
>> I'm going to push this next week if no objections.
>
> Please do?
>
Thanks! Committed in r13-4814-g282462b39584ae.
BR,
Kewen
Hi,
The hunk that sets flag OPTION_MASK_P10_FUSION is wrongly located
between the if and else if blocks for OPTION_MASK_MMA. This patch
fixes that oversight.
Bootstrapped and regtested on powerpc64-linux-gnu P8 and
powerpc64le-linux-gnu P9 and P10.
IMO this is obvious, already committe
Hi Segher,
on 2022/12/20 21:19, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Dec 19, 2022 at 02:13:49PM +0800, Kewen.Lin wrote:
>> on 2022/12/15 06:29, Segher Boessenkool wrote:
>>> On Wed, Nov 30, 2022 at 04:30:13PM +0800, Kewen.Lin wrote:
>>>> --- a/gcc/confi
Hi,
This is a different attempt from Mike's approach[1][2] to fix the
issue in PR107299. With option -mabi=ieeelongdouble specified,
type long double (and __float128) and _Float128 have the same
mode TFmode, but they have different type precisions, which causes
the assertion to fail in function fold_us
Hi Segher,
on 2022/12/22 05:24, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Dec 21, 2022 at 05:02:17PM +0800, Kewen.Lin wrote:
>> This is a different attempt from Mike's approach[1][2] to fix the
>> issue in PR107299.
>
> Ke Wen, Mike: so iiuc with this patch appli
Hi Joseph,
on 2022/12/22 05:40, Joseph Myers wrote:
> On Wed, 21 Dec 2022, Segher Boessenkool wrote:
>
>>> --- a/gcc/tree.cc
>>> +++ b/gcc/tree.cc
>>> @@ -9442,15 +9442,6 @@ build_common_tree_nodes (bool signed_char)
>>>if (!targetm.floatn_mode (n, extended).exists (&mode))
>>> contin
Hi Segher,
on 2022/12/24 04:26, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Oct 12, 2022 at 04:12:21PM +0800, Kewen.Lin wrote:
>> PR106680 shows that -m32 -mpowerpc64 is different from
>> -mpowerpc64 -m32, this is determined by the way how we
>> handle option powerp
Hi,
As Honza pointed out in [1], the current uses of function
optimize_function_for_speed_p in rs6000_option_override_internal
are too early, since the query results from the functions
optimize_function_for_{speed,size}_p could be changed later due
to profile feedback and some function attributes
Hi,
We noticed this issue when Segher reviewed the patch for
PR104024. When there is no explicit setting for option
-mpower10-fusion, we enable OPTION_MASK_P10_FUSION for
TARGET_POWER10. But it's not right, it should honour
tuning setting instead.
This patch is to fix it accordingly, it's boots
Hi Segher,
Thanks for the comments.
on 2023/1/4 18:46, Segher Boessenkool wrote:
> On Wed, Jan 04, 2023 at 05:20:14PM +0800, Kewen.Lin wrote:
>> As Honza pointed out in [1], the current uses of function
>> optimize_function_for_speed_p in rs6000_option_override_internal
>>
on 2023/1/4 22:02, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Jan 04, 2023 at 08:15:03PM +0800, Kewen.Lin wrote:
>> on 2023/1/4 18:46, Segher Boessenkool wrote:
>>>> @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx
>>>> tlsarg, rtx co
Hi,
As PR108272 shows, there are some invalid uses of MMA opaque
types in inline asm statements. This patch is to teach the
function rs6000_opaque_type_invalid_use_p for inline asm,
check and error any invalid use of MMA opaque types in input
and output operands.
Bootstrapped and regtested on po
Hi,
Before r13-4894, if 64 bit is explicitly specified, option
powerpc64 is explicitly enabled too; while if 64 bit is
implicitly enabled and there is no explicit setting for
option powerpc64, whether option powerpc64 is eventually
enabled relies on the default value of the cpu in use.
It's initi
Hi Pat,
on 2023/1/6 03:30, Pat Haugen wrote:
> On 1/4/23 3:20 AM, Kewen.Lin via Gcc-patches wrote:
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 88c865b6b4b..6fa084c0807 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/
on 2023/1/6 17:28, Kewen.Lin via Gcc-patches wrote:
> Hi Pat,
>
> on 2023/1/6 03:30, Pat Haugen wrote:
>> On 1/4/23 3:20 AM, Kewen.Lin via Gcc-patches wrote:
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 88c865b6b4b..6fa084
Hi,
When testing one patch which adds a fortran test case into
test bucket powerpc/ppc-fortran/, I found one unexpected
failure on a non-PowerPC target. The cause is that
ppc-fortran.exp does not exit early if the testing target
isn't a PowerPC target. This patch is to make it exit
immediately if
Hi,
As PR108240 shows, some options like -mmodulo can enable some
flags implicitly, including OPTION_MASK_VSX. But the enabled
flag can conflict with some existing setting like soft float,
which would result in unexpected cases and a consequent ICE.
Actually there are already some checkings for VS
on 2023/1/6 17:26, Kewen.Lin via Gcc-patches wrote:
> Hi,
>
> As PR108272 shows, there are some invalid uses of MMA opaque
> types in inline asm statements. This patch is to teach the
> function rs6000_opaque_type_invalid_use_p for inline asm,
> check and error any invalid
Hi,
PR108348 shows a special case where MMA opaque types are
used as function arguments and passed by reference, which
results in a copy from the argument to a temporary variable.
Since this copy happens before the rs6000_function_arg check,
it can cause an ICE without MMA support. This patc
Hi,
As Honza pointed out in [1], the current uses of function
optimize_function_for_speed_p in rs6000_option_override_internal
are too early, since the query results from the functions
optimize_function_for_{speed,size}_p could be changed later due
to profile feedback and some function attributes
Hi Segher,
Thanks for the review comments!
on 2023/1/16 16:49, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Jan 16, 2023 at 04:33:36PM +0800, Kewen.Lin wrote:
>> PR108348 shows one special case that MMA opaque types are
>> used in function arguments and treated as pas
Hi,
Now we will check optimize_function_for_speed_p (cfun) for
TARGET_SAVE_TOC_INDIRECT if it's implicitly enabled. But
the effect of -msave-toc-indirect is actually to save the
TOC in the prologue for indirect calls rather than inline,
it's also good for optimize_function_for_size? So this
patc
Hi Segher!
on 2023/1/16 18:40, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Jan 16, 2023 at 05:20:56PM +0800, Kewen.Lin wrote:
>> on 2023/1/16 16:49, Segher Boessenkool wrote:
>>>> +/* { dg-require-effective-target powerpc_p9modulo_ok } */
>>>
>>>
Hi,
As Andrew pointed out in PR108396, there is one typo in
rs6000-overload.def on built-in function vec_vsubcuq:
[VEC_VSUBCUQ, vec_vsubcuqP, __builtin_vec_vsubcuq]
"vec_vsubcuqP" should be "vec_vsubcuq", this typo caused
us to define vec_vsubcuqP in rs6000-vecdefines.h instead
of vec_vsubcuq,
Hi Segher,
on 2023/1/16 23:24, Segher Boessenkool wrote:
> On Mon, Jan 16, 2023 at 09:05:38PM +0800, Kewen.Lin wrote:
>>> The *_ok things should only be used for features that can be disabled
>>> during configuration, or features that we *want* users to be able to
>>>
Hi,
As Segher suggested in [1], this patch is to refactor the
script genfusion.pl for generating fusion.md.
It mainly consists of:
1) Add main subroutine, which calls several backbone
subroutines, hope it can show the skeleton clearly.
2) Encapsulate copyright and top comments emission t
Hi,
So that the previous refactoring patch does not need to
re-generate fusion.md, and to make the review easier,
I didn't merge this patch into the previous one.
But I think this one helps make the subroutine
gen_logical_addsubf_scalar clearer, by separating
logical-logical and add-logical handl
Hi Mike,
Thanks for the comments!
on 2023/1/18 04:57, Michael Meissner wrote:
> On Mon, Jan 16, 2023 at 05:39:04PM +0800, Kewen.Lin wrote:
>> Hi,
>>
>> Now we will check optimize_function_for_speed_p (cfun) for
>> TARGET_SAVE_TOC_INDIRECT if it's implicitly enabl
8/24 09:24, Xionghu Luo wrote:
> Subject:
> Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the
> UNSPECS [PR106069]
> From:
> Xionghu Luo
> Date:
> 2022/8/24, 09:24
>
> To:
> "Kewen.Lin" , Segher Boessenkool
>
> Cc:
> Xiongh
Hi,
Like r14-3317, which moves the handling of memory access
type VMAT_GATHER_SCATTER in the final loop nest of
vectorizable_load, this one deals with the vectorizable_store side.
Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
Is it ok for trunk?
BR
Hi Richi,
on 2023/8/22 20:17, Richard Biener wrote:
> On Tue, Aug 22, 2023 at 10:44 AM Kewen.Lin wrote:
>>
>> Hi,
>>
>> Now we use DR_GROUP_STORE_COUNT to record how many stores
>> in a group have been transformed and only do the actual
>> transform when
on 2023/8/22 20:32, Richard Biener wrote:
> On Tue, Aug 22, 2023 at 10:45 AM Kewen.Lin wrote:
>>
>> Hi,
>>
>> To avoid some duplicates in some follow-up patches on
>> function vectorizable_store, this patch is to adjust some
>> existing vec with
Hi Peter,
on 2023/8/24 10:07, Peter Bergner wrote:
> On 8/21/23 8:51 PM, Kewen.Lin wrote:
>>> The following patch has been bootstrapped and regtested on powerpc64-linux.
>>
>> I think we should test this on powerpc64le-linux P8 or P9 (no P10) as well.
>
> That'
on 2023/8/25 11:20, Peter Bergner wrote:
> On 8/24/23 12:56 AM, Kewen.Lin wrote:
>> By looking into the uses of function rs6000_pcrel_p, I think we can
>> just replace it with TARGET_PCREL. Previously we don't require PCREL
>> unset for any unsupported target/OS, so w
on 2023/8/26 06:04, Peter Bergner wrote:
> On 8/25/23 6:20 AM, Kewen.Lin wrote:
>> Assuming the current PCREL_SUPPORTED_BY_OS unchanged, when
>> PCREL_SUPPORTED_BY_OS is true, all its required conditions are
>> satisfied, it should be safe. while PCREL_SUPPORTED_BY_OS is
>
Hi Carl,
on 2023/8/25 03:53, Carl Love wrote:
> GCC maintainers:
>
> Version 3, fixed the built-in instance names. Missed removing the "n"
> the name. Added the tighter constraints on the predicates for the
> define_insn. Updated the wording for the built-ins in the
> documentation file. Chan
Hi Haochen,
on 2023/8/25 14:44, HAO CHEN GUI wrote:
> Hi,
> This patch enables SImode in FP register on P7. Instruction "fctiw"
> stores its integer output in an FP register. So SImode in FP register
> needs be enabled on P7 if we want support "fctiw" on P7.
>
It sounds reasonable to support S
Hi Haochen,
on 2023/8/25 14:44, HAO CHEN GUI wrote:
> Hi,
> This patch implements 32bit inline lrint by "fctiw". It depends on
> the patch1 to do SImode move from FP register on P7.
>
> Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
>
> Thanks
> Gui Haochen
>
> Ch
Hi Carl,
on 2023/8/29 04:00, Carl Love wrote:
>
> GCC maintainers:
>
> Version 4, additional define_insn name fix. Change Log fix for the
> UNSPEC_DQUAN. Retested patch on Power 10 LE.
>
> Version 3, fixed the built-in instance names. Missed removing the "n"
> the name. Added the tighter co
Hi Haochen,
on 2023/8/29 10:50, HAO CHEN GUI wrote:
> Hi,
> This patch adds "TARGET_64BIT" check when calling vector load/store
> with length expand in expand_block_move. It matches the expand condition
> of "lxvl" and "stxvl" defined in vsx.md.
>
> This patch fixes the ICE occurred with the
on 2023/8/31 13:47, HAO CHEN GUI wrote:
> Kewen,
> I refined the patch according to your comments and it passed bootstrap
> and regression test.
>
> I committed it as
> https://gcc.gnu.org/g:946b8967b905257ac9f140225db744c9a6ab91be
Thanks! We want this to be backported, so it's also ok for b
Hi Peter,
on 2023/8/31 06:42, Peter Bergner wrote:
> Commit r14-3258-ge7a36e4715c716 increased the amount of folding we perform,
> leading to better code. Update the expected instruction counts to match the
> the number of associated vec_* built-in calls.
>
> Tested on powerpc64le-linux with no
Hi Haochen,
on 2023/9/4 13:33, HAO CHEN GUI wrote:
> Hi,
> This patch enables SImode in FP registers on P7. Instruction "fctiw"
> stores its integer output in an FP register. So SImode in FP register
> needs be enabled on P7 if we want support "fctiw" on P7.
>
> The test case is in the second
Hi Haochen,
on 2023/9/4 13:33, HAO CHEN GUI wrote:
> Hi,
> This patch implements 32bit inline lrint by "fctiw". It depends on
> the patch1 to do SImode move from FP registers on P7.
>
> Compared to last version, the main change is to add tests for "lrintf"
> and adjust the count of correspond
Hi Ajit,
on 2023/8/31 18:44, Ajit Agarwal via Gcc-patches wrote:
>
> This patch removes zero extension from vctzlsbb as it already zero extends.
> Bootstrapped and regtested on powerpc64-linux-gnu.
>
> Thanks & Regards
> Ajit
>
> rs6000: unnecessary clear after vctzlsbb in vec_first_match_or_eo
Hi,
on 2023/9/13 00:39, Ajit Agarwal wrote:
> This patch removes zero extension from vctzlsbb as it already zero extends.
> Bootstrapped and regtested on powerpc64-linux-gnu.
>
> Thanks & Regards
> Ajit
>
> rs6000: unnecessary clear after vctzlsbb in vec_first_match_or_eos_index
>
> For rs6000
Hi,
As PR111380 (and the discussion in related PRs) shows, the way
function rs6000_can_inline_p currently treats a callee
without any target option node is wrong. It considers it
always safe to inline this kind of callee, but actually its
target flags are from the command line options
(target_optio
Hi,
PR111366 exposes one thing that can be improved in function
rs6000_update_ipa_fn_target_info: skipping a given empty
inline asm string, since it's impossible for it to adopt any
hardware features (so far HTM).
Since this rs6000_update_ipa_fn_target_info related approach
exists in GCC12 and later,
Hi Haochen,
on 2023/9/14 16:35, HAO CHEN GUI wrote:
> Hi Kewen,
>
> On 2023/9/12 17:33, Kewen.Lin wrote:
>> Ok, at least regression testing doesn't expose any needs to do disparaging
>> for this. Could you also test this patch with SPEC2017 for P7 and P8
>> separatel
Hi Bin,
Thanks for the review!!
on 2020/8/8 4:01 PM, Bin.Cheng wrote:
> Hi Kewen,
> Sorry for the late reply.
> The patch's most important change is below cost computation:
>
>> @@ -5890,6 +5973,10 @@ determine_iv_cost (struct ivopts_data *data, struct
>> iv_cand *cand)
>> cost_step = add_co
Hi Segher,
on 2020/8/7 10:42 PM, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Aug 07, 2020 at 10:44:10AM +0800, Kewen.Lin wrote:
>>> I think this makes a lot of sense.
>>>
>>>> btw, not sure whether it's a good idea to move target_option_override_hook
>
Hi Bin,
on 2020/8/10 8:38 PM, Bin.Cheng wrote:
> On Mon, Aug 10, 2020 at 12:27 PM Kewen.Lin wrote:
>>
>> Hi Bin,
>>
>> Thanks for the review!!
>>
>> on 2020/8/8 4:01 PM, Bin.Cheng wrote:
>>> Hi Kewen,
>>> Sorry for the late reply.
>>
Hi,
As the PR comments show, the case gcc.dg/gomp/pr82374.c has failed
on Power7 since gcc8, but it passes from gcc10 on. Looking
into the difference, the cause is that gcc10 sets -fno-common
as default, which makes the vectorizer force the alignment and
be able to use aligned vector load/store on those t
Hi Richard,
Thanks for the comments!
on 2020/8/13 12:10 AM, Richard Sandiford wrote:
> "Kewen.Lin" writes:
>> Hi Segher,
>>
>> on 2020/8/7 10:42 PM, Segher Boessenkool wrote:
>>> Hi!
>>>
>>> On Fri, Aug 07, 2020 at 10:44:10AM +0800
Hi Bin,
> I see, it's similar to the auto-increment case where cost should be
> recorded only once. So this is okay given 1) fine predicting
> rtl-unroll is likely impossible here; 2) the patch has very limited
> impact.
>
Really appreciate your time and patience!
I extended the previous versio
Hi Segher,
on 2020/8/15 6:01 AM, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Aug 14, 2020 at 01:42:24PM +0800, Kewen.Lin wrote:
>>> I think personally I'd prefer an option (3): call
>>> target_option_override_hook directly in decode_options,
>>> if help_op
Hi Richard,
>> Yeah, the comments were confusing, its intent is to check which targets
>> support partial vectors and which usage to be used.
>>
>> How about to update them like:
>>
>> "Return true if loops using partial vectors are supported and usage kind is
>> 1/2".
>
> I wasn't really comment
Hi Bin,
>>
>> For one particular case like:
>>
>> for (i = 0; i < SIZE; i++)
>> y[i] = a * x[i] + z[i];
>>
>> we will mark reg_offset_p for IV candidates on x as below:
>>- (unsigned long) (x_18(D) + 8)// only mark this before.
>>- x_18(D) + 8
>>- (unsigne
Hi,
This patch is to backport the fix for PR92923 and its subsequent fix for
PR93136 to the GCC-9 branch. We found that builtin functions needlessly
using VIEW_CONVERT_EXPRs on their operands can cause
remarkable performance issues, especially when they are in a hotspot.
One typical case is
h
Hi Richard,
>
>> +# Return true if loops using partial vectors are supported but only for
>> loops
>> +# whose need to iterate can be removed, that is, value of
>> +# param_vect_partial_vector_usage is set to 1.
>
> For these comments, I think it would be good to use the sourcebuild.texi
> word
Hi,
I'd like to gentle ping this since IVOPTs part is already to land.
https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546698.html
BR,
Kewen
on 2020/5/28 8:19 PM, Kewen.Lin via Gcc-patches wrote:
>
> gcc/ChangeLog
>
> 2020-MM-DD Kewen Lin
>
> * cfgloop.h (
Hi,
Power9 supports vector load/store with length in bytes. This patch
teaches check_effective_target_vect_len_load_store to treat it
and later processors as effective vector-with-length targets.
It also supplements the documentation for has_arch_pwr*.
Bootstrapped/regtested on powerpc64le-linux-gnu P8.
I
Hi Will,
Thanks for the review!
on 2020/9/1 1:13 AM, will schmidt wrote:
> On Mon, 2020-08-31 at 14:43 +0800, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> Power9 supports vector with length in bytes load/store, this patch
>> is to teach check_effective_target_vec
Hi Segher,
>> proc check_effective_target_vect_len_load_store { } {
>> -return 0
>> +return [expr { [check_effective_target_has_arch_pwr9] }]
>> }
>
> Why not just
>
> return check_effective_target_has_arch_pwr9;
>
> ? (Or lose at least two pairs of brackets if not all three :-) )
Hi,
This is a trivial patch to clean up the existing rs6000 test targets
p8 and p9+ by using the existing has_arch_pwr8 and has_arch_pwr9
targets, either combined or individually. I'm not sure whether it's a
good idea to tidy this, but I'm sending it out for comments.
Bootstrapped/regtested on powerpc64le-linux-gnu P9.
Any commen
Hi Segher,
on 2020/9/1 3:41 AM, Segher Boessenkool wrote:
> Hi!
>
> Just a note:
>
> On Tue, Aug 25, 2020 at 08:46:55PM +0800, Kewen.Lin wrote:
>> 1) Currently address_cost hook on rs6000 always return zero, but at least
>> from Power7, pre_inc/pre_dec kind instructio
Hi Bin,
>> 2) This case makes me think we should exclude ainc candidates in function
>> mark_reg_offset_candidates. The justification is that: ainc candidate
>> handles step update itself and when we calculate the cost for it against
>> its ainc_use, the cost_step has been reduced. When unrolling
Hi Bin,
I've updated the patch to punt ainc_use candidates as below:
> + /* Skip AINC candidate since it contains address update itself,
> +the replicated AINC computations when unrolling still have
> +updates, unlike reg_offset_p candidates ca
Hi Segher,
on 2020/9/2 6:25 PM, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Sep 02, 2020 at 11:16:00AM +0800, Kewen.Lin wrote:
>> on 2020/9/1 3:41 AM, Segher Boessenkool wrote:
>>> On Tue, Aug 25, 2020 at 08:46:55PM +0800, Kewen.Lin wrote:
>>>> 1) Currentl
Hi Segher,
>> Good question! I agree that they can execute in parallel, but it depends
>> on how we interpret the addressing cost, if it's for required execution
>> resource, I think it's off, since comparing with ld, the ldu has two iops
>> and extra ALU requirement.
>
> OTOH, if you do not us
Hi Andrea,
on 2020/9/4 8:11 PM, Andrea Corallo wrote:
> Hi all,
>
> just a small patch removing a piece of unreachable code in
> 'vect_estimate_min_profitable_iters' given the condition
> (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) is always true as
> checked just above.
>
FWIW, I had the
Hi Segher,
on 2020/9/4 10:16 PM, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Sep 04, 2020 at 04:47:37PM +0800, Kewen.Lin wrote:
>>>> Apart from that, one P9 specific point is that the update form load isn't
>>>> preferred, the reason is that the instructio
Hi,
This patch is to make vector CTOR with char/short leverage direct
move instructions when they are available. With one constructed
test case, it can speed up 145% for char and 190% for short on P9.
Tested SPEC2017 x264_r at -Ofast on P9, it gets 1.61% speedup
(but based on unexpected SLP see