On 2020/9/15 14:51, Richard Biener wrote:
>> I only see VAR_DECL and PARM_DECL. Is there any function to check whether the tree
>> variable is global? I added DECL_REGISTER, but the RTL still expands to
>> the stack:
>
> is_global_var () or alternatively !auto_var_in_fn_p (), I think doing
> IFN_SET onl
On 2020/9/14 17:47, Richard Biener wrote:
On Mon, Sep 14, 2020 at 10:05 AM luoxhu wrote:
Not sure whether this reflects the issues you discussed above.
I constructed the test cases below and tested with and without this patch;
only for "a+c" (which means store only) is the performance getting ba
On 2020/9/10 18:08, Richard Biener wrote:
> On Wed, Sep 9, 2020 at 6:03 PM Segher Boessenkool
> wrote:
>>
>> On Wed, Sep 09, 2020 at 04:28:19PM +0200, Richard Biener wrote:
>>> On Wed, Sep 9, 2020 at 3:49 PM Segher Boessenkool
>>> wrote:
Hi!
On Tue, Sep 08, 2020 at 10:26:51
On 2020/9/8 16:26, Richard Biener wrote:
>> It seems it is not only a pseudo. For example, in "v = vec_insert (i, v, n);"
>> the vector variable will be stored to the stack first, so [r112:DI] is a
>> memory to be processed here. So the patch loads it from the stack (insn #10) into a
>> temp vector register first, and st
Hi Richi,
On 2020/9/7 19:57, Richard Biener wrote:
> + if (TREE_CODE (to) == ARRAY_REF)
> + {
> + tree op0 = TREE_OPERAND (to, 0);
> + if (TREE_CODE (op0) == VIEW_CONVERT_EXPR
> + && expand_view_convert_to_vec_set (to, from, to_rtx))
> + {
> +
Hi,
On 2020/9/4 18:23, Segher Boessenkool wrote:
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 03b00738a5e..00c65311f76 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
/* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2)
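A minimal sketch of the shape that comment describes for vec_insert, where (as in "vec_insert (i, v, n)") arg0 is the value, arg1 the vector and arg2 the index: copy the vector into a temporary, view it through a pointer to its element type, and store through the offset pointer. It assumes GCC's generic vector_size extension in place of the AltiVec types and masks the index with n & 3, which is an assumption about out-of-range handling; insert_via_memory is a hypothetical name.

```c
#include <stddef.h>

/* Assumption: GCC's generic vector extension stands in for the
   AltiVec "vector signed int" type handled in rs6000-c.c.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper following the shape of
   *(((arg1_inner_type*)&(vector type){arg1})+arg2), used as the
   store target for arg0.  */
v4si
insert_via_memory (int value, v4si vec, size_t n)
{
  v4si tmp = vec;                 /* temporary copy of the vector */
  int *elts = (int *) &tmp;       /* arg1_inner_type * view of it */
  *(elts + (n & 3)) = value;      /* index masked to 0..3 (assumed) */
  return tmp;
}
```

Going through memory like this is exactly what forces the stack round trip discussed in this thread; the patches aim to expand the same operation in registers instead.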
On 2020/9/4 15:23, Richard Biener wrote:
> On Fri, Sep 4, 2020 at 9:19 AM Richard Biener
> wrote:
>>
>> On Fri, Sep 4, 2020 at 8:38 AM luoxhu wrote:
>>>
>>>
>>>
>>> On 2020/9/4 14:16, luoxhu via Gcc-patches wrote:
>>>>
On 2020/9/4 14:16, luoxhu via Gcc-patches wrote:
Hi,
Yes, I checked and found that neither vec_set nor vec_extract supports a
variable index on most targets; store_bit_field_1 and extract_bit_field_1
only consider using optabs when the index is a constant integer. Anyway, it
shouldn
Hi,
On 2020/9/3 18:29, Richard Biener wrote:
> On Thu, Sep 3, 2020 at 11:20 AM luoxhu wrote:
>>
>>
>>
>> On 2020/9/2 17:30, Richard Biener wrote:
so maybe bypass convert_vector_to_array_for_subscript in special cases
like "i = v[n%4]" or "v[n&3]=i" to generate vec_extract
On 2020/9/2 17:30, Richard Biener wrote:
>> so maybe bypass convert_vector_to_array_for_subscript in special cases
>> like "i = v[n%4]" or "v[n&3]=i" to generate a vec_extract or vec_insert builtin
>> call as a relatively simpler method?
> I think you have it backward. You need to work wit
Hi,
On 2020/9/1 21:07, Richard Biener wrote:
> On Tue, Sep 1, 2020 at 10:11 AM luoxhu via Gcc-patches
> wrote:
>>
>> Hi,
>>
>> On 2020/9/1 01:04, Segher Boessenkool wrote:
>>> Hi!
>>>
>>> On Mon, Aug 31, 2020 at 04:06:47AM -0500, Xiong H
Hi,
On 2020/9/1 00:47, will schmidt wrote:
>> + tmode = TYPE_MODE (TREE_TYPE (arg0));
>> + mode1 = TYPE_MODE (TREE_TYPE (TREE_TYPE (arg0)));
>> + mode2 = TYPE_MODE ((TREE_TYPE (arg2)));
>> + gcc_assert (VECTOR_MODE_P (tmode));
>> +
>> + op0 = expand_expr (arg0, NULL_RTX, tmode, EXPAND_NORMAL)
Hi,
On 2020/9/1 01:04, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Aug 31, 2020 at 04:06:47AM -0500, Xiong Hu Luo wrote:
>> vec_insert accepts 3 arguments: arg0 is the input vector, arg1 is the value
>> to be inserted, and arg2 is the position at which arg1 is inserted into arg0. This patch adds
>> __builtin_vec_insert_v
Hi,
On 2020/8/13 20:52, Jan Hubicka wrote:
>> Since there are no other callers outside of these specialized nodes, the
>> guessed profile counts should be equal? The perf tool shows that even though
>> each specialized node is called only once, they do not all take the same time
>> per call:
>>
>>40.6
Hi,
On 2020/8/13 01:53, Jan Hubicka wrote:
> Hello,
> with Martin we spent some time looking into exchange2 and my
> understanding of the problem is the following:
>
> There is the self recursive function digits_2 with the property that it
> has 10 nested loops and calls itself from the innermost
Hi Richard,
On 2020/8/3 22:01, Richard Sandiford wrote:
/* Try a wider mode if truncating the store mode to NEW_MODE
requires a real instruction. */
if (maybe_lt (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (store_mode))
@@ -1779,6 +1780,25 @@ find_shift_sequence (poly_int6
On 2020/8/3 22:01, Richard Sandiford wrote:
/* Try a wider mode if truncating the store mode to NEW_MODE
requires a real instruction. */
if (maybe_lt (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (store_mode))
@@ -1779,6 +1780,25 @@ find_shift_sequence (poly_int64 access_s
Thanks, v5 is updated per the comments:
1. Move the const_rhs shift out of the loop;
2. Iterate from int size for read_mode.
This patch can optimize (works for char/short/int/void*):
6: r119:TI=[r118:DI+0x10]
7: [r118:DI]=r119:TI
8: r121:DI=[r118:DI+0x8]
=>
6: r119:TI=[r118:DI+0x10]
16: r122:DI=r119:TI#
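In the RTL above, insn 7 stores the 16-byte r119 at [r118] and insn 8 reloads 8 bytes from [r118+0x8], i.e. bytes 8..15 of what was just stored. A minimal C sketch of why the load can be replaced by an extraction from r119 itself, assuming the GCC/Clang unsigned __int128 extension on a 64-bit host; load_after_store and extract_from_register are hypothetical names, not DSE functions.

```c
#include <stdint.h>
#include <string.h>

typedef unsigned __int128 u128;   /* assumption: 64-bit GCC/Clang host */

/* Memory route: store 16 bytes, then load 8 bytes at offset 8.  */
uint64_t
load_after_store (u128 value)
{
  unsigned char mem[16];
  uint64_t r;
  memcpy (mem, &value, sizeof value);   /* insn 7: [r118] = r119:TI */
  memcpy (&r, mem + 8, sizeof r);       /* insn 8: r121:DI = [r118+0x8] */
  return r;
}

/* Register route: pick the same bytes straight out of the value.  */
uint64_t
extract_from_register (u128 value)
{
  const uint16_t probe = 1;
  int little_endian = *(const unsigned char *) &probe == 1;
  /* Byte offset 8 is the high half on little-endian targets and the
     low half on big-endian ones.  */
  return little_endian ? (uint64_t) (value >> 64) : (uint64_t) value;
}
```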
Gentle ping in case this mail is missed, Thanks :)
https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550602.html
Xionghu
On 2020/7/24 18:47, luoxhu via Gcc-patches wrote:
Hi Richard,
This is the updated version that passes all regression tests on
Power9-LE.
It just needs another "may
Hi Richard,
This is the updated version that passes all regression tests on
Power9-LE.
It just needs another "maybe_lt (GET_MODE_SIZE (new_mode), access_size)"
before generating the shift for store_info->const_rhs to ensure the correct
constant is generated; take testsuite/gfortran1/equiv_2.x for example
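A minimal C sketch of why the shift mode must be at least access_size (read size plus gap) bytes wide: if the stored constant is first truncated to a narrower mode, the bytes the read wants can already be gone before the shift. It assumes little-endian byte offsets and an 8-byte stored constant; extract_in_width is a hypothetical model, not the DSE code.

```c
#include <stdint.h>

/* Hypothetical model: truncate the stored constant to WIDTH bytes (the
   mode the shift is done in), shift right by GAP bytes, then keep
   READ_SIZE bytes.  The result is only right when
   WIDTH >= GAP + READ_SIZE.  Requires GAP < 8.  */
uint64_t
extract_in_width (uint64_t const_rhs, unsigned width, unsigned gap,
                  unsigned read_size)
{
  uint64_t v = width >= 8 ? const_rhs
                          : const_rhs & ((UINT64_C (1) << (width * 8)) - 1);
  v >>= gap * 8;
  return read_size >= 8 ? v : v & ((UINT64_C (1) << (read_size * 8)) - 1);
}
```

For example, reading 4 bytes at gap 4 of 0x1122334455667788 gives 0x11223344 with an 8-byte shift mode but 0 with a 4-byte one.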
On 2020/7/23 04:30, Richard Sandiford wrote:
>
> I now realise the reason is that the starting mode is too wide.
> I think we should fix that by doing:
>
>FOR_EACH_MODE_IN_CLASS (new_mode_iter, MODE_INT)
> {
>…
>
> and then add:
>
>if (maybe_lt (GET_MODE_SIZE (new_mo
Hi,
On 2020/7/22 19:05, Richard Sandiford wrote:
> This wasn't really what I meant. Using subregs is fine, but I was
> thinking of:
>
>/* Also try a wider mode if the necessary punning is either not
>desirable or not possible. */
>if (!CONSTANT_P (store_info->rhs)
>
Hi,
On 2020/7/21 23:30, Richard Sandiford wrote:
> Xiong Hu Luo writes:
>> @@ -1872,9 +1872,27 @@ get_stored_val (store_info *store_info, machine_mode read_mode,
>> {
>> poly_int64 shift = gap * BITS_PER_UNIT;
>> poly_int64 access_size = GET_MODE_SIZE (read_mode) + gap;
>>
On 2020/7/20 23:31, Segher Boessenkool wrote:
On Mon, Jul 13, 2020 at 02:30:28PM +0800, luoxhu wrote:
For extracting high part element from DImode register like:
{%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
split it before reload with "and mask" to avoid generating shift right
32 bit th
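The unspec operand r122:DI>>0x20 says the single-precision bits sit in the upper 32 bits of the DImode register. A minimal C sketch, assuming an IEEE single in the upper word of a 64-bit integer and assuming the consumer reads it from that upper word: masking off the low word leaves the float bits in place, whereas the shift moves them down. Both helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Shift route: move the upper word down, then reinterpret as float.  */
float
high_float_by_shift (uint64_t di)
{
  uint32_t bits = (uint32_t) (di >> 32);
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Mask route: clear the low word; the float bits stay in the upper
   word, where the consumer is assumed to read them.  */
uint64_t
high_float_by_mask (uint64_t di)
{
  return di & UINT64_C (0xFFFFFFFF00000000);
}
```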
Hi David,
On 2020/7/14 22:17, David Edelsohn wrote:
> Unfortunately this patch is eliciting a number of new testsuite
> failures, all like
>
> error: unrecognizable insn:
> (insn 44 43 45 5 (parallel [
> (set (reg:SI 199)
> (unspec:SI [
> (re
Hi,
On 2020/7/11 08:54, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Jul 10, 2020 at 09:39:40AM +0800, luoxhu wrote:
>> OK, seems the md file needs a format tool too...
>
> Heh. Just make sure it looks good (that is, does what it looks like),
> looks like the rest, etc. It's hard to do anything
On 2020/7/11 08:28, Segher Boessenkool wrote:
Hi!
On Thu, Jul 09, 2020 at 09:14:45PM -0500, Xiong Hu Luo wrote:
* config/rs6000/rs6000.md (rotl_unspec): New
define_insn_and_split.
+; rldimi with UNSPEC_SI_FROM_SF.
+(define_insn_and_split "*rotl_unspec"
Please have rotldi
On 2020/7/10 03:25, Segher Boessenkool wrote:
>
>> + "TARGET_NO_SF_SUBREG"
>> + "#"
>> + "&& vsx_reg_sfsubreg_ok (operands[0], SFmode)"
>
> Put this in the insn condition? And since this is just a predicate,
> you can just use it instead of gpc_reg_operand.
>
> (The split condition become
Update the patch to keep the logic for non-TARGET_P8_VECTOR targets.
Please ignore the previous [PATCH 1/2], sorry!
Move V4SF to V4SI, initialize the vector as V4SI, and move it back to V4SF.
A better instruction sequence can be generated on Power9:
lfs + xxpermdi + xvcvdpsp + vmrgew
=>
lwz + (sldi + or) + mtvs
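A minimal C sketch of the integer side of that sequence: each float is loaded as a 32-bit integer (lwz), and two such words are combined into one 64-bit GPR value with a shift and an OR (sldi + or) before the move to a vector register, whose exact instruction is cut off above. Which element goes in the upper word depends on endianness and element order, which the snippet does not show, so the helper just takes both words; the names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a 32-bit integer (what lwz loads).  */
uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* sldi + or: shift one word into the upper half, OR in the other.  */
uint64_t
pack_words (uint32_t hi, uint32_t lo)
{
  return ((uint64_t) hi << 32) | lo;
}
```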
Hi,
On 2020/7/10 03:25, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Jul 09, 2020 at 11:09:42AM +0800, luoxhu wrote:
>>> Maybe change it back to just SI? It won't match often at all for QI or
>>> HI anyway, it seems. Sorry for that detour. Should be good with the
>>> above nits fixed :-)
>>
>>
On 2020/7/9 06:43, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Jul 08, 2020 at 11:19:21AM +0800, luoxhu wrote:
>> For extracting high part element from DImode register like:
>>
>> {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
>>
>> split it before reload with "and mask" to avoid generatin
On 2020/7/8 05:31, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Jul 07, 2020 at 04:39:58PM +0800, luoxhu wrote:
>>> Lots of questions, sorry!
>>
>> Thanks for the nice suggestions; the initial patch contained many issues :),
>
> Pretty much all of it should *work*, it just can be improved and
>
On 2020/7/7 08:18, Segher Boessenkool wrote:
> Hi!
>
> On Sun, Jul 05, 2020 at 09:17:57PM -0500, Xionghu Luo wrote:
>> For extracting high part element from DImode register like:
>>
>> {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
>>
>> split it before reload with "and mask" to avoid gene
Gentle ping...
On 2020/6/1 09:45, Xionghu Luo wrote:
resend the patch for stage1:
https://gcc.gnu.org/pipermail/gcc-patches/2020-January/538186.html
The performance of exchange2 built with PGO decreases by ~28% with r278808
because the profile count is set incorrectly. The cloned nodes are updated to
Hi,
On 2020/6/3 04:32, Segher Boessenkool wrote:
> Hi Xiong Hu,
>
> On Tue, Jun 02, 2020 at 04:41:50AM -0500, Xionghu Luo wrote:
>> Double arrays in structures used as function arguments or return values are accessed
>> in BLKmode; they are stored to the stack and loaded from the stack with redundant
>> conversion
On 2020/5/13 02:24, Richard Sandiford wrote:
> luoxhu writes:
>> + /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
>> +
>> + 73: r145:SI=r123:DI#0-0x1
>> + 74: r144:DI=zero_extend (r145:SI)
>> + 75: r143:DI=r144:DI+0x1
>> + ...
>> + 31: r135:CC=cmp (r123:DI
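Insns 73-75 compute zero_extend (n - 1) + 1 in 64 bits from a 32-bit n. A minimal C sketch of when that equals a plain zero_extend (n): the two agree for every n != 0 and differ only when n - 1 wraps at n == 0, which is why the fold needs the iteration count to be known not to overflow. Function names are hypothetical.

```c
#include <stdint.h>

/* (add -1; zero_ext; add +1) as in insns 73-75.  */
uint64_t
sub_extend_add (uint32_t n)
{
  uint32_t m = n - 1u;          /* r145:SI = n - 1, wraps when n == 0 */
  return (uint64_t) m + 1u;     /* zero_extend, then + 1 in 64 bits */
}

/* The folded form: a single zero_extend.  */
uint64_t
extend_only (uint32_t n)
{
  return (uint64_t) n;
}
```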
Minor refinement of the iteration non-overflow check, plus a testcase for stage 1.
This "subtract/extend/add" sequence has existed for a long time and still annoys us
(PR37451, part of PR61837) when converting from 32 bits to 64 bits, as the CTR
register is used as 64 bits on powerpc64. Andrew Pinski had a patch but
On 2020-05-06 20:09, Richard Biener wrote:
On Thu, 30 Apr 2020, luoxhu wrote:
Update the patch with an overflow check. Bootstrap and regression tests
PASS on Power8-LE.
Use determine_value_range to get value range info for folding convert expressions
with internal operation PLUS_EXPR/MINUS_EXPR/MULT
Update the patch with an overflow check. Bootstrap and regression tests PASS on
Power8-LE.
Use determine_value_range to get value range info for folding convert expressions
with internal operation PLUS_EXPR/MINUS_EXPR/MULT_EXPR when there is no overflow in
the wrapping inner type, i.e.:
(long unsigne
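A minimal sketch of the condition such a fold needs, assuming a 32-bit unsigned (wrapping) inner type widened to 64 bits and a constant second operand. no_wrap_in_range is a hypothetical helper; in GCC the range [lo, hi] would come from determine_value_range, which is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

enum inner_op { OP_PLUS, OP_MINUS, OP_MULT };

/* Given the range [lo, hi] of a 32-bit unsigned operand x and a constant
   c, does "x op c" stay within [0, UINT32_MAX] for every x in the range?
   If so, (uint64_t)(uint32_t)(x op c) == (uint64_t)x op (uint64_t)c,
   so the widening can be moved inside the operation.  */
bool
no_wrap_in_range (uint32_t lo, uint32_t hi, enum inner_op op, uint32_t c)
{
  switch (op)
    {
    case OP_PLUS:
      return (uint64_t) hi + c <= UINT32_MAX;   /* largest sum fits */
    case OP_MINUS:
      return lo >= c;                           /* smallest difference >= 0 */
    case OP_MULT:
      return (uint64_t) hi * c <= UINT32_MAX;   /* largest product fits */
    }
  return false;
}
```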
On 2020/4/28 18:30, Richard Biener wrote:
>
> OK, I guess instead of get_range_info expr_to_aff_combination could
> simply use determine_value_range (op0, &minv, &maxv) == VR_RANGE
> (the && TREE_CODE (op0) == SSA_NAME check can then be removed)?
>
Tried with determine_value_range, it works
On 2020/4/28 15:01, Richard Biener wrote:
> On Tue, 28 Apr 2020, Xionghu Luo wrote:
>
>> From: Xionghu Luo
>>
>> Get and propagate value range info to convert expressions with convert
>> operation on PLUS_EXPR/MINUS_EXPR/MULT_EXPR when there is no overflow, i.e.:
>>
>> (long unsigned int)((unsigned
Tiny update to accommodate unsigned int compare.
On 2020/4/20 16:21, luoxhu via Gcc-patches wrote:
Hi,
On 2020/4/18 00:32, Segher Boessenkool wrote:
On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote:
On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
luoxhu
Hi,
On 2020/4/18 00:32, Segher Boessenkool wrote:
> On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote:
>> On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
>>> luoxhu--- via Gcc-patches writes:
>>>> -count = simplify_gen_binary
On 2020/4/17 08:52, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Apr 13, 2020 at 10:11:43AM +0800, luoxhu wrote:
>> frame_pointer_needed is set to true in reload pass setup_can_eliminate,
>> but regs_ever_live[31] is false; pro_and_epilogue uses it without a liveness
>> check, causing CPU2006 465.tonto
From: Xionghu Luo
This "subtract/extend/add" sequence has existed for a long time and still annoys us
(PR37451, PR61837) when converting from 32 bits to 64 bits, as the CTR
register is used as 64 bits on powerpc64. Andrew Pinski had a patch, but it
caused some issues and was reverted by Joseph S. Myers (PR37451, PR37782
This bug is exposed by the FRE refactor of r263875. Comparing the FRE
dump files shows no obvious change in the segfaulting function, which indicates
it is a target issue.
frame_pointer_needed is set to true in reload pass setup_can_eliminate,
but regs_ever_live[31] is false; pro_and_epilogue uses it withou
On 2020/4/3 06:16, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Mar 30, 2020 at 11:59:57AM +0800, luoxhu wrote:
>>> Do we want something later in the RTL pipeline to make "addi"s etc. again?
>
> (This would be a good thing to consider -- maybe a define_insn_and_split
> will work. But see below
On 2020/3/28 00:04, Segher Boessenkool wrote:
Hi!
On Fri, Mar 27, 2020 at 09:34:00AM +0800, luoxhu wrote:
On 2020/3/27 07:59, Segher Boessenkool wrote:
On Wed, Mar 25, 2020 at 11:15:22PM -0500, luo...@linux.ibm.com wrote:
frame_pointer_needed is set to true in reload pass setup_can_eliminat
On 2020/3/27 22:33, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Mar 26, 2020 at 05:06:43AM -0500, luo...@linux.ibm.com wrote:
>> Remove split code from add3 to allow a later pass to split.
>> This allows later logic to hoist the constant load out of add instructions.
>> In a loop, lis+ori could be ho
On 2020/3/27 07:59, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Mar 25, 2020 at 11:15:22PM -0500, luo...@linux.ibm.com wrote:
>> frame_pointer_needed is set to true in reload pass setup_can_eliminate,
>> but regs_ever_live[31] is false, so pro_and_epilogue doesn't save/restore
>> r31 even though it is
From: Xionghu Luo
Remove split code from add3 to allow a later pass to split.
This allows later logic to hoist the constant load out of add instructions.
In a loop, lis+ori can be hoisted out, improving performance compared with
the previous addis+addi (about 15% in a typical case); the weak point is
one more
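Both sequences build a 32-bit constant from two 16-bit halves but split it differently. addis+addi adds to x directly and must compensate for addi sign-extending its immediate (the usual "high adjusted" split), while lis+ori materializes the constant on its own, which is what makes it loop-invariant and hoistable, and then one add combines it with x. A minimal C sketch modelling registers as int64_t; the helper names are hypothetical, and it ignores whether the high half fits addis's signed 16-bit immediate for every constant.

```c
#include <stdint.h>

/* addis rT,rA,ha ; addi rT,rT,lo : adds c to x in two steps.  */
int64_t
add_via_addis_addi (int64_t x, int32_t c)
{
  int32_t lo = ((c & 0xFFFF) ^ 0x8000) - 0x8000;  /* sign-extended low half */
  int64_t ha = ((int64_t) c - lo) / 65536;        /* high half, adjusted for lo */
  int64_t t = x + ha * 65536;                     /* addis */
  return t + lo;                                  /* addi */
}

/* lis rT,hi ; ori rT,rT,lo : builds c independently of x.  */
int64_t
const_via_lis_ori (int32_t c)
{
  int32_t lo_u = c & 0xFFFF;                      /* ori immediate, zero-extended */
  int64_t hi = ((int64_t) c - lo_u) / 65536;      /* lis immediate, signed */
  int64_t t = hi * 65536;                         /* lis: sign-extended hi << 16 */
  /* ori: the low 16 bits of t are zero, so OR-ing lo_u equals adding it.  */
  return t + lo_u;
}
```

Inside a loop, only the final add of x and the lis+ori result depends on x, which is why the extra instruction can pay off.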
From: Xionghu Luo
This P1 bug is exposed by the FRE refactor of r263875. Comparing the FRE
dump files shows no obvious change in the segfaulting function, which indicates
it is a target issue.
frame_pointer_needed is set to true in reload pass setup_can_eliminate,
but regs_ever_live[31] is false, so pro_a