Not exactly a patch ping, but I was hoping we could re-engage the discussion on
this and figure out how we can make POImode work for powerpc.
How does x86 solve this? There was some suggestion that it has some similar
situations?
Thanks,
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux
Ping.
So, this has sat for a while and it’s getting close to the end of stage1 now. I
don’t see that we're any closer to a solution that allows us to use POImode
without risking this ICE. I had to disable the use of VSX vector pair
loads/stores in inline expansion of memcpy/memmove to avoid it.
Ping.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Oct 26, 2020, at 4:44 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> This patch adds the first couple patterns to support p10 fusion. These
> will allow combine to create a single insn for a pair o
One last addendum to this. I discovered that that needs a "sort"
in front of "keys %logicals_addsub" because otherwise you may get
the operators in different orders sometimes which leads to fusion.md
having the patterns in different orders which isn't helpful for
sane debugging. Segher and I discu
For some reason this never showed up on gcc-patches, trying again.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> Begin forwarded message:
>
> From: Aaron Sawdey
> Subject: [PATCH,rs6000] Fix p10 fusion test cases for -m32
> Date: May 25, 2021 at 1:45:36 PM CDT
> To:
This certainly causes a bootstrap miscompare, and might also be
responsible for PR/100820. The operands to subf were reversed
in the logical-add/sub fusion patterns, and I screwed up my
bootstrap test which is how it ended up getting committed.
If bootstrap and regtest pass, ok for trunk (and ev
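To make the operand-order issue concrete, here is a minimal sketch in C, not the actual fusion.md patterns. It relies only on the documented semantics of the PowerPC instruction `subf RT,RA,RB`, which computes RB - RA ("subtract from"). The function names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Models "subf RT,RA,RB": the result is RB - RA, i.e. the FIRST
   source operand is subtracted FROM the second one. */
uint64_t subf_model(uint64_t ra, uint64_t rb)
{
    return rb - ra;
}

/* Correct way to compute minuend - subtrahend with subf:
   the subtrahend goes in the RA slot, the minuend in the RB slot. */
uint64_t sub_via_subf(uint64_t minuend, uint64_t subtrahend)
{
    return subf_model(subtrahend, minuend);
}

/* The kind of mistake described above (hypothetical reconstruction):
   operands passed in "natural" order, which silently negates the result. */
uint64_t sub_via_subf_reversed(uint64_t minuend, uint64_t subtrahend)
{
    return subf_model(minuend, subtrahend);
}
```

With unsigned 64-bit arithmetic the reversed form yields the two's-complement negation of the intended difference, which is the sort of silent wrong-code result that can surface as a bootstrap miscompare.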
These tests have become unstable and SMS either succeeds or doesn't
depending on things like changes in instruction latency. Removing
the scan-rtl-dump-times checks for powerpc*-*-*.
If bootstrap/regtest passes, ok for trunk and backport to 11?
Thanks!
Aaron
gcc/testsuite
* gcc.dg
Ping.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Apr 26, 2021, at 3:21 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> Two more sets of combine patterns for p10 fusion. These require
> the "Add insn types for fusion pairs" patch I posted earlier
Ping.
In answer to Will’s question — some of these are not immediately used but will
be in other pending patches.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Apr 26, 2021, at 1:04 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> This adds new valu
Ping.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Apr 26, 2021, at 2:00 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> This adds some test cases to make sure that the combine patterns for p10
> fusion are working.
>
> OK for trunk?
>
> gcc/tests
This patch adds a few new instructions to inline expansion of
memcpy/memmove. Generation of all these is controlled by
the option -mblock-ops-unaligned-vsx which is set on by default if the
target has TARGET_EFFICIENT_UNALIGNED_VSX.
* unaligned vsx load/store (V2DImode)
* unaligned vsx pair load/
I've modified slightly per Will & Segher's comments, re-regstrapped and
posting what I've actually committed.
Aaron
This patch adds a few new instructions to inline expansion of
memcpy/memmove. Generation of all these is controlled by
the option -mblock-ops-unaligned-vsx which is set on by def
Now that the documentation for partial modes says they have a known
number of bits of precision, would it make sense for extract_low_bits to
check this before attempting to extract the bits?
This would solve the problem we have been having with POImode and
extract_low_bits -- DSE tries to use it t
. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Sep 10, 2020, at 4:36 AM, Richard Biener
> wrote:
>
> On Wed, Sep 9, 2020 at 8:28 PM Aaron Sawdey via Gcc-patches
> wrote:
>>
>> Now that the documentation for partial modes says they have a known
>> n
So, would it be legitimate for extract_low_bits to query if the truncate
pattern it will likely use is actually available?
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Sep 10, 2020, at 10:10 AM, Segher Boessenkool
> wrote:
>
> Hi!
>
> On Thu, Sep 10, 2020 at
This is a (hopefully temporary) fix to PR96791. This will make
the default be -mno-block-ops-vector-pair even on power10, so we will
not hit the issue of DSE trying to truncate a POImode register. I am
still concerned it will be possible to hit this because the MMA builtins
will also generate POImo
This is a fix for PR92379. Passes regstrap on ppc64le. Pre-approved by
Segher, committing after posting.
2020-03-13 Aaron Sawdey
PR target/92379
* config/rs6000/rs6000.c (num_insns_constant_multi): Don't shift a
64-bit value by 64 bits (UB).
diff --git a/gcc/config/rs6000/rs6000.c
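For readers unfamiliar with this class of bug: in C, shifting a 64-bit value by 64 or more bits is undefined behaviour. The sketch below shows one common way to guard such shifts. It is not the code from the patch; the helper names are hypothetical.

```c
#include <stdint.h>

/* Right shift that is well defined for any count: a count of 64 or
   more yields 0 instead of invoking undefined behaviour. */
uint64_t shift_right_safe(uint64_t value, unsigned count)
{
    return count >= 64 ? 0 : value >> count;
}

/* Mask of the low 'bits' bits, valid for bits in 0..64.
   A naive (1 << bits) - 1 would be undefined for bits == 64. */
uint64_t low_bits_mask(unsigned bits)
{
    return bits >= 64 ? ~(uint64_t)0 : (((uint64_t)1 << bits) - 1);
}
```

Handling the full-width case explicitly keeps the result independent of what the host hardware happens to do with an out-of-range shift count.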
Updated slightly, removed -Wno-psabi as requested and also fixed the
fact that it wasn't actually checking __builtin_cpu_is or
__builtin_cpu_supports. OK for trunk and backport to 10?
Thanks,
Aaron
2020-06-30 Rajalakshmi Srinivasaraghavan
Aaron Sawdey
gcc/testsuite/
The code snippet for this test was returning 1 if power10
instructions executed correctly. It should return 0 if the
test passes.
OK for trunk and backport to 10?
Thanks,
Aaron
* lib/target-supports.exp (check_power10_hw_available):
Return 0 for passing test.
---
gcc/testsuit
The code snippet for this test was returning 1 if power10
instructions executed correctly. It should return 0 if the
test passes.
Approved offline by Segher with slight change. Will
push after posting.
* lib/target-supports.exp (check_power10_hw_available):
Return 0 for passing t
This patch adds execution tests that use the MMA builtins and
check for the right answer, and a new test that checks whether
__builtin_cpu_supports and __builtin_cpu_is return sane answers.
One final time now that I've gotten things sorted out. OK for trunk
and backport to 10?
Thanks,
Aaron
This fixed the ICE I was seeing, thanks.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Jul 10, 2020, at 10:40 AM, Richard Sandiford
> wrote:
>
> In some cases, expand_expr_real_2 prefers to use the mode of the
> caller-suggested target instead of the mode of the
This patch adds execution tests that use the MMA builtins and
check for the right answer, and new tests that check whether
__builtin_cpu_supports and __builtin_cpu_is return sane
answers for power10.
I've now cleaned up and separated things out so there are 4 test cases:
* MMA single precision ex
Add a test for dejagnu to determine if execution of MMA instructions is
supported in the test environment. Add an execution test to make sure
that __builtin_cpu_supports("mma") is true if we can execute MMA
instructions.
OK for trunk and backport to 10?
Thanks!
Aaron
gcc/testsuite/
*
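A rough sketch of the consistency check described above, assuming a GCC new enough to accept "mma" as a `__builtin_cpu_supports` argument and "power10" as a `__builtin_cpu_is` argument on PowerPC. On other targets the functions simply report 0. This is not the committed test; the function names are hypothetical, and `can_execute_mma` stands in for whatever the dejagnu probe determined.

```c
/* Returns 1 if the runtime reports MMA support, 0 otherwise.
   Only PowerPC builds query the CPU; other targets report 0. */
int cpu_reports_mma(void)
{
#if defined(__GNUC__) && defined(__powerpc__)
    return __builtin_cpu_supports("mma") ? 1 : 0;
#else
    return 0;
#endif
}

/* Returns 1 if the runtime identifies the CPU as power10. */
int cpu_is_power10(void)
{
#if defined(__GNUC__) && defined(__powerpc__)
    return __builtin_cpu_is("power10") ? 1 : 0;
#else
    return 0;
#endif
}

/* Exit-status convention for the check: 0 means pass.
   It fails only when MMA instructions can execute but the
   builtin claims that MMA is not supported. */
int mma_consistency_status(int can_execute_mma, int reports_mma)
{
    return (can_execute_mma && !reports_mma) ? 1 : 0;
}
```

A test's `main` could return `mma_consistency_status(1, cpu_reports_mma())` when it is only run in environments where MMA execution has already been confirmed, so that 0 signals a passing test.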
Because the check for power10_hw is not called
check_effective_target_power10_hw, it needs to be looked
for by is-effective-target-keyword. Also reorder things
in is-effective-target to put power10_hw with the other
ppc stuff.
These little fixes for power10 dejagnu support were pre-approved
for tr
For future architecture with prefix instructions, always use plq
rather than lq for atomic load of quadword. Then we never have to
do the doubleword swap on little endian. Before this fix, -mno-pcrel
would generate lq with the doubleword swap (which was ok) and -mpcrel
would generate plq, also with
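As an illustration of what "doubleword swap" means here, the sketch below models a quadword as two 64-bit halves and exchanges them. It does not model which half lq or plq places in which register; the `quadword` type and helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit quadword as two 64-bit doublewords. */
typedef struct {
    uint64_t dw[2];
} quadword;

/* Builds a quadword from its two doublewords. */
quadword make_quadword(uint64_t dw0, uint64_t dw1)
{
    quadword q = { { dw0, dw1 } };
    return q;
}

/* Exchanges the two doublewords of a quadword. */
quadword swap_doublewords(quadword q)
{
    return make_quadword(q.dw[1], q.dw[0]);
}
```

Applying the swap twice gives back the original value, which is why a swap emitted when none is needed corrupts the result rather than being harmless.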
For future architecture with prefix instructions, always use plq
rather than lq for atomic load of quadword. Then we never have to
do the doubleword swap on little endian. Before this fix, -mno-pcrel
would generate lq with the doubleword swap (which was ok) and -mpcrel
would generate plq, also with
For future architecture with prefix instructions, always use plq/pstq
rather than lq/stq for atomic load of quadword. Then we never have to
do the doubleword swap on little endian. Before this fix, -mno-pcrel
would generate lq with the doubleword swap (which was ok) and -mpcrel
would generate plq,
Because reg_to_non_prefixed() only looks at the register being used, it
doesn't get the right answer for stfs, which leads to us not seeing
that it has a PCREL symbol ref. This patch works around this by
introducing a helper function that inspects the insn to see if it is in
fact a stfs. Then if w
The same problem also arises for plfs where prefixed_load_p()
doesn't recognize it so we get just lfs in the asm output
with a @pcrel address.
OK for trunk if regstrap on ppc64le passes?
Thanks,
Aaron
PR target/95347
* config/rs6000/rs6000.c (is_stfs_insn): Rename to
This passed regstrap and was approved offline by Segher, posting
the final form (minus my debug code, oops).
The same problem also arises for plfs where prefixed_load_p()
doesn't recognize it so we get just lfs in the asm output
with an @pcrel address.
PR target/95347
* config/rs6
Ping.
I assume we’re going to want a separate patch for the new instruction type.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Dec 4, 2020, at 1:19 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> This patch adds the first batch of patterns to suppo
Ping.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Dec 10, 2020, at 8:41 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> This patch adds a new function to genfusion.pl to generate patterns for
> logical-logical fusion. They are enabled by default fo
Ping.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Dec 11, 2020, at 1:53 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> This adds some test cases to make sure that the combine patterns for p10
> fusion are working.
>
> These test cases pass on pow
Ping.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Jan 3, 2021, at 2:42 PM, Aaron Sawdey wrote:
>
> Ping.
>
> I assume we’re going to want a separate patch for the new instruction type.
>
> Aaron Sawdey, Ph.D. saw...@linux.ibm.com
> IBM Linux on POWER Toolchain
Ping.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Jan 3, 2021, at 2:43 PM, Aaron Sawdey wrote:
>
> Ping.
>
> Aaron Sawdey, Ph.D. saw...@linux.ibm.com
> IBM Linux on POWER Toolchain
>
>
>> On Dec 10, 2020, at 8:41 PM, acsaw...@linux.ibm.com wrote:
>>
>> From:
Ping.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Jan 3, 2021, at 2:44 PM, Aaron Sawdey wrote:
>
> Ping.
>
> Aaron Sawdey, Ph.D. saw...@linux.ibm.com
> IBM Linux on POWER Toolchain
>
>
>> On Dec 11, 2020, at 1:53 PM, acsaw...@linux.ibm.com wrote:
>>
>> From:
Ping.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Dec 9, 2020, at 11:04 AM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> Ping. I've folded in the changes to comments suggested by Will Schmidt.
>
> This patch implements a RTL pass that looks for pc-
Now that this has been in trunk for a bit with no issues, ok to backport to 10?
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> On Jun 3, 2020, at 4:10 PM, Aaron Sawdey wrote:
>
> This passed regstrap and was approved offline by Segher, posting
> the final form (minu
Update config.gcc so that we can use --with-cpu=power10.
I've tested that this does do the expected thing
with --with-cpu=power10 and also that it still builds and
bootstraps correctly using --with-cpu=power9 on power9. If there isn't
any other testing I need to do for this, ok for trunk?
Thanks
This patch adds execution tests that use the MMA builtins,
checks for the right answer, and checks that __builtin_cpu_supports
and __builtin_cpu_is return sane answers given that the code
executed correctly.
Tested against P10 sim, should not execute anywhere else due to
requiring power10_hw. Act
For some reason this patch never showed up on gcc-patches.
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
> Begin forwarded message:
>
> From: acsaw...@linux.ibm.com
> Subject: [PATCH,rs6000] Make MMA builtins use opaque modes [v2]
> Date: November 19, 2020 at 12:58:47 P
> On Nov 20, 2020, at 3:55 AM, Richard Sandiford
> wrote:
>
> acsawdey--- via Gcc-patches writes:
>> diff --git a/gcc/c/c-aux-info.c b/gcc/c/c-aux-info.c
>> index ffc8099856d..41f5598de38 100644
>> --- a/gcc/c/c-aux-info.c
>> +++ b/gcc/c/c-aux-info.c
>> @@ -413,6 +413,10 @@ gen_type (const ch
> On Nov 20, 2020, at 4:57 AM, Aaron Sawdey via Gcc-patches
> wrote:
>
>
>> On Nov 20, 2020, at 3:55 AM, Richard Sandiford
>> wrote:
>>
>> acsawdey--- via Gcc-patches writes:
>>> @@ -16767,7 +16768,7 @@ loc_descriptor (rtx rtl, machine_mode mod
The add-logical and add-add fusion patterns all have constraint
alternatives "=0,1,&r,r" for the output (3). The inputs 0 and 1
are used in the first fusion instruction and then either may be
reused as a temp for the output of the first insn which is
input to the second. However, if input 2 is the
From: Aaron Sawdey
Update the count of matches for the fusion combine patterns after
the recent changes to them. At Segher's request, used \m and \M
in the match patterns. Also I have grouped together all alternatives of
each fusion insn, which should hopefully make this test a little less
fragi
SPEC2017 testing on p10 shows that this optimization does not have a
positive impact on performance. So we are no longer going to enable it
by default. The test cases for it needed to be updated so they always
enable it to test it.
OK for trunk and backport to 11 if bootstrap/regtest passes?
Than