https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141
--- Comment #10 from Richard Biener ---
*** Bug 93142 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141
--- Comment #9 from Jakub Jelinek ---
Author: jakub
Date: Thu Jan 9 08:18:51 2020
New Revision: 280029
URL: https://gcc.gnu.org/viewcvs?rev=280029&root=gcc&view=rev
Log:
PR target/93141
* config/i386/i386.md (subv4): Use SWIDWI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141
--- Comment #8 from Jakub Jelinek ---
Author: jakub
Date: Sun Jan 5 12:52:24 2020
New Revision: 279887
URL: https://gcc.gnu.org/viewcvs?rev=279887&root=gcc&view=rev
Log:
PR target/93141
* config/i386/i386.md (SWIDWI): New mode i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141
--- Comment #7 from Madhur Chauhan ---
As far as I can tell, the optimal generated asm should look like:
mov-load from one array
mul or preferably mulx with a memory source from the other array
add + adc into 128-bit answer register
adc reg, 0 to accum
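For context, a minimal C sketch of the kind of loop those steps describe, assuming a 192-bit accumulator split into a 128-bit low part and a 64-bit carry word. The names `acc192` and `dot_u64` are hypothetical and are not taken from the bug report or the Stack Overflow post; `unsigned __int128` and `__builtin_add_overflow` are GCC/Clang extensions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 192-bit accumulator: low 128 bits plus a 64-bit carry word. */
struct acc192 {
    unsigned __int128 lo;
    uint64_t hi;
};

/* Sum of a[i] * b[i] over n elements, accumulated in 192 bits. */
static struct acc192 dot_u64(const uint64_t *a, const uint64_t *b, size_t n)
{
    struct acc192 acc = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        /* Full 64x64 -> 128-bit product (the mul/mulx step). */
        unsigned __int128 prod = (unsigned __int128)a[i] * b[i];
        /* 128-bit add (add + adc); the builtin returns true on wrap-around,
           which is then folded into the top word (adc reg, 0). */
        acc.hi += __builtin_add_overflow(acc.lo, prod, &acc.lo);
    }
    return acc;
}
```

Whether a given GCC version actually emits the add/adc/adc sequence for this source is exactly what the bug is about, so the comments above describe the intended mapping, not guaranteed output.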
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141
--- Comment #6 from Madhur Chauhan ---
As far as I can tell, the optimal generated asm should look like:
mov-load from one array
mul or preferably mulx with a memory source from the other array
add + adc into 128-bit answer register
adc reg, 0 to accum
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141
--- Comment #5 from Andrew Pinski ---
Just for reference, here is the aarch64 assembly for the loop:
.L4:
ldr x4, [x9, x5]
ldr x3, [x8, x5]
add x5, x5, 8
mul x6, x4, x3
umulh x3, x4, x3
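The `mul`/`umulh` pair in the excerpt above yields the low and high 64-bit halves of the 64x64-bit product. A rough C equivalent, with hypothetical helper names `mul_lo` and `mul_hi` and relying on the GCC/Clang `unsigned __int128` extension, might look like this:

```c
#include <stdint.h>

/* Low 64 bits of a * b (what aarch64 `mul` computes). */
static uint64_t mul_lo(uint64_t a, uint64_t b)
{
    return a * b;
}

/* High 64 bits of the unsigned 128-bit product (what `umulh` computes). */
static uint64_t mul_hi(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}
```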
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141
Jakub Jelinek changed:
What|Removed |Added
Status|UNCONFIRMED |ASSIGNED
Last reconfirmed|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141
--- Comment #3 from Jakub Jelinek ---
Untested fix, though it only covers double-word uaddv4; it would be good to
handle double-word usubv4, addv4 and subv4 similarly.
--- gcc/config/i386/i386.md.jj 2020-01-03 11:10:43.839511446 +0100
+++ gcc/
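As an illustration only, here is source that would plausibly exercise the four double-word patterns named above, assuming an x86-64 target where `__int128` is the double-word type. The wrapper names are hypothetical; which expander each call reaches depends on the compiler version and is an assumption here, not something the truncated patch confirms.

```c
#include <stdbool.h>

typedef unsigned __int128 u128;
typedef __int128 s128;

/* Unsigned add with overflow check (presumably uaddv4 on the double-word mode). */
static bool uadd128(u128 a, u128 b, u128 *r) { return __builtin_add_overflow(a, b, r); }

/* Unsigned subtract with overflow check (presumably usubv4). */
static bool usub128(u128 a, u128 b, u128 *r) { return __builtin_sub_overflow(a, b, r); }

/* Signed add with overflow check (presumably addv4). */
static bool sadd128(s128 a, s128 b, s128 *r) { return __builtin_add_overflow(a, b, r); }

/* Signed subtract with overflow check (presumably subv4). */
static bool ssub128(s128 a, s128 b, s128 *r) { return __builtin_sub_overflow(a, b, r); }
```

Compiling this with `-O2 -S` and comparing the output before and after a fix is one way to check whether the carry chain is used.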
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #2 fro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141
--- Comment #1 from Madhur Chauhan ---
This bug report originates from the Stack Overflow Q&A:
https://stackoverflow.com/questions/59575408/fastest-way-to-sum-dot-product-of-vector-of-unsigned-64-bit-integers-using-192-2/59579310#59579310