https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833
--- Comment #14 from Peter Cordes ---
I happened to look at this old bug again recently.
re: extracting the high of the low two 32-bit elements:
(In reply to Uroš Bizjak from comment #11)
> > Or without SSE4 -mtune=sandybridge (anything that excluded …
--- Comment #13 from uros at gcc dot gnu.org ---
Author: uros
Date: Tue May 30 17:18:25 2017
New Revision: 248691
URL: https://gcc.gnu.org/viewcvs?rev=248691&root=gcc&view=rev
Log:
PR target/80833
* config/i386/constraints.md (Yd) …
--- Comment #12 from Uroš Bizjak ---
(In reply to Peter Cordes from comment #4)
> MMX is also a saving in code-size: one fewer prefix byte vs. SSE2 integer
> instructions. It's also another set of 8 registers for 32-bit mode.
After touching an MMX register …
--- Comment #11 from Uroš Bizjak ---
(In reply to Peter Cordes from comment #0)
> A lower-latency xmm->int strategy would be:
>
> movd    %xmm0, %eax
> pextrd $1, %xmm0, %edx
Proposed patch implements the above for generic move …
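For illustration (not part of the bug report), the same movd + pextrd extraction
written with SSE4.1 intrinsics; the helper name is made up:
--cut here--
/* Sketch, assuming SSE4.1: extract both 32-bit halves of a 64-bit
   value from an xmm register.  movd and pextrd both read the xmm
   register directly, so neither has to wait for the other.  */
#include <smmintrin.h>

long long xmm_to_int64 (__m128i v)
{
  unsigned int lo = (unsigned int) _mm_cvtsi128_si32 (v);    /* movd   %xmm0, %eax     */
  unsigned int hi = (unsigned int) _mm_extract_epi32 (v, 1); /* pextrd $1, %xmm0, %edx */
  return (unsigned long long) hi << 32 | lo;
}
--cut here--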
--- Comment #10 from Uroš Bizjak ---
(In reply to Peter Cordes from comment #0)
> Scalar 64-bit integer ops in vector regs may be useful in general in 32-bit
> code in some cases, especially if it helps with register pressure.
We have the scalar-to-vector (STV) …
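As an aside (my example, not from the comment), this is the kind of 64-bit scalar
op that could stay in a vector register on x86-32, as a single paddq instead of an
add/adc pair:
--cut here--
/* Sketch: compile with e.g. gcc -m32 -O2 -msse2.  A scalar-to-vector
   style transformation may keep this DImode add in an xmm register
   (movq / paddq / movq) rather than splitting it across two GPRs.  */
long long add64 (long long a, long long b)
{
  return a + b;
}
--cut here--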
--- Comment #9 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #8)
> movq    %xmm0, (%esp)    <<-- unneeded store due to RA problem
For some reason, reload "fixes" direct DImode register moves and passes the value
via memory.
--- Comment #8 from Uroš Bizjak ---
The patch from comment #7 generates:
a) DImode move for 32-bit targets:
--cut here--
long long test (long long a)
{
asm ("" : "+x" (a));
return a;
}
--cut here--
gcc -O2 -msse4.1 -mtune=intel -mregparm=2
--- Comment #7 from Uroš Bizjak ---
Created attachment 41412
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41412&action=edit
Prototype patch
Patch that emits mov/pinsr or mov/pextr pairs for DImode (x86_32) and TImode
(x86_64) moves.
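For illustration (not from the patch itself), the int -> xmm direction of such a
pair, written with SSE4.1 intrinsics; the helper name is made up:
--cut here--
/* Sketch, assuming SSE4.1: build a 64-bit value in an xmm register
   from two 32-bit halves with a movd + pinsrd pair.  */
#include <smmintrin.h>

__m128i int64_to_xmm (long long x)
{
  __m128i v = _mm_cvtsi32_si128 ((int) x);          /* movd   %eax, %xmm0     */
  return _mm_insert_epi32 (v, (int) (x >> 32), 1);  /* pinsrd $1, %edx, %xmm0 */
}
--cut here--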
--- Comment #6 from Peter Cordes ---
(In reply to Richard Biener from comment #5)
> There are some related bugs. I think there is no part of the compiler that
> specifically tries to avoid store forwarding issues.
Ideally the compiler would keep …
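For reference (my illustration, not code from the bug), the pattern a
store-forwarding-aware compiler would want to avoid: narrow stores reloaded by one
wider load, which most CPUs cannot forward from the store buffer:
--cut here--
/* Sketch: two 32-bit stores followed by one 64-bit reload.  The movq
   load needs data from two separate store-buffer entries, so
   store-to-load forwarding fails and the load stalls.  */
#include <emmintrin.h>

__m128i via_memory (int lo, int hi)
{
  int tmp[2] = { lo, hi };                        /* two 32-bit stores */
  return _mm_loadl_epi64 ((const __m128i *) tmp); /* one 64-bit load   */
}
--cut here--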
Richard Biener changed:
What              |Removed      |Added
------------------+-------------+------
Status            |UNCONFIRMED  |NEW
Last reconfirmed  |             |
--- Comment #4 from Peter Cordes ---
I don't think it's worth anyone's time to implement this in 2017, but using MMX
regs for 64-bit store/load would be faster on really old CPUs that split 128-bit
vector insns into two halves, like K8 and Pentium M.
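For illustration (mine; needs -mmmx on x86-32): a 64-bit copy through an MMX
register, which stays a single 64-bit op on such CPUs. The emms needed afterwards
is part of why this isn't attractive:
--cut here--
/* Sketch: 64-bit load + store through an MMX register
   (movq (%src),%mm0 ; movq %mm0,(%dst)).  */
#include <mmintrin.h>

void copy64 (void *dst, const void *src)
{
  *(__m64 *) dst = *(const __m64 *) src;
  _mm_empty ();   /* emms: make the x87 stack usable again */
}
--cut here--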
--- Comment #3 from Peter Cordes ---
Atom's movd xmm->int is slower (latency=4, reciprocal throughput=2) than its movd
int->xmm (latency=3, reciprocal throughput=1), which is the opposite of every other
CPU (except Silvermont, where they have the same throughput but xmm->int is 1 cycle
slower). …
--- Comment #2 from Peter Cordes ---
On most CPUs, psrldq / movd is optimal for xmm[1] -> int without SSE4. On
SnB-family, movd runs on port 0, and psrldq can run on port 5, so they can
execute in parallel. (And the second movd can run the next cycle.)
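For illustration (my sketch of that sequence with SSE2 intrinsics):
--cut here--
/* Sketch, SSE2 only: extract element 1 with psrldq + movd.  On
   SnB-family the byte shift can run on port 5 while a movd for
   element 0 runs on port 0, so the two extractions overlap.  */
#include <emmintrin.h>

int extract_elem1 (__m128i v)
{
  return _mm_cvtsi128_si32 (_mm_srli_si128 (v, 4)); /* psrldq $4 ; movd */
}
--cut here--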
--- Comment #1 from Peter Cordes ---
See https://godbolt.org/g/krXH9M for the functions I was looking at.