https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87455
--- Comment #6 from Fanael ---
Any hope of getting this fixed in GCC 10? It should just be a matter of
removing Zen[12] from X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL.
--- Comment #5 from Fanael ---
Created attachment 44829
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44829&action=edit
WIP patch
> We already have TARGET_SSE_TYPELESS_STORES for stores, so perhaps we want
> something like typeless reg-r
--- Comment #4 from Jan Hubicka ---
We already have TARGET_SSE_TYPELESS_STORES for stores, so perhaps we want
something like typeless reg-reg moves and loads flag?
Honza
--- Comment #3 from Fanael ---
> Maybe we should remove the xorps generation part.
If it were up to me, I'd keep it for BDVER[1234] only, because xorps is still
one byte shorter than either xorpd or pxor and is as fast there, and introduce
a separa
--- Comment #2 from vekumar at gcc dot gnu.org ---
This tuning was intended to generate movups instead of movupd, as movups is 1
byte shorter than movupd. Maybe we should remove the xorps generation part.
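
For readers following the size argument, here is a rough sketch of where the one-byte difference comes from. The legacy (non-VEX) SSE encodings of the packed-double and integer forms carry a 0x66 prefix that the packed-single forms lack. The byte sequences below are hand-assembled for xmm0 and a plain [rax] operand and are written in Intel syntax; the table and the helper name encoded_length are illustrative only and are not taken from GCC or from the patch.

```python
# Hand-assembled legacy SSE encodings (no VEX prefix), Intel syntax.
# Illustrative table only; the operands (xmm0, [rax]) are arbitrary choices.
ENCODINGS = {
    "xorps xmm0, xmm0":   bytes([0x0F, 0x57, 0xC0]),
    "xorpd xmm0, xmm0":   bytes([0x66, 0x0F, 0x57, 0xC0]),
    "pxor xmm0, xmm0":    bytes([0x66, 0x0F, 0xEF, 0xC0]),
    "movups xmm0, [rax]": bytes([0x0F, 0x10, 0x00]),
    "movupd xmm0, [rax]": bytes([0x66, 0x0F, 0x10, 0x00]),
}


def encoded_length(insn):
    """Return the length in bytes of the encoding listed for `insn`."""
    return len(ENCODINGS[insn])


if __name__ == "__main__":
    for insn, code in ENCODINGS.items():
        hex_bytes = " ".join(f"{b:02x}" for b in code)
        print(f"{insn:<20} {hex_bytes:<12} ({len(code)} bytes)")
```

The packed-single forms save exactly the 0x66 prefix byte. That is a code-size benefit only; it says nothing about which execution domain a given core assigns to the instruction.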
--- Comment #1 from Fanael ---
Assembly diff between the two:
--- /dev/fd/63 2018-09-27 17:59:06.120507763 +0200
+++ /dev/fd/62 2018-09-27 17:59:06.120507763 +0200
@@ -7,21 +7,21 @@
main:
.LFB5179:
.cfi_startproc
- movaps .LC0