https://sourceware.org/bugzilla/show_bug.cgi?id=22871
H.J. Lu changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #23 from cvs-commit at gcc dot gnu.org ---
The master branch has been updated by H.J. Lu :
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b6f8c7c45229a8a5405079e586bfbaad396d2cbe
commit b6f8c7c45229a8a5405079e586bfbaa
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #22 from H.J. Lu ---
(In reply to H.J. Lu from comment #15)
> (In reply to Jan Beulich from comment #13)
> > One more pair of cases to consider is conversion of word/dword/qword add/sub
> > with an immediate of 128 to sub/add with
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #21 from H.J. Lu ---
(In reply to Linus Torvalds from comment #19)
> (In reply to Linus Torvalds from comment #18)
> >
> > Very interesting. I can confirm that testb seems slower on Skylake too.
>
> Oh no, I take that back.
>
>
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #20 from Linus Torvalds ---
I thought I could make the numbers more stable by using serializing
instructions (cpuid with %eax=0) around the rdtsc, but that just caused some
odd bi-modal behavior where testb/testl and testw/testq "p
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #19 from Linus Torvalds ---
(In reply to Linus Torvalds from comment #18)
>
> Very interesting. I can confirm that testb seems slower on Skylake too.
Oh no, I take that back.
There's something else going on.
Sometimes I get res
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #18 from Linus Torvalds ---
(In reply to H.J. Lu from comment #17)
> A testb microbenchmark
Very interesting. I can confirm that testb seems slower on Skylake too.
And it's not some odd effect of just the call/ret sequence - I di
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #17 from H.J. Lu ---
Created attachment 10846
--> https://sourceware.org/bugzilla/attachment.cgi?id=10846&action=edit
A testb microbenchmark
--
You are receiving this mail because:
You are on the CC list for the bug.
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #16 from Linus Torvalds ---
(In reply to H.J. Lu from comment #14)
> We should avoid the testb optimization. These are the latencies:
>
> testb : 33711871
> testw : 21204854
> testl : 18938530
> testq : 18942712
>
> on Haswell. Most
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #15 from H.J. Lu ---
(In reply to Jan Beulich from comment #13)
> One more pair of cases to consider is conversion of word/dword/qword add/sub
> with an immediate of 128 to sub/add with -128 as immediate.
Done on users/hjl/optimiz
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #14 from H.J. Lu ---
We should avoid the testb optimization. These are the latencies:
testb : 33711871
testw : 21204854
testl : 18938530
testq : 18942712
on Haswell. Most other processors also show that testb is the slowest.
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #13 from Jan Beulich ---
One more pair of cases to consider is conversion of word/dword/qword add/sub
with an immediate of 128 to sub/add with -128 as immediate.
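The size argument behind this suggestion can be sketched as follows. The add/sub immediate forms take either a sign-extended imm8 (range -128..127) or a full imm16/imm32, so 128 needs the long form while -128 fits the short one. The helper names below are hypothetical and are not taken from the assembler sources. The sketch models only the immediate size and says nothing about whether the resulting flags are equivalent.

```python
def fits_imm8(value):
    """True if value is representable as a sign-extended 8-bit immediate."""
    return -128 <= value <= 127


def prefer_negated(op, imm):
    """Swap add<->sub and negate the immediate when only the negated value
    fits in imm8. In practice this only triggers for imm == 128."""
    swap = {"add": "sub", "sub": "add"}
    if op in swap and not fits_imm8(imm) and fits_imm8(-imm):
        return swap[op], -imm
    return op, imm


def imm_bytes(imm, operand_bits):
    """Size in bytes of the immediate field: 1 for the imm8 form,
    otherwise 2 for word operands and 4 for dword/qword operands."""
    if fits_imm8(imm):
        return 1
    return 2 if operand_bits == 16 else 4


op, imm = prefer_negated("add", 128)
saved = imm_bytes(128, 32) - imm_bytes(imm, 32)
print(op, imm, "saves", saved, "immediate bytes")
```

For a dword operand this shrinks the immediate from 4 bytes to 1; for a word operand the saving is 1 byte.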
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #12 from Linus Torvalds ---
(In reply to H.J. Lu from comment #11)
>
> imm8 isn't sign-extended. 8 bits should work.
No, it's not sign-extended to the full size, but it doesn't work because you
change the sign bit in the flags i
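A rough model of the sign-flag concern, assuming only the documented behaviour of TEST (SF is copied from the top bit of the operand-size result). The function name is hypothetical. With a mask that has bit 7 set, the byte form reports SF=1 where the wider forms report SF=0, which is why restricting the mask to 7 bits would keep the flags identical.

```python
def flags_after_test(value, mask, bits):
    """Return (ZF, SF) as TEST would set them for a bits-wide operand."""
    result = value & mask & ((1 << bits) - 1)
    zf = result == 0
    sf = bool((result >> (bits - 1)) & 1)  # top bit of the result
    return zf, sf


value = 0xFF
for mask in (0x80, 0x7F):
    wide = flags_after_test(value, mask, 32)
    byte = flags_after_test(value, mask, 8)
    print(hex(mask), "testl:", wide, "testb:", byte, "same:", wide == byte)
```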
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #11 from H.J. Lu ---
(In reply to Linus Torvalds from comment #10)
> (In reply to H.J. Lu from comment #7)
> >
> > Good point. I will remove "testq $imm31, mem". I will add
> > "test{q,l,w} $imm8,%r{64,32,16}" to "testb $imm8,%r
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #10 from Linus Torvalds ---
(In reply to H.J. Lu from comment #7)
>
> Good point. I will remove "testq $imm31, mem". I will add
> "test{q,l,w} $imm8,%r{64,32,16}" to "testb $imm8,%r8" to -O3.
I'm assuming that you limit the im
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #9 from Linus Torvalds ---
I already pointed this out in email to hjl, but adding it to the bugzilla too,
in case people want to track it:
There are a few more common cases that can use the REX.W optimization, notably
movq $
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #8 from H.J. Lu ---
I updated the users/hjl/optimize branch to fix "andq $foo, %rax"
and to remove "testq $imm31, mem".
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #7 from H.J. Lu ---
(In reply to Jan Beulich from comment #6)
> (In reply to H.J. Lu from comment #5)
> > I updated users/hjl/optimize branch to encode
> >
> > testq $imm31, mem
> >
> > as
> >
> > testl $imm31, mem
> >
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #6 from Jan Beulich ---
(In reply to H.J. Lu from comment #5)
> I updated users/hjl/optimize branch to encode
>
> testq $imm31, mem
>
> as
>
> testl $imm31, mem
>
> only at -O2.
I was about to suggest that, also becaus
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #5 from H.J. Lu ---
(In reply to Linus Torvalds from comment #4)
> (In reply to H.J. Lu from comment #3)
> >
> > We need to keep
> >
> > andq imm31, mem
>
> Yes.
>
> > and optimize testq to
> >
> > testl imm31, me
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
Linus Torvalds changed:
What|Removed |Added
CC||torvalds@linux-foundation.o
https://sourceware.org/bugzilla/show_bug.cgi?id=22871
H.J. Lu changed:
What|Removed |Added
Summary|Encode instructions of |Encode instructions of
|6