Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-09 Thread Joseph Myers
On Sat, 7 Aug 2021, Stefan Kanthak wrote: > Joseph Myers wrote: > > You should be looking at TS 18661-3 / C2x Annex F for sNaN handling; > > I'll do so as soon as GCC drops support for all C dialects before C2x! > > Unless you use a time machine and fix the POSIX and ISO C standards > written i

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-08 Thread Vincent Lefevre
On 2021-08-07 14:32:32 +0200, Stefan Kanthak wrote: > Joseph Myers wrote: > > On Fri, 6 Aug 2021, Stefan Kanthak wrote: > > PLEASE DON'T STRIP ATTRIBUTION LINES: I did not write the following paragraph! > > >> > I don't know what the standard says about NaNs in this case, I seem to > >> > rememb

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-07 Thread Stefan Kanthak
Joseph Myers wrote: > On Fri, 6 Aug 2021, Stefan Kanthak wrote: PLEASE DON'T STRIP ATTRIBUTION LINES: I did not write the following paragraph! >> > I don't know what the standard says about NaNs in this case, I seem to >> > remember that arithmetic instructions typically produce QNaN when one

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Joseph Myers
On Fri, 6 Aug 2021, Stefan Kanthak wrote: > > I don't know what the standard says about NaNs in this case, I seem to > > remember that arithmetic instructions typically produce QNaN when one of > > the inputs is a NaN, whether signaling or not. > >

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Stefan Kanthak
Richard Biener wrote: > On August 6, 2021 4:32:48 PM GMT+02:00, Stefan Kanthak > wrote: >>Michael Matz wrote: >>> Btw, have you made speed measurements with your improvements? >> >>No. [...] >>If the constant happens to be present in L1 cache, it MAY load as fast >>as an immediate. >>BUT: on

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Richard Biener via Gcc
On August 6, 2021 4:32:48 PM GMT+02:00, Stefan Kanthak wrote: >Michael Matz wrote: > > >> Hello, >> >> On Fri, 6 Aug 2021, Stefan Kanthak wrote: >> >>> For -ffast-math, where the sign of -0.0 is not handled and the spurious >>> invalid floating-point exception for |argument| >= 2**63 is accepta

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Michael Matz via Gcc
Hello, On Fri, 6 Aug 2021, Stefan Kanthak wrote: > >> For -ffast-math, where the sign of -0.0 is not handled and the > >> spurious invalid floating-point exception for |argument| >= 2**63 is > >> acceptable, > > > > This claim would need to be proven in the wild. > > I should have left the "wh

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Stefan Kanthak
Gabriel Paubert wrote: > On Fri, Aug 06, 2021 at 02:43:34PM +0200, Stefan Kanthak wrote: >> Gabriel Paubert wrote: >> >> > Hi, >> > >> > On Thu, Aug 05, 2021 at 01:58:12PM +0200, Stefan Kanthak wrote: [...] >> >> The whole idea behind these implementations is to get rid of loading >> >> flo

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Stefan Kanthak
Michael Matz wrote: > Hello, > > On Fri, 6 Aug 2021, Stefan Kanthak wrote: > >> For -ffast-math, where the sign of -0.0 is not handled and the spurious >> invalid floating-point exception for |argument| >= 2**63 is acceptable, > > This claim would need to be proven in the wild. I should have

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Michael Matz via Gcc
Hello, On Fri, 6 Aug 2021, Stefan Kanthak wrote: > For -ffast-math, where the sign of -0.0 is not handled and the spurious > invalid floating-point exception for |argument| >= 2**63 is acceptable, This claim would need to be proven in the wild. |argument| > 2**52 are already integer, and should
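
The point about magnitudes of 2**52 and above can be illustrated with a small C sketch. This is not code from the thread and not what GCC emits. The function name trunc_via_int64 is hypothetical, and the approach (guard on magnitude, round-trip through a 64-bit integer, then restore the sign) is only one plausible reading of the idea under discussion. It assumes IEEE 754 binary64 doubles, and it returns NaN inputs unchanged, so it does not address the signaling-NaN question raised elsewhere in the thread.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper for illustration only; not taken from the thread. */
static double trunc_via_int64(double x)
{
    /* 2**52: from this magnitude on, every finite double is an integer. */
    const double two52 = 4503599627370496.0;

    /* The negated comparison is also true for NaN, so NaN and infinities
       take this early exit and never reach the integer conversion. */
    if (!(fabs(x) < two52))
        return x;

    /* |x| < 2**52 fits in int64_t; the conversion truncates toward zero. */
    double t = (double)(int64_t)x;

    /* Restore the sign so that inputs in (-1, 0) yield -0.0, not +0.0. */
    return copysign(t, x);
}
```

Because of the guard, the integer conversion is never reached for |argument| >= 2**63, so the invalid exception mentioned in the quoted text cannot be raised by it, and the final copysign keeps the sign of -0.0.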

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Gabriel Paubert
On Fri, Aug 06, 2021 at 02:43:34PM +0200, Stefan Kanthak wrote: > Gabriel Paubert wrote: > > > Hi, > > > > On Thu, Aug 05, 2021 at 01:58:12PM +0200, Stefan Kanthak wrote: > >> Gabriel Paubert wrote: > >> > >> > >> > On Thu, Aug 05, 2021 at 09:25:02AM +0200, Stefan Kanthak wrote: > > >> >>

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Richard Biener via Gcc
On Fri, Aug 6, 2021 at 2:47 PM Stefan Kanthak wrote: > > Gabriel Paubert wrote: > > > Hi, > > > > On Thu, Aug 05, 2021 at 01:58:12PM +0200, Stefan Kanthak wrote: > >> Gabriel Paubert wrote: > >> > >> > >> > On Thu, Aug 05, 2021 at 09:25:02AM +0200, Stefan Kanthak wrote: > > >> >>

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Stefan Kanthak
Gabriel Paubert wrote: > Hi, > > On Thu, Aug 05, 2021 at 01:58:12PM +0200, Stefan Kanthak wrote: >> Gabriel Paubert wrote: >> >> >> > On Thu, Aug 05, 2021 at 09:25:02AM +0200, Stefan Kanthak wrote: >> >> .intel_syntax >> >> .text >>

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-05 Thread Gabriel Paubert
Hi, On Thu, Aug 05, 2021 at 01:58:12PM +0200, Stefan Kanthak wrote: > Gabriel Paubert wrote: > > > > On Thu, Aug 05, 2021 at 09:25:02AM +0200, Stefan Kanthak wrote: > >> Hi, > >> > >> targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the > >> following code (13 instructions u

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-05 Thread Gabriel Ravier via Gcc
On 8/5/21 11:42 AM, Gabriel Paubert wrote: On Thu, Aug 05, 2021 at 09:25:02AM +0200, Stefan Kanthak wrote: Hi, targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the following code (13 instructions using 57 bytes, plus 4 quadwords using 32 bytes) for __builtin_trunc() when -msse4.1 i

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-05 Thread Stefan Kanthak
Gabriel Paubert wrote: > On Thu, Aug 05, 2021 at 09:25:02AM +0200, Stefan Kanthak wrote: >> Hi, >> >> targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the >> following code (13 instructions using 57 bytes, plus 4 quadwords >> using 32 bytes) for __builtin_trunc() when -msse4.1 is NOT

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-05 Thread Richard Biener via Gcc
On Thu, Aug 5, 2021 at 11:44 AM Gabriel Paubert wrote: > > On Thu, Aug 05, 2021 at 09:25:02AM +0200, Stefan Kanthak wrote: > > Hi, > > > > targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the > > following code (13 instructions using 57 bytes, plus 4 quadwords > > using 32 bytes) for __

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-05 Thread Gabriel Paubert
On Thu, Aug 05, 2021 at 09:25:02AM +0200, Stefan Kanthak wrote: > Hi, > > targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the > following code (13 instructions using 57 bytes, plus 4 quadwords > using 32 bytes) for __builtin_trunc() when -msse4.1 is NOT given: > >

Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-05 Thread Stefan Kanthak
Hi, targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the following code (13 instructions using 57 bytes, plus 4 quadwords using 32 bytes) for __builtin_trunc() when -msse4.1 is NOT given: .text 0: f2 0f 10 15 10 00 00 00 movsd .LC1(%rip), %xmm2