RE: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

Tamar Christina Tue, 29 Aug 2017 03:06:38 -0700


> -----Original Message-----
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: 24 August 2017 10:16
> To: Tamar Christina
> Cc: Joseph Myers; Christophe Lyon; Markus Trippelsdorf; Jeff Law; GCC
> Patches; Wilco Dijkstra; Michael Meissner; nd
> Subject: RE: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like
> numbers in GIMPLE.
> 
> On Wed, 23 Aug 2017, Tamar Christina wrote:
> 
> > Hi Richard,
> >
> > Thanks for the feedback,
> >
> > >
> > > I think you should split up your work a bit.  The pieces from
> > > fold_builtin_* you remove and move to lowering that require no
> > > control flow like __builtin_isnan (x) -> x UNORD x should move to
> > > match.pd patterns (aka foldings that will then be applied both on
> GENERIC and GIMPLE).
> > >
> >
> > But emitting them as an UNORD wouldn't solve the performance or
> > correctness issues The patch is trying to address. For correctness,
> > for instance an UNORD would still signal When -fsignaling-nans is used
> (PR/66462).
> 
> Well, then address the correctness issues separately.  I think lumping
> everything into a single patch just makes your live harder ;)


But the patch was made to fix the correctness issues. Which unless I'm mistaken
can only be solved using integer operations. To give a short history on the 
patch:

My original patch had this in builtins.c where the current code is and had
none of this problem. The intention of this patch was just address the 
correctness
issues with isnan etc.

Since we had to do this using integer operations we figured we'd also replace 
the
operations with faster ones so the code was tweaked in order to e.g. make the 
common
paths of fpclassify exit earlier. (I had provided benchmark results to show that
the integer operations are faster than the floating point ones, also on x86).

It was then suggested by upstream review that I do this as a gimple lowering
phase. I pushed back against this but ultimately lost and submitted a new 
version
that moved from builtins.c to gimple-low.c, which was why late expansion for C++
broke, because the expansion is now done very early.

The rational for this move was that it would give the rest of the pipeline a 
chance
to optimize the code further.

After a few revisions this was ultimately accepted (the v3 version), but was 
reverted
because it broke IBM's long double format due to the special case code for it 
violating SSA.

That was trivially fixed (once I finally had access to a PowerPC hardware 
months later) and I changed the
patch to leave the builtin as a function call if at lowering time it wasn't 
known to be a builtin.
This then "fixes" the C++ testcase, in that the test doesn't fail anymore, but 
it doesn't generate
the same code as before.
Though the test is quite contrived and I don't know if actual user code often 
has this.

While checking to see if this behavior is OK, I was suggested to do it as a 
gimple-fold instead.
I did, but then turns out it won't work for non-constants as I can't generate 
new control flow from
a fold (which makes sense in retrospect but I didn't know this before).

This is why the patch is more complex than what it was before. The original 
version only emitted simple tree.

> 
> I was saying that if I write isunordered (x, y) and use -fno-signaling-nans 
> (the
> default) then I would expect the same code to be generated as if I say isnan
> (x) || isnan (y).  And I'm not sure if isnan (x) || isnan (y) is more 
> efficient if
> both are implemented with integer ops compared to using an unordered FP
> compare.
> 

But isnan (x) || isnan (y) can be rewritten using a match.pd rule to 
isunordered (x, y)
If that should really generate the same code. So I don’t think it's an issue 
for this patch.

> A canonical high-level representation on GIMPLE is equally important as
> maybe getting some optimization on a lowered "optimized" form.
> 
> And a good canonical representation is short (x UNORD x or x UNORD y
> counts as short here).
> 
> > > As fallback the expanders should simply emit a call (as done for
> > > isinf when not folded for example).
> >
> > Yes the patch currently already does this.
> 
> Ah, I probably looked at an old version then.
> 
> > My question had more to do if
> > the late expansion when you alias one of these functions in C++ was
> > really an issue, since It used to (sometimes, only when you got the
> > type of the argument correct) expand before whereas now it'll always
> > just emit a function call in this edge case.
> 
> The answer here is really that we should perform lowering at a point where
> we do not omit possible optimizations.  For the particular case this would
> mean lowering after IPA optimizations (also think of accelerator offloading
> which uses the early optimization phase of the host).  This also allows early
> optimization to more accurately estimate code size (when optimizing for
> size).  And I still think whether to expose fp classification as FP or 
> integer ops
> should be a target decision and done late.
> 
> This still means canonicalizations like isnan -> UNORD should happen early
> and during folding.
> 

This feels to me like it's a different patch, whatever transformations you do
to get isnan or any of the other builtins don't seem relevant. Only that when 
you
do generate isnan, you get better and more correct code than before.

If the integer operations are of a concern though, I could add a target hook
to make this opt-in. The backup code is still there as it gets used when the 
floating point
type is not IEEE like.

> >
> > >
> > > That said, split out the uncontroversical part of moving existing
> > > foldings from builtins.c to match.pd where they generate no control-
> flow.
> > >

If that's the direction I have take I will, I am just not convinced the patch 
will be
simpler nor smaller...

Thanks,
Tamar

> > > Thanks,
> > > Richard.
> > >
> > > > Originally the patch was in expand, which would have covered the
> > > > late expansion case similar to what it's doing now in trunk. I was
> > > > however asked to move this to a GIMPLE lowering pass as to give
> > > > other
> > > optimizations a chance to do further optimizations on the generated
> code.
> > > >
> > > > This of course works fine for C since these math functions are a
> > > > Macro in C
> > > but are functions in C++.
> > > >
> > > > C++ would then allow you to do stuff like take the address of the
> > > > C++ function so
> > > >
> > > > void foo(float a) {
> > > >   int (*xx)(...);
> > > >   xx = isnan;
> > > >   if (xx(a))
> > > >     g++;
> > > > }
> > > >
> > > > is perfectly legal. My current patch leaves "isnan" in as a call
> > > > as by the time we're doing GIMPLE lowering the alias has not been
> > > > resolved yet, whereas the version currently committed is able to
> > > > expand it as it's late
> > > in expand.
> > > >
> > > > Curiously the behaviour is somewhat confusing.
> > > > if xx is called with a non-constant it is expanded as you would
> > > > expect
> > > >
> > > > .LFB0:
> > > >         .cfi_startproc
> > > >         fcvt    d0, s0
> > > >         fcmp    d0, d0
> > > >         bvs     .L4
> > > >
> > > > but when xx is called with a constant, say 0 it's not
> > > >
> > > > .LFB0:
> > > >         .cfi_startproc
> > > >         mov     w0, 0
> > > >         bl      isnan
> > > >         cbz     w0, .L1
> > > >
> > > > because it infers the type of the 0 to be an integer and then
> > > > doesn't
> > > recognize the call.
> > > > using 0.0 works, but the behaviour is really counter intuitive.
> > > >
> > > > The question is what should I do now, clearly it would be useful
> > > > to handle the late expansion as well, however I don't know what
> > > > the best
> > > approach would be.
> > > >
> > > > I can either:
> > > >
> > > > 1) Add two new implementations, one for constant folding and one
> > > > for
> > > expansion, but then one in expand would
> > > >    be quite similar to the one in the early lowering phase. The
> > > > constant
> > > folding one could be very simple since
> > > >    it's a constant I can just call the buildins and evaluate the
> > > > value
> > > completely.
> > > >
> > > > 2) Use the patch as is but make another one to allow the renaming
> > > > to be applied quite early on. e.g while still in Tree or GIMPLE
> > > > resolve
> > > >
> > > >         int (*xx)(...);
> > > >         xx = isnan;
> > > >         if (xx(a))
> > > >
> > > >         to
> > > >
> > > >         int (*xx)(...);
> > > >         xx = isnan;
> > > >         if (isnan(a))
> > > >
> > > >    This seems like it would be the best approach and the more
> > > > useful one in
> > > general.
> > > >
> > > > Any thoughts?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > >
> > > > > Richard.
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguent...@suse.de>
> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> > > Norton, HRB 21284 (AG Nuernberg)
> >
> >
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nuernberg)

RE: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

Reply via email to