On Wed, 23 Aug 2017, Tamar Christina wrote:

> Hi Richard,
>
> Thanks for the feedback,
>
> > I think you should split up your work a bit.  The pieces from
> > fold_builtin_* you remove and move to lowering that require no
> > control flow, like __builtin_isnan (x) -> x UNORD x, should move to
> > match.pd patterns (aka foldings that will then be applied both on
> > GENERIC and GIMPLE).
>
> But emitting them as an UNORD wouldn't solve the performance or
> correctness issues the patch is trying to address.  For correctness,
> for instance, an UNORD would still signal when -fsignaling-nans is
> used (PR 66462).

Well, then address the correctness issues separately.  I think lumping
everything into a single patch just makes your life harder ;)

I was saying that if I write isunordered (x, y) and use
-fno-signaling-nans (the default) then I would expect the same code to
be generated as if I say isnan (x) || isnan (y).  And I'm not sure
isnan (x) || isnan (y) is more efficient when both are implemented with
integer ops compared to using an unordered FP compare.

A canonical high-level representation on GIMPLE is equally important as
maybe getting some optimization on a lowered "optimized" form.  And a
good canonical representation is short (x UNORD x or x UNORD y counts
as short here).
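To make the FP-vs-integer-ops comparison concrete, the two kinds of
isnan lowering under discussion would look roughly like this in source
form (a sketch for double only, assuming the usual IEEE binary64
representation; the function names are just illustrative):

/* FP variant: a NaN is the only value that compares unordered with
   itself, which is what the x UNORD x canonicalization expresses.  */
int isnan_fp (double x)
{
  return __builtin_isunordered (x, x);
}

/* Integer variant: a NaN has all exponent bits set and a non-zero
   mantissa, so with the sign bit masked off its bit pattern is
   strictly greater than that of +Inf.  */
int isnan_bits (double x)
{
  unsigned long long bits;
  __builtin_memcpy (&bits, &x, sizeof bits);
  return (bits & 0x7fffffffffffffffULL) > 0x7ff0000000000000ULL;
}

Which of the two is cheaper depends on the target, which is another
reason to keep the short canonical form around for as long as possible.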
> > As fallback the expanders should simply emit a call (as done for
> > isinf when not folded for example).
>
> Yes, the patch currently already does this.

Ah, I probably looked at an old version then.

> My question had more to do with whether the late expansion when you
> alias one of these functions in C++ was really an issue, since it
> used to (sometimes, only when you got the type of the argument
> correct) expand before, whereas now it'll always just emit a function
> call in this edge case.

The answer here is really that we should perform lowering at a point
where we do not omit possible optimizations.  For the particular case
this would mean lowering after IPA optimizations (also think of
accelerator offloading, which uses the early optimization phase of the
host).  This also allows early optimization to more accurately estimate
code size (when optimizing for size).

And I still think whether to expose FP classification as FP or integer
ops should be a target decision and done late.  This still means
canonicalizations like isnan -> UNORD should happen early and during
folding.

Note __builtin_fpclassify is really a special case and lowering that
early is a good thing -- I'd simply keep the current code here at the
start.

> > I think fpclassify is special (double-check), you can't have an
> > indirect call to the classification family as they are macros
> > (builtins where they exist could be called indirectly, though I
> > guess we should simply disallow taking their address).  These are
> > appropriate for early lowering like you do.  You can leave
> > high-level ops from fpclassify lowering and rely on folding to turn
> > them into sth more optimal.
>
> Fpclassify is also a function in C++, so in that regard it's not
> different from the rest.  For C code my patch will always do the
> right thing as, like you said, they're macros, so I would always be
> able to lower them early.

Hum, but fpclassify is then not a builtin in C++ given it needs to know
the actual values of FP_NAN and friends.  That is, we are talking about
__builtin_fpclassify, which does not match the standard fpclassify API.
The builtin is always a function and never a macro.
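(For reference, the builtin takes the classification values as
arguments, so a C library typically defines the standard macro on top
of it along the lines of

/* Roughly what a C library's <math.h> does; the exact form varies.  */
#define fpclassify(x) \
  __builtin_fpclassify (FP_NAN, FP_INFINITE, FP_NORMAL, \
                        FP_SUBNORMAL, FP_ZERO, x)

which is also why the C++ fpclassify overloads cannot simply be aliases
of the builtin.)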
> > I still don't fully believe in lowering to integer ops like the
> > isnan lowering you are doing.  That hides what you are doing from
> > (not yet existing) propagation passes of FP classification.  I'd do
> > those transforms only late.  You are for example not open-coding
> > isunordered with integer ops, no?  I'd have isnan become
> > UNORDERED_EXPR and eventually expand that with bitops.
>
> I'm not sure how I would be able to distinguish between the
> UNORDERED_EXPR of an isnan call and that of a general
> X UNORDERED_EXPR X expression.  The new isnan code that replaces the
> UNORD isn't a general comparison, it can only really check for NaN.
> In general I don't know, if I rewrite these to tree expressions early
> in match.pd, how I would ever be able to recognize the very specific
> cases they try to handle in order to do bitops only when it makes
> sense.

I believe the reverse transform, x UNORD x to isnan (x), is also valid
(modulo -fsignaling-nans).

> I think C++ cases will still be problematic, since match.pd will
> match on
>
>   int (*xx)(...);
>   xx = isnan;
>   if (xx(a))
>
> quite late, and only when xx is eventually replaced by isnan, so a
> lot of optimizations would have already ignored the UNORD it would
> have generated.  Which means the bitops have to be quite late and
> would miss a lot of other optimization phases.

Is that so?  Did you try whether vectorization likes x UNORD x or the
bit-operations variant of isnan (x) more?  Try

int foo (double *x, int n)
{
  unsigned nan_cnt = 0;
  for (int i = 0; i < n; ++i)
    if (__builtin_isnan (x[i]))
      nan_cnt++;
  return nan_cnt;
}

for a conditional reduction.  On x86_64 I get an inner vectorized loop

.L5:
        movapd  (%rax), %xmm0
        addq    $32, %rax
        movapd  -16(%rax), %xmm1
        cmpq    %rdx, %rax
        cmpunordpd      %xmm0, %xmm0
        cmpunordpd      %xmm1, %xmm1
        pand    %xmm3, %xmm0
        pand    %xmm3, %xmm1
        shufps  $136, %xmm1, %xmm0
        paddd   %xmm0, %xmm2
        jne     .L5

> It's these cases where you do stuff with a function that you can't
> with a macro that's a problem for my current patch.  It'll just
> always leave it in as a function call (the current expand code will
> expand to a cmp if the types end up matching, but only then); the
> question is really whether it's a big issue or not.
>
> And if it is, maybe we should do that rewriting early on.  It seems
> like replacing xx with isnan early on would have lots of benefits
> like getting type errors.  Since for instance this is wrong, but gcc
> still produces code for it:
>
> int g;
>
> extern "C" int isnan ();
> struct bar { int b; };
>
> void foo(struct bar a) {
>     int (*xx)(...);
>     xx = isnan;
>     if (xx(a))
>         g++;
> }
>
> and makes a call to isnan with a struct as an argument.

Well, this is what you wrote ...

> For the record, with my current patch, testsuite/g++.dg/opt/pr60849.C
> passes since it only checks for compile, but with a call instead of a
> compare in the generated code.

For the particular testcase that's not a problem I guess.  With a
correctly prototyped isnan and a correct type for the function pointer
it would be unfortunate.
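That is, for something like (just an illustration of the case I mean,
not taken from the testcase)

extern "C" int isnan (double);
int g;

void foo (double a)
{
  int (*xx) (double) = isnan;
  if (xx (a))
    g++;
}

we would still want the indirect call to end up as an inline check once
xx is known to be isnan.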
Thanks,
Richard.

> Thanks,
> Tamar
>
> > That said, split out the uncontroversial part of moving existing
> > foldings from builtins.c to match.pd where they generate no
> > control flow.
> >
> > Thanks,
> > Richard.
> >
> > > Originally the patch was in expand, which would have covered the
> > > late expansion case similar to what it's doing now in trunk.  I
> > > was however asked to move this to a GIMPLE lowering pass so as to
> > > give other optimizations a chance to do further optimizations on
> > > the generated code.
> > >
> > > This of course works fine for C, since these math functions are
> > > macros in C but are functions in C++.
> > >
> > > C++ would then allow you to do stuff like take the address of the
> > > C++ function, so
> > >
> > > void foo(float a) {
> > >     int (*xx)(...);
> > >     xx = isnan;
> > >     if (xx(a))
> > >         g++;
> > > }
> > >
> > > is perfectly legal.  My current patch leaves "isnan" in as a call,
> > > as by the time we're doing GIMPLE lowering the alias has not been
> > > resolved yet, whereas the version currently committed is able to
> > > expand it as it's late in expand.
> > >
> > > Curiously the behaviour is somewhat confusing.
> > > If xx is called with a non-constant it is expanded as you would
> > > expect:
> > >
> > > .LFB0:
> > >         .cfi_startproc
> > >         fcvt    d0, s0
> > >         fcmp    d0, d0
> > >         bvs     .L4
> > >
> > > but when xx is called with a constant, say 0, it's not:
> > >
> > > .LFB0:
> > >         .cfi_startproc
> > >         mov     w0, 0
> > >         bl      isnan
> > >         cbz     w0, .L1
> > >
> > > because it infers the type of the 0 to be an integer and then
> > > doesn't recognize the call.  Using 0.0 works, but the behaviour is
> > > really counterintuitive.
> > >
> > > The question is what I should do now; clearly it would be useful
> > > to handle the late expansion as well, however I don't know what
> > > the best approach would be.
> > >
> > > I can either:
> > >
> > > 1) Add two new implementations, one for constant folding and one
> > >    for expansion, but then the one in expand would be quite
> > >    similar to the one in the early lowering phase.  The constant
> > >    folding one could be very simple since it's a constant: I can
> > >    just call the builtins and evaluate the value completely.
> > >
> > > 2) Use the patch as is but make another one to allow the renaming
> > >    to be applied quite early on, e.g. while still in tree or
> > >    GIMPLE resolve
> > >
> > >    int (*xx)(...);
> > >    xx = isnan;
> > >    if (xx(a))
> > >
> > >    to
> > >
> > >    int (*xx)(...);
> > >    xx = isnan;
> > >    if (isnan(a))
> > >
> > >    This seems like it would be the best approach and the more
> > >    useful one in general.
> > >
> > > Any thoughts?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > > Richard.
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> > Norton, HRB 21284 (AG Nuernberg)

--
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)