[Bug tree-optimization/118297] not vectorizing some code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118297

--- Comment #9 from Tibor Győri ---
> We do apply SLP vectorization with -march=znver3 so I wonder
> what you think we are missing (apart from the confusing -fopt-info-missed
> messages)?

I was not originally thinking GCC was missing anything here, but the optinfo
messages were sufficiently unclear that I could no longer tell whether
something was getting missed. My main issue here is definitely how unclear
these messages can currently be.

> We also refuse to loop-header copy this because there's a pow() call in the
> block.

That is unfortunate. I was hoping that using idioms like std::pow(A, 2) to
square A would no longer have any optimization impact.
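For illustration, here is a minimal sketch of the two spellings being compared. The function names are hypothetical and are not taken from the attached test case. Whether and when GCC turns the std::pow form into a multiplication depends on the pass order and flags; the sketch only shows that one form contains a library call at the source level and the other does not.

```cpp
#include <cassert>
#include <cmath>

// Squares `a` with the std::pow idiom mentioned in the comment above.
// At the source level this is a call to a math library function.
double square_pow(double a) { return std::pow(a, 2); }

// Squares `a` with a plain multiplication, so the body contains no call.
double square_mul(double a) { return a * a; }

// Hypothetical usage check: both spellings agree up to rounding.
void check_square() {
    assert(std::fabs(square_pow(3.0) - square_mul(3.0)) < 1e-12);
}
```

Comparing the -fopt-info output for a loop written with each helper is one way to see whether the call affects an early pass such as loop-header copying.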
[Bug rtl-optimization/118555] New: -fopt-info reporting of why decide_unroll_constant_iterations decides against unrolling could be improved
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118555

Bug ID: 118555
Summary: -fopt-info reporting of why decide_unroll_constant_iterations decides against unrolling could be improved
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: diagnostic
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tiborgyri at gmail dot com
CC: tiborgyri at gmail dot com
Target Milestone: ---

When decide_unroll_constant_iterations runs, it emits a dump message to the
terminal:
---
  dump_printf (MSG_NOTE,
               "considering unrolling loop with constant "
               "number of iterations\n");
---
and if it does decide to unroll, report_unroll issues a passed-optimization
message, nicely attached to a source location:
---
  dump_printf_loc (metadata, locus.get_user_location (),
                   "loop unrolled %d times",
                   loop->lpt_decision.times);
---
Unfortunately, when decide_unroll_constant_iterations decides against
unrolling, no messages are emitted. There are sections like this in the code:
---
  if (dump_file)
    fprintf (dump_file, ";; Not considering loop, is too big\n");
---
but I think it would be a lot more useful if these were emitted with
dump_printf, or preferably with dump_printf_loc.

I would also like to note that this fprintf (dump_file, ...) idiom is not
documented at https://gcc.gnu.org/onlinedocs/gccint/Dump-types.html
[Bug tree-optimization/118544] -fopt-info misreports unroll factor when using #pragma GCC unroll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118544

--- Comment #3 from Tibor Győri ---
(In reply to Andrew Pinski from comment #2)
> The unroll 2 is correct, it was unrolled one time; likewise 3 is unrolled
> twice, etc. I suspect you are misunderstanding what the diagnostic is
> saying, it is saying unrolled the loop one extra iteration.

OK, I guess that does make sense, but it is unexpected that it does not follow
the same "counting convention" as the pragma. In the pragma one requests an
unroll factor, while the opt-info messages say how many extra iterations are
unrolled.

For what it's worth, clang reports the unroll factor in its opt-info messages,
which feels like the more natural choice of "counting convention" to me.
[Bug tree-optimization/118544] -fopt-info misreports unroll factor when using #pragma GCC unroll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118544

--- Comment #7 from Tibor Győri ---
(In reply to Richard Biener from comment #5)
> I suppose cunroll should report the loop was fully peeled.
>
> Note the unroll amount might be confusing when for example loop header copying
> causes the number of latch executions to decrease by one before we get to
> unroll.

Yes, I think adding reports to some currently silent passes would help a lot.
[Bug rtl-optimization/118544] New: -fopt-info misreports unroll factor when using #pragma GCC unroll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118544

Bug ID: 118544
Summary: -fopt-info misreports unroll factor when using #pragma GCC unroll
Product: gcc
Version: 15.0
URL: https://godbolt.org/z/x1eb65jWf
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tiborgyri at gmail dot com
CC: tiborgyri at gmail dot com
Target Milestone: ---

Created attachment 60202
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60202&action=edit
Test case

-Wall -Wextra -Ofast -std=c++20 -march=znver3 -gno-as-loc-support

Partially unrolling a loop with #pragma GCC unroll results in an "optimization
passed" message that misreports the unroll factor. For example,
#pragma GCC unroll(2) results in the message "loop unrolled 1 times". This is
misleading, since GCC has correctly unrolled the loop by a factor of 2, as
requested by the user. The number reported in the opt-info message is
consistently 1 lower than the requested (and actual) unroll factor.

Another manifestation of this occurs when the requested unroll factor is 1
lower than the loop iteration count. For example, if a loop has 6 iterations,
#pragma GCC unroll(5) results in the following message:
"loop with 5 iterations completely unrolled"
This is confusing, given that the loop has 6 iterations, not 5.

The only time this opt-info message is correct is when the requested unroll
factor is >= the total number of iterations, in which case GCC correctly
reports that the loop was completely unrolled.

I imagine that some other pass that does not emit opt-info messages unrolls 1
iteration, hence the confusing messages.

See: https://godbolt.org/z/x1eb65jWf
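For readers without access to the attachment, the following is a minimal sketch of the kind of loop described above. It is a hypothetical stand-in, not the attached test case: the function name, the element type, and the reduction are assumptions. It only shows where the pragma goes and what "unroll factor 2 on a loop with 6 iterations" means at the source level.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the test case: a loop with a compile-time
// trip count of 6.
double sum6(const std::array<double, 6>& x) {
    double s = 0.0;
    // Request an unroll factor of 2, i.e. two copies of the body per
    // iteration of the transformed loop (3 iterations remain).
#pragma GCC unroll 2
    for (int i = 0; i < 6; ++i)
        s += x[i];
    return s;
}

// Hypothetical usage check: unrolling must not change the result.
void check_sum6() {
    assert(sum6({1.0, 2.0, 3.0, 4.0, 5.0, 6.0}) == 21.0);
}
```

Compiling such a function with the flags listed above plus -fopt-info is how the messages quoted in this report can be inspected. The exact wording may differ between GCC versions.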
[Bug tree-optimization/118297] New: vect_analyze_loop_form gets confused by outer loop that only executes its body once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118297

Bug ID: 118297
Summary: vect_analyze_loop_form gets confused by outer loop that only executes its body once
Product: gcc
Version: 15.0
URL: https://godbolt.org/z/a5nKv3xnx
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tiborgyri at gmail dot com
CC: tiborgyri at gmail dot com
Target Milestone: ---

See: https://godbolt.org/z/a5nKv3xnx

The outer loop of the loop nest trivially runs its body exactly once, yet
something prints "not vectorized: unsupported outerloop form."

I have looked at the tree vectorizer source file and the only place I could
find this message was in vect_analyze_loop_form (line 1824):

      entryedge = loop_preheader_edge (innerloop);
      if (entryedge->src != loop->header
          || !single_exit (innerloop)
          || single_exit (innerloop)->dest != EDGE_PRED (loop->latch, 0)->src)
        return opt_result::failure_at (vect_location,
                                       "not vectorized:"
                                       " unsupported outerloop form.\n");

I am not sure which of these conditions ends up being true, but to me it
sounds a bit silly when GCC says this trivial outer loop is unsupported by the
vectorizer.
[Bug other/118298] New: Partial unroll request for outer loop with #pragma GCC unroll is silently ignored
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118298

Bug ID: 118298
Summary: Partial unroll request for outer loop with #pragma GCC unroll is silently ignored
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: tiborgyri at gmail dot com
CC: tiborgyri at gmail dot com
Target Milestone: ---

See: https://godbolt.org/z/Mc9fsW4s5

The outer loop in this example has a trip count of 6, and I am asking GCC to
unroll it by a factor of 2. GCC silently ignores the pragma, while clang does
what I expect and unrolls it to a trip count of 3.

GCC should also honor this pragma, or if it cannot, it should at least leave a
note in the optimization report or on stdout.
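A minimal sketch of the situation described above, assuming a two-level nest whose inner bound is only known at run time. This is a hypothetical example, not the code behind the godbolt link; the function name and data layout are illustrative.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loop nest: the outer loop has a trip count of 6, the inner
// trip count `n` is a run-time value. `a` is expected to hold 6 * n values.
double sum_rows(const std::vector<double>& a, int n) {
    double s = 0.0;
    // The pragma precedes the outer loop, so the request is to unroll the
    // outer loop by a factor of 2 (leaving 3 outer iterations).
#pragma GCC unroll 2
    for (int i = 0; i < 6; ++i)
        for (int j = 0; j < n; ++j)
            s += a[i * n + j];
    return s;
}

// Hypothetical usage check: 6 rows of 2 ones sum to 12.
void check_sum_rows() {
    assert(sum_rows(std::vector<double>(12, 1.0), 2) == 12.0);
}
```

Whether a given compiler honors the request for a nest shaped like this has to be checked in the generated assembly or in its optimization report.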
[Bug other/118295] New: The optimization report says sqrt is not inlinable, even when it does get inlined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118295

Bug ID: 118295
Summary: The optimization report says sqrt is not inlinable, even when it does get inlined
Product: gcc
Version: 15.0
URL: https://godbolt.org/z/nrqjdhd5E
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: tiborgyri at gmail dot com
Target Milestone: ---

See: https://godbolt.org/z/nrqjdhd5E

Some inliner pass always reports sqrt as non-inlinable, even when it does get
inlined, e.g. on modern x86 with -ffast-math the call to the library function
is reduced to a single vsqrtsd insn.

My gut feeling is that the inliner pass runs much earlier and it would be
difficult/hacky to try to teach such things to it. So I would suggest changing
its message to begin with something like "not inlinable by inlining pass",
instead of the unqualified "not inlinable", and at the same time adding an
"optimization passed" opt-report message to whichever pass inlines sqrt with
-ffast-math.

This way the final opt-report would contain both the initial inlining failure
and the later inlining success, which would make reading these reports
congruent with the code being emitted.
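A minimal sketch of a function that exercises the behavior described above. The function name is hypothetical and this is not the attached test case; it simply contains one std::sqrt call that can be inspected in the assembly output and in the optimization report.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical example: the Euclidean norm of (x, y), with a single call
// to std::sqrt.
double norm2(double x, double y) {
    return std::sqrt(x * x + y * y);
}

// Hypothetical usage check on a 3-4-5 triangle.
void check_norm2() {
    assert(norm2(3.0, 4.0) == 5.0);
}
```

Compiling this with -O3 -ffast-math and comparing the report with the emitted instructions is a quick way to see whether the "not inlinable" message and the generated code disagree.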
[Bug other/118295] The optimization report says sqrt is not inlinable, even when it does get inlined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118295 --- Comment #1 from Tibor Győri --- Created attachment 60042 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60042&action=edit Test case -Wall -Wextra -O3 -ffast-math -std=c++20 -march=znver3 -gno-as-loc-support
[Bug tree-optimization/118294] New: GCC doesn't unroll the outer loop of a nest where the outer body trivially only runs once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118294

Bug ID: 118294
Summary: GCC doesn't unroll the outer loop of a nest where the outer body trivially only runs once
Product: gcc
Version: 15.0
URL: https://godbolt.org/z/G7WfxM3e7
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tiborgyri at gmail dot com
Target Milestone: ---

In this example: https://godbolt.org/z/G7WfxM3e7 the outer loop "for (TT j=0; j
[Bug tree-optimization/118294] GCC doesn't unroll the outer loop of a nest where the outer body trivially only runs once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118294

Tibor Győri changed:

What       |Removed     |Added
Status     |UNCONFIRMED |RESOLVED
Resolution |---         |INVALID

--- Comment #1 from Tibor Győri ---
Scratch that, I got confused by the combination of opt report messages and the
compare and jump instead of vminsd...
[Bug tree-optimization/58902] small matrix multiplication non vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58902

Tibor Győri changed:

What |Removed |Added
CC   |        |tiborgyri at gmail dot com, vincenzo.innocente at cern dot ch

--- Comment #1 from Tibor Győri ---
Tested this with trunk (future GCC 15). To me it looks like the
tree-vectorizer still does not understand the loop nest, and/or judges the
vectorization to be unprofitable, but both loops are fully unrolled and then
end up getting at least somewhat vectorized by the SLP vectorizer.

Latest Clang appears to work similarly: fully unroll, then SLP. The final
version of the Intel Classic compiler also seems to favor this approach.

See https://godbolt.org/z/boWxWbYWz

Would you agree that this issue has been resolved?
[Bug tree-optimization/118297] not vectorizing some code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118297

--- Comment #6 from Tibor Győri ---
(In reply to Andrew Pinski from comment #5)
> (In reply to Tibor Győri from comment #4)
> > It might even be the case that the current cost model is correct,
> > vectorization is indeed sometimes unprofitable.
> > But in that case, the issue is how this is communicated to the user, the
> > "unsupported outerloop form" message does not feel informative enough. I
> > mean it is a trivial loop that will get completely unrolled, right? So in my
> > mind that already counts as "supported", but GCC is telling me it is not.
>
> Well there are 2 different kinds of vectorizer, the SLP and loop based one,
> you were looking at the loop one which is saying it is unsupported.

That is all true, but I would prefer if the loop vectorizer identified itself
in its messages in the optimization report, e.g. by prefixing the messages
with "loop-vect:" or something similar.

Plus, "unsupported" is quite vague. Unsupported how? If I am reading the opt
report, it is to figure out how to change my code to help the compiler, so it
would be nice if the compiler was more specific.

To suggest something actionable: currently this "unsupported outerloop form"
message is emitted when any of the three different conditions listed above is
met. If each of those conditions were checked separately, they could be given
different, more specific missed-optimization messages.
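To make the suggestion concrete, here is a toy model of checking the three conditions separately. It is not GCC code and does not use GCC's internal API: the struct, the function, and all three message strings are hypothetical, and the three booleans merely stand in for the conditions tested in vect_analyze_loop_form as quoted in the original report.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the three facts tested together in
// vect_analyze_loop_form (names invented for this sketch).
struct OuterLoopFacts {
    bool inner_entered_from_outer_header;  // entryedge->src == loop->header
    bool inner_has_single_exit;            // single_exit (innerloop) is non-null
    bool inner_exit_feeds_outer_latch;     // exit dest == EDGE_PRED (latch, 0)->src
};

// Returns a specific (hypothetical) message for the first failing
// condition, or an empty string when the outer loop form is accepted.
std::string outer_loop_diagnosis(const OuterLoopFacts& f) {
    if (!f.inner_entered_from_outer_header)
        return "not vectorized: inner loop is not entered from the outer loop header";
    if (!f.inner_has_single_exit)
        return "not vectorized: inner loop does not have a single exit";
    if (!f.inner_exit_feeds_outer_latch)
        return "not vectorized: inner loop exit does not lead to the outer loop latch";
    return "";
}

// Hypothetical usage check: an accepted form produces no message.
void check_diagnosis() {
    assert(outer_loop_diagnosis({true, true, true}).empty());
}
```

The point of the sketch is only the control flow: one early return per condition, each with its own wording, instead of a single combined test with one generic message.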
[Bug other/118295] The optimization report says sqrt is not inlinable, even when it does get inlined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118295

--- Comment #3 from Tibor Győri ---
(In reply to Andrew Pinski from comment #2)
> Sqrt is NOT inlined but rather replaced with __builtin_sqrt which is then
> understood as SQRT instruction. This is NOT inlining but rather
> understanding builtins.

Fair enough, but I still think there should be an "optimization passed"
message for such function call --> builtin transformations.
[Bug tree-optimization/118297] not vectorizing some code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118297

Tibor Győri changed:

What |Removed |Added
URL  |        |https://godbolt.org/z/a5nKv3xnx

--- Comment #4 from Tibor Győri ---
It might even be the case that the current cost model is correct,
vectorization is indeed sometimes unprofitable.
But in that case, the issue is how this is communicated to the user, the
"unsupported outerloop form" message does not feel informative enough. I mean
it is a trivial loop that will get completely unrolled, right? So in my mind
that already counts as "supported", but GCC is telling me it is not.
[Bug target/119079] Intel assembly output should use MOVSXD instead of MOVSX for 32b->64b sign extensions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119079 --- Comment #1 from Tibor Győri --- Created attachment 60630 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60630&action=edit Intel manual page for MOVSX/MOVSXD
[Bug target/119079] New: Intel assembly output should use MOVSXD instead of MOVSX for 32b->64b sign extensions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119079

Bug ID: 119079
Summary: Intel assembly output should use MOVSXD instead of MOVSX for 32b->64b sign extensions
Product: gcc
Version: 15.0
URL: https://gcc.godbolt.org/z/GrEP9GTr6
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: tiborgyri at gmail dot com
CC: tiborgyri at gmail dot com
Target Milestone: ---
Target: x86-64

Created attachment 60629
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60629&action=edit
-O3 -march=znver3 -S -masm=intel

Currently, the x86-64 backend is happy to generate Intel syntax assembly that
is, strictly speaking, not valid:

> movsx rax, r10d

According to the relevant page of the Intel Software Developer’s Manual, MOVSX
is not a valid mnemonic for 32b --> 64b sign extensions, and MOVSXD should be
generated instead.
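A minimal sketch of source code that typically needs a 32-bit to 64-bit sign extension on x86-64. Both functions are hypothetical and are not the attached test case. Which instruction and registers the compiler actually emits for them is an assumption to be checked in the -S -masm=intel output.

```cpp
#include <cassert>

// Widening a signed 32-bit value to 64 bits requires a sign extension.
long long widen(int x) { return x; }

// Indexing through a pointer with a signed 32-bit index is another common
// source of the same extension, since addresses are 64 bits wide.
int load(const int* p, int i) { return p[i]; }

// Hypothetical usage check: the sign must be preserved when widening.
void check_widen() {
    assert(widen(-1) == -1LL);
}
```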
[Bug c++/120857] New: The wording of the warning issued by Wreturn-type is overly confident for the current implementation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120857

Bug ID: 120857
Summary: The wording of the warning issued by Wreturn-type is overly confident for the current implementation
Product: gcc
Version: 16.0
URL: https://godbolt.org/z/xn5Th8avT
Status: UNCONFIRMED
Keywords: diagnostic
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: tiborgyri at gmail dot com
CC: tiborgyri at gmail dot com
Target Milestone: ---

As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67629, the
current implementation for issuing -Wreturn-type warnings is relatively
simplistic and has limitations due to how early it runs. This results in false
positive warnings being issued even in cases where it is trivial that control
cannot reach the end of the function, such as this:

    int foo (bool a)
    {
      if (a)
        return 0;
      else if (!a)
        return 1;
    }

Despite these longstanding limitations (still present as of GCC 16 trunk), the
message emitted is extremely confident:

    warning: control reaches end of non-void function [-Wreturn-type]

The wording unambiguously states that control reaches the end, without any
shred of uncertainty. Given how easy it is to run into a false positive, I
feel this wording is overly confident. The issue is made worse by the fact
that -Wreturn-type is enabled by default for C++.

I propose that GCC should be more honest about the limitations of its
implementation, such as by changing this message to the following:

    warning: cannot prove that control does not reach end of non-void function [-Wreturn-type]

This message would be clear about the condition being detected and the limited
trust the user should put into the current implementation.
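Until the diagnostic changes, users usually restructure the code. The following is a sketch of two common workarounds, not something proposed in the report itself. The function names are hypothetical, and __builtin_unreachable() is a GCC/Clang builtin, so the second variant assumes one of those compilers.

```cpp
#include <cassert>

// Variant 1: a plain `else` makes every path end in a return statement,
// so no flow analysis of the conditions is needed.
int foo_else(bool a) {
    if (a)
        return 0;
    else
        return 1;
}

// Variant 2: keep the original conditions and mark the fall-through path
// as unreachable (assumption: compiled with GCC or Clang).
int foo_unreachable(bool a) {
    if (a)
        return 0;
    else if (!a)
        return 1;
    __builtin_unreachable();
}

// Hypothetical usage check: both variants behave like the original.
void check_foo() {
    assert(foo_else(true) == 0 && foo_unreachable(false) == 1);
}
```

Variant 1 changes the shape of the code, while variant 2 keeps the tautological condition and documents the programmer's claim explicitly. Which one is preferable depends on the surrounding code.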
[Bug c/67629] bogus -Wreturn-type in a function with tautological if-else
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67629

Tibor Győri changed:

What |Removed |Added
CC   |        |tiborgyri at gmail dot com

--- Comment #11 from Tibor Győri ---
Reconfirmed for GCC 16 trunk.
https://godbolt.org/z/xn5Th8avT