[Bug tree-optimization/118297] not vectorizing some code

2025-01-07 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118297

--- Comment #9 from Tibor Győri  ---
> We do apply SLP vectorization with -march=znver3 so I wonder
> what you think we are missing (apart from the confusing -fopt-info-missed
> messages)?

I was not originally thinking GCC was missing anything here, but the optinfo
messages were sufficiently unclear that I could no longer tell if something is
getting missed. My main issue here is definitely how unclear these messages can
be currently.

> We also refuse to loop-header copy this because there's a pow() call in the
> block.

That is unfortunate, I was hoping that using idioms like std::pow(A, 2) to
square A would not have any optimization impact anymore.
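
For reference, the two spellings being compared look roughly like this. This is a minimal sketch and not the code from the test case; the function names are made up for illustration.

```cpp
#include <cmath>

// Squares a value through the library call, i.e. the idiom discussed above.
double square_with_pow(double a) {
    return std::pow(a, 2);
}

// Squares a value with a plain multiplication.
double square_with_mul(double a) {
    return a * a;
}
```

Both functions compute the same value; the concern above is only about whether the presence of the pow() call changes which loop optimizations GCC is willing to apply.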

[Bug rtl-optimization/118555] New: -fopt-info reporting of why decide_unroll_constant_iterations decides against unrolling could be improved

2025-01-19 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118555

Bug ID: 118555
   Summary: -fopt-info reporting of why
decide_unroll_constant_iterations decides against
unrolling could be improved
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tiborgyri at gmail dot com
CC: tiborgyri at gmail dot com
  Target Milestone: ---

When decide_unroll_constant_iterations runs it does emit a dump message to the
terminal:
---
dump_printf (MSG_NOTE,
             "considering unrolling loop with constant "
             "number of iterations\n");
---
and if it does decide to unroll, report_unroll issues a passed-optimization
message, nicely attached to a source location:
---
  dump_printf_loc (metadata, locus.get_user_location (),
                   "loop unrolled %d times",
                   loop->lpt_decision.times);
---

Unfortunately when decide_unroll_constant_iterations decides against unrolling
no messages are emitted. There are sections like this in the code:
---
  if (dump_file)
    fprintf (dump_file, ";; Not considering loop, is too big\n");
---
but I think it would be a lot more useful if these were emitted with
dump_printf, or preferably with dump_printf_loc.

I would also like to note that this fprintf (dump_file, ...) idiom is not
documented at https://gcc.gnu.org/onlinedocs/gccint/Dump-types.html

[Bug tree-optimization/118544] -fopt-info misreports unroll factor when using #pragma GCC unroll

2025-01-18 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118544

--- Comment #3 from Tibor Győri  ---
(In reply to Andrew Pinski from comment #2)
> The unroll 2 is correct, it was unrolled one time; likewise 3 is unrolled
> twice, etc. I suspect you are misunderstanding what the diagnostic is
> saying, it is saying unrolled the loop one extra iteration.
OK, I guess that does make sense, but it is unexpected that it does not follow
the same "counting convention" as the pragma. In the pragma one requests an
unroll factor, while the opt-info messages say how many iterations are
unrolled.

For what it's worth, clang reports the unroll factor in its opt-info messages,
which feels like the more natural choice of "counting convention" to me.

[Bug tree-optimization/118544] -fopt-info misreports unroll factor when using #pragma GCC unroll

2025-01-20 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118544

--- Comment #7 from Tibor Győri  ---
(In reply to Richard Biener from comment #5)
> I suppose cunroll should report the loop was fully peeled.
> 
> Note the unroll amount might be confusing when for example loop header copying
> causes the number of latch executions to decrease by one before we get to
> unroll.

Yes, I think adding reports to some currently silent passes would help a lot.

[Bug rtl-optimization/118544] New: -fopt-info misreports unroll factor when using #pragma GCC unroll

2025-01-18 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118544

Bug ID: 118544
   Summary: -fopt-info misreports unroll factor when using #pragma
GCC unroll
   Product: gcc
   Version: 15.0
   URL: https://godbolt.org/z/x1eb65jWf
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tiborgyri at gmail dot com
CC: tiborgyri at gmail dot com
  Target Milestone: ---

Created attachment 60202
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60202&action=edit
Test case -Wall -Wextra -Ofast -std=c++20 -march=znver3 -gno-as-loc-support

Partially unrolling a loop with #pragma GCC unroll results in an "optimization
passed" message that misreports the unroll factor.
For example, #pragma GCC unroll(2) results in this message: "loop unrolled 1
times". This is misleading since GCC has correctly unrolled the loop by a
factor of 2, as requested by the user. The number reported in the opt-info
message is consistently 1 lower than the requested (and actual) unroll factor.

Another manifestation of this is when the requested unroll factor is 1 lower
than the loop iteration count. For example, if a loop has 6 iterations, #pragma
GCC unroll(5) results in the following message:
"loop with 5 iterations completely unrolled"
This is confusing, given that the loop has 6 iterations, not 5.

The only time this opt-info message is correct is when the requested unroll
factor is >= the total number of iterations, in which case GCC correctly
reports that the loop was completely unrolled.

I imagine that some other pass that does not emit opt-info messages unrolls 1
iteration, hence the confusing messages.
See: https://godbolt.org/z/x1eb65jWf
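
A minimal sketch of the kind of loop this is about (not the attached test case; the function name `sum_unroll2` and its parameters are made up for illustration):

```cpp
#include <cstddef>

// Adds up n values, requesting an unroll factor of 2 for the loop.
double sum_unroll2(const double* values, std::size_t n) {
    double total = 0.0;
    // The pragma asks for a factor of 2; according to the report above,
    // the corresponding opt-info note says "loop unrolled 1 times".
#pragma GCC unroll 2
    for (std::size_t i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}
```

Compiling something like this with -O3 and -fopt-info should be enough to compare the number in the pragma with the number in the report, though the exact messages may differ between GCC versions.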

[Bug tree-optimization/118297] New: vect_analyze_loop_form gets confused by outer loop that only executes its body once

2025-01-04 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118297

Bug ID: 118297
   Summary: vect_analyze_loop_form gets confused by outer loop
that only executes its body once
   Product: gcc
   Version: 15.0
   URL: https://godbolt.org/z/a5nKv3xnx
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tiborgyri at gmail dot com
CC: tiborgyri at gmail dot com
  Target Milestone: ---

See: https://godbolt.org/z/a5nKv3xnx
The outer loop of the loop nest trivially only runs its body exactly once, yet
something prints "not vectorized: unsupported outerloop form."

I have looked at the tree vectorizer source file and the only place I could
find this message was in vect_analyze_loop_form (line 1824):

  entryedge = loop_preheader_edge (innerloop);
  if (entryedge->src != loop->header
      || !single_exit (innerloop)
      || single_exit (innerloop)->dest != EDGE_PRED (loop->latch, 0)->src)
    return opt_result::failure_at (vect_location,
                                   "not vectorized:"
                                   " unsupported outerloop form.\n");

I am not sure which of these conditions ends up being true, but to me it sounds
a bit silly when GCC says this trivial outer loop is unsupported by the
vectorizer.

[Bug other/118298] New: Partial unroll request for outer loop with #pragma GCC unroll is silently ignored

2025-01-04 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118298

Bug ID: 118298
   Summary: Partial unroll request for outer loop with #pragma GCC
unroll is silently ignored
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tiborgyri at gmail dot com
CC: tiborgyri at gmail dot com
  Target Milestone: ---

See: https://godbolt.org/z/Mc9fsW4s5
The outer loop in this example has a trip count of 6, and I am asking GCC to
unroll it by a factor of 2. GCC silently ignores the pragma, while clang does
what I expect and unrolls it to a trip count of 3.

GCC should also honor this pragma, or if it cannot, it should at least leave a
note in the optimization report or stdout.
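
A minimal sketch of such a loop nest, assuming a fixed 6 x 4 array (this is not the code behind the godbolt link; `sum_rows` is a hypothetical name):

```cpp
// Sums a 6 x 4 matrix; the outer loop has a trip count of 6.
double sum_rows(const double (&m)[6][4]) {
    double total = 0.0;
    // Request that the outer loop be unrolled by a factor of 2,
    // which would leave it with a trip count of 3.
#pragma GCC unroll 2
    for (int row = 0; row < 6; ++row) {
        for (int col = 0; col < 4; ++col) {
            total += m[row][col];
        }
    }
    return total;
}
```

The pragma sits on the outer loop only, so whatever the compiler decides to do with the inner loop is a separate question.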

[Bug other/118295] New: The optimization report says sqrt is not inlinable, even when it does get inlined

2025-01-04 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118295

Bug ID: 118295
   Summary: The optimization report says sqrt is not inlinable,
even when it does get inlined
   Product: gcc
   Version: 15.0
   URL: https://godbolt.org/z/nrqjdhd5E
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tiborgyri at gmail dot com
  Target Milestone: ---

See: https://godbolt.org/z/nrqjdhd5E
Some inliner pass always reports sqrt as non-inlinable, even when it does get
inlined, e.g. on modern x86 with -ffast-math the call to the library function is
reduced to a single vsqrtsd instruction.

My gut feeling is that the inliner pass runs much earlier and it would be
difficult/hacky to try and teach such things to it. So I would suggest changing
its message to begin with something like "not inlinable by inlining pass",
instead of the unqualified "not inlinable", and at the same time adding an
"optimization passed" opt-report message to whichever pass inlines sqrt with
-ffast-math.

This way the final opt-report would contain both the initial inlining failure,
and then the inlining success, which would make reading these reports congruent
with the code being emitted.
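
A minimal sketch of the kind of call in question (not the attached test case; `length2d` is a hypothetical name):

```cpp
#include <cmath>

// Euclidean length of a 2D vector. The std::sqrt call is the one that the
// optimization report describes as not inlinable.
double length2d(double x, double y) {
    return std::sqrt(x * x + y * y);
}
```

With -O3 -ffast-math -march=znver3 one would expect the call to end up as a single vsqrtsd in the generated assembly, while the report still carries the "not inlinable" note for sqrt.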

[Bug other/118295] The optimization report says sqrt is not inlinable, even when it does get inlined

2025-01-04 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118295

--- Comment #1 from Tibor Győri  ---
Created attachment 60042
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60042&action=edit
Test case

-Wall -Wextra -O3 -ffast-math -std=c++20 -march=znver3 -gno-as-loc-support

[Bug tree-optimization/118294] New: GCC doesn't unroll the outer loop of a nest where the outer body trivially only runs once

2025-01-04 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118294

Bug ID: 118294
   Summary: GCC doesn't unroll the outer loop of a nest where the
outer body trivially only runs once
   Product: gcc
   Version: 15.0
   URL: https://godbolt.org/z/G7WfxM3e7
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tiborgyri at gmail dot com
  Target Milestone: ---

In this example: https://godbolt.org/z/G7WfxM3e7
the outer loop "for (TT j=0; j

[Bug tree-optimization/118294] GCC doesn't unroll the outer loop of a nest where the outer body trivially only runs once

2025-01-04 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118294

Tibor Győri  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Tibor Győri  ---
Scratch that, I got confused by the combination of opt report messages and the
compare and jump instead of vminsd...

[Bug tree-optimization/58902] small matrix multiplication non vectorized

2025-01-04 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58902

Tibor Győri  changed:

   What|Removed |Added

 CC||tiborgyri at gmail dot com,
   ||vincenzo.innocente at cern dot ch

--- Comment #1 from Tibor Győri  ---
Tested this with trunk (future GCC 15), and to me it looks like while the
tree-vectorizer still does not understand the loop nest, and/or judges the
vectorization to be unprofitable, both loops are fully unrolled and then end up
getting at least somewhat vectorized by the SLP vectorizer.

Latest Clang appears to work similarly, fully unroll then SLP.
The final version of the Intel Classic compiler also seems to favor this
approach.
See https://godbolt.org/z/boWxWbYWz

Would you agree that this issue has been resolved?

[Bug tree-optimization/118297] not vectorizing some code

2025-01-04 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118297

--- Comment #6 from Tibor Győri  ---
(In reply to Andrew Pinski from comment #5)
> (In reply to Tibor Győri from comment #4)
> > It might even be the case that the current cost model is correct,
> > vectorization is indeed sometimes unprofitable.
> > But in that case, the issue is how this is communicated to the user, the
> > "unsupported outerloop form" message does not feel informative enough. I
> > mean it is a trivial loop that will get completely unrolled, right? So in my
> > mind that already counts as "supported", but GCC is telling me it is not.
> 
> Well there are 2 different kinds of vectorizer, the SLP and loop based one,
> you were looking at the loop one which is saying it is unsupported.

That is all true, but I would prefer if the loop vectorizer identified itself
in its messages in the optimization report. E.g. prefix the messages with
"loop-vect:" or something.
Plus, "unsupported" is quite vague. Unsupported how? If I am reading the opt
report, it is to figure out how to change my code to help the compiler, so it
would be nice if the compiler was more specific.
To suggest something actionable, currently this "unsupported outerloop form"
message is emitted when any of the three different conditions listed above are
met. If each of those conditions were checked separately, they could be given
different, more specific missed-optimization messages.
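
To illustrate the idea outside of GCC, here is a self-contained toy model. It is not GCC code: the struct, the function name and the message wording are all hypothetical, and the three booleans merely stand in for the three conditions of the combined check.

```cpp
#include <string>

// Hypothetical stand-in for the three facts tested by the combined check.
struct OuterLoopFacts {
    bool inner_entered_from_outer_header;  // entryedge->src == loop->header
    bool inner_has_single_exit;            // single_exit (innerloop) is non-null
    bool inner_exit_feeds_outer_latch;     // exit dest == EDGE_PRED (loop->latch, 0)->src
};

// Returns an empty string if the outer loop form is accepted, otherwise a
// reason that names the specific condition that failed.
std::string outer_loop_rejection_reason(const OuterLoopFacts& facts) {
    if (!facts.inner_entered_from_outer_header)
        return "not vectorized: inner loop is not entered from the outer loop header.";
    if (!facts.inner_has_single_exit)
        return "not vectorized: inner loop does not have a single exit.";
    if (!facts.inner_exit_feeds_outer_latch)
        return "not vectorized: inner loop exit does not lead to the outer loop latch.";
    return "";
}
```

Each failing condition gets its own message, so a reader of the report can tell which property of the loop nest to look at.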

[Bug other/118295] The optimization report says sqrt is not inlinable, even when it does get inlined

2025-01-04 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118295

--- Comment #3 from Tibor Győri  ---
(In reply to Andrew Pinski from comment #2)
> Sqrt is NOT inlined but rather replaced with __builtin_sqrt which is then
> understood as SQRT instruction.  This is NOT inlining but rather
> understanding builtins.

Fair enough, but I still think there should be an "optimization passed" message
for such function call --> builtin transformations.

[Bug tree-optimization/118297] not vectorizing some code

2025-01-04 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118297

Tibor Győri  changed:

   What|Removed |Added

URL||https://godbolt.org/z/a5nKv3xnx

--- Comment #4 from Tibor Győri  ---
It might even be the case that the current cost model is correct, vectorization
is indeed sometimes unprofitable.
But in that case, the issue is how this is communicated to the user, the
"unsupported outerloop form" message does not feel informative enough. I mean
it is a trivial loop that will get completely unrolled, right? So in my mind
that already counts as "supported", but GCC is telling me it is not.

[Bug target/119079] Intel assembly output should use MOVSXD instead of MOVSX for 32b->64b sign extensions

2025-03-01 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119079

--- Comment #1 from Tibor Győri  ---
Created attachment 60630
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60630&action=edit
Intel manual page for MOVSX/MOVSXD

[Bug target/119079] New: Intel assembly output should use MOVSXD instead of MOVSX for 32b->64b sign extensions

2025-03-01 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119079

Bug ID: 119079
   Summary: Intel assembly output should use MOVSXD instead of
MOVSX for 32b->64b sign extensions
   Product: gcc
   Version: 15.0
   URL: https://gcc.godbolt.org/z/GrEP9GTr6
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tiborgyri at gmail dot com
CC: tiborgyri at gmail dot com
  Target Milestone: ---
Target: x86-64

Created attachment 60629
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60629&action=edit
-O3 -march=znver3 -S -masm=intel

Currently, the x86-64 backend is happy to generate Intel syntax assembly that is
strictly speaking not valid:

> movsx   rax, r10d

According to the relevant page of the Intel Software Developer’s Manual, MOVSX
is not a valid mnemonic for 32b --> 64b sign extensions, and MOVSXD should be
generated instead.
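
A minimal sketch of source code that needs such a sign extension (not the attached test case; `widen` is a hypothetical name):

```cpp
#include <cstdint>

// Converts a signed 32-bit value to a signed 64-bit value. On x86-64 this
// conversion requires a 32-bit to 64-bit sign extension.
std::int64_t widen(std::int32_t value) {
    return value;
}
```

Compiling something like this with -O3 -S -masm=intel should show which mnemonic is emitted for the extension, though the exact registers will depend on the surrounding code.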

[Bug c++/120857] New: The wording of the warning issued by Wreturn-type is overly confident for the current implementation

2025-06-28 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120857

Bug ID: 120857
   Summary: The wording of the warning issued by Wreturn-type is
overly confident for the current implementation
   Product: gcc
   Version: 16.0
   URL: https://godbolt.org/z/xn5Th8avT
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tiborgyri at gmail dot com
CC: tiborgyri at gmail dot com
  Target Milestone: ---

As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67629, the current
implementation for issuing -Wreturn-type warnings is relatively simplistic and
has limitations due to how early it runs. This results in false positive
warnings being issued even in cases where it is trivial that control cannot
reach the end of the function, such as this:

int foo (bool a) {
    if (a) return 0;
    else if (!a) return 1;
}

Despite these longstanding limitations (still present as of GCC 16 trunk), the
message emitted is extremely confident:

warning: control reaches end of non-void function [-Wreturn-type]

The wording unambiguously states that control reaches the end, without any
shred of uncertainty. I feel like given how easy it is to run into a false
positive, this is overly confident wording. The issue is made worse by the fact
that -Wreturn-type is enabled by default for C++.

I propose that GCC should be more honest about the limitations of its
implementation, for example by changing this message to the following:

warning: cannot prove that control does not reach end of non-void function
[-Wreturn-type]

This message would be clear about the condition being detected and the limited
trust the user should put into the current implementation.
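
For comparison, here are two ways the example above can be rewritten so that the warning should not trigger. This is only a sketch: the function names are made up, and __builtin_unreachable is a GCC/Clang builtin rather than standard C++.

```cpp
// Variant 1: make the last branch unconditional, so every path visibly
// ends in a return statement.
int foo_restructured(bool a) {
    if (a) return 0;
    return 1;
}

// Variant 2: keep the original structure and mark the end of the function
// as unreachable (GCC/Clang builtin, not portable).
int foo_unreachable(bool a) {
    if (a) return 0;
    else if (!a) return 1;
    __builtin_unreachable();
}
```

Neither variant changes what the function returns; they only make explicit what the current implementation cannot work out on its own.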

[Bug c/67629] bogus -Wreturn-type in a function with tautological if-else

2025-06-28 Thread tiborgyri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67629

Tibor Győri  changed:

   What|Removed |Added

 CC||tiborgyri at gmail dot com

--- Comment #11 from Tibor Győri  ---
Reconfirmed for GCC 16 trunk.
https://godbolt.org/z/xn5Th8avT