https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90078

--- Comment #4 from bin cheng <amker at gcc dot gnu.org> ---
In get_scaled_computation_cost_at, we get a very big ratio between bb_count
and loop_count:

(gdb) p data->current_loop->latch->count                   
$50 = {static n_bits = 61, static max_count = 2305843009213693950, static
uninitialized_count = 2305843009213693951, m_val = 158483, m_quality =
profile_guessed_local}
(gdb) p gimple_bb(at)->count
$51 = {static n_bits = 61, static max_count = 2305843009213693950, static
uninitialized_count = 2305843009213693951, m_val = 1569139790, m_quality =
profile_guessed_local}
(gdb) p 1569139790 / 158483
$52 = 9900
(gdb) p cost
$53 = {cost = 20, complexity = 2, scratch = 1}
(gdb) p 19 * 9900
$54 = 188100

As a result, sum_cost soon overflows to infinite_cost.  Shall we cap the
ratio so that the scaled cost doesn't grow too quickly?  Of course, some
benchmark data is needed for tuning this heuristic.
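For illustration only, a minimal standalone sketch of the capping idea (this
is not the actual GCC code; scaled_cost, RATIO_CAP, INFINITE_COST and the
scratch + (cost - scratch) * ratio formula are my assumptions here):

#include <algorithm>
#include <cassert>
#include <cstdint>

const int64_t INFINITE_COST = 1 << 30;   /* hypothetical overflow sentinel */
const int64_t RATIO_CAP     = 10000;     /* heuristic bound, needs benchmark data */

/* Scale the non-scratch part of a cost by bb_count / loop_count, but clamp
   the ratio and the result so that repeated sums stay below INFINITE_COST.  */
int64_t
scaled_cost (int64_t cost, int64_t scratch, int64_t bb_count, int64_t loop_count)
{
  if (loop_count == 0)
    return cost;
  int64_t ratio = std::min (bb_count / loop_count, RATIO_CAP);
  int64_t scaled = scratch + (cost - scratch) * ratio;
  return std::min (scaled, INFINITE_COST - 1);
}

int main ()
{
  /* Numbers from the gdb session above: cost 20, scratch 1, ratio 9900.  */
  assert (scaled_cost (20, 1, 1569139790, 158483) == 1 + 19 * 9900);
  return 0;
}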


Another problem is that the generated binary segfaults even when compiled at
-O0:

$ ./g++ -O0 pr90078.cc -o a.out -ftemplate-depth=1000000 -ftime-report  -g
-std=c++14
$ gdb --args ./a.out

Dump of assembler code for function main():
   0x0000000000400572 <+0>:     push   %rbp
   0x0000000000400573 <+1>:     mov    %rsp,%rbp
   0x0000000000400576 <+4>:     sub    $0x2625a020,%rsp
   0x000000000040057d <+11>:    lea    -0x2625a020(%rbp),%rax
   0x0000000000400584 <+18>:    mov    %rax,%rdi
=> 0x0000000000400587 <+21>:    callq  0x4006c0 <Tensor4<float, 100, 100, 100,
100>::Tensor4()>
   0x000000000040058c <+26>:    lea    -0x4c4b410(%rbp),%rax
   0x0000000000400593 <+33>:    lea    -0xe4e1c10(%rbp),%rdx

The segmentation fault happens at the callq instruction: the frame reserved
by sub $0x2625a020,%rsp is about 640 MB, far beyond the default 8 MB stack
limit, so pushing the return address for the constructor call most likely
lands on unmapped memory.
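For context, a minimal sketch of the kind of testcase involved (this is not
the original pr90078.cc; the Tensor4 layout is guessed from the disassembly
above):

/* A 100x100x100x100 float tensor is ~400 MB, and main's frame above
   (sub $0x2625a020,%rsp, ~640 MB) is far beyond the usual 8 MB stack.  */
template <typename T, int N0, int N1, int N2, int N3>
struct Tensor4
{
  T data[N0][N1][N2][N3];
  Tensor4 () { data[0][0][0][0] = T (); }  /* user-defined ctor, called via callq at -O0 */
};

int main ()
{
  Tensor4<float, 100, 100, 100, 100> t;    /* automatic storage in main's frame */
  return 0;
}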
