https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90078
--- Comment #4 from bin cheng <amker at gcc dot gnu.org> ---
In get_scaled_computation_cost_at, we have a very big ratio between bb_count
and loop_count:

(gdb) p data->current_loop->latch->count
$50 = {static n_bits = 61, static max_count = 2305843009213693950, static uninitialized_count = 2305843009213693951, m_val = 158483, m_quality = profile_guessed_local}
(gdb) p gimple_bb(at)->count
$51 = {static n_bits = 61, static max_count = 2305843009213693950, static uninitialized_count = 2305843009213693951, m_val = 1569139790, m_quality = profile_guessed_local}
(gdb) p 1569139790 / 158483
$52 = 9900
(gdb) p cost
$53 = {cost = 20, complexity = 2, scratch = 1}
(gdb) p 19 * 9900
$54 = 188100

As a result, sum_cost quickly overflows into infinite_cost.  Shall we cap the
ratio so that sum_cost doesn't grow too quickly?  Of course, some benchmark
data would be needed for tuning this heuristic.  (A rough sketch of such a cap
is at the end of this comment.)

Another problem is that the generated binary segfaults even when compiled at
-O0:

$ ./g++ -O0 pr90078.cc -o a.out -ftemplate-depth=1000000 -ftime-report -g -std=c++14
$ gdb --args ./a.out

Dump of assembler code for function main():
   0x0000000000400572 <+0>:     push   %rbp
   0x0000000000400573 <+1>:     mov    %rsp,%rbp
   0x0000000000400576 <+4>:     sub    $0x2625a020,%rsp
   0x000000000040057d <+11>:    lea    -0x2625a020(%rbp),%rax
   0x0000000000400584 <+18>:    mov    %rax,%rdi
=> 0x0000000000400587 <+21>:    callq  0x4006c0 <Tensor4<float, 100, 100, 100, 100>::Tensor4()>
   0x000000000040058c <+26>:    lea    -0x4c4b410(%rbp),%rax
   0x0000000000400593 <+33>:    lea    -0xe4e1c10(%rbp),%rdx

The segmentation fault happens at the callq instruction.
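The -O0 crash is presumably just a stack overflow: the prologue reserves
0x2625a020 bytes (~640MB) of stack for the Tensor4 locals, far beyond a
typical 8MB stack limit, so the callq's push of the return address already
touches an unmapped page.  A minimal sketch reproducing the same failure mode
(the struct name and sizes here are assumptions standing in for pr90078.cc,
not the actual test case):

#include <cstdio>

/* Roughly one Tensor4<float, 100, 100, 100, 100>:
   100 * 100 * 100 * 100 floats = 400,000,000 bytes.  */
struct BigTensor
{
  float data[100][100][100][100];
};

int
main ()
{
  /* As an automatic variable this needs a ~400MB frame and faults on a
     default-sized stack, just like the generated main above.  */
  /* BigTensor on_stack; */

  /* Heap allocation of the same object works fine.  */
  BigTensor *on_heap = new BigTensor ();
  printf ("%p\n", (void *) on_heap);
  delete on_heap;
  return 0;
}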
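Regarding capping the ratio mentioned above, this is not GCC's actual code,
just a minimal sketch of the kind of clamp I have in mind; the function name
scale_cost, the RATIO_CAP name and its value 100 are made up for illustration
and would need benchmark tuning:

#include <algorithm>
#include <cstdio>

/* Hypothetical cap on the bb_count / loop_count ratio.  */
static const long long RATIO_CAP = 100;

/* Scale a per-expression cost by the block/loop frequency ratio, clamping
   the ratio so that summing many scaled costs does not hit the
   infinite-cost threshold almost immediately.  */
static long long
scale_cost (long long cost, long long bb_count, long long loop_count)
{
  long long ratio = loop_count ? bb_count / loop_count : 1;
  return cost * std::min (ratio, RATIO_CAP);
}

int
main ()
{
  /* Numbers from the gdb session above: 1569139790 / 158483 == 9900,
     so a single candidate already contributes 19 * 9900 = 188100.  */
  printf ("uncapped: %lld\n", 19 * (1569139790LL / 158483));
  printf ("capped:   %lld\n", scale_cost (19, 1569139790, 158483));
  return 0;
}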