On Wed, Oct 15, 2014 at 7:10 PM, Yangfei (Felix) <felix.y...@huawei.com> wrote: > Hi Sterling, > > Since the patch is delayed for a long time, I'm kind of pushing it. Sorry > for that. > Yeah, you are right. We have some performance issue here as GCC may use > one more general register in some cases with this patch. > Take the following arraysum testcase for example. In doloop optimization, > GCC figures out that the number of iterations is 1024 and creates a new > pseudo 79 as the new trip count register. > The pseudo 79 is live throughout the loop, this makes the register > pressure in the loop higher. And it's possible that this new pseudo is > spilled by reload when the register pressure is very high. > I know that the xtensa loop instruction copies the trip count register > into the LCOUNT special register. And we need describe this hardware feature > in GCC in order to free the trip count register. > But I find it difficult to do. Do you have any good suggestions on this?
There are two issues related to the trip count, one I would like you to solve now, one later. 1. Later: The trip count doesn't need to be updated at all inside these loops, once the loop instruction executes. The code below relates to this case. 2. Now: You should be able to use a loop instruction regardless of whether the trip count is spilled. If you have an example where that wouldn't work, I would love to see it. > > arraysum.c: > int g[1024]; > int g_sum; > > void test_entry () > { > int i, Sum = 0; > > for (i = 0; i < 1024; i++) > Sum = Sum + g[i]; > > g_sum = Sum; > } > > > 1. RTL before the doloop optimization pass(arraysum.c.193r.loop2_invariant): > (note 34 0 32 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > (note 32 34 36 2 NOTE_INSN_FUNCTION_BEG) > (insn 36 32 37 2 (set (reg:SI 72 [ ivtmp$8 ]) > (mem/u/c:SI (symbol_ref/u:SI ("*.LC2") [flags 0x2]) [2 S4 A32])) 29 > {movsi_internal} > (expr_list:REG_EQUAL (symbol_ref:SI ("g") <var_decl 0x7f6eef5d62d0 g>) > (nil))) > (insn 37 36 33 2 (set (reg/f:SI 76 [ D.1393 ]) > (mem/u/c:SI (symbol_ref/u:SI ("*.LC3") [flags 0x2]) [2 S4 A32])) 29 > {movsi_internal} > (expr_list:REG_EQUAL (const:SI (plus:SI (symbol_ref:SI ("g") <var_decl > 0x7f6eef5d62d0 g>) > (const_int 4096 [0x1000]))) > (nil))) > (insn 33 37 42 2 (set (reg/v:SI 74 [ Sum ]) > (const_int 0 [0])) arraysum.c:6 29 {movsi_internal} > (nil)) > (code_label 42 33 38 3 2 "" [0 uses]) > (note 38 42 39 3 [bb 3] NOTE_INSN_BASIC_BLOCK) > (insn 39 38 40 3 (set (reg:SI 77 [ MEM[base: _14, offset: 0B] ]) > (mem:SI (reg:SI 72 [ ivtmp$8 ]) [2 MEM[base: _14, offset: 0B]+0 S4 > A32])) arraysum.c:9 29 {movsi_internal} > (nil)) > (insn 40 39 41 3 (set (reg/v:SI 74 [ Sum ]) > (plus:SI (reg/v:SI 74 [ Sum ]) > (reg:SI 77 [ MEM[base: _14, offset: 0B] ]))) arraysum.c:9 1 > {addsi3} > (expr_list:REG_DEAD (reg:SI 77 [ MEM[base: _14, offset: 0B] ]) > (nil))) > (insn 41 40 43 3 (set (reg:SI 72 [ ivtmp$8 ]) > (plus:SI (reg:SI 72 [ ivtmp$8 ]) > (const_int 4 [0x4]))) 1 {addsi3} > (nil)) > (jump_insn 43 41 52 3 (set (pc) > (if_then_else (ne (reg:SI 72 [ ivtmp$8 ]) > (reg/f:SI 76 [ D.1393 ])) > (label_ref:SI 52) > (pc))) arraysum.c:8 39 {*btrue} > (int_list:REG_BR_PROB 9899 (nil)) > -> 52) > (code_label 52 43 51 5 3 "" [1 uses]) > (note 51 52 44 5 [bb 5] NOTE_INSN_BASIC_BLOCK) > (note 44 51 45 4 [bb 4] NOTE_INSN_BASIC_BLOCK) > (insn 45 44 46 4 (set (reg/f:SI 78) > (mem/u/c:SI (symbol_ref/u:SI ("*.LC4") [flags 0x2]) [2 S4 A32])) > arraysum.c:11 29 {movsi_internal} > (expr_list:REG_EQUAL (symbol_ref:SI ("g_sum") <var_decl 0x7f6eef5d6360 > g_sum>) > (nil))) > (insn 46 45 0 4 (set (mem/c:SI (reg/f:SI 78) [2 g_sum+0 S4 A32]) > (reg/v:SI 74 [ Sum ])) arraysum.c:11 29 {movsi_internal} > (expr_list:REG_DEAD (reg/f:SI 78) > (expr_list:REG_DEAD (reg/v:SI 74 [ Sum ]) > (nil)))) > > > 2. RTL after the doloop optimization pass(arraysum.c.195r.loop2_doloop): > (note 34 0 32 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > (note 32 34 36 2 NOTE_INSN_FUNCTION_BEG) > (insn 36 32 37 2 (set (reg:SI 72 [ ivtmp$8 ]) > (mem/u/c:SI (symbol_ref/u:SI ("*.LC2") [flags 0x2]) [2 S4 A32])) 29 > {movsi_internal} > (expr_list:REG_EQUAL (symbol_ref:SI ("g") <var_decl 0x7f6eef5d62d0 g>) > (nil))) > (insn 37 36 33 2 (set (reg/f:SI 76 [ D.1393 ]) > (mem/u/c:SI (symbol_ref/u:SI ("*.LC3") [flags 0x2]) [2 S4 A32])) 29 > {movsi_internal} > (expr_list:REG_EQUAL (const:SI (plus:SI (symbol_ref:SI ("g") <var_decl > 0x7f6eef5d62d0 g>) > (const_int 4096 [0x1000]))) > (nil))) > (insn 33 37 54 2 (set (reg/v:SI 74 [ Sum ]) > (const_int 0 [0])) arraysum.c:6 29 {movsi_internal} > (nil)) > (insn 54 33 42 2 (set (reg:SI 79) > (const_int 1024 [0x400])) arraysum.c:6 -1 > (nil)) > (code_label 42 54 38 3 2 "" [0 uses]) > (note 38 42 39 3 [bb 3] NOTE_INSN_BASIC_BLOCK) > (insn 39 38 40 3 (set (reg:SI 77 [ MEM[base: _14, offset: 0B] ]) > (mem:SI (reg:SI 72 [ ivtmp$8 ]) [2 MEM[base: _14, offset: 0B]+0 S4 > A32])) arraysum.c:9 29 {movsi_internal} > (nil)) > (insn 40 39 41 3 (set (reg/v:SI 74 [ Sum ]) > (plus:SI (reg/v:SI 74 [ Sum ]) > (reg:SI 77 [ MEM[base: _14, offset: 0B] ]))) arraysum.c:9 1 > {addsi3} > (expr_list:REG_DEAD (reg:SI 77 [ MEM[base: _14, offset: 0B] ]) > (nil))) > (insn 41 40 53 3 (set (reg:SI 72 [ ivtmp$8 ]) > (plus:SI (reg:SI 72 [ ivtmp$8 ]) > (const_int 4 [0x4]))) 1 {addsi3} > (nil)) > (jump_insn 53 41 52 3 (parallel [ > (set (pc) > (if_then_else (ne (reg:SI 79) > (const_int 1 [0x1])) > (label_ref 52) > (pc))) > (set (reg:SI 79) > (plus:SI (reg:SI 79) > (const_int -1 [0xffffffffffffffff]))) > (unspec [ > (const_int 0 [0]) > ] 13) > (clobber (scratch:SI)) > ]) -1 > (int_list:REG_BR_PROB 9899 (nil)) > -> 52) > (code_label 52 53 51 5 3 "" [1 uses]) > (note 51 52 44 5 [bb 5] NOTE_INSN_BASIC_BLOCK) > (note 44 51 45 4 [bb 4] NOTE_INSN_BASIC_BLOCK) > (insn 45 44 46 4 (set (reg/f:SI 78) > (mem/u/c:SI (symbol_ref/u:SI ("*.LC4") [flags 0x2]) [2 S4 A32])) > arraysum.c:11 29 {movsi_internal} > (expr_list:REG_EQUAL (symbol_ref:SI ("g_sum") <var_decl 0x7f6eef5d6360 > g_sum>) > (nil))) > (insn 46 45 0 4 (set (mem/c:SI (reg/f:SI 78) [2 g_sum+0 S4 A32]) > (reg/v:SI 74 [ Sum ])) arraysum.c:11 29 {movsi_internal} > (expr_list:REG_DEAD (reg/f:SI 78) > (expr_list:REG_DEAD (reg/v:SI 74 [ Sum ]) > (nil)))) > > >> >> On Tue, Oct 14, 2014 at 8:39 AM, Felix Yang <fei.yang0...@gmail.com> wrote: >> > PING? >> > Cheers, >> > Felix >> >> Felix, >> >> This isn't my day job, 24-hour pings are unproductive. >> >> You shouldn't need to worry about the trip count register getting spilled. It >> makes no difference whatsoever to how the loop operates--the trip count is >> dead with regards to the loop once the instruction executes. You don't need >> to >> describe LCOUNT to gcc in order for this not to matter. It should be enough >> to >> describe the zcl as consuming the value in the same way a branch instruction >> consumes a value. >> >> If you have a case where spilling it is causing a problem, then there is a >> bug in >> your code, papered over by dropping case when it is spilled. Similarly with >> iter_reg_used_outside--it shouldn't affect whether or not a zcl is valid >> here. If >> you have a case where it does, there is likely a bug in your code. >> >> If the code is easier to write by maintaining trip_count up, then fine (for >> now); >> you give up some performance (in fact, a lot of performance), but that >> doesn't >> matter as to the correctness. >> >> >> > >> > >> > On Tue, Oct 14, 2014 at 12:30 AM, Felix Yang <fei.yang0...@gmail.com> >> wrote: >> >> Thanks for the comments. >> >> >> >> The patch checked the usage of teh trip count register, making sure >> >> that it is not used in the loop body other than the doloop_end or >> >> lives past the doloop_end instruction, as the following code snippet >> >> shows: >> >> >> >> + /* Scan all the blocks to make sure they don't use iter_reg. */ >> >> + if (loop->iter_reg_used || loop->iter_reg_used_outside) >> >> + { >> >> + if (dump_file) >> >> + fprintf (dump_file, ";; loop %d uses iterator\n", >> >> + loop->loop_no); >> >> + return false; >> >> + } >> >> >> >> For the spill issue, I think we need to handle it. The reason is >> >> that currently we are not telling GCC about the existence of the >> >> LCOUNT register. Instead, we keep the trip count in a general >> >> register and it's possible that this register can be spilled when >> >> register pressure is high. >> >> It's a good idea to post another patch to describe the LCOUNT >> >> register in GCC in order to free this general register. But I want >> >> this patch applied as a first step, OK? >> >> >> >> Cheers, >> >> Felix >> >> >> >> >> >> On Tue, Oct 14, 2014 at 12:09 AM, augustine.sterl...@gmail.com >> >> <augustine.sterl...@gmail.com> wrote: >> >>> On Fri, Oct 10, 2014 at 6:59 AM, Felix Yang <fei.yang0...@gmail.com> >> wrote: >> >>>> Hi Sterling, >> >>>> >> >>>> I made some improvement to the patch. Two changes: >> >>>> 1. TARGET_LOOPS is now used as a condition of the doloop >> >>>> related patterns, which is more elegant. >> >>> >> >>> Fine. >> >>> >> >>>> 2. As the trip count register of the zero-cost loop maybe >> >>>> potentially spilled, we need to change the patterns in order to >> >>>> handle this issue. >> >>> >> >>> Actually, for xtensa you don't. The trip count is copied into LCOUNT >> >>> at the execution of the loop instruction, and therefore a spill or >> >>> whatever doesn't matter--it won't affect the result. So as long as >> >>> you have the trip count at the start of the loop, you are fine. >> >>> >> >>> This does bring up an issue of whether or not the trip count can be >> >>> modified during the loop. (note that this is different than early >> >>> exit.) If it can, you can't use a zero-overhead loop. Does your >> >>> patch address this case. >> >>> >> >>> The solution is similar to that adapted by c6x backend. >> >>>> Just turn the zero-cost loop into a regular loop when that happens >> >>>> when reload is completed. >> >>>> Attached please find version 4 of the patch. Make check >> >>>> regression tested with xtensa-elf-gcc/simulator. >> >>>> OK for trunk?