question on arm soft-fp function __aeabi_d2uiz
Hi,

I found that the function __aeabi_d2uiz in gcc/config/arm/ieee754-df.S, which converts a double into an unsigned integer, always returns 0 if the double value is negative. For example, take the following code:

---sample code---
#include <stdio.h>

unsigned long ul;
double d = -1.1;

int main(void)
{
    ul = (unsigned long)d;
    /* %lX matches the unsigned long argument */
    fprintf(stdout, "ul = 0x%lX\n", ul);
    return 0;
}

With ARM soft-float, __aeabi_d2uiz yields 0x0, so "(unsigned int)(int)d" and "(unsigned int)d" behave differently. I also tried the code on x86-cygwin, which prints 0x.

I am wondering why __aeabi_d2uiz returns 0 for negative double values. Is this behavior defined by the ARM FPU, and does it differ from the x86 FPU implementation? I have no ARM platform with an FPU to verify this and I know little about floating point, so any clarification would help.

Thanks very much.
--
Best Regards.
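For reference, the C standard (C99 6.3.1.4) leaves the conversion undefined when the truncated floating value cannot be represented in the destination type, so both 0 and a wrapped value are permitted results for (unsigned long)-1.1. Below is a minimal sketch of a conversion whose result is defined for negative inputs, going through a signed type first. The helper name double_to_uint_wrapping is hypothetical, and the sketch assumes the input fits in the range of int.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: convert a double to unsigned int with wrap-around
   semantics for negative inputs.  The double -> int step is defined only
   while the truncated value fits in int, hence the assertion.  */
unsigned int
double_to_uint_wrapping (double d)
{
  assert (d > (double) INT_MIN - 1.0 && d < (double) INT_MAX + 1.0);
  int truncated = (int) d;          /* truncates toward zero: -1.1 -> -1 */
  return (unsigned int) truncated;  /* int -> unsigned is reduced modulo 2^N */
}
```

With this helper, -1.1 maps to UINT_MAX on every conforming implementation, which is the value "(unsigned int)(int)d" produces.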
question on find_if_case_2 in ifcvt.c
Hi,

In ifcvt.c, the function find_if_case_2 uses cheap_bb_rtx_cost_p to decide whether to do the conversion. cheap_bb_rtx_cost_p checks whether the total insn_rtx_cost of the non-jump insns in basic block BB is less than MAX_COST.

My first question is why cheap_bb_rtx_cost_p is used at all when we already know the ELSE block is predicted, which means the conversion is beneficial anyway.

Second, should cheap_bb_rtx_cost_p be tuned to check "whether the total insn_rtx_cost of the non-jump insns in basic block BB is no larger than MAX_COST", so that normal instructions are preferred over a branch even when they have the same cost?

Any suggestions? Thanks in advance.
--
Best Regards.
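To make the second question concrete, here is a small sketch of the two threshold checks. It is only a simplified stand-in, not the code in ifcvt.c: the function names and the plain array of per-insn costs are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sum the (hypothetical) costs of the non-jump insns in a block.  */
static int
total_cost (const int *costs, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += costs[i];
  return total;
}

/* Check as described above: cheap only if strictly less than MAX_COST.  */
bool
cheap_block_strict (const int *costs, size_t n, int max_cost)
{
  return total_cost (costs, n) < max_cost;
}

/* Proposed variant: also cheap when the total equals MAX_COST.  */
bool
cheap_block_inclusive (const int *costs, size_t n, int max_cost)
{
  return total_cost (costs, n) <= max_cost;
}
```

The two only differ when the block costs exactly MAX_COST, which is the tie case the question is about.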
CFLAGS used in libgcc makefile?
Hi guys,

Is CFLAGS what libgcc/Makefile.in uses to build libgcc.a? It seems that if I configure gcc with the environment variable CFLAGS="-O0 -g ", libgcc is also compiled with the -O0 option.

I'm wondering why CFLAGS_FOR_TARGET is not used here (CFLAGS->INTERNAL_CFLAGS->gcc_compile_bare->gcc_compile).

Please help, thanks.
--
Best Regards.
Question on _GLIBCXX_HOSTED macro libstdc++ and libsupc++
Hi,

libstdc++-v3/libsupc++/eh_term_handler.cc says that by default the demangler is pulled in, depending on whether _GLIBCXX_HOSTED is defined. The demangling terminate handler is really big, especially for an embedded system.

Secondly, _GLIBCXX_HOSTED is now defined if --enable-hosted-libstdcxx is given (which is the default). This option also controls whether libstdc++.a itself is built for the target system.

So, for an embedded system, how could I provide the earlier "silent death" handler by defining _GLIBCXX_HOSTED, while still having libstdc++ built?

Any suggestions? Thanks in advance.

FYI, all of the above is about a cross-toolchain.
--
Best Regards.
Re: Question on _GLIBCXX_HOSTED macro libstdc++ and libsupc++
> (Any reason this wasn't sent to the libstdc++ list?)
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43852 proposes a "quiet
> mode" which would reduce code size by disabling some of the code in
> eh_term_handler.cc and pure.cc - would that do what you want?
>
> I've not had time to do anything about it, but I think Sebastian
> (CC'd) has a copyright assignment in place now, and he's provided a
> patch implementing it.
>

Sorry for missing the list; CC'd now.
That is exactly what I meant, thanks very much.
--
Best Regards.
missing conditional propagation in cprop.c pass
Hi,

I ran into a case where conditional (const) propagation is mishandled in the cprop pass. The insn sequence after the cprop1 pass is:

(note 878 877 880 96 [bb 96] NOTE_INSN_BASIC_BLOCK)

(insn 882 881 883 96 (set (reg:CC 24 cc)
        (compare:CC (reg:SI 684 [ default_num_contexts ])
            (const_int 0 [0]))) core_main.c:265 211 {*arm_cmpsi_insn}
     (nil))

(jump_insn 883 882 886 96 (set (pc)
        (if_then_else (ne (reg:CC 24 cc)
                (const_int 0 [0]))
            (label_ref:SI 905)
            (pc))) core_main.c:265 223 {*arm_cond_branch}
     (expr_list:REG_DEAD (reg:CC 24 cc)
        (expr_list:REG_BR_PROB (const_int 9100 [0x238c])
            (nil)))
 -> 905)

(note 886 883 49 97 [bb 97] NOTE_INSN_BASIC_BLOCK)

(insn 49 886 0 97 (set (reg/v:SI 291 [ total_errors ])
        (reg:SI 684 [ default_num_contexts ])) core_main.c:265 709 {*thumb2_movsi_insn}
     (expr_list:REG_DEAD (reg:SI 684 [ default_num_contexts ])
        (expr_list:REG_EQUAL (const_int 0 [0])
            (nil
..

(code_label 905 54 904 47 54 "" [1 uses])

(note 904 905 46 47 [bb 47] NOTE_INSN_BASIC_BLOCK)

(insn 46 904 47 47 (set (reg/v:SI 291 [ total_errors ])
        (const_int 0 [0])) core_main.c:265 709 {*thumb2_movsi_insn}
     (nil))

The conditional constant from insn 882 and jump_insn 883 should be propagated into insn 49, turning it into "r291<-0" as in the following code, and then pre can do the redundancy elimination work.

(note 878 877 880 96 [bb 96] NOTE_INSN_BASIC_BLOCK)

(insn 882 881 883 96 (set (reg:CC 24 cc)
        (compare:CC (reg:SI 684 [ default_num_contexts ])
            (const_int 0 [0]))) core_main.c:265 211 {*arm_cmpsi_insn}
     (nil))

(jump_insn 883 882 886 96 (set (pc)
        (if_then_else (ne (reg:CC 24 cc)
                (const_int 0 [0]))
            (label_ref:SI 905)
            (pc))) core_main.c:265 223 {*arm_cond_branch}
     (expr_list:REG_DEAD (reg:CC 24 cc)
        (expr_list:REG_BR_PROB (const_int 9100 [0x238c])
            (nil)))
 -> 905)

(note 886 883 49 97 [bb 97] NOTE_INSN_BASIC_BLOCK)

(insn 49 886 0 97 (set (reg/v:SI 291 [ total_errors ])
        (const_int 0 [0])) core_main.c:265 709 {*thumb2_movsi_insn}
     (expr_list:REG_DEAD (reg:SI 684 [ default_num_contexts ])
        (expr_list:REG_EQUAL (const_int 0 [0])
            (nil
..

(code_label 905 54 904 47 54 "" [1 uses])

(note 904 905 46 47 [bb 47] NOTE_INSN_BASIC_BLOCK)

(insn 46 904 47 47 (set (reg/v:SI 291 [ total_errors ])
        (const_int 0 [0])) core_main.c:265 709 {*thumb2_movsi_insn}
     (nil))

The problem is that one_cprop_pass runs the local const/copy propagation pass first and then the global pass, which only handles global opportunities. Although the conditional const information "r684 <- 0" is collected by find_implicit_sets, it is recorded as local information of bb 97. It is not recorded in avout of bb 96, so it is not in avin of bb 97 either.

Unfortunately, the global pass (cprop_insn and find_avail_set) only considers opportunities from the avin of each basic block.

That is why the conditional propagation opportunity in bb 97 is missed.

I have worked on a patch to fix this and would like to hear more suggestions on this topic. Is it a bug, or have I missed something important?

Thanks

BTW, I'm using gcc mainline configured for the arm-none-eabi target.
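At the source level, the opportunity described above can be sketched roughly as follows. This is only an illustration of the two insns involved (insn 46 and insn 49), not the actual code in core_main.c; the function names are made up.

```c
/* Before propagation: on the fall-through path the branch has already
   established num_contexts == 0, but the copy still reads the register.  */
int
total_errors_before (int num_contexts)
{
  int total_errors;
  if (num_contexts != 0)
    total_errors = 0;             /* like insn 46: r291 <- 0 */
  else
    total_errors = num_contexts;  /* like insn 49: r291 <- r684 */
  return total_errors;
}

/* After conditional constant propagation: the copy becomes a constant
   load, so both arms assign the same value and later passes can merge
   them.  */
int
total_errors_after (int num_contexts)
{
  int total_errors;
  if (num_contexts != 0)
    total_errors = 0;
  else
    total_errors = 0;             /* insn 49 rewritten to r291 <- 0 */
  return total_errors;
}
```

Both functions return the same value for every input, which is what makes the rewrite of insn 49 safe.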
Re: missing conditional propagation in cprop.c pass
On Tue, Sep 27, 2011 at 4:19 PM, Amker.Cheng wrote:
> Hi,
> I ran into a case and found conditional (const) propagation is
> mishandled in cprop pass.
> [...]
> Is it a bug or I missed something important?
>

No interest? Any tips would be greatly appreciated, thanks.
--
Best Regards.
Re: missing conditional propagation in cprop.c pass
> Unless there's something arch specific related to arm, insn 882 is a
> compare, which won't change r684. Why do you think 0 should
> propagated to r291 if r684 is not zero?
>

Thanks for replying. Sorry if I have misunderstood anything below; please correct me.

insn 882:      cc <- compare (r684, 0)
jump_insn 883: if (cc != 0) goto insn 46
insn 49:       r291 <- r684
..
insn 46

cc contains the result of subtracting 0 from r684. Control flow reaches insn 49 only if (cc == 0), which implies (r684 == 0). So at insn 49 we have the conditional constant propagation "r684 <- 0". Is that right?

Thanks again.
--
Best Regards.
Re: missing conditional propagation in cprop.c pass
>
> Nobody mentioned this so I might be way off but cc doesn't get (minus
> (reg r684) (const_int 0)). It gets the `condition codes` modification as
> a consequence of the subtraction.
>

Hi Paulo,

According to the section "Comparison Operations" in the GCC internals manual:

"The comparison operators may be used to compare the condition codes (cc0) against zero, as in (eq (cc0) (const_int 0)). Such a construct actually refers to the result of the preceding instruction in which the condition codes were set."

Here the result of the preceding instruction is the result of (compare r684, 0), which is defined as:

"(compare:m x y)
Represents the result of subtracting y from x for purposes of comparison."

I'm not sure whether I have misunderstood anything, so please comment.

Thanks very much.
--
Best Regards.
Re: missing conditional propagation in cprop.c pass
>>
>> I believe, the optimization you may be referring to is value range
>> propagation which does predication of values based on predicates of
>> conditions. GCC definitely applies VRP at the tree stage, I am not
>> sure if there is an RTL pass to do the same.
> There are also RTL optimizers which perform this kind of constant
> propagation. See cprop.c (in older versions of gcc this code was in
> gcse.c)
>

Hi Jeff,

This is exactly what I referred to in the first message. Although the cprop.c pass collects the implicit_set information, it is recorded as local information of the basic block, and cprop only does global propagation. As a result, such conditional const propagation opportunities are missed.

The whole process in the cprop pass is as follows, given:

bb0 : if (x) then bb1 else bb2 end

1, the implicit_set from the preceding bb0 is tagged as local in bb1;
2, in compute_local_properties, the implicit_set is recorded in avloc[bb1];
3, in compute_cprop_available, the implicit_set is only recorded in avout[bb1], not in avin[bb1], where it should be;
4, in cprop_insn and find_avail_set, only the information recorded in avin[bb1] is considered when trying to propagate into bb1.

I believe it is a small problem: since the implicit_set is recorded in avout[bb1], bb1 is the only basic block that gets missed in propagation.

I don't know whether I have described the problem clearly, so please comment.

Thanks very much.
--
Best Regards.
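The four steps above can be reproduced with a toy model of the availability dataflow. This is a simplified sketch, not the code in cprop.c: the struct, the bit name IMPLICIT_SET_X and the three-block CFG are assumptions, and the usual forward equations avin[b] = avout[pred(b)] and avout[b] = avloc[b] | (avin[b] & ~kill[b]) are used.

```c
#include <stdint.h>

enum { NUM_BLOCKS = 3 };        /* bb0 -> bb1 and bb0 -> bb2 */
enum { IMPLICIT_SET_X = 1u };   /* bit standing for the implicit set on x */

typedef struct
{
  uint32_t avloc[NUM_BLOCKS];   /* facts generated locally in each block */
  uint32_t kill[NUM_BLOCKS];    /* facts invalidated in each block */
  uint32_t avin[NUM_BLOCKS];    /* facts available on block entry */
  uint32_t avout[NUM_BLOCKS];   /* facts available on block exit */
} avail_info;

/* Single predecessor of each block, or -1 for the entry block.  */
static const int pred[NUM_BLOCKS] = { -1, 0, 0 };

/* Blocks are visited in topological order and each has at most one
   predecessor, so one pass is enough to reach the fixed point.  */
void
compute_available (avail_info *info)
{
  for (int b = 0; b < NUM_BLOCKS; b++)
    {
      info->avin[b] = pred[b] < 0 ? 0 : info->avout[pred[b]];
      info->avout[b] = info->avloc[b] | (info->avin[b] & ~info->kill[b]);
    }
}
```

If the implicit set is put into avloc[1] (step 2), it shows up in avout[1] but not in avin[1] (step 3), so a propagation that only consults avin[1] (step 4) cannot see it.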
Re: missing conditional propagation in cprop.c pass
Hi Jeff, Steven,

I have filed a bug at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50663
Could somebody confirm it?

I have spent some time studying this piece of code and am working on a patch that I hope will help with this issue. Please help review it later.

Thanks.
--
Best Regards.
At which pass thing goes wrong for PR43491?
Hi,

I looked into PR43491 for a while and found that, in this case, the GIMPLE generated before pre looks like:

    reg.0_12 = reg
    ...
    c()
    reg.0_1 = reg
    D.xxx = MEM[reg.0_1 + 8B]

The pre pass transforms it into:

    reg.0_12 = reg
    ...
    c()
    reg.0_1 = reg.0_12
    D.xxx = MEM[reg.0_1 + 8B]

From this point on, the following passes (like copy_prop) cannot transform it back, which results in the additional mov instruction reported in the bug.

The flow is like:
1, when rewriting GIMPLE into SSA, reg is treated as a memory use;
2, pre seems to notice that reg is const and replaces reg with reg.0_12; by doing so, pre thinks it has eliminated an additional memory load;
3, the following passes do not transform it back, either because reg is treated as a memory use or because the const attribute is ignored.

I think pre does the right thing given the information it has, so I am wondering at which pass things start going wrong and how this issue could be handled.

Thanks very much
--
Best Regards.
Re: At which pass thing goes wrong for PR43491?
On Sat, Nov 26, 2011 at 3:41 PM, Amker.Cheng wrote:
> Hi,
> I looked into PR43491 a while and found in this case the gimple
> generated before pre
> is like:
> [...]
> I think pre does the right thing given the information it knows, so wondering
> at which pass thing starts going wrong and how could this issue be handled?
>

Should PRE be made aware of global register variables, so that it does not do the unnecessary elimination mentioned above?
--
Best Regards.
Re: At which pass thing goes wrong for PR43491?
On Thu, Dec 1, 2011 at 11:45 PM, Richard Guenther wrote:
> Well, it's not that easy if you still want to properly do redundant expression
> removal on global registers.

Yes, it might be complicated to make PRE fully aware of global registers. I also found a comment in is_gimple_reg which says that gcc does not do much optimization on register variables at the tree level for now.

Back to this issue, I think it can be fixed in the following way without hurting redundancy elimination on global register variables:

After insert() has been called in pre, in function eliminate() we can check for a single assignment statement from a global register variable to an ssa_name. If that is the case, we can just skip the elimination.

In this way:
1, normal redundancy elimination on global registers will not be hurt, since sccvn and pre have already detected the true elimination chances and they will be eliminated afterward in function eliminate;
2, the statements (including PHIs) inserted for global register variables will not be marked as NECESSARY in function eliminate and will be deleted in remove_dead_inserted_code.

I attached an example which illustrates that the normal redundancy does get eliminated.

I will send a patch for review if this is worth discussing. What do you think?

Thanks
--
Best Regards.

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-pre-stats" } */

register int data_0 asm("r4");
register int data_3 asm("r5");

int motion_test1(int data, int v)
{
  int i;
  int t, u;

  if (data)
    i = data_0 + data_3;
  else
    {
      v = 2;
      i = 5;
    }
  t = data_0 + data_3;
  u = i;
  return v * t * u;
}
/* We should eliminate one computation of data_0 + data_3 along the
   main path.  We cannot re-associate v * t * u due to undefined signed
   overflow so we do not eliminate one computation of v * i along the
   main path.
*/
/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre" } } */
/* { dg-final { cleanup-tree-dump "pre" } } */

ssa-pre-2.c.093t.crited
Description: Binary data

ssa-pre-2.c.094t.pre.orig
Description: Binary data

;; Function motion_test1 (motion_test1, funcdef_no=0, decl_uid=4055, cgraph_uid=0)

Points-to analysis

Constraints:

ANYTHING = &ANYTHING
READONLY = &READONLY
ESCAPED = *ESCAPED
ESCAPED = ESCAPED + UNKNOWN
*ESCAPED = NONLOCAL
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
data = &NONLOCAL
v = &NONLOCAL
*r4 = NONLOCAL
data_0.0_4 = *r4
*r5 = NONLOCAL
data_3.1_5 = *r5
i_6 = data_0.0_4
i_6 = data_3.1_5
v_1 = v
v_1 = &NONLOCAL
i_2 = i_6
i_2 = &NONLOCAL
data_0.0_10 = *r4
data_3.1_11 = *r5
t_12 = data_0.0_10
t_12 = data_3.1_11
D.4935_14 = v_1
D.4935_14 = t_12
D.4934_15 = i_2
D.4934_15 = D.4935_14
ESCAPED = D.4934_15

Collapsing static cycles and doing variable substitution
Building predecessor graph
Detecting pointer and location equivalences
Rewriting constraints and unifying variables
Uniting pointer but not location equivalent variables
Finding indirect cycles
Solving graph

Points-to sets

ANYTHING = { ANYTHING }
READONLY = { READONLY }
ESCAPED = { ESCAPED NONLOCAL }
NONLOCAL = { ESCAPED NONLOCAL } same as *r4
STOREDANYTHING = { }
INTEGER = { ANYTHING }
data = { NONLOCAL }
v = { NONLOCAL } same as data
data_0.0_4 = { ESCAPED NONLOCAL } same as *r4
*r4 = { ESCAPED NONLOCAL }
data_3.1_5 = { ESCAPED NONLOCAL } same as *r4
*r5 = { ESCAPED NONLOCAL } same as *r4
i_6 = { ESCAPED NONLOCAL } same as *r4
v_1 = { NONLOCAL } same as data
i_2 = { ESCAPED NONLOCAL } same as *r4
data_0.0_10 = { ESCAPED NONLOCAL } same as *r4
data_3.1_11 = { ESCAPED NONLOCAL } same as *r4
t_12 = { ESCAPED NONLOCAL } same as *r4
D.4935_14 = { ESCAPED NONLOCAL } same as *r4
D.4934_15 = { ESCAPED NONLOCAL } same as *r4

Alias information for motion_test1

Aliased symbols

.MEM, UID D.4937, void, is global, default def: .MEM_16(D)
data_0, UID D.4051, int, is global
data_3, UID D.4052, int, is global

Call clobber information

ESCAPED, points-to non-local, points-to vars: { }

Flow-insensitive points-to information

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 5 3 4
;; 2 succs { 3 5 }
;; 5 succs { 4 }
;; 3 succs { 4 }
;; 4 succs { 1 }
Could not find SSA_NAME representative for expression:{mult_expr,i_6,2}
Created SSA_NAME representative pretmp.5_17 for expression:{mult_expr,i_6,2}
Could not find SSA_NAME representative for expression:{mult_expr,i_6,v_7(D)}
Created SSA_NAME representative pretmp.5_18 for expression:{mult_expr,i_6,v_7(D)}

Symbols to be put in SSA form
{ .MEM }
Incremental SSA update started at block: 0
Number of blocks in CFG: 6
Number of blocks to update: 5 ( 83%)

motion_test1 (int data, int v)
{
  int prephitmp.6;
  int pretmp.5;
  int t;
  int i;
  int D.4935;
  int D.4934;
  int data_3.1;
  int data_0.0;

:
  if (data_3(D) != 0)
    goto ;
  else
    goto ;

:
  pretmp.5_19 = data_0;
  pretmp.5_21 = data_3;
  pretmp.5_23 = pretmp.5_1
question on behavior of tree-ssa-ccp
Hi,

I encountered a case with the code below:

int data_0;
int motion_test1(int data, int v)
{
  int i;
  int t, u;
  int x;

  if (data)
    i = data_0 + x;
  else
    {
      v = 2;
      i = 5;
    }
  t = data_0 + x;
  u = i;
  return v * t * u;
}

The dump file 023t.ccp1 looks like:

motion_test1 (int data, int v)
{
  int x;
  int t;
  int D.4723;
  int D.4722;
  int data_0.0;

:
  if (data_3(D) != 0)
    goto ;
  else
    goto ;

:
  v_8 = 2;

:
  # v_1 = PHI
  data_0.0_10 = data_0;
  t_11 = data_0.0_10 + x_5(D);
  D.4723_13 = v_1 * t_11;
  D.4722_14 = D.4723_13 * 5;
  return D.4722_14;

}

It seems the result is computed as "v*(data_0+x)*5", which looks wrong. The question is whether this is a bug or intended behavior due to the uninitialized "x".

Any tips are welcome. Thanks.
--
Best Regards.
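A likely explanation is that reading the uninitialized automatic variable x has no defined result in C, so the optimizer may treat "data_0 + x" as any value it likes, including 5, which makes i equal to 5 on both paths. As a sketch of a variant where the question does not arise, x can be passed in as a parameter; the function name motion_test1_defined is made up for this example.

```c
int data_0;

/* Same computation as the test case, but x is a parameter and therefore
   always has a defined value, so the two paths must be kept apart.  */
int
motion_test1_defined (int data, int v, int x)
{
  int i;
  int t, u;

  if (data)
    i = data_0 + x;
  else
    {
      v = 2;
      i = 5;
    }
  t = data_0 + x;
  u = i;
  return v * t * u;
}
```

With data_0 = 4 and x = 2, the "data" path gives v * 6 * 6 and the other path gives 2 * 6 * 5, so folding i to the constant 5 would no longer be valid here.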
Re: question on behavior of tree-ssa-ccp
I forgot the command line:

arm-none-eabi-gcc -O2 -mthumb -mcpu=cortex-m3 -S test.c -o test.S -fdump-tree-all

gcc is configured as arm-none-eabi, but I think the behavior is independent of the target.
--
Best Regards.
RFC: Handle conditional expression in sccvn/fre/pre
Hi,

Since SCCVN operates on the SSA graph instead of the control flow graph for the sake of efficiency, it does not handle or value number the conditional expression of a GIMPLE_COND statement. As a result, FRE/PRE does not simplify conditional expressions, as reported in bug 30997.

Since it would be complicated and difficult to process conditional expressions in the current SCCVN algorithm, how about the following method?

STEP1
Before starting FRE/PRE, we can factor out the conditional expression, i.e. change the following code:

    if (cond_expr)
      goto label_a
    else
      goto label_b

into:

    tmp = cond_expr
    if (tmp == 1)
      goto label_a
    else
      goto label_b

STEP2
Let SCCVN/FRE/PRE do its job of value numbering cond_expr and eliminating redundancy.

STEP3
After FRE/PRE, each "tmp = cond_expr" that was not used in any redundancy elimination can be forwarded to the corresponding GIMPLE_COND statement, just as tree-ssa-forwprop.c does.

In this way, the conditional expression is handled like other expressions and no redundant assignment is generated. Most importantly, this does not affect the current implementation of SCCVN/FRE/PRE.

The only problem is that the method does not work on the inverted form of a conditional expression. For example:

    x = a > 2;
    if (a <= 2)
      goto label_a
    else
      goto label_b

could be optimized into:

    x = a > 2
    if (x == 0)
      goto label_a
    else
      goto label_b

I have worked on a draft patch to do this and would like to hear your comments.

Thanks very much.
--
Best Regards.
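The inverted-condition case at the end of the proposal can be written as plain C to show what the transformation is expected to preserve. This is only an illustration at the source level, not GIMPLE; the function names and the return values standing in for label_a and label_b are assumptions.

```c
/* Before: the branch recomputes the inverse of a comparison whose
   result is already held in x.  */
int
branch_before (int a, int *x_out)
{
  int x = a > 2;
  *x_out = x;
  if (a <= 2)
    return 10;  /* stands for label_a */
  else
    return 20;  /* stands for label_b */
}

/* After: the branch reuses x, so only one comparison of a remains.  */
int
branch_after (int a, int *x_out)
{
  int x = a > 2;
  *x_out = x;
  if (x == 0)
    return 10;  /* stands for label_a */
  else
    return 20;  /* stands for label_b */
}
```

Since "a <= 2" is exactly the negation of "a > 2" for integers, the two functions agree for every a.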
Re: RFC: Handle conditional expression in sccvn/fre/pre
Thanks Richard,

On Mon, Jan 2, 2012 at 8:33 PM, Richard Guenther wrote:
>
> I've previously worked on changing GIMPLE_COND to no longer embed
> the comparison but carry a predicate SSA_NAME only (this is effectively
> what you do as pre-processing before SCCVN). It had some non-trivial
> fallout (for example PRE get's quite confused and ends up separating
> conditionals and jumps too far ...) so I didn't finish it.

When you say changing GIMPLE_COND to no longer embed the comparison, do you mean only in the fre/pre passes or in general? If only in the fre/pre passes, when and how are the changed GIMPLE_CONDs turned back into normal ones? If in general, won't this affect passes that work on GIMPLE_COND, like tree-ssa-forwprop.c?

>
> A subset of all cases can be catched by simply looking up the
> N-ary at eliminate () time and re-writing the GIMPLE_COND to use
> the predicate - which might not actually be beneficial (but forwprop
> will undo not beneficial cases - hopefully).
>
> In the end I'd rather go the way changing the GIMPLE IL to not
> embed the comparison in the GIMPLE_COND - that reduces
> the amount of redundant way we can express the same thing.

Will you try to handle the inverted comparison case mentioned in my previous message? I guess this needs work in both sccvn and fre/pre. It would be great to hear your thoughts on this.

Thanks
--
Best Regards.
Re: RFC: Handle conditional expression in sccvn/fre/pre
On Mon, Jan 2, 2012 at 9:37 PM, Richard Guenther wrote:
> Well, with
>
> Index: gcc/tree-ssa-pre.c
> ===
> --- gcc/tree-ssa-pre.c (revision 182784)
> +++ gcc/tree-ssa-pre.c (working copy)
> @@ -4335,16 +4335,23 @@ eliminate (void)
>              available value-numbers.  */
>           else if (gimple_code (stmt) == GIMPLE_COND)
>             {
> -             tree op0 = gimple_cond_lhs (stmt);
> -             tree op1 = gimple_cond_rhs (stmt);
> +             tree op[2];
>               tree result;
> +             vn_nary_op_t nary;
>
> -             if (TREE_CODE (op0) == SSA_NAME)
> -               op0 = VN_INFO (op0)->valnum;
> -             if (TREE_CODE (op1) == SSA_NAME)
> -               op1 = VN_INFO (op1)->valnum;
> +             op[0] = gimple_cond_lhs (stmt);
> +             op[1] = gimple_cond_rhs (stmt);
> +             if (TREE_CODE (op[0]) == SSA_NAME)
> +               op[0] = VN_INFO (op[0])->valnum;
> +             if (TREE_CODE (op[1]) == SSA_NAME)
> +               op[1] = VN_INFO (op[1])->valnum;
>               result = fold_binary (gimple_cond_code (stmt), boolean_type_node,
> -                                   op0, op1);
> +                                   op[0], op[1]);
> +             if (!result)
> +               result = vn_nary_op_lookup_pieces (2, gimple_cond_code (stmt),
> +                                                  boolean_type_node,
> +                                                  op, &nary);
> +
>               if (result && TREE_CODE (result) == INTEGER_CST)
>                 {
>                   if (integer_zerop (result))
> @@ -4354,6 +4361,13 @@ eliminate (void)
>                   update_stmt (stmt);
>                   todo = TODO_cleanup_cfg;
>                 }
> +             else if (result && TREE_CODE (result) == SSA_NAME)
> +               {
> +                 gimple_cond_set_code (stmt, NE_EXPR);
> +                 gimple_cond_set_lhs (stmt, result);
> +                 gimple_cond_set_rhs (stmt, boolean_false_node);
> +                 update_stmt (stmt);
> +               }
>             }
>           /* Visit indirect calls and turn them into direct calls if
>              possible.  */
>
> you get the CSE (too simple patch, you need to check leaders properly).
> You can then add similar lookups for an inverted conditional.

Thanks for your explanation.

One shortcoming of this method is that it cannot find or use cond_expr (and the implicitly defined variable) as the leader in pre. I guess this is why you said in your previous message that it can handle only a subset of all cases?

On the other hand, I like this method, especially given its simplicity. :)
--
Best Regards.
Re: RFC: Handle conditional expression in sccvn/fre/pre
On Mon, Jan 2, 2012 at 10:54 PM, Richard Guenther wrote:
> Yes. It won't handle
>
> if (x > 1)
> ...
> tem = x > 1;
>
> or
>
> if (x > 1)
> ...
> if (x > 1)
>
> though maybe we could teach PRE to do the insertion by properly
> putting x > 1 into EXP_GEN in compute_avail (but not into AVAIL_OUT).
> Not sure about this though. Currently we don't do anything to
> GIMPLE_COND operands (which seems odd anyway, we should
> at least add the operands to EXP_GEN).

I did an experiment which shows that, by putting cond_expr into EXP_GEN properly, PRE can insert the expression in the following case:

    //necessary declaration of variable a/b/g
    int tmp;
    if (x_cond)
      tmp = a > 2;
    else
      tmp = b;
    if (a > 2)
      g = tmp;

But the problem you mentioned, "PRE separates conditional expression and jump too far", still exists in this kind of case. Now I doubt the benefit of making PRE handle cond_expr, because in the back end machines normally have only one flag register to store the result.

For other cases like:

    if (a > 2)
    ...
    if (a > 2)

the current insertion logic (in do_regular_insertion) simply will not insert the expression before the first GIMPLE_COND statement, because it only considers basic blocks that have multiple predecessors and expressions that are partially redundant. Anyway, I think this could be done by implementing a new insertion strategy for GIMPLE_COND.
--
Best Regards.
question on inconsistent generated codes for builtin calls
Hi,

I noticed that gcc generates inconsistent code for identical functions containing builtin calls. Compile the following program:

--
#include <math.h>

int a(float x)
{
  return sqrtf(x);
}
int b(float x)
{
  return sqrtf(x);
}

with the command:

arm-none-eabi-gcc -mthumb -mhard-float -mfpu=fpv4-sp-d16 -mcpu=cortex-m4 -O0 -S a.c -o a.S

The generated assembly code looks like:

--
a:
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 1, uses_anonymous_args = 0
        push    {r7, lr}
        sub     sp, sp, #8
        add     r7, sp, #0
        fsts    s0, [r7, #4]
        flds    s15, [r7, #4]
        fsqrts  s15, s15
        fcmps   s15, s15
        fmstat
        beq     .L2
        flds    s0, [r7, #4]
        bl      sqrtf
        fcpys   s15, s0
.L2:
        ftosizs s15, s15
        fmrs    r3, s15 @ int
        mov     r0, r3
        add     r7, r7, #8
        mov     sp, r7
        pop     {r7, pc}
        .size   a, .-a
        .align  2
        .global b
        .thumb
        .thumb_func
        .type   b, %function
b:
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 1, uses_anonymous_args = 0
        push    {r7, lr}
        sub     sp, sp, #8
        add     r7, sp, #0
        fsts    s0, [r7, #4]
        flds    s0, [r7, #4]
        bl      sqrtf
        fcpys   s15, s0
        ftosizs s15, s15
        fmrs    r3, s15 @ int
        mov     r0, r3
        add     r7, r7, #8
        mov     sp, r7
        pop     {r7, pc}
        .size   b, .-b

The cause is in function expand_builtin, where gcc checks the following conditions:

--
  /* When not optimizing, generate calls to library functions for a certain
     set of builtins.  */
  if (!optimize
      && !called_as_built_in (fndecl)
      && DECL_ASSEMBLER_NAME_SET_P (fndecl)
      && fcode != BUILT_IN_ALLOCA
      && fcode != BUILT_IN_ALLOCA_WITH_ALIGN
      && fcode != BUILT_IN_FREE)
    return expand_call (exp, target, ignore);

The control flow is:
1, DECL_ASSEMBLER_NAME_SET_P (fndecl) is false the first time, when compiling a;
2, it is then set by later code while expanding the sqrtf call in function a;
3, when compiling function b, gcc checks DECL_ASSEMBLER_NAME_SET_P (fndecl) again, and this time it is true.

I am a little confused about why we check DECL_ASSEMBLER_NAME_SET_P here. Does it have a special meaning?

Thanks in advance.
--
Best Regards.
Re: question on inconsistent generated codes for builtin calls
On Fri, Jan 13, 2012 at 5:33 PM, Richard Guenther wrote: > > No, I think the check is superfluous and should be removed. I also wonder > why we exempt BUILT_IN_FREE here ... can you dig in SVN history a bit? > For both things? Thanks for clarifying. I will look into it. -- Best Regards.
Re: question on inconsistent generated codes for builtin calls
On Fri, Jan 13, 2012 at 10:17 PM, Amker.Cheng wrote:
> On Fri, Jan 13, 2012 at 5:33 PM, Richard Guenther
> wrote:
>>
>> No, I think the check is superfluous and should be removed. I also wonder
>> why we exempt BUILT_IN_FREE here ... can you dig in SVN history a bit?
>> For both things?

Hi Richard,

The BUILT_IN_FREE check was introduced in r138362 to fix PR36970, in which gcc did not warn about freeing non-heap memory, as in this program:

#include <stdlib.h>

int main (void)
{
  char array[100];
  free (array);  /* invalid on purpose: array is not heap memory */
}

I will run make check to see whether it is OK to drop the DECL_ASSEMBLER_NAME_SET_P check and then send a patch.

BTW, should I create a bug for this?

Thanks.
question on bitmap_set_subtract function in pre
Hi,

In PRE, the function compute_antic_aux uses bitmap_set_subtract to compute value/expression set subtraction. The comment on bitmap_set_subtract says it subtracts all the values and expressions contained in ORIG from DEST.

But the implementation is as follows:

---
static bitmap_set_t
bitmap_set_subtract (bitmap_set_t dest, bitmap_set_t orig)
{
  bitmap_set_t result = bitmap_set_new ();
  bitmap_iterator bi;
  unsigned int i;

  bitmap_and_compl (&result->expressions, &dest->expressions,
                    &orig->expressions);

  FOR_EACH_EXPR_ID_IN_SET (result, i, bi)
    {
      pre_expr expr = expression_for_id (i);
      unsigned int value_id = get_expr_value_id (expr);
      bitmap_set_bit (&result->values, value_id);
    }

  return result;
}

Does it subtract just the expressions, rather than the values, and then rebuild the values from the resulting expressions?

I am a little confused here. Any explanation?

Thanks very much.
--
Best Regards.
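The behavior asked about can be modeled with plain bit masks. The sketch below is a simplified stand-in, not GCC's bitmap_set_t: the type expr_set, the function expr_set_subtract and the value_of table (mapping an expression id to its value id) are all assumptions made for illustration.

```c
#include <stdint.h>

enum { MAX_EXPRS = 32 };

typedef struct
{
  uint32_t expressions;  /* bit i set: expression id i is in the set */
  uint32_t values;       /* bit v set: some member has value id v */
} expr_set;

/* Subtract ORIG's expressions from DEST, then rebuild the value mask
   from the expressions that survive.  ORIG's value mask is deliberately
   not consulted, mirroring the function quoted above.  Every entry of
   VALUE_OF must be smaller than 32.  */
expr_set
expr_set_subtract (expr_set dest, expr_set orig,
                   const unsigned value_of[MAX_EXPRS])
{
  expr_set result = { 0, 0 };

  result.expressions = dest.expressions & ~orig.expressions;
  for (unsigned i = 0; i < MAX_EXPRS; i++)
    if (result.expressions & (UINT32_C (1) << i))
      result.values |= UINT32_C (1) << value_of[i];
  return result;
}
```

If two expressions share one value id and only one of them is subtracted, the value bit stays set because the other expression still carries it. Subtracting the value masks directly would clear it, which is presumably why the values are recomputed.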
Re: question on bitmap_set_subtract function in pre
On Mon, Feb 6, 2012 at 7:28 PM, Richard Guenther wrote:
> It's probably to have the SET in some canonical form - the resulting

I am wondering how the canonical form is maintained, since according to the paper:

"For an antileader set, it does not matter which expression represents a value, as long as that value is live."

Could you show me where the code that maintains this property is?

> values are simply re-computed from the expression subtraction
> (multiple expressions may have the same value, so in
> { a, b } { 0 } - { a } { 0 } you need to either compute { } { } or { b } { 0 }
> neither which you can reach by simply subtracting both bitmaps.

Taking this example, shouldn't the expected result be:
{b}{0} if a is defined by some known expr;
{} {} if a is defined by some unknown expr;
which is not what gcc does now.

The following words are from the paper:

"For an antileader set, it does not matter which expression represents a value, so long as that value is live. A temporary potentially in ANTIC_IN becomes dead if it is assigned to. If the assignment is from something we can make an expression for (as opposed to ?), that expression replaces the temporary as the antileader. If the assignment is from ?, then the value is no longer represented at all. Furthermore, any other expression that has that (no longer represented) value as an operand also becomes dead."

In the expression subtraction above, I don't see how a value that depends on a tmp defined by an unknown operation, like tmp<-?, is handled.

I am still confused and have most likely missed something important. Please help, thanks very much.
--
Best Regards.
Question about the difference between two instruction scheduling passes
Hi all: I'm currently studying the implementation of instruction scheduling in gcc. It is possible to schedule insns directly from the queue when there is nothing better to do and there are still vacant dispatch slots in the current cycle. Gcc only does this in the second scheduling pass, but what is the point? Is it wrong, or just not necessary, in the first scheduling pass? Thanks. -- Best Regards.
Is Non-Blocking cache supported in GCC?
Hi all:

Recently I found two relatively old papers about non-blocking caches:

1) "Reducing memory latency via non-blocking and prefetching caches" by Tien-Fu Chen and Jean-Loup Baer.
2) "Data Prefetching: A Cost/Performance Analysis" by Chris Metcalf.

It seems this hardware facility has the potential to improve performance with the compiler's assistance (especially from instruction scheduling). On the other hand, hoisting load instructions may increase register pressure. So I'm wondering:

1. Has anyone from the gcc folks investigated this topic yet, or are any statistics based on gcc available?
2. Does GCC (in any release version) support it for any target with this hardware feature (such as mips 24ke)? If not, is it possible to support it by using target definition macros and functions?

Any tips will be highly appreciated, thanks.

-- Best Regards.
Re: Is Non-Blocking cache supported in GCC?
On Sat, Sep 19, 2009 at 1:17 AM, Janis Johnson wrote: > On Thu, 2009-09-17 at 21:48 -0700, Ian Lance Taylor wrote: > > There's also a prefetch built-in function; see > > http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#Other-Builtins > > It's been in GCC since 3.1. > > Janis > > Thank you all, It seems prefetch is more useful than non-blocking, no wonder gcc takes advantage of prefetch, rather than non-blocking. -- Best Regards.
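For readers who have not used the built-in mentioned above, here is a minimal sketch of how __builtin_prefetch might be called in a loop. The function name sum_with_prefetch and the PREFETCH_DISTANCE constant are made up for illustration, and the distance is an untuned guess; whether the prefetch helps at all depends on the target and the access pattern.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch.  */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that data a few iterations ahead will be
   read soon.  The second argument (0) means "prefetch for read", the
   third (1) is a low temporal-locality hint.  The built-in never
   changes program semantics, only (possibly) timing.  */
long sum_with_prefetch(const long *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 1);
        sum += data[i];
    }
    return sum;
}
```

On targets without a prefetch instruction GCC simply emits nothing for the built-in, so the function stays portable across GCC targets.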
question about speculative scheduling in gcc
Hi :

I'm puzzled when looking into speculative scheduling in gcc, version 4.2.4.

First, I read the document describing the IBM haifa instruction scheduler (from the PowerPC Reference Compiler Optimization Project). It states that instruction motion from bb s (dominated by t) to t is speculative when split_blocks(s, t) is not empty.

Second, there are SCHED_FLAGS such as DO_SPECULATION in the code.

Here are my questions:

1. Does the DO_SPECULATION flag control whether the speculative motion mentioned above is done?
2. For the mips target, which has the DO_SPECULATION bit cleared, gcc still does speculative motion when scheduling (first pass), so it seems the answer to question 1 is no. But then what is the DO_SPECULATION flag for?

I must have missed something important. Please help out. Thanks.

-- Best Regards.
Re: question about speculative scheduling in gcc
On Sun, Sep 20, 2009 at 3:43 PM, Maxim Kuvyrkov wrote: > Amker.Cheng wrote: >> >> Hi : >> I'm puzzled when looking into speculative scheduling in gcc, the 4.2.4 >> version. >> >> First, I noticed the document describing IBM haifa instruction >> scheduler(as PowerPC Reference Compiler Optimization Project). >> >> It presents that the instruction motion from bb s(dominated by t) >> to t is speculative when split_blocks(s, t) not empty. >> >> Second, There is SCED_FLAGS like DO_SPECULATION in codes. > > These are two different types of speculative optimizations. > >> >> Here goes questions. >> 1, Does the DO_SPECULATION flag constrol whether do the >> mentioned speculative motion or not? > > DO_SPECULATION flag controls generation of IA64 data and control speculative > instructions. It is not used on other architectures. > > Speculative instruction moves from the split blocks are controlled by > flag_schedule_speculative. > > -- > Maxim > Yes! I've just found it's used for IA64 and was merged into gcc in version 4.2.0. Thanks. -- Best Regards.
what does the calling for min_insn_conflict_delay mean
Hi : In function new_ready, min_insn_conflict_delay is called as "min_insn_conflict_delay (curr_state, next, next)". But the function's comment says that it returns the minimal delay of issue of the 2nd insn after issuing the 1st in the given state. Why are the last two arguments of the call both "next"? This seems to conflict with the comment. Thanks. -- Best Regards.
Re: what does the calling for min_insn_conflict_delay mean
On Tue, Sep 22, 2009 at 11:50 PM, Vladimir Makarov wrote: > Ian Lance Taylor wrote: >> >> "Amker.Cheng" writes: >> >> >>> >>> In function new_ready, it calls to min_insn_conflict_delay with >>> "min_insn_conflict_delay (curr_state, next, next)". >>> But the function's comments say that it returns minimal delay of issue of >>> the 2nd insn after issuing the 1st in given state. >>> Why the last two parameter for the call are both "next"? >>> seems conflict with the comments. >>> >> >> > > Amker, thanks for finding this issue. It's great pleasure if can help anything. >> >> This change dates back to the first DFA scheduler patch. It does seem a >> little odd, particularly as the call in new_ready is the only use of >> min_insn_conflict_delay. CC'ing vmakarov in case he remembers anything >> about this old code. >> > > I've not remembered this. I guess it was a result of long period of > transition from the old pipeline hazard recognizier to the DFA one which > required to rewrite all old pipeline descriptions. > > Also after starring at this code for some time, I don't like this code. > Now I'd use min_issue_delay (curr_state, next) which is delay of issuing > next in the current function unit reservation state instead of > min_insn_conflict_delay (curr_state, next, next) which is a delay of > issuing the first insn (next) after issuing the second insn (next) on a free > processor (when all function units are free). Probably it was a typo. > Although I think that such change (in many other conditions to move insn > speculatively to the ready list) will not give a visible improvement for > most processors, I'll try it. > > It looks to me that probably I had also some plans for usage of > min_insn_conflict_delay, but I forgot them because it was long ago. > > Is it the delay of issuing next in the current reservation state which expected here? 
It seems the call to min_insn_conflict_delay does no harm, except that it may result in more or fewer speculative motions (all of which are valid). -- Best Regards.
Problem when computing memory dependencies for scheduling pass1
Hi all:

I have found something strange when scheduling instructions. Consider the following piece of code:

-c start
    int func(float x)
    {
      int r = 0;
      r = (*(unsigned int*)&x) >> 23;
      return r;
    }
-c end

The return value differs depending on whether it is compiled with or without optimization. I have tested 4.2.4 and 4.3.3 on mips and 4.3.2 (ubuntu) on x86, and the results are the same. Is this a bug, or is something wrong with the example code?

The following is the output for the mips target; I hope it helps. Command:

    mipsel-elf-gcc -march=mips1 -EL -G0 -mabi=32 -S test/dummy.c -o test/dummy.S -fdump-rtl-all -fsched-verbose=9 -v -O2

The assembler output is:

-as start
            .section .mdebug.abi32
            .previous
            .text
            .align  2
            .globl  func
            .ent    func
    func:
            .frame  $sp,0,$31       # vars= 0, regs= 0/0, args= 0, gp= 0
            .mask   0x,0
            .fmask  0x,0
            .set    noreorder
            .set    nomacro

            lw      $2,0($sp)
            sw      $4,0($sp)
            j       $31
            srl     $2,$2,23

            .set    macro
            .set    reorder
            .end    func
            .size   func, .-func
            .ident  "GCC: (GNU) 4.2.4"
--as end

It seems the load insn is scheduled before the store, which results in the use of uninitialized data. The following are the RTL dumps before and after sched1.

before sched1:
---
    (note 8 2 6 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

    (insn 6 8 7 2 (set (mem/c/i:SF (reg/f:SI 77 $arg) [3 x+0 S4 A32])
            (reg:SF 4 $4 [ x ])) 233 {*movsf_softfloat} (nil)
        (expr_list:REG_DEAD (reg:SF 4 $4 [ x ])
            (nil)))

    (note 7 6 12 2 NOTE_INSN_FUNCTION_BEG)

    (insn 12 7 13 2 (set (reg:SI 196)
            (mem:SI (reg/f:SI 77 $arg) [4 S4 A32])) 213 {*movsi_internal} (nil)
        (nil))
-
after sched1:
---
    (insn 12 17 6 2 (set (reg:SI 196)
            (mem:SI (reg/f:SI 77 $arg) [4 S4 A32])) 213 {*movsi_internal} (nil)
        (nil))

    (insn 6 12 20 2 (set (mem/c/i:SF (reg/f:SI 77 $arg) [3 x+0 S4 A32])
            (reg:SF 4 $4 [ x ])) 233 {*movsf_softfloat} (nil)
        (expr_list:REG_DEAD (reg:SF 4 $4 [ x ])
            (nil)))
-

I have checked gcc's source, and it seems the two mem operands in insn 12 and insn 6 are assigned two different alias sets, so maybe this problem has something to do with the forced type cast?

The result is very strange, so please help; any comments will be highly appreciated.

-- Best Regards.
Re: Problem when computing memory dependencies for scheduling pass1
Thanks Eric Fisher, got the answer, Please ignore this message. -- Best Regards.
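The answer itself is not recorded in this thread. A plausible reading, stated here as an assumption, is the C strict-aliasing rule: reading a float object through an unsigned int lvalue is undefined, and -O2 enables -fstrict-aliasing, which is consistent with the two different alias sets seen in the RTL dump. A minimal sketch of a well-defined way to get the same bits, assuming a 32-bit IEEE 754 float (the function name float_bits_shift23 is made up for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Return the float's bit pattern shifted right by 23, i.e. the sign
   bit and the 8 exponent bits.  memcpy is the portable way to
   reinterpret the bytes; compilers typically turn it into a plain
   register move.  Assumes sizeof(float) == 4 and IEEE 754 layout.  */
int float_bits_shift23(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int)(bits >> 23);
}
```

Compiling the original code with -fno-strict-aliasing is the other common workaround, at the cost of disabling type-based alias analysis for the whole translation unit.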
Puzzles about implementation of bb-reorder pass
Hi :

The bb-reorder pass is relatively simple compared with others, but I still have the following puzzles.

1 : The comment at the top of bb-reorder.c says:

  There are two parameters: Branch Threshold and Exec Threshold. If the edge to a successor of the actual basic block is lower than Branch Threshold or the frequency of the successor is lower than Exec Threshold the successor will be the seed in one of the next rounds.

But when computing which_heap in function find_traces_1_round, it uses push_to_next_round_p to decide whether the successor should go to the next round, and that function takes only exec_th as an argument, not branch_th. Is this inconsistent?

2 : When checking for the situation

      A
     /|
    B |
     \|
      C

gcc uses the condition EDGE_FREQUENCY (AB) + EDGE_FREQUENCY (BC) >= EDGE_FREQUENCY (AC). What does "EDGE_FREQUENCY (AB) + EDGE_FREQUENCY (BC)" stand for? Since B is dominated by A and C is its only successor, I think the frequency of path (ABC) is less than that of path (AC).

3 : It is possible to merge two traces by copying exactly one basic block. gcc uses the following code to take a trace that has only one bb into consideration:

    if (bbd[e->dest->index].start_of_trace >= 0
        && traces[bbd[e->dest->index].start_of_trace].length == 1)
      {
        best = e;
        try_copy = true;
        continue;
      }

Here is the question: what if that trace has already been merged and has no free successor traces (traces whose start bb is a successor of the single bb)? In this situation next_bb is NULL, and all we have done is copy an already merged bb. Is this right?

Please correct me and help me out, thanks.

-- Best Regards.
mis-set value for trial in function fill_simple_delay_slots?
Hi : In function fill_simple_delay_slots, there is the following code:

>starts here
    /* If there are slots left to fill and our search was stopped by an
       unconditional branch, try the insn at the branch target.  We can
       redirect the branch if it works.

       Don't do this if the insn at the branch target is a branch.  */
    if (slots_to_fill != slots_filled
        && trial
        && JUMP_P (trial)
        && simplejump_p (trial)
        && (target == 0 || JUMP_LABEL (trial) == target)
        && ...)
Question about filling multi delay slots
Hi All : It's possible to define multiple delay slots for branch insns by using define_delay, and each slot must satisfy its own attribute test "delay-n". Here comes the question: function fill_simple_delay_slots seems to use only slots_filled to record how many slots have been filled, and puts the slot insns it has found into delay_list. I can't find any code that keeps track of which insn in delay_list belongs to which slot (as defined in define_delay). So how does gcc make sure that the insns in delay_list go into the right delay slots? Thanks in advance. -- Best Regards.
Re: Question about filling multi delay slots
On Tue, Dec 1, 2009 at 5:31 AM, Jeff Law wrote:
> On 11/25/09 07:34, Amker.Cheng wrote:
>
> First, it's worth noting very few targets support multiple delay slots and
> as a result that code isn't tested nearly as well as handling of single
> delay slots.
>
> I'm pretty sure we assume that the first insn we add to the delay list
> always goes in the first slot, 2nd insn in the 2nd slot and so-on.
>
> Jeff
>

Thanks for the explanation, I will take a closer look at this code.

-- Best Regards.
question about replace_in_call_usage in regmove.c
Hi : In regmove.c there is a function replace_in_call_usage, called from fixup_match_1, which replaces the dst register with src in a call_insn. I doubt whether it is necessary, since the comment on CALL_INSN_FUNCTION_USAGE says that no pseudo register can appear in it, and src seems to be a pseudo register. Furthermore, no replacement (dst->src) is done when bootstrapping gcc-4.2.4, which supports my understanding. Is this right, or have I missed something important? Please help. Thanks in advance. -- Best Regards.
Puzzle about mips pipeline description
Hi All:

In the gcc internals manual, section 16.19.8, there is a rule about define_insn_reservation:

  "`condition` defines what RTL insns are described by this construction. You should remember that you will be in trouble if `condition` for two or more different `define_insn_reservation` constructors is TRUE for an insn".

In mips.md, however, the pipeline description for each processor is included along with generic.md, which provides a fallback for processors without a specific pipeline description.

Here is the PUZZLE: won't the define_insn_reservation constructors from both the processor-specific md file and the generic md file break the rule mentioned above? For example, it seems the conditions for r3k_load (from 3000.md) and generic_load (from generic.md) are both TRUE for an lw insn.

Furthermore, the md files for specific processors say that their descriptions are supposed to override parts of the generic md file, but I don't know how that works without reading the code in genautomata.c.

Please help me out, thanks very much.

-- Best Regards.
Question on mips multiply patterns in md file
Hi :

I am studying the multiply-accumulate patterns for mips and noticed some changes from when IRA was merged. Two things confuse me:

1: In pattern "*mul_acc_si", there is a constraint like "*?*?". What is this supposed to do? I could not connect "*?" with the documentation on constraints in the gcc internals manual, and have no idea about it.

2: There is a split pattern for "*mul_acc_si", as follows:

    (define_split
      [(set (match_operand:SI 0 "d_operand")
            (plus:SI (mult:SI (match_operand:SI 1 "d_operand")
                              (match_operand:SI 2 "d_operand"))
                     (match_operand:SI 3 "d_operand")))
       (clobber (match_operand:SI 4 "lo_operand"))
       (clobber (match_operand:SI 5 "d_operand"))]
      "reload_completed"
      [(parallel [(set (match_dup 5)
                       (mult:SI (match_dup 1) (match_dup 2)))
                  (clobber (match_dup 4))])
       (set (match_dup 0) (plus:SI (match_dup 5) (match_dup 3)))]
      "")

This will generate an integer multiply instruction with a register write, but what if the processor only has integer multiply instructions that store their results in HILO?

So, any tips? Thanks a lot.

-- Best Regards.
Re: Question on mips multiply patterns in md file
> If you don't know anything about register class preferencing or reload as
> yet, then this is probably not going to make much sense to you, but it isn't
> anything important you need to worry about at this point. It is a very
> minor performance optimization.
>

It makes sense to me now, though I haven't read the code for IRA and reload yet. Thanks for the detailed explanation.

>
> A define_split can only match something generated by a define_insn, and the
> mul_acc_si define_insn is testing "GENERATE_MADD_MSUB && !TARGET_MIPS16"
> so there is no serious problem. We are just running a define_split that can
> never match anything. This could be cleaned up a little by adding an
> appropriate condition to the define_split, or by combining the define_insn
> and define_split patterns into a define_insn_and_split pattern.

In other words, you mean that the define_split only gets a chance to split insns generated by the corresponding define_insn "*mul_acc_si" pattern, even though the split condition is rather weak (only "reload_completed"), because that kind of insn can only be generated by the define_insn "*mul_acc_si" pattern. Did I get it right? If so, I'm afraid this is not actually my question. What I want to know is this: mips processors normally implement the following kinds of mult/mult-acc insns:

    mult : HILO <-- s * t
    mul  : HILO <-- s * t ; d <-- LO
    madd : HILO <-- HILO + s * t
    madd2: HILO <-- HILO + s * t ; d <-- HILO
-cut here-

In my understanding, the macro GENERATE_MADD_MSUB is true when the processor has the madd insn rather than madd2, and the macro ISA_HAS_MUL3 is false if it has no mul insn.

For this kind of processor, gcc will go through the following steps:

  step 1 : generate an insn using gen_mul3_internal, according to pattern "mul3";
  step 2 : the combiner tries to combine by matching against pattern "*mul_acc_si";
  step 3 : gcc may fail to get the LO register allocated for the combined "*mul_acc_si" insn;
  step 4 : after reload, the combined insn is split according to the split pattern listed in the previous mail;
  step 5 : the split insn is actually a "mul3_internal", but it has no LO allocated, which breaks the constraints of the "mul3_internal" pattern.

So, what should I do to handle this case? I see no way except adding a new split pattern like:

    (define_split
      [(set (match_operand:SI 0 "d_operand")
            (plus:SI (mult:SI (match_operand:SI 1 "d_operand")
                              (match_operand:SI 2 "d_operand"))
                     (match_operand:SI 3 "d_operand")))
       (clobber (match_operand:SI 4 "lo_operand"))
       (clobber (match_operand:SI 5 "d_operand"))]
      "SPECIAL_PROCESSOR && reload_completed"
      [(parallel [(set (match_dup 4)
                       (mult:SI (match_dup 1) (match_dup 2)))
                  (clobber (match_dup 4))])
       (set (match_dup 5) (match_dup 4))
       (set (match_dup 0) (plus:SI (match_dup 5) (match_dup 3)))]
      "")

Thanks again, looking forward to your further explanation.

-- Best Regards.
Re: Question on mips multiply patterns in md file
> The reasoning here is
> that if splitting will result in worse code, then we shouldn't have
> accepted it in the first place. If dropping this alternative results in
> register allocator failures for some strange reason, then we accept it
> and generate the 3 instruction sequence with a new define_split.

Thanks Jim. I could not fully follow your method since I don't know much about the IRA and reload passes. Here is my question: is it possible that this method would ever result in a register allocator failure? In my understanding, wouldn't the reload pass do whatever it can to satisfy the constraints of all insns?

> If dropping this alternative results in the register allocator generating
> worse code for other surrounding operations, then it is better to accept
> it and add the new define_split.

By this, do you mean I should go back to the define_split method if dropping the alternative does result in bad insns being generated by the RA?

>
> Some experimentation might be necessary to see which change is the
> better solution.

Yes. I profiled MiBench and found that gcc generates better code when using the madd instruction; on the other hand, I have not yet closely checked how bad the code generated by the define_split is.

Another thought on this topic: maybe I should make two copies of mul_acc_si as you said, one that keeps the constraints and one that drops the "*?*?". Is this the same as the method you suggested?

-- Best Regards.
Puzzle about CFG on rtl during delay slot schedule
Hi :

I'm wondering whether the cfg is maintained properly during delay slot scheduling, because when compiling libgcc/_divsc3.o, the rtl dump in libgcc2.c.198r.mach has the following lines:

    no bb for insn with uid = 293.
    deleting insn with uid = 690.
    deleting insn with uid = 904.
    ..
    (note 298 905 303 [bb 25] NOTE_INSN_BASIC_BLOCK)
    (note 303 298 304 [bb 26] NOTE_INSN_BASIC_BLOCK)
-cut here

After that pass, bb 25 still has il.rtl->head_ == insn_uid_690, which has already been deleted. It seems the bb's head_/tail_ are not handled properly.

I traced cc1 and found that insn_690 was deleted by function remove_insn. It seems that the end of that function takes BB_HEAD/BB_END into consideration, but BLOCK_FOR_INSN (insn_690) is null, which results in the problem.

BTW, the version I am working on is gcc-4.4.1, mips target.

So, any tips? Thanks very much.

-- Best Regards.
Fwd: Puzzle about CFG on rtl during delay slot schedule
> The CFG is not maintained during delay slot scheduling. This is, in
> fact, a very old and well-known problem. Look for any e-mail on this
> list that mentions reorg.c :-)
>

Thanks. Furthermore, it seems the cfg is not maintained after delay slot scheduling either; I also see the problem just before the final pass.

-- Best Regards.
Re: Puzzle about CFG on rtl during delay slot schedule
> Cheng, can you explain what lead you to this "discovery", and what
> you're trying to achieve?

Thanks for all your enthusiastic explanations. Well, we are trying to find our processor's critical timing path by running it at a higher frequency than it was designed for. One timing problem we found is in the following insn sequence:

    insn1 : insn_kind_a
    insn2 : memory access

So, in order to find more timing problems, we want to modify gcc to insert a nop insn between those two insns. Unfortunately, insn1 could be in a delay slot, so I have to do this after delay slot scheduling, which led to my first message.

BTW, the processor has no pipeline stall when branching, so I think the nop is really necessary for our purposes.

Thanks again.

-- Best Regards.
Problem on handling fall-through edge in bb-reorder
Hi All:

I have been reading the code of the bb-reorder pass. Normally it is fine to take the most probable basic block as the fall-through bb. Unfortunately, the processor I'm working on is a little different: it has no pipeline stall when branches are taken, but does introduce a stall when they are not taken. Take example code like:

--
    statement 0;
    if likely(condition)
      statement 1;
    else
      statement 2;
    return;

gcc may generate:

---
    statement 0;
    if !(condition) branch to label x;
    statement 1;
    return;
    label x:
    statement 2;
    return;

which is less efficient on my processor. I am wondering whether it is possible to modify the code in bb-reorder so that gcc takes the less probable basic block as the fall-through bb.

So, any tips? Thanks in advance.

-- Best Regards.
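For context, the likely(condition) in the example is pseudocode. In real C code the hint is usually written with __builtin_expect, which feeds the branch probabilities that bb-reorder later uses to choose the fall-through block. A minimal sketch follows; the macro names mirror a common convention, and the function classify is made up for illustration.

```c
/* Common convention for branch hints; the !! normalises any nonzero
   value to 1 so the comparison with the expected value is exact.  */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical function mirroring the example: the "then" arm is
   marked as the probable one, so GCC normally lays it out as the
   fall-through path and branches away for the "else" arm.  */
int classify(int value)
{
    int result;
    if (likely(value >= 0))
        result = 1;     /* statement 1: expected path  */
    else
        result = 2;     /* statement 2: unexpected path  */
    return result;
}
```

The hint only affects layout and other probability-driven decisions; the computed result is the same with or without it.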
why mult generated for unsigned int multiply on mips?
Hi : I noticed that on mips, the signed form of the multiply instruction is generated for unsigned integer multiplication. For example, mult is used, rather than multu, for the following code:

    unsigned int x, y, z;
    x = y * z;

Is it reasonable to do so? Thanks. -- Best Regards.
Re: why mult generated for unsigned int multiply on mips?
found the cause, sorry to disturb, please ignore this message. -- Best Regards.
Re: why mult generated for unsigned int multiply on mips?
> It would, however, be nice if you actually posted an answer to your
> (now solved) question. That way, any casual reader may learn something
> new.
>

Sorry for the unintentional offense, here is the reasoning. For a 32-bit two's complement binary number x31 x30 ... x0:

    unsigned value U =   2^31*x31 + 2^30*x30 + ... + 2^0*x0
    signed value   S = - 2^31*x31 + 2^30*x30 + ... + 2^0*x0

Say V = 2^30*x30 + ... + 2^0*x0 and s = x31, so U = 2^31*s + V and S = -2^31*s + V, which gives

    S = U - 2^32*s.

Now consider two numbers U1 and U2 whose corresponding signed values are S1 and S2:

    S1 * S2 = (U1 - 2^32*s1) * (U2 - 2^32*s2)
            = U1*U2 - 2^32*s2*U1 - 2^32*s1*U2 + 2^64*s1*s2

Every term except U1*U2 is a multiple of 2^32, so the lower 32 bits of S1*S2 equal the lower 32 bits of U1*U2. Maybe this is the reason gcc can safely use mult for unsigned multiplication on mips. I hope this is right; it's hard to edit equations in plain text -_-

-- Best Regards.
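The claim about the low 32 bits can also be checked numerically. The sketch below is only an illustration (the function names are made up): it computes the full 64-bit product once with signed and once with unsigned interpretation of the same 32-bit patterns and returns the low word of each.

```c
#include <stdint.h>

/* Low 32 bits of the product when both operands are interpreted as
   signed (what a MIPS mult leaves in LO).  The uint32_t -> int32_t
   conversion is implementation-defined for values above INT32_MAX;
   it wraps on all common two's complement compilers.  */
uint32_t low32_signed_mul(uint32_t a, uint32_t b)
{
    int64_t p = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint32_t)(uint64_t)p;
}

/* Low 32 bits of the product when both operands are interpreted as
   unsigned (what a MIPS multu leaves in LO).  */
uint32_t low32_unsigned_mul(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)a * (uint64_t)b;
    return (uint32_t)p;
}
```

The two functions agree for every pair of inputs; only the high word (HI on MIPS) differs between the signed and unsigned products, and that word is not used by a 32-bit x = y * z.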
Puzzle:where does gcc_cv_as come from?
Hi all:

Currently I'm building a cross gcc for mips32 on winXp+cygwin. I tried both gcc 4.2.4 and 4.2.3, and there is a build problem with 4.2.4. The gcc makefile normally issues the shell command

    echo 'exec $(ORIGINAL_AS_FOR_TARGET) "$$@"' >> as ; \

at around line 1370, but ORIGINAL_AS_FOR_TARGET, defined several lines above, is empty. So I get something like "exec -options..." where it should be "exec assembler -options...", and of course this fails. I checked configure and found the following code and comments:

    # ---
    # Assembler & linker features
    # ---

    # Identify the assembler which will work hand-in-glove with the newly
    # built GCC, so that we can examine its features. This is the assembler
    # which will be driven by the driver program.
    #
    # If build != host, and we aren't building gas in-tree, we identify a
    # build->target assembler and hope that it will have the same features
    # as the host->target assembler we'll be using.
    gcc_cv_gas_major_version=
    gcc_cv_gas_minor_version=
    gcc_cv_as_gas_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/gas

    if test "${gcc_cv_as+set}" = set; then
      :
    else
      #other commands...
    fi

    ORIGINAL_AS_FOR_TARGET=$gcc_cv_as
Help: does define_peephole still work in gcc-4.2.4
Hi all: Currently I am studying peephole optimization in gcc. I defined a peephole using define_peephole, but nothing happened. It seems gcc does do the pattern matching in the code guarded by HAVE_peephole, but the code from the "out-template" of that define_peephole is not compiled into gcc at all. I know define_peephole is deprecated, so I am not sure whether define_peephole still works in gcc-4.2.4, or whether I have just missed something important. Thanks, any tips will be appreciated! -- Best Regards.
Re: Help: does define_peephole still work in gcc-4.2.4
It turns out there is a mistake in "out-template" of "define_peephole". So, Sorry for disturbing! -- Best Regards.
pattern "s_" not used when generating rtl for float comparison on mips?
Hi :

There is a pattern define_insn "s_" in the mips md file:

    (define_insn "s_"
      [(set (match_operand:CC 0 "register_operand" "=z")
            (swapped_fcond:CC (match_operand:SCALARF 1 "register_operand" "f")
                              (match_operand:SCALARF 2 "register_operand" "f")))]
      ""
      "c..\t%Z0%2,%1"
      [(set_attr "type" "fcmp")
       (set_attr "mode" "FPSW")])

I am wondering whether this insn pattern is ever used when generating a float comparison, since we use the cmp and branch expanders to do the job, and comparison operations are normally followed by a branch. Am I right? Any ideas? Thanks for helping.

-- Best Regards.
Re: pattern "s_" not used when generating rtl for float comparison on mips?
> > You can get the RTL for these patterns when expanding stores like
> > a = (b < c);
> > In this case, GCC tries to avoid a conditional branch and (I suppose you are
> on GCC <4.5) instead of cmp and b you go through cmp and
> s. cmp does nothing but stashing away its operands, while
> s expands RTL for both the comparison and the above insn.

Thanks, and yes, I'm using GCC 4.4. But gcc didn't work this way for me. I tried a piece of code like:

    extern float a, b;
    extern int c;

    int main(void)
    {
      c = (a < b);
      return 0;
    }

After tracing cc1, I found that gcc handles this case too with the set/compare/jump/set code at the end of function do_store_flag, i.e., using the cmp and b sequence.

-- Best Regards.
Re: pattern "s_" not used when generating rtl for float comparison on mips?
> Indeed, looking at GCC 4.5 there's no cstore expander for floating-point
> variables. Maybe you can make a patch! :-)
>

Yes, it seems gcc always generates the set/compare/jump/set sequence and then optimizes it away in the if-convert pass. Maybe this was left behind from early mips1, which has no conditional move instructions. It is somewhat related to my current work; I'll see whether I can help with it after more study. Thanks.

-- Best Regards.
split lui_movf pattern on mips?
Hi:

There is a comment on lui_movf in mips.md like the following:

    ;; because we don't split it. FIXME: we should split instead.

I can split it into a move and a conditional move (movesi_on_cc) insn, like:

    (define_split
      [(set (match_operand:CC 0 "d_operand" "")
            (match_operand:CC 1 "fcc_reload_operand" ""))]
      "reload_completed && ISA_HAS_8CC && TARGET_HARD_FLOAT && ISA_HAS_CONDMOVE
       && !CANNOT_CHANGE_MODE_CLASS(CCmode, SImode,
                                    REGNO_REG_CLASS(REGNO(operands[0])))"
      [(set (match_dup 2) (match_dup 3))
       (set (match_dup 2)
            (if_then_else:SI (eq:SI (match_dup 1) (match_dup 4))
                             (match_dup 2)
                             (match_dup 4)))]
      "
      {
        operands[2] = gen_rtx_REG(SImode, REGNO(operands[0]));
        operands[3] = GEN_INT(0x3f80);
        operands[4] = const0_rtx;
      }
      ")

But I have two questions.

First, the lui_movf pattern is output as "lui\t%0,0x3f80\n\tmovf\t%0,%.,%1" in mips_output_move. Why 0x3f80? Is it some kind of magic number, or something else important?

Second, I have to change the mode of operands[0] to SImode when splitting, otherwise no insn pattern matches the first insn generated. Since no new REG is generated, I assume the mode change is OK here. Any suggestions?

Thanks.

-- Best Regards.
Re: split lui_movf pattern on mips?
> It's the encoding of 1.0f (single precision). The point is that we want > something we can safely compare with 0.0f using floating-point instructions. > "Safe" means "without generating any kind of exception", so a subnormal > representation like 0x0001 isn't acceptable. 1.0f seems as good a > value as any. > > Yes, this is OK. Your split looks good, but I don't see any reason > for the !CANNOT_CHANGE_MODE_CLASS condition. > > Couple of minor suggestions: > > - There is no need for the double quotes around the { ... }. > Plain { ... } is better. (Support for plain { ... } was > added a few years ago, so you can still see some older code > that uses "{ ... }". But { ... } is better for new code.) > > - It's generally better to restrict match_dups to things > that depend on the operands of the original insn. > In the above, it'd be better to replace (match_dup 4) > with (const_int 0) and then not set operands[4] in the > C code. (match_dup 3) is OK as an exception because > read-rtl.c doesn't support hex constants yet... Thanks, learned a lot from your detailed explanation. -- Best Regards.
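The link between the 16-bit immediate and 1.0f is that lui places its immediate in the upper half of a 32-bit register, so 0x3f80 becomes the word 0x3f800000. A small sketch that reproduces this on the host, assuming a 32-bit IEEE 754 float (the function name float_from_lui is made up for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Model "lui reg, imm16": the immediate lands in bits 31..16 and the
   low 16 bits are zero.  The resulting word is then reinterpreted as
   a single-precision float via memcpy.  Assumes sizeof(float) == 4.  */
float float_from_lui(uint16_t imm16)
{
    uint32_t bits = (uint32_t)imm16 << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

float_from_lui(0x3f80) yields 1.0f (sign 0, biased exponent 127, zero mantissa), a normal number that can be compared with 0.0f without raising an exception.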
a peculiar fpload problem on an inferior processor
Hi :

Our processor has an erratum: the direct fpu load does not work correctly, so I have to substitute the instruction sequence "load_into_gpr ; move_gpr_into_fpr" for the direct fpload insn. So far I have thought of two potential methods:

method 1:
  step 1 : reserve a scratch register when expanding fpload;
  step 2 : split the fpload insn into the "load_into_gpr ; move_gpr_into_fpr" sequence, using the reserved scratch register.

method 2:
  generate "load_into_gpr ; move_gpr_into_fpr" directly when expanding.

I have only tried the first method, which ended up with the error "insn does not satisfy its constraints". After tracing cc1, I found that the problematic insn was generated by reload, which was trying to spill a float register into memory and itself used a direct fpload.

Here is the question: is it possible to replace all direct fploads with the "load_into_gpr ; move_gpr_into_fpr" sequence? I doubt it, since the reload pass might generate direct fpload insns when spilling fpu registers.

BTW, I prefer to do the replacement in gcc rather than in the assembler, since doing it in the assembler might produce lots of pipeline stalls.

So, any advice? Thank you all.

-- Best Regards.
Re: a peculiar fpload problem on an inferior processor
> It is possible. Your expander can handle it before reload; to handle it
> during and after reload, you need to implement a TARGET_SECONDARY_RELOAD hook.
>
> http://gcc.gnu.org/onlinedocs/gccint/Register-Classes.html#index-TARGET_005fSECONDARY_005fRELOAD-3974
>

Thanks Dave, it works, but I found that reload is not the only pass which might generate fpload/fpstore instructions. I am working with GCC 4.4 (mips), where there is a function, mips_emit_move, which is called in many passes after register allocation and might generate fpload/fpstore. For example, in pass pro_and_epilogue, it generates loads/stores for the fpu registers saved by the function prologue/epilogue. It seems I have to track down all calls of this function and make sure it works in my way. Thanks.

-- Best Regards.
Re: a peculiar fpload problem on an inferior processor
> Ah, I forgot pro/epilogue generation, but I think that's the only other > thing that happens after reload. That is a special case: it has to generate > strict rtl that directly matches the insns it wants. You'll probably have to > arrange for it to save at least one GPR early enough in the prologue sequence > to be able to use it as a temp for your FP moves, and similar in the epilogue > sequence. Yes, Thanks for your help , Dave -- Best Regards.
Re: a peculiar fpload problem on an inferior processor
On Sat, May 8, 2010 at 2:52 PM, Amker.Cheng wrote: >> Ah, I forgot pro/epilogue generation, but I think that's the only other >> thing that happens after reload. That is a special case: it has to generate >> strict rtl that directly matches the insns it wants. You'll probably have to >> arrange for it to save at least one GPR early enough in the prologue sequence >> to be able to use it as a temp for your FP moves, and similar in the epilogue >> sequence. > Sorry to disturb you again. Concerning this problem, there is another case that has to be handled. The reload pass also takes care of call-saved registers by generating save/restore insns (in save_call_clobbered_regs, etc.), and these might be direct fpload/fpstore instructions. I see no way to reserve a GPR for this case, except by using one of the ABI's temporary registers. That seems safe here, since the temp registers are only used around the call insn, but I am not very sure about this. Any suggestions? Thanks. -- Best Regards.
Is it safe to use $t0 when handling call clobbered registers (on MIPS)
Hi: I'm working on an FPU whose fpload insns do not work correctly, so I have to use a GPR as a temp register: first load memory into the GPR, then move the GPR into the FPU register. I have handled most cases, except the case where gcc saves and restores call-clobbered FPU registers. Since that happens in the reload pass, I have no GPR available to use there. I'm wondering whether I could use temporary registers such as $t0...$t9 in this case. It looks safe as far as I can see, since the save/restore operations are placed around the call insn, and there are already MIPS_PROLOGUE_TEMP and MIPS_EPILOGUE_TEMP, which are used in the prologue/epilogue cases. But I am not very sure about it. Any suggestions? Thank you all. -- Best Regards.
mips secondary reload question
Hi: Following up on http://gcc.gnu.org/ml/gcc/2010-05/msg00091.html: if FPU registers cannot be copied to/from memory directly, I have to use intermediate GPR registers. In my hook I return GP_REGS when copying x to a register in class FP_REGS in any mode (including CCmode), and this results in infinite recursion in memory_move_secondary_cost. After tracing cc1, I found that the calling sequence looks like: memory_move_secondary_cost (CCmode, ST_REGS, 1) --> memory_move_secondary_cost (CCmode, FP_REGS, 1) --> memory_move_secondary_cost (CCmode, ST_REGS, 1) --> memory_move_secondary_cost (CCmode, FP_REGS, 1) --> ... and so on without end. It seems that function default_secondary_reload always uses ST_REGS as the intermediate register class for FP_REGS in CCmode, according to the reload_incc pattern. This is all I have found, and I have no idea how the reload pass works here. Any explanation? Thanks. -- Best Regards.
GCC4.3.4 downside against GCC3.4.4 on mips?
Hi all, I compared the assembly files of a function compiled by GCC4.3.4 and GCC3.4.4. The function does array computation and has no branches or loops. The command line is like "-march=mips32r2 -O3", and here are the instruction statistics:
insn  : GCC4.3.4 : GCC3.4.4
total : 1879 : 1534
addiu : 6 : 6
addu  : 216 : 129
jr    : 1 : 1
lui   : 5 : 5
lw    : 396 : 353
madd  : 41 : 0
mfhi  : 80 : 80
mflo  : 121 : 86
move  : 0 : 21
mtlo  : 39 : 0
mul   : 85 : 0
mult  : 18 : 80
multu : 64 : 0
or    : 80 : 80
sll   : 80 : 80
sra   : 79 : 47
srl   : 80 : 80
subu  : 80 : 80
sw    : 408 : 406
Considering there is no branch or loop structure, the result of GCC3.4.4 seems much better, since it generates far fewer instructions. On the other hand, GCC4.3.4 does use less stack (1224 bytes against 1408). So, any comments? Thanks in advance. -- Best Regards.
Re: GCC4.3.4 downside against GCC3.4.4 on mips?
> Posting some random numbers without a test-case and precise command line > parameters for both compilers makes the numbers useless, IMHO. You also > only mention instruction counts. Have you actually benchmarked the > resulting code? CPUs are complicated and what you might perceive as worse > code might actually be superior thanks to scheduling and internal CPU > parallelism etc.
Thanks for the reminder. After some investigation, I can demonstrate the issue with the following piece of code:
-begin here---
extern int *p[5];
#define REAL_RADIX_2 24
#define REAL_MUL_2(x, y) (((long long)(x) * (long long)(y)) >> REAL_RADIX_2)
void func(int *b1, int *b2)
{
  int c0 = p[3][0];
  int c1 = p[3][1];
  b2[0x18] = b1[0x18] + b1[0x1B];
  b2[0x1B] = REAL_MUL_2((b1[0x18] - b1[0x1B]) , c0);
  b2[0x19] = b1[0x19] + b1[0x1A];
  b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);
  b2[0x1C] = b1[0x1C] + b1[0x1F];
  b2[0x1F] = REAL_MUL_2((b1[0x1F] - b1[0x1C]) , c0);
  b2[0x1D] = b1[0x1D] + b1[0x1E];
  b2[0x1E] = REAL_MUL_2((b1[0x1E] - b1[0x1D]) , c1);
}
-cut here---
It seems GCC4.3.4 always expands the long long multiplication into three 32-bit multiplications, like
-begin here---
# b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);
	lw	$6,104($4)
	lw	$2,100($4)
	subu	$2,$2,$6
	mult	$11,$2
	sra	$6,$2,31
	madd	$6,$9
	mflo	$6
	multu	$2,$9
	mfhi	$3
	addu	$3,$6,$3
	sll	$6,$3,8
	mflo	$2
	srl	$7,$2,24
	or	$7,$6,$7
	sw	$7,104($5)
-cut here---
while GCC3.4.4 treats the long long multiplication just like a simple one and generates only one mult insn for each statement, like
-begin here---
# b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);
	lw	$2,100($4)
	lw	$7,104($4)
	subu	$3,$2,$7
	mult	$3,$9
	mflo	$6
	mfhi	$25
	srl	$15,$6,24
	sll	$24,$25,8
	or	$14,$15,$24
	sw	$14,104($5)
-cut here---
In my understanding, it is not necessary to use three mult insns to implement this long long multiplication, since both operands are converted from int. As before, the compile options are like "-march=mips32r2 -O3". Thanks. -- Best Regards.
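To make the claim about the redundant multiplies concrete, here is a small C sketch comparing the two strategies. The function names are invented for illustration; the mapping of each step to mult/madd/multu is my reading of the assembly above, not something taken from the GCC sources. It assumes 32-bit int, 64-bit long long, and the usual GCC behaviour for signed right shift and narrowing conversions.

```c
#include <assert.h>
#include <stdint.h>

#define REAL_RADIX_2 24

/* One signed 32x32->64 multiply: what a single mult + mfhi/mflo gives. */
int32_t mul_wide(int32_t x, int32_t y)
{
    int64_t prod = (int64_t)x * (int64_t)y;
    return (int32_t)(prod >> REAL_RADIX_2);
}

/* Generic 64x64->64 multiply assembled from 32-bit halves.  This needs
   three multiplies: one full unsigned low*low product and two cross
   products of which only the low 32 bits matter. */
int32_t mul_three_part(int32_t x, int32_t y)
{
    uint64_t a = (uint64_t)(int64_t)x;      /* sign-extended operands */
    uint64_t b = (uint64_t)(int64_t)y;
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t low = (uint64_t)a_lo * b_lo;         /* like multu       */
    uint32_t cross = a_hi * b_lo + a_lo * b_hi;   /* like mult + madd */
    uint64_t prod = low + ((uint64_t)cross << 32);

    return (int32_t)((int64_t)prod >> REAL_RADIX_2);
}

/* For operands that started life as int, both routes agree. */
void check_mul_equivalence(void)
{
    static const int32_t samples[] = { 0, 1, -1, 12345, -98765,
                                       1 << 24, -(1 << 24), INT32_MAX };
    unsigned i, j;
    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (j = 0; j < sizeof samples / sizeof samples[0]; j++)
            assert(mul_wide(samples[i], samples[j])
                   == mul_three_part(samples[i], samples[j]));
}
```

Because the high halves of sign-extended operands carry no independent information, the cross products can be folded into a single signed widening multiply, which is what the GCC3.4.4 output does.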
Puzzle about macro MIPS_PROLOGUE_TEMP_REGNUM
Hi: I found that the temp register used for saving registers when expanding the prologue is defined by the macro MIPS_PROLOGUE_TEMP_REGNUM on the mips target, like:
#define MIPS_PROLOGUE_TEMP_REGNUM \
  (cfun->machine->interrupt_handler_p ? K0_REG_NUM : GP_REG_FIRST + 3)
I don't understand why it uses registers starting from $3. In my application, I have to save DFmode FPU regs through GPRs, that is $3,$4 in this case, just like:
	mfc1	$3, $fpr
	sw	$3, addr
	mfc1	$4, $fpr+1
	sw	$4, addr+4
Apparently this would clobber the argument in $4. Here is the question: why not use $8 for MIPS_PROLOGUE_TEMP_REGNUM, like EPILOGUE_TEMP? Or have I done something wrong? Any clarification? Thanks in advance. -- Best Regards.
Re: Puzzle about macro MIPS_PROLOGUE_TEMP_REGNUM
> > It's not "starting from $3". It's $3 and nothing else ;-) It's not > intended to be used as (MIPS_PROLOGUE_TEMP_REGNUM + N). > > $3 was chosen because it's a MIPS16 register, and can therefore > be used for both MIPS16 and normal-mode code. $2 used to be the > static chain register, which left $3 as the only free call-clobbered > MIPS16 register. Thank you all for the explanation. > I changed the static chain register to $15 to avoid > a clash with the MIPS16 gp-load sequence: > > http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00622.html > > so $2 is probably free now too. It seems $2 is used for the gp load in MIPS16, as defined by MIPS16_PIC_TEMP_REGNUM, which should not conflict with MIPS_PROLOGUE_TEMP_REGNUM either. The mips target uses mips_split_doubleword_move in mips_save_reg to save double-float registers. So it seems I have to provide a special pattern that uses exactly the one MIPS_PROLOGUE_TEMP_REGNUM register, rather than a register pair starting from it. But more patterns might cost more memory and compile time. Since my application is rather unusual (o32 ABI and no MIPS16), maybe I could use a pair of temporary registers for this purpose, chosen from $8-$15 or $24-$25. Thanks. -- Best Regards.
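As a rough illustration of saving a double through exactly one temporary instead of a register pair, here is a C model. Everything in it is hypothetical: the function names, and the idea that `temp` stands in for $3. It assumes an 8-byte double, and the word order in memory simply follows the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Save a 64-bit FP value one 32-bit word at a time, reusing a single
   temporary.  The second word can reuse the temp because the first
   word has already been stored by then. */
void save_double_one_temp(double fpr, uint32_t slot[2])
{
    uint32_t halves[2];
    uint32_t temp;                      /* models the single GPR temp */

    memcpy(halves, &fpr, sizeof halves);
    temp = halves[0];                   /* like mfc1 temp, $fpr   */
    slot[0] = temp;                     /* like sw   temp, addr   */
    temp = halves[1];                   /* like mfc1 temp, $fpr+1 */
    slot[1] = temp;                     /* like sw   temp, addr+4 */
}

/* Inverse operation, again with one temporary. */
double restore_double_one_temp(const uint32_t slot[2])
{
    uint32_t halves[2];
    uint32_t temp;
    double fpr;

    temp = slot[0];
    halves[0] = temp;
    temp = slot[1];
    halves[1] = temp;
    memcpy(&fpr, halves, sizeof fpr);
    return fpr;
}

/* A save followed by a restore must give back the same value. */
void check_round_trip(void)
{
    uint32_t slot[2];
    save_double_one_temp(-1.1, slot);
    assert(restore_double_one_temp(slot) == -1.1);
}
```

The cost is that the two moves are serialized through the same register, which is the trade-off against grabbing a free pair such as two of $8-$15.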
a typo in ira-emit.c?
Hi: I am studying IRA right now, and there is the following code in change_loop:
if (parent_allocno == NULL
    || REGNO (ALLOCNO_REG (parent_allocno)) == REGNO (original_reg))
  {
    if (internal_flag_ira_verbose > 3 && ira_dump_file)
      fprintf (ira_dump_file, " %i vs parent %i:",
               ALLOCNO_HARD_REGNO (allocno),
               ALLOCNO_HARD_REGNO (parent_allocno));
    set_allocno_reg (allocno, create_new_reg (original_reg));
  }
Is it possible that parent_allocno == NULL here? If so, the fprintf would dereference a null pointer. Thanks. -- Best Regards.
Re: a typo in ira-emit.c?
> > Yes, I think it can be NULL in some complicated cases when a loop exit edge > comes not in the parent loop. By that, do you mean the case where a regno is live on edges that go between adjacent loops, but is not live in the parent loop? If so, the fprintf would access a null pointer in this case. Thanks for the explanation. -- Best Regards.
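One way the dump statement could be guarded is sketched below. This is a standalone model, not the actual GCC change: `struct allocno_model` and `format_vs_parent` are invented stand-ins for the IRA types and the fprintf call, and the "(none)" text is just one possible choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an IRA allocno; only the field we print. */
struct allocno_model {
    int hard_regno;
};

/* Format the verbose-dump line without dereferencing a NULL parent. */
int format_vs_parent(char *buf, size_t size,
                     const struct allocno_model *allocno,
                     const struct allocno_model *parent)
{
    if (parent == NULL)
        return snprintf(buf, size, " %i vs parent (none):",
                        allocno->hard_regno);
    return snprintf(buf, size, " %i vs parent %i:",
                    allocno->hard_regno, parent->hard_regno);
}

/* Exercise both the normal and the NULL-parent path. */
void check_format_vs_parent(void)
{
    struct allocno_model a = { 5 }, p = { 3 };
    char buf[64];

    format_vs_parent(buf, sizeof buf, &a, &p);
    assert(strcmp(buf, " 5 vs parent 3:") == 0);

    format_vs_parent(buf, sizeof buf, &a, NULL);
    assert(strcmp(buf, " 5 vs parent (none):") == 0);
}
```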
subreg against register allocation?
Hi: I am studying IRA right now (GCC4.4.1, mips32 target). For the following piece of code:
long long func(int a, int b)
{
  long long r = (long long)a * (long long)b;
  return r;
}
the asm generated on mips is like:
	mult	$5,$4
	mfhi	$5
	mflo	$2
	j	$31
	move	$3,$5	<--unnecessary move insn
Please note the unnecessary move insn. The RTL before the subreg1 and IRA passes looks like:
before subreg1
(insn 7 4 8 2 mult-problem.c:2 (set (reg:DI 196)
        (mult:DI (sign_extend:DI (reg/v:SI 195 [ b ]))
            (sign_extend:DI (reg/v:SI 194 [ a ])))) 50 {mulsidi3_32bit} (nil))
(insn 8 7 12 2 mult-problem.c:2 (set (reg:DI 193 [ ])
        (reg:DI 196)) 282 {*movdi_32bit} (nil))
(insn 12 8 18 2 mult-problem.c:6 (set (reg/i:DI 2 $2)
        (reg:DI 193 [ ])) 282 {*movdi_32bit} (nil))
before IRA
(insn 7 4 25 2 mult-problem.c:2 (set (reg:DI 196)
        (mult:DI (sign_extend:DI (reg:SI 5 $5 [ b ]))
            (sign_extend:DI (reg:SI 4 $4 [ a ])))) 50 {mulsidi3_32bit} (expr_list:REG_DEAD (reg:SI 5 $5 [ b ])
        (expr_list:REG_DEAD (reg:SI 4 $4 [ a ])
            (nil))))
(insn 25 7 26 2 mult-problem.c:6 (set (reg:SI 2 $2)
        (subreg:SI (reg:DI 196) 0)) 287 {*movsi_internal} (nil))
(insn 26 25 18 2 mult-problem.c:6 (set (reg:SI 3 $3 [+4 ])
        (subreg:SI (reg:DI 196) 4)) 287 {*movsi_internal} (expr_list:REG_DEAD (reg:DI 196)
        (nil)))
---end
It seems the DImode split prevents IRA from allocating $2/$3 directly, because it introduces conflicts between pseudo 196 and $2/$3 in insns 25 and 26. I am wondering whether it is possible to handle multi-word modes more accurately, in either the subreg pass or IRA. Thanks in advance. -- Best Regards.
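For reference, the two SImode pieces that insns 25 and 26 copy out of the DImode pseudo are just the two halves of the widening product. A small C sketch of that split follows; the struct and function names are made up, and the mflo/mfhi comments are my reading of the asm above rather than anything from the GCC sources.

```c
#include <assert.h>
#include <stdint.h>

/* The two 32-bit words of a 64-bit product. */
struct hi_lo {
    uint32_t hi;
    uint32_t lo;
};

/* Signed 32x32->64 multiply, returned as separate words. */
struct hi_lo mult_hi_lo(int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;
    struct hi_lo r;

    r.lo = (uint32_t)prod;                     /* what mflo delivers */
    r.hi = (uint32_t)((uint64_t)prod >> 32);   /* what mfhi delivers */
    return r;
}

/* Recombine the words into the long long the function returns. */
long long join_hi_lo(struct hi_lo r)
{
    return (long long)(((uint64_t)r.hi << 32) | r.lo);
}

/* Splitting and rejoining must reproduce the full product. */
void check_hi_lo(void)
{
    assert(join_hi_lo(mult_hi_lo(-7, 9)) == -63);
    assert(join_hi_lo(mult_hi_lo(65536, 65536)) == 4294967296LL);
}
```

Since each word is produced independently, nothing in the data flow forces the high word to pass through a third register; the extra move comes only from how the DImode pseudo was allocated.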
Re: subreg against register allocation?
Thanks for the explanation. Here are three more questions. 1. If I understand correctly, there are two insns, "*mulsi3_1" and "*smulsi3_highpart_insn", which set the two parts of the DImode pseudo holding a DImode mult result. Since both parts of the result are used in the original example, I am not sure how to write a split pattern for this case without generating two duplicate mult insns in parallel. 2. If I could set the two parts of the result in a parallel insn, I would also have to handle the mips-specific constraints for the HI/LO registers. Unfortunately, there is no "h" constraint any more, according to patch http://gcc.gnu.org/ml/gcc-patches/2008-05/msg01750.html, so it is no longer possible to write the hi reg without clobbering the lo reg. How should I handle this? 3. Since I am studying IRA right now, I am very curious whether this could be solved in IRA, e.g. by shrinking the live ranges of multi-word pseudo regs. PS: maybe I am talking gibberish; sorry if this is not clear enough. Thanks. -- Best Regards.
question on function change_loop in IRA
Hi: At the end of function change_loop, gcc tries to change the ALLOCNO_REG of local allocnos. In that loop, ALLOCNO_SOMEWHERE_RENAMED_P (allocno) is set if the allocno is not a cap. I don't understand why the flag is set here. Doesn't this set the flag for every local allocno? That seems to conflict with function set_allocno_somewhere_renamed_p and with the comments about that flag in ira-int.h. Any tips? Thanks in advance. -- Best Regards.
Re: GCC4.3.4 downside against GCC3.4.4 on mips?
>>> >>> while GCC3.4.4 treats the long long multiplication just like simple >>> ones, which generates only one >>> mult insn for each statement, like >>> >>> In my understanding, It's not necessary using three mult insn to implement >>> long long mult, since the operands are converted from int type. >> >> This is more helpful. It is a known case in which GCC 4.x generates worse >> code. > > Should be fixed with 4.6. Hi, I tested this problem on a GCC 4.6 snapshot, and it is fixed there. But I could not find the specific patch or the corresponding entry in the bug list; could you help? Thanks very much. -- Best Regards.
question about float insns like ceil/floor on mips machine
Hi: I found that although there are standard pattern names such as "ceilm2/floorm2", there are no insn patterns in mips.md for such float insns on the mips target. Furthermore, there is no ceil/floor rtl code in rtl.def either. Based on these facts, I am assuming those float insns are not supported by gcc, but I don't know why; it does not seem difficult to add them. Did I miss anything important? Please help, thanks. -- Best Regards.
why are multiply-accumulate insns not used when -mfp32 on mips
Hi: I found that multiply-accumulate insns like madd.s/d are only used when -mfp64 is specified. In the code, the macros are defined as:
#define ISA_HAS_FP4 ((ISA_MIPS4 \
                      || (ISA_MIPS32R2 && TARGET_FLOAT64) \   <--only float 64
                      || ISA_MIPS64 \
                      || ISA_MIPS64R2) \
                     && !TARGET_MIPS16)
#define ISA_HAS_FP_MADD4_MSUB4 ISA_HAS_FP4
Why is madd not used with fp32? Is there anything special about fp32? Any clarification? Thanks very much. -- Best Regards.
A minor mistake in cse_main?
Hi: In function cse_main, gcc processes extended basic blocks path by path. First, gcc finds the first bb of a path by scanning the reverse post order queue for a bb that has not been visited yet. Then gcc finds all paths starting with that first bb. The corresponding code is like:
do
  {
    bb = BASIC_BLOCK (rc_order[i++]);
  }
while (TEST_BIT (cse_visited_basic_blocks, bb->index)
       && i < n_blocks);   <---i might be equal to n_blocks at last
while (cse_find_path (bb, &ebb_data, flag_cse_follow_jumps))
  //...other codes
But this code might result in unwanted work. Looking into one .cse2 dump file I've encountered, the path information is like:
;; Following path with 37 sets: 2
;; Following path with 23 sets: 3
;; Following path with 11 sets: 4 5
;; Following path with 9 sets: 6 7 9
deferring rescan insn with uid = 163.
;; Following path with 8 sets: 6 7 8   <---basic block 8 first handled here
;; Following path with 19 sets: 10 11
;; Following path with 2 sets: 8   <---handled again
Apparently, basic block 8 in the last path has already been processed (in path 6 7 8). The problem is that the do-while can terminate because i has reached n_blocks while the last bb examined has already been visited, and gcc does not bail out in that case. For more information, the reverse post order (rc_order) for that case is:
rc_order [0] = 2
rc_order [1] = 3
rc_order [2] = 4
rc_order [3] = 5
rc_order [4] = 6
rc_order [5] = 7
rc_order [6] = 9
rc_order [7] = 10
rc_order [8] = 11
rc_order [9] = 8   < the last basic block is 8
It seems gcc should break after the do-while statement if `i' and `n_blocks' are equal and the bb is already visited. Any comments? Thanks. -- Best Regards.
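The behaviour can be reproduced outside the compiler with a small C model of the scan. The sketch below is only an illustration: plain int arrays stand in for GCC's bitmap and basic-block structures, the function names are invented, and the guard shown is one possible check, not necessarily what GCC should adopt.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the original scan: may hand back an already-visited block
   when the end of rc_order is reached. */
int next_block_original(const int *rc_order, int n_blocks,
                        const bool *visited, int *i)
{
    int bb;
    do {
        bb = rc_order[(*i)++];
    } while (visited[bb] && *i < n_blocks);
    return bb;
}

/* Same scan, but reports -1 when no unvisited block was found. */
int next_block_guarded(const int *rc_order, int n_blocks,
                       const bool *visited, int *i)
{
    int bb = next_block_original(rc_order, n_blocks, visited, i);
    return visited[bb] ? -1 : bb;
}

/* Replays the situation from the dump: every block visited, scan
   resumed at rc_order[8]. */
void check_scan(void)
{
    static const int rc_order[10] = { 2, 3, 4, 5, 6, 7, 9, 10, 11, 8 };
    bool visited[12] = { false };
    int bb, i;

    for (bb = 2; bb <= 11; bb++)
        visited[bb] = true;

    i = 8;
    assert(next_block_original(rc_order, 10, visited, &i) == 8);
    assert(i == 10);

    i = 8;
    assert(next_block_guarded(rc_order, 10, visited, &i) == -1);
}
```

Testing the visited bit, rather than only comparing `i` with `n_blocks`, also keeps the case where the last block in rc_order is genuinely unvisited and still needs processing.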
question on points-to analysis
Hi, I am studying gcc's points-to analysis right now and have run into a question. The paper "Off-line Variable Substitution for Scaling Points-to Analysis", section 3.2, says that we should not substitute a variable with another one if its address is taken. But in GCC's implementation, function unite_pointer_equivalences unites variables that are pointer equivalent but not location equivalent. I am puzzled why gcc does this, and how gcc keeps the points-to information accurate after doing it. Furthermore, I did not find anything about this in the paper "Exploiting Pointer and Location Equivalence to Optimize Pointer Analysis", which according to the comments in gcc is the basis of GCC's implementation. Any tips? Thanks in advance. -- Best Regards.
Re: question on points-to analysis
> In theory, this is true, but a lot of the optimizations decrease > accuracy at a cost of making the problem solvable in a reasonable > amount of time. > By performing it after building initial points-to sets, the amount of > accuracy loss is incredibly small. > The only type of constraint that will generate inaccuracy at that > point is a complex address taken with offset one, which is pretty > rare. > On the other hand, *not* doing it will make the problem take forever to solve > :) > > What's better, something that gives correct but slightly conservative > answers in 10s, or something that gives correct and 1% less > conservative answers in 200s? > Got it. Thanks to Richard for the quick reply and to Daniel for the detailed explanation. I need to dig deeper to understand the code. -- Best Regards.
question on ssa representation of aggregates
Hi: The paper "Memory SSA-A Unified Approach for Sparsely Representing Memory Operations", section 2.2, says: "Whenever possible, compiler will create symbolic names to represent distinct regions inside aggregates(called structure field tags or SFT). For instance, in Figure 2(b), GCC will create three SFT symbols for this structure, namely SFT.0 for A.x, SFT.1 for A.b and SFT.2 for A.a" I tried GCC4.4.1 (mips target) with the following piece of code:
---start
struct tag_1 { int *i; int *j; int *x; int y; }a;
struct tag_2 { struct tag_1 t1[100]; int x[200]; int *y; }s;
int func(int **p)
{
  int *c = *p;
  if (a.y > 0)
    s.y = *p1;
  else
    *c = *s.y;
  return 0;
}
---end
The "055t.alias" dump looks like:
---start
func (int * * p)
{
  int * c;
  int * gp.2;
  int g.1;
  int D.1352;
  int * D.1351;
  int * D.1349;
  int * * p1.0;
  int D.1345;
  :
  # VUSE
  c_2 = *p_1(D);
  # VUSE
  D.1345_3 = a.y;
  if (D.1345_3 > 0) goto ; else goto ;
  :
  # VUSE
  p1.0_4 = p1;
  # VUSE
  D.1349_5 = *p1.0_4;
  # s_18 = VDEF
  s.y = D.1349_5;
  goto ;
  :
  # VUSE
  D.1351_6 = s.y;
  # VUSE
  D.1352_7 = *D.1351_6;
  # g_21 = VDEF
  # a_22 = VDEF
  # s_23 = VDEF
  # SMT.14_24 = VDEF
  *c_2 = D.1352_7;
---end.
It seems structures a and s are treated as whole variables, like arrays, and no SFT is created. Did I miss anything, or is the implementation different? Thanks. -- Best Regards.
Re: question on ssa representation of aggregates
> The implementation of this stuff changes fairly regularly. The people > who like this kind of thing are still honing in on the best way to > handle aliasing information. Richard Guenther is the main guy working > in this area today. thanks very much for clarification. -- Best Regards.