Calculating instruction costs
I'm working on a gcc backend for an architecture. The architecture has
instructions for indexed array access; so, ld r0, (r1, r2) is equivalent
to r0 = r1[r2], where r1 is an int32_t*.

I'm representing this in the .md file with the following pattern:

(define_insn "*si_load_indexed"
  [
    (set
      (match_operand:SI 0 "register_operand" "=r")
      (mem:SI
        (plus:SI
          (mult:SI
            (match_operand:SI 1 "register_operand" "%r")
            (const_int 4))
          (match_operand:SI 2 "register_operand" "r"))))
  ]
  ""
  "ld %0, (%2, %1)"
  [(set_attr "length" "4")]
)

However, the instruction is never actually being emitted. Looking at the
debug output from the instruction combining stage, I see this:

Trying 8, 9 -> 10:
Successfully matched this instruction:
(set (reg:SI 47 [ *_5 ])
    (mem:SI (plus:SI (mult:SI (reg/v:SI 43 [ b ])
                (const_int 4 [0x4]))
            (reg:SI 0 r0 [ a ])) [2 *_5+0 S4 A32]))
rejecting combination of insns 8, 9 and 10
original costs 8 + 4 + 4 = 16
replacement cost 32

Instructions 8, 9 and 10 are:

(insn 8 5 9 2 (set (reg:SI 45)
        (ashift:SI (reg/v:SI 43 [ b ])
            (const_int 2 [0x2]))) test.c:5 15 {ashlsi3}
     (expr_list:REG_DEAD (reg/v:SI 43 [ b ])
        (nil)))
(insn 9 8 10 2 (set (reg/f:SI 46)
        (plus:SI (reg/v/f:SI 42 [ a ])
            (reg:SI 45))) test.c:5 13 {addsi3}
     (expr_list:REG_DEAD (reg:SI 45)
        (expr_list:REG_DEAD (reg/v/f:SI 42 [ a ])
            (nil))))
(insn 10 9 15 2 (set (reg:SI 47 [ *_5 ])
        (mem:SI (reg/f:SI 46) [2 *_5+0 S4 A32])) test.c:5 6 {*si_load}
     (expr_list:REG_DEAD (reg/f:SI 46)
        (nil)))

If I've read this correctly, it indicates that the instruction pattern
has been matched, but the combined instruction has been rejected for
being more expensive than the original three instructions.

So, how is it calculating the cost of my instruction? Where's it getting
that 32 from (which seems weirdly high)?

Right now all the cost macros are left as the default, which is probably
the root of the problem; but I'm having a lot of trouble getting my head
around them. In the interest of actually getting something to work, are
there any ways of using a simplified cost model where the cost of each
instruction is specified manually in the instruction pattern alongside
the length? (Or even just *using* the length as the cost...)

--
┌─── dg@cowlark.com ─ http://www.cowlark.com ─
│ "USER'S MANUAL VERSION 1.0: The information presented in this
│ publication has been carefully for reliability." --- anonymous
│ computer hardware manual
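For reference, the dump above points at test.c:5 with pseudos for a and
b; a source function along these lines (a reconstruction from the dump,
so the exact signature is an assumption) produces insns 8-10:

    #include <stdint.h>

    int32_t f(int32_t* a, int32_t b)
    {
        return a[b];  /* shift b left by 2, add to a, load: insns 8, 9, 10 */
    }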
Re: Calculating instruction costs
On 9 July 2013 11:02, David Given wrote:
> Right now all the cost macros are left as the default, which is probably
> the root of the problem; but I'm having a lot of trouble getting my head
> around them. In the interest of actually getting something to work, are
> there any ways of using a simplified cost model where the cost of each
> instruction is specified manually in the instruction pattern alongside
> the length? (Or even just *using* the length as the cost...)

Hi David! Does -Os achieve this?

Jay.

P.S. Sorry if you got two copies of this.
Re: Calculating instruction costs
2013/7/9, David Given:
> I'm working on a gcc backend for an architecture. The architecture has
> instructions for indexed array access; so, ld r0, (r1, r2) is equivalent
> to r0 = r1[r2], where r1 is an int32_t*.
>
> I'm representing this in the .md file with the following pattern:
>
> (define_insn "*si_load_indexed"
>   [
>     (set
>       (match_operand:SI 0 "register_operand" "=r")
>       (mem:SI
>         (plus:SI
>           (mult:SI
>             (match_operand:SI 1 "register_operand" "%r")
>             (const_int 4))
>           (match_operand:SI 2 "register_operand" "r"))))
>   ]
>   ""
>   "ld %0, (%2, %1)"
>   [(set_attr "length" "4")]
> )
>
> However, the instruction is never actually being emitted. Looking at the
> debug output from the instruction combining stage, I see this:
>
> Trying 8, 9 -> 10:
> Successfully matched this instruction:
> (set (reg:SI 47 [ *_5 ])
>     (mem:SI (plus:SI (mult:SI (reg/v:SI 43 [ b ])
>                 (const_int 4 [0x4]))
>             (reg:SI 0 r0 [ a ])) [2 *_5+0 S4 A32]))
> rejecting combination of insns 8, 9 and 10
> original costs 8 + 4 + 4 = 16
> replacement cost 32
>
> Instructions 8, 9 and 10 are:
>
> (insn 8 5 9 2 (set (reg:SI 45)
>         (ashift:SI (reg/v:SI 43 [ b ])
>             (const_int 2 [0x2]))) test.c:5 15 {ashlsi3}
>      (expr_list:REG_DEAD (reg/v:SI 43 [ b ])
>         (nil)))
> (insn 9 8 10 2 (set (reg/f:SI 46)
>         (plus:SI (reg/v/f:SI 42 [ a ])
>             (reg:SI 45))) test.c:5 13 {addsi3}
>      (expr_list:REG_DEAD (reg:SI 45)
>         (expr_list:REG_DEAD (reg/v/f:SI 42 [ a ])
>             (nil))))
> (insn 10 9 15 2 (set (reg:SI 47 [ *_5 ])
>         (mem:SI (reg/f:SI 46) [2 *_5+0 S4 A32])) test.c:5 6 {*si_load}
>      (expr_list:REG_DEAD (reg/f:SI 46)
>         (nil)))
>
> If I've read this correctly, it indicates that the instruction pattern
> has been matched, but the combined instruction has been rejected for
> being more expensive than the original three instructions.
>
> So, how is it calculating the cost of my instruction? Where's it getting
> that 32 from (which seems weirdly high)?
>
> Right now all the cost macros are left as the default, which is probably
> the root of the problem; but I'm having a lot of trouble getting my head
> around them. In the interest of actually getting something to work, are
> there any ways of using a simplified cost model where the cost of each
> instruction is specified manually in the instruction pattern alongside
> the length? (Or even just *using* the length as the cost...)

Hi David,

I suggest setting a break point on rtx_costs and seeing how it
calculates the overall cost when combining insns 8, 9, and 10.

Best regards,
jasonwucj
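For anyone wanting to try that: on a native build where cc1 is
debuggable, one way to get the debugger attached (a sketch; test.c and
the options are placeholders) is to let the driver run cc1 under gdb:

    $ gcc -O2 -c test.c -wrapper gdb,--args
    (gdb) break rtx_cost
    (gdb) run

rtx_cost in rtlanal.c is the generic entry point that dispatches to the
port's TARGET_RTX_COSTS hook, so breaking there catches the default
costs as well.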
Should -Wmaybe-uninitialized be included in -Wall?
When building gdb with newer gcc versions I frequently stumble across
maybe-uninitialized false positives, like the ones documented in bug
57237. Various bugs address similar issues, and in bug 56526 Jakub
Jelinek wrote:

> Maybe-uninitialized warnings have tons of known false positives, while
> the predicated analysis can handle the simplest cases, it can't handle
> anything more complicated.

Even the description of this option states:

> These warnings are made optional because GCC is not smart enough to
> see all the reasons why the code might be correct in spite of
> appearing to have an error.

With this situation at hand, I wonder whether it's a good idea to keep
maybe-uninitialized included in -Wall. Projects which have been using
"-Wall -Werror" successfully for many years are now forced to
investigate non-existent bugs in their code.
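For readers who haven't been bitten yet, the classic shape of such a
false positive is a variable that is only written and only read under
the same guard, which the predicate analysis cannot always connect (a
minimal illustration, not one of the actual cases from bug 57237):

    extern void use(int);

    void f(int cond)
    {
        int x;
        if (cond)
            x = 1;
        /* ... code that does not modify cond ... */
        if (cond)
            use(x);  /* may warn: gcc cannot always prove the guards match */
    }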
Re: Should -Wmaybe-uninitialized be included in -Wall?
On 07/09/2013 12:59 PM, Andreas Arnez wrote:
> With this situation at hand, I wonder whether it's a good idea to keep
> maybe-uninitialized included in -Wall. Projects which have been using
> "-Wall -Werror" successfully for many years are now forced to
> investigate non-existent bugs in their code.

But maybe-uninitialized is very useful, and it's not really inappropriate
for -Wall. I would question the appropriateness of using -Wall -Werror
in production code.

Andrew.
Re: Should -Wmaybe-uninitialized be included in -Wall?
On 9 July 2013 13:04, Andrew Haley wrote:
> On 07/09/2013 12:59 PM, Andreas Arnez wrote:
>> With this situation at hand, I wonder whether it's a good idea to keep
>> maybe-uninitialized included in -Wall. Projects which have been using
>> "-Wall -Werror" successfully for many years are now forced to
>> investigate non-existent bugs in their code.
>
> But maybe-uninitialized is very useful, and it's not really inappropriate
> for -Wall. I would question the appropriateness of using -Wall -Werror
> in production code.

Me too, but for those who prefer it there is
-Wno-error=maybe-uninitialized, so they can keep everything else as an
error.
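In other words, a project that wants to keep -Werror for everything else
can demote just this one diagnostic (foo.c is a placeholder):

    $ gcc -Wall -Werror -Wno-error=maybe-uninitialized -c foo.c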
Re: Inline virtual call puzzling behaviour
Hi,

On Sun, 7 Jul 2013, Thierry Lavoie wrote:

> int main(int argc, char** argv)
> {
>     A* ptr = 0;
>     if(argc == 1)
>         ptr = new B();
>     else
>         ptr = new A();
>
>     ptr->blah();
>
>     B().blah();
>     C().blah();
> }
>
> The puzzling part is to find that the call for C().blah() is indeed
> inlined and ptr->blah() uses a vtable resolution, but the code for
> B().blah() uses neither: the static address is resolved but the code is
> not inlined! (The same behaviour occurs if there is a static-typed
> pointer to an object of class B.) I understand the compiler propagates
> the types properly, but even after determining the correct type for the
> object of type B, it only resolves the vtable reference (hence no call
> *(%..x)), but cannot perform the inlining.

It can, but decides not to. main is special and known to the compiler to
be called only once, hence is itself cold, and GCC thinks inlining
B::blah increases code size (of main). Rename main to foo, or put the
whole code of main into a loop with unknown bounds, and you will see
B::blah inlined (as well as some other functions).

That C::blah is inlined in your example is a side effect of that
function being non-virtual, so some other transformations remove the
'this' argument, ultimately making gcc think inlining C::blah decreases
the code size of main (and hence it inlines even into that cold
function).

> From a practical point of view, I understand this example does not
> justify by itself the absolute need for inlining. However, I do have a
> time-critical application that would get a 25-30% increase in speed if
> I could solve this issue.

Then the above testcase doesn't reflect the real issue you have.

Ciao,
Michael.
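To make that suggested rewrite concrete, a version of the testcase along
these lines (a sketch reusing the A/B/C classes above; the bound n
merely has to be unknown at compile time) takes the code out of the cold
path:

    void foo(int argc, int n)
    {
        while (n--)  /* unknown trip count: foo is no longer considered cold */
        {
            A* ptr = 0;
            if (argc == 1)
                ptr = new B();
            else
                ptr = new A();

            ptr->blah();

            B().blah();  /* now a candidate for inlining */
            C().blah();
        }
    }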
Re: Calculating instruction costs
Hi,

On Tue, 9 Jul 2013, David Given wrote:

> Trying 8, 9 -> 10:
> Successfully matched this instruction:
> (set (reg:SI 47 [ *_5 ])
>     (mem:SI (plus:SI (mult:SI (reg/v:SI 43 [ b ])
>                 (const_int 4 [0x4]))
>             (reg:SI 0 r0 [ a ])) [2 *_5+0 S4 A32]))
> rejecting combination of insns 8, 9 and 10
> original costs 8 + 4 + 4 = 16
> replacement cost 32
>
> So, how is it calculating the cost of my instruction? Where's it getting
> that 32 from (which seems weirdly high)?

As you didn't adjust any costs, I would guess the high value comes from
the default implementation of address_cost, which simply uses the
arithmetic cost, and the MULT in there is quite expensive by default.
See TARGET_ADDRESS_COST in several ports.

Ciao,
Michael.
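For a port that simply wants indexed addresses to look cheap, a minimal
hook along the lines below would override that default (a sketch against
the 4.8-era hook signature -- check target.def in your tree for the
exact one -- with myport_* as a placeholder name; the port's
TARGET_RTX_COSTS may also need to handle the MEM itself):

    /* In myport.c.  */
    static int
    myport_address_cost (rtx addr ATTRIBUTE_UNUSED,
                         enum machine_mode mode ATTRIBUTE_UNUSED,
                         addr_space_t as ATTRIBUTE_UNUSED,
                         bool speed ATTRIBUTE_UNUSED)
    {
      /* Treat every legitimate address, including
         (plus (mult reg 4) reg), as equally cheap, so the indexed form
         is not penalized with the arithmetic cost of the MULT.  */
      return 0;
    }

    #undef TARGET_ADDRESS_COST
    #define TARGET_ADDRESS_COST myport_address_cost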
Re: Should -Wmaybe-uninitialized be included in -Wall?
On 07/09/2013 02:56 PM, Andreas Arnez wrote:
> What matters is whether *some* stages of production code development use
> this combination of options. It could certainly be argued whether it
> should also be a project's "configure" default, as is currently the case
> for gdb.

It's not a problem for GDB, because they track GCC development.

Andrew.
Re: Should -Wmaybe-uninitialized be included in -Wall?
Andrew Haley writes:
> On 07/09/2013 12:59 PM, Andreas Arnez wrote:
>> With this situation at hand, I wonder whether it's a good idea to keep
>> maybe-uninitialized included in -Wall. Projects which have been using
>> "-Wall -Werror" successfully for many years are now forced to
>> investigate non-existent bugs in their code.
>
> But maybe-uninitialized is very useful, and it's not really inappropriate
> for -Wall.

The warning would be extremely useful if false positives didn't obscure
real problems. In its current state I consider its usefulness limited.

The reason I consider the warning (currently) inappropriate for -Wall is
that it does not behave as reliably and predictably as the other -Wall
options. The description of -Wall says:

"This enables all the warnings about constructions that some users
consider questionable, and that are easy to avoid (or modify to prevent
the warning) [...]"

Note the "easy to avoid". If everybody agrees that -Wall should include
maybe-uninitialized, then I suggest changing the wording here.

> I would question the appropriateness of using -Wall -Werror in
> production code.

What matters is whether *some* stages of production code development use
this combination of options. It could certainly be argued whether it
should also be a project's "configure" default, as is currently the case
for gdb.
Pointer arithmetic
On a machine with ABI ILP32LL64:

(insn 123 122 124 (nil) (set (reg:SI 392)
        (mem:SI (plus:SI (reg/v:SI 386)
                (reg/v:SI 349)) [0 sec 0 space 0, cmsmode 0 S4 A32])) -1
    (nil)
    (nil))

If we support legitimate memory addresses like [r1+r2] (i.e. indexed
addresses), can the above RTL match such a load?

I am asking because of overflow: I am not sure how that part is defined,
or where the specification is. What do I need to check in the backend
for such a definition? Is this POINTER_SIZE? E.g. if the machine
supports addresses wider than 32 bits, who is responsible for making
sure that the addition does not overflow 32 bits in this case? The
compiler? The assembler? Or even the user?

Thanks,
Hendrik
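For what it's worth, a pattern that could match that load on a port with
reg+reg addressing might look like the sketch below (the assembler
syntax is invented); more commonly, though, such addresses are simply
accepted by TARGET_LEGITIMATE_ADDRESS_P and matched through the ordinary
mov pattern's memory constraint:

    (define_insn "*si_load_reg_reg"
      [(set (match_operand:SI 0 "register_operand" "=r")
            (mem:SI (plus:SI (match_operand:SI 1 "register_operand" "r")
                             (match_operand:SI 2 "register_operand" "r"))))]
      ""
      "ld %0, [%1+%2]")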
Re: Should -Wmaybe-uninitialized be included in -Wall?
Andrew> I would question the appropriateness of using -Wall -Werror in
Andrew> production code.

Andreas> What matters is whether *some* stages of production code
Andreas> development use this combination of options. It could
Andreas> certainly be argued whether it should also be a project's
Andreas> "configure" default, as is currently the case for gdb.

gdb only enables it for the development branch, not for releases. If
you're building from CVS, you're expected to know how to either fix
these problems or disable -Werror. Typically the fix is trivial; if you
look through the archives you'll see fixes along these lines.

Tom
Resolving an issue of bootstrap failure from vectorization.
Hi,

My name is Cong Hou, and I am a Noogler working in the compiler
optimization team at Google.

When we were trying to move vectorization from O3 to O2 in GCC 4.9, we
hit a bootstrap failure in the comparison between stages 2 and 3. This
failure is caused by a potential bug in GCC, as described below.

In the file tree-vect-data-refs.c, there is a qsort() call which sorts a
group of data references using a comparison function called
dr_group_sort_cmp(). In this function, the iterative hash values of
tree nodes are used for comparisons. For a declaration tree node, its
UID participates in the calculation of the hash value. However, a given
declaration may have different UIDs depending on whether debug
information is switched on or off. In consequence, the results of the
comparisons may differ between stages 2 and 3 during bootstrapping.

As a solution, I think we need a function that compares two tree nodes
without relying on UIDs. An obvious idea is to compare the tree codes
first and, if they are the same, compare the subnodes recursively. To
resolve the immediate issue, I composed a very basic comparison function
which only compares tree codes, and this patch makes the bootstrap pass.
I have attached the patch here.

I am wondering whether this is a valid solution and would appreciate any
feedback.

Thank you!

Cong
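A sketch of the recursive idea, for discussion (assuming the caller only
needs a stable ordering, not deep semantic equality; a real version
would also have to compare constants, types, and declaration names):

    /* Compare two trees without depending on DECL_UIDs.  */
    static int
    compare_tree (tree t1, tree t2)
    {
      if (t1 == t2)
        return 0;
      if (t1 == NULL_TREE)
        return -1;
      if (t2 == NULL_TREE)
        return 1;

      if (TREE_CODE (t1) != TREE_CODE (t2))
        return (int) TREE_CODE (t1) - (int) TREE_CODE (t2);

      /* Same code: recurse over the operands of expression nodes.  */
      if (EXPR_P (t1))
        {
          int i;
          for (i = 0; i < TREE_OPERAND_LENGTH (t1); i++)
            {
              int cmp = compare_tree (TREE_OPERAND (t1, i),
                                      TREE_OPERAND (t2, i));
              if (cmp != 0)
                return cmp;
            }
        }
      return 0;
    }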
Re: Resolving an issue of bootstrap failure from vectorization.
Please repost the patch to the gcc-patches@ mailing list.

David

On Tue, Jul 9, 2013 at 1:10 PM, Cong Hou wrote:
> Hi,
>
> My name is Cong Hou, and I am a Noogler working in the compiler
> optimization team at Google.
>
> When we were trying to move vectorization from O3 to O2 in GCC 4.9, we
> hit a bootstrap failure in the comparison between stages 2 and 3. This
> failure is caused by a potential bug in GCC, as described below.
>
> In the file tree-vect-data-refs.c, there is a qsort() call which sorts
> a group of data references using a comparison function called
> dr_group_sort_cmp(). In this function, the iterative hash values of
> tree nodes are used for comparisons. For a declaration tree node, its
> UID participates in the calculation of the hash value. However, a given
> declaration may have different UIDs depending on whether debug
> information is switched on or off. In consequence, the results of the
> comparisons may differ between stages 2 and 3 during bootstrapping.
>
> As a solution, I think we need a function that compares two tree nodes
> without relying on UIDs. An obvious idea is to compare the tree codes
> first and, if they are the same, compare the subnodes recursively. To
> resolve the immediate issue, I composed a very basic comparison
> function which only compares tree codes, and this patch makes the
> bootstrap pass. I have attached the patch here.
>
> I am wondering whether this is a valid solution and would appreciate
> any feedback.
>
> Thank you!
>
> Cong
HAVE_ATTR_enabled mishandling?
I think I have found a bug. This is in stock gcc 4.8.1...

My backend does not use the 'enabled' attribute; therefore the following
code in insn-attr.h kicks in:

#ifndef HAVE_ATTR_enabled
#define HAVE_ATTR_enabled 0
#endif

Therefore the following code in gcc/lra-constraints.c is enabled:

#ifdef HAVE_ATTR_enabled
          if (curr_id->alternative_enabled_p != NULL
              && ! curr_id->alternative_enabled_p[nalt])
            continue;
#endif

->alternative_enabled_p is bogus; therefore segfault. Elsewhere I see
structures of the form:

#if HAVE_ATTR_enabled
...
#endif

So I think that #ifdef above is a straight typo. Certainly, changing it
to a #if makes the crash go away...

--
┌─── dg@cowlark.com ─ http://www.cowlark.com ─
│ "Every planet is weird. I spent six weeks on a moon where the
│ principal form of recreation was juggling geese. Baby geese. Goslings.
│ They were juggled." --- Firefly, _Our Mrs. Reynolds_
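The distinction, for anyone skimming: #ifdef tests whether the macro is
defined at all, which the insn-attr.h fallback above guarantees, whereas
#if tests its value:

    #define HAVE_ATTR_enabled 0

    #ifdef HAVE_ATTR_enabled
    /* Compiled in: the macro is defined, its value is irrelevant.  */
    #endif

    #if HAVE_ATTR_enabled
    /* Not compiled in: the macro's value is 0.  */
    #endif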
Re: Should -Wmaybe-uninitialized be included in -Wall?
On 07/09/2013 07:56 AM, Andreas Arnez wrote:
> Andrew Haley writes:
>> On 07/09/2013 12:59 PM, Andreas Arnez wrote:
>>> With this situation at hand, I wonder whether it's a good idea to keep
>>> maybe-uninitialized included in -Wall. Projects which have been using
>>> "-Wall -Werror" successfully for many years are now forced to
>>> investigate non-existent bugs in their code.
>>
>> But maybe-uninitialized is very useful, and it's not really
>> inappropriate for -Wall.
>
> The warning would be extremely useful if false positives didn't obscure
> real problems. In its current state I consider its usefulness limited.

Depends on your point of view. This topic has been hashed through
repeatedly on this list.

I personally like -Wall -Werror. While we do run into false positives,
and the set of false positives changes from release to release as a
result of optimizations, I believe there's been an overall improvement
in the quality of the codebases that use -Wall -Werror. I certainly see
fewer bugs these days blamed on the compiler which are in fact
uninitialized variables.

I'd like to revamp how we do path isolation to eliminate more of the
false positives, but this warning is by its nature going to generate
false positives. What I would ask is that folks continue to file bugs
for false positives. While I can't guarantee we'll fix them, they do
provide a good base of tests for path-sensitive analysis.

I'd also suggest that when an initialization is added merely to avoid
the warning, a comment be added to the code so that folks know it's
merely to avoid the warning.

Jeff
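As an example of that last suggestion (the variable and the surrounding
code are hypothetical):

    int len = 0;  /* Initialized only to silence -Wmaybe-uninitialized;
                     every path that reads len also assigns it first.  */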