Regarding code portability across different gcc/g++ versions
Hi,

I had a big piece of code that ran smoothly with gcc 3.2.2. For some reason, I had to start using that code on a machine with GCC 4.2.1. Now it throws segmentation faults (invalid free pointer, etc.) and aborts the program. I presume this happens because the glibc with gcc 4.2.1 is smarter than the one with gcc 3.2.2. Hence, what was missed during execution with 3.2.2 is caught with 4.2.1.

While it is great to catch as many errors as possible, would it not be better if execution support were provided for code that ran on earlier versions? Maybe what was missed in earlier versions could be flagged as "error with the current gcc version" or something like that, without aborting the program, so that execution continues and the developer has the option to fix the error later.

Since the code base in my case is very big and the original developer is no longer around to support it, it is extremely difficult to resolve this issue.

Regards,
Sharad Sinha
Research Scholar, Center for High Performance Embedded Systems, Level 3, Border X-Block, Research Techno Plaza, Nanyang Technological University, Singapore-637553
Re: Regarding code portability across different gcc/g++ versions
On 09/29/2010 08:07 AM, #SINHA SHARAD# wrote:
> Hi,
>
> I had a big piece of code that ran smoothly on gcc 3.2.2. For
> some reason, I had to start using that code on a machine with GCC
> 4.2.1. Now, it would throw segmentation faults (invalid free pointer
> etc) and abort the program. I presume this happens because the glibc
> with gcc 4.2.1 is smarter than the one with gcc 3.2.2. Hence, what
> was missed during execution with 3.2.2 was caught in 4.2.1

Maybe; it's hard to say without more investigation.

> While it is great to catch as many errors as possible, will it
> not be better that execution support for code running on earlier
> versions was provided?

That's not generally possible, because we don't know all the crazy
things programmers do.

> May be what was missed in earlier versions should be flagged as
> "error with the current gcc version" or something like that and it
> does not abort the program thus continuing its execution leaving the
> developer with the option to fix the error later.

We don't deliberately generate code that segfaults, I assure you.

> Since, the code size in my case is very big and the original
> developer is not there to support, it is extremely difficult to
> resolve this issue.

I suggest you start with Valgrind's memory checker.

Andrew.
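Errors such as "invalid pointer" or "double free" usually mean the program releases the same heap block twice, or passes free() a pointer that malloc() never returned. An older allocator may let such a call pass silently, while a newer glibc may detect it and abort. Below is a minimal sketch of one common defensive idiom, assuming a double free is the culprit. The struct and function names are hypothetical. The idiom makes a repeated release harmless, but it does not locate the original bug.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical owning buffer type, for illustration only. */
struct buffer {
    char  *data;
    size_t len;
};

/* Allocate a copy of `text`; returns 0 on success, -1 on allocation failure. */
static int buffer_init(struct buffer *b, const char *text)
{
    b->len = strlen(text);
    b->data = malloc(b->len + 1);
    if (b->data == NULL) {
        b->len = 0;
        return -1;
    }
    memcpy(b->data, text, b->len + 1);   /* copies the terminating NUL too */
    return 0;
}

/* Release the buffer. Clearing the pointer means a second call passes NULL
   to free(), which the C standard defines as a no-op, instead of freeing
   the same block twice. */
static void buffer_release(struct buffer *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
}
```

To find the faulty call sites in a large code base, run the unmodified program under Valgrind (`valgrind ./prog`, with Memcheck as the default tool). It reports each invalid or repeated free() with a stack trace.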
Re: Bugzilla not whining [was Re: Bugzilla outage Thursday, September 23, 18:00GMT-21:00GMT]
On 28/09/2010 22:24, Frédéric Buclin wrote:
> On 28.09.10 11:25, Dave Korn wrote:
>> I'm no longer
>> receiving my nightly emails that the whine is supposed to be sending me.
>
> This should be fixed now. Let me know if you still don't get nightly emails.
>
> Frédéric

Working fine now, thank you.

cheers,
DaveK
Re: Regarding code portability across different gcc/g++ versions
On 29 September 2010 10:29, Andrew Haley wrote:
> On 09/29/2010 08:07 AM, #SINHA SHARAD# wrote:
>> Hi,
>>
>> I had a big piece of code that ran smoothly on gcc 3.2.2. For
>> some reason, I had to start using that code on a machine with GCC
>> 4.2.1. Now, it would throw segmentation faults (invalid free pointer
>> etc) and abort the program. I presume this happens because the glibc
>> with gcc 4.2.1 is smarter than the one with gcc 3.2.2. Hence, what
>> was missed during execution with 3.2.2 was caught in 4.2.1
>
> Maybe; it's hard to say without more investigation.
>
>> While it is great to catch as many errors as possible, will it
>> not be better that execution support for code running on earlier
>> versions was provided?
>
> That's not generally possible, because we don't know all the crazy
> things programmers do.
>
>> May be what was missed in earlier versions should be flagged as
>> "error with the current gcc version" or something like that and it
>> does not abort the program thus continuing its execution leaving the
>> developer with the option to fix the error later.
>
> We don't deliberately generate code that segfaults, I assure you.
>
>> Since, the code size in my case is very big and the original
>> developer is not there to support, it is extremely difficult to
>> resolve this issue.
>
> I suggest you start with Valgrind's memory checker.
>

This should be in the FAQ: http://gcc.gnu.org/wiki/FAQ

And it should mention: http://gcc.gnu.org/bugs/#upgrading

Cheers,

Manuel.
Worse code generated by PRE
Hello,

I have been examining a significant performance regression between 4.5 and 4.4 in our port. I found that the Partial Redundancy Elimination (PRE) introduced in 4.5 causes the issue. The following pseudo code explains the problem:

BB 3:
r118 <- r114 + 2

BB 4:
r114 <- r114 + 2
...
Conditional jump to BB 4

After PRE:

BB 3:
r123 <- r114 + 2
r118 <- r123

BB 4:
r114 <- r123
Conditional jump to BB 5

BB 5:
r123 <- r114 + 2
Jump to BB 4

A simple loop (BB 4) is divided into two basic blocks (BB 4 and BB 5), and an extra jump instruction is introduced. On some targets, this jump can be removed by the bb-reorder pass. On our target, it cannot be reordered because of the complex doloop_end pattern we generate later. Additionally, since bb-reorder runs very late, the code misses some optimization opportunities such as auto_inc_dec.

I don't see any benefit in doing PRE like this. Maybe we should exclude such cases in the first place? I read the relevant text in "Advanced Compiler Design and Implementation", but the example used there has a linear CFG, and it doesn't mention how to handle loops.

Any suggestion is greatly appreciated.

Thanks,
Bingfeng Mei
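Below is a source-level C analogue of the two loop shapes in the pseudo code above, written for a MemSet16-style store loop. It is only an illustration, not GCC output, and the function names are hypothetical. Stepping a pointer by one int16_t corresponds to the "+ 2" byte increments on r114. As in the RTL dump posted later in the thread, the preheader computes p + 1 on the way to the end pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Loop shape before RTL PRE (BB 3 / BB 4): p + 1 ("r114 + 2" in bytes) is
   computed once in the preheader, as part of the end pointer, and again on
   every iteration inside the loop. Requires n >= 1. */
static void fill_before_pre(int16_t *p, int16_t val, size_t n)
{
    int16_t *end = (p + 1) + (n - 1);   /* r118 = r114 + 2; r110 = r118 + 2*(n-1) bytes */
    do {
        *p = val;
        p = p + 1;                      /* BB 4: r114 <- r114 + 2 */
    } while (p != end);
}

/* Loop shape after RTL PRE: the addition lives in a temporary (r123).
   The loop head only copies it. A new block on the back edge (BB 5)
   recomputes it and jumps back. Requires n >= 1. */
static void fill_after_pre(int16_t *p, int16_t val, size_t n)
{
    int16_t *next = p + 1;              /* BB 3: r123 <- r114 + 2 */
    int16_t *end = next + (n - 1);      /* r118 <- r123; r110 = r118 + 2*(n-1) bytes */
    for (;;) {
        *p = val;
        p = next;                       /* BB 4: r114 <- r123 */
        if (p == end)
            break;                      /* loop exit */
        next = p + 1;                   /* BB 5: r123 <- r114 + 2, then back to BB 4 */
    }
}
```

Both functions store the same values. The second shape shows the extra copy (p = next) and the extra back-edge block that the report complains about.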
Re: Worse code generated by PRE
On Wed, Sep 29, 2010 at 2:16 PM, Bingfeng Mei wrote:
> Hello,
> I have been examining a significant performance regression
> between 4.5 and 4.4 in our port. I found that Partial Redundancy
> Elimination introduced in 4.5 causes the issue. The following
> pseudo code explains the problem:
>
> BB 3:
> r118 <- r114 + 2
>
> BB 4:
> R114 <- r114 + 2
> ...
> Conditional jump to BB 4
>
> After PRE
>
> BB 3:
> r123 <- r114 + 2
> r118 <- r123
>
> BB 4:
> r114 <- r123
> conditional jump to BB 5
>
> BB5:
> r123 <- r114 + 2
> jump to BB 4
>
>
> A simple loop (BB 4) is divided into two basic blocks (BB 4 & 5).
> An extra jump instruction is introduced. On some targets, this
> jump can be removed by bb-reorder pass. On our target, it cannot
> be reordered due to complex doloop_end pattern we generate later.
> Additionally, since bb-reorder is done in very late phase, the code
> miss some optimization opportunity such as auto_inc_dec. I don't
> see any benefit here to do PRE like this. Maybe we should exclude
> such case in the first place? I read the relevant text in
> "Advanced Compiler Design Implementation", the example used is linear
> CFG and it doesn't mention how to handle loop case.

PRE basically sinks the computation into the latch block (possibly
creating that block). Without a testcase it's hard to tell whether
this is OK in general. PRE tries to avoid generating new induction
variables and cross-iteration data dependences; see
insert_into_preds_of_block.

Note that 4.4 in principle performs the same optimization. (You might
find that PRE in 4.4 is disabled entirely for -Os, whereas 4.5 enables
it there, but only for hot execution traces, following existing
practice of tuning code size/performance on a fine-grained basis.)

Richard.

> Any suggestion is greatly appreciated.
>
> Thanks,
> Bingfeng Mei
RE: Worse code generated by PRE
Richard,

Here is the test code.

typedef short int16_t;
typedef unsigned short uint16_t;

void MemSet16(
    int16_t *pBuf,   /* Buffer address */
    int16_t Val,     /* Value to be set */
    uint16_t Bytes   /* Total size in bytes */
    )
{
    uint16_t Idx;

    for(Idx=0; Idx<(Bytes>>1); Idx++)
        *pBuf++ = Val;
}

I grepped insert_into_preds_of_block and found it is called only by tree-ssa-pre.c. Actually, I am referring to the RTL PRE pass in gcse.c and lcm.c.

Before PRE:

;; Start of basic block ( 2) -> 3
;; bb 3 artificial_defs: { }
;; bb 3 artificial_uses: { u9(55){ }u10(57){ }u11(62){ }}
;; lr in 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 113 114 115
;; lr use 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 113 114
;; lr def 110 118 119 120 121
;; live in 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 113 114 115
;; live gen 110 118 119 120 121
;; live kill

;; Pred edge 2 [91.0%] (fallthru)
(note 34 33 35 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

(insn 35 34 36 3 tst.c:4 (set (reg/f:SI 118)
        (plus:SI (reg/v/f:SI 114 [ pBuf ])
            (const_int 2 [0x2]))) 273 {addsi3} (nil))

(insn 36 35 37 3 tst.c:4 (set (reg:HI 119)
        (plus:HI (reg:HI 113 [ D.3441 ])
            (const_int -1 [0x]))) 276 {addhi3} (expr_list:REG_DEAD (reg:HI 113 [ D.3441 ])
        (nil)))

(insn 37 36 38 3 tst.c:4 (set (reg:SI 120)
        (zero_extend:SI (reg:HI 119))) 1056 {zero_extendhisi2} (expr_list:REG_DEAD (reg:HI 119)
        (nil)))

(insn 38 37 39 3 tst.c:4 (set (reg:SI 121)
        (ashift:SI (reg:SI 120)
            (const_int 1 [0x1]))) 389 {ashlsi3} (expr_list:REG_DEAD (reg:SI 120)
        (nil)))

(insn 39 38 43 3 tst.c:4 (set (reg/f:SI 110 [ D.3464 ])
        (plus:SI (reg/f:SI 118)
            (reg:SI 121))) 273 {addsi3} (expr_list:REG_DEAD (reg:SI 121)
        (expr_list:REG_DEAD (reg/f:SI 118)
            (nil
;; End of basic block 3 -> ( 4)
;; lr out 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
;; live out 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115

;; Succ edge 4 [100.0%] (fallthru)

;; Start of basic block ( 4 3) -> 4
;; bb 4 artificial_defs: { }
;; bb 4 artificial_uses: { u18(55){ }u19(57){ }u20(62){ }}
;; lr in 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
;; lr use 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
;; lr def 114 122
;; live in 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
;; live gen 114 122
;; live kill

;; Pred edge 4 [91.0%]
;; Pred edge 3 [100.0%] (fallthru)
(code_label 43 39 40 4 3 "" [1 uses])

(note 40 43 41 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

(insn 41 40 42 4 tst.c:14 (set (mem:HI (reg/v/f:SI 114 [ pBuf ]) [2 *pBuf+0 S2 A16])
        (reg/v:HI 115 [ Val ])) 236 {*movhhi} (nil))

(insn 42 41 44 4 tst.c:14 (set (reg/v/f:SI 114 [ pBuf ])
        (plus:SI (reg/v/f:SI 114 [ pBuf ])
            (const_int 2 [0x2]))) 273 {addsi3} (nil))

(insn 44 42 45 4 tst.c:13 (set (reg:BI 122)
        (ne:BI (reg/v/f:SI 114 [ pBuf ])
            (reg/f:SI 110 [ D.3464 ]))) 1006 {cmp_simode} (nil))

(jump_insn 45 44 48 4 tst.c:13 (set (pc)
        (if_then_else (ne (reg:BI 122)
                (const_int 0 [0x0]))
            (label_ref 43)
            (pc))) 1085 {cbranchbi4} (expr_list:REG_DEAD (reg:BI 122)
        (expr_list:REG_BR_PROB (const_int 9100 [0x238c])
            (expr_list:REG_PRED_WIDTH (const_int 4 [0x4])
                (nil
 -> 43)
;; End of basic block 4 -> ( 4 5)
;; lr out 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
;; live out 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115

After PRE:

;; Start of basic block ( 2) -> 3
;; bb 3 artificial_defs: { }
;; bb 3 artificial_uses: { u9(55){ }u10(57){ }u11(62){ }}
;; lr in 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 113 114 115
;; lr use 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 113 114
;; lr def 110 118 119 120 121
;; live in 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 113 114 115
;; live gen 110 118 119 120 121
;; live kill

;; Pred edge 2 [91.0%] (fallthru)
(note 34 33 35 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

(insn 35 34 53 3 tst.c:4 (set (reg/f:SI 123 [ pBuf ])
        (plus:SI (reg/v/f:SI 114 [ pBuf ])
            (const_int 2 [0x2]))) 273 {addsi3} (nil))

(insn 53 35 36 3 tst.c:4 (set (reg/f:SI 118)
        (reg/f:SI 123 [ pBuf ])) -1 (nil))

(insn 36 53 37 3 tst.c:4 (set (reg:HI 119)
        (plus:HI (reg:HI 113 [ D.3441 ])
            (const_int -1 [0x]))) 276 {addhi3} (expr_list:REG_DEAD (reg:HI 113 [ D.3441 ])
        (nil)))

(insn 37 36 38 3 tst.c:4 (set (reg:SI 120)
        (zero_extend:SI (reg:HI 119))) 1056 {zero_extendhisi2} (expr_list:REG_DEAD (reg:HI 119)
        (nil)))

(insn 38 37 39 3 tst.c:4 (set (reg:SI 121)
Re: Worse code generated by PRE
The optimization does look bad: splitting the backedge to allow expression hoisting rarely removes any redundancy unless the loop has a really short trip count. Besides, it introduces an extra copy and an extra jump instruction, and it increases register pressure.

David

On Wed, Sep 29, 2010 at 5:55 AM, Bingfeng Mei wrote:
> Richard,
> Here is the test code.
> typedef short int16_t;
> typedef unsigned short uint16_t;
>
> void MemSet16(
> int16_t *pBuf, /* Buffer address */
> int16_t Val, /* Value to be set */
> uint16_t Bytes /* Total size in bytes */
> )
>
> {
> uint16_t Idx;
>
> for(Idx=0; Idx<(Bytes>>1); Idx++)
> *pBuf++ = Val;
> }
>
> I grepped insert_into_preds_of_block and found it is called only by
> tree-ssa-pre.c. Actually, I am referring to RTL PRE pass in gcse.c
> and lcm.c.
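David's point can be checked by counting dynamic operations in Bingfeng's pseudo code. Assume the loop body (BB 4) runs n ≥ 1 times per entry, so the new back-edge block (BB 5) runs n - 1 times. This informal tally counts only the `r114 + 2` additions, the register copies and the unconditional jumps. It ignores whatever later passes, such as register coalescing or bb-reorder, might clean up.

```latex
\begin{aligned}
\text{additions before PRE} &= \underbrace{1}_{\text{BB 3}} + \underbrace{n}_{\text{BB 4}} = n + 1\\
\text{additions after PRE} &= \underbrace{1}_{\text{BB 3}} + \underbrace{(n - 1)}_{\text{BB 5}} = n\\
\text{copies after PRE} &= \underbrace{1}_{r118 \leftarrow r123} + \underbrace{n}_{r114 \leftarrow r123} = n + 1\\
\text{extra jumps after PRE} &= \underbrace{n - 1}_{\text{BB 5} \to \text{BB 4}}
\end{aligned}
```

So PRE saves exactly one addition per entry into the loop, while adding up to n + 1 copies and n - 1 unconditional jumps. The saving only matters relative to the loop's cost when n is very small, which matches the remark about short trip counts.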
Re: Clarification on who can approve Objective-C/Objective-C++ parser patches
Thanks, Joseph.

Is it confirmed that this is the opinion of the C++ FE maintainers as well? Can we get that clarified? Do they want to review Objective-C++ patches?

(I'm still personally of the opinion that the Objective-C++ maintainer should approve Objective-C++ patches, but Mike tells me he's been told he can't approve any changes inside gcc/cp, not even if they are Objective-C++-only, so I'm asking again.)

Thanks!

-----Original Message-----
From: "Joseph S. Myers"
Sent: Thursday, 23 September, 2010 17:05
To: "Nicola Pero"
Cc: "g...@gnu.org"
Subject: Re: Clarification on who can approve Objective-C/Objective-C++ parser patches

On Thu, 23 Sep 2010, Nicola Pero wrote:

> For example, if I post a patch that changes a piece of code in
> gcc/c-parser.c which is only ever used if (c_dialect_objc ()), then I
> assume that it is part of the Objective-C front-end, and the
> Objective-C/Objective-C++ maintainers are in charge of approving it.
> Once they approve it, I can commit.
>
> Is that correct ?

Yes. I generally expect ObjC maintainers to review changes to those parts of c-parser.c.

-- 
Joseph S. Myers
jos...@codesourcery.com
check_cxa_atexit_available
The test program in target-supports.exp is broken, since it doesn't preclude the use of cleanups instead. Indeed, init/cleanup3.C seems to be essentially identical to the target-supports test.

Any suggestions that don't essentially reverse this situation? I.e., I could switch the target-supports test to grep the assembly for __cxa_atexit, but I suspect that would more or less automatically cause the cleanup3.C test to pass.

r~
Re: Regarding code portability across different gcc/g++ versions
On 29 September 2010 08:07, #SINHA SHARAD# wrote:
> Hi,
>
> I presume this happens because the glibc with gcc 4.2.1 is smarter than the
> one with gcc 3.2.2. Hence, what was missed during execution with 3.2.2 was
> caught in 4.2.1

N.B. glibc does not come with GCC. You can generally use a new GCC on a machine with an old glibc and vice versa; they are separate projects and are released independently.
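To see that the two are independent on a given machine, a small C sketch can report the compiler version fixed at compile time alongside the C library version loaded at run time. It assumes a GCC-compatible compiler (the `__GNUC__`, `__GNUC_MINOR__` and `__GNUC_PATCHLEVEL__` macros). On glibc systems it also uses the glibc-specific `gnu_get_libc_version()` from `<gnu/libc-version.h>`. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#ifdef __GLIBC__                 /* defined by glibc's headers, e.g. via <stdio.h> */
#include <gnu/libc-version.h>    /* glibc-specific: gnu_get_libc_version() */
#endif

/* Compiler version: fixed when this file is compiled. */
static int compiler_version(char *buf, size_t len)
{
#ifdef __GNUC__
    return snprintf(buf, len, "%d.%d.%d",
                    __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#else
    return snprintf(buf, len, "unknown");
#endif
}

/* C library version: whichever libc the program loads at run time. */
static const char *runtime_libc_version(void)
{
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return "unknown (not glibc)";
#endif
}
```

A dynamically linked binary built once reports the same compiler version on every machine, but may report a different glibc version on each. Printing both values on the two machines in question shows which component actually changed.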
Handling NaNs in FP comparisons
Hi --

I'm working with a processor which, when a NaN is used as an operand in a compare, sets the condition bits in the same way as for a valid ordered compare. There is a flag bit which is set for a NaN compare, but it may also be set in a non-NaN compare.

float a = 1.0, b = 2.0, x = NaN;
(a < b) generates the same condition flags as (a < x).

The IEEE standard requires every comparison involving a NaN, other than !=, to be false (or to trap). Are there other processors which do this? How do they handle generating IEEE-compliant code?

A related problem is that CSE will optimize FP comparisons and garble the result. (This doesn't happen with soft-fp.)

int r = 0, s = 0;
float a = 1.0, x = NaN;

r = (a <= x);
s = (a > x);

should leave both r and s equal to 0. CSE translates this (more or less) into

r = (a <= x);
s = !r;

Is there a way to prevent CSE from optimizing FP comparisons?

-- 
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077
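The expected behaviour can be checked on a host with IEEE-conforming floating point. The minimal C sketch below contrasts the two comparisons as written with the folded form s = !r. The function names are hypothetical. It assumes the C99 `NAN` and `isunordered` macros from `<math.h>`, and a build without -ffast-math or -ffinite-math-only, since those flags let the compiler assume no NaNs.

```c
#include <assert.h>
#include <math.h>

/* Both comparisons evaluated as written. Under IEEE 754, a NaN operand
   makes <, <=, >, >= and == false; only != is true. */
static void compare_both(float a, float x, int *r, int *s)
{
    *r = (a <= x);
    *s = (a > x);
}

/* The folded form CSE produces in the report above. It is equivalent to
   compare_both only when the operands are ordered (!isunordered(a, x)). */
static void compare_both_folded(float a, float x, int *r, int *s)
{
    *r = (a <= x);
    *s = !*r;
}
```

With a = 1.0 and x = NaN, the first function leaves r and s both 0, while the folded form sets s to 1.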
Re: Handling NaNs in FP comparisons
On 09/29/2010 04:31 PM, Michael Eager wrote:
> float a = 1.0, b = 2.0, x = NaN;
> (a < b) generates the same condition flags as (a < x).
...
> Are there other processors which do this? How do they
> handle generating IEEE std compliant code?

It looks like there is a bunch of code under config that's conditionalized on flag_finite_math_only, which disables support for NaN and Inf. At a glance, rs6000_generate_compare may be relevant.

>
> A related problem is that CSE will optimize FP comparisons
> and garble the result. (This doesn't happen with soft-fp.)
>
> int r = 0, s = 0;
> float a = 1.0, x = NaN;
>
> r = (a <= x);
> s = (a > x);
>
> should result in r == s == 0. CSE translates this (more
> or less) into
>
> r = (a <= x);
> s = !r;
>
> Is there a way to prevent CSE from optimizing FP comparisons?

Add the missing check against HONOR_NANS. This is clearly a bug.

r~
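Here is why the HONOR_NANS check matters. Reversing a comparison equals negating it only when NaNs can be ignored. When they must be honored, the reverse of a <= b is "unordered or a > b" (UNGT in GCC's RTL naming), not a > b. This is the distinction GCC's reverse_condition and reverse_condition_maybe_unordered helpers draw. Below is a self-contained C model of that rule. The enum and both functions are hypothetical stand-ins, not GCC internals, and the honor_nans flag plays the role of HONOR_NANS for the operand mode.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical, simplified comparison codes modelled on RTL's LT/UNLT etc. */
enum cmp_code { CMP_LT, CMP_LE, CMP_GT, CMP_GE,
                CMP_UNLT, CMP_UNLE, CMP_UNGT, CMP_UNGE };

/* Return a code that is exactly the negation of `c`. When NaNs must be
   honored, negating an ordered comparison yields an unordered-or one. */
static enum cmp_code reverse_cmp(enum cmp_code c, bool honor_nans)
{
    switch (c) {
    case CMP_LT:   return honor_nans ? CMP_UNGE : CMP_GE;
    case CMP_LE:   return honor_nans ? CMP_UNGT : CMP_GT;
    case CMP_GT:   return honor_nans ? CMP_UNLE : CMP_LE;
    case CMP_GE:   return honor_nans ? CMP_UNLT : CMP_LT;
    case CMP_UNLT: return CMP_GE;   /* !(unord || a < b) == ordered && a >= b */
    case CMP_UNLE: return CMP_GT;
    case CMP_UNGT: return CMP_LE;
    case CMP_UNGE: return CMP_LT;
    }
    return c;   /* not reached */
}

/* Evaluate a comparison code with IEEE semantics. */
static bool eval_cmp(enum cmp_code c, float a, float b)
{
    bool un = isunordered(a, b);
    switch (c) {
    case CMP_LT:   return a < b;
    case CMP_LE:   return a <= b;
    case CMP_GT:   return a > b;
    case CMP_GE:   return a >= b;
    case CMP_UNLT: return un || a < b;
    case CMP_UNLE: return un || a <= b;
    case CMP_UNGT: return un || a > b;
    case CMP_UNGE: return un || a >= b;
    }
    return false;   /* not reached */
}
```

With honor_nans false, the model reproduces the reported bug. For a = 1 and x = NaN it turns !(a <= x), which is true, into a > x, which is false.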