little endian code on sparc v8
We are experimenting with a custom SPARC-based core for embedded applications. The big headache I am facing these days is that "-mcpu=v8" (gcc for SPARC V8) does not support little-endian code generation. After searching the web, it seems the gcc "sparclet" target supported a V8 engine, but it appears to have since been removed(?). Is there a way to generate little-endian code with -mcpu=v8? gcc has a "-mlittle-endian" option for little-endian support, but the manual also says "-mlittle-endian" is not supported on SPARC. Does anyone know of patches or another way to get little-endian code for a SPARC core? As I am working on gcc these days, I would also appreciate it if someone could tell me how to join the group of gcc developers working on custom SPARC V8 cores.

Regards,
Youngsu
Re: Debugging C++ Function Calls
On Mon, Mar 25, 2013 at 7:20 PM, Tom Tromey wrote:
>> "Lawrence" == Lawrence Crowl writes:
>
> Lawrence> Hm.  I haven't thought about this deeply, but I think SFINAE
> Lawrence> may not be an issue, because it serves to remove candidates
> Lawrence> from potential instantiation, and gdb won't be instantiating.
> Lawrence> The critical distinction is that I'm not trying to call arbitrary
> Lawrence> expressions (which would have a SFINAE problem) but call expressions
> Lawrence> that already appear in the source.
>
> Thanks.  I will think about it.
>
> Lawrence> I agree that the best long-term solution is an integrated compiler,
> Lawrence> interpreter, and debugger.  That's not likely to happen soon. :-)
>
> Sergio is re-opening our look into reusing GCC.
> Keith Seitz wrote a GCC plugin to try to let us farm out
> expression-parsing to the compiler.  This has various issues, some
> because gdb allows various C++ extensions that are useful when
> debugging; and also g++ was too slow.

Did you consider using clang?
Re: rfc: another switch optimization idea
On Mon, Mar 25, 2013 at 10:23 PM, Dinar Temirbulatov wrote:
> Hi,
> We noticed some performance gains if we avoid jump tables for some
> simple switch statements. Here is the idea: check whether the switch
> statement can be expanded with conditional instructions. In that case
> jump tables should be avoided, since some branch instructions can be
> eliminated in further passes (replaced by conditional execution).
>
> For example:
>
>     switch (i)
>     {
>       case 1: sum += 1;
>       case 2: sum += 3;
>       case 3: sum += 5;
>       case 4: sum += 10;
>     }
>
> Using jump tables the following code will be generated (ARM assembly):
>
>     ldrcc pc, [pc, r0, lsl #2]
>     b .L5
>     .L0:
>       .word .L1
>       .word .L2
>       .word .L3
>       .word .L4
>     .L1:
>       add r3, #1
>     .L2:
>       add r3, #4
>     .L3:
>       add r3, #5
>     .L4:
>       add r3, #10
>     .L5:
>
> Although this code has constant complexity, it can be improved by
> conditional execution to avoid the implicit branching:
>
>     cmp r0, #1
>     addeq r3, #1
>     cmp r0, #2
>     addeq r3, #4
>     cmp r0, #3
>     addeq r3, #5
>     cmp r0, #4
>     addeq r3, #10
>
> Although this version requires more instructions to be executed, it
> does not disturb the CPU pipeline (since no branching is performed).
>
> The original version of the patch was developed by Alexey Kravets. I
> measured some performance improvements/regressions using the SPEC 2000
> int benchmark on Samsung's Exynos 5250. Here is the result:
>
> Before:
>                    Base      Base      Base     Peak      Peak      Peak
> Benchmarks         Ref Time  Run Time  Ratio    Ref Time  Run Time  Ratio
> 164.gzip           1400      287       487*     1400      288       485*
> 175.vpr            1400      376       373*     1400      374       374*
> 176.gcc            1100      121       912*     1100      118       933*
> 181.mcf            1800      242       743*     1800      251       718*
> 186.crafty         1000      159       628*     1000      165       608*
> 197.parser         1800      347       518*     1800      329       547*
> 252.eon            1300      960       135*     1300      960       135*
> 253.perlbmk        1800      214       842*     1800      212       848*
> 254.gap            1100      138       797*     1100      136       806*
> 255.vortex         1900      253       750*     1900      255       744*
> 256.bzip2          1500      237       632*     1500      230       653*
> 300.twolf          X                            X
> SPECint_base2000                       561
> SPECint2000                                                         563
>
> After:
> 164.gzip           1400      286       490*     1400      288       486*
> 175.vpr            1400      213       656*     1400      215       650*
> 176.gcc            1100      119       923*     1100      118       933*
> 181.mcf            1800      247       730*     1800      251       717*
> 186.crafty         1000      145       688*     1000      150       664*
> 197.parser         1800      296       608*     1800      275       654*
> 252.eon            X                            X
> 253.perlbmk        1800      206       872*     1800      211       853*
> 254.gap            1100      133       825*     1100      131       838*
> 255.vortex         1900      241       789*     1900      239       797*
> 256.bzip2          1500      235       638*     1500      226       663*
> 300.twolf          X                            X
>
> The error in 252.eon was due to an incorrect setup. Also "if (count >
> 3*PARAM_VALUE (PARAM_SWITCH_JUMP_TABLES_BB_OPS_LIMIT))" does not look
> correct, and probably it is better to move this code to an earlier
> stage just before gimple expand and keep a preferred expansion state
> (jump tables or not) for every switch statement to avoid dealing with
> the RTL altogether.

Moving switch "expansion" to GIMPLE is an idea that has been around for
quite some time. Basically you'd lower switches so that the remaining
switch statements map directly to jump tables only. Steven was working
on this a bit, and if I remember correctly 4.8 has some improvements
here in the switch-conversion pass.

Richard.

> thanks, Dinar.
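As a source-level illustration of the proposal (an editor's sketch, not
from the original mail; the original cases fall through, so explicit
breaks are added here to keep the two forms equivalent):

    /* The switch form, which expand may turn into a jump table ...  */
    int sum_switch (int i, int sum)
    {
      switch (i)
        {
        case 1: sum += 1; break;
        case 2: sum += 3; break;
        case 3: sum += 5; break;
        case 4: sum += 10; break;
        }
      return sum;
    }

    /* ... and the branch-free shape the patch prefers: each "if" maps
       to a cmp/addeq pair under ARM conditional execution, so no jumps
       are needed at all.  */
    int sum_ifchain (int i, int sum)
    {
      if (i == 1) sum += 1;
      if (i == 2) sum += 3;
      if (i == 3) sum += 5;
      if (i == 4) sum += 10;
      return sum;
    }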
Re: rfc: another switch optimization idea
On Tue, Mar 26, 2013 at 10:31 AM, Richard Biener wrote:
> On Mon, Mar 25, 2013 at 10:23 PM, Dinar Temirbulatov wrote:
>> The error in 252.eon was due to an incorrect setup. Also "if (count >
>> 3*PARAM_VALUE (PARAM_SWITCH_JUMP_TABLES_BB_OPS_LIMIT))" does not look
>> correct, and probably it is better to move this code to an earlier
>> stage just before gimple expand and keep a preferred expansion state
>> (jump tables or not) for every switch statement to avoid dealing with
>> the RTL altogether.
>
> Moving switch "expansion" to GIMPLE is an idea that has been around for
> quite some time. Basically you'd lower switches so that the remaining
> switch statements map directly to jump tables only. Steven was working
> on this a bit, and if I remember correctly 4.8 has some improvements
> here in the switch-conversion pass.

Right. I will move switch lowering to GIMPLE for GCC 4.9. Everything not
lowered will be expanded as a casesi or tablejump. New methods of
lowering switches can be added in the new GIMPLE lowering pass once
that's done.

Ciao!
Steven
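For readers unfamiliar with the switch-conversion pass mentioned above,
a rough source-level picture of what it already does (an editor's
sketch; GCC really does emit CSWTCH lookup tables for such switches,
though the exact generated shape varies):

    /* A dense switch that only produces constants ...  */
    int f (int i)
    {
      switch (i)
        {
        case 0: return 10;
        case 1: return 33;
        case 2: return 41;
        default: return 0;
        }
    }

    /* ... is converted into an array lookup, roughly equivalent to:  */
    static const int CSWTCH[3] = { 10, 33, 41 };

    int f_converted (int i)
    {
      return (unsigned) i < 3u ? CSWTCH[i] : 0;
    }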
Re: Compiler speed (vanilla vs. LTO, PGO and LTO+PGO)
> Yes, the binary size is 8-10% smaller. Unfortunately there are no
> performance improvements.
>
> LTO+PGO-disable-plugin:
> -rwxr-xr-x 1 markus markus 15025568 Mar 25 15:49 cc1
> -rwxr-xr-x 1 markus markus 16198584 Mar 25 15:49 cc1plus
> -rwxr-xr-x 1 markus markus 13907328 Mar 25 15:49 lto1
> -rwxr-xr-x 4 markus markus   492360 Mar 25 15:49 c++
> -rwxr-xr-x 1 markus markus   488240 Mar 25 15:49 cpp
> -rwxr-xr-x 3 markus markus   488216 Mar 25 15:49 gcc
>
> Firefox:
> LTO+PGO-disable-plugin: 4590.55s user 273.70s system 343% cpu 23:34.65 total
>
> kernel:
> LTO+PGO-disable-plugin:
> 344.11s user 23.59s system 322% cpu 1:54.08 total
> 340.94s user 23.65s system 326% cpu 1:51.56 total
> 339.66s user 23.41s system 329% cpu 1:50.09 total

Interesting. I was able to get faster LTO+PGO compile times than
non-LTO, non-PGO. However, I did testing only on compilation of
combine.c, so it is not very scientific.

There are some cases where FDO information is not streamed well. I will
post a patch for that later today. Perhaps it will make the situation a
bit better.

Honza
Re: Compiler speed (vanilla vs. LTO, PGO and LTO+PGO)
-----Original Message-----
Sent: Tuesday, 26 March 2013 at 12:13:26
From: "Jan Hubicka"
To: "Markus Trippelsdorf"
Subject: Re: Compiler speed (vanilla vs. LTO, PGO and LTO+PGO)

> > Yes, the binary size is 8-10% smaller. Unfortunately there are no
> > performance improvements.
> >
> > [...]
>
> Interesting. I was able to get faster LTO+PGO compile times than
> non-LTO, non-PGO. However, I did testing only on compilation of
> combine.c, so it is not very scientific.
>
> There are some cases where FDO information is not streamed well. I will
> post a patch for that later today. Perhaps it will make the situation a
> bit better.
>
> Honza

Thanks for all the input on the question "is it useful to compile gcc
4.8.0 with the LTO option?"

Best regards!
RE: Modeling predicate registers with more than one bit
Hi, sorry for the delayed reply; I just returned from paternity leave.

> Have you had a look at the SH backend? SH cores have a "T Bit"
> register, which functions as carry bit, over/underflow, comparison
> result and branch condition register. In the SH backend it's treated as
> a fixed SImode hard-reg (although BImode would suffice in this case, I
> guess).

I have looked at SH but didn't fully understand how it worked. Your
explanation made it clear.

> The predicate is for matching various forms of T bit negation patterns.
> Maybe you could try the same approach for your case.
> If your predicate register has multiple independent bit(field)s, you
> could try defining separate hard-regs for every bit(field).

It sounds like that could be what I want. I probably need not different
hard-regs but different pseudos (since I have different pseudo regs) at
different modes (since the register might be set differently depending
on the mode of the comparison). That seems to be the way to go.

Cheers,
Paulo Matos
RE: Modeling predicate registers with more than one bit
Hi, sorry for the delayed reply; I just returned from paternity leave.

> -----Original Message-----
> From: Hans-Peter Nilsson [mailto:h...@bitrange.com]
> Sent: 05 March 2013 01:45
> To: Paulo Matos
> Cc: gcc@gcc.gnu.org
> Subject: Re: Modeling predicate registers with more than one bit
>
> Except for CCmodes being dependent on source-modes, I'd sneak
> peeks at PowerPC.

What do you mean by source modes?

> > If not, is there any way to currently
> > (as of HEAD) model this in GCC?
>
> IIUC, this sounds simply like having multiple separate
> condition-code registers, just with a size-dependent CCmodes
> twist; for each type of comparison where there'd be a separate
> CCmode variant, you also need separate CCmodes for each source
> mode M, all separated in cbranchM4 and cstoreM4.

I am not sure CC_MODE can solve the problem, but I am not entirely
experienced with using different CC_MODEs. The first thing that comes to
mind is: how do you set the size of a CCmode? A predicate register in
our backend can be set as if it had different sizes. So even though the
register has 8 bits, it is possible to have just 1, 2, 4, or 8 bits set,
depending on whether a comparison is of mode BI, QI, SI or DI.

I might have to use proper registers like SH does (following Oleg's
suggestion).

Thanks,
Paulo Matos
cond_exec no-ops in RTL optimisations
Hi everyone,

While working with some splitters I noticed that the RTL optimisation
passes do not optimise away a no-op wrapped in a cond_exec. So for
example, if my splitter generates something like:

    (cond_exec (lt:SI (reg:CC CC_REGNUM) (const_int 0))
               (set (match_dup 1) (match_dup 2)))

and operands 1 and 2 are the same register (say r0), this persists
through all the optimisation passes and results on ARM in a redundant

    movlt r0, r0

I noticed that if I generate an unconditional SET it gets optimised away
in the cases when it's a no-op. I can work around this by introducing a
peephole2 that lifts the SET out of the cond_exec like so:

    (define_peephole2
      [(cond_exec (match_operator 0 "comparison_operator"
                    [(reg:CC CC_REGNUM) (const_int 0)])
                  (set (match_operand:SI 1 "register_operand" "")
                       (match_dup 1)))]
      ""
      [(set (match_dup 1) (match_dup 1))])

and the optimisers will catch it and remove it, but this seems like a
hack. What if it was a redundant ADD (with 0) or an AND (and r0, r0,
r0)? It doesn't seem right to add peepholes for each of those cases.

Is that something the RTL optimisers should be able to remove? Are there
any targets where a conditional no-op may not be removed?

Thanks,
Kyrill
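One possible generic direction (an editor's sketch, untested; the helper
itself does not exist in GCC): the existing no-op test set_noop_p from
rtlanal.c could be applied to the body of a cond_exec, since a store
that writes a register's own value back is a no-op whether or not the
condition holds:

    /* Hypothetical helper, sketched against 4.8-era internals: return
       true if INSN is a conditionally executed no-op move such as
       "movlt r0, r0".  COND_EXEC_CODE and set_noop_p are the real
       rtl.h / rtlanal.c interfaces; this function is only a sketch.  */
    static bool
    cond_exec_noop_move_p (rtx insn)
    {
      rtx body = PATTERN (insn);

      if (GET_CODE (body) != COND_EXEC)
        return false;

      rtx op = COND_EXEC_CODE (body);
      return GET_CODE (op) == SET && set_noop_p (op);
    }

A cleanup pass that already deletes unconditional no-op moves could use
such a check to delete the conditional ones too, covering the ADD-0 and
AND-self variants through set_noop_p rather than per-pattern peepholes.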
Widening multiplication limitations
I was playing with adding support for the various modes of widening
multiplies in my backend, and hit some restrictions in the expansion
code that I couldn't explain to myself. These restrictions only impact
the signed-by-unsigned version.

The first limitation was about the detection of widening multiplies when
one of the operands is a big constant of opposite signedness to the
other. It might very well be the case that nobody cared to add support
for that. I used the following simple patch to overcome it:

@@ -2059,16 +2059,30 @@ is_widening_mult_p (gimple stmt,

   if (*type1_out == NULL)
     {
-      if (*type2_out == NULL || !int_fits_type_p (*rhs1_out, *type2_out))
-        return false;
-      *type1_out = *type2_out;
+      if (*type2_out == NULL)
+        return false;
+      if (!int_fits_type_p (*rhs1_out, *type2_out)) {
+        tree other_type
+          = signed_or_unsigned_type_for (!TYPE_UNSIGNED (*type2_out),
+                                         *type2_out);
+        if (!int_fits_type_p (*rhs1_out, other_type))
+          return false;
+        *type1_out = other_type;
+      } else {
+        *type1_out = *type2_out;
+      }
     }

   if (*type2_out == NULL)
     {
-      if (!int_fits_type_p (*rhs2_out, *type1_out))
-        return false;
-      *type2_out = *type1_out;
+      if (!int_fits_type_p (*rhs2_out, *type1_out)) {
+        tree other_type
+          = signed_or_unsigned_type_for (!TYPE_UNSIGNED (*type1_out),
+                                         *type1_out);
+        if (!int_fits_type_p (*rhs2_out, other_type))
+          return false;
+        *type2_out = other_type;
+      } else {
+        *type2_out = *type1_out;
+      }
     }

Is that extension of the logic correct?

After having made that modification, and thus having the middle end
generate widening multiplies of this kind, I hit the second limitation
in expr.c:expand_expr_real_2:

      /* First, check if we have a multiplication of one signed and one
         unsigned operand.  */
      if (TREE_CODE (treeop1) != INTEGER_CST
          && (TYPE_UNSIGNED (TREE_TYPE (treeop0))
              != TYPE_UNSIGNED (TREE_TYPE (treeop1))))

Here, the code trying to expand a signed-by-unsigned widening multiply
explicitly checks that the operand isn't a constant. Why is that? I
removed that condition to try to find the failing cases, but the few
million random multiplies that I threw at it didn't fail in any visible
way.

One difficulty I found was that the widening multiplies are expressed
as e.g.:

    (mult (zero_extend (operand 1))
          (zero_extend (operand 2)))

and that simplify_rtx will ICE when trying to simplify a zero_extend of
a VOIDmode const_int. It forced me to carefully add different patterns
to handle the immediate versions of the operations. But that doesn't
seem like a good reason to limit the code expansion...

Can anyone explain this condition?

Many thanks,
Fred
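For concreteness, this is my reading of the source pattern the first
hunk enables (an editor's sketch, not from the original mail): a
multiply whose constant fits the unsigned counterpart of the other
operand's type but not the type itself.

    #include <stdint.h>

    /* 0x90000000 exceeds INT32_MAX but fits uint32_t.  With the patch
       above, is_widening_mult_p can classify this as a 32x32->64
       signed-by-unsigned widening multiply instead of rejecting it and
       falling back to a full 64-bit multiply.  */
    int64_t
    scale (int32_t x)
    {
      return (int64_t) x * 0x90000000u;
    }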
RE: Modeling predicate registers with more than one bit
On Tue, 26 Mar 2013, Paulo Matos wrote: > > -Original Message- > > From: Hans-Peter Nilsson [mailto:h...@bitrange.com] > > Sent: 05 March 2013 01:45 > > To: Paulo Matos > > Cc: gcc@gcc.gnu.org > > Subject: Re: Modeling predicate registers with more than one bit > > > > Except for CCmodes being dependent on source-modes, I'd sneak > > peeks at PowerPC. > > > > What do you mean by source modes? The SI and HI in subsi3 and subhi3. IIRC you said your ISA set CC-bits differently depending on the size of the operand. > I am not sure CC_MODE can solve the problem but I am not > entirely experienced with using different CC_MODEs, the first > thing that comes to mind is, how do you set the size of a > CCmode? Unfortunately undocumented, but UTSL, for example gcc/config/mips/mips-modes.def. If any register can be set to a "CC-value" then you don't need to set any specific set of registers aside. brgds, H-P
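For reference, the undocumented mechanism H-P points at looks roughly
like this: a hypothetical <target>-modes.def fragment defining one CC
mode per source-operand width, in the style of the existing
arm-modes.def and mips-modes.def files (the mode names below are
invented; CC_MODE itself is the real macro):

    /* Hypothetical <target>-modes.def fragment.  Each mode tags a
       comparison result produced from sources of a given width, so the
       cbranchM4/cstoreM4 expanders can select the matching mode.  */
    CC_MODE (CC_BI);  /* comparison of BImode sources: 1 bit set   */
    CC_MODE (CC_QI);  /* comparison of QImode sources: 2 bits set  */
    CC_MODE (CC_SI);  /* comparison of SImode sources: 4 bits set  */
    CC_MODE (CC_DI);  /* comparison of DImode sources: 8 bits set  */

The "size" of a CCmode is then a matter of convention: the backend's
SELECT_CC_MODE logic decides which mode each comparison produces, and
the branch/store patterns only accept the modes they can handle.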
expmed.c cost calculation limited to word size
While working on having divisions by constants optimized for my GCC
target, I realized that whatever *muldi3_highpart my backend provides,
it would never be used because of the bounds checks that expmed.c does
on the cost arrays. For example:

      choose_multiplier (abs_d, size, size - 1,
                         &mlr, &post_shift, &lgup);
      ml = (unsigned HOST_WIDE_INT) INTVAL (mlr);
      if (ml < (unsigned HOST_WIDE_INT) 1 << (size - 1))
        {
          rtx t1, t2, t3;

=>        if (post_shift >= BITS_PER_WORD
=>            || size - 1 >= BITS_PER_WORD)
            goto fail1;

          extra_cost = (shift_cost[speed][compute_mode][post_shift]
                        + shift_cost[speed][compute_mode][size - 1]
                        + add_cost[speed][compute_mode]);

According to the commit log where these checks were added, they only
serve to avoid overflowing the cost arrays below. Even though a backend
may be fully capable of DImode shifts and multiplies, they won't be
considered because of this check.

The cost arrays are filled up to MAX_BITS_PER_WORD, so as a temporary
workaround I have defined MAX_BITS_PER_WORD to 64 and softened the
checks to fail only above MAX_BITS_PER_WORD. This allows my 32-bit
backend to specify that it wants these optimizations to take place for
64-bit arithmetic.

What do people think about this approach? Does it make sense?

Many thanks,
Fred
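A concrete illustration of the transformation being blocked here (an
editor's sketch, not part of the original mail): division by a constant
becomes a multiply-highpart plus shift. For 32-bit operands the trick
can be written and checked directly in C; the case in the mail is the
same thing one word wider, which is why muldi3_highpart and the DImode
cost entries matter.

    #include <assert.h>
    #include <stdint.h>

    /* x / 5 computed without a divide: multiply by 0xCCCCCCCD
       (= ceil(2^34 / 5)) and keep bits [63:34] of the 64-bit product.
       This identity is exact for every uint32_t x (the constant and
       shift are the same ones GCC itself selects for unsigned division
       by 5 on 32-bit targets).  */
    static uint32_t
    div5 (uint32_t x)
    {
      return (uint32_t) (((uint64_t) x * 0xCCCCCCCDull) >> 34);
    }

    int
    main (void)
    {
      for (uint64_t x = 0; x <= UINT32_MAX; x += 9973) /* sampled check */
        assert (div5 ((uint32_t) x) == (uint32_t) x / 5);
      assert (div5 (UINT32_MAX) == UINT32_MAX / 5);
      return 0;
    }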
Git Issue
Hello everyone,

I am trying to clone a git repository and I am getting the following
error. Can someone please tell me what this error could be and how I
could fix it? It worked for me a while back, but not now.

I tried the following command:

git clone http://gcc.gnu.org/git/gcc.git gcc.git

This is the output I got:

Initialized empty Git repository in /export/users/gcc.git/.git/
error: The requested URL returned error: 503 (curl_result = 22, http_code = 503, sha1 = 948da40c5008d4d7c2f8020f4e0ebddfc144350d)
error: Unable to find 52979cfc138789a17ec692b59dd239566b055e94 under http://gcc.gnu.org/git/gcc.git
Cannot obtain needed tree 52979cfc138789a17ec692b59dd239566b055e94
while processing commit caacf4a76941db5eb611e9350da17f05447316a6.
error: Fetch failed.

Thanks,
Balaji V. Iyer.
Re: Git Issue
On 2013.03.26 at 18:28 +, Iyer, Balaji V wrote:
> Hello everyone,
> I am trying to clone a git repository and I am getting the following
> error. Can someone please tell me what this error could be and how I
> could fix it? It worked for me a while back, but not now.
>
> I tried the following command:
> git clone http://gcc.gnu.org/git/gcc.git gcc.git
>
> This is the output I got:
>
> Initialized empty Git repository in /export/users/gcc.git/.git/
> error: The requested URL returned error: 503 (curl_result = 22, http_code = 503, sha1 = 948da40c5008d4d7c2f8020f4e0ebddfc144350d)
> error: Unable to find 52979cfc138789a17ec692b59dd239566b055e94 under http://gcc.gnu.org/git/gcc.git
> Cannot obtain needed tree 52979cfc138789a17ec692b59dd239566b055e94
> while processing commit caacf4a76941db5eb611e9350da17f05447316a6.
> error: Fetch failed.

Please try the git protocol instead of http:

git clone git://gcc.gnu.org/git/gcc.git gcc.git

--
Markus
Re: Debugging C++ Function Calls
Richard> Did you consider using clang?

We may look at it after re-examining g++.
I think there are some reasons to prefer gcc.

Tom
Re: Debugging C++ Function Calls
On Tue, Mar 26, 2013 at 3:02 PM, Tom Tromey wrote:
> Richard> Did you consider using clang?
>
> We may look at it after re-examining g++.
> I think there are some reasons to prefer gcc.

Yes, obviously :-)

-- Gaby
RE: Git Issue
> -----Original Message-----
> From: Markus Trippelsdorf [mailto:mar...@trippelsdorf.de]
> Sent: Tuesday, March 26, 2013 3:16 PM
> To: Iyer, Balaji V
> Cc: 'gcc@gcc.gnu.org'; Jason Merrill (ja...@redhat.com)
> Subject: Re: Git Issue
>
> On 2013.03.26 at 18:28 +, Iyer, Balaji V wrote:
> > Hello everyone,
> > I am trying to clone a git repository and I am getting the following
> > error. Can someone please tell me what this error could be and how I
> > could fix it? It worked for me a while back, but not now.
> >
> > I tried the following command:
> > git clone http://gcc.gnu.org/git/gcc.git gcc.git
> >
> > This is the output I got:
> >
> > Initialized empty Git repository in /export/users/gcc.git/.git/
> > error: The requested URL returned error: 503 (curl_result = 22,
> > http_code = 503, sha1 = 948da40c5008d4d7c2f8020f4e0ebddfc144350d)
> > error: Unable to find 52979cfc138789a17ec692b59dd239566b055e94 under
> > http://gcc.gnu.org/git/gcc.git Cannot obtain needed tree
> > 52979cfc138789a17ec692b59dd239566b055e94
> > while processing commit caacf4a76941db5eb611e9350da17f05447316a6.
> > error: Fetch failed.
>
> Please try the git protocol instead of http:
> git clone git://gcc.gnu.org/git/gcc.git gcc.git

Thanks for your help Markus. Unfortunately, http is the only option for me.

Thanks,
Balaji V. Iyer.

>
> --
> Markus
Re: Debugging C++ Function Calls
On 3/25/13, Tom Tromey wrote:
> I think the intro text of this message provides the best summary
> of the approach:
>
> http://sourceware.org/ml/gdb-patches/2010-07/msg00284.html

Are the symbol searches specific to the scope context, or do they search
all globally defined symbols? If you recreate the name lookup as the
compiler did, I think the approach will be workable. Otherwise, there is
a potential for doing overload resolution and getting a different
result.

I don't think the template names directly make much difference here.
There is a weakness in the patch, in that the following is legal.

template<class T> T func(T arg) { return arg + 0; }
template<> int func(int arg) { return arg + 1; }
int func(int arg) { return arg + 2; }
int main() { return func(0); }

The language prefers to call a non-template function over a template
instance with the same parameter types. So, in your new search, you
could have two functions with the same name and same parameter types.
You will need to keep a bit on the template-derived version so that you
can break the tie.

--
Lawrence Crowl
Re: little endian code on sparc v8
> The big headache that I am facing these days is that "-mcpu=v8"
> (gcc-sparc-v8) does not support little-endian. After web searching, it
> seems that gcc-sparclet supported a V8 engine but it seems that it's
> now deleted(?).

I'm not sure that SPARClet was V8. In any case, the SPARC V8
architecture is big-endian only according to the manual. You need SPARC
V9 to have support for little-endian.

> Is there a way to generate little-endian code for -mcpu=v8? There is
> the gcc option "-mlittle-endian" to support little-endian in gcc-sparc,
> but the manual also says "-mlittle-endian" is not supported on sparc.

No, there is no way.

> Is there someone who knows of patches or a way to do little-endian for
> a sparc core?

Not that I know of.

> As I am working on gcc these days, it would be appreciated if someone
> tells me how I can join the gcc developers group for sparc v8's custom
> core.

There is no formal group, just individuals on the GCC lists.

--
Eric Botcazou
Re: Git Issue
"Iyer, Balaji V" writes: >> > I tried the following command: >> > git clone http://gcc.gnu.org/git/gcc.git gcc.git >> >> Please try the git protocol instead of http: >> git clone git://gcc.gnu.org/git/gcc.git gcc.git > > Thanks for your help Markus. Unfortunately, http is the only option for me. Does gcc.gnu.org not support the modern git "smart http" transport? "git:" and "http:" should be near identical these days (except that http will get through many more firewalls and proxies). http://git-scm.com/2010/03/04/smart-http.html -miles -- 永日の 澄んだ紺から 永遠へ
Problem in understanding points-to analysis
Hello everyone,

I am trying to understand the points-to analysis ("pta") IPA pass, but I
am not able to match the information generated by the pass with that in
the structure SSA_NAME_PTR_INFO.

For the code segment:

--
int var1, var2, var3, var4, *ptr1, *ptr2, **ptr3;

if (var1 == 10) {
    ptr1 = &var1;
    ptr2 = &var2;
} else {
    ptr1 = &var3;
    ptr2 = &var4;
}

if (var2 == 3) {
    ptr3 = &ptr1;
} else {
    ptr3 = &ptr2;
}

printf("\n %d %d \n", *ptr1, **ptr3);
--

the points-to information in the dump_file of the "pta" pass is:

ptr1.2_6 = { var1 var3 }
ptr1 = { var1 var3 } same as ptr1.2_6

But accessing the structure SSA_NAME_PTR_INFO (using the API
dump_points_to_info_for(..)) in a pass AFTER "pta" shows:

ptr1.2_6, points-to vars: { var1 var3 }
ptr1, points-to anything

Why is 'ptr1' here not pointing to '{ var1 var3 }' as found by "pta"?
Can someone please help me understand this behaviour?

--
Thanks,
Nikhil Patil.
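For readers following along, a minimal sketch of the query being
described (an editor's illustration against 4.8-era internals;
SSA_NAME_PTR_INFO, struct ptr_info_def and dump_points_to_info_for are
real interfaces, but the surrounding function is schematic):

    /* Inside a GIMPLE pass running after "pta": inspect the alias
       information attached to an SSA name.  Note that ptr_info_def is
       attached to SSA names only; the underlying VAR_DECL carries no
       such record, which is one reason per-name and per-decl dumps
       can look different.  */
    static void
    show_pt_info (tree name)
    {
      struct ptr_info_def *pi = SSA_NAME_PTR_INFO (name);

      if (!pi)
        return;

      if (pi->pt.anything)
        fprintf (dump_file, "points-to anything\n");
      else
        dump_points_to_info_for (dump_file, name);
    }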