RE: selective linking of floating point support for *printf / *scanf
> From: Joern Rennecke [mailto:joern.renne...@embecosm.com]
> Sent: Tuesday, August 26, 2014 6:44 PM
>
> Due to the library order defined in the specs, the float-enabled printf
> definition will be picked up from libprintf_flt.a .

It seems to me that this relies heavily on how symbol resolution works. If I understand correctly, all undefined symbols in object files (for instance __int_printf) are processed first. Symbol definitions are searched for in the order in which libraries are given on the command line (this part seems pretty reliable since it's at least documented in ld's manual). While doing so, if a symbol definition references an undefined symbol (like __int_printf referencing printf), that reference is set aside until all undefined symbols from object files have been processed. At some point the printf reference from an object file will be processed and will pull in the printf with float support, since it's the first one encountered. Then the undefined references discovered when pulling symbols from libraries will be processed, and since the printf with float support was already pulled in, that's the one being used.

Is this behavior the same for all linkers? It sounds like a reasonable algorithm, but I don't know the variety of linkers out there well.

> That testcase is not valid. You'd have to use one of the v*printf functions.
> Solving the general problem for these is not computable; for specific cases,
> it would be possible, but at the cost of varying degrees of complexity.
> So I left this for manual selection: it's not handled with the
> calls.stdio_altname hook, and you have to use a special link line to use
> the integer-only implementations.
> Well, if desired, a spec change could give an option to do that.

Right, my bad, no problem indeed. What "general problem" are you referring to that is not solved with this patch?

> That can be implemented with suitable *newlib*.[ch] files that are
> selected in config.gcc, akin to newlib-stdint.h and glibc-c.c .

Absolutely, that was the approach I followed in my own patch.

> Well, all the *printf functions are variadic, and as stated above,
> your example is invalid.
> The wildcards are the va_list taking functions. You first have to decide
> what you want to happen with these by default, and what kind of non-default
> behaviour you'd like to be able to select, and how; then we can talk about
> whether this needs any extra infrastructure to implement.

Yes, my apologies, it was my mistake. I'll now do more thorough testing and report back to you how it works for us.

Best regards,

Thomas
Turning a single warning into an error in dejagnu test
I'm writing a dejagnu test and encounter this warning at one place:

  warning: passing argument 1 of '...' makes integer from pointer without a cast [enabled by default]

Now, I have a "{ dg-error ... }" comment in that line. The line is generated from a script among hundreds of others that are all expected to produce errors, not warnings. It would be very inconvenient (= lots of work) to change the script to make an exception just for that single line (because there's no easy way to identify lines that produce the warning instead of an error).

So the question is: Is it possible to turn only this one warning into an error inside a dejagnu test? As I understand it, there are no -W... switches for "enabled by default" options, and I cannot use -Werror because that would break other tests in the file.

Ciao

Dominik ^_^ ^_^

--
Dominik Vogt
IBM Germany
Re: selective linking of floating point support for *printf / *scanf
On 27 August 2014 08:02, Thomas Preud'homme wrote:
>> From: Joern Rennecke [mailto:joern.renne...@embecosm.com]
>> Sent: Tuesday, August 26, 2014 6:44 PM
>>
>> Due to the library order defined in the specs, the float-enabled printf
>> definition will be picked up from libprintf_flt.a .
>
> It seems to me that it relies heavily on how symbol resolution works.

I don't see how it can be any other way. We want to be able to compile translation units individually, and then let the linker sort out whether we need the floating point enabled implementation(s), and skip the integer-only ones if so.

> If I understand correctly all undefined symbols (for instance __int_printf)
> in object files are processed first. Symbol definitions are searched for in
> the order in which libraries are given on the command line (this part seems
> pretty reliable since it's at least documented in ld's manual). When
> doing so, if a symbol definition references an undefined symbol (like
> __int_printf referencing printf), it is left aside until all undefined
> symbols from object files have been processed. At some point printf
> from an object file will be processed and will pull the printf with float
> support since it's the first one encountered. Then the undefined
> references discovered when pulling symbols from libraries will be
> processed and since printf with float was already pulled in that's
> the one being used.
>
> Is this behavior the same for all linkers? It sounds like a reasonable
> algorithm but I don't know well the variety of linkers out there.

Well, the part about processing libraries in order is pretty much universal, although there are options to change that behaviour. I'd say you really have to know what you are doing when using these options.

Now, to make the __int_printf function entry line up with the printf implementation, I'm relying on GNU AS (gas) linker scripts. That part is unfortunately not so portable, so this trick has to be restricted to targets/configurations that use gas, or another linker (if any) that allows alphasorting the relevant sections.

>> That testcase is not valid. You'd have to use one of the v*printf functions.
>> Solving the general problem for these is not computable; for specific cases,
>> it would be possible, but at the cost of varying degrees of complexity.
>> So I left this for manual selection: it's not handled with the
>> calls.stdio_altname hook, and you have to use a special link line to use
>> the integer-only implementations.
>> Well, if desired, a spec change could give an option to do that.
>
> Right, my bad, no problem indeed. What "general problem" are you
> referring to that is not solved with this patch?

The general problem also includes trying to decide definitively whether we need a/any floating point enabled implementation(s) in cases with calls of va_list taking functions (which, while not always, usually also take the format as a variable), and with no non-va_list calls to decide the matter in favour of needing floating point. The question whether any floating-point indicating actual format string and/or(*) va_list arguments reach the v*printf / v*scanf calls is non-trivial and respects functions (in the computability theory sense); hence, this is not computable according to Rice's theorem.

(*) Any way you language lawyer it, you can only chip away at the set of programs you can compute the answer for, but can never do it for the whole set.
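To make the va_list case above concrete, here is a minimal sketch in C. The names log_to_buffer and pick_format are hypothetical and not part of any patch in this thread; the point is only that the format string is a run-time value and the arguments travel in a va_list, so nothing at the vsnprintf call site says whether floating point conversions will be needed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical logging wrapper.  The format arrives as a variable and the
   arguments arrive as a va_list, so the call to vsnprintf below gives the
   compiler no hint about which conversions will be used.  */
int
log_to_buffer (char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  int n;

  assert (buf != NULL && fmt != NULL);
  va_start (ap, fmt);
  n = vsnprintf (buf, size, fmt, ap);
  va_end (ap);
  return n;
}

/* Hypothetical helper: the format is only chosen at run time.  */
const char *
pick_format (int want_float)
{
  return want_float ? "%.1f" : "%d";
}
```

A caller might write log_to_buffer (buf, sizeof buf, pick_format (flag), value) in one translation unit while flag is computed in another; deciding at compile time whether "%.1f" can ever reach vsnprintf is the kind of question the message above describes as not computable in general.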
Re: Turning a single warning into an error in dejagnu test
On Wed, Aug 27, 2014 at 10:59:40AM +0100, Dominik Vogt wrote:
> I'm writing a dejagnu test and encounter this warning at one place:
>
>   warning: passing argument 1 of '...' makes integer from pointer
>   without a cast [enabled by default]
>
> Now, I have a "{ dg-error ... }" comment in that line. The line
> is generated from a script among hundreds of others that are all
> expected to produce errors, not warnings. It would be very
> inconvenient (= lots of work) to change the script to make an
> exception just for that single line (because there's no easy way
> to identify lines that produce the warning instead of an error).
>
> So the question is: Is it possible to turn only this one warning
> into an error inside a dejagnu test? As I understand it, there
> are no -W... switches for "enabled by default" options, and I
> cannot use -Werror because that would break other tests in the
> file.

For C, I recently added the -Wint-conversion option, so with a recent enough GCC you should be able to use -Werror=int-conversion.

Marek
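As a rough illustration of the suggestion above, a generated test file could look something like the sketch below. The function names are made up, and the last directive is an assumption: depending on the testsuite version, the "some warnings being treated as errors" note may either need its own dg-message line or already be pruned, in which case that line has to go.

```c
/* { dg-do compile } */
/* { dg-options "-Werror=int-conversion" } */

/* Hypothetical declarations, standing in for the generated test lines.  */
void takes_int (int);

void
caller (char *p)
{
  takes_int (p); /* { dg-error "makes integer from pointer without a cast" } */
}

/* Assumption: needed only if the harness does not prune this note.  */
/* { dg-message "some warnings being treated as errors" "" { target *-*-* } 0 } */
```

Only the dg-options line changes for the whole file, so the generating script would not need a per-line exception.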
RE: selective linking of floating point support for *printf / *scanf
> From: Joern Rennecke [mailto:joern.renne...@embecosm.com]
> Sent: Wednesday, August 27, 2014 6:13 PM
>
> I don't see how it can be any other way. We want to be able to compile
> translation units individually, and then let the linker sort out if we need
> the floating point enabled implementation(s), and skip the integer-only
> ones if so.

Consider the new scheme in newlib, where printf calls another function for handling floating point formats. This other function is weakly defined so that it's not pulled in by default and printf is effectively integer-only. You just need to link with an extra -u option to pull in the float support.

> Well, the part of processing libraries in order is pretty much
> universal, although there are options to change that behaviour. I'd say
> you really have to know what you are doing when using these options.
> Now, to make the __int_printf function entry line up with the printf
> implementation, I'm relying on GNU AS (gas) linker scripts. That part is
> unfortunately not so portable, so this trick has to be restricted to
> targets/configurations that use gas, or another linker (if any) that
> allows to alphasort the relevant sections.

Yes, I don't see the order of libraries as a problem for portability. I was concerned about the following possible algorithm: __int_printf is processed first and is found in libc. The linker sees that __int_printf needs printf, searches for printf according to the library order, and so finds it in the next section. This printf doesn't provide float support. Then the linker proceeds to process the next undefined symbol in the object file, which is printf, and uses the one already found. I concede that such an algorithm looks more convoluted, as it implies some form of recursion instead of just having a queue where you put the undefined symbols.

Indeed I missed the linker script, which is the most obvious problem.

> The general problem also includes trying to decide definitely if we
> need a/any floating point enabled implementation(s) in cases with calls
> of va_list taking functions, (which, while not always, but usually also
> take the format as a variable), and have no non-va_list calls to decide
> the matter in favour of needing floating point.
> The question if any floating-point indicating actual format string
> and/or(*) va_list arguments reach the v*printf / v*scanf calls is
> non-trivial and respects functions (in the computability theory sense),
> hence, this is not computable according to Rice's theorem.
>
> (*) Any way you language lawyer it, you can only chip away at the set
> of programs you can compute the answer for, but can never do it for the
> whole set.

Ok. Of course detecting more cases where an integer version of the IO functions would be enough would be nice, but I'm already satisfied with the current scheme. I'm wondering what happens for v*printf: are they only defined in the libc_float?

Would you accept a patch that would turn this solution into something also suitable for newlib? For instance, we would also need to include the v*printf and v*scanf functions as builtins. A new switch would also be needed so that compiling newlib doesn't define the _printf_float and _scanf_float symbols because of calls to v*printf and v*scanf functions. I need to check whether these calls are made in the same file, in which case I could maybe just guard the function call rewriting with a test checking whether the caller is itself a builtin.

Best regards,

Thomas
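The weak-symbol scheme described at the top of this message can be sketched roughly as follows. This is not newlib's code: demo_format_float and demo_print_double are hypothetical names, and the sketch assumes GCC or Clang on an ELF target, where the address of an undefined weak function is null.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical float handler, declared weak.  If no object file or
   archive member defining it is linked in, its address is null.  */
extern int demo_format_float (char *buf, size_t size, double value)
  __attribute__ ((weak));

/* Format VALUE into BUF if float support was linked in; otherwise store
   "?" and return 0, which is the effectively integer-only behaviour.  */
int
demo_print_double (char *buf, size_t size, double value)
{
  assert (buf != NULL && size >= 2);
  if (demo_format_float != NULL)
    return demo_format_float (buf, size, value);
  buf[0] = '?';
  buf[1] = '\0';
  return 0;
}
```

If a library member defined demo_format_float, linking with "-u demo_format_float" would create an undefined reference strong enough to pull that member in, which corresponds to the manual selection step mentioned above.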
Re: selective linking of floating point support for *printf / *scanf
On 27 August 2014 11:41, Thomas Preud'homme wrote:
>> From: Joern Rennecke [mailto:joern.renne...@embecosm.com]
>> Sent: Wednesday, August 27, 2014 6:13 PM
>>
>> I don't see how it can be any other way. We want to be able to compile
>> translation units individually, and then let the linker sort out if we
>> need the floating point enabled implementation(s), and skip the
>> integer-only ones if so.
>
> Consider the new scheme in newlib when printf calls another function for
> handling floating point formats. This other function is weakly defined so
> that it's not pulled by default and printf is effectively integer only. You
> just need to link with an extra -u option to pull in the float support.

Well, my goal was to have the selection be automatic for most use cases. That you can do a manual selection by providing -u / -l arguments to the linker is pretty much a given.

Hmm, instead of needing -u you could make gcc spit out definitions of a dummy local symbol to the trigger symbol in question (forcing a non-weak reference), using SET_ASM_OP (assuming it's defined). But you'd still be left with the extra call overhead, increasing code size no matter whether float is needed or not.

>> I'm relying on GNU AS (gas) linker scripts. That part is
>> unfortunately not so portable,

Oops, of course that should read GNU LD.

> Ok. Of course detecting more cases where an integer version of IO functions
> would be enough would be nice but I'm already satisfied with the current
> scheme. I'm wondering what's happening for v*printf: are they only defined
> in the libc_float?

It's defined in both. The way I wrote the avr gcc specs / avr-libc makefile rules, this will result in the floating point enabled implementation being used by default. Which makes the gcc test results so much nicer...

> Would you accept a patch that would turn this solution into something also
> suitable for newlib? For instance we would need to also include v*printf
> and v*scanf functions into builtin as well.

Yes. I'll have to adjust the avr hook so that it'll leave the v*printf / v*scanf functions alone - at least by default / for ISO C behaviour - but it'll give me an easy way to add a switch to tweak the behaviour.

Or maybe we can use a -f option to select the v*printf / v*scanf default and put a stdio_altname__int_ target hook in targhooks.c, to be shared by all configs that want an __int_ prefix.

> A new switch would also be
> needed so that compiling newlib doesn't define the _printf_float and
> _scanf_float symbols because of calls to v*printf and v*scanf functions.
> I need to check if these calls are made in the same file in which case
> I could maybe just guard the function call rewriting by a test checking
> if the caller is itself a builtin.

FWIW, to safely shift the symbol into the implementation namespace you need a prefix that starts with two underbars or one underbar and a capital letter. Or use some funny non-standard character in the symbol - but that's asking for more portability issues.

For references made automatically by gcc, it's a good idea not to impinge on the application namespace. An application might use printf from <stdio.h>, but define its own functions iprintf, printf_float and _printf_float.

Therefore, it's a good idea to put the definition of newlib's iprintf in a separate file from __int_printf, having essentially the same contents but defining a different symbol, and let the linker match them up to the definition.
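The prefix rule stated above (two underbars, or one underbar and a capital letter) is easy to check mechanically. The function below is a small illustrative sketch with a made-up name, not something from gcc or newlib.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if NAME starts with a prefix that is reserved for the
   implementation in every context: two underscores, or one underscore
   followed by a capital letter.  */
bool
in_implementation_namespace (const char *name)
{
  assert (name != NULL);
  if (name[0] != '_')
    return false;
  return name[1] == '_' || isupper ((unsigned char) name[1]);
}
```

By this rule __int_printf passes, while iprintf, printf_float and _printf_float do not, which matches the examples given in the message.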
Possible LRA issue?
Hi,

I have a large codebase where at some point there's a structure that takes an unsigned integer template argument and uses it as the size of an array, something like

template struct Struct
{
    typedef std::array Chunk;
    typedef std::list Content;

    Content c;
};

Changing the value of S significantly alters the compile time and memory that the compiler takes. We use some large numbers there. At some point, the compiler runs out of memory (xmalloc fails). I wondered why, and did some analysis by debugging GCC 4.8.2 (same with 4.8.3), and did the following experiment with all the optimizations turned off (-fno-* and -O0):

I generated a report of xmalloc usage of two programs: one having S=10u, and another with S=11u, just to see the difference of 1. The report was generated as follows: I set a breakpoint at xmalloc, appending a bt to a file. Then I found common stack traces and counted how many times xmalloc was called in each version of the program (S=10u and S=11u as mentioned above).

The differences were:

a) Stack trace:
   xmalloc | pool_alloc | create_live_range | mark_pseudo_live |
   mark_regno_live | process_bb_lives | lra_create_live_ranges | lra |
   do_reload | rest_of_handle_reload | execute_one_pass |
   execute_pass_list | execute_pass_list | expand_function |
   output_in_order | compile | finalize_compilation_unit |
   cp_write_global_declarations | compile_file | do_compile |
   toplev_main | __libc_start_main | _start |

   S=10u: 15 times
   S=11u: 16 times

b) Stack trace:
   xmalloc | lra_set_insn_recog_data | lra_get_insn_recog_data |
   lra_update_insn_regno_info | lra_update_insn_regno_info |
   lra_push_insn_1 | lra_push_insn | push_insns | lra_process_new_insns |
   curr_insn_transform | lra_constraints | lra | do_reload |
   rest_of_handle_reload | execute_one_pass | execute_pass_list |
   execute_pass_list | expand_function | output_in_order | compile |
   finalize_compilation_unit | cp_write_global_declarations |
   compile_file | do_compile | toplev_main | __libc_start_main | _start |

   S=10u: 186 times
   S=11u: 192 times

c) Stack trace:
   xmalloc | df_install_refs | df_refs_add_to_chains | df_insn_rescan |
   emit_insn_after_1 | emit_pattern_after_noloc |
   emit_pattern_after_setloc | emit_insn_after_setloc | try_split |
   split_insn | split_all_insns | rest_of_handle_split_after_reload |
   execute_one_pass | execute_pass_list | execute_pass_list |
   execute_pass_list | expand_function | output_in_order | compile |
   finalize_compilation_unit | cp_write_global_declarations |
   compile_file | do_compile | toplev_main | __libc_start_main | _start |

   S=10u: 617 times
   S=11u: 619 times

d) Stack trace:
   xmalloc | df_install_refs | df_refs_add_to_chains | df_bb_refs_record |
   df_scan_blocks | rest_of_handle_df_initialize | execute_one_pass |
   execute_pass_list | execute_pass_list | expand_function |
   output_in_order | compile | finalize_compilation_unit |
   cp_write_global_declarations | compile_file | do_compile |
   toplev_main | __libc_start_main | _start |

   S=10u: 13223 times
   S=11u: 13227 times

e) Stack trace:
   xmalloc | __GI__obstack_newchunk | bitmap_element_allocate |
   bitmap_set_bit | update_lives | assign_hard_regno | assign_by_spills |
   lra_assign | lra | do_reload | rest_of_handle_reload |
   execute_one_pass | execute_pass_list | execute_pass_list |
   expand_function | output_in_order | compile |
   finalize_compilation_unit | cp_write_global_declarations |
   compile_file | do_compile | toplev_main | __libc_start_main | _start |

   S=10u: 0 times (never!)
   S=11u: 1

Unfortunately I can't disclose the source code, nor do I have the time to isolate a piece of code reproducing the issue. Some comments about the code: I don't do template metaprogramming depending on S, but I do some for-range loops on the Content.

I can extend the analysis to S=12 and compare with the previous values. I thought about fixing this myself but lack the time and the background on these optimizations. Any hint? I'm open to doing more experiments if anybody asks me, or to posting -fdumps.

I suspect that this issue could be worked around by playing with ggc-min-heapsize and similar values, but I'd like to know why just changing the size of an array has such a consequence.

Thanks!

Daniel.

--
Daniel F. Gutson
Chief Engineering Officer, SPD
San Lorenzo 47, 3rd Floor, Office 5
Córdoba, Argentina
Phone: +54 351 4217888 / +54 351 4218211
Skype: dgutson
RE: Possible LRA issue?
The xmalloc calls at the points you list in the register allocator are not caused only by the structure itself and by changing the value of S passed as the template argument. They depend on how the structure is referenced or used. From the stack traces I can see that the creation of live ranges depends on how the structure is referenced and used.

Thanks & Regards
Ajit
Re: Possible LRA issue?
On Wed, Aug 27, 2014 at 12:16 PM, Ajit Kumar Agarwal wrote:
> The cause of xmalloc occurring at times given below in Register Allocator
> will not be caused only by the structure and changing the passed S as
> template argument.
> It depends on how the below structures is referenced or used. From the stack
> trace I can see the live ranges creation is based on how the below structure
> is referenced and Used.

Could you please show me an example of such different usages and references?
Register allocation: caller-save vs spilling
Hi,

I'm investigating various register allocation inefficiencies. The first thing that stands out is that GCC supports both caller-saves and spilling. Spilling seems to spill all definitions and all uses of a liverange. This means you often end up with multiple reloads close together, while it would be more efficient to do a single load and then reuse the loaded value several times. Caller-save does better in that case, but it is inefficient in that it repeatedly stores registers across every call even if unchanged. If both were fixed to minimise the number of loads/stores I can't see how one could beat the other, so you'd no longer need both.

Anyway, due to the current implementation there are clearly cases where caller-save is best and cases where spilling is best. However, I do not see it making the correct decision despite trying to account for the costs - some code is significantly faster with -fno-caller-saves, other code wins with -fcaller-saves. As an example, I see code like this on AArch64:

        ldr     s4, .LC20
        fmul    s0, s0, s4
        str     s4, [x29, 104]
        bl      f
        ldr     s4, [x29, 104]
        fmul    s0, s0, s4

With -fno-caller-saves it spills and rematerializes the constant as you'd expect:

        ldr     s1, .LC20
        fmul    s0, s0, s1
        bl      f
        ldr     s5, .LC20
        fmul    s0, s0, s5

So given this, is the cost calculation correct and does it include rematerialization? The spill code understands how to rematerialize, so it should take this into account in the costs. I did find some code in ira-costs.c in scan_one_insn() that attempts something that looks like an adjustment for rematerialization, but it doesn't appear to handle all cases (simple immediates, 2-instruction immediates, address-constants, non-aliased loads such as literal pool and const data loads).

Also the hook CALLER_SAVE_PROFITABLE appears to have disappeared - overall performance improves significantly if I add this (basically the default heuristic used on instruction frequencies):

--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -2230,6 +2230,8 @@ ira_tune_allocno_costs (void)
                       * ALLOCNO_FREQ (a)
                       * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
 #endif
+             if (ALLOCNO_FREQ (a) < 4 * ALLOCNO_CALL_FREQ (a))
+               cost = INT_MAX;
            }
          if (INT_MAX - cost < reg_costs[j])
            reg_costs[j] = INT_MAX;

If such a simple heuristic can beat the costs, they can't be quite right. Is there anyone who understands the cost calculations?

Wilco
consistent naming of passes....
Hello all,

When I compile some file (precisely, gcc/melt-runtime.cc from the latest melt branch) with -O1 -fdump-passes (using GCC 4.9), I'm getting notably

   ipa-cp                  :  OFF
   ipa-cdtor               :  OFF
   ipa-inline              :  ON
   ipa-pure-const          :  ON
   ipa-static-var          :  ON
   ipa-pta                 :  OFF
   ipa-simdclone           :  OFF
   *free_cfg_annotations   :  ON

However, in file gcc/ipa-inline.c there is

const pass_data pass_data_ipa_inline =
{
  IPA_PASS, /* type */
  "inline", /* name */
  OPTGROUP_INLINE, /* optinfo_flags */
  false, /* has_gate */
  true, /* has_execute */
  TV_IPA_INLINING, /* tv_id */

I find it strange that the two names (the one given by -fdump-passes and the one in the pass_data_ipa_inline object) are different.

When I try to insert a plugin pass (actually in MELT, file gcc/melt/xtramelt-ana-simple.melt) after the pass named "inline", it gives:

cc1plus: fatal error: pass 'inline' not found but is referenced by new pass 'melt_justcountipa'

If I use "ipa-inline" I'm getting

cc1plus: fatal error: pass 'ipa-inline' not found but is referenced by new pass 'melt_justcountipa'

How should a plugin writer find the name of the reference pass to insert his own new pass? At the very least it should be documented, and preferably it should be identical to the output of -fdump-passes.

Regards.

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***
Re: Enable EBX for x86 in 32bits PIC code
On 2014-08-26 5:42 PM, Ilya Enkovich wrote:
> Hi,
>
> Here is a patch I tried. I apply it over revision 214215. Unfortunately I do not have a small reproducer but the problem can be easily reproduced on SPEC2000 benchmark 175.vpr. The problem is in read_arch.c:701 where float value is compared with float constant 1.0. It is inlined into read_arch function and can be easily found in RTL dump of function read_arch as a float comparison with 1.0 after the first call to strtod function.
>
> Here is a compilation string I use:
>
> gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math -mfpmath=sse -m32 -march=slm -fPIE -pie -c -o read_arch.o -DSPEC_CPU2000 read_arch.c
>
> In my final assembler comparison with 1.0 looks like:
>
> comiss .LC11@GOTOFF(%ebp), %xmm0 # 1101 *cmpisf_sse [length = 7]
>
> and %ebp here doesn't have a proper value.
>
> I'll try to make a smaller reproducer if these instructions don't help.

I've managed to reproduce it, although it would be better to send the patch as an attachment.

The problem is actually in IRA, not LRA. IRA splits the pseudo used for PIC. Then, in a region where a *new* pseudo is used for PIC, we rematerialize a constant, which is transformed into memory addressed through the *original* PIC pseudo.

To solve the problem we should prevent such splitting and guarantee that the PIC pseudo's allocnos in different regions get the same hard reg.

The following patch should solve the problem.

Index: ira-color.c
===
--- ira-color.c (revision 214576)
+++ ira-color.c (working copy)
@@ -3239,9 +3239,10 @@
          ira_assert (ALLOCNO_CLASS (subloop_allocno) == rclass);
          ira_assert (bitmap_bit_p (subloop_node->all_allocnos,
                                    ALLOCNO_NUM (subloop_allocno)));
-         if ((flag_ira_region == IRA_REGION_MIXED)
-             && (loop_tree_node->reg_pressure[pclass]
-                 <= ira_class_hard_regs_num[pclass]))
+         if ((flag_ira_region == IRA_REGION_MIXED
+              && (loop_tree_node->reg_pressure[pclass]
+                  <= ira_class_hard_regs_num[pclass]))
+             || regno == (int) REGNO (pic_offset_table_rtx))
            {
              if (! ALLOCNO_ASSIGNED_P (subloop_allocno))
                {
Index: ira-emit.c
===
--- ira-emit.c (revision 214576)
+++ ira-emit.c (working copy)
@@ -620,7 +620,8 @@
                  /* don't create copies because reload can spill an
                     allocno set by copy although the allocno will not
                     get memory slot.  */
-                 || ira_equiv_no_lvalue_p (regno)))
+                 || ira_equiv_no_lvalue_p (regno)
+                 || ALLOCNO_REGNO (allocno) == REGNO (pic_offset_table_rtx)))
            continue;
          original_reg = allocno_emit_reg (allocno);
          if (parent_allocno == NULL
Re: Enable EBX for x86 in 32bits PIC code
On 08/26/14 15:42, Ilya Enkovich wrote:
> diff --git a/gcc/calls.c b/gcc/calls.c
> index 4285ec1..85dae6b 100644
> --- a/gcc/calls.c
> +++ b/gcc/calls.c
> @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
>    call_expr_arg_iterator iter;
>    tree arg;
>
> +  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
> +    {
> +      gcc_assert (pic_offset_table_rtx);
> +      args[j].tree_value = make_tree (ptr_type_node,
> +                                      pic_offset_table_rtx);
> +      j--;
> +    }
> +
>    if (struct_value_addr_value)
>      {
>        args[j].tree_value = struct_value_addr_value;

So why do you need this? Can't this be handled in the call/call_value expanders, or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call?

Basically I'm not seeing the need for another target hook here. I think that would significantly simplify the patch as well.

Jeff
Re: Conditional negation elimination in tree-ssa-phiopt.c
On 08/18/14 04:33, Kyrill Tkachov wrote:
> On 18/08/14 10:19, Richard Earnshaw wrote:
> > On 14/08/14 09:45, Kyrill Tkachov wrote:
> > > On 13/08/14 18:32, Segher Boessenkool wrote:
> > > > On Wed, Aug 13, 2014 at 03:57:31PM +0100, Richard Earnshaw wrote:
> > > > > The problem with the frankenmonster patterns is that they tend to proliferate into the machine description, and before you know where you are the back-end is full of them.
> > > > >
> > > > > Furthermore, they are very sensitive to the greedy first-match nature of combine: a better, later, combination is missed because a less good, earlier, optimization matched. If the first insn in the sequence is merged into an earlier instruction then you can end up with a junk sequence that completely fails to simplify. That ends up with super-frankenmonster patterns to deal with all the subcases and the problems grow exponentially from there.
> > > >
> > > > Right. Of course, combine should be fixed, yadda yadda.
> > > >
> > > > > I really do think that the best solution would be to try and catch this during expand if possible and generate the right pattern from the start; then you don't risk combine failing to come to the rescue after several intermediate transformations have taken place.
> > > >
> > > > I think ssa-phiopt should simply not do this obfuscation at all. Without it, RTL ifcvt picks it up just fine on targets with conditional assignment instructions. I agree that on targets without, expand should do a better job (also for more generic conditional assignment).
> > >
> > > That particular transformation was added to tree-ssa-phiopt.c for PR 45685. The problem it was trying to solve was a missed vectorisation opportunity, and transforming it made it into straight-line code that was more amenable to vectorisation; that's why I'm somewhat reluctant to completely disable it.
> > >
> > > Hmm... I noticed that in the midend we guard some optimisations on HAVE_conditional_move. Maybe we can guard this one on something like !HAVE_conditional_negation ?
> >
> > Can't we just guard it on HAVE_conditional_move? With such an instruction expand would then generate
> >
> >   t1 = -a
> >   r = ? b : t1
> >
> > and combine will do the rest.
>
> That was my first idea, but then it disables this transformation for x86, for which it was added specifically to solve PR45685...

And more generally, using HAVE_XXX in the gimple optimizers is frowned upon. That really brings a level of target knowledge into the gimple optimizers that we don't want.

I wonder if TER could create the res = (rhs ^ -cond) + cond form as a single expression which the gimple->ssa expanders could then emit as a series of insns or as a conditional negation on targets that have conditional negation.

jeff
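As a reference point for this thread, the two forms of conditional negation under discussion can be compared in plain C. This is a minimal sketch, not the phiopt or expand code: the function names are hypothetical, and it assumes `cond` is exactly 0 or 1 and that `a` is not INT_MIN (whose negation overflows in either form).

```c
/* Branching form, as RTL ifcvt would like to see it on targets with
   conditional assignment: negate a when cond is set.  */
static int
cond_neg_branch (int a, int cond)
{
  return cond ? -a : a;
}

/* Branchless straight-line form.  With cond == 1 the mask is all
   ones, so (a ^ mask) is ~a and adding 1 yields -a in two's
   complement.  With cond == 0 the mask is zero and a is returned
   unchanged.  */
static int
cond_neg_branchless (int a, int cond)
{
  int mask = -cond;
  return (a ^ mask) + cond;
}
```

The straight-line form contains no control flow, which is what made it attractive for vectorisation; the branching form is the one that maps directly onto a conditional-negate or conditional-move instruction.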
Re: Conditional negation elimination in tree-ssa-phiopt.c
On 08/13/14 08:57, Richard Earnshaw wrote:
> The problem with the frankenmonster patterns is that they tend to proliferate into the machine description, and before you know where you are the back-end is full of them.

Can't argue with that :-)

> I really do think that the best solution would be to try and catch this during expand if possible and generate the right pattern from the start; then you don't risk combine failing to come to the rescue after several intermediate transformations have taken place.

So the big question in my mind is what form do we want through the gimple optimizers (COND_EXPR or branchless) and given the chosen form, can we see a complex-enough expression at expansion time to realize it's just conditional negation and DTRT based on what capabilities the target has?

If keeping the COND_EXPR form allows us to make good decisions at expansion time, I'm not opposed to pulling out those bits from phi-opt and making the transformation conditional on target attributes during expansion.

jeff
gcc-4.9-20140827 is now available
Snapshot gcc-4.9-20140827 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20140827/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch revision 214609

You'll find:

 gcc-4.9-20140827.tar.bz2             Complete GCC

  MD5=a04385e042728145006bda74b6bd4572
  SHA1=29ee60c2b9030e97274be00f56929cd1a591ec00

Diffs from 4.9-20140820 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
RE: selective linking of floating point support for *printf / *scanf
> From: Joern Rennecke [mailto:joern.renne...@embecosm.com]
> Sent: Wednesday, August 27, 2014 7:54 PM
>
> Well, my goal was to have the selection be automatic for most use cases.
> That you can do a manual selection by providing -u / -l arguments to the
> linker is pretty much a given.
> Hmm, instead of needing -u you could make gcc spit out definitions of a
> dummy local symbol to the trigger symbol in question (forcing a non-weak
> reference), using SET_ASM_OP (assuming it's defined). But you'd still be
> left with the extra call overhead, increasing code size no matter if
> float is needed or not.

That's indeed the approach I took in my own patch.

>
> Yes. I'll have to adjust the avr hook so that it'll leave the v*printf /
> v*scanf functions alone - at least by default / for ISO C behaviour - but
> it'll give me an easy way to add a switch to tweak the behaviour.
>
> Or maybe we can use a -f option to select the v*printf / v*scanf default
> and put a stdio_altname__int_ target hook in targhooks.c, to be shared by
> all configs that want an __int_ prefix.

Are you aware of other C libraries that would benefit from such a default (newlib wouldn't)? Right now I'm having trouble defining stdio_altname in newlib-c.c since this would require it to be a C target hook, but such a hook cannot be called from the middle end. Did I mis(understand|s) something?

>
> FWIW, to safely shift the symbol into the implementation namespace you
> need a prefix that starts with two underbars or one underbar and a
> capital letter.
> Or use some funny non-standard character in the symbol - but that's
> asking for more portability issues.
> For references made automatically by gcc, it's a good idea not to impinge
> on the application namespace.

I'll consider renaming the symbol, but we've been using this one for some time in our toolchain so it might not be possible to change.
> An application might use printf from <stdio.h>, but define its own
> functions iprintf, printf_float and _printf_float.
> Therefore, it's a good idea to put the definition of newlib's iprintf in
> a separate file from __int_printf. Having essentially the same contents,
> but defining a different symbol, and letting the linker match them up to
> the definition.

I'm confused here. Why would we have a __int_printf? Right now we only have iprintf as an alias to printf, with _printf_float being a weakly defined function called from printf for the float support.

Best regards,

Thomas
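To make the weak-symbol arrangement described above concrete, here is a minimal sketch using a GCC/ELF weak reference that is tested against null before the call. It is not newlib's code: `demo_printf_float` and `demo_format_double` are hypothetical names chosen to avoid clashing with real library symbols, and whether newlib uses a weak reference or a weak definition internally is not something this sketch claims to reproduce.

```c
/* Weak reference (GCC extension on ELF targets): if no object file or
   library in the link defines demo_printf_float, its address resolves
   to null instead of causing an undefined-symbol error.  */
extern int demo_printf_float (double value) __attribute__ ((weak));

/* Hypothetical formatter entry point: use the float support only if
   it was linked in, otherwise report that it is unavailable.  */
int
demo_format_double (double value)
{
  if (demo_printf_float)
    return demo_printf_float (value);
  return -1;
}
```

Linking an object that defines `demo_printf_float` (or forcing it in with a -u option, as mentioned earlier in the thread) switches the float path on without any change to the caller, while programs that never pull in the definition pay only for the null check.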