RE: selective linking of floating point support for *printf / *scanf

2014-08-27 Thread Thomas Preud'homme
> From: Joern Rennecke [mailto:joern.renne...@embecosm.com]
> Sent: Tuesday, August 26, 2014 6:44 PM
> 
> Due to the library order defined in the specs, the float-enabled printf
> definition will be picked up from libprintf_flt.a.

It seems to me that it relies heavily on how symbol resolution works. If
I understand correctly, all undefined symbols (for instance __int_printf)
in object files are processed first. Symbol definitions are searched for in
the order in which libraries are given on the command line (this part seems
pretty reliable since it's at least documented in ld's manual). When
doing so, if a symbol definition references an undefined symbol (like
__int_printf referencing printf), that symbol is left aside until all
undefined symbols from object files have been processed. At some point the
printf reference from an object file will be processed and will pull in the
printf with float support, since it's the first one encountered. Then the
undefined references discovered when pulling symbols from libraries will be
processed, and since the printf with float support was already pulled in,
that's the one being used.

Is this behavior the same for all linkers? It sounds like a reasonable
algorithm, but I don't know the variety of linkers out there well.
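
To make the resolution order concrete, here is a toy model of in-order archive processing. It is only a sketch under stated assumptions: each archive is visited once, in command-line order, and rescanned until it satisfies no further undefined symbol. The member names (printf_flt.o, int_printf.o, printf_int.o) and the split of libc.a into two members are hypothetical, and the model ignores linker scripts entirely.

```cpp
#include <set>
#include <string>
#include <vector>

// One archive member: the symbols it defines and the symbols it references.
struct Member {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> references;
};

using Archive = std::vector<Member>;

// Toy model: visit each archive once, in order, and rescan it until no further
// member satisfies an undefined symbol. Returns pulled member names in order.
std::vector<std::string> link_order(std::set<std::string> undefined,
                                    const std::vector<Archive>& archives) {
    std::set<std::string> defined;
    std::vector<std::string> pulled;
    for (const Archive& archive : archives) {
        std::set<std::string> taken;  // members already pulled from this archive
        bool progress = true;
        while (progress) {
            progress = false;
            for (const Member& m : archive) {
                if (taken.count(m.name)) continue;
                bool wanted = false;
                for (const std::string& s : m.defines)
                    if (undefined.count(s)) wanted = true;
                if (!wanted) continue;
                taken.insert(m.name);
                pulled.push_back(m.name);
                for (const std::string& s : m.defines) {
                    defined.insert(s);
                    undefined.erase(s);
                }
                // References not yet satisfied become new undefined symbols.
                for (const std::string& s : m.references)
                    if (!defined.count(s)) undefined.insert(s);
                progress = true;
            }
        }
    }
    return pulled;
}

// Hypothetical layout: the float printf lives in libprintf_flt.a, which is
// listed before libc.a.
std::vector<Archive> example_archives() {
    Archive flt = {Member{"printf_flt.o", {"printf"}, {}}};
    Archive libc = {Member{"int_printf.o", {"__int_printf"}, {"printf"}},
                    Member{"printf_int.o", {"printf"}, {}}};
    return {flt, libc};
}
```

In this model, objects that reference both printf and __int_printf end up with the float printf, while objects that reference only __int_printf get the integer-only one, because libprintf_flt.a has nothing to offer at the time it is scanned.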

> That testcase is not valid. You'd have to use one of the v*printf functions.
> Solving the general problem for these is not computable; for specific cases,
> it would be possible, but at the cost of varying degrees of complexity.
> So I left this for manual selection: it's not handled with the
> calls.stdio_altname hook, and you have to use a special link line to use
> the integer-only implementations.
> Well, if desired, a spec change could give an option to do that.

Right, my bad, no problem indeed. What "general problem" are you
referring to that is not solved with this patch?

> 
> That can be implemented with suitable *newlib*.[ch] files that are
> selected in config.gcc,
> akin to newlib-stdint.h and glibc-c.c .

Absolutely, that was the approach I followed in my own patch.

> 
> Well, all the *printf functions are variadic, and as stated above,
> your example is invalid.
> The wildcards are the va_list-taking functions.  You first have to decide
> what you want to happen with these by default, and what kind of non-default
> behaviour you'd like to be able to select, and how; then we can talk about
> whether this needs any extra infrastructure to implement.

Yes, my apologies, it was my mistake.

I'll now do more thorough testing and report back to you on how it works for
us.

Best regards,

Thomas




Turning a single warning into an error in dejagnu test

2014-08-27 Thread Dominik Vogt
I'm writing a dejagnu test and encounter this warning at one place:

  warning: passing argument 1 of '...' makes integer from pointer
  without a cast [enabled by default]

Now, I have a "{ dg-error ... }" comment in that line.  The line
is generated from a script among hundreds of others that are all
expected to produce errors, not warnings.  It would be very
inconvenient (= lots of work) to change the script to make an
exception just for that single line (because there's no easy way
to identify lines that produce the warning instead of an error).

So the question is:  Is it possible to turn only this one warning
into an error inside a dejagnu test?  As I understand it, there
are no -W... switches for "enabled by default" options, and I
cannot use -Werror because that would break other tests in the
file.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: selective linking of floating point support for *printf / *scanf

2014-08-27 Thread Joern Rennecke
On 27 August 2014 08:02, Thomas Preud'homme  wrote:
>> From: Joern Rennecke [mailto:joern.renne...@embecosm.com]
>> Sent: Tuesday, August 26, 2014 6:44 PM
>>
>> Due to the library order defined in the specs, the float-enabled printf
>> definition will be picked up from libprintf_flt.a.
>
> It seems to me that it relies heavily on how symbol resolution works.

I don't see how it can be any other way.  We want to be able to compile
translation units individually, and then let the linker sort out if we need the
floating point enabled implementation(s), and skip the integer-only ones if so.

> If I understand correctly, all undefined symbols (for instance
> __int_printf) in object files are processed first. Symbol definitions are
> searched for in the order in which libraries are given on the command line
> (this part seems pretty reliable since it's at least documented in ld's
> manual). When doing so, if a symbol definition references an undefined
> symbol (like __int_printf referencing printf), that symbol is left aside
> until all undefined symbols from object files have been processed. At some
> point the printf reference from an object file will be processed and will
> pull in the printf with float support, since it's the first one
> encountered. Then the undefined references discovered when pulling symbols
> from libraries will be processed, and since the printf with float support
> was already pulled in, that's the one being used.
>
> Is this behavior the same for all linkers? It sounds like a reasonable
> algorithm, but I don't know the variety of linkers out there well.

Well, the part of processing libraries in order is pretty much universal,
although there are options to change that behaviour.  I'd say you really
have to know what you are doing when using these options.
Now, to make the __int_printf function entry line up with the printf
implementation, I'm relying on GNU AS (gas) linker scripts.  That part is
unfortunately not so portable, so this trick has to be restricted to
targets/configurations that use gas, or another linker (if any) that allows
alphasorting the relevant sections.

>> That testcase is not valid. You'd have to use one of the v*printf functions.
>> Solving the general problem for these is not computable; for specific cases,
>> it would be possible, but at the cost of varying degrees of complexity.
>> So I left this for manual selection: it's not handled with the
>> calls.stdio_altname hook, and you have to use a special link line to use
>> the integer-only implementations.
>> Well, if desired, a spec change could give an option to do that.
>
> Right, my bad, no problem indeed. What "general problem" are you
> referring to that is not solved with this patch?

The general problem also includes trying to decide definitively whether we
need a/any floating point enabled implementation(s) in cases where there are
calls to va_list-taking functions (which, although not always, usually also
take the format as a variable) and no non-va_list calls to decide the matter
in favour of needing floating point.
The question of whether any floating-point-indicating format string and/or(*)
va_list arguments actually reach the v*printf / v*scanf calls is non-trivial
and respects functions (in the computability theory sense); hence, it is not
computable, according to Rice's theorem.

(*) Any way you language-lawyer it, you can only chip away at the set of
programs you can compute the answer for, but can never do it for the whole set.
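
A small illustration of why the va_list case resists static analysis; this is only a sketch, and the helper names (vformat_msg, format_msg, describe) are made up for the example. At the vsnprintf call site neither the format string nor the argument types are visible, and the format itself is picked at run time.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper: the format arrives as a runtime value and the arguments
// as a va_list, so nothing at this call reveals whether floating-point
// conversions are needed.
std::string vformat_msg(const char* fmt, va_list args) {
    char buf[128];
    std::vsnprintf(buf, sizeof buf, fmt, args);
    return buf;
}

// Variadic front end that forwards to the va_list-taking function.
std::string format_msg(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    std::string out = vformat_msg(fmt, args);
    va_end(args);
    return out;
}

// The format is chosen at run time; an analysis would have to reason about
// every value `verbose` can take to know whether "%.2f" is ever reached.
std::string describe(bool verbose, int n, double x) {
    const char* fmt = verbose ? "%d items, mean %.2f" : "%d items";
    return format_msg(fmt, n, x);
}
```

Here describe(false, 3, 1.5) never needs float formatting while describe(true, 3, 1.5) does, yet both go through the same vsnprintf call.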


Re: Turning a single warning into an error in dejagnu test

2014-08-27 Thread Marek Polacek
On Wed, Aug 27, 2014 at 10:59:40AM +0100, Dominik Vogt wrote:
> I'm writing a dejagnu test and encounter this warning at one place:
> 
>   warning: passing argument 1 of '...' makes integer from pointer
>   without a cast [enabled by default]
> 
> Now, I have a "{ dg-error ... }" comment in that line.  The line
> is generated from a script among hundreds of others that are all
> expected to produce errors, not warnings.  It would be very
> inconvenient (= lots of work) to change the script to make an
> exception just for that single line (because there's no easy way
> to identify lines that produce the warning instead of an error).
> 
> So the question is:  Is it possible to turn only this one warning
> into an error inside a dejagnu test?  As I understand it, there
> are no -W... switches for "enabled by default" options, and I
> cannot use -Werror because that would break other tests in the
> file.

For C, I recently added the -Wint-conversion option, so with recent
enough GCC you should be able to use -Werror=int-conversion.

Marek


RE: selective linking of floating point support for *printf / *scanf

2014-08-27 Thread Thomas Preud'homme
> From: Joern Rennecke [mailto:joern.renne...@embecosm.com]
> Sent: Wednesday, August 27, 2014 6:13 PM
> 
> I don't see how it can be any other way.  We want to be able to compile
> translation units individually, and then let the linker sort out if we need
> the floating point enabled implementation(s), and skip the integer-only ones
> if so.

Consider the new scheme in newlib, where printf calls another function for
handling floating point formats. This other function is weakly defined so
that it's not pulled in by default and printf is effectively integer only. You
just need to link with an extra -u option to pull in the float support.
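
A rough sketch of the weak-reference idea, assuming a GCC-compatible compiler on an ELF target. The symbol name hypothetical_float_format is invented for the illustration and is not newlib's actual interface; the integer fallback is likewise only a stand-in.

```cpp
#include <cstddef>
#include <string>

// Hypothetical float formatter, declared as a weak reference (GCC/Clang
// extension). If no linked object defines it, its address is null.
extern "C" int hypothetical_float_format(char* buf, std::size_t size, double value)
    __attribute__((weak));

// True only if something (for example a -u option naming the symbol) caused
// a definition to be linked in.
bool float_support_linked() {
    return hypothetical_float_format != nullptr;
}

// Use the float formatter when present, otherwise fall back to an
// integer-only rendering that truncates the value.
std::string format_number(double value) {
    if (float_support_linked()) {
        char buf[64];
        hypothetical_float_format(buf, sizeof buf, value);
        return buf;
    }
    return std::to_string(static_cast<long>(value));
}
```

When nothing defines the symbol, format_number(42.9) takes the integer path; forcing a definition into the link is what switches the behaviour, at the price of the extra call that exists either way.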

> 
> Well, the part of processing libraries in order is pretty much universal,
> although there are options to change that behaviour.  I'd say you really
> have to know what you are doing when using these options.
> Now, to make the __int_printf function entry line up with the printf
> implementation, I'm relying on GNU AS (gas) linker scripts.  That part is
> unfortunately not so portable, so this trick has to be restricted to
> targets/configurations that use gas, or another linker (if any) that allows
> alphasorting the relevant sections.

Yes, I don't see the order of libraries as a problem for portability. I was
concerned about the following possible algorithm:

__int_printf is processed first and is found in libc. The linker sees that
__int_printf needs printf and searches for printf according to the library
order, and so will find it in the next section. This printf doesn't provide
float support. Then the linker proceeds to process the next undefined symbol
in the object file, which is printf, and uses the one already found.

I concede that such an algorithm looks more convoluted, as it implies
some form of recursion instead of just having a queue where you put
the undefined symbols. Indeed, I missed the linker script, which is the most
obvious problem.

> 
> The general problem also includes trying to decide definitively whether we
> need a/any floating point enabled implementation(s) in cases where there are
> calls to va_list-taking functions (which, although not always, usually also
> take the format as a variable) and no non-va_list calls to decide the matter
> in favour of needing floating point.
> The question of whether any floating-point-indicating format string
> and/or(*) va_list arguments actually reach the v*printf / v*scanf calls is
> non-trivial and respects functions (in the computability theory sense);
> hence, it is not computable, according to Rice's theorem.
>
> (*) Any way you language-lawyer it, you can only chip away at the set of
> programs you can compute the answer for, but can never do it for the whole
> set.

Ok. Of course detecting more cases where an integer version of IO functions
would be enough would be nice but I'm already satisfied with the current
scheme. I'm wondering what's happening for v*printf: are they only defined
in the libc_float?

Would you accept a patch that would turn this solution into something also
suitable for newlib? For instance, we would also need to include the v*printf
and v*scanf functions among the builtins. A new switch would also be
needed so that compiling newlib doesn't define the _printf_float and
_scanf_float symbols because of calls to v*printf and v*scanf functions.
I need to check if these calls are made in the same file in which case
I could maybe just guard the function call rewriting by a test checking if the
caller is itself a builtin.

Best regards,

Thomas




Re: selective linking of floating point support for *printf / *scanf

2014-08-27 Thread Joern Rennecke
On 27 August 2014 11:41, Thomas Preud'homme  wrote:
>> From: Joern Rennecke [mailto:joern.renne...@embecosm.com]
>> Sent: Wednesday, August 27, 2014 6:13 PM
>>
>> I don't see how it can be any other way.  We want to be able to compile
>> translation units individually, and then let the linker sort out if we need
>> the floating point enabled implementation(s), and skip the integer-only
>> ones if so.
>
> Consider the new scheme in newlib, where printf calls another function for
> handling floating point formats. This other function is weakly defined so
> that it's not pulled in by default and printf is effectively integer only.
> You just need to link with an extra -u option to pull in the float support.

Well, my goal was to have the selection be automatic for most use cases.
That you can do a manual selection by providing -u / -l arguments to the
linker is pretty much a given.
Hmm, instead of needing -u you could make gcc spit out definitions of a dummy
local symbol to the trigger symbol in question (forcing a non-weak reference),
using SET_ASM_OP (assuming it's defined).  But you'd still be left with the
extra call overhead, increasing code size no matter if float is needed or not.

>> I'm relying on GNU AS (gas) linker scripts.  That part is
>> unfortunately not so portable,

Oops, of course that should read GNU LD.

> Ok. Of course detecting more cases where an integer version of IO functions
> would be enough would be nice but I'm already satisfied with the current
> scheme. I'm wondering what's happening for v*printf: are they only defined
> in the libc_float?

It's defined in both.  The way I wrote the avr gcc specs / avr-libc makefile
rules, this will result in the floating point enabled implementation being
used by default.
Which makes the gcc test results so much nicer...

> Would you accept a patch that would turn this solution into something also
> suitable for newlib? For instance we would need to also include v*printf
> and v*scanf functions into builtin as well.

Yes.  I'll have to adjust the avr hook so that it leaves the v*printf /
v*scanf functions alone - at least by default / for ISO C behaviour - but
it'll give me an easy way to add a switch to tweak the behaviour.

Or maybe we can use a -f option to select the v*printf / v*scanf default and
put a stdio_altname__int_ target hook in targhooks.c, to be shared by all
configs that want an __int_ prefix.

> A new switch would also be
> needed so that compiling newlib doesn't define the _printf_float and
> _scanf_float symbols because of calls to v*printf and v*scanf functions.
> I need to check if these calls are made in the same file in which case
> I could maybe just guard the function call rewriting by a test checking if the
> caller is itself a builtin.

FWIW, to safely shift the symbol into the implementation namespace you
need a prefix that starts with two underbars or one underbar and a
capital letter.
Or use some funny non-standard character in the symbol - but that's asking for
more portability issues.
For references made automatically by gcc, it's a good idea not to impinge on
the application namespace.
An application might use printf from <stdio.h>, but define its own functions
iprintf, printf_float and _printf_float.
Therefore, it's a good idea to put the definition of newlib's iprintf in a
separate file from __int_printf, having essentially the same contents but
defining a different symbol, and let the linker match them up to the
definition.


Possible LRA issue?

2014-08-27 Thread Daniel Gutson
Hi,

   I have a large codebase where at some point, there's a structure
that takes an unsigned integer template argument and uses it as the size
of an array, something like
template 
struct Struct
{
typedef std::array Chunk;
typedef std::list Content;

   Content c;
};
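
A self-contained guess at what such a structure might look like. The template parameter list and the template arguments are reconstructed from the description; the element type (int) is a placeholder and not taken from the actual code, and count_elements is a hypothetical helper standing in for a range-based for over the content.

```cpp
#include <array>
#include <cstddef>
#include <list>

// Assumed reconstruction: S is the unsigned template argument from the
// description; the element type (int) is a placeholder.
template <unsigned S>
struct Struct
{
    typedef std::array<int, S> Chunk;
    typedef std::list<Chunk> Content;

    Content c;
};

// Hypothetical use: a range-based for over the content, summing chunk sizes.
template <unsigned S>
std::size_t count_elements(const Struct<S>& s)
{
    std::size_t n = 0;
    for (const auto& chunk : s.c)
        n += chunk.size();
    return n;
}
```

Instantiating this with Struct<10u> and Struct<11u> gives two otherwise identical types that differ only in the array size, which is the comparison being made.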

Changing the value of S significantly alters the compile time and
memory that the compiler takes. We use some large numbers there.
At some point, the compiler runs out of memory (xmalloc fails). I
wondered why, and did some analysis by debugging 4.8.2 (same with
4.8.3), and did the following experiment turning off all the
optimizations (-fno-* and -O0):
  I generated a report of xmalloc usage of two programs: one having
S=10u, and another with S=11u, just to see the difference of 1.
The report was generated as follows: I set a breakpoint at xmalloc,
appending a bt to a file. Then I found common stack traces and counted
how many times xmalloc was called in each version of the
program (S=10u and S=11u as mentioned above).
The differences were:

a) Stack trace:
  xmalloc | pool_alloc | create_live_range | mark_pseudo_live |
mark_regno_live | process_bb_lives | lra_create_live_ranges | lra |
do_reload | rest_of_handle_reload | execute_one_pass |
execute_pass_list | execute_pass_list | expand_function |
output_in_order | compile | finalize_compilation_unit |
cp_write_global_declarations | compile_file | do_compile | toplev_main
| __libc_start_main | _start |

 S=10u: 15 times
 S=11u: 16 times


b) Stack trace:
  xmalloc | lra_set_insn_recog_data | lra_get_insn_recog_data |
lra_update_insn_regno_info | lra_update_insn_regno_info |
lra_push_insn_1 | lra_push_insn | push_insns | lra_process_new_insns |
curr_insn_transform | lra_constraints | lra | do_reload |
rest_of_handle_reload | execute_one_pass | execute_pass_list |
execute_pass_list | expand_function | output_in_order | compile |
finalize_compilation_unit | cp_write_global_declarations |
compile_file | do_compile | toplev_main | __libc_start_main | _start |

 S=10u: 186 times
 S=11u: 192 times

c) Stack trace:
 xmalloc | df_install_refs | df_refs_add_to_chains |
df_insn_rescan | emit_insn_after_1 | emit_pattern_after_noloc |
emit_pattern_after_setloc | emit_insn_after_setloc | try_split |
split_insn | split_all_insns | rest_of_handle_split_after_reload |
execute_one_pass | execute_pass_list | execute_pass_list |
execute_pass_list | expand_function | output_in_order | compile |
finalize_compilation_unit | cp_write_global_declarations |
compile_file | do_compile | toplev_main | __libc_start_main | _start |

 S=10u: 617 times
 S=11u: 619 times

d) Stack trace:
 xmalloc | df_install_refs | df_refs_add_to_chains |
df_bb_refs_record | df_scan_blocks | rest_of_handle_df_initialize |
execute_one_pass | execute_pass_list | execute_pass_list |
expand_function | output_in_order | compile |
finalize_compilation_unit | cp_write_global_declarations |
compile_file | do_compile | toplev_main | __libc_start_main | _start |

S=10u: 13223 times
S=11u: 13227 times

e) Stack trace:
 xmalloc | __GI__obstack_newchunk | bitmap_element_allocate |
bitmap_set_bit | update_lives | assign_hard_regno | assign_by_spills |
lra_assign | lra | do_reload | rest_of_handle_reload |
execute_one_pass | execute_pass_list | execute_pass_list |
expand_function | output_in_order | compile |
finalize_compilation_unit | cp_write_global_declarations |
compile_file | do_compile | toplev_main | __libc_start_main | _start |

S=10u: 0 times (never!)
S=11u: 1
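
The counting step can be sketched roughly as follows, assuming each backtrace has already been flattened to a single line of the form "xmalloc | pool_alloc | ...". That log format and the function names are assumptions made for the illustration, not the script actually used.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

using TraceCounts = std::map<std::string, long>;

// Count how often each backtrace occurs; assumes one flattened trace per line.
TraceCounts count_traces(std::istream& in) {
    TraceCounts counts;
    std::string line;
    while (std::getline(in, line))
        if (!line.empty()) ++counts[line];
    return counts;
}

// Per-trace difference (after - before), keeping only traces whose count changed.
TraceCounts diff_counts(const TraceCounts& before, const TraceCounts& after) {
    TraceCounts delta;
    for (const auto& kv : after) {
        auto it = before.find(kv.first);
        long old_count = (it == before.end()) ? 0 : it->second;
        if (kv.second != old_count) delta[kv.first] = kv.second - old_count;
    }
    // Traces that disappeared entirely show up with a negative count.
    for (const auto& kv : before)
        if (!after.count(kv.first)) delta[kv.first] = -kv.second;
    return delta;
}
```

Feeding the S=10u log as `before` and the S=11u log as `after` would yield only the traces whose call counts differ, like the five listed above.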

Unfortunately I can't disclose the source code, nor do I have the time to
isolate a piece of code reproducing the issue.
Some comments about the code: I don't do template metaprogramming
depending on S, but I do some for-range on the Content.

I can extend the analysis to S=12 and compare with the previous values.
I thought of fixing this myself but lack the time and background on
these optimizations. Any hints?
I'm open to doing more experiments if anybody asks me, or to posting -fdumps.

I suspect that this issue could be worked around by playing with
ggc-min-heapsize and similar values, but I'd like to know why just changing
the size of an array has such a consequence.

Thanks!

Daniel.

-- 

Daniel F. Gutson
Chief Engineering Officer, SPD


San Lorenzo 47, 3rd Floor, Office 5

Córdoba, Argentina


Phone: +54 351 4217888 / +54 351 4218211

Skype: dgutson


RE: Possible LRA issue?

2014-08-27 Thread Ajit Kumar Agarwal
The xmalloc calls you list in the register allocator are not caused only by
the structure and by changing the S passed as a template argument.
They depend on how the structure is referenced or used. From the stack
trace I can see that the creation of live ranges is based on how the structure
is referenced and used.

Thanks & Regards
Ajit


Re: Possible LRA issue?

2014-08-27 Thread Daniel Gutson
On Wed, Aug 27, 2014 at 12:16 PM, Ajit Kumar Agarwal
 wrote:
> The cause of xmalloc occurring at times given below in Register Allocator 
> will not be caused only by the structure and changing the passed S as 
> template argument.
> It depends on how the below structures is referenced or used. From the stack 
> trace I can see the live ranges creation is based on how the below structure 
> is referenced and Used.

Could you please show me an example of such different usages and references?


Register allocation: caller-save vs spilling

2014-08-27 Thread Wilco Dijkstra
Hi,

I'm investigating various register allocation inefficiencies. The first thing
that stands out is that GCC supports both caller-saves and spilling. Spilling
seems to spill all definitions and all uses of a live range. This means you
often end up with multiple reloads close together, while it would be more
efficient to do a single load and then reuse the loaded value several times.
Caller-save does better in that case, but it is inefficient in that it
repeatedly stores registers across every call even if unchanged. If both were
fixed to minimise the number of loads/stores I can't see how one could beat
the other, so you'd no longer need both.

Anyway, due to the current implementation there are clearly cases where
caller-save is best and cases where spilling is best. However I do not see it
making the correct decision despite trying to account for the costs - some
code is significantly faster with -fno-caller-saves, other code wins with
-fcaller-saves. As an example, I see code like this on AArch64:

    ldr     s4, .LC20
    fmul    s0, s0, s4
    str     s4, [x29, 104]
    bl      f
    ldr     s4, [x29, 104]
    fmul    s0, s0, s4

With -fno-caller-saves it spills and rematerializes the constant as you'd
expect:

    ldr     s1, .LC20
    fmul    s0, s0, s1
    bl      f
    ldr     s5, .LC20
    fmul    s0, s0, s5

So given this, is the cost calculation correct and does it include
rematerialization? The spill code understands how to rematerialize so it
should take this into account in the costs. I did find some code in
ira-costs.c in scan_one_insn() that attempts something that looks like an
adjustment for rematerialization but it doesn't appear to handle all cases
(simple immediates, 2-instruction immediates, address-constants, non-aliased
loads such as literal pool and const data loads).

Also the hook CALLER_SAVE_PROFITABLE appears to have disappeared - overall
performance improves significantly if I add this (basically the default
heuristic used on instruction frequencies):

--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -2230,6 +2230,8 @@ ira_tune_allocno_costs (void)
   * ALLOCNO_FREQ (a)
   * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
 #endif
+  if (ALLOCNO_FREQ (a) < 4 * ALLOCNO_CALL_FREQ (a))
+cost = INT_MAX;
}
  if (INT_MAX - cost < reg_costs[j])
reg_costs[j] = INT_MAX;

If such a simple heuristic can beat the costs, they can't be quite right. 
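
For illustration, the check added by the patch can be isolated as a
small function. This is a sketch only: tune_call_crossing_cost is a
hypothetical name, and the two frequency parameters merely stand in for
ALLOCNO_FREQ (a) and ALLOCNO_CALL_FREQ (a) from the diff.

```c
#include <limits.h>

/* Stand-alone version of the heuristic in the patch (hypothetical
   helper, not part of GCC).  `freq` plays the role of ALLOCNO_FREQ (a),
   `call_freq` that of ALLOCNO_CALL_FREQ (a).  */
int
tune_call_crossing_cost (int cost, int freq, int call_freq)
{
  /* If the allocno is referenced less than four times per crossed
     call, make the call-clobbered register prohibitively expensive.  */
  if (freq < 4 * call_freq)
    return INT_MAX;
  return cost;
}
```

In this form the computed cost is kept only when the value is used at
least four times for every call it is live across; otherwise it is
overridden with INT_MAX.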

Is there anyone who understands the cost calculations?

Wilco




consistent naming of passes....

2014-08-27 Thread Basile Starynkevitch
Hello all,

When I compile some file (precisely, gcc/melt-runtime.cc from the latest
melt branch) with -O1 -fdump-passes (using GCC 4.9), I get notably

   ipa-cp  :  OFF
   ipa-cdtor   :  OFF
   ipa-inline  :  ON
   ipa-pure-const  :  ON
   ipa-static-var  :  ON
   ipa-pta :  OFF
   ipa-simdclone   :  OFF
   *free_cfg_annotations   :  ON

However, in file gcc/ipa-inline.c there is

const pass_data pass_data_ipa_inline =
{
  IPA_PASS, /* type */
  "inline", /* name */
  OPTGROUP_INLINE, /* optinfo_flags */
  false, /* has_gate */
  true, /* has_execute */
  TV_IPA_INLINING, /* tv_id */

I find it strange that the two names (the one given by -fdump-passes and
the one in the pass_data_ipa_inline object) are different.

When I try to insert a plugin pass (actually in MELT, file 
gcc/melt/xtramelt-ana-simple.melt) named "inline" it gives:

cc1plus: fatal error: pass 'inline' not found but is referenced by new pass 
'melt_justcountipa'

If I use "ipa-inline" I get:
cc1plus: fatal error: pass 'ipa-inline' not found but is referenced by new pass 
'melt_justcountipa'

How should a plugin writer find the name of the reference pass relative
to which his own new pass is inserted? At the very least it should be
documented, and preferably it should be identical to the output of
-fdump-passes.

Regards.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***



Re: Enable EBX for x86 in 32bits PIC code

2014-08-27 Thread Vladimir Makarov

On 2014-08-26 5:42 PM, Ilya Enkovich wrote:
> Hi,
>
> Here is a patch I tried.  I apply it over revision 214215.  Unfortunately I
> do not have a small reproducer but the problem can be easily reproduced on
> SPEC2000 benchmark 175.vpr.  The problem is in read_arch.c:701 where float
> value is compared with float constant 1.0.  It is inlined into read_arch
> function and can be easily found in RTL dump of function read_arch as a
> float comparison with 1.0 after the first call to strtod function.
>
> Here is a compilation string I use:
>
> gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math -mfpmath=sse
> -m32  -march=slm -fPIE -pie -c -o read_arch.o   -DSPEC_CPU2000
> read_arch.c
>
> In my final assembler comparison with 1.0 looks like:
>
> comiss  .LC11@GOTOFF(%ebp), %xmm0   # 1101  *cmpisf_sse [length = 7]
>
> and %ebp here doesn't have a proper value.
>
> I'll try to make a smaller reproducer if these instructions don't help.


I've managed to reproduce it, although it would be better to send the
patch as an attachment.

The problem is actually in IRA, not LRA.  IRA splits the pseudo used for
PIC.  Then, in a region where a *new* pseudo is used as the PIC
register, we rematerialize a constant which is transformed into a memory
reference addressed through the *original* PIC pseudo.

To solve the problem we should prevent such splitting and guarantee that
PIC pseudo allocnos in different regions get the same hard reg.

The following patch should solve the problem.


Index: ira-color.c
===
--- ira-color.c (revision 214576)
+++ ira-color.c (working copy)
@@ -3239,9 +3239,10 @@
  ira_assert (ALLOCNO_CLASS (subloop_allocno) == rclass);
  ira_assert (bitmap_bit_p (subloop_node->all_allocnos,
ALLOCNO_NUM (subloop_allocno)));
- if ((flag_ira_region == IRA_REGION_MIXED)
- && (loop_tree_node->reg_pressure[pclass]
- <= ira_class_hard_regs_num[pclass]))
+ if ((flag_ira_region == IRA_REGION_MIXED
+  && (loop_tree_node->reg_pressure[pclass]
+  <= ira_class_hard_regs_num[pclass]))
+ || regno == (int) REGNO (pic_offset_table_rtx))
{
  if (! ALLOCNO_ASSIGNED_P (subloop_allocno))
{
Index: ira-emit.c
===
--- ira-emit.c  (revision 214576)
+++ ira-emit.c  (working copy)
@@ -620,7 +620,8 @@
  /* don't create copies because reload can spill an
 allocno set by copy although the allocno will not
 get memory slot.  */
- || ira_equiv_no_lvalue_p (regno)))
+ || ira_equiv_no_lvalue_p (regno)
+ || ALLOCNO_REGNO (allocno) == REGNO (pic_offset_table_rtx)))
continue;
  original_reg = allocno_emit_reg (allocno);
  if (parent_allocno == NULL


Re: Enable EBX for x86 in 32bits PIC code

2014-08-27 Thread Jeff Law

On 08/26/14 15:42, Ilya Enkovich wrote:
> diff --git a/gcc/calls.c b/gcc/calls.c
> index 4285ec1..85dae6b 100644
> --- a/gcc/calls.c
> +++ b/gcc/calls.c
> @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
>   call_expr_arg_iterator iter;
>   tree arg;
>
> +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
> +  {
> +   gcc_assert (pic_offset_table_rtx);
> +   args[j].tree_value = make_tree (ptr_type_node,
> +   pic_offset_table_rtx);
> +   j--;
> +  }
> +
>   if (struct_value_addr_value)
> {
> args[j].tree_value = struct_value_addr_value;

So why do you need this?  Can't this be handled in the call/call_value
expanders, or what about attaching the use to CALL_INSN_FUNCTION_USAGE
from inside ix86_expand_call?  Basically I'm not seeing the need for
another target hook here.  I think that would significantly simplify the
patch as well.



Jeff


Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-27 Thread Jeff Law

On 08/18/14 04:33, Kyrill Tkachov wrote:
> On 18/08/14 10:19, Richard Earnshaw wrote:
>> On 14/08/14 09:45, Kyrill Tkachov wrote:
>>> On 13/08/14 18:32, Segher Boessenkool wrote:
>>>> On Wed, Aug 13, 2014 at 03:57:31PM +0100, Richard Earnshaw wrote:
>>>>> The problem with the frankenmonster patterns is that they tend to
>>>>> proliferate into the machine description, and before you know where
>>>>> you are the back-end is full of them.
>>>>>
>>>>> Furthermore, they are very sensitive to the greedy first-match
>>>>> nature of combine: a better, later, combination is missed because a
>>>>> less good, earlier, optimization matched.  If the first insn in the
>>>>> sequence is merged into an earlier instruction then you can end up
>>>>> with a junk sequence that completely fails to simplify.  That ends
>>>>> up with super-frankenmonster patterns to deal with all the subcases
>>>>> and the problems grow exponentially from there.
>>>>
>>>> Right.  Of course, combine should be fixed, yadda yadda.
>>>>
>>>>> I really do think that the best solution would be to try and catch
>>>>> this during expand if possible and generate the right pattern from
>>>>> the start; then you don't risk combine failing to come to the rescue
>>>>> after several intermediate transformations have taken place.
>>>>
>>>> I think ssa-phiopt should simply not do this obfuscation at all.
>>>> Without it, RTL ifcvt picks it up just fine on targets with
>>>> conditional assignment instructions.  I agree on targets without
>>>> expand should do a better job (also for more generic conditional
>>>> assignment).
>>>
>>> That particular transformation was added to tree-ssa-phiopt.c for PR
>>> 45685, the problem it was trying to solve was a missed vectorisation
>>> opportunity and transforming it made it into straightline code that
>>> was more amenable to vectorisation, that's why I'm somewhat reluctant
>>> to completely disable it.
>>>
>>> Hmm... I noticed in the midend we guard some optimisations on
>>> HAVE_conditional_move. Maybe we can guard this one on something like
>>> !HAVE_conditional_negation ?
>>
>> Can't we just guard it on HAVE_conditional_move?  With such an
>> instruction expand would then generate
>>
>> t1 = -a
>> r = <cond> ? b : t1
>>
>> and combine will do the rest.
>
> That was my first idea, but then it disables this transformation for
> x86, for which it was added specifically to solve PR45685...

And more generally, using HAVE_XXX in the gimple optimizers is frowned
upon.  That really brings a level of target knowledge into the gimple
optimizers that we don't want.


I wonder if TER could create the res = (rhs ^ -cond) + cond form as a
single expression which the gimple->ssa expanders could then emit as a
series of insns or as a conditional negation on targets that have
conditional negation.
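
As a reminder of what the two forms compute, here is a small stand-alone
sketch (hypothetical function names, plain C rather than GIMPLE). The
branchless form relies on the two's complement identity
-x == (x ^ -1) + 1, so the operator combining rhs with -cond is an
exclusive OR. Unsigned arithmetic is used so that the wrap-around is
well defined in C.

```c
#include <assert.h>

/* COND_EXPR-like form: negate x when cond is 1, pass it through when
   cond is 0.  */
unsigned
cond_neg_select (unsigned x, unsigned cond)
{
  return cond ? -x : x;
}

/* Branchless form: -cond is all-ones when cond is 1, so the XOR flips
   every bit of x and adding cond (1) completes the negation.  When
   cond is 0 both operations leave x unchanged.  */
unsigned
cond_neg_branchless (unsigned x, unsigned cond)
{
  assert (cond == 0 || cond == 1);
  return (x ^ -cond) + cond;
}
```

Both functions return the same value for every x as long as cond is 0
or 1; the second is the straight-line shape that is friendlier to
vectorisation, the first is what maps directly onto a conditional
negate instruction.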


jeff


Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-27 Thread Jeff Law

On 08/13/14 08:57, Richard Earnshaw wrote:
> The problem with the frankenmonster patterns is that they tend to
> proliferate into the machine description, and before you know where you
> are the back-end is full of them.

Can't argue with that :-)

> I really do think that the best solution would be to try and catch this
> during expand if possible and generate the right pattern from the start;
> then you don't risk combine failing to come to the rescue after several
> intermediate transformations have taken place.

So the big question in my mind is what form do we want through the
gimple optimizers (COND_EXPR or branchless) and given the chosen form,
can we see a complex-enough expression at expansion time to realize it's
just conditional negation and DTRT based on what capabilities the target
has?

If keeping the COND_EXPR form allows us to make good decisions at
expansion time, I'm not opposed to pulling out those bits from phi-opt
and making the transformation conditional on target attributes during
expansion.




jeff


gcc-4.9-20140827 is now available

2014-08-27 Thread gccadmin
Snapshot gcc-4.9-20140827 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20140827/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options:
svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch revision 214609

You'll find:

 gcc-4.9-20140827.tar.bz2 Complete GCC

  MD5=a04385e042728145006bda74b6bd4572
  SHA1=29ee60c2b9030e97274be00f56929cd1a591ec00

Diffs from 4.9-20140820 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


RE: selective linking of floating point support for *printf / *scanf

2014-08-27 Thread Thomas Preud'homme
> From: Joern Rennecke [mailto:joern.renne...@embecosm.com]
> Sent: Wednesday, August 27, 2014 7:54 PM
> 
> Well, my goal was to have the selection be automatic for most use cases.
> That you can do a manual selection by providing -u / -l arguments to the
> linker is pretty much a given.
> Hmm, instead of needing -u you could make gcc spit out definitions of a
> dummy local symbol to the trigger symbol in question (forcing a non-weak
> reference), using SET_ASM_OP (assuming it's defined).  But you'd still be
> left with the extra call overhead, increasing code size no matter if float
> is needed or not.

That's indeed the approach I took in my own patch.

> 
> Yes.  I'll have to adjust the avr hook so that it'll leave the v*printf /
> v*scanf functions alone - at least by default / for ISO C behaviour - but
> it'll give me an easy way to add a switch to tweak the behaviour.
>
> Or maybe we can use a -f option to select the v*printf / v*scanf default and
> put a stdio_altname__int_ target hook in targhooks.c, to be shared by all
> configs that want an __int_ prefix.

Are you aware of other C libraries that would benefit from such a default
(newlib wouldn't)?

Right now I'm having trouble defining stdio_altname in newlib-c.c, since
this would require it to be a C target hook, but such a hook cannot be
called from the middle end.

Did I misunderstand or miss something?

> 
> FWIW, to safely shift the symbol into the implementation namespace you
> need a prefix that starts with two underbars or one underbar and a
> capital letter.
> Or use some funny non-standard character in the symbol - but that's asking
> for more portability issues.
> For references made automatically by gcc, it's a good idea not to impinge
> on the application namespace.

I'll consider renaming the symbol, but we've been using this one for
some time in our toolchain so it might not be possible to change.

> An application might use printf from <stdio.h>, but define its own
> functions iprintf, printf_float and _printf_float.
> Therefore, it's a good idea to put the definition of newlib's iprintf
> in a separate file from __int_printf.  Having essentially the same
> contents, but defining a different symbol, and let the linker match them
> up to the definition.

I'm confused here. Why would we have an __int_printf? Right now we only
have iprintf as an alias to printf, with _printf_float being a weakly
defined function called from printf for the float support.
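
For readers unfamiliar with the weak-symbol technique, a minimal sketch
follows. It is not newlib's code: my_printf_float and format_double are
hypothetical names, and it assumes GCC or Clang on an ELF target, where
an undefined weak reference resolves to a null address at link time.

```c
#include <stddef.h>

/* Hypothetical float formatter.  Because the declaration is weak, the
   link succeeds even if no object file or library defines it; the
   symbol's address is then null.  */
extern int my_printf_float (char *buf, size_t size, double value)
  __attribute__ ((weak));

/* Returns the formatter's result, or -1 when float support was not
   pulled into the link.  */
int
format_double (char *buf, size_t size, double value)
{
  if (my_printf_float == NULL)
    return -1;              /* integer-only build: no float code linked */
  return my_printf_float (buf, size, value);
}
```

Forcing a non-weak reference to the formatter (for example with the
linker's -u option, as mentioned earlier in the thread) is what pulls
the real definition in; without it the float code stays out of the
binary and only the null check remains.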

Best regards,

Thomas