Re: [PATCH] Bug fix for PR59050
On Fri, 8 Nov 2013, Cong Hou wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59050 > > This is my bad. I forget to check the test result for gfortran. With > this patch the bug should be fixed (tested on x86-64). Ok. Btw, requirements are to bootstrap and test with all default languages enabled (that is, without any --enable-languages or --enable-languages=all). That gets you c,c++,objc,java,fortran,lto and misses obj-c++ ada and go. I am personally using --enable-languages=all,ada,obj-c++. Thanks, Richard. > thanks, > Cong > > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog > index 90b01f2..e62c672 100644 > --- a/gcc/ChangeLog > +++ b/gcc/ChangeLog > @@ -1,3 +1,8 @@ > +2013-11-08 Cong Hou > + > + PR tree-optimization/59050 > + * tree-vect-data-refs.c (comp_dr_addr_with_seg_len_pair): Bug fix. > + > 2013-11-07 Cong Hou > > * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c > index b2a31b1..b7eb926 100644 > --- a/gcc/tree-vect-data-refs.c > +++ b/gcc/tree-vect-data-refs.c > @@ -2669,9 +2669,9 @@ comp_dr_addr_with_seg_len_pair (const void *p1_, > const void *p2_) >if (comp_res != 0) > return comp_res; > } > - if (tree_int_cst_compare (p11.offset, p21.offset) < 0) > + else if (tree_int_cst_compare (p11.offset, p21.offset) < 0) > return -1; > - if (tree_int_cst_compare (p11.offset, p21.offset) > 0) > + else if (tree_int_cst_compare (p11.offset, p21.offset) > 0) > return 1; >if (TREE_CODE (p12.offset) != INTEGER_CST >|| TREE_CODE (p22.offset) != INTEGER_CST) > @@ -2680,9 +2680,9 @@ comp_dr_addr_with_seg_len_pair (const void *p1_, > const void *p2_) >if (comp_res != 0) > return comp_res; > } > - if (tree_int_cst_compare (p12.offset, p22.offset) < 0) > + else if (tree_int_cst_compare (p12.offset, p22.offset) < 0) > return -1; > - if (tree_int_cst_compare (p12.offset, p22.offset) > 0) > + else if (tree_int_cst_compare (p12.offset, p22.offset) > 0) > return 1; > >return 0; > > -- Richard Biener SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: Simplify code in gimple_equal_p
On Sat, 9 Nov 2013, Tom de Vries wrote: > Richard, > > This patch simplifies code in gimple_equal_p. > > Bootstrapped and regtested on x86_64. > > OK for trunk? Ok. Thanks, Richard. > Thanks, > - Tom > > 2013-11-06 Tom de Vries > > * tree-ssa-tail-merge.c (gimple_equal_p): Remove equal variable.
Re: [RFA][PATCH] Isolate erroneous paths optimization
> However, that brings up an couple interesting questions. > > Let's say we find a NULL pointer which reaches a return statement in a > function which is marked as returns_nonnull. In that case there is no > dereference. Presumably for that kind of scenario we'll just keep the > builtin trap. > > Similarly, assume we extend this pass to detect out-of-bounds array > indexing. It's fairly simple to do and has always been in my plan. In > that case leaving in the array indexing won't necessarily generate a > fault. For those presumably we'll just want the builtin_trap as well? > > Again, I don't mind inserting a *0, I just want to have a plan for the > other undefined behaviours we currently detect and those which I plan on > catching soon. The more general problem is that, with -fnon-call-exceptions, we really expect a fully-fledged exception to be raised when something goes wrong. Inserting __builtin_trap doesn't work because it's simply not equivalent to a throw. In other words, if __builtin_throw would be inserted instead of __builtin_trap with -fnon-call-exceptions, things would probably be acceptable as-is. -- Eric Botcazou
Re: [PATCH] Factor out gimple_dont_merge_p
On Sat, 9 Nov 2013, Tom de Vries wrote: > Richard, > > This patch factors out gimple_dont_merge_p from gimple_equal_p and > find_duplicate. > > Bootstrapped and regtested on x86_64. > > OK for trunk? +static bool +gimple_dont_merge_p (gimple stmt) +{ + switch (gimple_code (stmt)) +{ +case GIMPLE_CALL: + /* Eventually, we'll significantly complicate the CFG by adding +back edges to properly model the effects of transaction restart. +For the bulk of optimization this does not matter, but what we +cannot recover from is tail merging blocks between two separate +transactions. Avoid that by making commit not match. */ + if (gimple_call_builtin_p (stmt, BUILT_IN_TM_COMMIT)) + return true; + + /* We cannot tail-merge the builtins that end transactions. +??? The alternative being unsharing of BBs in the tm_init pass. */ + if (flag_tm + && (gimple_call_flags (stmt) & ECF_TM_BUILTIN) + && is_tm_ending_fndecl (gimple_call_fndecl (stmt))) + return true; 1) BUILT_IN_TM_COMMIT is handled in is_tm_ending_fndecl, 2) fndecl may be NULL I'd simply get rid of gimple_dont_merge_p and call a is_tm_ending (gimple) function you'd add to TM. Richard. > Thanks, > - Tom > > 2013-11-06 Tom de Vries > > * tree-ssa-tail-merge.c (gimple_dont_merge_p): Factor function out > of ... > (gimple_equal_p): ... here, and ... > (find_duplicate): ... here. Use gimple_dont_merge_p.
Re: [PATCH] Factor out gimple_operand_equal_value_p from gimple_equal_p
On Sat, 9 Nov 2013, Tom de Vries wrote: > Richard, > > This patch factors out gimple_operand_equal_value_p from gimple_equal_p. > > Bootstrapped and regtested on x86_64. > > OK for trunk? Ok. Thanks, Richard. > Thanks, > - Tom > > 2013-11-06 Tom de Vries > > * tree-ssa-tail-merge.c (gimple_operand_equal_value_p): Factor new > function out of ... > (gimple_equal_p): ... here.
Re: [PATCH] Handle GIMPLE_ASSIGNs with different vuse in gimple_equal_p
On Sat, 9 Nov 2013, Tom de Vries wrote: > Richard, > > Consider the test-case test.c: > ... > int z; > int x; > > void > f (int c, int d) > { > if (c) > z = 5; > else > { > if (d) > x = 4; > z = 5; > } > } > ... > > Atm, we don't tail-merge the 'z = 5' blocks, because gimple_equal_p returns > false for the 'z = 5' statements. The relevant code is this: > ... > if (TREE_CODE (lhs1) != SSA_NAME > && TREE_CODE (lhs2) != SSA_NAME) > return (vn_valueize (gimple_vdef (s1)) > == vn_valueize (gimple_vdef (s2))); > ... > The vdefs of the 'z = 5' statements are different, because the incoming vuses > are different. > > This patch handles GIMPLE_ASSIGNs with different vuse in gimple_equal_p, by > doing a structural comparison. > > Bootstrapped and regtested on x86_64. > > OK for trunk? Comments inline diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c index 98b5882..43516a7 100644 --- a/gcc/tree-ssa-tail-merge.c +++ b/gcc/tree-ssa-tail-merge.c @@ -1173,8 +1173,47 @@ gimple_equal_p (same_succ same_succ, gimple s1, gimple s2) lhs2 = gimple_get_lhs (s2); if (TREE_CODE (lhs1) != SSA_NAME && TREE_CODE (lhs2) != SSA_NAME) - return (vn_valueize (gimple_vdef (s1)) - == vn_valueize (gimple_vdef (s2))); + { + /* If the vdef is the same, it's the same statement. */ + if (vn_valueize (gimple_vdef (s1)) + == vn_valueize (gimple_vdef (s2))) + return true; + + /* If the vdef is not the same but the vuse is the same, it's not the +same stmt. */ + if (vn_valueize (gimple_vuse (s1)) + == vn_valueize (gimple_vuse (s2))) + return false; What's the logic behind this? We want to use VN to get more "positive" results - doing a negative early out looks suspicious to me ... + /* If the vdef is not the same and the vuse is not the same, it might be +same stmt. */ + + /* Test for structural equality. */ + if (gimple_assign_rhs_code (s1) != gimple_assign_rhs_code (s1) s2 + || (gimple_assign_nontemporal_move_p (s1) + != gimple_assign_nontemporal_move_p (s2))) I don't think we should care (it'll be false - a later pass sets it, it's an optimization hint, not a correctness issue). More interesting would be to assert that has_volatile_ops is the same if the operands turned out to be the same. + return false; + + if (!operand_equal_p (lhs1, lhs2, 0)) + return false; + + t1 = gimple_assign_rhs1 (s1); + t2 = gimple_assign_rhs1 (s2); + if (!gimple_operand_equal_value_p (t1, t2)) + return false; + + t1 = gimple_assign_rhs2 (s1); + t2 = gimple_assign_rhs2 (s2); + if (!gimple_operand_equal_value_p (t1, t2)) + return false; + + t1 = gimple_assign_rhs3 (s1); + t2 = gimple_assign_rhs3 (s2); + if (!gimple_operand_equal_value_p (t1, t2)) + return false; for (i = 1; i < gimple_num_ops (s1); ++i) t1 = gimple_op (s1, i); ... but I think you should only compare rhs1 and thus only handle GIMPLE_ASSIGN_SINGLEs this way - the others have a SSA name lhs. That makes the whole thing just if (TREE_CODE (lhs1) != SSA_NAME && TREE_CODE (lhs2) != SSA_NAME) { if (vn_valueize (gimple_vdef (s1)) == vn_valueize (gimple_vdef (s2))) return true; return operand_equal_p (lhs1, lhs2, 0) && gimple_operand_equal_value_p (gimple_assign_rhs1 (s1), gimple_assign_rhs2 (s2)); } Ok with doing it this way. Thanks, Richard. + /* Same structure. */ + return true; + } else if (TREE_CODE (lhs1) == SSA_NAME && TREE_CODE (lhs2) == SSA_NAME) return vn_valueize (lhs1) == vn_valueize (lhs2);
Re: [RFC] replace malloc with a decl on the stack
On Sun, Nov 10, 2013 at 04:27:00PM +0100, Marc Glisse wrote: > Hello, > > I am posting this patch to get some feedback on the approach. The > goal is to replace malloc+free with a stack allocation (a decl > actually) when the size is a small constant. > Why constraint yourself to small sizes. Stack allocation benefits is speed and less memory comsumption due lack of fragmentation. A possible way is to have thread local bounds to stack size and call function with custom logic when it is outside of bounds. Below is a simple implementation which creates a separate stack for that (for simplicity and because it does not need to find bounds on thread stack.) With bit of more work it could do allocations in similar way as in splitstack. > For testing, I highjacked the "leaf" attribute, but it isn't right, > I'll remove it from the list (not sure what I'll do for the > testcases then). What I'd want instead is a "returns" attribute that > means the function will return (either by "return" or an exception), > as opposed to having an infinite loop, calling exit or longjmp, etc > (sched-deps.c has a related call_may_noreturn_p). The situation I am > trying to avoid is: > p=malloc(12); > f(p) > free(p) > > where f contains somewhere: > free(p); exit(42); > (or longjmp or something else that takes the regular call to free > out of the execution flow). > One of plans to extend malloc is add custom free handler, interface would be something like dalloc(amount, destructor) which would invoke destructor on free. Main motivation is memory pool that can be returned by free. With that extension it would be possible to mark pointer so its free would be a nop. > > > The size above which the malloc->stack transformation is not applied > should depend on a parameter, I don't know if it should get its own > or depend on an existing one. In any case, I'd like to avoid ending > up with a ridiculously low threshold (my programs use GMP, which > internally uses alloca up to 65536 bytes (even in recursive > functions that have a dozen allocations), so I don't want gcc to > tell me that 50 bytes are too much). > > A program with a double-free may, with this patch, end up crashing > on the first free instead of the second, but that's an invalid > program anyway. > > #include __thread void *__stack_from; __thread void *__stack_cur; __thread void *__stack_to; #define STACK_ALLOC(size) ({ \ void *__stack_new = __stack_cur + size; \ if (__stack_new < __stack_cur || __stack_to > __stack_new) \ __stack_alloc (size); \ else \ { \ void *__s = __stack_cur; \ __stack_cur = __stack_new; \ __s; \ } \ }) #define STACK_FREE(__stack_new) ({ \ if (__stack_new < __stack_from || __stack_to > __stack_new) \ __stack_free (size); \ else \ __stack_cur = __stack_new; \ }) static pthread_key_t key; void __stack_destroy (void *x) { free (stack_from); } void * __stack_alloc (size_t size) { if (!__stack_from) { __stack_from = malloc (1 << 18); __stack_to = __stack_from + (1 << 18); __stack_cur = __stack_from; _ pthread_key_create (&key, destroy); pthread_setspecific (key, &key); } return malloc (size); } void __stack_free (void *p) { free (p); }
[patch] Fix PR ada/35998
Hi, this is an old bug report from Jan, which was closed, then reopened by Tom at some point, but the patch never got reviewed. The original submission is at: http://gcc.gnu.org/ml/gcc-patches/2008-05/msg01857.html Tested on x86_64-suse-linux, OK for the mainline? 2013-11-11 Jan Kratochvil PR ada/35998 * dwarf2out.c (add_byte_size_attribute): Omit attribute for size -1. -- Eric BotcazouIndex: dwarf2out.c === --- dwarf2out.c (revision 20) +++ dwarf2out.c (working copy) @@ -16355,9 +16355,10 @@ add_byte_size_attribute (dw_die_ref die, /* Note that `size' might be -1 when we get to this point. If it is, that indicates that the byte size of the entity in question is variable. We - have no good way of expressing this fact in Dwarf at the present time, - so just let the -1 pass on through. */ - add_AT_unsigned (die, DW_AT_byte_size, size); + have no good way of expressing this fact in Dwarf at the present time + when location description was not used by the caller code instead. */ + if (size != (unsigned) -1) +add_AT_unsigned (die, DW_AT_byte_size, size); } /* For a FIELD_DECL node which represents a bit-field, output an attribute
Re: [RFC] replace malloc with a decl on the stack
On Mon, Nov 11, 2013 at 11:08:14AM +0100, Ondřej Bílka wrote: > On Sun, Nov 10, 2013 at 04:27:00PM +0100, Marc Glisse wrote: > > I am posting this patch to get some feedback on the approach. The > > goal is to replace malloc+free with a stack allocation (a decl > > actually) when the size is a small constant. > > > Why constraint yourself to small sizes. Stack allocation benefits is > speed and less memory comsumption due lack of fragmentation. Because you can hardly predict what the program will have as stack requirements in functions you call? In leaf functions sure, you only care not to create too large allocations that would go over the stack size, but if you call other functions, you usually can't know (at least without sufficient IPA analysis, but that is really hard because it is too early) how much stack will it really need (both fixed requirements for non-VLA vars on the stack, spill space, function call arguments and other overhead and VLAs and also these malloc turned into stack allocation). So, if you say have a malloc with corresponding free shortly afterwards but some call in between and decide that it is fine to change it into stack allocation when it is half the size of the remaining stack space, but then two frames down there will be some non-VLA var that needs 3/4 of the old remaining stack space, you've turned a correct program into a broken one. Jakub
Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals
Hello, As it appeared that concerns about the speed of location_get_source_line were as present as the need of just fixing this bug, I have conflated the two concerns in a new attempt below, trying to address the points you guys have raised during the previous reviews. The patch below introduces a cache for the data read from the file we want to emit caret diagnostic for. In that cache it stashes the bytes read from the file as well as a number of positions of line delimiters that we encountered while reading the file. It keeps a number of the last file caches in memory in case location_get_source_line is later asked for lines from the same file. To avoid exploding the memory consumption, the number line delimiter position saved is fixed (100). So if a file is smaller than 100 lines all of its line positions can be saved. That is, if location_get_source_line is first asked to return line 20, all the position of the lines encountered since the beginning of the file -- up to line 20 -- are going to be saved in the cache. Next time, if location_get_source_line is asked to return line 10, as the position of the beginning/end of line 10 is saved in the cache, returning that line is fast. If it's asked to return line 25, it will have to start reading from line 20, not from the beginning of the file. If the file is bigger than 100, then the patch just saves 100 line positions. To evenly spread the line position saved, it needs to know the total number lines of the file. Luckily we can usually get this information from the line map subsystem (from libcpp). The patch thus adds a new entry point in the line map (linemap_get_file_highest_location) that gives the greatest source_location seen for a given file and uses that to decide what line position to save in the cache. The speed gain I have seen is variable, depending on the size (in number of quasi adjacent lines) of the diagnostics, but on some pathological cases I have seen, it can divide the time spend displaying the diagnostics by two ore more. I had to add hackery in the code to measure this, unfortunately :-( The patch doesn't try to reuse the same infrastructure for gcov for now. I am letting that for later now when I have more time. Bootstrapped on x86_64-unknown-linux-gnu against trunk. PS: To ease the review (especially for Tom Tromey who I am CC-ing because of the new entry point in the line map sub-system) I am attaching the cover letter of the patch itself that does the analysis of the initial bug. Sorry to the other addressees of this message for the redundancy. Thanks. >8<--- In this problem report, the compiler is fed a (bogus) translation unit in which some literals contain bytes whose value is zero. The preprocessor detects that and proceeds to emit diagnostics for that king of bogus literals. But then when the diagnostics machinery re-reads the input file again to display the bogus literals with a caret, it attempts to calculate the length of each of the lines it got using fgets. The line length calculation is done using strlen. But that doesn't work well when the content of the line can have several zero bytes. The result is that the read_line never sees the end of the line because strlen repeatedly reports that the line ends before the end-of-line character; so read_line thinks its buffer for reading the line is too small; it thus increases the buffer, leading to a huge memory consumption and disaster. Here is what this patch does. location_get_source_line is modified to return the length of a source line that can now contain bytes with zero value. diagnostic_show_locus() is then modified to consider that a line can have characters of value zero, and so just shows a white space when instructed to display one of these characters. Additionally location_get_source_line is modified to avoid re-reading each and every line from the beginning of the file until it reaches the line number N that it is instructed to get; this was leading to annoying quadratic behaviour when reading adjacent lines near the end of (big) files. So a cache is now associated to the file opened in text mode. When the content of the file is read, that content is stashed in the file cache. That file cache is searched for line delimiters. A number of line positions are saved in the cache and a number of file caches are kept in memory. That way when location_get_source_line is asked to read line N + 1, it just has to start reading from line N that it has already read. >8<--- And now the real patch. libcpp/ChangeLog: * include/line-map.h (linemap_get_file_highest_location): Declare new function. * line-map.c (linemap_get_file_highest_location): Define it. gcc/ChangeLog: * input.h (location_get_source_line): Take an additional line_size parameter. (void diagno
Committed: config/arc/arc.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Define.
2013-11-11 Joern Rennecke * config/arc/arc.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Define. Index: config/arc/arc.h === --- config/arc/arc.h(revision 204665) +++ config/arc/arc.h(working copy) @@ -1087,6 +1087,22 @@ #define MEMORY_MOVE_COST(MODE,CLASS,IN) expensive than reg->reg moves. */ #define BRANCH_COST(speed_p, predictable_p) 2 +/* Scc sets the destination to 1 and then conditionally zeroes it. + Best case, ORed SCCs can be made into clear - condset - condset. + But it could also end up as five insns. So say it costs four on + average. + These extra instructions - and the second comparison - will also be + an extra cost if the first comparison would have been decisive. + So get an average saving, with a probability of the first branch + beging decisive of p0, we want: + p0 * (branch_cost - 4) > (1 - p0) * 5 + ??? We don't get to see that probability to evaluate, so we can + only wildly guess that it might be 50%. + ??? The compiler also lacks the notion of branch predictability. */ +#define LOGICAL_OP_NON_SHORT_CIRCUIT \ + (BRANCH_COST (optimize_function_for_speed_p (cfun), \ + false) > 9) + /* Nonzero if access to memory by bytes is slow and undesirable. For RISC chips, it means that access to memory by bytes is no better than access by words when possible, so grab a whole word
Re: RFA: Fix PR middle-end/59049
On Fri, Nov 8, 2013 at 5:02 PM, Jeff Law wrote: > On 11/08/13 07:45, Steven Bosscher wrote: >> >> On Fri, Nov 8, 2013 at 3:40 PM, Joern Rennecke >> wrote: >>> >>> bootstrapped / regtested on i686-pc-linux-gnu. >> >> >> Not a very elaborate description of the patch, eh? :-) >> >> This is IMHO not OK without at least an explanation of why the >> comparison of two const_ints is not folded. Better yet would be to fix >> that underlying problem. We should not present such non-sense to the >> RTL parts of the middle end. > Agreed. In this case it's fold-all-builtins folding a strlen call with a PHI <"foo", "bar"> argument. IMHO not presenting RTL with such non-sense is best achieved by not letting TER do constant propagation (because it doesn't "fold" the result). We can never rule out such stray non-propagated constants, so that makes expand more robust (and hopes for RTL CCP). Index: gcc/tree-ssa-ter.c === --- gcc/tree-ssa-ter.c (revision 204664) +++ gcc/tree-ssa-ter.c (working copy) @@ -438,6 +439,12 @@ ter_is_replaceable_p (gimple stmt) && !is_gimple_val (gimple_assign_rhs1 (stmt))) return false; + /* Do not propagate "modeless" constants - we may end up confusing the RTL +expanders. Leave the optimization to RTL CCP. */ + if (gimple_assign_single_p (stmt) + && CONSTANT_CLASS_P (gimple_assign_rhs1 (stmt))) + return false; + return true; } return false; does that make sense? I'll test it then. Thanks, Richard. > jeff >
Re: [RFC] replace malloc with a decl on the stack
On Mon, Nov 11, 2013 at 11:19:05AM +0100, Jakub Jelinek wrote: > On Mon, Nov 11, 2013 at 11:08:14AM +0100, Ondřej Bílka wrote: > > On Sun, Nov 10, 2013 at 04:27:00PM +0100, Marc Glisse wrote: > > > I am posting this patch to get some feedback on the approach. The > > > goal is to replace malloc+free with a stack allocation (a decl > > > actually) when the size is a small constant. > > > > > Why constraint yourself to small sizes. Stack allocation benefits is > > speed and less memory comsumption due lack of fragmentation. > > Because you can hardly predict what the program will have as stack > requirements in functions you call? In leaf functions sure, you only care > not to create too large allocations that would go over the stack size, > but if you call other functions, you usually can't know (at least without > sufficient IPA analysis, but that is really hard because it is too early) Which is completely irrelevant. Checking it in run time is easy as in my prototype which uses a alternative stack so added stack space does not coun. > how much stack will it really need (both fixed requirements for non-VLA > vars on the stack, spill space, function call arguments and other overhead > and VLAs and also these malloc turned into stack allocation). This is argument to turn a alloca and VLA to ones that check if there is enough stack and get space by other means. This can be with same logic as described above. > So, if you say have a malloc with corresponding free shortly afterwards > but some call in between and decide that it is fine to change it into stack > allocation when it is half the size of the remaining stack space, but then Of total stack space, also if we use main stack user should enlarge stacks accordingly. > two frames down there will be some non-VLA var that needs 3/4 of the old > remaining stack space, you've turned a correct program into a broken one. > > Jakub
[patch] Fix debug info for modified parameter
Hi, since the switch to SSA form at -O0, the compiler generates wrong debug info for something like: void foo (int i) { int j = 0; i = 1; j = j + 1; /* BREAK */ } If you try to display the value of i after breaking in GDB, you don't get 1. The reason is that there is no default def SSA_NAME for i in this case, so no partitions get the RTL location of the parameter. Tentative patch attached, it's admittedly a little heavy, but I don't see any other solution. Tested on x86_64-suse-linux, OK for the mainline? 2013-11-11 Eric Botcazou * tree-outof-ssa.c (remove_ssa_form): For a parameter without default def, pretend that the single partition that contains all the SSA_NAMEs for this parameter, if it exists, also contains the default def. 2013-11-11 Eric Botcazou * gcc.dg/guality/param-4.c: New test. -- Eric BotcazouIndex: tree-outof-ssa.c === --- tree-outof-ssa.c (revision 20) +++ tree-outof-ssa.c (working copy) @@ -972,7 +972,10 @@ expand_phi_nodes (struct ssaexpand *sa) static void remove_ssa_form (bool perform_ter, struct ssaexpand *sa) { + tree fndecl = current_function_decl, arg; bitmap values = NULL; + bitmap parameter_has_default_def; + struct pointer_map_t *parameter_map; var_map map; unsigned i; @@ -999,17 +1002,85 @@ remove_ssa_form (bool perform_ter, struc sa->map = map; sa->values = values; + + /* Find out which partitions contain a default def. If, for a parameter, no + partitions contain a default def for this parameter, then we pretend that + the single partition that contains all the SSA_NAMEs for this parameter + also contains the (ghost) default def, if such a partition exists. This + makes it possible to have correct debug info, without optimization, for + parameters written to before being read. */ sa->partition_has_default_def = BITMAP_ALLOC (NULL); + parameter_has_default_def = BITMAP_ALLOC (NULL); + parameter_map = pointer_map_create (); + for (i = 1; i < num_ssa_names; i++) { - tree t = ssa_name (i); - if (t && SSA_NAME_IS_DEFAULT_DEF (t)) + tree t, var; + + t = ssa_name (i); + if (t == NULL_TREE) + continue; + + var = SSA_NAME_VAR (t); + + /* If this is a default def in a partition, set the appropriate flag for + the partition and, if this is a SSA_NAME for a parameter, record that + there is a default def for the parameter. */ + if (SSA_NAME_IS_DEFAULT_DEF (t)) + { + int p = var_to_partition (map, t); + if (p != NO_PARTITION) + { + bitmap_set_bit (sa->partition_has_default_def, p); + if (var && TREE_CODE (var) == PARM_DECL) + bitmap_set_bit (parameter_has_default_def, DECL_UID (var)); + } + } + + /* Otherwise, if this is a SSA_NAME for a parameter without default def, + at least yet, and in a partition, record the partition as associated + with the parameter, unless there is already a different one recorded, + in which case record NO_PARTITION. */ + else if (var && TREE_CODE (var) == PARM_DECL + && !bitmap_bit_p (parameter_has_default_def, DECL_UID (var))) { int p = var_to_partition (map, t); if (p != NO_PARTITION) - bitmap_set_bit (sa->partition_has_default_def, p); + { + void **slot = pointer_map_contains (parameter_map, var); + if (slot) + { + int oldp = (int) (intptr_t) *slot; + if (oldp != p) + *slot = (void *) (intptr_t) NO_PARTITION; + } + else + { + slot = pointer_map_insert (parameter_map, var); + *slot = (void *) (intptr_t) p; + } + } } } + + /* Finally walk the scalar parameters without default def and mark that + the partition that contains all the SSA_NAMEs for a parameter, if it + exists, contains the (ghost) default def. */ + for (arg = DECL_ARGUMENTS (fndecl); arg; arg = DECL_CHAIN (arg)) +if (is_gimple_reg_type (TREE_TYPE (arg)) + && !bitmap_bit_p (parameter_has_default_def, DECL_UID (arg))) + { + void **slot = pointer_map_contains (parameter_map, arg); + if (slot) + { + int p = (int) (intptr_t) *slot; + if (p != NO_PARTITION) + bitmap_set_bit (sa->partition_has_default_def, p); + } + } + + pointer_map_destroy (parameter_map); + BITMAP_FREE (parameter_has_default_def); } /* { dg-do run } */ /* { dg-options "-g" } */ /* { dg-skip-if "" { *-*-* } { "*" } { "-O0" } } */ void foo (int i) { int j = 0; i = 1; j = j + 1; /* BREAK */ } int main (void) { foo (0); return 0; } /* { dg-final { gdb-test 10 "i" "1" } } */
[patch] Add CONSTRUCTOR_NO_CLEARING flag
Hi, in Ada 2012 it is allowed to omit components of aggregates (the equivalent of initializers/constructors); in this case, the contents of the corresponding fields in the record become undefined. Now the gimplifier implements the C semantics of clearing the missing components, so this patch introduces a new flag on constructors to modify that. Tested on x86_64-suse-linux, OK for the mainline? 2013-11-11 Tristan Gingold Eric Botcazou * tree.h (CONSTRUCTOR_NO_CLEARING): Define. * tree-core.h (CONSTRUCTOR_NO_CLEARING): Document it. * tree.def (CONSTRUCTOR): Likewise. * gimplify.c (gimplify_init_constructor): Do not clear the object when the constructor is incomplete and CONSTRUCTOR_NO_CLEARING is set. ada/ * gcc-interface/utils2.c (gnat_build_constructor): Also set the flag CONSTRUCTOR_NO_CLEARING on the constructor. -- Eric BotcazouIndex: tree-core.h === --- tree-core.h (revision 20) +++ tree-core.h (working copy) @@ -823,6 +823,9 @@ struct GTY(()) tree_base { VAR_DECL, FUNCTION_DECL IDENTIFIER_NODE + CONSTRUCTOR_NO_CLEARING in + CONSTRUCTOR + ASM_VOLATILE_P in ASM_EXPR Index: tree.h === --- tree.h (revision 20) +++ tree.h (working copy) @@ -957,6 +957,8 @@ extern void omp_clause_range_check_faile (&(*CONSTRUCTOR_ELTS (NODE))[IDX]) #define CONSTRUCTOR_NELTS(NODE) \ (vec_safe_length (CONSTRUCTOR_ELTS (NODE))) +#define CONSTRUCTOR_NO_CLEARING(NODE) \ + (CONSTRUCTOR_CHECK (NODE)->base.public_flag) /* Iterate through the vector V of CONSTRUCTOR_ELT elements, yielding the value of each element (stored within VAL). IX must be a scratch variable Index: tree.def === --- tree.def (revision 20) +++ tree.def (working copy) @@ -458,7 +458,10 @@ DEFTREECODE (OBJ_TYPE_REF, "obj_type_ref value in a SAVE_EXPR if you want to evaluate side effects only once.) For RECORD_TYPE, UNION_TYPE, or QUAL_UNION_TYPE: - The field INDEX of each node is a FIELD_DECL. */ + The field INDEX of each node is a FIELD_DECL. + Components that aren't present are cleared as per the C semantics, + unless the CONSTRUCTOR_NO_CLEARING flag is set, in which case they + become undefined. */ DEFTREECODE (CONSTRUCTOR, "constructor", tcc_exceptional, 0) /* The expression types are mostly straightforward, with the fourth argument Index: gimplify.c === --- gimplify.c (revision 20) +++ gimplify.c (working copy) @@ -4052,7 +4052,7 @@ gimplify_init_constructor (tree *expr_p, objects. Initializers for such objects must explicitly set every field that needs to be set. */ cleared = false; - else if (!complete_p) + else if (!complete_p && !CONSTRUCTOR_NO_CLEARING (ctor)) /* If the constructor isn't complete, clear the whole object beforehand. Index: ada/gcc-interface/utils2.c === --- ada/gcc-interface/utils2.c (revision 20) +++ ada/gcc-interface/utils2.c (working copy) @@ -1874,6 +1874,7 @@ gnat_build_constructor (tree type, vecqsort (compare_elmt_bitpos); result = build_constructor (type, v); + CONSTRUCTOR_NO_CLEARING (result) = 1; TREE_CONSTANT (result) = TREE_STATIC (result) = allconstant; TREE_SIDE_EFFECTS (result) = side_effects; TREE_READONLY (result) = TYPE_READONLY (type) || allconstant;
[ARM] Use standard t-elf libgcc fragment on VxWorks
Hi, this is something I forgot to submit right after submitting http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01906.html We want to use the standard t-elf fragment on VxWorks as well. Tested on ARM/VxWorks, OK for the mainline? 2013-11-11 Eric Botcazou * config.host (arm-wrs-vxworks): Replace arm/t-vxworks with arm/t-elf in tmake_file. * config/arm/t-vxworks: Delete. -- Eric BotcazouIndex: config.host === --- config.host (revision 20) +++ config.host (working copy) @@ -333,7 +333,7 @@ arc*-*-linux-uclibc*) extra_parts="crti.o crtn.o crtend.o crtbegin.o crtendS.o crtbeginS.o libgmon.a crtg.o crtgend.o" ;; arm-wrs-vxworks) - tmake_file="$tmake_file arm/t-arm arm/t-vxworks t-softfp-sfdf t-softfp-excl arm/t-softfp t-softfp" + tmake_file="$tmake_file arm/t-arm arm/t-elf t-softfp-sfdf t-softfp-excl arm/t-softfp t-softfp" extra_parts="$extra_parts crti.o crtn.o" ;; arm*-*-netbsdelf*) Index: config/arm/t-vxworks === --- config/arm/t-vxworks (revision 20) +++ config/arm/t-vxworks (working copy) @@ -1 +0,0 @@ -LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func _call_via_rX _interwork_call_via_rX _clzsi2 _clzdi2 _ctzsi2
Re: RFA: Fix PR middle-end/59049
> In this case it's fold-all-builtins folding a strlen call with a > PHI <"foo", "bar"> argument. IMHO not presenting RTL with such > non-sense is best achieved by not letting TER do constant propagation > (because it doesn't "fold" the result). We can never rule out such > stray non-propagated constants, so that makes expand more robust > (and hopes for RTL CCP). > > Index: gcc/tree-ssa-ter.c > === > --- gcc/tree-ssa-ter.c (revision 204664) > +++ gcc/tree-ssa-ter.c (working copy) > @@ -438,6 +439,12 @@ ter_is_replaceable_p (gimple stmt) > && !is_gimple_val (gimple_assign_rhs1 (stmt))) > return false; > > + /* Do not propagate "modeless" constants - we may end up > confusing the RTL > +expanders. Leave the optimization to RTL CCP. */ > + if (gimple_assign_single_p (stmt) > + && CONSTANT_CLASS_P (gimple_assign_rhs1 (stmt))) > + return false; > + >return true; > } >return false; > > does that make sense? I'll test it then. I agree with Joern that we want more constant propagation/folding during RTL expansion, not less, so IMO that's the wrong direction. -- Eric Botcazou
Re: RFA: Fix PR middle-end/59049
On Mon, Nov 11, 2013 at 12:21 PM, Eric Botcazou wrote: >> In this case it's fold-all-builtins folding a strlen call with a >> PHI <"foo", "bar"> argument. IMHO not presenting RTL with such >> non-sense is best achieved by not letting TER do constant propagation >> (because it doesn't "fold" the result). We can never rule out such >> stray non-propagated constants, so that makes expand more robust >> (and hopes for RTL CCP). >> >> Index: gcc/tree-ssa-ter.c >> === >> --- gcc/tree-ssa-ter.c (revision 204664) >> +++ gcc/tree-ssa-ter.c (working copy) >> @@ -438,6 +439,12 @@ ter_is_replaceable_p (gimple stmt) >> && !is_gimple_val (gimple_assign_rhs1 (stmt))) >> return false; >> >> + /* Do not propagate "modeless" constants - we may end up >> confusing the RTL >> +expanders. Leave the optimization to RTL CCP. */ >> + if (gimple_assign_single_p (stmt) >> + && CONSTANT_CLASS_P (gimple_assign_rhs1 (stmt))) >> + return false; >> + >>return true; >> } >>return false; >> >> does that make sense? I'll test it then. > > I agree with Joern that we want more constant propagation/folding during > RTL expansion, not less, so IMO that's the wrong direction. The question is whether you for example want to handle a_2 = 1 + 0; at RTL expansion time? I'd say it's better to have b_1 = 0; a_2 = 1 + b_1; not to say the proposed patch would be a way to ensure the first didn't happen - it just makes it less likely that TER gets you into this. OTOH as TER is now "explicit" (the expander has to lookup SSA defs) those uses should better deal with constants they receive from that. Richard. > -- > Eric Botcazou
Re: [PATCH 2/3] libstdc++-v3: ::tmpnam depends on uClibc SUSV4_LEGACY
On 8 November 2013 10:29, Bernhard Reutner-Fischer wrote: >> On 04/11/2013 02:04 PM, Bernhard Reutner-Fischer wrote: >>> >>> I would have expected that somebody would tell me that omitting ::tmpnam >>> violates 27.9.2 from the spec but noone yelled at me yet? std::tmpnam, like std::gets, should be killed with fire. If a target C library doesn't provide it then I have no problem is libstdc++ doesn't provide it either. > Attaching an updated patch that i was using since March (without > regressions) which takes Rainer's comments about _GLIBCXX_USE_TMPNAM > into account. > Ok for trunk? Thanks for following this up. I'm curious why you use tmpnam("NULL") rather than tmpnam(NULL) or tmpnam("something")? Using the string literal "NULL" is a bit confusing (although not a problem.) How does __UCLIBC_SUSV4_LEGACY__ get defined? We'd have a problem if users defined that at configure time but not later when using the library.
Re: [ARM] Use standard t-elf libgcc fragment on VxWorks
On 11/11/13 11:11, Eric Botcazou wrote: Hi, this is something I forgot to submit right after submitting http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01906.html We want to use the standard t-elf fragment on VxWorks as well. Tested on ARM/VxWorks, OK for the mainline? Ok, please apply. regards Ramana
Re: RFA: Fix PR middle-end/59049
On Mon, Nov 11, 2013 at 12:26:09PM +0100, Richard Biener wrote: > The question is whether you for example want to handle > > a_2 = 1 + 0; > > at RTL expansion time? I'd say it's better to have I think we already handle that just fine, there are tons of various simplify_gen_* calls during expansion, and we know there the mode etc. Just Joern hit a place which wasn't prepared to handle it properly, so either we handle it as you are suggesting by forcing one of the constants into a register, or we simplify the comparison and if it simplifies into a constant, we transform it to something simpler. Jakub
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On Fri, Nov 8, 2013 at 6:41 PM, Steven Bosscher wrote: > On Fri, Nov 8, 2013 at 6:20 PM, Steven Bosscher wrote: >> On Wed, Oct 30, 2013 at 5:03 AM, Andrew Pinski wrote: >>> Here is what I applied in the end; Jeff told me just to remove the >>> testcase. I added the comment trying to explain why it was the >>> opposite order of PHI-opt. >>> >>> Thanks, >>> Andrew Pinski >>> >>> ChangeLog: >>> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. >> >> Eh, why??? >> >> The file has this comment: >> >> 25/* rtl is needed only because arm back-end requires it for >> 26 BRANCH_COST. */ >> 27#include "rtl.h" >> 28#include "tm_p.h" >> >> Can you please clarify why this is not something to be fixed in the >> ARM back end? > > > Could be fixed like attached. Can you please have a look and > foster-parent it if you like it? Looks good to me - Richard, can you pick it up? Thanks, Richard. > Thanks, > > Ciao! > Steven
Re: [RFA][PATCH] Minor fix to aliasing machinery
On Sat, Nov 9, 2013 at 12:18 AM, Marc Glisse wrote: > On Wed, 6 Nov 2013, Richard Biener wrote: > >> So the only thing that remains is the mem_ref_offset thing and yes, I >> guess >> I'd prefer to use double-ints because we deal with bit offsets in the end. > > > Here it is (bootstrap+testsuite on x86_64-unknown-linux-gnu). It feels > rather artificial to do this small bit of computation with double_int > when so much else assumes HWI is enough, but why not... Ok. Thanks, Richard. > 2013-11-09 Marc Glisse > > Jeff Law > > gcc/ > * tree-ssa-alias.c (stmt_kills_ref_p_1): Use > ao_ref_init_from_ptr_and_size for builtins. > > gcc/testsuite/ > * gcc.dg/tree-ssa/alias-27.c: New testcase. > > -- > Marc Glisse > > Index: gcc/testsuite/gcc.dg/tree-ssa/alias-27.c > === > --- gcc/testsuite/gcc.dg/tree-ssa/alias-27.c(revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/alias-27.c(working copy) > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O1 -fdump-tree-optimized" } */ > + > +void f (long *p) { > + *p = 42; > + p[4] = 42; > + __builtin_memset (p, 0, 100); > +} > + > +/* { dg-final { scan-tree-dump-not "= 42" "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > > Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-27.c > ___ > Added: svn:keywords > ## -0,0 +1 ## > +Author Date Id Revision URL > \ No newline at end of property > Added: svn:eol-style > ## -0,0 +1 ## > +native > \ No newline at end of property > Index: gcc/tree-ssa-alias.c > === > --- gcc/tree-ssa-alias.c(revision 204608) > +++ gcc/tree-ssa-alias.c(working copy) > @@ -2001,23 +2001,24 @@ stmt_may_clobber_ref_p (gimple stmt, tre >return stmt_may_clobber_ref_p_1 (stmt, &r); > } > > /* If STMT kills the memory reference REF return true, otherwise > return false. */ > > static bool > stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref) > { >/* For a must-alias check we need to be able to constrain > - the access properly. */ > - ao_ref_base (ref); > - if (ref->max_size == -1) > + the access properly. > + FIXME: except for BUILTIN_FREE. */ > + if (!ao_ref_base (ref) > + || ref->max_size == -1) > return false; > >if (gimple_has_lhs (stmt) >&& TREE_CODE (gimple_get_lhs (stmt)) != SSA_NAME >/* The assignment is not necessarily carried out if it can throw > and we can catch it in the current function where we could inspect > the previous value. > ??? We only need to care about the RHS throwing. For aggregate > assignments or similar calls and non-call exceptions the LHS > might throw as well. */ > @@ -2090,37 +2091,47 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref > case BUILT_IN_MEMPCPY: > case BUILT_IN_MEMMOVE: > case BUILT_IN_MEMSET: > case BUILT_IN_MEMCPY_CHK: > case BUILT_IN_MEMPCPY_CHK: > case BUILT_IN_MEMMOVE_CHK: > case BUILT_IN_MEMSET_CHK: > { > tree dest = gimple_call_arg (stmt, 0); > tree len = gimple_call_arg (stmt, 2); > - tree base = NULL_TREE; > - HOST_WIDE_INT offset = 0; > if (!host_integerp (len, 0)) > return false; > - if (TREE_CODE (dest) == ADDR_EXPR) > - base = get_addr_base_and_unit_offset (TREE_OPERAND (dest, > 0), > - &offset); > - else if (TREE_CODE (dest) == SSA_NAME) > - base = dest; > - if (base > - && base == ao_ref_base (ref)) > + tree rbase = ref->base; > + double_int roffset = double_int::from_shwi (ref->offset); > + ao_ref dref; > + ao_ref_init_from_ptr_and_size (&dref, dest, len); > + tree base = ao_ref_base (&dref); > + double_int offset = double_int::from_shwi (dref.offset); > + double_int bpu = double_int::from_uhwi (BITS_PER_UNIT); > + if (!base || dref.size == -1) > + return false; > + if (TREE_CODE (base) == MEM_REF) > + { > + if (TREE_CODE (rbase) != MEM_REF) > + return false; > + // Compare pointers. > + offset += bpu * mem_ref_offset (base); > + roffset += bpu * mem_ref_offset (rbase); > + base = TREE_OPERAND (base, 0); > + rbase = TREE_OPERAND (rbase, 0); > + } > + if (base == rbase) > { > - HOST_WIDE_INT size = TREE_INT_CST_LOW (len); > - if (offset <= ref->offset / BITS_PER_UNIT > - && (
Re: Some wide-int review comments
On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck wrote: > On 11/08/2013 05:30 AM, Richard Sandiford wrote: >> From tree-vrp.c: >> >> @@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code >> /* If the singed operation wraps then int_const_binop has done >> everything we want. */ >> ; >> + /* Signed division of -1/0 overflows and by the time it gets here >> + returns NULL_TREE. */ >> + else if (!res) >> +return NULL_TREE; >> else if ((TREE_OVERFLOW (res) >> && !TREE_OVERFLOW (val1) >> && !TREE_OVERFLOW (val2)) >> >> Why is this case different from trunk? Or is it a bug-fix independent >> of wide-int? > > the api for division is different for wide-int than it was for double-int. But this is using a tree API (int_const_binop) that didn't change (it returned NULL for / 0 previously). So what makes us arrive here now? (I agree there is a bug in VRP, but it shouldn't manifest itself only on wide-int) Richard. >> >> Thanks, >> Richard > >
Re: some prep work to make JUMP_TABLE_DATA a non-active_insn_p object
On Sun, Nov 10, 2013 at 6:11 PM, Eric Botcazou wrote: >> This is the first patch of what I think will be four to fix those few >> places. >> >> Bootstrapped&tested on powerpc64-unknown-linux-gnu. Also built SH to be >> sure. >> >> OK for trunk? > > The generic part is OK (modulo the additional space after ! in the 3rd hunk of > the haifa-sched.c patch). The rest is ok as well. Richard. > -- > Eric Botcazou
Re: [PATCH] make has_gate and has_execute useless
On Mon, Nov 11, 2013 at 1:39 AM, Trevor Saunders wrote: > On Fri, Nov 08, 2013 at 10:37:00AM +0100, Richard Biener wrote: >> On Thu, Nov 7, 2013 at 5:00 PM, wrote: >> > From: Trevor Saunders >> > >> > Hi, >> > >> > This is the result of seeing what it would take to get rid of the >> > has_gate and >> > has_execute flags on pass_data. It turns out not much, but I wanted >> > confirmation this part is ok before I go deal with all the places that >> > initialize the fields. >> > >> > I bootstrapped this part on x86_64-unknown-linux-gnu and saw no new test >> > suite >> > regressions (ignoring the silk stuff because the full paths in its test >> > names >> > break my test script for now) Any reason this patch with the actual >> > removal of the flags wouldn't be ok? >> >> The has_gate flag is easy to remove (without a TODO_ hack), right? >> No gate simply means that the gate returns always true. The only >> weird thing is >> >> /* If we have a gate, combine the properties that we could have with >> and without the pass being examined. */ >> if (pass->has_gate) >> properties &= new_properties; >> else >> properties = new_properties; >> >> which I don't understand (and you just removed all properties handling >> there). >> >> So can you split out removing has_gate? This part is obviously ok. >> >> Then, for ->execute () I'd have refactored the code to make >> ->sub passes explicitely executed by the default ->execute () >> implementation only. That is, passes without overriding execute >> are containers only. Can you quickly check whether that would >> work out? > > Ok, I've now given this a shot and wasn't terribly successful, if I just > change execute_pass_list and execute_ipa_pass_list to handle container > passes executing their sub passes I get the following ICE > > ./gt-passes.h:77:2: internal compiler error: Segmentation fault > }; >^ >0xd43d96 crash_signal >/src/gcc/gcc/toplev.c:334 >0xd901a9 ssa_default_def(function*, tree_node*) >/src/gcc/gcc/tree-dfa.c:318 >0xb56d77 ipa_analyze_params_uses >/src/gcc/gcc/ipa-prop.c:2094 >0xb57275 ipa_analyze_node(cgraph_node*) >/src/gcc/gcc/ipa-prop.c:2179 >0x13e2b6d ipcp_generate_summary >/src/gcc/gcc/ipa-cp.c:3615 >0xc55a2a > > execute_ipa_summary_passes(ipa_opt_pass_d*) >/src/gcc/gcc/passes.c:1991 >0x943341 ipa_passes > > /src/gcc/gcc/cgraphunit.c:2011 >0x943675 >compile() > > /src/gcc/gcc/cgraphunit.c:2118 > >now > Which is because fn->gimple_df is null. I expect that's because pass > ordering changed somehow, but I'm not sure off hand how or ifthat's > something worth figuring out right now. Yeah, some of the walking doesn't seem to work ... probably a pass with sub-passes already has an execute member function ;) > Another issue I realized is that doing this will change the order of > plugin events from > start container pass a > end container pass a > start contained pass b > end contained pass b > ... > > to > > start container pass a > start contained pass b > end contained pass b > end container pass a > > Arguably that's an improvement, but I'm not sure if changing that APi > like that is acceptable. Indeed an improvement. Can you attach your patch? Thanks, Richard. > Trev > >> >> Thanks, >> Richard. >> >> > Trev >> > >> > 2013-11-06 Trevor Saunders >> > >> > * pass_manager.h (pass_manager): Adjust. >> > * passes.c (opt_pass::execute): Tell the pass manager it doesn't >> > need >> > to do anything for this pass. >> > (pass_manager::register_dump_files_1): Don't uselessly deal with >> > properties of passes. >> > (pass_manager::register_dump_files): Adjust. >> > (dump_one_pass): Just call pass->gate (). >> > (execute_ipa_summary_passes): Likewise. >> > (execute_one_pass): Don't check pass->has_execute flag. >> > (ipa_write_summaries_2): Don't check pass->has_gate flag. >> > (ipa_write_optimization_summaries_1): Likewise. >> > (ipa_read_summaries_1): Likewise. >> > (ipa_read_optimization_summaries_1): Likewise. >> > (execute_ipa_stmt_fixups): Likewise. >> > * tree-pass.h (pass_data): Rename has_gate to useless_has_ga
Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2
On Thu, Oct 31, 2013 at 1:46 AM, Sandra Loosemore wrote: > On 10/29/2013 02:51 AM, Bernd Edlinger wrote: >> >> >> On Mon, 28 Oct 2013 21:29:24, Sandra Loosemore wrote: >>> >>> On 10/28/2013 03:20 AM, Bernd Edlinger wrote: I have attached an update to your patch, that should a) fix the recursion problem. b) restrict the -fstrict-volatile-bitfields to not violate the C++ memory model. > > > Here's a new version of the update patch. > > >>> Alternatively, if strict_volatile_bitfield_p returns false but >>> flag_strict_volatile_bitfields> 0, then always force to word_mode and >>> change the -fstrict-volatile-bitfields documentation to indicate that's >>> the fallback if the insertion/extraction cannot be done in the declared >>> mode, rather than claiming that it tries to do the same thing as if >>> -fstrict-volatile-bitfields were not enabled at all. > > > I decided that this approach was more expedient, after all. > > I've tested this patch (in conjunction with my already-approved but > not-yet-applied patch) on mainline for arm-none-eabi, x86_64-linux-gnu, and > mips-linux gnu. I also backported the entire series to GCC 4.8 and tested > there on arm-none-eabi and x86_64-linux-gnu. OK to apply? Hm, I can't seem to find the context for @@ -923,6 +935,14 @@ store_fixed_bit_field (str_rtx, bitsize, bitnum, 0, 0, value); return; } + else if (MEM_P (str_rtx) + && MEM_VOLATILE_P (str_rtx) + && flag_strict_volatile_bitfields > 0) +/* This is a case where -fstrict-volatile-bitfields doesn't apply + because we can't do a single access in the declared mode of the field. + Since the incoming STR_RTX has already been adjusted to that mode, + fall back to word mode for subsequent logic. */ +str_rtx = adjust_address (str_rtx, word_mode, 0); /* Under the C++0x memory model, we must not touch bits outside the bit region. Adjust the address to start at the beginning of the and the other similar hunk. I suppose they apply to earlier patches in the series? I suppose the above applies to store_bit_field (diff -p really helps!). Why would using word_mode be any good as fallback? That is, why is "Since the incoming STR_RTX has already been adjusted to that mode" not the thing to fix? Richard. > -Sandra
Re: [PATCH GCC]Refactor force_expr_to_var_cost and handle type conversion
On Mon, Nov 11, 2013 at 8:29 AM, Bin.Cheng wrote: > On Fri, Nov 8, 2013 at 10:06 PM, Richard Biener > wrote: >> On Fri, Nov 8, 2013 at 2:41 PM, bin.cheng wrote: >>> Hi, >>> This patch refactors force_expr_to_var_cost and handles type conversion >>> along with other tree nodes. It is split from the patch posted at >>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00546.html >>> Bootstrap and test with the patch lowering address expressions on >>> x86/x86_64/arm. Is it OK? >> >> ENOPATCH >> > Attached here. > I think it should be stated that this patch and the lowering one are > logically one because we rely on this patch to compute cost of lowered > address expression like "&arr + offset". > Moreover, address_cost on x86/x86_64 (is 1) are small, so this patch > has small impact on these two targets. While address_cost on arm (is > 9) is non-trivial, it has greater impact on ARM. The statement is in > line with various benchmark data. Ok. Thanks, Richard. > Thanks, > bin > -- > Best Regards.
Re: [RFA][PATCH] Isolate erroneous paths optimization
On Mon, Nov 11, 2013 at 4:11 AM, Jeff Law wrote: > On 11/10/13 05:34, Eric Botcazou wrote: But I think that you cannot transform foo () { *0 = 1; } to __builtin_trap as you can catch the trap via an exception handler in a caller of foo, no? >>> >>> >>> That is true. OK, I can see an argument that when using >>> -fnon-call-exceptions that kind of code should not be changed to call >>> __builtin_trap. >> >> >> That's exactly the reason why gnat.dg/memtrap.adb started to fail after >> Jeff's >> patch went it. So, if -fisolate-erroneous-paths isn't amended, we'll need >> to >> disable it in Ada like in Go. >> >>> In that case I think it would be fine to run the isolate paths >>> optimization, but to not omit the actual dereference of the NULL >>> pointer (possibly the dereference could be followed by a trap). >> >> >> This would probably work for Ada as well. > > OK. It sounds like there's a pretty general consensus that we ought ot go > ahead and leave in a load/store of a NULL pointer. I'll go ahead and run > with that. I'll probably just emit SSA_NAME = *0, unless someone thinks we > ought ot distinguish between loads and stores, emitting SSA_NAME = *0 and *0 > = 0 for the two cases respectively. Can't you simply keep the trapping stmt as-is? Richard. > However, that brings up an couple interesting questions. > > Let's say we find a NULL pointer which reaches a return statement in a > function which is marked as returns_nonnull. In that case there is no > dereference. Presumably for that kind of scenario we'll just keep the > builtin trap. > > Similarly, assume we extend this pass to detect out-of-bounds array > indexing. It's fairly simple to do and has always been in my plan. In that > case leaving in the array indexing won't necessarily generate a fault. For > those presumably we'll just want the builtin_trap as well? > > Again, I don't mind inserting a *0, I just want to have a plan for the other > undefined behaviours we currently detect and those which I plan on catching > soon. > > Jeff >
Re: [patch] Fix PR ada/35998
On Mon, Nov 11, 2013 at 11:10 AM, Eric Botcazou wrote: > Hi, > > this is an old bug report from Jan, which was closed, then reopened by Tom at > some point, but the patch never got reviewed. The original submission is at: > http://gcc.gnu.org/ml/gcc-patches/2008-05/msg01857.html > > Tested on x86_64-suse-linux, OK for the mainline? Due to the different interfaces of int_size_in_bytes and simple_type_size_in_bits (and 'size' in add_byte_size_attribute being unsigned and not [unsigned] HWI) it would be cleaner to add an early return after the call to int_size_in_bytes if its return value is -1 (and make sure the return value doesn't overflow an unsigned int - likewise for simple_type_size_in_bits, not sure why that case doesn't use int_size_in_bytes as well ...)? Can you apply some TLC here? Thanks, Richard. > > 2013-11-11 Jan Kratochvil > > PR ada/35998 > * dwarf2out.c (add_byte_size_attribute): Omit attribute for size -1. > > > -- > Eric Botcazou
Re: [PATCH] Handle GIMPLE_ASSIGNs with different vuse in gimple_equal_p
On 11/11/13 10:50, Richard Biener wrote: > On Sat, 9 Nov 2013, Tom de Vries wrote: >> Bootstrapped and regtested on x86_64. >> >> OK for trunk? > > Comments inline > > > diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c > index 98b5882..43516a7 100644 > --- a/gcc/tree-ssa-tail-merge.c > +++ b/gcc/tree-ssa-tail-merge.c > @@ -1173,8 +1173,47 @@ gimple_equal_p (same_succ same_succ, gimple s1, gimple > s2) >lhs2 = gimple_get_lhs (s2); >if (TREE_CODE (lhs1) != SSA_NAME > && TREE_CODE (lhs2) != SSA_NAME) > - return (vn_valueize (gimple_vdef (s1)) > - == vn_valueize (gimple_vdef (s2))); > + { > + /* If the vdef is the same, it's the same statement. */ > + if (vn_valueize (gimple_vdef (s1)) > + == vn_valueize (gimple_vdef (s2))) > + return true; > + > + /* If the vdef is not the same but the vuse is the same, it's not the > + same stmt. */ > + if (vn_valueize (gimple_vuse (s1)) > + == vn_valueize (gimple_vuse (s2))) > + return false; > Richard, > What's the logic behind this? > We want to use VN to get more "positive" > results We can use it to get both positive and negative results faster. > - doing a negative early out looks suspicious to me ... > If the vdefs are the same, the VN has concluded that the statements are the same. We have a positive early out. If the vdefs are not the same, the VN has concluded that the statements are different. But if the vuses are the same, the VN has concluded that the statements are different because of structural difference. Which means there's no sense in us repeating the structural comparison. We have a negative early out. If the vdefs are not the same, the VN has concluded that the statements are different. But if the vuses are different, the VN may have concluded that the statements are different because of the different vuses. So it still makes sense to try structural comparison. > + /* If the vdef is not the same and the vuse is not the same, it might > be > + same stmt. */ > + > + /* Test for structural equality. */ > + if (gimple_assign_rhs_code (s1) != gimple_assign_rhs_code (s1) > > s2 > > + || (gimple_assign_nontemporal_move_p (s1) > + != gimple_assign_nontemporal_move_p (s2))) > > I don't think we should care (it'll be false - a later pass sets it, > it's an optimization hint, not a correctness issue). More interesting > would be to assert that has_volatile_ops is the same if the operands > turned out to be the same. > OK. > + return false; > + > + if (!operand_equal_p (lhs1, lhs2, 0)) > + return false; > + > + t1 = gimple_assign_rhs1 (s1); > + t2 = gimple_assign_rhs1 (s2); > + if (!gimple_operand_equal_value_p (t1, t2)) > + return false; > + > + t1 = gimple_assign_rhs2 (s1); > + t2 = gimple_assign_rhs2 (s2); > + if (!gimple_operand_equal_value_p (t1, t2)) > + return false; > + > + t1 = gimple_assign_rhs3 (s1); > + t2 = gimple_assign_rhs3 (s2); > + if (!gimple_operand_equal_value_p (t1, t2)) > + return false; > > for (i = 1; i < gimple_num_ops (s1); ++i) > t1 = gimple_op (s1, i); > ... > > but I think you should only compare rhs1 and thus only handle > GIMPLE_ASSIGN_SINGLEs this way - the others have a SSA name > lhs. > I see. > That makes the whole thing just > > if (TREE_CODE (lhs1) != SSA_NAME > && TREE_CODE (lhs2) != SSA_NAME) > { >if (vn_valueize (gimple_vdef (s1)) >== vn_valueize (gimple_vdef (s2))) >return true; >return operand_equal_p (lhs1, lhs2, 0) >&& gimple_operand_equal_value_p (gimple_assign_rhs1 (s1), > gimple_assign_rhs2 (s2)); > } > > Ok with doing it this way. I'll retest and checkin like this, unless you agree about the early out, which I still think is a good idea, although the structural test is now much smaller. Thanks, - Tom > > Thanks, > Richard. > > + /* Same structure. */ > + return true; > + } >else if (TREE_CODE (lhs1) == SSA_NAME > && TREE_CODE (lhs2) == SSA_NAME) > return vn_valueize (lhs1) == vn_valueize (lhs2); >
Re: [RFC] replace malloc with a decl on the stack
On Sun, Nov 10, 2013 at 4:27 PM, Marc Glisse wrote: > Hello, > > I am posting this patch to get some feedback on the approach. The goal is to > replace malloc+free with a stack allocation (a decl actually) when the size > is a small constant. > > For testing, I highjacked the "leaf" attribute, but it isn't right, I'll > remove it from the list (not sure what I'll do for the testcases then). What > I'd want instead is a "returns" attribute that means the function will > return (either by "return" or an exception), as opposed to having an > infinite loop, calling exit or longjmp, etc (sched-deps.c has a related > call_may_noreturn_p). The situation I am trying to avoid is: > p=malloc(12); > f(p) > free(p) > > where f contains somewhere: > free(p); exit(42); > (or longjmp or something else that takes the regular call to free out of the > execution flow). Instead of a new attribute IPA should be used to compute this call-graph property. An infinite loop wouldn't be an issue, but the real issue is that it shouldn't free the object which would be only valid if the free after the call isn't reached. IPA pure-const is used to compute similar properties (may loop, may throw). You also have to make sure to properly align the replacement. > > It passes most of the testsuite, but breaks a couple __builtin_object_size > tests: > > struct A > { > char a[10]; > int b; > char c[10]; > }; > int main(){ > struct A *p = malloc (2 * sizeof (struct A)); > assert (__builtin_object_size (&p->a, 1) == sizeof (p->a)); > free (p); > } > __builtin_object_size now returns 56 instead of 10. I am not sure what to do > about that. > > > The size above which the malloc->stack transformation is not applied should > depend on a parameter, I don't know if it should get its own or depend on an > existing one. In any case, I'd like to avoid ending up with a ridiculously > low threshold (my programs use GMP, which internally uses alloca up to 65536 > bytes (even in recursive functions that have a dozen allocations), so I > don't want gcc to tell me that 50 bytes are too much). > > A program with a double-free may, with this patch, end up crashing on the > first free instead of the second, but that's an invalid program anyway. > > > I don't know if tree-ssa-forwprop is a good place for it, but it was > convenient for a prototype. I like the idea of running it several times: > replacing malloc with a decl early can help other optimizations, and other > optimizations can make this one possible later. We have a similar transform in CCP (fold_builtin_alloca_with_align) which is there because CCP is a good place where size arguments may become constant (the case you handle - you don't seem to handle variable malloc -> alloca transform which would be possible if for example VRP figures out a acceptable upper bound for the size). Richard. > The walk could be a bit expensive, but we only do it if we detected a malloc > of a small constant and at least one matching free. I guess I should mark > the malloc somehow to avoid performing the walk twice if there are several > (but not enough) matching free. > > > stack_vec is nice, it would be convenient if bitmaps also had a version with > a destructor so we don't need to explicitly deallocate them. > > -- > Marc Glisse > Index: gcc/testsuite/g++.dg/tree-ssa/heapstack-1.C > === > --- gcc/testsuite/g++.dg/tree-ssa/heapstack-1.C (revision 0) > +++ gcc/testsuite/g++.dg/tree-ssa/heapstack-1.C (working copy) > @@ -0,0 +1,24 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +struct A { > + void*p; > + A(void*q) : p(q) {} > + ~A(){ __builtin_free(p); } > +}; > +void f(void*)__attribute__((__leaf__)); > +void h(void*)__attribute__((__leaf__,__nothrow__)); > +void g(){ > + void*p=__builtin_malloc(12); > + A a(p); > + f(p); > +} > + > +void i(){ > + void*p=__builtin_malloc(12); > + h(p); > + __builtin_free(p); > +} > + > +/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > > Property changes on: gcc/testsuite/g++.dg/tree-ssa/heapstack-1.C > ___ > Added: svn:keywords > ## -0,0 +1 ## > +Author Date Id Revision URL > \ No newline at end of property > Added: svn:eol-style > ## -0,0 +1 ## > +native > \ No newline at end of property > Index: gcc/testsuite/g++.dg/tree-ssa/heapstack-2.C > === > --- gcc/testsuite/g++.dg/tree-ssa/heapstack-2.C (revision 0) > +++ gcc/testsuite/g++.dg/tree-ssa/heapstack-2.C (working copy) > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +void f(void*)__attribute__((__leaf__)); > +void g(){ > + void*p=__builtin_malloc(12); > + f(p); > + __builtin_free(p); > +} > + > +/* { dg-final { scan-tree-dump-times "malloc" 1 "op
Re: [patch] Fix debug info for modified parameter
On Mon, Nov 11, 2013 at 11:56 AM, Eric Botcazou wrote: > Hi, > > since the switch to SSA form at -O0, the compiler generates wrong debug info > for something like: > > void > foo (int i) > { > int j = 0; > i = 1; > j = j + 1; /* BREAK */ > } > > If you try to display the value of i after breaking in GDB, you don't get 1. > The reason is that there is no default def SSA_NAME for i in this case, so no > partitions get the RTL location of the parameter. > > Tentative patch attached, it's admittedly a little heavy, but I don't see any > other solution. Tested on x86_64-suse-linux, OK for the mainline? Hmm, at -O0 we should be able to coalesce all SSA names of a DECL. So in theory the following should work: Index: gcc/cfgexpand.c === --- gcc/cfgexpand.c (revision 204664) +++ gcc/cfgexpand.c (working copy) @@ -1619,7 +1619,7 @@ expand_used_vars (void) we don't do anything here. But those which don't contain the default def (representing a temporary based on the parm/result) we need to allocate space just like for normal VAR_DECLs. */ - if (!bitmap_bit_p (SA.partition_has_default_def, i)) + if (optimize && !bitmap_bit_p (SA.partition_has_default_def, i)) { expand_one_var (var, true, true); gcc_assert (SA.partition_to_pseudo[i]); eventually restricting handling to !DECL_IGNORED_P / !DECL_ARTIFICIAL variables. Richard. > > 2013-11-11 Eric Botcazou > > * tree-outof-ssa.c (remove_ssa_form): For a parameter without default > def, pretend that the single partition that contains all the SSA_NAMEs > for this parameter, if it exists, also contains the default def. > > > 2013-11-11 Eric Botcazou > > * gcc.dg/guality/param-4.c: New test. > > > -- > Eric Botcazou
Re: [patch] Add CONSTRUCTOR_NO_CLEARING flag
On Mon, Nov 11, 2013 at 11:55 AM, Eric Botcazou wrote: > Hi, > > in Ada 2012 it is allowed to omit components of aggregates (the equivalent of > initializers/constructors); in this case, the contents of the corresponding > fields in the record become undefined. Now the gimplifier implements the C > semantics of clearing the missing components, so this patch introduces a new > flag on constructors to modify that. > > Tested on x86_64-suse-linux, OK for the mainline? Ok. Can you update doc/generic.texi? Thanks, Richard. > > 2013-11-11 Tristan Gingold > Eric Botcazou > > * tree.h (CONSTRUCTOR_NO_CLEARING): Define. > * tree-core.h (CONSTRUCTOR_NO_CLEARING): Document it. > * tree.def (CONSTRUCTOR): Likewise. > * gimplify.c (gimplify_init_constructor): Do not clear the object when > the constructor is incomplete and CONSTRUCTOR_NO_CLEARING is set. > ada/ > * gcc-interface/utils2.c (gnat_build_constructor): Also set the flag > CONSTRUCTOR_NO_CLEARING on the constructor. > > > -- > Eric Botcazou
Re: [PATCH] Handle GIMPLE_ASSIGNs with different vuse in gimple_equal_p
On Mon, 11 Nov 2013, Tom de Vries wrote: > On 11/11/13 10:50, Richard Biener wrote: > > On Sat, 9 Nov 2013, Tom de Vries wrote: > >> Bootstrapped and regtested on x86_64. > >> > >> OK for trunk? > > > > Comments inline > > > > > > diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c > > index 98b5882..43516a7 100644 > > --- a/gcc/tree-ssa-tail-merge.c > > +++ b/gcc/tree-ssa-tail-merge.c > > @@ -1173,8 +1173,47 @@ gimple_equal_p (same_succ same_succ, gimple s1, > > gimple s2) > >lhs2 = gimple_get_lhs (s2); > >if (TREE_CODE (lhs1) != SSA_NAME > > && TREE_CODE (lhs2) != SSA_NAME) > > - return (vn_valueize (gimple_vdef (s1)) > > - == vn_valueize (gimple_vdef (s2))); > > + { > > + /* If the vdef is the same, it's the same statement. */ > > + if (vn_valueize (gimple_vdef (s1)) > > + == vn_valueize (gimple_vdef (s2))) > > + return true; > > + > > + /* If the vdef is not the same but the vuse is the same, it's not the > > +same stmt. */ > > + if (vn_valueize (gimple_vuse (s1)) > > + == vn_valueize (gimple_vuse (s2))) > > + return false; > > > > Richard, > > > What's the logic behind this? > > We want to use VN to get more "positive" > > results > > We can use it to get both positive and negative results faster. I can get a negative result for free: "false" ;) VN proves equivalency it doesn't prove non-equivalency - that's simply its conservative fallback. > > - doing a negative early out looks suspicious to me ... > > > > If the vdefs are the same, the VN has concluded that the statements are the > same. We have a positive early out. > > If the vdefs are not the same, the VN has concluded that the statements are > different. But if the vuses are the same, the VN has concluded that the > statements are different because of structural difference. Which means there's > no sense in us repeating the structural comparison. We have a negative early > out. Well, see above - it merely failed to prove equivalency, it didn't actually conclude they are different. > If the vdefs are not the same, the VN has concluded that the statements are > different. But if the vuses are different, the VN may have concluded that the > statements are different because of the different vuses. So it still makes > sense > to try structural comparison. > > > + /* If the vdef is not the same and the vuse is not the same, it might > > be > > +same stmt. */ > > + > > + /* Test for structural equality. */ > > + if (gimple_assign_rhs_code (s1) != gimple_assign_rhs_code (s1) > > > > s2 > > > > + || (gimple_assign_nontemporal_move_p (s1) > > + != gimple_assign_nontemporal_move_p (s2))) > > > > I don't think we should care (it'll be false - a later pass sets it, > > it's an optimization hint, not a correctness issue). More interesting > > would be to assert that has_volatile_ops is the same if the operands > > turned out to be the same. > > > > OK. > > > + return false; > > + > > + if (!operand_equal_p (lhs1, lhs2, 0)) > > + return false; > > + > > + t1 = gimple_assign_rhs1 (s1); > > + t2 = gimple_assign_rhs1 (s2); > > + if (!gimple_operand_equal_value_p (t1, t2)) > > + return false; > > + > > + t1 = gimple_assign_rhs2 (s1); > > + t2 = gimple_assign_rhs2 (s2); > > + if (!gimple_operand_equal_value_p (t1, t2)) > > + return false; > > + > > + t1 = gimple_assign_rhs3 (s1); > > + t2 = gimple_assign_rhs3 (s2); > > + if (!gimple_operand_equal_value_p (t1, t2)) > > + return false; > > > > for (i = 1; i < gimple_num_ops (s1); ++i) > > t1 = gimple_op (s1, i); > > ... > > > > but I think you should only compare rhs1 and thus only handle > > GIMPLE_ASSIGN_SINGLEs this way - the others have a SSA name > > lhs. > > > > I see. > > > That makes the whole thing just > > > > if (TREE_CODE (lhs1) != SSA_NAME > > && TREE_CODE (lhs2) != SSA_NAME) > > { > >if (vn_valueize (gimple_vdef (s1)) > >== vn_valueize (gimple_vdef (s2))) > > return true; > >return operand_equal_p (lhs1, lhs2, 0) > >&& gimple_operand_equal_value_p (gimple_assign_rhs1 (s1), > > gimple_assign_rhs2 (s2)); > > } > > > > Ok with doing it this way. > > I'll retest and checkin like this, unless you agree about the early out, > which I > still think is a good idea, although the structural test is now much smaller. I think the early out is premature. Richard.
[patch] fix libstdc++/54562
This ensures that the standard timed mutexes use steady_clock for relative timeouts and the same clock as pthread_mutex_timedlock for absolute timeouts. I'm not adding a new test as the only ones I could come up with involve waiting for longer than is suitable in a testcase, and don't reliably fail on all systems anyway. PR libstdc++/54562 * include/std/mutex (__timed_mutex_impl::__clock_t): Use high_resolution_clock for absolute timeouts, because pthread_mutex_timedlock uses CLOCK_REALTIME not CLOCK_MONOTONIC. (__timed_mutex_impl::_M_try_lock_for): Use steady_clock for relative timeouts as per [thread.req.timing]. (__timed_mutex_impl::_M_try_lock_until): Convert to __clock_t time point instead of using _M_try_lock_for. Tested x86_64-linux, committed to trunk. I would like to commit this to the 4.8 branch too, but I'm undecided whether it's suitable. commit 1f2401b0338fefaaa521ebf202e563a6c673b47e Author: Jonathan Wakely Date: Mon Nov 11 12:40:01 2013 + PR libstdc++/54562 * include/std/mutex (__timed_mutex_impl::__clock_t): Use high_resolution_clock for absolute timeouts, because pthread_mutex_timedlock uses CLOCK_REALTIME not CLOCK_MONOTONIC. (__timed_mutex_impl::_M_try_lock_for): Use steady_clock for relative timeouts as per [thread.req.timing]. (__timed_mutex_impl::_M_try_lock_until): Convert to __clock_t time point instead of using _M_try_lock_for. diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex index 40b2e31..da2ca0c 100644 --- a/libstdc++-v3/include/std/mutex +++ b/libstdc++-v3/include/std/mutex @@ -203,21 +203,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION class __timed_mutex_impl { protected: -#ifdef _GLIBCXX_USE_CLOCK_MONOTONIC - typedef chrono::steady_clock __clock_t; -#else typedef chrono::high_resolution_clock__clock_t; -#endif template bool _M_try_lock_for(const chrono::duration<_Rep, _Period>& __rtime) { - auto __rt = chrono::duration_cast<__clock_t::duration>(__rtime); - if (ratio_greater<__clock_t::period, _Period>()) + using chrono::steady_clock; + auto __rt = chrono::duration_cast(__rtime); + if (ratio_greater()) ++__rt; - - return _M_try_lock_until(__clock_t::now() + __rt); + return _M_try_lock_until(steady_clock::now() + __rt); } template @@ -225,11 +221,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_try_lock_until(const chrono::time_point<__clock_t, _Duration>& __atime) { - chrono::time_point<__clock_t, chrono::seconds> __s = - chrono::time_point_cast(__atime); - - chrono::nanoseconds __ns = - chrono::duration_cast(__atime - __s); + auto __s = chrono::time_point_cast(__atime); + auto __ns = chrono::duration_cast(__atime - __s); __gthread_time_t __ts = { static_cast(__s.time_since_epoch().count()), @@ -243,7 +236,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template bool _M_try_lock_until(const chrono::time_point<_Clock, _Duration>& __atime) - { return _M_try_lock_for(__atime - _Clock::now()); } + { + auto __rtime = __atime - _Clock::now(); + return _M_try_lock_until(__clock_t::now() + __rtime); + } }; /// timed_mutex
patch ping: diagnostics finalization and plugins
Hello all, I'm pinging the patch http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00056.html ## Index: gcc/toplev.c === --- gcc/toplev.c(revision 204671) +++ gcc/toplev.c(working copy) @@ -1968,11 +1968,13 @@ toplev_main (int argc, char **argv) if (warningcount || errorcount || werrorcount) print_ignored_options (); - diagnostic_finish (global_dc); - /* Invoke registered plugin callbacks if any. */ + /* Invoke registered plugin callbacks if any. Some plugins could + emit some diagnostics here. */ invoke_plugin_callbacks (PLUGIN_FINISH, NULL); + diagnostic_finish (global_dc); + finalize_plugins (); location_adhoc_data_fini (line_table); if (seen_error () || werrorcount) gcc/ChangeLog entry 2013-11-11 Basile Starynkevitch * toplev.c (toplev_main): Move PLUGIN_FINISH invocation before diagnostic_finish. Ok for trunk? -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basilestarynkevitchnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mine, sont seulement les miennes} ***
Re: patch ping: diagnostics finalization and plugins
On Mon, Nov 11, 2013 at 8:36 AM, Basile Starynkevitch wrote: > 2013-11-11 Basile Starynkevitch > > * toplev.c (toplev_main): Move PLUGIN_FINISH invocation before > diagnostic_finish. OK. Diego.
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
On Thu, Nov 7, 2013 at 10:07 PM, Ilya Enkovich wrote: > 2013/11/7 Jeff Law : >> On 11/04/13 05:40, Richard Biener wrote: Effectively the bounds are passed "on the side". >>> >>> >>> Well, not exactly. I can see __bound_tmp.0_4 passed to access_and_store. >> >> I'm referring to how they're dealt with in FUNCTION_ARG and friends, ie, the >> low level aspects. Maybe that's why we're crossing wires here. >> >> >> >>> >>> Are __builtin_ia32_bndcu and access_and_store supposed to be >>> called directly after each other? What if for example profile >>> instrumentation >>> inserts a call inbetween them? >> >> That's a question for Ilya. > > I think there is some misunderstanding. __builtin_ia32_bndcu and > access_and_store calls are completely independent in this example. > This is just a code from example on how user-level builtins may be > used with legacy code. > >> >> You can't actually interleave them -- that results in MPX and normal code not being able to interact. Passing the bound at the end doesn't really work either -- varargs and the desire to pass some of the bounds around in bound registers. >>> >>> >>> I'm only looking at how bound arguments are passed at the GIMPLE >>> level - which I think is arbitrary given they are passed on-the-side >>> during code-generation. So I'm looking for a more "sensible" option >>> for the GIMPLE representation which looks somewhat fragile to me >>> (it looks the argument is only there to have a data dependence >>> between the bndcu intrinsic and the actual call?) >> >> OK, we're clearly looking at two different aspects here. While I think we >> can express them arbitrarily in GIMPLE, we have to pass them on the side >> once we get into the low level code for compatibility purposes. >> >> I think argument passing for MPX is going to ultimately be ugly/fragile >> regardless of what we do given the constraints. Given we're passing on the >> side at the low level, I'd prefer them passed on the side in GIMPLE, but I'm >> willing to consider other alternatives. >> >> >>> > > /* Return the number of arguments used by call statement GS > ignoring bound ones. */ > > static inline unsigned > gimple_call_num_nobnd_args (const_gimple gs) > { > unsigned num_args = gimple_call_num_args (gs); > unsigned res = num_args; > for (unsigned n = 0; n < num_args; n++) > if (POINTER_BOUNDS_P (gimple_call_arg (gs, n))) > res--; > return res; > } > > the choice means that gimple_call_num_nobnd_args is not O(1). Yes, but I don't see that's terribly problematical. >>> >>> >>> Well, consider compile.exp limits-fnargs.c written with int * parameters >>> and bound support ;) [there is a limit on the number of bound registers >>> available, but the GIMPLE parts doesn't put any limit on instrumented >>> args?] >> >> Right. Go back to Ilya's earlier messages :-) >> >> There's these two magic hooks TARGET_{LOAD,STORE}_BOUNDS_FOR_ARG. At some >> point when you run out of regs, you start shoving these things into memory. >> I think we're ultimately going to need further clarification here. >> >> > > GIMPLE layout depending on flag_check_pointer_bounds sounds > like a recipie for desaster if you consider TUs compiled with and > TUs compiled without and LTO. Or if you consider using > optimized attribute with that flag. Sorry, I don't follow. Can you elaborate please. >>> >>> >>> Compile TU1 with -fno-chk, TU2 with -fchk, LTO both object files >>> which gives you - well, either -fno-chk or -fchk. Then you read in >>> the GIMPLE and when calling gimple_call_nobnd_arg you get >>> a mismatch between where you generated the gimple and where >>> you use the gimple. >>> >>> A flag on whether a call is instrumented is better (consider inlining >>> from TU1 into TU2 where a flag on struct function for example wouldn't >>> be enough either). >> >> OK. I see what you're saying. Whether or not we have these extra args is a >> property of the call site, not the CFUN, target DECL or the TU. That makes >> sense. >> >> >> Ilya, I think there's enough confusion & issues around the call ABI/API that >> we need to sort it out before moving forward in any significant way. >> >> Let's go back to argument passing and go through some concrete examples of >> the various cases. Make sure to tie them into the calls to >> TARGET_{LOAD,STORE}_BOUNDS_FOR_ARG and the new gimple_call_* routines. > > OK. Lets see at how expand pass handles instrumented calls. As usual, > for each passed argument expand calls target hook to determine where > argument is passed. Now target may return not only slot for arg, but > also a slot for bounds in case arg may have them (PARALLEL expr is > used to hold all slots). If bounds slot is register, then expand > simply puts bounds to it. If it is not a register, target hook is > called to sto
Re: [PATCH, MPX, 2/X] Pointers Checker [10/25] Calls copy and verification
On Fri, Nov 8, 2013 at 11:03 AM, Ilya Enkovich wrote: > 2013/11/8 Richard Biener : >> On Thu, Nov 7, 2013 at 7:55 PM, Jeff Law wrote: >>> On 11/07/13 04:50, Ilya Enkovich wrote: Hi, Here is an updated patch version. >>> >>> I think this needs to hold until we have a consensus on what the parameter >>> passing looks like for bounded pointers. >> >> I still think the best thing to do on GIMPLE is >> >> arg_2 = __builtin_ia32_bnd_arg (arg_1(D)); >> foo (arg_2); >> >> that is, make the parameter an implicit pair of {value, bound} where >> the bound is determined by the value going through a bound association >> builtin. No extra explicit argument to the calls so arguments match >> the fndecl and fntype. All the complexity is defered to the expander >> which can trivially lookup bound arguments via the SSA def (I suppose >> it does that anyway now for getting at the explicit bound argument now). >> >> As far as I can see (well, think), all currently passed bound arguments >> are the return value of such builtin already. > > All bounds are result of different builtin calls. Small example: > > int *global_p; > void foo (int *p) > { > int buf[10]; > bar (p, buf, global_p); > } > > > It is translated into: > > __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40); > __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab)); > global_p.0_2 = global_p; > __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2); > bar (p_3(D)(ab), __bound_tmp.1_6, &buf, __bound_tmp.1_7, > global_p.0_2, __bound_tmp.1_8); > > Bounds binding via calls as you suggest may be done as following: > > __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40); > __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab)); > global_p.0_2 = global_p; > __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2); > _9 = __builtin_bind_bounds (p_3(D)(ab), __bound_tmp.1_6); > _10 = __builtin_bind_bounds (&buf, __bound_tmp.1_7); > _11 = __builtin_bind_bounds (global_p.0_2, __bound_tmp.1_8); > bar (_9, _10, _11); > > Is it close to what you propose? Yes. Richard. > Ilya >> >> Richard. >> >> >> >>> Jeff
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
Hello, On 05 Nov 16:05, Kirill Yukhin wrote: > Is it ok with that change? Ping? -- Thanks, K
[PATCH][ARM] Fix "control reached end of non-void function" warning and boostrap
Hi all, My patch last Friday introduced a warning about reaching the end of a non-void function which breaks bootstrap. On second thought, instead of breaking at the end of the comparisons handling, we should just return instead. Tested arm-none-eabi on qemu and checked the build log to make sure the warning disappears. Ok for trunk? Thanks, Kyrill 2013-11-11 Kyrylo Tkachov * config/arm/arm.c (arm_new_rtx_costs): Return after handling comparisons. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 2aa739d..62ec7f7 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -10154,7 +10154,7 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code, *cost = 0; return true; } - break; + return false; case ABS: if (TARGET_HARD_FLOAT && GET_MODE_CLASS (mode) == MODE_FLOAT
Re: [PATCH][ARM] Fix "control reached end of non-void function" warning and boostrap
On 11/11/13 13:48, Kyrill Tkachov wrote: Hi all, My patch last Friday introduced a warning about reaching the end of a non-void function which breaks bootstrap. On second thought, instead of breaking at the end of the comparisons handling, we should just return instead. Tested arm-none-eabi on qemu and checked the build log to make sure the warning disappears. Ok for trunk? Ouch ! Ok. Ramana Thanks, Kyrill 2013-11-11 Kyrylo Tkachov * config/arm/arm.c (arm_new_rtx_costs): Return after handling comparisons.
Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
2013/11/11 Richard Biener : > On Thu, Nov 7, 2013 at 10:07 PM, Ilya Enkovich wrote: >> 2013/11/7 Jeff Law : >>> On 11/04/13 05:40, Richard Biener wrote: > > > Effectively the bounds are passed "on the side". Well, not exactly. I can see __bound_tmp.0_4 passed to access_and_store. >>> >>> I'm referring to how they're dealt with in FUNCTION_ARG and friends, ie, the >>> low level aspects. Maybe that's why we're crossing wires here. >>> >>> >>> Are __builtin_ia32_bndcu and access_and_store supposed to be called directly after each other? What if for example profile instrumentation inserts a call inbetween them? >>> >>> That's a question for Ilya. >> >> I think there is some misunderstanding. __builtin_ia32_bndcu and >> access_and_store calls are completely independent in this example. >> This is just a code from example on how user-level builtins may be >> used with legacy code. >> >>> >>> > > You can't actually interleave them -- that results in MPX and normal code > not being able to interact. Passing the bound at the end doesn't really > work either -- varargs and the desire to pass some of the bounds around > in > bound registers. I'm only looking at how bound arguments are passed at the GIMPLE level - which I think is arbitrary given they are passed on-the-side during code-generation. So I'm looking for a more "sensible" option for the GIMPLE representation which looks somewhat fragile to me (it looks the argument is only there to have a data dependence between the bndcu intrinsic and the actual call?) >>> >>> OK, we're clearly looking at two different aspects here. While I think we >>> can express them arbitrarily in GIMPLE, we have to pass them on the side >>> once we get into the low level code for compatibility purposes. >>> >>> I think argument passing for MPX is going to ultimately be ugly/fragile >>> regardless of what we do given the constraints. Given we're passing on the >>> side at the low level, I'd prefer them passed on the side in GIMPLE, but I'm >>> willing to consider other alternatives. >>> >>> >> >> /* Return the number of arguments used by call statement GS >> ignoring bound ones. */ >> >> static inline unsigned >> gimple_call_num_nobnd_args (const_gimple gs) >> { >> unsigned num_args = gimple_call_num_args (gs); >> unsigned res = num_args; >> for (unsigned n = 0; n < num_args; n++) >> if (POINTER_BOUNDS_P (gimple_call_arg (gs, n))) >> res--; >> return res; >> } >> >> the choice means that gimple_call_num_nobnd_args is not O(1). > > > Yes, but I don't see that's terribly problematical. Well, consider compile.exp limits-fnargs.c written with int * parameters and bound support ;) [there is a limit on the number of bound registers available, but the GIMPLE parts doesn't put any limit on instrumented args?] >>> >>> Right. Go back to Ilya's earlier messages :-) >>> >>> There's these two magic hooks TARGET_{LOAD,STORE}_BOUNDS_FOR_ARG. At some >>> point when you run out of regs, you start shoving these things into memory. >>> I think we're ultimately going to need further clarification here. >>> >>> >> >> GIMPLE layout depending on flag_check_pointer_bounds sounds >> like a recipie for desaster if you consider TUs compiled with and >> TUs compiled without and LTO. Or if you consider using >> optimized attribute with that flag. > > > Sorry, I don't follow. Can you elaborate please. Compile TU1 with -fno-chk, TU2 with -fchk, LTO both object files which gives you - well, either -fno-chk or -fchk. Then you read in the GIMPLE and when calling gimple_call_nobnd_arg you get a mismatch between where you generated the gimple and where you use the gimple. A flag on whether a call is instrumented is better (consider inlining from TU1 into TU2 where a flag on struct function for example wouldn't be enough either). >>> >>> OK. I see what you're saying. Whether or not we have these extra args is a >>> property of the call site, not the CFUN, target DECL or the TU. That makes >>> sense. >>> >>> >>> Ilya, I think there's enough confusion & issues around the call ABI/API that >>> we need to sort it out before moving forward in any significant way. >>> >>> Let's go back to argument passing and go through some concrete examples of >>> the various cases. Make sure to tie them into the calls to >>> TARGET_{LOAD,STORE}_BOUNDS_FOR_ARG and the new gimple_call_* routines. >> >> OK. Lets see at how expand pass handles instrumented calls. As usual, >> for each passed argument expand calls target hook to determine where >> argument is passed. Now target may return not only slot for arg, but >> also a slot for bounds in case arg may have them (PARALLEL expr i
Re: [PATCH, MPX, 2/X] Pointers Checker [10/25] Calls copy and verification
2013/11/11 Richard Biener : > On Fri, Nov 8, 2013 at 11:03 AM, Ilya Enkovich wrote: >> 2013/11/8 Richard Biener : >>> On Thu, Nov 7, 2013 at 7:55 PM, Jeff Law wrote: On 11/07/13 04:50, Ilya Enkovich wrote: > > Hi, > > Here is an updated patch version. I think this needs to hold until we have a consensus on what the parameter passing looks like for bounded pointers. >>> >>> I still think the best thing to do on GIMPLE is >>> >>> arg_2 = __builtin_ia32_bnd_arg (arg_1(D)); >>> foo (arg_2); >>> >>> that is, make the parameter an implicit pair of {value, bound} where >>> the bound is determined by the value going through a bound association >>> builtin. No extra explicit argument to the calls so arguments match >>> the fndecl and fntype. All the complexity is defered to the expander >>> which can trivially lookup bound arguments via the SSA def (I suppose >>> it does that anyway now for getting at the explicit bound argument now). >>> >>> As far as I can see (well, think), all currently passed bound arguments >>> are the return value of such builtin already. >> >> All bounds are result of different builtin calls. Small example: >> >> int *global_p; >> void foo (int *p) >> { >> int buf[10]; >> bar (p, buf, global_p); >> } >> >> >> It is translated into: >> >> __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40); >> __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab)); >> global_p.0_2 = global_p; >> __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2); >> bar (p_3(D)(ab), __bound_tmp.1_6, &buf, __bound_tmp.1_7, >> global_p.0_2, __bound_tmp.1_8); >> >> Bounds binding via calls as you suggest may be done as following: >> >> __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40); >> __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab)); >> global_p.0_2 = global_p; >> __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2); >> _9 = __builtin_bind_bounds (p_3(D)(ab), __bound_tmp.1_6); >> _10 = __builtin_bind_bounds (&buf, __bound_tmp.1_7); >> _11 = __builtin_bind_bounds (global_p.0_2, __bound_tmp.1_8); >> bar (_9, _10, _11); >> >> Is it close to what you propose? > > Yes. Is there a way to bind bounds with structures in a similar way? For SSA names I may easily find definition and check if it is a binding builtin call. But for structures it is not possible. The way I see it to mark all such args as addressable and load required bounds on expand pass. Ilya > > Richard. > >> Ilya >>> >>> Richard. >>> >>> >>> Jeff
[PATCH] Fix asan regtest failures c-c++-common/asan/{memcmp-1.c,strncpy-overflow-1.c}
Hello, Since a couple of days I am seeing failure on the tests above on my Fedora system. The errors look like: FAIL: c-c++-common/asan/memcmp-1.c -O0 output pattern test, is = ==21832==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff75df96f4 at pc 0x7f98ecbab68b bp 0x7fff75df96b0 sp 0x7fff75df95b8 READ of size 6 at 0x7fff75df96f4 thread T0 #0 0x7f98ecbab68a in __interceptor_memcmp /home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:295 (discriminator 7) #1 0x7f98ecbb6393 in __asan_report_error /home/dodji/git/gcc/master/libsanitizer/asan/asan_report.cc:774 (discriminator 9) #2 0x7f98ecbab6d0 in __interceptor_memcmp /home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:295 (discriminator 7) #3 0x400b0a in main /home/dodji/git/gcc/master/gcc/testsuite/c-c++-common/asan/memcmp-1.c:14 #4 0x3380821a04 in __libc_start_main ??:? #5 0x4007b8 in _start ??:? Address 0x7fff75df96f4 is located in stack of thread T0 at offset 36 in frame #0 0x40089f in main /home/dodji/git/gcc/master/gcc/testsuite/c-c++-common/asan/memcmp-1.c:11 [...] and FAIL: c-c++-common/asan/strncpy-overflow-1.c -O0 output pattern test, is = ==30666==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020efd9 at pc 0x7f65c09fd8f9 bp 0x7fffd820d170 sp 0x7fffd820d0a0 WRITE of size 10 at 0x6020efd9 thread T0 #0 0x7f65c09fd8f8 in __interceptor_strncpy /home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:509 (discriminator 7) #1 0x7f65c0a06393 in __asan_report_error /home/dodji/git/gcc/master/libsanitizer/asan/asan_report.cc:774 (discriminator 9) #2 0x7f65c09fd93e in __interceptor_strncpy /home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:509 (discriminator 7) #3 0x400a9c in main /home/dodji/git/gcc/master/gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c:11 #4 0x3380821a04 in __libc_start_main ??:? #5 0x400898 in _start ??:? 0x6020efd9 is located 0 bytes to the right of 9-byte region [0x6020efd0,0x6020efd9) allocated by thread T0 here: #0 0x7f65c0a00cd9 in __interceptor_malloc /home/dodji/git/gcc/master/libsanitizer/asan/asan_malloc_linux.cc:72 (discriminator 9) #1 0x400a80 in main /home/dodji/git/gcc/master/gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c:10 [...] It turned out this is just due to the output emitted by the AddressSanitizer that has changed. The patch below just adapts the regexps of the above test to match the fact that a few new line patterns were inserted in the middle of the output. It also fixes what I think are typos namely, un-escaped ']' as in "\[something]". I believe the closing square bracket should be escaped as well (but well, how do I know about tcl really). Tested on x86_64-unknown-linux-gnu against trunk. gcc/testsuite/ChangeLog: * c-c++-common/asan/memcmp-1.c: Adapt the pattern for changed libsanitizer output. * c-c++-common/asan/strncpy-overflow-1.c: Likewise. --- gcc/testsuite/c-c++-common/asan/memcmp-1.c | 8 ++-- gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c | 9 + 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/gcc/testsuite/c-c++-common/asan/memcmp-1.c b/gcc/testsuite/c-c++-common/asan/memcmp-1.c index 03f32e9..9a62e33 100644 --- a/gcc/testsuite/c-c++-common/asan/memcmp-1.c +++ b/gcc/testsuite/c-c++-common/asan/memcmp-1.c @@ -16,5 +16,9 @@ main () } /* { dg-output "ERROR: AddressSanitizer: stack-buffer-overflow.*(\n|\r\n|\r)" } */ -/* { dg-output "#0 0x\[0-9a-f\]+ (in _*(interceptor_|)memcmp |\[(\])\[^\n\r]*(\n|\r\n|\r)" } */ -/* { dg-output "#1 0x\[0-9a-f\]+ (in _*main|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */ +/* { dg-output "#0 0x\[0-9a-f\]+ (in _*(interceptor_|)memcmp |\[(\])\[^\n\r\]*(\n|\r\n|\r)" } */ +/* { dg-output "(\[^\r\n\]*(\n|\r\n|\r))+" } */ +/* { dg-output "#\[0-9\]+ 0x\[0-9a-f\]+ in main"} */ +/* { dg-output "(\[^\r\n\]*(\n|\r\n|\r))+" } */ +/* { dg-output "Address 0x\[0-9a-f\] is located in stack of thread T0 at offset \[0-9\]+ in frame (\n|\r\n|\r)"} */ +/* { dg-output "#0 0x\[0-9a-f\]+ in main" } */ diff --git a/gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c b/gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c index 3ed9fd6..b1a4cda 100644 --- a/gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c +++ b/gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c @@ -13,9 +13,10 @@ int main(int argc, char **argv) { } /* { dg-output "WRITE of size \[0-9\]* at 0x\[0-9a-f\]+ thread T0\[^\n\r]*(\n|\r\n|\r)" } */ -/* { dg-output "#0 0x\[0-9a-f\]+ (in _*(interceptor_|)strncpy|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */ -/* { dg-output "#1 0x\[0-9a-f\]+ (in _*main (\[^\n\r]*strncpy-overflow-1.c:11|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } */ -/* { dg-output "\[^\n\r]*0x\[0-9a-f\]+
Re: [PATCH] Fix asan regtest failures c-c++-common/asan/{memcmp-1.c,strncpy-overflow-1.c}
On Mon, Nov 11, 2013 at 03:01:53PM +0100, Dodji Seketeli wrote: > Since a couple of days I am seeing failure on the tests above on my > Fedora system. The errors look like: > > FAIL: c-c++-common/asan/memcmp-1.c -O0 output pattern test, is > = > ==21832==ERROR: AddressSanitizer: stack-buffer-overflow on address > 0x7fff75df96f4 at pc 0x7f98ecbab68b bp 0x7fff75df96b0 sp 0x7fff75df95b8 > READ of size 6 at 0x7fff75df96f4 thread T0 > #0 0x7f98ecbab68a in __interceptor_memcmp > /home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:295 > (discriminator 7) > #1 0x7f98ecbb6393 in __asan_report_error > /home/dodji/git/gcc/master/libsanitizer/asan/asan_report.cc:774 > (discriminator 9) > #2 0x7f98ecbab6d0 in __interceptor_memcmp > /home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:295 > (discriminator 7) > #3 0x400b0a in main > /home/dodji/git/gcc/master/gcc/testsuite/c-c++-common/asan/memcmp-1.c:14 That looks like a bug in libasan, __asan_report_error doesn't call memcmp again. So something is wrong with getting proper backtrace it seems. Jakub
Re: [PATCH, MPX, 2/X] Pointers Checker [10/25] Calls copy and verification
On Mon, Nov 11, 2013 at 3:00 PM, Ilya Enkovich wrote: > 2013/11/11 Richard Biener : >> On Fri, Nov 8, 2013 at 11:03 AM, Ilya Enkovich >> wrote: >>> 2013/11/8 Richard Biener : On Thu, Nov 7, 2013 at 7:55 PM, Jeff Law wrote: > On 11/07/13 04:50, Ilya Enkovich wrote: >> >> Hi, >> >> Here is an updated patch version. > > I think this needs to hold until we have a consensus on what the parameter > passing looks like for bounded pointers. I still think the best thing to do on GIMPLE is arg_2 = __builtin_ia32_bnd_arg (arg_1(D)); foo (arg_2); that is, make the parameter an implicit pair of {value, bound} where the bound is determined by the value going through a bound association builtin. No extra explicit argument to the calls so arguments match the fndecl and fntype. All the complexity is defered to the expander which can trivially lookup bound arguments via the SSA def (I suppose it does that anyway now for getting at the explicit bound argument now). As far as I can see (well, think), all currently passed bound arguments are the return value of such builtin already. >>> >>> All bounds are result of different builtin calls. Small example: >>> >>> int *global_p; >>> void foo (int *p) >>> { >>> int buf[10]; >>> bar (p, buf, global_p); >>> } >>> >>> >>> It is translated into: >>> >>> __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40); >>> __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab)); >>> global_p.0_2 = global_p; >>> __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2); >>> bar (p_3(D)(ab), __bound_tmp.1_6, &buf, __bound_tmp.1_7, >>> global_p.0_2, __bound_tmp.1_8); >>> >>> Bounds binding via calls as you suggest may be done as following: >>> >>> __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40); >>> __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab)); >>> global_p.0_2 = global_p; >>> __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2); >>> _9 = __builtin_bind_bounds (p_3(D)(ab), __bound_tmp.1_6); >>> _10 = __builtin_bind_bounds (&buf, __bound_tmp.1_7); >>> _11 = __builtin_bind_bounds (global_p.0_2, __bound_tmp.1_8); >>> bar (_9, _10, _11); >>> >>> Is it close to what you propose? >> >> Yes. > > Is there a way to bind bounds with structures in a similar way? Not to have them easy to lookup in the SSA web. A long time ago I proposed to make SSA aggregates possible, so you could do tem_2 = __internal_bind_bounds (aggr(D), __bound_tmp.1_3, __bound_tmp.1_4, ...); bar (tem_2); (originally the SSA aggregates were supposed to make copy-propgagation possible using the SSA copy propagator, and of course I needed it for the middle-end array work) Not sure if that will give other people the creeps (expand would never expand the "load" from tem_2 but instead handle aggr as parameter). A slight complication is due to alias analysis which would be need to taught that bar performs a load of aggr. Richard. > For > SSA names I may easily find definition and check if it is a binding > builtin call. But for structures it is not possible. The way I see it > to mark all such args as addressable and load required bounds on > expand pass. > > Ilya >> >> Richard. >> >>> Ilya Richard. > Jeff
Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals
On Mon, Nov 11, 2013 at 11:19:21AM +0100, Dodji Seketeli wrote: > .../c-c++-common/cpp/warning-zero-in-literals-1.c | Bin 0 -> 240 bytes > libcpp/include/line-map.h | 8 + > libcpp/line-map.c | 40 ++ > 8 files changed, 585 insertions(+), 39 deletions(-) > create mode 100644 > gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > index 49285e5..50c2482 100644 > --- a/gcc/Makefile.in > +++ b/gcc/Makefile.in > @@ -1469,7 +1469,7 @@ OBJS = \ > > # Objects in libcommon.a, potentially used by all host binaries and with > # no target dependencies. > -OBJS-libcommon = diagnostic.o diagnostic-color.o pretty-print.o intl.o > input.o version.o > +OBJS-libcommon = diagnostic.o diagnostic-color.o pretty-print.o intl.o vec.o > input.o version.o Too long line? > + if (c == '\0') > + c = ' '; >pp_character (context->printer, c); Does that match how libcpp counts the embedded '\0' character in column computation? > +/* The position (byte count) the the last byte of the line. This > + normally points to the '\n' character, or to one byte after the > + last byte of the file, if the file doesn't contain a '\n' > + character. */ > +size_t end_pos; Does it really help to note this? You can always just walk the line from start_pos looking for '\n' or end of file. > +static fcache* > +add_file_to_cache_tab (const char *file_path) > +{ > + static size_t idx; > + fcache *r; > + > + FILE *fp = fopen (file_path, "r"); > + if (ferror (fp)) > +{ > + fclose (fp); > + return NULL; > +} > + > + r = &fcache_tab[idx]; Wouldn't it be better to use some LRU algorithm to determine which file to kick out of the cache? Have some tick counter of cache lookups (or creation) and assign the tick counter to the just created resp. used cache entry, and remove the one with the smallest counter? > + fcache * r = lookup_file_in_cache_tab (file_path); > + if (r == NULL) Formatting (no space after *, extra space after ==). Jakub
Re: Some wide-int review comments
On 11/11/2013 06:49 AM, Richard Biener wrote: On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck wrote: On 11/08/2013 05:30 AM, Richard Sandiford wrote: From tree-vrp.c: @@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code /* If the singed operation wraps then int_const_binop has done everything we want. */ ; + /* Signed division of -1/0 overflows and by the time it gets here + returns NULL_TREE. */ + else if (!res) +return NULL_TREE; else if ((TREE_OVERFLOW (res) && !TREE_OVERFLOW (val1) && !TREE_OVERFLOW (val2)) Why is this case different from trunk? Or is it a bug-fix independent of wide-int? the api for division is different for wide-int than it was for double-int. But this is using a tree API (int_const_binop) that didn't change (it returned NULL for / 0 previously). So what makes us arrive here now? (I agree there is a bug in VRP, but it shouldn't manifest itself only on wide-int) Richard. My reading of the code is that is that i changed int_const_binop to return null_tree for this case. On the trunk, only rem returns null_tree for divide by 0, on the wide int branch, both div and rem return null tree. I know that this is going to bring on a string of questions that i do not remember the answers to as to why i made that change. but fold-const.c:int_const_binop_1 now returns null_tree and this is just fallout from that change. Kenny Thanks, Richard
[PATCH] Fix failing assertion in calls.c:store_unaligned_arguments_into_pseudos
Hello, when implementing the new ABI for powerpc64le-linux, I ran into an assertion failure in store_unaligned_arguments_into_pseudos: gcc_assert (args[i].partial % UNITS_PER_WORD == 0); This can happen in the new ABI since we pass "homogeneous structures" consisting of soleley floating point elements of the same type in floating-point registers until they run out, and the rest of the structure in memory. If the structure member type is a 4-byte float, and we only have an odd number of floating point registers available, then args[i].partial can legitimately end up not being a multiple of UNITS_PER_WORD (i.e. 8 on the platform). Now, there are a number of similar checks that args[i].partial is aligned elsewhere in calls.c and functions.c. But for all of those the logic is: if args[i].reg is a REG, then args[i].partial must be a multiple of the word size; but if args[i].regs is a PARALLEL, then args[i].partial can be any arbitrary value. In the powerpc64le-linux use case, the back-end always generates PARALLEL in this situation, so it seemed the middle-end ought to support this -- and it does, except for this one case. However, looking more closely, it seems store_unaligned_arguments_into_pseudos is not really useful for PARALLEL arguments in the first place. What this routine does is load arguments into args[i].aligned_regs. But if we have an argument where args[i].reg is a PARALLEL, args[i].aligned_regs will in fact never be used later on at all! Instead, PARALLEL will always be handled directly via emit_group_move (in load_register_parameters), so the code generated by store_unaligned_arguments_into_pseudos for such cases is simply dead anyway. Thus, I'd suggest to simply have store_unaligned_arguments_into_pseudos skip PARALLEL arguments. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * calls.c (store_unaligned_arguments_into_pseudos): Skip PARALLEL arguments. Index: gcc/gcc/calls.c === --- gcc.orig/gcc/calls.c +++ gcc/gcc/calls.c @@ -981,6 +981,7 @@ store_unaligned_arguments_into_pseudos ( for (i = 0; i < num_actuals; i++) if (args[i].reg != 0 && ! args[i].pass_on_stack + && GET_CODE (args[i].reg) != PARALLEL && args[i].mode == BLKmode && MEM_P (args[i].value) && (MEM_ALIGN (args[i].value) -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[wwwdocs] backend -> back end
Working to address a user question, I noticed that many of our pages use the spelling of "backend" when http://gcc.gnu.org/codingconventions.html suggest "back end" (noun) and "back-end" (adjective). Joseph, if you confirm that back end it is (as a noun), I'll apply the patch below. Gerald Index: contribute.html === RCS file: /cvs/gcc/wwwdocs/htdocs/contribute.html,v retrieving revision 1.81 diff -u -3 -p -r1.81 contribute.html --- contribute.html 18 Aug 2013 20:59:25 - 1.81 +++ contribute.html 10 Nov 2013 20:34:10 - @@ -308,11 +308,11 @@ is a good candidate for being mentioned Larger accomplishments, either as part of a specific project, or long term commitment, merit mention on the front page. Examples include projects -like tree-ssa, new backends, major advances in optimization or standards +like tree-ssa, new back ends, major advances in optimization or standards compliance. The gcc-announce mailing list serves to announce new releases and changes -like frontends or backends being dropped. +like front ends or back ends being dropped. Index: frontends.html === RCS file: /cvs/gcc/wwwdocs/htdocs/frontends.html,v retrieving revision 1.38 diff -u -3 -p -r1.38 frontends.html --- frontends.html 27 Mar 2013 21:53:00 - 1.38 +++ frontends.html 10 Nov 2013 20:34:10 - @@ -33,14 +33,14 @@ are very mature. href="http://www.mercurylang.org/download/gcc-backend.html";>Mercury, a declarative logic/functional language. The University of Melbourne Mercury compiler is written in Mercury; originally it compiled via C but now it also -has a backend that generates assembler directly, using the GCC backend. +has a back end that generates assembler directly, using the GCC back end. http://CobolForGCC.sourceforge.net/";>Cobol For GCC (at an early stage of development). http://www.nongnu.org/gm2/";>GNU Modula-2 implements the PIM2, PIM3, PIM4 and ISO dialects of the language. The compiler -is fully operational with the GCC 4.1.2 backend (on GNU/Linux x86 +is fully operational with the GCC 4.1.2 back end (on GNU/Linux x86 systems). Work is in progress to move the frontend to the GCC trunk. The frontend is mostly written in Modula-2, but includes a bootstrap procedure via a heavily modified version of p2c. Index: lists.html === RCS file: /cvs/gcc/wwwdocs/htdocs/lists.html,v retrieving revision 1.106 diff -u -3 -p -r1.106 lists.html --- lists.html 24 Oct 2013 01:31:10 - 1.106 +++ lists.html 10 Nov 2013 20:34:10 - @@ -95,7 +95,7 @@ before subscribing< http://gcc.gnu.org/ml/jit/";>jit is for discussion and development of http://gcc.gnu.org/wiki/JIT";>libgccjit, an experimental - library for implementing Just-In-Time compilation using gcc as a backend. + library for implementing Just-In-Time compilation using GCC as a back end. The list is intended for both users and developers of the library. Patches for the jit branch should go to both this list and gcc-patches. Index: news.html === RCS file: /cvs/gcc/wwwdocs/htdocs/news.html,v retrieving revision 1.137 diff -u -3 -p -r1.137 news.html --- news.html 12 Aug 2013 00:01:45 - 1.137 +++ news.html 10 Nov 2013 20:34:11 - @@ -1079,10 +1079,10 @@ noticed or fixed defects or made other u May 1, 2000 Richard Earnshaw of ARM Ltd, and Nick Clifton of Cygnus, a Red Hat -company, have contributed a new backend for the Arm and Thumb +company, have contributed a new back end for the Arm and Thumb processors. -The new backend combines code generation for the Arm, the Thumb and +The new back end combines code generation for the Arm, the Thumb and the StrongArm into one compiler, with the target processor and instruction sets being selectable via command line switches. @@ -1260,7 +1260,7 @@ changes from the old GCC 2 sources. September 2, 1999 -Richard Henderson has finished merging the ia32 backend rewrite into the +Richard Henderson has finished merging the ia32 back end rewrite into the mainline GCC sources. The rewrite is designed to improve optimization opportunities for the Pentium II target, but also provides a cleaner way to optimize for the Pentium III, AMD-K7 and other high end ia32 @@ -1537,7 +1537,7 @@ Cygnus donates August 25, 1998 - David Miller donates rewritten sparc backend. + David Miller donates rewritten SPARC back end. August 19, 1998 Index: svn.html === RCS file: /cvs/gcc/wwwdocs/htdocs/svn.html,v retrieving revision 1.189 diff -u -3 -p -r1.189 svn.html --- svn.html14 Sep 2013 00:42:41 - 1.189 +++ svn.html10 Nov 2013 20:34:11 - @@ -880,7 +880,7 @@ be prefixed with the initials of the dis
[PATCH] Avoid duplicate calls to REG_PARM_STACK_SPACE
Hello, this is another tweak to the middle-end to help support the new powerpc64le-linux ABI. In the new ABI, we make a distinction between functions that pass all arguments solely in registers, and those that do not. Only when calling one the latter type (or a varags routine) does the caller need to provide REG_PARM_STACK_SPACE; in the former case, this can be omitted to save stack space. This leads to a definition of REG_PARM_STACK_SPACE that is a lot more complex than usual, which means it would be helpful if the middle-end were to evaluate it sparingly (e.g. once per function definition / call). The middle-end already does cache REG_PARM_STACK_SPACE results, however, this cache it not consistently used. In particular, the locate_and_pad_parm routine will re-evaluate the macro every time it is called, even though all callers of the routine already have a cached copy. This patch adds a new argument to locate_and_pad_parm which is used instead of evaluating REG_PARM_STACK_SPACE, and updates the callers to pass in the correct value. There should be no change in generated code on any platform. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich 2013-11-11 Ulrich Weigand Alan Modra ChangeLog: * function.c (assign_parms): Use all.reg_parm_stack_space instead of re-evaluating REG_PARM_STACK_SPACE target macro. (locate_and_pad_parm): New parameter REG_PARM_STACK_SPACE. Use it instead of evaluating target macro REG_PARM_STACK_SPACE every time. (assign_parm_find_entry_rtl): Update call. * calls.c (initialize_argument_information): Update call. (emit_library_call_value_1): Likewise. * expr.h (locate_and_pad_parm): Update prototype. Index: gcc/gcc/expr.h === --- gcc.orig/gcc/expr.h +++ gcc/gcc/expr.h @@ -521,8 +521,8 @@ extern rtx expand_divmod (int, enum tree rtx, int); #endif -extern void locate_and_pad_parm (enum machine_mode, tree, int, int, tree, -struct args_size *, +extern void locate_and_pad_parm (enum machine_mode, tree, int, int, int, +tree, struct args_size *, struct locate_and_pad_arg_data *); /* Return the CODE_LABEL rtx for a LABEL_DECL, creating it if necessary. */ Index: gcc/gcc/calls.c === --- gcc.orig/gcc/calls.c +++ gcc/gcc/calls.c @@ -1326,6 +1326,7 @@ initialize_argument_information (int num #else args[i].reg != 0, #endif +reg_parm_stack_space, args[i].pass_on_stack ? 0 : args[i].partial, fndecl, args_size, &args[i].locate); #ifdef BLOCK_REG_PADDING @@ -3736,7 +3737,8 @@ emit_library_call_value_1 (int retval, r #else argvec[count].reg != 0, #endif - 0, NULL_TREE, &args_size, &argvec[count].locate); + reg_parm_stack_space, 0, + NULL_TREE, &args_size, &argvec[count].locate); if (argvec[count].reg == 0 || argvec[count].partial != 0 || reg_parm_stack_space > 0) @@ -3823,7 +3825,7 @@ emit_library_call_value_1 (int retval, r #else argvec[count].reg != 0, #endif - argvec[count].partial, + reg_parm_stack_space, argvec[count].partial, NULL_TREE, &args_size, &argvec[count].locate); args_size.constant += argvec[count].locate.size.constant; gcc_assert (!argvec[count].locate.size.var); Index: gcc/gcc/function.c === --- gcc.orig/gcc/function.c +++ gcc/gcc/function.c @@ -2523,6 +2523,7 @@ assign_parm_find_entry_rtl (struct assig } locate_and_pad_parm (data->promoted_mode, data->passed_type, in_regs, + all->reg_parm_stack_space, entry_parm ? data->partial : 0, current_function_decl, &all->stack_args_size, &data->locate); @@ -3511,11 +3512,7 @@ assign_parms (tree fndecl) /* Adjust function incoming argument size for alignment and minimum length. */ -#ifdef REG_PARM_STACK_SPACE - crtl->args.size = MAX (crtl->args.size, - REG_PARM_STACK_SPACE (fndecl)); -#endif - + crtl->args.size = MAX (crtl->args.size, all.reg_parm_stack_space); crtl->args.size = CEIL_ROUND (crtl->args.size, PARM_BOUNDARY / BITS_PER_UNIT); @@ -3719,6 +3716,9 @@ gimplify_parameters (void) IN_REGS is nonzero if the argument will be passed in registers. It will never be set if REG_PARM_STACK_SPACE is not defined. + REG_PARM_STACK_SPACE is the number o
[PATCH, rs6000] Fix little-endian bug in linux-unwind.h
Hello, this fixes another issue related to CR unwinding, this time only on 64-bit little-endian systems. The problem is linux-unwind.h:ppc_fallback_frame_state, which notes that the kernel places the CR value in the low bits of a 64-bit field in the signal handler frame; but the offset of that field is different in little-endian. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich libgcc/ChangeLog: 2013-11-11 Ulrich Weigand Alan Modra * config/rs6000/linux-unwind.h (ppc_fallback_frame_state): Correct location of CR save area for 64-bit little-endian systems. Index: gcc/libgcc/config/rs6000/linux-unwind.h === --- gcc.orig/libgcc/config/rs6000/linux-unwind.h +++ gcc/libgcc/config/rs6000/linux-unwind.h @@ -185,6 +185,7 @@ ppc_fallback_frame_state (struct _Unwind { struct gcc_regs *regs = get_regs (context); struct gcc_vregs *vregs; + long cr_offset; long new_cfa; int i; @@ -206,11 +207,13 @@ ppc_fallback_frame_state (struct _Unwind fs->regs.reg[i].loc.offset = (long) ®s->gpr[i] - new_cfa; } + /* The CR is saved in the low 32 bits of regs->ccr. */ + cr_offset = (long) ®s->ccr - new_cfa; +#ifndef __LITTLE_ENDIAN__ + cr_offset += sizeof (long) - 4; +#endif fs->regs.reg[R_CR2].how = REG_SAVED_OFFSET; - /* CR? regs are always 32-bit and PPC is big-endian, so in 64-bit - libgcc loc.offset needs to point to the low 32 bits of regs->ccr. */ - fs->regs.reg[R_CR2].loc.offset = (long) ®s->ccr - new_cfa - + sizeof (long) - 4; + fs->regs.reg[R_CR2].loc.offset = cr_offset; fs->regs.reg[R_LR].how = REG_SAVED_OFFSET; fs->regs.reg[R_LR].loc.offset = (long) ®s->link - new_cfa; -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[PATCH, rs6000] Fix corner case in unwinding CR values
Hello, this patch fixes a corner case bug in unwinding CR values. The underlying problem is as follows: In order to save the CR in the prologue, two insns are necessary. The first is a mfcr that loads the CR value into a GPR, and the second is a store of the GPR to the stack. The back-end currently tags the first insn with a REG_FRAME_RELATED_EXPR, and marks both as RTX_FRAME_RELATED_P. This causes the middle-end to emit two CFI records, showing the CR saved first in the GPR, and later in its stack slot. (The first record is omitted if there is no possible exception in between.) Now, assume we use -fnon-call-exceptions, and get a signal on an instruction in between those two points, and the signal handler attempts to throw. The unwind logic will read CFI records and determine that in this frame, CR was saved into a GPR; and in the frame below (which must be a signal frame), that GPR was saved onto the stack. What the unwind logic then decides to do is to restore CR from the save slot of that latter GPR; i.e. the unwinder will copy that GPR save slot over the CR save slot of the unwind routine that calls __builtin_eh_return. Unfortunately, this is a problem on a 64-bit big-endian platform, since that GPR save slot is 8 bytes, while the CR save slot is just 4 bytes. The unwinder therefore copies the wrong half of the GPR save slot over the CR save slot ... As second, related, problem will come up with the new ABI. Here, we want to track each CR field in a separate CFI record, even though the values end up sharing the same stack slot. This is not a problem for a CFI record pointing to memory. However, the current dwarf2out.c middle-end logic is unable to handle the situation where multiple registers are saved in the same *register* at the same time ... Both problems can be fixed in the same way, by never emitting a CFI record showing CR saved into a GPR. This is easily done by simply marking only the store to memory as RTX_FRAME_RELATED_P. However, to avoid generating incorrect code we must then prevent and exception point to occur in between the two insns to save CR. The easiest way to do so is to mark the store to the stack slot as an additional user of all CR fields that need to be saved by the current function. This does restrict scheduling other instructions in between the prologue a bit more than otherwise. But since this affects only very special cases (e.g. an instruction that may triggers a floating- point exception, while -fnon-call-exceptions is active), this seems not too bad ... Note that in the epilogue, this is not really a problem, even though we also need two instructions to restore CR. We can (and do) simply leave the CFI record point to the stack slot all the time, which remains valid even after the CR value was read from there (and even after the stack pointer itself was modified). The patch below fixes this problem as described above, and adds a test case that fails without the patch. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (rs6000_emit_prologue): Do not place a RTX_FRAME_RELATED_P marker on the UNSPEC_MOVESI_FROM_CR insn. Instead, add USEs of all modified call-saved CR fields to the insn storing the result to the stack slot, and provide an appropriate REG_FRAME_RELATED_EXPR for that insn. * config/rs6000/rs6000.md ("*crsave"): New insn pattern. * config/rs6000/predicates.md ("crsave_operation"): New predicate. testsuite/ChangeLog: 2013-11-11 Ulrich Weigand * g++.dg/eh/ppc64-sighandle-cr.C: New test. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -21761,21 +21761,9 @@ rs6000_emit_prologue (void) && REGNO (frame_reg_rtx) != cr_save_regno && !(using_static_chain_p && cr_save_regno == 11)) { - rtx set; - cr_save_rtx = gen_rtx_REG (SImode, cr_save_regno); START_USE (cr_save_regno); - insn = emit_insn (gen_movesi_from_cr (cr_save_rtx)); - RTX_FRAME_RELATED_P (insn) = 1; - /* Now, there's no way that dwarf2out_frame_debug_expr is going -to understand '(unspec:SI [(reg:CC 68) ...] UNSPEC_MOVESI_FROM_CR)'. -But that's OK. All we have to do is specify that _one_ condition -code register is saved in this stack slot. The thrower's epilogue -will then restore all the call-saved registers. -We use CR2_REGNO (70) to be compatible with gcc-2.95 on Linux. */ - set = gen_rtx_SET (VOIDmode, cr_save_rtx, -gen_rtx_REG (SImode, CR2_REGNO)); - add_reg_note (insn, REG_FRAME_RELATED_EXPR, set); + emit_insn (gen_movesi_from_cr (cr_save_rtx)); } /* Do any required saving of fpr's. If only one or two to save, do @@ -22086,26 +22074,
[PATCH] Remove dead code in input_gimple_stmt
The following patch removes now dead code from input_gimple_stmt (which also could be quite slow). Type mismatches can only occur at the decl level after the new tree merging code went in and those are handed by wrapping all decls in MEM_REFs to transparently have them view-converted. Bootstrapped and tested on x86_64-unknown-linux-gnu, LTO bootstrap in progress. Richard. 2013-11-11 Richard Biener * gimple-streamer-in.c (input_gimple_stmt): Remove now dead code dealing with type mismatches inside component reference chains. Index: gcc/gimple-streamer-in.c === *** gcc/gimple-streamer-in.c.orig 2013-11-11 12:16:43.0 +0100 --- gcc/gimple-streamer-in.c2013-11-11 12:20:00.956191322 +0100 *** input_gimple_stmt (struct lto_input_bloc *** 158,242 if (TREE_CODE (*opp) == ADDR_EXPR) opp = &TREE_OPERAND (*opp, 0); while (handled_component_p (*opp)) ! { ! if (TREE_CODE (*opp) == COMPONENT_REF) ! { ! /* Fixup FIELD_DECLs in COMPONENT_REFs, they are not handled !by decl merging. */ ! tree field, type, tem; ! tree closest_match = NULL_TREE; ! field = TREE_OPERAND (*opp, 1); ! type = DECL_CONTEXT (field); ! for (tem = TYPE_FIELDS (type); tem; tem = TREE_CHAIN (tem)) ! { ! if (TREE_CODE (tem) != FIELD_DECL) ! continue; ! if (tem == field) ! break; ! if (DECL_NONADDRESSABLE_P (tem) ! == DECL_NONADDRESSABLE_P (field) ! && gimple_compare_field_offset (tem, field)) ! { ! if (types_compatible_p (TREE_TYPE (tem), ! TREE_TYPE (field))) ! break; ! else ! closest_match = tem; ! } ! } ! /* In case of type mismatches across units we can fail !to unify some types and thus not find a proper !field-decl here. */ ! if (tem == NULL_TREE) ! { ! /* Thus, emit a ODR violation warning. */ ! if (warning_at (gimple_location (stmt), 0, ! "use of type %<%E%> with two mismatching " ! "declarations at field %<%E%>", ! type, TREE_OPERAND (*opp, 1))) ! { ! if (TYPE_FIELDS (type)) ! inform (DECL_SOURCE_LOCATION (TYPE_FIELDS (type)), ! "original type declared here"); ! inform (DECL_SOURCE_LOCATION (TREE_OPERAND (*opp, 1)), ! "field in mismatching type declared here"); ! if (TYPE_NAME (TREE_TYPE (field)) ! && (TREE_CODE (TYPE_NAME (TREE_TYPE (field))) ! == TYPE_DECL)) ! inform (DECL_SOURCE_LOCATION ! (TYPE_NAME (TREE_TYPE (field))), ! "type of field declared here"); ! if (closest_match ! && TYPE_NAME (TREE_TYPE (closest_match)) ! && (TREE_CODE (TYPE_NAME ! (TREE_TYPE (closest_match))) == TYPE_DECL)) ! inform (DECL_SOURCE_LOCATION ! (TYPE_NAME (TREE_TYPE (closest_match))), ! "type of mismatching field declared here"); ! } ! /* And finally fixup the types. */ ! TREE_OPERAND (*opp, 0) ! = build1 (VIEW_CONVERT_EXPR, type, ! TREE_OPERAND (*opp, 0)); ! } ! else ! TREE_OPERAND (*opp, 1) = tem; ! } ! else if ((TREE_CODE (*opp) == ARRAY_REF ! || TREE_CODE (*opp) == ARRAY_RANGE_REF) ! && (TREE_CODE (TREE_TYPE (TREE_OPERAND (*opp, 0))) ! != ARRAY_TYPE)) ! { ! /* And ARRAY_REFs to objects that had mismatched types !during symbol merging to avoid ICEs. */ ! TREE_OPERAND (*opp, 0) ! = build1 (VIEW_CONVERT_EXPR, ! build_array_type (TREE_TYPE (*opp), NULL_TREE), ! TREE_OPERAND (*opp, 0)); ! } ! !
[PATCH, rs6000] ELFv2 ABI preparation: Refactor call expanders
Hello, this patch refactors code to expand calls in preparation for adding support for the new little-endian ABI. Currently, the call expanders always generate the same RTX pattern, which is matched by different insns patterns depending on circumstances (ABI, local vs. global calls, ...). This makes it difficult if in some of these circumstances, we'd really need a different RTX pattern, e.g. becuase the call uses certain special registers, or because the call must perform extra actions like saving or restoring the TOC pointer. There is one exception to this logic already in place: when expanding an indirect call on ABI_AIX, the expander calls a routine in rs6000.c. This patch expands on that by handling *all* calls (including sibling calls) on ABI_AIX via a routine in rs6000.c. This should be fully transparent to generated code, although generated RTL is a bit different: we use CALL_INSN_FUNCTION_USAGE to track uses of ABI-defined registers by the call, which has a couple of other advantages: - we can remove some patterns that were duplicated solely to track whether or not a call uses r11; - we can simply the sibcall patterns so they no longer explicitly refer to LR_REGNO; - we can precisely track which calls use r2 to ensure we have correct data flow for the register (this doesn't actually matter for the current code, but will become necessary with the new ABI). Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (rs6000_call_indirect_aix): Rename to ... (rs6000_call_aix): ... this. Handle both direct and indirect calls. Create call insn directly instead of via various gen_... routines. Mention special registers used by the call in CALL_INSN_FUNCTION_USAGE. (rs6000_sibcall_aix): New function. * config/rs6000/rs6000.md (TOC_SAVE_OFFSET_32BIT): Remove. (TOC_SAVE_OFFSET_64BIT): Likewise. (AIX_FUNC_DESC_TOC_32BIT): Likewise. (AIX_FUNC_DESC_TOC_64BIT): Likewise. (AIX_FUNC_DESC_SC_32BIT): Likewise. (AIX_FUNC_DESC_SC_64BIT): Likewise. ("call" expander): Call rs6000_call_aix. ("call_value" expander): Likewise. ("call_indirect_aix"): Replace this pattern ... ("call_indirect_aix_nor11"): ... and this pattern ... ("*call_indirect_aix"): ... by this insn pattern. ("call_value_indirect_aix"): Replace this pattern ... ("call_value_indirect_aix_nor11"): ... and this pattern ... ("*call_value_indirect_aix"): ... by this insn pattern. ("*call_nonlocal_aix32", "*call_nonlocal_aix64"): Replace by ... ("*call_nonlocal_aix"): ... this pattern. ("*call_value_nonlocal_aix32", "*call_value_nonlocal_aix64"): Replace ("*call_value_nonlocal_aix"): ... by this pattern. ("*call_local_aix"): New insn pattern. ("*call_value_local_aix"): Likewise. ("sibcall" expander): Call rs6000_sibcall_aix. ("sibcall_value" expander): Likewise. Move earlier in file. ("*sibcall_nonlocal_aix"): Replace by ... ("*sibcall_aix"): ... this pattern. ("*sibcall_value_nonlocal_aix"): Replace by ... ("*sibcall_value_aix"): ... this pattern. * config/rs6000/rs6000-protos.h (rs6000_call_indirect_aix): Remove. (rs6000_call_aix): Add prototype. (rs6000_sibcall_aix): Likewise. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -30599,118 +30599,138 @@ rs6000_legitimate_constant_p (enum machi } -/* A function pointer under AIX is a pointer to a data area whose first word - contains the actual address of the function, whose second word contains a - pointer to its TOC, and whose third word contains a value to place in the - static chain register (r11). Note that if we load the static chain, our - "trampoline" need not have any executable code. */ + +/* Expand code to perform a call under the AIX ABI. */ void -rs6000_call_indirect_aix (rtx value, rtx func_desc, rtx flag) +rs6000_call_aix (rtx value, rtx func_desc, rtx flag, rtx cookie) { + rtx toc_reg = gen_rtx_REG (Pmode, TOC_REGNUM); + rtx toc_load = NULL_RTX; + rtx toc_restore = NULL_RTX; rtx func_addr; - rtx toc_reg; - rtx sc_reg; - rtx stack_ptr; - rtx stack_toc_offset; - rtx stack_toc_mem; - rtx func_toc_offset; - rtx func_toc_mem; - rtx func_sc_offset; - rtx func_sc_mem; + rtx abi_reg = NULL_RTX; + rtx call[4]; + int n_call; rtx insn; - rtx (*call_func) (rtx, rtx, rtx, rtx); - rtx (*call_value_func) (rtx, rtx, rtx, rtx, rtx); - - stack_ptr = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM); - toc_reg = gen_rtx_REG (Pmode, TOC_REGNUM); - /* Load up address of the actual function. */ - func_desc = force_reg (Pmode, func_desc); - func_addr = gen_reg_rtx
[PATCH, rs6000] ELFv2 ABI preparation: Refactor rs6000_function_arg
Hello, another patch in preparation for the new ABI. When calling a function in the absence of a prototype, we need to pass arguments that are normally passed in floating-point or vector registers also on the stack and/or in GPRs. The code currently handling this in rs6000_function_arg is implemented in two separate places, one that handles floating-point arguments and one that handles vector arguments. This is probably reasonable with the current ABI, since in the vector case, everything is much simpler than in the floating-point case. However, with the new ABI, we may now pass more complex arguments in vector registers, in which case we might need to duplicate more of the logic currently used only for floating-point arguments. To avoid that duplication, this patch moves the logic to handle argument duplication on stack or in GPRs out to a pair of new helper routines, and uses them for both floating-point and vector arguments. For the current ABI, the result should be exactly the same. No change in generated code intented. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (rs6000_psave_function_arg): New function. (rs6000_finish_function_arg): Likewise. (rs6000_function_arg): Use rs6000_psave_function_arg and rs6000_finish_function_arg to handle both vector and floating point arguments that are also passed in GPRs / the stack. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -9511,6 +9511,83 @@ rs6000_mixed_function_arg (enum machine_ return gen_rtx_PARALLEL (mode, gen_rtvec_v (k, rvec)); } +/* We have an argument of MODE and TYPE that goes into FPRs or VRs, + but must also be copied into the parameter save area starting at + offset ALIGN_WORDS. Fill in RVEC with the elements corresponding + to the GPRs and/or memory. Return the number of elements used. */ + +static int +rs6000_psave_function_arg (enum machine_mode mode, const_tree type, + int align_words, rtx *rvec) +{ + int k = 0; + + if (align_words < GP_ARG_NUM_REG) +{ + int n_words = rs6000_arg_size (mode, type); + + if (align_words + n_words > GP_ARG_NUM_REG + || (TARGET_32BIT && TARGET_POWERPC64)) + { + /* If this is partially on the stack, then we only +include the portion actually in registers here. */ + enum machine_mode rmode = TARGET_32BIT ? SImode : DImode; + int i = 0; + + if (align_words + n_words > GP_ARG_NUM_REG) + { + /* Not all of the arg fits in gprs. Say that it goes in memory +too, using a magic NULL_RTX component. Also see comment in +rs6000_mixed_function_arg for why the normal +function_arg_partial_nregs scheme doesn't work in this case. */ + rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, NULL_RTX, const0_rtx); + } + + do + { + rtx r = gen_rtx_REG (rmode, GP_ARG_MIN_REG + align_words); + rtx off = GEN_INT (i++ * GET_MODE_SIZE (rmode)); + rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off); + } + while (++align_words < GP_ARG_NUM_REG && --n_words != 0); + } + else + { + /* The whole arg fits in gprs. */ + rtx r = gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words); + rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, const0_rtx); + } +} + else +{ + /* It's entirely in memory. */ + rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, NULL_RTX, const0_rtx); +} + + return k; +} + +/* RVEC is a vector of K components of an argument of mode MODE. + Construct the final function_arg return value from it. */ + +static rtx +rs6000_finish_function_arg (enum machine_mode mode, rtx *rvec, int k) +{ + gcc_assert (k >= 1); + + /* Avoid returning a PARALLEL in the trivial cases. */ + if (k == 1) +{ + if (XEXP (rvec[0], 0) == NULL_RTX) + return NULL_RTX; + + if (GET_MODE (XEXP (rvec[0], 0)) == mode) + return XEXP (rvec[0], 0); +} + + return gen_rtx_PARALLEL (mode, gen_rtvec_v (k, rvec)); +} + /* Determine where to put an argument to a function. Value is zero to push the argument on the stack, or a hard register in which to store the argument. @@ -9580,32 +9657,25 @@ rs6000_function_arg (cumulative_args_t c } if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named)) -if (TARGET_64BIT && ! cum->prototype) - { - /* Vector parameters get passed in vector register - and also in GPRs or memory, in absence of prototype. */ - int align_words; - rtx slot; - align_words = (cum->words + 1) & ~1; +{ + rtx rvec[GP_ARG_NUM_REG + 1]; + rtx r; + int k = 0; -
[PATCH] Fix alias machinery to deal with DECL_BIT_FIELD_REPRESENTATIVE in COMPONENT_REFs
The following patch is necessary to make DECL_BIT_FIELD_REPRESENTATIVE useable in COMPONENT_REFs (which is the easiest way to get some bitfield lowering). An earlier patch passed bootstrapped and testing on x86_64-unknown-linux-gnu (with bitfield lowering), re-testing after minor changes. 2013-11-11 Richard Biener * tree-ssa-alias.c (nonoverlapping_component_refs_of_decl_p): Properly handle DECL_BIT_FIELD_REPRESENTATIVE occuring as COMPONENT_REF operand. * alias.c (nonoverlapping_component_refs_p): Likewise. * stor-layout.c (start_bitfield_representative): Mark DECL_BIT_FIELD_REPRESENTATIVE as DECL_NONADDRESSABLE_P. Index: gcc/tree-ssa-alias.c === *** gcc/tree-ssa-alias.c.orig 2013-11-11 10:32:25.0 +0100 --- gcc/tree-ssa-alias.c2013-11-11 11:35:16.042386275 +0100 *** nonoverlapping_component_refs_of_decl_p *** 829,841 if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) goto may_overlap; - /* Different fields of the same record type cannot overlap. -??? Bitfields can overlap at RTL level so punt on them. */ if (field1 != field2) { component_refs1.release (); component_refs2.release (); ! return !(DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2)); } } --- 829,848 if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) goto may_overlap; if (field1 != field2) { component_refs1.release (); component_refs2.release (); ! /* A field and its representative need to be considered the !same. */ ! if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2 ! || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1) ! return false; ! /* Different fields of the same record type cannot overlap. !??? Bitfields can overlap at RTL level so punt on them. */ ! if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2)) ! return false; ! return true; } } Index: gcc/stor-layout.c === *** gcc/stor-layout.c.orig 2013-11-11 12:05:16.0 +0100 --- gcc/stor-layout.c 2013-11-11 13:42:14.947161958 +0100 *** start_bitfield_representative (tree fiel *** 1747,1752 --- 1747,1757 DECL_SIZE_UNIT (repr) = DECL_SIZE_UNIT (field); DECL_PACKED (repr) = DECL_PACKED (field); DECL_CONTEXT (repr) = DECL_CONTEXT (field); + /* There are no indirect accesses to this field. If we introduce + some then they have to use the record alias set. This makes + sure to properly conflict with [indirect] accesses to addressable + fields of the bitfield group. */ + DECL_NONADDRESSABLE_P (repr) = 1; return repr; } Index: gcc/alias.c === *** gcc/alias.c.orig2013-11-11 15:29:12.0 +0100 --- gcc/alias.c 2013-11-11 15:29:17.054068720 +0100 *** nonoverlapping_component_refs_p (const_r *** 2302,2310 found: /* If we're left with accessing different fields of a structure, then no !possible overlap, unless they are both bitfields. */ if (TREE_CODE (typex) == RECORD_TYPE && fieldx != fieldy) ! return !(DECL_BIT_FIELD (fieldx) && DECL_BIT_FIELD (fieldy)); /* The comparison on the current field failed. If we're accessing a very nested structure, look at the next outer level. */ --- 2302,2316 found: /* If we're left with accessing different fields of a structure, then no !possible overlap, unless they are both bitfields or one is !the representative of the other. */ if (TREE_CODE (typex) == RECORD_TYPE && fieldx != fieldy) ! { ! if (DECL_BIT_FIELD_REPRESENTATIVE (fieldx) == fieldy ! || DECL_BIT_FIELD_REPRESENTATIVE (fieldy) == fieldx) ! return false; ! return !(DECL_BIT_FIELD (fieldx) && DECL_BIT_FIELD (fieldy)); ! } /* The comparison on the current field failed. If we're accessing a very nested structure, look at the next outer level. */
[PATCH, rs0000] ELFv2 ABI preparation: Refactor some uses of ABI_AIX
Hello, this is another patch to prepare for the new ABI. Since this will introduce a new setting of DEFAULT_ABI, which will have to be treated like ABI_AIX in many cases, it would be better if there were fewer explicit references to ABI_AIX in the back-end. This patch eliminates a number of them: - DEFAULT_ABI is either ABI_V4, ABI_DARWIN, or ABI_AIX, so any test for two of these can be replaced by a test for the third - there are some code paths that are never used on ABI_DARWIN. Here, any test for ABI_AIX can be replaced by one for ABI_V4. (This is particularly obvious for the load_toc_v4_... patterns that even have v4 in their name ...) - rs6000_elf_declare_function_name had a duplicated test for ABI_AIX in a code block that already tests for ABI_AIX None of this is intended to change generated code on any platform. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (rs6000_option_override_internal): Replace "DEFAULT_ABI != ABI_AIX" test by testing for ABI_V4 or ABI_DARWIN. (rs6000_savres_strategy): Likewise. (rs6000_return_addr): Likewise. (rs6000_emit_load_toc_table): Replace "DEFAULT_ABI != ABI_AIX" by testing for ABI_V4 (since ABI_DARWIN is impossible here). (rs6000_emit_prologue): Likewise. (legitimate_lo_sum_address_p): Simplify DEFAULT_ABI test. (rs6000_elf_declare_function_name): Remove duplicated test. * config/rs6000/rs6000.md ("load_toc_v4_PIC_1"): Explicitly test for ABI_V4 (instead of "DEFAULT_ABI != ABI_AIX" test). ("load_toc_v4_PIC_1_normal"): Likewise. ("load_toc_v4_PIC_1_476"): Likewise. ("load_toc_v4_PIC_1b"): Likewise. ("load_toc_v4_PIC_1b_normal"): Likewise. ("load_toc_v4_PIC_1b_476"): Likewise. ("load_toc_v4_PIC_2"): Likewise. ("load_toc_v4_PIC_3b"): Likewise. ("load_toc_v4_PIC_3c"): Likewise. * config/rs6000/rs6000.h (RS6000_REG_SAVE): Simplify DEFAULT_ABI test. (RS6000_SAVE_AREA): Likewise. (FP_ARG_MAX_REG): Likewise. (RETURN_ADDRESS_OFFSET): Likewise. * config/rs6000/sysv.h (TARGET_TOC): Test for ABI_V4 instead of ABI_AIX. (SUBTARGET_OVERRIDE_OPTIONS): Likewise. (MINIMAL_TOC_SECTION_ASM_OP): Likewise. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -3669,7 +3669,7 @@ rs6000_option_override_internal (bool gl /* We should always be splitting complex arguments, but we can't break Linux and Darwin ABIs at the moment. For now, only AIX is fixed. */ - if (DEFAULT_ABI != ABI_AIX) + if (DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN) targetm.calls.split_complex_arg = NULL; } @@ -6384,7 +6384,7 @@ legitimate_lo_sum_address_p (enum machin { bool large_toc_ok; - if (DEFAULT_ABI != ABI_AIX && DEFAULT_ABI != ABI_DARWIN && flag_pic) + if (DEFAULT_ABI == ABI_V4 && flag_pic) return false; /* LRA don't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls push_reload from reload pass code. LEGITIMIZE_RELOAD_ADDRESS @@ -19605,7 +19605,8 @@ rs6000_savres_strategy (rs6000_stack_t * by the static chain. It would require too much fiddling and the static chain is rarely used anyway. FPRs are saved w.r.t the stack pointer on Darwin, and AIX uses r1 or r12. */ - if (using_static_chain_p && DEFAULT_ABI != ABI_AIX) + if (using_static_chain_p + && (DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN)) strategy |= ((DEFAULT_ABI == ABI_DARWIN ? 0 : SAVE_INLINE_FPRS) | SAVE_INLINE_GPRS | SAVE_INLINE_VRS | REST_INLINE_VRS); @@ -20301,7 +20302,8 @@ rs6000_return_addr (int count, rtx frame /* Currently we don't optimize very well between prolog and body code and for PIC code the code can be actually quite bad, so don't try to be too clever here. */ - if (count != 0 || (DEFAULT_ABI != ABI_AIX && flag_pic)) + if (count != 0 + || ((DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN) && flag_pic)) { cfun->machine->ra_needs_full_frame = 1; @@ -20449,7 +20451,7 @@ rs6000_emit_load_toc_table (int fromprol rtx dest; dest = gen_rtx_REG (Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM); - if (TARGET_ELF && TARGET_SECURE_PLT && DEFAULT_ABI != ABI_AIX && flag_pic) + if (TARGET_ELF && TARGET_SECURE_PLT && DEFAULT_ABI == ABI_V4 && flag_pic) { char buf[30]; rtx lab, tmp1, tmp2, got; @@ -20477,7 +20479,7 @@ rs6000_emit_load_toc_table (int fromprol emit_insn (gen_load_toc_v4_pic_si ()); emit_move_insn (dest, gen_rtx_REG (Pmode, LR_REGNO)); } - else if (TARGET_ELF && DEFAULT_ABI != ABI_AIX && flag_pic == 2) + else if (TARGET_ELF && DEFA
[PATCH, rs6000] ELFv2 ABI preparation: Remove USE_FP/ALTIVEC_FOR_ARG_P type arg
Hello, another patch in preparation for the new ABI. USE_FP_FOR_ARG_P and USE_ALTIVEC_FOR_ARG_P take a TYPE argument which they never use. Since passing a correct TYPE for the homogeneous struct case would be a bit problematic, it seems cleaner to just remove the unused argument. No change in generated code intented. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (USE_FP_FOR_ARG_P): Remove TYPE argument. (USE_ALTIVEC_FOR_ARG_P): Likewise. (rs6000_darwin64_record_arg_advance_recurse): Update uses. (rs6000_function_arg_advance_1):Likewise. (rs6000_darwin64_record_arg_recurse): Likewise. (rs6000_function_arg): Likewise. (rs6000_arg_partial_bytes): Likewise. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -8434,13 +8434,13 @@ rs6000_member_type_forces_blk (const_tre } /* Nonzero if we can use a floating-point register to pass this arg. */ -#define USE_FP_FOR_ARG_P(CUM,MODE,TYPE)\ +#define USE_FP_FOR_ARG_P(CUM,MODE) \ (SCALAR_FLOAT_MODE_P (MODE) \ && (CUM)->fregno <= FP_ARG_MAX_REG \ && TARGET_HARD_FLOAT && TARGET_FPRS) /* Nonzero if we can use an AltiVec register to pass this arg. */ -#define USE_ALTIVEC_FOR_ARG_P(CUM,MODE,TYPE,NAMED) \ +#define USE_ALTIVEC_FOR_ARG_P(CUM,MODE,NAMED) \ (ALTIVEC_OR_VSX_VECTOR_MODE (MODE) \ && (CUM)->vregno <= ALTIVEC_ARG_MAX_REG \ && TARGET_ALTIVEC_ABI \ @@ -8889,7 +8889,7 @@ rs6000_darwin64_record_arg_advance_recur if (TREE_CODE (ftype) == RECORD_TYPE) rs6000_darwin64_record_arg_advance_recurse (cum, ftype, bitpos); - else if (USE_FP_FOR_ARG_P (cum, mode, ftype)) + else if (USE_FP_FOR_ARG_P (cum, mode)) { unsigned n_fpregs = (GET_MODE_SIZE (mode) + 7) >> 3; rs6000_darwin64_record_arg_advance_flush (cum, bitpos, 0); @@ -8930,7 +8930,7 @@ rs6000_darwin64_record_arg_advance_recur else cum->words += n_fpregs; } - else if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, 1)) + else if (USE_ALTIVEC_FOR_ARG_P (cum, mode, 1)) { rs6000_darwin64_record_arg_advance_flush (cum, bitpos, 0); cum->vregno++; @@ -8993,7 +8993,7 @@ rs6000_function_arg_advance_1 (CUMULATIV { bool stack = false; - if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, named)) + if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named)) { cum->vregno++; if (!TARGET_ALTIVEC) @@ -9366,7 +9366,7 @@ rs6000_darwin64_record_arg_recurse (CUMU if (TREE_CODE (ftype) == RECORD_TYPE) rs6000_darwin64_record_arg_recurse (cum, ftype, bitpos, rvec, k); - else if (cum->named && USE_FP_FOR_ARG_P (cum, mode, ftype)) + else if (cum->named && USE_FP_FOR_ARG_P (cum, mode)) { unsigned n_fpreg = (GET_MODE_SIZE (mode) + 7) >> 3; #if 0 @@ -9394,7 +9394,7 @@ rs6000_darwin64_record_arg_recurse (CUMU if (mode == TFmode || mode == TDmode) cum->fregno++; } - else if (cum->named && USE_ALTIVEC_FOR_ARG_P (cum, mode, ftype, 1)) + else if (cum->named && USE_ALTIVEC_FOR_ARG_P (cum, mode, 1)) { rs6000_darwin64_record_arg_flush (cum, bitpos, rvec, k); rvec[(*k)++] @@ -9579,7 +9579,7 @@ rs6000_function_arg (cumulative_args_t c /* Else fall through to usual handling. */ } - if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, named)) + if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named)) if (TARGET_64BIT && ! cum->prototype) { /* Vector parameters get passed in vector register @@ -9707,7 +9707,7 @@ rs6000_function_arg (cumulative_args_t c if (mode == TDmode && (cum->fregno % 2) == 1) cum->fregno++; - if (USE_FP_FOR_ARG_P (cum, mode, type)) + if (USE_FP_FOR_ARG_P (cum, mode)) { rtx rvec[GP_ARG_NUM_REG + 1]; rtx r; @@ -9823,7 +9823,7 @@ rs6000_arg_partial_bytes (cumulative_arg if (DEFAULT_ABI == ABI_V4) return 0; - if (USE_ALTIVEC_FOR_ARG_P (cum, mode, type, named) + if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named) && cum->nargs_prototype >= 0) return 0; @@ -9833,7 +9833,7 @@ rs6000_arg_partial_bytes (cumulative_arg align_words = rs6000_parm_start (mode, type, cum->words); - if (USE_FP_FOR_ARG_P (cum, mode, type)) + if (USE_FP_FOR_ARG_P (cum, mode)) { /* If we are passing this arg in the fixed parameter save area (gprs or memory) as well as fprs, then this function should -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.w
[PATCH, rs6000] ELFv2 ABI preparation: Refactor rs6000_arg_partial_bytes
Hello, this is the final patch preparing for the new ABI. The logic in rs6000_arg_partial_bytes is a bit complex, since it apparently still contains remnants from the time where this routine was used even for arguments that are returned both in GPRs and FPRs/VRs (*and* memory). These days, all such cases are handled completely in rs6000_function_arg so rs6000_arg_partial_bytes should always return 0; which it *does*, even though in a somewhat complex way. This complex logic would make handling homogeneous structs more difficult than it needs to be. Therefore, this patch simplifies the logic by making explicit the fact that rs6000_arg_partial_bytes does not actually need to handle the cases described above. No change in generated code expected. Tested on powerpc64-linux and powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (rs6000_arg_partial_bytes): Simplify logic by making use of the fact that for vector / floating point arguments passed both in VRs/FPRs and in the fixed parameter area, the partial bytes mechanism is in fact not used. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -9835,15 +9835,25 @@ rs6000_arg_partial_bytes (cumulative_arg tree type, bool named) { CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v); + bool passed_in_gprs = true; int ret = 0; int align_words; if (DEFAULT_ABI == ABI_V4) return 0; - if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named) - && cum->nargs_prototype >= 0) -return 0; + if (USE_ALTIVEC_FOR_ARG_P (cum, mode, named)) +{ + /* If we are passing this arg in the fixed parameter save area + (gprs or memory) as well as VRs, we do not use the partial +bytes mechanism; instead, rs6000_function_arg wil return a +PARALLEL including a memory element as necessary. */ + if (TARGET_64BIT && ! cum->prototype) + return 0; + + /* Otherwise, we pass in VRs only. No partial copy possible. */ + passed_in_gprs = false; +} /* In this complicated case we just disable the partial_nregs code. */ if (TARGET_MACHO && rs6000_darwin64_struct_check_p (mode, type)) @@ -9853,24 +9863,27 @@ rs6000_arg_partial_bytes (cumulative_arg if (USE_FP_FOR_ARG_P (cum, mode)) { + unsigned long n_fpreg = (GET_MODE_SIZE (mode) + 7) >> 3; + /* If we are passing this arg in the fixed parameter save area -(gprs or memory) as well as fprs, then this function should -return the number of partial bytes passed in the parameter -save area rather than partial bytes passed in fprs. */ + (gprs or memory) as well as FPRs, we do not use the partial +bytes mechanism; instead, rs6000_function_arg wil return a +PARALLEL including a memory element as necessary. */ if (type && (cum->nargs_prototype <= 0 || (DEFAULT_ABI == ABI_AIX && TARGET_XL_COMPAT && align_words >= GP_ARG_NUM_REG))) return 0; - else if (cum->fregno + ((GET_MODE_SIZE (mode) + 7) >> 3) - > FP_ARG_MAX_REG + 1) + + /* Otherwise, we pass in FPRs only. Check for partial copies. */ + passed_in_gprs = false; + if (cum->fregno + n_fpreg > FP_ARG_MAX_REG + 1) ret = (FP_ARG_MAX_REG + 1 - cum->fregno) * 8; - else if (cum->nargs_prototype >= 0) - return 0; } - if (align_words < GP_ARG_NUM_REG + if (passed_in_gprs + && align_words < GP_ARG_NUM_REG && GP_ARG_NUM_REG < align_words + rs6000_arg_size (mode, type)) ret = (GP_ARG_NUM_REG - align_words) * (TARGET_32BIT ? 4 : 8); -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PING^2] Re: [PATCH] Caller instrumentation with -finstrument-calls
Hi, I apologize for long time for the review. > > diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c > > index 8f7f5e5..f3ad003 100644 > > --- a/gcc/c-family/c-common.c > > +++ b/gcc/c-family/c-common.c > > @@ -343,6 +343,8 @@ static tree handle_tls_model_attribute (tree *, tree, > > tree, int, > > bool *); > > static tree handle_no_instrument_function_attribute (tree *, tree, > > tree, int, bool *); > > +static tree handle_no_instrument_calls_attribute (tree *, tree, > > +tree, int, bool *); > > static tree handle_malloc_attribute (tree *, tree, tree, int, bool *); > > static tree handle_returns_twice_attribute (tree *, tree, tree, int, bool > > *); > > static tree handle_no_limit_stack_attribute (tree *, tree, tree, int, > > @@ -658,6 +660,9 @@ const struct attribute_spec c_common_attribute_table[] = > >{ "no_instrument_function", 0, 0, true, false, false, > > handle_no_instrument_function_attribute, > > false }, > > + { "no_instrument_calls", 0, 0, true, false, false, > > + handle_no_instrument_calls_attribute, > > + false }, So you are adding no_instrument_calls in addition to no_instrument_function and moreover you add command line option to disable/enable this per line basis. I see this follow the pre-existing functio ninstrumentatio ncode, but can't we just make this part of the function instrumentation instead of adding three new knobs for this? (i.e. have just -finstrument-calls that will be controlled same way as function instrumentation). > > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c > > index 8170a80..4657e48 100644 > > --- a/gcc/c/c-decl.c > > +++ b/gcc/c/c-decl.c > > @@ -2287,6 +2287,8 @@ merge_decls (tree newdecl, tree olddecl, tree > > newtype, tree oldtype) > > DECL_NO_LIMIT_STACK (newdecl) |= DECL_NO_LIMIT_STACK (olddecl); > > DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (newdecl) > > |= DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (olddecl); > > + DECL_NO_INSTRUMENT_CALLS_BEFORE_AFTER (newdecl) > > + |= DECL_NO_INSTRUMENT_CALLS_BEFORE_AFTER (olddecl); > > TREE_THIS_VOLATILE (newdecl) |= TREE_THIS_VOLATILE (olddecl); > > DECL_IS_MALLOC (newdecl) |= DECL_IS_MALLOC (olddecl); > > DECL_IS_OPERATOR_NEW (newdecl) |= DECL_IS_OPERATOR_NEW (olddecl); You do not need to introduce new DECL flag for this. We used to do this in dark ages before we had attributes in place, but this time we usually just look up the given attribute. > > +@item -finstrument-calls > > +@opindex finstrument-calls > > +Generate instrumentation calls immediately before and after each > > +function call. The following profiling functions will be called with > > +the address of the function that is called between them. Use > > +@code{__builtin_return_address(0)} inside the profiling functions to > > +get the addresses from where they are called. > > + > > +@smallexample > > +void __gnu_profile_call_before (void *fn); > > +void __gnu_profile_call_after (void *fn); > > +@end smallexample I would expect this to be useful i.e. for dynamic callgraph construction. In this case I would expect to have two parameters (calling function and called function) instead of just one (calling function, right?) > > diff --git a/gcc/gimplify.c b/gcc/gimplify.c > > index e2ae893..8401278 100644 > > --- a/gcc/gimplify.c > > +++ b/gcc/gimplify.c > > @@ -89,6 +89,8 @@ static struct gimplify_omp_ctx *gimplify_omp_ctxp; > > > > /* Forward declaration. */ > > static enum gimplify_status gimplify_compound_expr (tree *, gimple_seq *, > > bool); > > +static bool flag_instrument_calls_exclude_p (tree fndecl); > > +static bool flag_instrument_functions_exclude_p (tree fndecl); > > > > /* Mark X addressable. Unlike the langhook we expect X to be in gimple > > form and we don't do any syntax checking. */ > > @@ -1153,6 +1155,63 @@ build_stack_save_restore (gimple *save, gimple > > *restore) > > 1, tmp_var); > > } > > > > +/* Returns the function decl that corresponds the function called in > > + CALL_EXPR if call instrumentation is enabled. */ > > + > > +static tree > > +addr_expr_for_call_instrumentation (tree call_expr) > > +{ > > + tree addr_expr = NULL_TREE; > > + > > + if (!gimplify_ctxp->into_ssa && flag_instrument_calls_before_after > > + && !DECL_NO_INSTRUMENT_CALLS_BEFORE_AFTER (current_function_decl) > > + && !flag_instrument_calls_exclude_p (current_function_decl)) > > +{ > > + tree fndecl = get_callee_fndecl (call_expr); > > + if (fndecl) > > +{ > > + if (!DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (fndecl) > > + && !flag_instrument_functions_exclude_p (fndecl)) Here you can just use lookup_attribute ("no_instrument", attributes) != NULL instead of DECL_NO_
Re: RFA: Fix PR middle-end/59049
On Fri, Nov 08, 2013 at 05:11:39PM +, Joern Rennecke wrote: > On 8 November 2013 15:50, Steven Bosscher wrote: > > Even with the gimple opts disabled, a const-const comparison would > > normally be folded by the RTL expanders. > > Well, in this spirit, attached is another way to address the RTL side > of the problem. > As mention in the PR, the tree side of the problem started showing up > in r204194. > 2013-11-08 Joern Rennecke > > PR middle-end/59049 > * expmed.c (emit_store_flag): Fail for const-const comparison. LGTM (I'd just write return NULL_RTX; instead of return 0;). > --- expmed.c (revision 204568) > +++ expmed.c (working copy) > @@ -5401,6 +5401,13 @@ emit_store_flag (rtx target, enum rtx_co >rtx subtarget; >rtx tem, last, trueval; > > + /* If we compare constants, we shouldn't use a store-flag operation, > + but a constant load. We can get there via the vanilla route that > + usually generates a compare-branch sequence, but will in this case > + fold the comparison to a constant, and thus elide the branch. */ > + if (CONSTANT_P (op0) && CONSTANT_P (op1)) > +return 0; > + >tem = emit_store_flag_1 (target, code, op0, op1, mode, unsignedp, > normalizep, > target_mode); >if (tem) Jakub
[PATCH, rs6000] ELFv2 ABI 3/8: Track single CR fields in DWARF CFI
Hello, this patch adds another ELFv2 ABI feature: explicit tracking of CR fields in DWARF CFI. In the current ABI, DWARF CFI contains only a single record describing the save location of the whole CR field. It is implicit that all (or at least all call-clobbered) fields are present at that location. Now, if you use the instructions that save and restore the whole CR at once, this approach might seem reasonable. Unfortunately, with current POWER processors, those instructions tend to be significantly slower that those that access only single CR fields. In particular in routines where only one or two CR fields are actually clobbered and need to be saved, we could improve performance of prolog and epilog code by saving/restoring only selected CR fields. However, this is not possible in the current ABI since there is no way to describe this fact in the CFI. With the ELFv2 ABI, every CR field gets its own CFI record (using the register numbers 68 .. 75 to stand for CR0 .. CR7). Now, those fields will still usually be saved in the same 4-byte field on the stack. The semantics of a CFI record for field CRx is that the memory location holds 4 bytes, and the 4-bit nibble corresponding to CRx within those 4 bytes hold the CRx value to be restored. The one problem with this scheme is the way uw_install_context tries to modify saved valued when unwinding the stack: it will simply copy over the whole field into the save slot of the unwinder routine (that calls __builtin_eh_return). This clearly does not work if multiple CR fields need to be restored independently. To fix this, the prolog/epilog code for unwinder routines will use *multiple* stack slots, one for each call-saved CR fields, and save and restore those fields to and from their own slot. This will allow uw_install_context to install values for multiple fields. (Note that there is already precedent for unwinder routines being treated specially in the rs6000.c prologue/epilogue code ...) Bye, Ulrich gcc/ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (struct rs6000_stack): New member ehcr_offset. (rs6000_stack_info): For ABI_ELFv2, allocate space for separate CR field save areas if the function calls __builtin_eh_return. (rs6000_emit_move_from_cr): New function. (rs6000_emit_prologue): Use it. For ABI_ELFv2, generate separate CFI records for each saved CR field. For functions that call __builtin_eh_return, save all CR fields into separate slots. (restore_saved_cr): For ABI_ELFv2, generate separate CFA_RESTORE entries for each saved CR field. (add_crlr_cfa_restore): Likewise. (rs6000_emit_epilogue): For ABI_ELFv2, if the function calls __builtin_eh_return, restore each CR field from its own slot. libgcc/ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/linux-unwind.h (R_CR3, R_CR4): New macros. (ppc_fallback_frame_state) [_CALL_ELF == 2]: Create CFI entry for CR3 and CR4. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -97,6 +97,7 @@ typedef struct rs6000_stack { int spe_gp_save_offset; /* offset to save spe 64-bit gprs */ int varargs_save_offset; /* offset to save the varargs registers */ int ehrd_offset; /* offset to EH return data */ + int ehcr_offset; /* offset to EH CR field data */ int reg_size;/* register size (4 or 8) */ HOST_WIDE_INT vars_size; /* variable save area size */ int parm_size; /* outgoing parameter size */ @@ -19847,6 +19848,7 @@ rs6000_stack_info (void) rs6000_stack_t *info_ptr = &stack_info; int reg_size = TARGET_32BIT ? 4 : 8; int ehrd_size; + int ehcr_size; int save_align; int first_gp; HOST_WIDE_INT non_fixed_size; @@ -19940,6 +19942,18 @@ rs6000_stack_info (void) else ehrd_size = 0; + /* In the ELFv2 ABI, we also need to allocate space for separate + CR field save areas if the function calls __builtin_eh_return. */ + if (DEFAULT_ABI == ABI_ELFv2 && crtl->calls_eh_return) +{ + /* This hard-codes that we have three call-saved CR fields. */ + ehcr_size = 3 * reg_size; + /* We do *not* use the regular CR save mechanism. */ + info_ptr->cr_save_p = 0; +} + else +ehcr_size = 0; + /* Determine various sizes. */ info_ptr->reg_size = reg_size; info_ptr->fixed_size = RS6000_SAVE_AREA; @@ -20009,6 +20023,8 @@ rs6000_stack_info (void) } else info_ptr->ehrd_offset = info_ptr->gp_save_offset - ehrd_size; + + info_ptr->ehcr_offset = info_ptr->ehrd_offset - ehcr_size; info_ptr->cr_save_offset = reg_size; /* first word when 64-bit. */ info_ptr->lr_save_offset = 2*reg_size; break; @@ -20071,6 +20087,7 @@ rs6000_st
[PATCH, rs6000] ELFv2 ABI 2/8: Remove function descriptors
Hello, this patch adds the first major feature of the ELFv2 ABI: removal of function descriptors. In the current ABI, it is the responsibility of a caller to set up the TOC pointer needed by the callee by loading the correct value into r2. This typically happens in one of these ways: - For a direct function call within the same module, caller and callee share the same TOC, so r2 already contains the correct value. - For a direct function call to a function in another module, the linker will interpose a PLT stub which reloads r2 to be appropriate for the callee, with help from ld.so. - For an indirect function call, the "function pointer" points to function descriptor that holds the target function address as well as the appropriate TOC value (and a static chain value for nested functions). The caller must load those values up before transfering control to the callee. With the new ABI, it is the responsibility of the callee to set up its own TOC pointer. This has a number of advantages, in particular for a callee that doesn't actually need a TOC, and makes the function pointer call sequence more effective by avoiding pipeline hazards. However, it remains true that with the PowerPC ISA, it is difficult to actually establish addressability to the TOC without any outside help. To avoid introducing new performance regressions, the new ABI instead provides a dual entry point scheme. Any function that needs a TOC may provide two entry points: - The first ("global") entry point is intended to called via indirect calls. It expects r12 to be set up to point to the entry point address itself. (Since in order to perform an indirect call, the address must be loaded into some GPR anyway, this does not impose a substantial impact on the caller ...) Code at the global entry point is expected to load up r2 with the appropriate TOC value for this function (and it may use the r12 value to compute that), and then fall through to the local entry point. - The second ("local") entry point expects r2 to be set up to the current TOC, much like an entry point in the current ABI does. However, it must not expect r12 to hold any particular value. This means that a local direct function call within the same module can be done via a simple "bl" instruction just as today. Note that the linker will automatically direct a "bl func" to the *local* entry point associated with the symbol "func"; not the global one. If local and global entry points coincide, the function may expect no particular value in either r12 or r2; this can be used by functions that do not need a TOC. There exists just one single ELF symbol for both local and global entry points; its symbol value refers to the global entry point, and flag bits in ELF st_other encode the offset from there to the local entry point. The compiler emits a .localentry directive to announce this location to the assembler, which will encode it as appropriate. This patch implements all aspects needed for this scheme: - Emit ".abiversion 2" assembler directive to help the assembler and linker understand the ABI implemented in this file. - Remove all handling of function descriptors / .opd section / dot symbols for the ELFv2 ABI. - Output a global entry point prologue and local entry point marker for functions that use the TOC. - Implement the ELFv2 function pointer call sequence. - Use trampolines to handle nested functions, just like on 32-bit (including marking the stack as executable). - No longer scan for the old ABI indirect call sequence in linux-unwind.h. - Update assembler code to the new ABI: - gcc/config/rs6000/ppc-asm.h (used by libgcc and elsewhere) - lititm/config/powerpc/sjlj.S - gcc/testsuite/gcc.target/powerpc/ppc64-abi-dfp-1.c In addition, some related changes are necessary: - Move the code generating the call to _mcount for -mprofile-kernel from output_function_profiler to rs6000_output_function_prologue, since this call must happen at the local entry point, not the global entry point. - Disable (now useless) tests for -mpointers-to-nested-functions. - Fix the libstc++ extract_symvers.in script to ignore markers emitted by readelf --symbols that indicate local entry points. - Fix the "gotest" script to no longer expect function symbols to reside in the data segment. (I understand this last bit needs to go into the Go repository first ...) Bye, Ulrich gcc/ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/rs6000.c (machine_function): New member r2_setup_needed. (rs6000_emit_prologue): Set r2_setup_needed if necessary. (rs6000_output_mi_thunk): Set r2_setup_needed. (rs6000_output_function_prologue): Output global entry point prologue and local entry point marker if needed for ABI_ELFv2. Output -mprofile-kernel code here. (output_function_profiler): Do not output -mprofile-kernel code here; mov
Re: Some wide-int review comments
On Mon, Nov 11, 2013 at 3:26 PM, Kenneth Zadeck wrote: > On 11/11/2013 06:49 AM, Richard Biener wrote: >> >> On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck >> wrote: >>> >>> On 11/08/2013 05:30 AM, Richard Sandiford wrote: From tree-vrp.c: @@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code /* If the singed operation wraps then int_const_binop has done everything we want. */ ; + /* Signed division of -1/0 overflows and by the time it gets here + returns NULL_TREE. */ + else if (!res) +return NULL_TREE; else if ((TREE_OVERFLOW (res) && !TREE_OVERFLOW (val1) && !TREE_OVERFLOW (val2)) Why is this case different from trunk? Or is it a bug-fix independent of wide-int? >>> >>> the api for division is different for wide-int than it was for >>> double-int. >> >> But this is using a tree API (int_const_binop) that didn't change >> (it returned NULL for / 0 previously). So what makes us arrive here >> now? (I agree there is a bug in VRP, but it shouldn't manifest itself >> only on wide-int) >> >> Richard. > > My reading of the code is that is that i changed int_const_binop to return > null_tree for this case. Trunk has: case TRUNC_DIV_EXPR: case FLOOR_DIV_EXPR: case CEIL_DIV_EXPR: case EXACT_DIV_EXPR: /* This is a shortcut for a common special case. */ if (op2.high == 0 && (HOST_WIDE_INT) op2.low > 0 && !TREE_OVERFLOW (arg1) && !TREE_OVERFLOW (arg2) && op1.high == 0 && (HOST_WIDE_INT) op1.low >= 0) { if (code == CEIL_DIV_EXPR) op1.low += op2.low - 1; res.low = op1.low / op2.low, res.high = 0; break; } /* ... fall through ... */ case ROUND_DIV_EXPR: if (op2.is_zero ()) return NULL_TREE; so it already returns NULL_TREE on divide by zero. > On the trunk, only rem returns null_tree for divide by 0, on the wide int > branch, both div and rem return null tree. > > I know that this is going to bring on a string of questions that i do not > remember the answers to as to why i made that change. but > fold-const.c:int_const_binop_1 now returns null_tree and this is just > fallout from that change. > > Kenny Thanks, Richard >>> >>> >
Re: PING^3 Re: [PATCH] Add -fno-instrument-function v2
Andi Kleen writes: PING^3 Since it doesn't look like a generic solution for the LTO options problem will appear this development cycle, I would still like to pursue this option for 4.9. This would help fixing parts of the linux kernel LTO build. Can someone please review the patch? Thanks > Andi Kleen writes: > > PING^2 > >> Andi Kleen writes: >> >>> From: Andi Kleen >>> >>> [I posted this originally quite some time ago. >>> This version fixes all review problems, particularly >>> it works for C++ too and the test case really works.] >> >> Ping! >> >> Could someone please review it. >> >> Note this might be obsolete with Honza's LTO option work, but if it's >> not covered in his first iteration I would still have it earlier for the >> Linux kernel LTO build (fixed ftrace) >> >> -Andi >> >>> This adds a new C/C++ option to force >>> __attribute__((no_instrument_function)) on every function compiled. >>> >>> This is useful together with LTO. You may want to have the whole >>> program compiled with -pg and have to specify that in the LTO >>> link, but want to disable it for some specific files. As the >>> option works on the frontend level it is already passed through >>> properly by LTO. >>> >>> Without LTO it is equivalent to not specifing -pg or -mfentry. >>> >>> This fixes some missing functionality in the Linux kernel LTO port. >>> >>> Passed bootstrap and test suite on x86_64-linux. Ok? >>> >>> gcc/: >>> >>> 2013-08-10 Andi Kleen >>> >>> * c.opt (fno-instrument-function): Document. >>> >>> gcc/c: >>> >>> 2013-08-10 Andi Kleen >>> >>> * c-decl.c (start_function): Handle force_no_instrument_function >>> >>> gcc/cp: >>> >>> 2013-08-10 Andi Kleen >>> >>> * decl.c (start_preparsed_function): Handle >>> force_no_instrument_function >>> >>> gcc/testsuite: >>> >>> 2013-08-10 Andi Kleen >>> >>> * g++.dg/fno-instrument-function.C: Add. >>> * gcc.dg/fno-instrument-function.c: Add. >>> --- >>> gcc/c-family/c.opt | 4 >>> gcc/c/c-decl.c | 3 +++ >>> gcc/cp/decl.c | 3 +++ >>> gcc/doc/invoke.texi| 8 +++- >>> gcc/testsuite/g++.dg/fno-instrument-function.C | 18 ++ >>> gcc/testsuite/gcc.dg/fno-instrument-function.c | 24 >>> >>> 6 files changed, 59 insertions(+), 1 deletion(-) >>> create mode 100644 gcc/testsuite/g++.dg/fno-instrument-function.C >>> create mode 100644 gcc/testsuite/gcc.dg/fno-instrument-function.c >>> >>> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt >>> index 10ae84d..2159f89 100644 >>> --- a/gcc/c-family/c.opt >>> +++ b/gcc/c-family/c.opt >>> @@ -1014,6 +1014,10 @@ fnil-receivers >>> ObjC ObjC++ Var(flag_nil_receivers) Init(1) >>> Assume that receivers of Objective-C messages may be nil >>> >>> +fno-instrument-function >>> +C C++ ObjC ObjC++ RejectNegative Report Var(force_no_instrument_function) >>> +Force __attribute__((no_instrument_function)) for all functions in >>> translation unit. >>> + >>> fnonansi-builtins >>> C++ ObjC++ Var(flag_no_nonansi_builtin, 0) >>> >>> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c >>> index d9bbf5c..15717a9 100644 >>> --- a/gcc/c/c-decl.c >>> +++ b/gcc/c/c-decl.c >>> @@ -7876,6 +7876,9 @@ start_function (struct c_declspecs *declspecs, struct >>> c_declarator *declarator, >>>if (current_scope == file_scope) >>> maybe_apply_pragma_weak (decl1); >>> >>> + if (force_no_instrument_function) >>> +DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (decl1) = 1; >>> + >>>/* Warn for unlikely, improbable, or stupid declarations of `main'. */ >>>if (warn_main && MAIN_NAME_P (DECL_NAME (decl1))) >>> { >>> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c >>> index 01804d2..103188b 100644 >>> --- a/gcc/cp/decl.c >>> +++ b/gcc/cp/decl.c >>> @@ -13023,6 +13023,9 @@ start_preparsed_function (tree decl1, tree attrs, >>> int flags) >>>&& lookup_attribute ("noinline", attrs)) >>> warning (0, "inline function %q+D given attribute noinline", decl1); >>> >>> + if (force_no_instrument_function) >>> +DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (decl1) = 1; >>> + >>>/* Handle gnu_inline attribute. */ >>>if (GNU_INLINE_P (decl1)) >>> { >>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi >>> index 782b569..bc20a77 100644 >>> --- a/gcc/doc/invoke.texi >>> +++ b/gcc/doc/invoke.texi >>> @@ -169,7 +169,7 @@ in the following sections. >>> -aux-info @var{filename} -fallow-parameterless-variadic-functions @gol >>> -fno-asm -fno-builtin -fno-builtin-@var{function} @gol >>> -fhosted -ffreestanding -fopenmp -fms-extensions -fplan9-extensions @gol >>> --trigraphs -traditional -traditional-cpp @gol >>> +-trigraphs -traditional -traditional-cpp -fno-instrument-function @gol >>> -fallow-single-precision -fcond-mismatch -flax-vector-conversions @gol >>> -fsigned-bitfields -fsigned-char @gol >>> -funsigned-bitf
[PATCH, rs6000] ELFv2 ABI 1/8: Add options and infrastructure
Hello, this is the first patch in the series to add support for the ELFv2 ABI. The ELFv2 ABI is the intended ABI for the new powerpc64le-linux port. However, it is not inherently tied to the byte order; it it possible in principle to use the ELFv2 ABI in big-endian mode too. Therefore, it is introduces via a new pair of options -mabi=elfv1 / -mabi=elfv2 where -mabi=elfv1 select the current Linux ABI, and -mabi=elfv2 selects the new one. If neither option is given, the compiler defaults to whatever ABI was specified at configure time using the --with-abi=elfv1 / --with-abi=elfv2 configure option. If this was not specified either, every subtarget can provide a master default by defining (or not) the LINUX64_DEFAULT_ABI_ELFv2 macro. The patch series is structured as follows: - This first patch adds the new enum rs6000_abi value ABI_ELFv2 and all the configure logic needed to set it. It is treated just like ABI_AIX (on a modern system) at this point; no change in generated code is expected yet. - A series of follow on patches will add the various features defining the ELFv2 ABI. Note that they will still not be active by default anywhere. - The final patch will simply turn the switch to make ELFv2 the default ABI on powerpc64le-linux. The whole series was tested on: - powerpc64-linux with no regression, including no regression in an ALT_CC_UNDER_TEST/ALT_CXX_UNDER_TEST run of compat.exp and struct-layout-1.exp against a current compiler. - powerpc64-linux --with-abi=elfv2 in mock-up big-endian ELFv2 environment running with some hacks on an existing system. - powerpc64le-linux, using the new default ELFv2 in a rebuilt small OS image using the new ABI everywhere. (Of course, patches to binutils, glibc, and libffi to support the new ELFv2 ABI are also required.) The tests were using all languages (including Java, Go, and Objective-C++; on powerpc64le-linux also Ada), and all target libraries except libsanitizer (which currently seems broken on mainline). There is a small number of regressions on powerpc64le-linux which seem to be due to unrelated issues: - FAIL: g++.dg/cpp1y/vla-initlist1.C execution test This is an invalid assumption in the test case - FAIL: go.test/test/fixedbugs/bug296.go execution, -O2 -g The ELFv2 ABI seems to be exposing a Go frontend bug here - FAIL: runtime/pprof (in libgo) This is due to a glibc problem with unwinding through a context created via makecontext I'll address those separately. (Of course, there are also other pre-existing regressions on powerpc64le-linux simply due to little-endian issues. Those are unaffected by ELFv2.) I'll add more detailed descriptions of the actual ABI features with the various follow-on patches that implement those. This patch specifically only adds the options as described above. ABI_ELFv2 is implemented to be equivalent to ABI_AIX on a modern system; since the ABI is new, there is no need to continue to support legacy features. This means specifically: - DOT_SYMBOLS is assumed to be false - rs6000_compat_align_parm is assumed to be false - code in linux-unwind.h that handles old kernels/linkers is removed The patch also adds two new pre-defined macros: - _CALL_ELF is set to 1 or 2 to denote the ELFv1 / ELFv2 ABI - __STRUCT_PARM_ALIGN__ is set to 16 if aggregates passed by value are aligned to 16 if their native alignment requires that (this is necessary to implement libffi, for example) In addition, the patch adds a testsuite effective target check for the ELFv2 ABI, and disables the pr57949 tests (these verify the operation of the -mcompat-align-parm option, which is no longer useful in the ELFv2 ABI). To avoid having a partial ABI implementation in tree, it seems best to commit this whole patch series as a single commit. Is the series OK for mainline? Bye, Ulrich gcc/ChangeLog: 2013-11-11 Ulrich Weigand * config.gcc [powerpc*-*-* | rs6000-*-*]: Support --with-abi=elfv1 and --with-abi=elfv2. * config/rs6000/option-defaults.h (OPTION_DEFAULT_SPECS): Add "abi". * config/rs6000/rs6000.opt (mabi=elfv1): New option. (mabi=elfv2): Likewise. * config/rs6000/rs6000-opts.h (enum rs6000_abi): Add ABI_ELFv2. * config/rs6000/linux64.h (DEFAULT_ABI): Do not hard-code to AIX_ABI if !RS6000_BI_ARCH. (ELFv2_ABI_CHECK): New macro. (SUBSUBTARGET_OVERRIDE_OPTIONS): Use it to decide whether to set rs6000_current_abi to ABI_AIX or ABI_ELFv2. (GLIBC_DYNAMIC_LINKER64): Support ELFv2 ld.so version. * config/rs6000/rs6000-c.c (rs6000_cpu_cpp_builtins): Predefine _CALL_ELF and __STRUCT_PARM_ALIGN__ if appropriate. * config/rs6000/rs6000.c (rs6000_debug_reg_global): Handle ABI_ELFv2. (debug_stack_info): Likewise. (rs6000_file_start): Treat ABI_ELFv2 the same as ABI_AIX. (rs6000_legitimize_tls_address): Likewise. (rs6000_condit
[PATCH, rs6000] ELFv2 ABI 4/8: Struct passing calling convention changes
Hello, this patch implements the new struct calling convention for ELFv2. With the new ABI, so-called "homogeneous" aggregates, i.e. struct, arrays, or unions that (recursively) contain only elements of the same floating- point or vector type are passed as if those elements were passed as separate arguments. (This is done for up to 8 such elements.) After the refactoring of the rs6000_function_arg / rs6000_arg_partial_bytes logic that was done in a prior patch, implementing support for homogenous aggregates is now mostly straightforward. Note that rs6000_psave_function_arg can now get called for BLKmode arguments, and has to handle them using a PARALLEL. Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand Michael Gschwind * config/rs6000/rs6000.h (AGGR_ARG_NUM_REG): Define. * config/rs6000/rs6000.c (rs6000_aggregate_candidate): New function. (rs6000_discover_homogeneous_aggregate): Likewise. (rs6000_function_arg_boundary): Handle homogeneous aggregates. (rs6000_function_arg_advance_1): Likewise. (rs6000_function_arg): Likewise. (rs6000_arg_partial_bytes): Likewise. (rs6000_psave_function_arg): Handle BLKmode arguments. Index: gcc/gcc/config/rs6000/rs6000.h === --- gcc.orig/gcc/config/rs6000/rs6000.h +++ gcc/gcc/config/rs6000/rs6000.h @@ -1641,6 +1641,9 @@ extern enum reg_class rs6000_constraints #define ALTIVEC_ARG_MAX_REG (ALTIVEC_ARG_MIN_REG + 11) #define ALTIVEC_ARG_NUM_REG (ALTIVEC_ARG_MAX_REG - ALTIVEC_ARG_MIN_REG + 1) +/* Maximum number of registers per ELFv2 homogeneous aggregate argument. */ +#define AGGR_ARG_NUM_REG 8 + /* Return registers */ #define GP_ARG_RETURN GP_ARG_MIN_REG #define FP_ARG_RETURN FP_ARG_MIN_REG Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -8460,6 +8460,219 @@ rs6000_member_type_forces_blk (const_tre && TARGET_ALTIVEC_ABI \ && (NAMED)) +/* Walk down the type tree of TYPE counting consecutive base elements. + If *MODEP is VOIDmode, then set it to the first valid floating point + or vector type. If a non-floating point or vector type is found, or + if a floating point or vector type that doesn't match a non-VOIDmode + *MODEP is found, then return -1, otherwise return the count in the + sub-tree. */ + +static int +rs6000_aggregate_candidate (const_tree type, enum machine_mode *modep) +{ + enum machine_mode mode; + HOST_WIDE_INT size; + + switch (TREE_CODE (type)) +{ +case REAL_TYPE: + mode = TYPE_MODE (type); + if (!SCALAR_FLOAT_MODE_P (mode)) + return -1; + + if (*modep == VOIDmode) + *modep = mode; + + if (*modep == mode) + return 1; + + break; + +case COMPLEX_TYPE: + mode = TYPE_MODE (TREE_TYPE (type)); + if (!SCALAR_FLOAT_MODE_P (mode)) + return -1; + + if (*modep == VOIDmode) + *modep = mode; + + if (*modep == mode) + return 2; + + break; + +case VECTOR_TYPE: + if (!TARGET_ALTIVEC_ABI || !TARGET_ALTIVEC) + return -1; + + /* Use V4SImode as representative of all 128-bit vector types. */ + size = int_size_in_bytes (type); + switch (size) + { + case 16: + mode = V4SImode; + break; + default: + return -1; + } + + if (*modep == VOIDmode) + *modep = mode; + + /* Vector modes are considered to be opaque: two vectors are +equivalent for the purposes of being homogeneous aggregates +if they are the same size. */ + if (*modep == mode) + return 1; + + break; + +case ARRAY_TYPE: + { + int count; + tree index = TYPE_DOMAIN (type); + + /* Can't handle incomplete types. */ + if (!COMPLETE_TYPE_P (type)) + return -1; + + count = rs6000_aggregate_candidate (TREE_TYPE (type), modep); + if (count == -1 + || !index + || !TYPE_MAX_VALUE (index) + || !host_integerp (TYPE_MAX_VALUE (index), 1) + || !TYPE_MIN_VALUE (index) + || !host_integerp (TYPE_MIN_VALUE (index), 1) + || count < 0) + return -1; + + count *= (1 + tree_low_cst (TYPE_MAX_VALUE (index), 1) + - tree_low_cst (TYPE_MIN_VALUE (index), 1)); + + /* There must be no padding. */ + if (!host_integerp (TYPE_SIZE (type), 1) + || (tree_low_cst (TYPE_SIZE (type), 1) + != count * GET_MODE_BITSIZE (*modep))) + return -1; + + return count; + } + +case RECORD_TYPE: + { + int count = 0; + int sub_count; + tree field; + + /* Can't handle incomplete types. */ + if (!COMPLETE_TYPE_P (type)) + return -1; + + for (field = TY
[PATCH, rs6000] ELFv2 ABI 7/8: Eliminate some stack frame fields
Hello, this is the second part of reducing stack space consumption for the ELFv2 ABI: the old ABI reserved six words in every stack frame for various purposes, and two of those were basically unused: one word for "compiler use" and one word for "linker use". Since neither the compiler nor the linker actually ever made any nontrivial use of these fields, they're now eliminated in the new ABI. This patch implements this change, which is mostly straightforward except for the fact that due to the change, the stack offset of the TOC save area changes, which was hard-coded in a number of places ... The patch also updates a number of test cases that hardcoded the stack layout. Bye, Ulrich gcc/ChangeLog: 2013-11-11 Ulrich Weigand Alan Modra * config/rs6000/rs6000.h (RS6000_SAVE_AREA): Handle ABI_ELFv2. (RS6000_SAVE_TOC): Remove. (RS6000_TOC_SAVE_SLOT): New macro. * config/rs6000/rs6000.c (rs6000_parm_offset): New function. (rs6000_parm_start): Use it. (rs6000_function_arg_advance_1): Likewise. (rs6000_emit_prologue): Use RS6000_TOC_SAVE_SLOT. (rs6000_emit_epilogue): Likewise. (rs6000_call_aix): Likewise. (rs6000_output_function_prologue): Do not save/restore r11 around calling _mcount for ABI_ELFv2. libgcc/ChangeLog: 2013-11-11 Ulrich Weigand Alan Modra * config/rs6000/linux-unwind.h (TOC_SAVE_SLOT): Define. (frob_update_context): Use it. gcc/testsuite/ChangeLog: 2013-11-11 Ulrich Weigand * gcc.target/powerpc/ppc64-abi-1.c (stack_frame_t): Remove compiler and linker field if _CALL_ELF == 2. * gcc.target/powerpc/ppc64-abi-2.c (stack_frame_t): Likewise. * gcc.target/powerpc/ppc64-abi-dfp-1.c (stack_frame_t): Likewise. * gcc.dg/stack-usage-1.c (SIZE): Update value for _CALL_ELF == 2. Index: gcc/gcc/config/rs6000/rs6000.h === --- gcc.orig/gcc/config/rs6000/rs6000.h +++ gcc/gcc/config/rs6000/rs6000.h @@ -1529,12 +1529,12 @@ extern enum reg_class rs6000_constraints /* Size of the fixed area on the stack */ #define RS6000_SAVE_AREA \ - ((DEFAULT_ABI == ABI_V4 ? 8 : 24) << (TARGET_64BIT ? 1 : 0)) + ((DEFAULT_ABI == ABI_V4 ? 8 : DEFAULT_ABI == ABI_ELFv2 ? 16 : 24)\ + << (TARGET_64BIT ? 1 : 0)) -/* MEM representing address to save the TOC register */ -#define RS6000_SAVE_TOC gen_rtx_MEM (Pmode, \ -plus_constant (Pmode, stack_pointer_rtx, \ - (TARGET_32BIT ? 20 : 40))) +/* Stack offset for toc save slot. */ +#define RS6000_TOC_SAVE_SLOT \ + ((DEFAULT_ABI == ABI_ELFv2 ? 12 : 20) << (TARGET_64BIT ? 1 : 0)) /* Align an address */ #define RS6000_ALIGN(n,a) (((n) + (a) - 1) & ~((a) - 1)) Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -9029,6 +9029,16 @@ rs6000_function_arg_boundary (enum machi return PARM_BOUNDARY; } +/* The offset in words to the start of the parameter save area. */ + +static unsigned int +rs6000_parm_offset (void) +{ + return (DEFAULT_ABI == ABI_V4 ? 2 + : DEFAULT_ABI == ABI_ELFv2 ? 4 + : 6); +} + /* For a function parm of MODE and TYPE, return the starting word in the parameter area. NWORDS of the parameter area are already used. */ @@ -9037,11 +9047,9 @@ rs6000_parm_start (enum machine_mode mod unsigned int nwords) { unsigned int align; - unsigned int parm_offset; align = rs6000_function_arg_boundary (mode, type) / PARM_BOUNDARY - 1; - parm_offset = DEFAULT_ABI == ABI_V4 ? 2 : 6; - return nwords + (-(parm_offset + nwords) & align); + return nwords + (-(rs6000_parm_offset () + nwords) & align); } /* Compute the size (in words) of a function argument. */ @@ -9281,15 +9289,13 @@ rs6000_function_arg_advance_1 (CUMULATIV { int align; - /* Vector parameters must be 16-byte aligned. This places -them at 2 mod 4 in terms of words in 32-bit mode, since -the parameter save area starts at offset 24 from the -stack. In 64-bit mode, they just have to start on an -even word, since the parameter save area is 16-byte -aligned. Space for GPRs is reserved even if the argument -will be passed in memory. */ + /* Vector parameters must be 16-byte aligned. In 32-bit +mode this means we need to take into account the offset +to the parameter save area. In 64-bit mode, they just +have to start on an even word, since the parameter save +area is 16-byte aligned. */ if (TARGET_32BIT) - align = (2 - cum->words) & 3; + align = -(rs6000_parm_offset () + cum->words) & 3; else
[PATCH, rs6000] ELFv2 ABI 5/8: Struct return calling convention changes
Hello, analogously to the previous patch handling homogeneous aggregates as arguments, this patch handles such aggregates as return values. In addition, the ELFv2 ABI returns any aggregate of up to 16 bytes in size (that is not otherwise handled) in up to two GPRs. In either case, the return value is returned formatted exactly as if it were being passed as first argument. In order to get this correct in the big-endian ELFv2 case, we need to provide a non-default implementation of TARGET_RETURN_IN_MSB. Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand Michael Gschwind * config/rs6000/rs6000.h (FP_ARG_MAX_RETURN): New macro. (ALTIVEC_ARG_MAX_RETURN): Likewise. (FUNCTION_VALUE_REGNO_P): Use them. * config/rs6000/rs6000.c (TARGET_RETURN_IN_MSB): Define. (rs6000_return_in_msb): New function. (rs6000_return_in_memory): Handle ELFv2 homogeneous aggregates. Handle aggregates of up to 16 bytes for ELFv2. (rs6000_function_value): Handle ELFv2 homogeneous aggregates. Index: gcc/gcc/config/rs6000/rs6000.h === --- gcc.orig/gcc/config/rs6000/rs6000.h +++ gcc/gcc/config/rs6000/rs6000.h @@ -1648,6 +1648,10 @@ extern enum reg_class rs6000_constraints #define GP_ARG_RETURN GP_ARG_MIN_REG #define FP_ARG_RETURN FP_ARG_MIN_REG #define ALTIVEC_ARG_RETURN (FIRST_ALTIVEC_REGNO + 2) +#define FP_ARG_MAX_RETURN (DEFAULT_ABI != ABI_ELFv2 ? FP_ARG_RETURN\ + : (FP_ARG_RETURN + AGGR_ARG_NUM_REG - 1)) +#define ALTIVEC_ARG_MAX_RETURN (DEFAULT_ABI != ABI_ELFv2 ? ALTIVEC_ARG_RETURN \ + : (ALTIVEC_ARG_RETURN + AGGR_ARG_NUM_REG - 1)) /* Flags for the call/call_value rtl operations set up by function_arg */ #define CALL_NORMAL0x /* no special processing */ @@ -1667,8 +1671,10 @@ extern enum reg_class rs6000_constraints On RS/6000, this is r3, fp1, and v2 (for AltiVec). */ #define FUNCTION_VALUE_REGNO_P(N) \ ((N) == GP_ARG_RETURN \ - || ((N) == FP_ARG_RETURN && TARGET_HARD_FLOAT && TARGET_FPRS) \ - || ((N) == ALTIVEC_ARG_RETURN && TARGET_ALTIVEC && TARGET_ALTIVEC_ABI)) + || ((N) >= FP_ARG_RETURN && (N) <= FP_ARG_MAX_RETURN \ + && TARGET_HARD_FLOAT && TARGET_FPRS)\ + || ((N) >= ALTIVEC_ARG_RETURN && (N) <= ALTIVEC_ARG_MAX_RETURN \ + && TARGET_ALTIVEC && TARGET_ALTIVEC_ABI)) /* 1 if N is a possible register number for function argument passing. On RS/6000, these are r3-r10 and fp1-fp13. Index: gcc/gcc/config/rs6000/rs6000.c === --- gcc.orig/gcc/config/rs6000/rs6000.c +++ gcc/gcc/config/rs6000/rs6000.c @@ -1449,6 +1449,9 @@ static const struct attribute_spec rs600 #undef TARGET_RETURN_IN_MEMORY #define TARGET_RETURN_IN_MEMORY rs6000_return_in_memory +#undef TARGET_RETURN_IN_MSB +#define TARGET_RETURN_IN_MSB rs6000_return_in_msb + #undef TARGET_SETUP_INCOMING_VARARGS #define TARGET_SETUP_INCOMING_VARARGS setup_incoming_varargs @@ -8725,6 +8728,16 @@ rs6000_return_in_memory (const_tree type /* Otherwise fall through to more conventional ABI rules. */ } + /* The ELFv2 ABI returns homogeneous VFP aggregates in registers */ + if (rs6000_discover_homogeneous_aggregate (TYPE_MODE (type), type, +NULL, NULL)) +return false; + + /* The ELFv2 ABI returns aggregates up to 16B in registers */ + if (DEFAULT_ABI == ABI_ELFv2 && AGGREGATE_TYPE_P (type) + && (unsigned HOST_WIDE_INT) int_size_in_bytes (type) <= 16) +return false; + if (AGGREGATE_TYPE_P (type) && (aix_struct_return || (unsigned HOST_WIDE_INT) int_size_in_bytes (type) > 8)) @@ -8756,6 +8769,19 @@ rs6000_return_in_memory (const_tree type return false; } +/* Specify whether values returned in registers should be at the most + significant end of a register. We want aggregates returned by + value to match the way aggregates are passed to functions. */ + +static bool +rs6000_return_in_msb (const_tree valtype) +{ + return (DEFAULT_ABI == ABI_ELFv2 + && BYTES_BIG_ENDIAN + && AGGREGATE_TYPE_P (valtype) + && FUNCTION_ARG_PADDING (TYPE_MODE (valtype), valtype) == upward); +} + #ifdef HAVE_AS_GNU_ATTRIBUTE /* Return TRUE if a call to function FNDECL may be one that potentially affects the function calling ABI of the object file. */ @@ -29897,6 +29923,8 @@ rs6000_function_value (const_tree valtyp { enum machine_mode mode; unsigned int regno; + enum machine_mode elt_mode; + int n_elts; /* Special handling for structs in darwin64. */ if (TARGET_MACHO @@ -29916,6 +29944,36 @@ rs6000_function_value (const_tree valtyp /* Otherwise fall through to standar
Re: [PATCH, MPX, 2/X] Pointers Checker [10/25] Calls copy and verification
2013/11/11 Richard Biener : > On Mon, Nov 11, 2013 at 3:00 PM, Ilya Enkovich wrote: >> 2013/11/11 Richard Biener : >>> On Fri, Nov 8, 2013 at 11:03 AM, Ilya Enkovich >>> wrote: 2013/11/8 Richard Biener : > On Thu, Nov 7, 2013 at 7:55 PM, Jeff Law wrote: >> On 11/07/13 04:50, Ilya Enkovich wrote: >>> >>> Hi, >>> >>> Here is an updated patch version. >> >> I think this needs to hold until we have a consensus on what the >> parameter >> passing looks like for bounded pointers. > > I still think the best thing to do on GIMPLE is > > arg_2 = __builtin_ia32_bnd_arg (arg_1(D)); > foo (arg_2); > > that is, make the parameter an implicit pair of {value, bound} where > the bound is determined by the value going through a bound association > builtin. No extra explicit argument to the calls so arguments match > the fndecl and fntype. All the complexity is defered to the expander > which can trivially lookup bound arguments via the SSA def (I suppose > it does that anyway now for getting at the explicit bound argument now). > > As far as I can see (well, think), all currently passed bound arguments > are the return value of such builtin already. All bounds are result of different builtin calls. Small example: int *global_p; void foo (int *p) { int buf[10]; bar (p, buf, global_p); } It is translated into: __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40); __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab)); global_p.0_2 = global_p; __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2); bar (p_3(D)(ab), __bound_tmp.1_6, &buf, __bound_tmp.1_7, global_p.0_2, __bound_tmp.1_8); Bounds binding via calls as you suggest may be done as following: __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40); __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab)); global_p.0_2 = global_p; __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2); _9 = __builtin_bind_bounds (p_3(D)(ab), __bound_tmp.1_6); _10 = __builtin_bind_bounds (&buf, __bound_tmp.1_7); _11 = __builtin_bind_bounds (global_p.0_2, __bound_tmp.1_8); bar (_9, _10, _11); Is it close to what you propose? >>> >>> Yes. >> >> Is there a way to bind bounds with structures in a similar way? > > Not to have them easy to lookup in the SSA web. A long time ago > I proposed to make SSA aggregates possible, so you could do > > tem_2 = __internal_bind_bounds (aggr(D), __bound_tmp.1_3, > __bound_tmp.1_4, ...); > bar (tem_2); > > (originally the SSA aggregates were supposed to make copy-propgagation > possible using the SSA copy propagator, and of course I needed it for > the middle-end array work) > > Not sure if that will give other people the creeps (expand would > never expand the "load" from tem_2 but instead handle aggr as > parameter). A slight complication is due to alias analysis > which would be need to taught that bar performs a load of aggr. It would require bounds loading for aggr before __internal_bind_bounds anyway. So, why not to do it in expand? I just need to mark calls with a flag (which you've proposed few times already) to let expand know when it should load bounds. Having SSA aggregates would be nice but I suspect it has much higher impact then loading bounds in expand. I want to try simpler variant first. Thanks, Ilya > > Richard. > >> For >> SSA names I may easily find definition and check if it is a binding >> builtin call. But for structures it is not possible. The way I see it >> to mark all such args as addressable and load required bounds on >> expand pass. >> >> Ilya >>> >>> Richard. >>> Ilya > > Richard. > > > >> Jeff
[PATCH, rs6000] ELFv2 ABI 6/8: Eliminate register save area in some cases
Hello, the final area of changes implemented by the ELFv2 ABI is stack space consumption. Reducing the miminum stack size requirements is important for environments where stack space is restricted, e.g. Linux kernel code, or where we have a large number of stacks (e.g. heavily multi-threaded applications). One reason for the large stack space requirement of the current ABI is that every caller must provide an argument save area of 64 bytes to its caller. This is actually useful in one specific case: if the called function uses variable arguments and constructs a va_list. Because of the way the ABI is designed, the callee can in this case simply store the 8 GPRs (potentially) carrying arguments into that save area, and is then guaranteed to have all function arguments in a linear arrangement on the stack, which makes a trivial va_arg implementation possible. If the callee is not vararg, this wouldn't really be necessary. However, if we were to skip that area then, vararg and non-vararg routines would have incompatible ABIs, which would make it impossible to generate code for a function call if the caller does not have a prototype of the called function. Therefore the old ABI requires the save area to be present always. With the new ABI, we wanted to retain the possibility of using a trivial va_arg implementation, and also the possibility of calling non-prototyped routines safely. However, even so, there is a set of cases where it is still not necessary to provide a save area: when calling a function we have a prototype for and know that it is neither vararg nor uses on-stack arguments. This patch implement this change by having REG_PARM_STACK_SPACE return a different value depending on whether the called routine requires on-stack arguments (and we have a prototype). There was one problem exposed by this change: rs6000_function_arg contained this piece of code: if (mode == BLKmode) mode = Pmode; which marked the mode of a GPR holding an incoming struct argument as Pmode. This is now causing a problem. When REG_PARM_STACK_SPACE returns 0, function.c must allocate a stack slot within the callee's frame to serve as DECL_RTL for the parameter. This is done in assign_parm_setup_block. This routine attempts to use the mode of the incoming register as the mode of that stack slot, if the sizes match. This means that the DECL_RTL for an argument of struct type may now have mode Pmode, even if that type cannot reliably be operated on in Pmode, which causes ICEs in several test cases for me. Note that when REG_PARM_STACK_SPACE is non-zero, this problem does not occur, because in this case, function.c allocates the stack slot for the parameter's DECL_RTL in the register save area, using a different function assign_parm_find_stack_rtl. *That* function will keep the mode as BLKmode even if the incoming register is in another mode. Removing the above two lines from rs6000_function_arg fixes this problem, and does not appear to introduce any problem for the ELFv1 ABI case either (full bootstrap / regtest still passes). Looking back at the ChangeLog, those two lines where added by Alan Modra back in 2004: http://gcc.gnu.org/ml/gcc/2004-11/msg00617.html to fix ICEs after a change by Richard Henderson: http://gcc.gnu.org/ml/gcc/2004-11/msg00569.html but Richard shortly afterwards reverted that change again since it caused problems elsewhere too: http://gcc.gnu.org/ml/gcc/2004-11/msg00751.html So all in all, it seems this change by Alan was simply not necessary and can be reverted. Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand Alan Modra * config/rs6000/rs6000-protos.h (rs6000_reg_parm_stack_space): Add prototype. * config/rs6000/rs6000.h (RS6000_REG_SAVE): Remove. (REG_PARM_STACK_SPACE): Call rs6000_reg_parm_stack_space. * config/rs6000/rs6000.c (rs6000_parm_needs_stack): New function. (rs6000_function_parms_need_stack): Likewise. (rs6000_reg_parm_stack_space): Likewise. (rs6000_function_arg): Do not replace BLKmode by Pmode when returning a register argument. Index: gcc/gcc/config/rs6000/rs6000-protos.h === --- gcc.orig/gcc/config/rs6000/rs6000-protos.h +++ gcc/gcc/config/rs6000/rs6000-protos.h @@ -158,6 +158,7 @@ extern tree altivec_resolve_overloaded_b extern rtx rs6000_libcall_value (enum machine_mode); extern rtx rs6000_va_arg (tree, tree); extern int function_ok_for_sibcall (tree); +extern int rs6000_reg_parm_stack_space (tree); extern void rs6000_elf_declare_function_name (FILE *, const char *, tree); extern bool rs6000_elf_in_small_data_p (const_tree); #ifdef ARGS_SIZE_RTX Index: gcc/gcc/config/rs6000/rs6000.h === --- gcc.orig/gcc/config/rs6000/rs6000.h +++ gcc/gcc/config/rs6000/rs6000.h @@ -1527,10 +1527,6 @@ extern enum reg_class rs6000_cons
[PATCH, rs6000] ELFv2 ABI 8/8: Enable by default on little-endian
Hello, this patch finally throws the switch and enables the ELFv2 ABI by default on powerpc64le-linux. Bye, Ulrich ChangeLog: 2013-11-11 Ulrich Weigand * config/rs6000/sysv4le.h (LINUX64_DEFAULT_ABI_ELFv2): Define. Index: gcc/gcc/config/rs6000/sysv4le.h === --- gcc.orig/gcc/config/rs6000/sysv4le.h +++ gcc/gcc/config/rs6000/sysv4le.h @@ -34,3 +34,7 @@ #undef MULTILIB_DEFAULTS #defineMULTILIB_DEFAULTS { "mlittle", "mcall-sysv" } + +/* Little-endian PowerPC64 Linux uses the ELF v2 ABI by default. */ +#define LINUX64_DEFAULT_ABI_ELFv2 + -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] Time profiler - phase 1
> +2013-10-29 Martin Liska > + Jan Hubicka > + > + * cgraph.c (dump_cgraph_node): Profile dump added. > + * cgraph.h (struct cgraph_node): New time profile variable added. > + * cgraphclones.c (cgraph_clone_node): Time profile is cloned. > + * gcov-io.h (gcov_type): New profiler type introduced. > + * ipa-profile.c (lto_output_node): Streaming for time profile added. > + (input_node): Time profiler is read from LTO stream. > + * predict.c (maybe_hot_count_p): Hot prediction changed. > + * profile.c (instrument_values): New case for time profiler added. > + (compute_value_histograms): Read of time profile. > + * tree-pretty-print.c (dump_function_header): Time profiler is dumped. > + * tree-profile.c (init_ic_make_global_vars): Time profiler function > added. > + (gimple_init_edge_profiler): TP function instrumentation. > + (gimple_gen_time_profiler): New. > + * value-prof.c (gimple_add_histogram_value): Support for time profiler > + added. > + (dump_histogram_value): TP type added to dumps. > + (visit_hist): More sensitive check that takes TP into account. > + (gimple_find_values_to_profile): TP instrumentation. > + * value-prof.h (hist_type): New histogram type added. > + (struct histogram_value_t): Pointer to struct function added. > + * libgcc/Makefile.in: New GCOV merge function for TP added. > + * libgcov.c: function_counter variable introduced. > + (_gcov_merge_time_profile): New. > + (_gcov_time_profiler): New. > + > 2013-11-05 Steven Bosscher > > > diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c > index 1260069..eff4b51 100644 > --- a/gcc/ipa-profile.c > +++ b/gcc/ipa-profile.c > @@ -465,6 +465,7 @@ ipa_propagate_frequency (struct cgraph_node *node) >if (d.maybe_unlikely_executed) > { >node->frequency = NODE_FREQUENCY_UNLIKELY_EXECUTED; > + node->tp_first_run = 0; >if (dump_file) > fprintf (dump_file, "Node %s promoted to unlikely executed.\n", >cgraph_node_name (node)); Why do you put tp_first_run to 0? The function may be unlikely but it still may have average possition in the program execution. I.e. when it is executed once per many invocations during the train run) > @@ -895,9 +907,22 @@ compute_value_histograms (histogram_values values, > unsigned cfg_checksum, >hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters); >for (j = 0; j < hist->n_counters; j++) > if (aact_count) > - hist->hvalue.counters[j] = aact_count[j]; > - else > - hist->hvalue.counters[j] = 0; > + hist->hvalue.counters[j] = aact_count[j]; > +else > + hist->hvalue.counters[j] = 0; > + > + /* Time profiler counter is not related to any statement, > + * so that we have to read the counter and set the value to > + * the corresponding call graph node. */ Reformat comment to GNU style. > +2013-10-28 Martin Liska > + > + * gcc.dg/time-profiler-1.c: New test. > + * gcc.dg/time-profiler-2.c: Ditto. > + It is custom to put all changelogs to the beggining of your patch (just nitpicking ;)) > @@ -222,6 +225,18 @@ gimple_init_edge_profiler (void) > = tree_cons (get_identifier ("leaf"), NULL, >DECL_ATTRIBUTES (tree_indirect_call_profiler_fn)); > > + /* void (*) (gcov_type *, gcov_type, void *) */ > + time_profiler_fn_type > += build_function_type_list (void_type_node, > + gcov_type_ptr, NULL_TREE); > + tree_time_profiler_fn > + = build_fn_decl ("__gcov_time_profiler", > + time_profiler_fn_type); > + TREE_NOTHROW (tree_time_profiler_fn) = 1; > + DECL_ATTRIBUTES (tree_time_profiler_fn) > + = tree_cons (get_identifier ("leaf"), NULL, > + DECL_ATTRIBUTES (tree_time_profiler_fn)); We should udpate this code to use set_call_expr_flags but by incremental patch. The patch is OK with changes above. Do you have write permissin to commit it? If not, just send updated version and I will commit it + follow the write access instructions on gcc.gnu.org homepage. Honza
Re: [RFC] libgcov.c re-factoring and offline profile-tool
> 2013-11-04 Rong Xu > > * libgcc/libgcov.c: Delete as part of re-factoring. > * libgcc/libgcov-profiler.c (__gcov_interval_profiler): Moved from > libgcov.c > (__gcov_pow2_profiler): Ditto. > (__gcov_one_value_profiler_body): Ditto. > (__gcov_one_value_profiler): Ditto. > (__gcov_indirect_call_profiler): Ditto. > (__gcov_indirect_call_profiler_v2): Ditto. > (__gcov_average_profiler): Ditto. > (__gcov_ior_profiler): Ditto. > * libgcc/libgcov-driver.c (this_prg): Make it file scope > static variable. > (all_prg): Ditto. > (crc32): Ditto. > (gi_filename): Ditto. > (fn_buffer): Ditto. > (sum_buffer): Ditto. > (struct gcov_filename_aux): New types to store auxiliary information > for gi_filename. > (gcov_version): Moved from libgcov.c. > (crc32_unsigned): Ditto. > (gcov_histogram_insert): Ditto. > (gcov_compute_histogram): Ditto. > (gcov_exit): Ditto. > (gcov_clear): Ditto. > (__gcov_init): Ditto. > (gcov_exit_compute_summary): New function split from gcov_exit(). > (gcov_exit_merge_gcda): Ditto. > (gcov_exit_write_gcda): Ditto. > (gcov_exit_dump_gcov): Ditto. > * libgcc/libgcov-interface.c (init_mx): Moved from libgcov.c. > (init_mx_once): Ditto. > (__gcov_flush): Ditto. > (__gcov_reset): Ditto. > (__gcov_dump): Ditto. > (__gcov_fork): Ditto. > (__gcov_execl): Ditto. > (__gcov_execlp): Ditto. > (__gcov_execle): Ditto. > (__gcov_execv): Ditto. > (__gcov_execvp): Ditto. > (__gcov_execve): Ditto. > * libgcc/libgcov-driver-system.c (gcov_error): New utility function. > (allocate_filename_struct): New function split from gcov_exit(). > (gcov_exit_open_gcda_file): Ditto. > (create_file_directory): Moved from libgcov.c. > * libgcc/libgcov-merge.c: > (__gcov_merge_add): Moved from libgcov.c. > (__gcov_merge_ior): Ditto. > (__gcov_merge_single): Ditto. > (__gcov_merge_delta): Ditto. > * libgcc/Makefile.in: Change to build newly added files. > * gcc/gcov-io.h (__gcov_merge_ior): Add the decl to avoid warning. > Hi, the patch looks resonable. I take your word on the fact that there are no changes in the code, I did not read it all ;))) > Index: libgcc/Makefile.in > === > --- libgcc/Makefile.in(revision 204285) > +++ libgcc/Makefile.in(working copy) > @@ -853,17 +853,37 @@ include $(iterator) > # Build libgcov components. > > # Defined in libgcov.c, included only in gcov library > -LIBGCOV = _gcov _gcov_merge_add _gcov_merge_single _gcov_merge_delta \ > -_gcov_fork _gcov_execl _gcov_execlp _gcov_execle \ > -_gcov_execv _gcov_execvp _gcov_execve _gcov_reset _gcov_dump \ > -_gcov_interval_profiler _gcov_pow2_profiler _gcov_one_value_profiler \ > +##LIBGCOV = _gcov _gcov_merge_add _gcov_merge_single _gcov_merge_delta \ > +##_gcov_fork _gcov_execl _gcov_execlp _gcov_execle \ > +##_gcov_execv _gcov_execvp _gcov_execve _gcov_reset _gcov_dump \ > +##_gcov_interval_profiler _gcov_pow2_profiler _gcov_one_value_profiler \ > +##_gcov_indirect_call_profiler _gcov_average_profiler _gcov_ior_profiler > \ > +##_gcov_merge_ior _gcov_indirect_call_profiler_v2 Probably no need for this commnet block. > + > +LIBGCOV_MERGE = _gcov_merge_add _gcov_merge_single _gcov_merge_delta > _gcov_merge_ior > +LIBGCOV_PROFILER = _gcov_interval_profiler _gcov_pow2_profiler > _gcov_one_value_profiler \ > _gcov_indirect_call_profiler _gcov_average_profiler _gcov_ior_profiler \ > -_gcov_merge_ior _gcov_indirect_call_profiler_v2 > +_gcov_indirect_call_profiler_v2 > +LIBGCOV_INTERFACE = _gcov_flush _gcov_fork _gcov_execl _gcov_execlp > _gcov_execle \ > +_gcov_execv _gcov_execvp _gcov_execve _gcov_reset _gcov_dump > +LIBGCOV_DRIVER = _gcov > > -libgcov-objects = $(patsubst %,%$(objext),$(LIBGCOV)) > +libgcov-merge-objects = $(patsubst %,%$(objext),$(LIBGCOV_MERGE)) > +libgcov-profiler-objects = $(patsubst %,%$(objext),$(LIBGCOV_PROFILER)) > +libgcov-interface-objects = $(patsubst %,%$(objext),$(LIBGCOV_INTERFACE)) > +libgcov-driver-objects = $(patsubst %,%$(objext),$(LIBGCOV_DRIVER)) > +libgcov-objects = $(libgcov-merge-objects) $(libgcov-profiler-objects) \ > + $(libgcov-interface-objects) $(libgcov-driver-objects) > > -$(libgcov-objects): %$(objext): $(srcdir)/libgcov.c > - $(gcc_compile) -DL$* -c $(srcdir)/libgcov.c > +$(libgcov-merge-objects): %$(objext): $(srcdir)/libgcov-merge.c > + $(gcc_compile) -DL$* -c $(srcdir)/libgcov-merge.c > +$(libgcov-profiler-objects): %$(objext): $(srcdir)/libgcov-profiler.c > + $(gcc_compile) -DL$* -c $(srcdir)/libgcov-profiler.c > +$(libgcov-interface-objects): %$(objext): $(srcdir)/libgcov-interface.c > + $(gcc_compile) -DL$*
Re: Some wide-int review comments
On 11/11/2013 09:42 AM, Richard Biener wrote: On Mon, Nov 11, 2013 at 3:26 PM, Kenneth Zadeck wrote: On 11/11/2013 06:49 AM, Richard Biener wrote: On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck wrote: On 11/08/2013 05:30 AM, Richard Sandiford wrote: From tree-vrp.c: @@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code /* If the singed operation wraps then int_const_binop has done everything we want. */ ; + /* Signed division of -1/0 overflows and by the time it gets here + returns NULL_TREE. */ + else if (!res) +return NULL_TREE; else if ((TREE_OVERFLOW (res) && !TREE_OVERFLOW (val1) && !TREE_OVERFLOW (val2)) Why is this case different from trunk? Or is it a bug-fix independent of wide-int? the api for division is different for wide-int than it was for double-int. But this is using a tree API (int_const_binop) that didn't change (it returned NULL for / 0 previously). So what makes us arrive here now? (I agree there is a bug in VRP, but it shouldn't manifest itself only on wide-int) Richard. My reading of the code is that is that i changed int_const_binop to return null_tree for this case. Trunk has: case TRUNC_DIV_EXPR: case FLOOR_DIV_EXPR: case CEIL_DIV_EXPR: case EXACT_DIV_EXPR: /* This is a shortcut for a common special case. */ if (op2.high == 0 && (HOST_WIDE_INT) op2.low > 0 && !TREE_OVERFLOW (arg1) && !TREE_OVERFLOW (arg2) && op1.high == 0 && (HOST_WIDE_INT) op1.low >= 0) { if (code == CEIL_DIV_EXPR) op1.low += op2.low - 1; res.low = op1.low / op2.low, res.high = 0; break; } /* ... fall through ... */ case ROUND_DIV_EXPR: if (op2.is_zero ()) return NULL_TREE; so it already returns NULL_TREE on divide by zero. I found the reason This is one of the many "tree-vrp was not properly tested for TImode bugs." on the trunk, the case 0/(smallest negative number) case will only trigger overflow in TImode. On the wide-int branch, tree-vrp works at the precision of the operands so overflow is triggered properly for this case.So for HImode, the trunk produces the a result for 0/0x80 and then force_fit code at the bottom of int_const_binop_1 turns this into an overflow tree value rather than a null tree. on the wide-int branch, this case causes the overflow bit to be returned from the wide-int divide because the overflow case is properly handled for all types and that overflow is turned into null_tree by the wide-int version of int_const_binop_1. apparently there are no test cases that exercise the true divide by 0 case but there are test cases that hit the 0/ largest negative number case for modes smaller than TImode. Kenny On the trunk, only rem returns null_tree for divide by 0, on the wide int branch, both div and rem return null tree. I know that this is going to bring on a string of questions that i do not remember the answers to as to why i made that change. but fold-const.c:int_const_binop_1 now returns null_tree and this is just fallout from that change. Kenny Thanks, Richard
Re: [PATCH, MPX, 2/X] Pointers Checker [10/25] Calls copy and verification
On Mon, Nov 11, 2013 at 3:45 PM, Ilya Enkovich wrote: > 2013/11/11 Richard Biener : >> On Mon, Nov 11, 2013 at 3:00 PM, Ilya Enkovich >> wrote: >>> 2013/11/11 Richard Biener : On Fri, Nov 8, 2013 at 11:03 AM, Ilya Enkovich wrote: > 2013/11/8 Richard Biener : >> On Thu, Nov 7, 2013 at 7:55 PM, Jeff Law wrote: >>> On 11/07/13 04:50, Ilya Enkovich wrote: Hi, Here is an updated patch version. >>> >>> I think this needs to hold until we have a consensus on what the >>> parameter >>> passing looks like for bounded pointers. >> >> I still think the best thing to do on GIMPLE is >> >> arg_2 = __builtin_ia32_bnd_arg (arg_1(D)); >> foo (arg_2); >> >> that is, make the parameter an implicit pair of {value, bound} where >> the bound is determined by the value going through a bound association >> builtin. No extra explicit argument to the calls so arguments match >> the fndecl and fntype. All the complexity is defered to the expander >> which can trivially lookup bound arguments via the SSA def (I suppose >> it does that anyway now for getting at the explicit bound argument now). >> >> As far as I can see (well, think), all currently passed bound arguments >> are the return value of such builtin already. > > All bounds are result of different builtin calls. Small example: > > int *global_p; > void foo (int *p) > { > int buf[10]; > bar (p, buf, global_p); > } > > > It is translated into: > > __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40); > __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab)); > global_p.0_2 = global_p; > __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2); > bar (p_3(D)(ab), __bound_tmp.1_6, &buf, __bound_tmp.1_7, > global_p.0_2, __bound_tmp.1_8); > > Bounds binding via calls as you suggest may be done as following: > > __bound_tmp.1_7 = __builtin_ia32_bndmk (&buf, 40); > __bound_tmp.1_6 = __builtin_ia32_arg_bnd (p_3(D)(ab)); > global_p.0_2 = global_p; > __bound_tmp.1_8 = __builtin_ia32_bndldx (&global_p, global_p.0_2); > _9 = __builtin_bind_bounds (p_3(D)(ab), __bound_tmp.1_6); > _10 = __builtin_bind_bounds (&buf, __bound_tmp.1_7); > _11 = __builtin_bind_bounds (global_p.0_2, __bound_tmp.1_8); > bar (_9, _10, _11); > > Is it close to what you propose? Yes. >>> >>> Is there a way to bind bounds with structures in a similar way? >> >> Not to have them easy to lookup in the SSA web. A long time ago >> I proposed to make SSA aggregates possible, so you could do >> >> tem_2 = __internal_bind_bounds (aggr(D), __bound_tmp.1_3, >> __bound_tmp.1_4, ...); >> bar (tem_2); >> >> (originally the SSA aggregates were supposed to make copy-propgagation >> possible using the SSA copy propagator, and of course I needed it for >> the middle-end array work) >> >> Not sure if that will give other people the creeps (expand would >> never expand the "load" from tem_2 but instead handle aggr as >> parameter). A slight complication is due to alias analysis >> which would be need to taught that bar performs a load of aggr. > > It would require bounds loading for aggr before __internal_bind_bounds > anyway. So, why not to do it in expand? Ah, ok I wasn't aware of that. > I just need to mark calls > with a flag (which you've proposed few times already) to let expand > know when it should load bounds. Having SSA aggregates would be nice > but I suspect it has much higher impact then loading bounds in expand. > I want to try simpler variant first. Good. Richard. > Thanks, > Ilya > >> >> Richard. >> >>> For >>> SSA names I may easily find definition and check if it is a binding >>> builtin call. But for structures it is not possible. The way I see it >>> to mark all such args as addressable and load required bounds on >>> expand pass. >>> >>> Ilya Richard. > Ilya >> >> Richard. >> >> >> >>> Jeff
Recent Go patch broke CentOS 5.10 bootstrap
Hello! Building latest gcc+go on CentOS 5.10 breaks with following build failure: ../../../gcc-svn/trunk/libgo/go/net/fd_unix.go:414:72: error: reference to undefined identifier ‘syscall.F_DUPFD_CLOEXEC’ r0, _, e1 := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_DUPFD_CLOEXEC, 0) ^ gmake[4]: *** [net.lo] Error 1 gmake[4]: Leaving directory `/home/uros/gcc-build/x86_64-unknown-linux-gnu/libgo' gmake[3]: *** [all-recursive] Error 1 gmake[3]: Leaving directory `/home/uros/gcc-build/x86_64-unknown-linux-gnu/libgo' gmake[2]: *** [all] Error 2 gmake[2]: Leaving directory `/home/uros/gcc-build/x86_64-unknown-linux-gnu/libgo' gmake[1]: *** [all-target-libgo] Error 2 gmake[1]: Leaving directory `/home/uros/gcc-build' gmake: *** [all] Error 2 Uros.
Re: [RFC] libgcov.c re-factoring and offline profile-tool
> Here is the patch that includes profile-tool. > Profile-tool now has two functions: merge and rewrite. I'll add diff later. > > Compiler is tested with spec2006 and profiledbootstrap. > profile-tool is tested with spec2006 profiles. Hi, it would be nice if you could elaborate bit more on the tool and what it does. I plan to implement relink only instrumentatino for LTO and for that I will need profile reading/handling that is independent on functio nbodies, so perhaps we can unify the infrastructure. My plans always was to merge the gcov and profile.c profile handling logic and make it bit more generic. Also I think profile-tool is a bit generic name. I.e. it does not realy say it is related to GCC's coverage profiling. Maybe gcc-profile-tool or gcov-profile-tool may be better? Perhaps an resonable option would also be to bundle it into gcov binary... > Index: libgcc/libgcov-tool.c > === > --- libgcc/libgcov-tool.c (revision 0) > +++ libgcc/libgcov-tool.c (revision 0) > @@ -0,0 +1,800 @@ > +/* GCC instrumentation plugin for ThreadSanitizer. > + Copyright (C) 2011-2013 Free Software Foundation, Inc. > + Contributed by Dmitry Vyukov You need to update this... Why there needs to be libgcov changes? Honza
Re: [PATCH] decide edge's hotness when there is profile info
> 2013-11-08 Teresa Johnson > Jan Hubicka > > * predict.c (drop_profile): New function. > (handle_missing_profiles): Ditto. > (counts_to_freqs): Don't overwrite estimated frequencies > when function has no profile counts. > * predict.h (handle_missing_profiles): Declare. > * tree-inline.c (freqs_to_counts): New function. > (copy_cfg_body): Invoke freqs_to_counts as needed. > * tree-profile.c (tree_profiling): Invoke handle_missing_profiles. > > Index: predict.c > === > --- predict.c (revision 204516) > +++ predict.c (working copy) > @@ -2765,6 +2765,107 @@ estimate_loops (void) >BITMAP_FREE (tovisit); > } > > +/* Drop the profile for NODE to guessed, and update its frequency based on > + whether it is expected to be HOT. */ > + > +static void > +drop_profile (struct cgraph_node *node, bool hot) > +{ > + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); > + > + if (dump_file) > +fprintf (dump_file, > + "Dropping 0 profile for %s/%i. %s based on calls.\n", > + cgraph_node_name (node), node->order, > + hot ? "Function is hot" : "Function is normal"); > + /* We only expect to miss profiles for functions that are reached > + via non-zero call edges in cases where the function may have > + been linked from another module or library (COMDATs and extern > + templates). See the comments below for handle_missing_profiles. */ > + if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl)) > +warning (0, > + "Missing counts for called function %s/%i", > + cgraph_node_name (node), node->order); > + > + profile_status_for_function (fn) > + = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT); > + node->frequency > + = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL; > +} > + > +/* In the case of COMDAT routines, multiple object files will contain the > same > + function and the linker will select one for the binary. In that case > + all the other copies from the profile instrument binary will be missing > + profile counts. Look for cases where this happened, due to non-zero > + call counts going to 0-count functions, and drop the profile to guessed > + so that we can use the estimated probabilities and avoid optimizing only > + for size. > + > + The other case where the profile may be missing is when the routine > + is not going to be emitted to the object file, e.g. for "extern template" > + class methods. Those will be marked DECL_EXTERNAL. Emit a warning in > + all other cases of non-zero calls to 0-count functions. */ > + > +void > +handle_missing_profiles (void) > +{ > + struct cgraph_node *node; > + int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION); > + vec worklist; > + worklist.create (64); > + > + /* See if 0 count function has non-0 count callers. In this case we > + lost some profile. Drop its function profile to PROFILE_GUESSED. */ > + FOR_EACH_DEFINED_FUNCTION (node) > +{ > + struct cgraph_edge *e; > + gcov_type call_count = 0; > + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); > + > + if (node->count) > +continue; > + for (e = node->callers; e; e = e->next_caller) > +call_count += e->count; > + if (call_count > + && fn && fn->cfg > + && (call_count * unlikely_count_fraction >= profile_info->runs)) > +{ > + bool maybe_hot = maybe_hot_count_p (NULL, call_count); > + > + drop_profile (node, maybe_hot); > + worklist.safe_push (node); > +} Perhaps we should add an note/error on mishandled profile when function is not COMDAT? That way we may notice further bugs in this area. > +} > + > + /* Propagate the profile dropping to other 0-count COMDATs that are > + potentially called by COMDATs we already dropped the profile on. */ > + while (worklist.length () > 0) > +{ > + struct cgraph_edge *e; > + > + node = worklist.pop (); > + for (e = node->callees; e; e = e->next_caller) > +{ > + struct cgraph_node *callee = e->callee; > + struct function *fn = DECL_STRUCT_FUNCTION (callee->decl); > + > + if (callee->count > 0) > +continue; > + if (DECL_COMDAT (callee->decl) && fn && fn->cfg > + && profile_status_for_function (fn) == PROFILE_READ) Perhaps we can check here maybe_hot_bb_p for the caller and rely on profile estimate to give us false only for really known to be unlikely paths? (i.e. EH handling, noreturns etc.) > +{ > + /* Since there are no non-0 call counts to this function, > + we don't know for sure whether it is hot. Indicate to > + the drop_profile routine that function should be marked > +
Re: Some wide-int review comments
On Mon, Nov 11, 2013 at 4:04 PM, Kenneth Zadeck wrote: > On 11/11/2013 09:42 AM, Richard Biener wrote: >> >> On Mon, Nov 11, 2013 at 3:26 PM, Kenneth Zadeck >> wrote: >>> >>> On 11/11/2013 06:49 AM, Richard Biener wrote: On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck wrote: > > On 11/08/2013 05:30 AM, Richard Sandiford wrote: >> >>From tree-vrp.c: >> >> @@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code >> /* If the singed operation wraps then int_const_binop has done >>everything we want. */ >> ; >> + /* Signed division of -1/0 overflows and by the time it gets here >> + returns NULL_TREE. */ >> + else if (!res) >> +return NULL_TREE; >> else if ((TREE_OVERFLOW (res) >> && !TREE_OVERFLOW (val1) >> && !TREE_OVERFLOW (val2)) >> >> Why is this case different from trunk? Or is it a bug-fix independent >> of wide-int? > > the api for division is different for wide-int than it was for > double-int. But this is using a tree API (int_const_binop) that didn't change (it returned NULL for / 0 previously). So what makes us arrive here now? (I agree there is a bug in VRP, but it shouldn't manifest itself only on wide-int) Richard. >>> >>> My reading of the code is that is that i changed int_const_binop to >>> return >>> null_tree for this case. >> >> Trunk has: >> >> case TRUNC_DIV_EXPR: >> case FLOOR_DIV_EXPR: case CEIL_DIV_EXPR: >> case EXACT_DIV_EXPR: >>/* This is a shortcut for a common special case. */ >>if (op2.high == 0 && (HOST_WIDE_INT) op2.low > 0 >>&& !TREE_OVERFLOW (arg1) >>&& !TREE_OVERFLOW (arg2) >>&& op1.high == 0 && (HOST_WIDE_INT) op1.low >= 0) >> { >>if (code == CEIL_DIV_EXPR) >> op1.low += op2.low - 1; >> >>res.low = op1.low / op2.low, res.high = 0; >>break; >> } >> >>/* ... fall through ... */ >> >> case ROUND_DIV_EXPR: >>if (op2.is_zero ()) >> return NULL_TREE; >> >> so it already returns NULL_TREE on divide by zero. > > I found the reason This is one of the many "tree-vrp was not properly > tested for TImode bugs." > > on the trunk, the case 0/(smallest negative number) case will only trigger > overflow in TImode. On the wide-int branch, tree-vrp works at the > precision of the operands so overflow is triggered properly for this case. > So for HImode, the trunk produces the a result for 0/0x80 and then force_fit > code at the bottom of int_const_binop_1 turns this into an overflow tree > value rather than a null tree. > > on the wide-int branch, this case causes the overflow bit to be returned > from the wide-int divide because the overflow case is properly handled for > all types and that overflow is turned into null_tree by the wide-int version > of int_const_binop_1. > > apparently there are no test cases that exercise the true divide by 0 case > but there are test cases that hit the 0/ largest negative number case for > modes smaller than TImode. You probably mean / -1? I don't see where NULL_TREE is returned for any overflow case in int_const_binop_1. Ah, you made it so. That looks like a bogus change. What's the reason? int_const_binop_1 is supposed to return a value with TREE_OVERFLOW set in these cases, also required for frontend constant folding. Try compiling const int i = (-__INT_MAX__ - 1) / -1; which should say > ./cc1 -quiet t.c t.c:1:34: warning: integer overflow in expression [-Woverflow] const int i = (-__INT_MAX__ - 1) / -1; ^ but not error or ICE. Seems to work on the branch, but only because the expression is still folded by 12357 /* X / -1 is -X. */ 12358 if (!TYPE_UNSIGNED (type) 12359 && TREE_CODE (arg1) == INTEGER_CST 12360 && wi::eq_p (arg1, -1)) 12361 return fold_convert_loc (loc, type, negate_expr (arg0)); Thus case TRUNC_DIV_EXPR: case EXACT_DIV_EXPR: res = wi::div_trunc (arg1, arg2, sign, &overflow); if (overflow) return NULL_TREE; break; should retain the arg2 == 0 checks (and return NULL_TREE) but otherwise keep overflow handling the same. Richard. > Kenny > >> >>> On the trunk, only rem returns null_tree for divide by 0, on the wide int >>> branch, both div and rem return null tree. >>> >>> I know that this is going to bring on a string of questions that i do not >>> remember the answers to as to why i made that change. but >>> fold-const.c:int_const_binop_1 now returns null_tree and this is just >>> fallout from that change. >>> >>> Kenny >> >> Thanks, >> Richard > > >
Re: Some wide-int review comments
On 11/11/2013 10:04 AM, Kenneth Zadeck wrote: On 11/11/2013 09:42 AM, Richard Biener wrote: On Mon, Nov 11, 2013 at 3:26 PM, Kenneth Zadeck wrote: On 11/11/2013 06:49 AM, Richard Biener wrote: On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck wrote: On 11/08/2013 05:30 AM, Richard Sandiford wrote: From tree-vrp.c: @@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code /* If the singed operation wraps then int_const_binop has done everything we want. */ ; + /* Signed division of -1/0 overflows and by the time it gets here + returns NULL_TREE. */ + else if (!res) +return NULL_TREE; else if ((TREE_OVERFLOW (res) && !TREE_OVERFLOW (val1) && !TREE_OVERFLOW (val2)) Why is this case different from trunk? Or is it a bug-fix independent of wide-int? the api for division is different for wide-int than it was for double-int. But this is using a tree API (int_const_binop) that didn't change (it returned NULL for / 0 previously). So what makes us arrive here now? (I agree there is a bug in VRP, but it shouldn't manifest itself only on wide-int) Richard. My reading of the code is that is that i changed int_const_binop to return null_tree for this case. Trunk has: case TRUNC_DIV_EXPR: case FLOOR_DIV_EXPR: case CEIL_DIV_EXPR: case EXACT_DIV_EXPR: /* This is a shortcut for a common special case. */ if (op2.high == 0 && (HOST_WIDE_INT) op2.low > 0 && !TREE_OVERFLOW (arg1) && !TREE_OVERFLOW (arg2) && op1.high == 0 && (HOST_WIDE_INT) op1.low >= 0) { if (code == CEIL_DIV_EXPR) op1.low += op2.low - 1; res.low = op1.low / op2.low, res.high = 0; break; } /* ... fall through ... */ case ROUND_DIV_EXPR: if (op2.is_zero ()) return NULL_TREE; so it already returns NULL_TREE on divide by zero. I found the reason This is one of the many "tree-vrp was not properly tested for TImode bugs." it is an overstatement to say that this was a timode undertesting issue. I made the change to tree-vrp because of wide-int's treating of all divide/rem overflows in a uniform manner AND the fact that there was a untested divide by zero hole in tree-vrp. I most likely never saw a true divide by zero pass into that section of code. on the trunk, the case 0/(smallest negative number) case will only trigger overflow in TImode. On the wide-int branch, tree-vrp works at the precision of the operands so overflow is triggered properly for this case.So for HImode, the trunk produces the a result for 0/0x80 and then force_fit code at the bottom of int_const_binop_1 turns this into an overflow tree value rather than a null tree. on the wide-int branch, this case causes the overflow bit to be returned from the wide-int divide because the overflow case is properly handled for all types and that overflow is turned into null_tree by the wide-int version of int_const_binop_1. apparently there are no test cases that exercise the true divide by 0 case but there are test cases that hit the 0/ largest negative number case for modes smaller than TImode. Kenny On the trunk, only rem returns null_tree for divide by 0, on the wide int branch, both div and rem return null tree. I know that this is going to bring on a string of questions that i do not remember the answers to as to why i made that change. but fold-const.c:int_const_binop_1 now returns null_tree and this is just fallout from that change. Kenny Thanks, Richard
Re: Some wide-int review comments
On 11/11/2013 10:29 AM, Richard Biener wrote: On Mon, Nov 11, 2013 at 4:04 PM, Kenneth Zadeck wrote: On 11/11/2013 09:42 AM, Richard Biener wrote: On Mon, Nov 11, 2013 at 3:26 PM, Kenneth Zadeck wrote: On 11/11/2013 06:49 AM, Richard Biener wrote: On Sat, Nov 9, 2013 at 3:23 PM, Kenneth Zadeck wrote: On 11/08/2013 05:30 AM, Richard Sandiford wrote: From tree-vrp.c: @@ -1893,6 +1884,10 @@ vrp_int_const_binop (enum tree_code code /* If the singed operation wraps then int_const_binop has done everything we want. */ ; + /* Signed division of -1/0 overflows and by the time it gets here + returns NULL_TREE. */ + else if (!res) +return NULL_TREE; else if ((TREE_OVERFLOW (res) && !TREE_OVERFLOW (val1) && !TREE_OVERFLOW (val2)) Why is this case different from trunk? Or is it a bug-fix independent of wide-int? the api for division is different for wide-int than it was for double-int. But this is using a tree API (int_const_binop) that didn't change (it returned NULL for / 0 previously). So what makes us arrive here now? (I agree there is a bug in VRP, but it shouldn't manifest itself only on wide-int) Richard. My reading of the code is that is that i changed int_const_binop to return null_tree for this case. Trunk has: case TRUNC_DIV_EXPR: case FLOOR_DIV_EXPR: case CEIL_DIV_EXPR: case EXACT_DIV_EXPR: /* This is a shortcut for a common special case. */ if (op2.high == 0 && (HOST_WIDE_INT) op2.low > 0 && !TREE_OVERFLOW (arg1) && !TREE_OVERFLOW (arg2) && op1.high == 0 && (HOST_WIDE_INT) op1.low >= 0) { if (code == CEIL_DIV_EXPR) op1.low += op2.low - 1; res.low = op1.low / op2.low, res.high = 0; break; } /* ... fall through ... */ case ROUND_DIV_EXPR: if (op2.is_zero ()) return NULL_TREE; so it already returns NULL_TREE on divide by zero. I found the reason This is one of the many "tree-vrp was not properly tested for TImode bugs." on the trunk, the case 0/(smallest negative number) case will only trigger overflow in TImode. On the wide-int branch, tree-vrp works at the precision of the operands so overflow is triggered properly for this case. So for HImode, the trunk produces the a result for 0/0x80 and then force_fit code at the bottom of int_const_binop_1 turns this into an overflow tree value rather than a null tree. on the wide-int branch, this case causes the overflow bit to be returned from the wide-int divide because the overflow case is properly handled for all types and that overflow is turned into null_tree by the wide-int version of int_const_binop_1. apparently there are no test cases that exercise the true divide by 0 case but there are test cases that hit the 0/ largest negative number case for modes smaller than TImode. You probably mean / -1? I don't see where NULL_TREE is returned for any overflow case in int_const_binop_1. Ah, you made it so. That looks like a bogus change. What's the reason? int_const_binop_1 is supposed to return a value with TREE_OVERFLOW set in these cases, also required for frontend constant folding. Try compiling const int i = (-__INT_MAX__ - 1) / -1; which should say ./cc1 -quiet t.c t.c:1:34: warning: integer overflow in expression [-Woverflow] const int i = (-__INT_MAX__ - 1) / -1; ^ but not error or ICE. Seems to work on the branch, but only because the expression is still folded by 12357 /* X / -1 is -X. */ 12358 if (!TYPE_UNSIGNED (type) 12359 && TREE_CODE (arg1) == INTEGER_CST 12360 && wi::eq_p (arg1, -1)) 12361 return fold_convert_loc (loc, type, negate_expr (arg0)); Thus case TRUNC_DIV_EXPR: case EXACT_DIV_EXPR: res = wi::div_trunc (arg1, arg2, sign, &overflow); if (overflow) return NULL_TREE; break; should retain the arg2 == 0 checks (and return NULL_TREE) but otherwise keep overflow handling the same. I see your point. i did not think that it was necessary to distinguish the two types of behavior by division. i will make this fix today and test and install it. kenny Richard. Kenny On the trunk, only rem returns null_tree for divide by 0, on the wide int branch, both div and rem return null tree. I know that this is going to bring on a string of questions that i do not remember the answers to as to why i made that change. but fold-const.c:int_const_binop_1 now returns null_tree and this is just fallout from that change. Kenny Thanks, Richard
[gomp4] Fix two typos
Hi! Dunno how I've managed to break the branch on Friday, anyway, here is an obvious fix, committed to the branch. 2013-11-11 Jakub Jelinek * tree-vect-data-refs.c (vect_analyze_data_refs): Check loop->safelen rather than loop->simdlen. * tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise. --- gcc/tree-vect-data-refs.c.jj2013-11-08 17:12:49.0 +0100 +++ gcc/tree-vect-data-refs.c 2013-11-11 16:25:15.363311414 +0100 @@ -2982,7 +2982,7 @@ vect_analyze_data_refs (loop_vec_info lo gimple stmt = gsi_stmt (gsi); if (!find_data_references_in_stmt (loop, stmt, &datarefs)) { - if (is_gimple_call (stmt) && loop->simdlen) + if (is_gimple_call (stmt) && loop->safelen) { tree fndecl = gimple_call_fndecl (stmt), op; if (fndecl != NULL_TREE) --- gcc/tree-vect-stmts.c.jj2013-11-08 17:12:49.0 +0100 +++ gcc/tree-vect-stmts.c 2013-11-11 16:26:43.254861184 +0100 @@ -2334,7 +2340,7 @@ vectorizable_simd_clone_call (gimple stm /* If the function isn't const, only allow it in simd loops where user has asserted that at least nunits consecutive iterations can be performed using SIMD instructions. */ - if ((loop == NULL || loop->simdlen < nunits) && gimple_vuse (stmt)) + if ((loop == NULL || loop->safelen < nunits) && gimple_vuse (stmt)) { arginfo.release (); return false; Jakub
Re: [PATCH] decide edge's hotness when there is profile info
On Mon, Nov 11, 2013 at 7:23 AM, Jan Hubicka wrote: >> 2013-11-08 Teresa Johnson >> Jan Hubicka >> >> * predict.c (drop_profile): New function. >> (handle_missing_profiles): Ditto. >> (counts_to_freqs): Don't overwrite estimated frequencies >> when function has no profile counts. >> * predict.h (handle_missing_profiles): Declare. >> * tree-inline.c (freqs_to_counts): New function. >> (copy_cfg_body): Invoke freqs_to_counts as needed. >> * tree-profile.c (tree_profiling): Invoke handle_missing_profiles. >> >> Index: predict.c >> === >> --- predict.c (revision 204516) >> +++ predict.c (working copy) >> @@ -2765,6 +2765,107 @@ estimate_loops (void) >>BITMAP_FREE (tovisit); >> } >> >> +/* Drop the profile for NODE to guessed, and update its frequency based on >> + whether it is expected to be HOT. */ >> + >> +static void >> +drop_profile (struct cgraph_node *node, bool hot) >> +{ >> + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >> + >> + if (dump_file) >> +fprintf (dump_file, >> + "Dropping 0 profile for %s/%i. %s based on calls.\n", >> + cgraph_node_name (node), node->order, >> + hot ? "Function is hot" : "Function is normal"); >> + /* We only expect to miss profiles for functions that are reached >> + via non-zero call edges in cases where the function may have >> + been linked from another module or library (COMDATs and extern >> + templates). See the comments below for handle_missing_profiles. */ >> + if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl)) >> +warning (0, >> + "Missing counts for called function %s/%i", >> + cgraph_node_name (node), node->order); >> + >> + profile_status_for_function (fn) >> + = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT); >> + node->frequency >> + = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL; >> +} >> + >> +/* In the case of COMDAT routines, multiple object files will contain the >> same >> + function and the linker will select one for the binary. In that case >> + all the other copies from the profile instrument binary will be missing >> + profile counts. Look for cases where this happened, due to non-zero >> + call counts going to 0-count functions, and drop the profile to guessed >> + so that we can use the estimated probabilities and avoid optimizing only >> + for size. >> + >> + The other case where the profile may be missing is when the routine >> + is not going to be emitted to the object file, e.g. for "extern template" >> + class methods. Those will be marked DECL_EXTERNAL. Emit a warning in >> + all other cases of non-zero calls to 0-count functions. */ >> + >> +void >> +handle_missing_profiles (void) >> +{ >> + struct cgraph_node *node; >> + int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION); >> + vec worklist; >> + worklist.create (64); >> + >> + /* See if 0 count function has non-0 count callers. In this case we >> + lost some profile. Drop its function profile to PROFILE_GUESSED. */ >> + FOR_EACH_DEFINED_FUNCTION (node) >> +{ >> + struct cgraph_edge *e; >> + gcov_type call_count = 0; >> + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >> + >> + if (node->count) >> +continue; >> + for (e = node->callers; e; e = e->next_caller) >> +call_count += e->count; >> + if (call_count >> + && fn && fn->cfg >> + && (call_count * unlikely_count_fraction >= profile_info->runs)) >> +{ >> + bool maybe_hot = maybe_hot_count_p (NULL, call_count); >> + >> + drop_profile (node, maybe_hot); >> + worklist.safe_push (node); >> +} > > Perhaps we should add an note/error on mishandled profile when function is > not COMDAT? > That way we may notice further bugs in this area. I have a warning like that already in drop_profile(). Is that sufficient? Also, Steven Bosscher suggested putting that warning under OPT_Wdisabled_optimization instead of '0', what do you prefer for that? >> +} >> + >> + /* Propagate the profile dropping to other 0-count COMDATs that are >> + potentially called by COMDATs we already dropped the profile on. */ >> + while (worklist.length () > 0) >> +{ >> + struct cgraph_edge *e; >> + >> + node = worklist.pop (); >> + for (e = node->callees; e; e = e->next_caller) >> +{ >> + struct cgraph_node *callee = e->callee; >> + struct function *fn = DECL_STRUCT_FUNCTION (callee->decl); >> + >> + if (callee->count > 0) >> +continue; >> + if (DECL_COMDAT (callee->decl) && fn && fn->cfg >> + && profile_status_for_function (fn) == PROFILE_READ) > > Perhaps we can check here maybe_hot_bb_p for the caller and rely on
Re: [RFA][PATCH] Isolate erroneous paths optimization
On 11/11/13 05:16, Richard Biener wrote: On Mon, Nov 11, 2013 at 4:11 AM, Jeff Law wrote: On 11/10/13 05:34, Eric Botcazou wrote: But I think that you cannot transform foo () { *0 = 1; } to __builtin_trap as you can catch the trap via an exception handler in a caller of foo, no? That is true. OK, I can see an argument that when using -fnon-call-exceptions that kind of code should not be changed to call __builtin_trap. That's exactly the reason why gnat.dg/memtrap.adb started to fail after Jeff's patch went it. So, if -fisolate-erroneous-paths isn't amended, we'll need to disable it in Ada like in Go. In that case I think it would be fine to run the isolate paths optimization, but to not omit the actual dereference of the NULL pointer (possibly the dereference could be followed by a trap). This would probably work for Ada as well. OK. It sounds like there's a pretty general consensus that we ought ot go ahead and leave in a load/store of a NULL pointer. I'll go ahead and run with that. I'll probably just emit SSA_NAME = *0, unless someone thinks we ought ot distinguish between loads and stores, emitting SSA_NAME = *0 and *0 = 0 for the two cases respectively. Can't you simply keep the trapping stmt as-is? You eliminate less dead code if the trapping statement is a store. You want all the RHS nonsense in that case to just "go away". jeff
Re: [PATCH] make has_gate and has_execute useless
On Mon, Nov 11, 2013 at 12:58:36PM +0100, Richard Biener wrote: > On Mon, Nov 11, 2013 at 1:39 AM, Trevor Saunders > wrote: > > On Fri, Nov 08, 2013 at 10:37:00AM +0100, Richard Biener wrote: > >> On Thu, Nov 7, 2013 at 5:00 PM, wrote: > >> > From: Trevor Saunders > >> > > >> > Hi, > >> > > >> > This is the result of seeing what it would take to get rid of the > >> > has_gate and > >> > has_execute flags on pass_data. It turns out not much, but I wanted > >> > confirmation this part is ok before I go deal with all the places that > >> > initialize the fields. > >> > > >> > I bootstrapped this part on x86_64-unknown-linux-gnu and saw no new test > >> > suite > >> > regressions (ignoring the silk stuff because the full paths in its test > >> > names > >> > break my test script for now) Any reason this patch with the actual > >> > removal of the flags wouldn't be ok? > >> > >> The has_gate flag is easy to remove (without a TODO_ hack), right? > >> No gate simply means that the gate returns always true. The only > >> weird thing is > >> > >> /* If we have a gate, combine the properties that we could have with > >> and without the pass being examined. */ > >> if (pass->has_gate) > >> properties &= new_properties; > >> else > >> properties = new_properties; > >> > >> which I don't understand (and you just removed all properties handling > >> there). > >> > >> So can you split out removing has_gate? This part is obviously ok. > >> > >> Then, for ->execute () I'd have refactored the code to make > >> ->sub passes explicitely executed by the default ->execute () > >> implementation only. That is, passes without overriding execute > >> are containers only. Can you quickly check whether that would > >> work out? > > > > Ok, I've now given this a shot and wasn't terribly successful, if I just > > change execute_pass_list and execute_ipa_pass_list to handle container > > passes executing their sub passes I get the following ICE > > > > ./gt-passes.h:77:2: internal compiler error: Segmentation fault > > }; > >^ > >0xd43d96 crash_signal > >/src/gcc/gcc/toplev.c:334 > >0xd901a9 ssa_default_def(function*, tree_node*) > >/src/gcc/gcc/tree-dfa.c:318 > >0xb56d77 ipa_analyze_params_uses > >/src/gcc/gcc/ipa-prop.c:2094 > >0xb57275 ipa_analyze_node(cgraph_node*) > >/src/gcc/gcc/ipa-prop.c:2179 > >0x13e2b6d ipcp_generate_summary > >/src/gcc/gcc/ipa-cp.c:3615 > >0xc55a2a > > > > execute_ipa_summary_passes(ipa_opt_pass_d*) > > > > /src/gcc/gcc/passes.c:1991 > >0x943341 ipa_passes > > > > /src/gcc/gcc/cgraphunit.c:2011 > >0x943675 > >compile() > > > > /src/gcc/gcc/cgraphunit.c:2118 > > > >now > > Which is because fn->gimple_df is null. I expect that's because pass > > ordering changed somehow, but I'm not sure off hand how or ifthat's > > something worth figuring out right now. > > Yeah, some of the walking doesn't seem to work ... probably a > pass with sub-passes already has an execute member function ;) Sounds like a reasonable guess. > > Another issue I realized is that doing this will change the order of > > plugin events from > > start container pass a > > end container pass a > > start contained pass b > > end contained pass b > > ... > > > > to > > > > start container pass a > > start contained pass b > > end contained pass b > > end container pass a > > > > Arguably that's an improvement, but I'm not sure if changing that APi > > like that is acceptable. > > Indeed an improvement. Can you attach your patch? ok, done thanks Trev > > Thanks, > Richard. > > > Trev > > > >> > >> Thanks, > >> Richard. > >> > >> > Trev > >> > > >> > 2013-11-06 Trevor Saunders > >> > > >> > * pass_manager.h (pass_manager): Adjust. > >> > * passes.c (opt_pass::execute): Tell the pass manager it doesn't > >> > need > >> > to do anything for this pass. > >> > (pass_manager::register_dump_files_1): Don't uselessly deal with > >> > properties of passes. > >> > (pass_manager::register_dump_files): Adjust. > >> > (dump_one_pass): Just call pass->gate (). > >> > (execute_ipa_summary_passes): Likewise. > >> > (execute_one_pass): Don't check pass->has_exec
Re: [PATCH] preprocessor/58580 - preprocessor goes OOM with warning for zero literals
Jakub Jelinek writes: >> -OBJS-libcommon = diagnostic.o diagnostic-color.o pretty-print.o intl.o >> input.o version.o >> +OBJS-libcommon = diagnostic.o diagnostic-color.o pretty-print.o intl.o >> vec.o input.o version.o > > Too long line? Fixed in my local copy of the patch, thanks. > >> + if (c == '\0') >> +c = ' '; >>pp_character (context->printer, c); > > Does that match how libcpp counts the embedded '\0' character in column > computation? Yes, I think so. _cpp_lex_direct in libcpp/lex.c considers '\0' just like a white space and so the column number is incremented when it's encountered. >> +/* The position (byte count) the the last byte of the line. This >> + normally points to the '\n' character, or to one byte after the >> + last byte of the file, if the file doesn't contain a '\n' >> + character. */ >> +size_t end_pos; > > Does it really help to note this? You can always just walk the line from > start_pos looking for '\n' or end of file. Yes you are right, it's not strictly necessary. But with that end_pos, copying a line is even faster; no need of walking. I thought the goal was to avoid re-doing the work we've already done, as much as possible. > >> +static fcache* >> +add_file_to_cache_tab (const char *file_path) >> +{ >> + static size_t idx; >> + fcache *r; >> + >> + FILE *fp = fopen (file_path, "r"); >> + if (ferror (fp)) >> +{ >> + fclose (fp); >> + return NULL; >> +} >> + >> + r = &fcache_tab[idx]; > > Wouldn't it be better to use some LRU algorithm to determine which > file to kick out of the cache? Have some tick counter of cache lookups (or > creation) and assign the tick counter to the just created resp. used > cache entry, and remove the one with the smallest counter? Hehe, the LRU idea occurred to me too, but I dismissed the idea as something probably over-engineered. But now that you are mentioning it I guess I should give it a try ;-) I'll post a patch about that later then. >> + fcache * r = lookup_file_in_cache_tab (file_path); >> + if (r == NULL) > > Formatting (no space after *, extra space after ==). Fixed in my local copy. Thanks. -- Dodji
Re: [PATCH] Fix asan regtest failures c-c++-common/asan/{memcmp-1.c,strncpy-overflow-1.c}
Jakub Jelinek writes: > On Mon, Nov 11, 2013 at 03:01:53PM +0100, Dodji Seketeli wrote: >> Since a couple of days I am seeing failure on the tests above on my >> Fedora system. The errors look like: >> >> FAIL: c-c++-common/asan/memcmp-1.c -O0 output pattern test, is >> = >> ==21832==ERROR: AddressSanitizer: stack-buffer-overflow on address >> 0x7fff75df96f4 at pc 0x7f98ecbab68b bp 0x7fff75df96b0 sp >> 0x7fff75df95b8 >> READ of size 6 at 0x7fff75df96f4 thread T0 >> #0 0x7f98ecbab68a in __interceptor_memcmp >> /home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:295 >> (discriminator 7) >> #1 0x7f98ecbb6393 in __asan_report_error >> /home/dodji/git/gcc/master/libsanitizer/asan/asan_report.cc:774 >> (discriminator 9) >> #2 0x7f98ecbab6d0 in __interceptor_memcmp >> /home/dodji/git/gcc/master/libsanitizer/asan/asan_interceptors.cc:295 >> (discriminator 7) >> #3 0x400b0a in main >> /home/dodji/git/gcc/master/gcc/testsuite/c-c++-common/asan/memcmp-1.c:14 > > That looks like a bug in libasan, __asan_report_error doesn't call memcmp > again. So something is wrong with getting proper backtrace it seems. Correct. I'll need to get familiar with the backtracing part then, unless someone beats me to it. -- Dodji
Re: [PATCH] decide edge's hotness when there is profile info
> I have a warning like that already in drop_profile(). Is that I think it should be warning (or silent) for COMDAT and error/note for other functions (depending on flag_profile_correction). I guess drop_profile is better place for it indeed. > sufficient? Also, Steven Bosscher suggested putting that warning under > OPT_Wdisabled_optimization instead of '0', what do you prefer for > that? It is a bit different case than other disabled optimizations we have (where the optimization does not happen because of --param tunable limits), but I am fine with both alternatives. The warning may end up quite noisy so having way to silence it definitely makes sense. > > >> +} > >> + > >> + /* Propagate the profile dropping to other 0-count COMDATs that are > >> + potentially called by COMDATs we already dropped the profile on. */ > >> + while (worklist.length () > 0) > >> +{ > >> + struct cgraph_edge *e; > >> + > >> + node = worklist.pop (); > >> + for (e = node->callees; e; e = e->next_caller) > >> +{ > >> + struct cgraph_node *callee = e->callee; > >> + struct function *fn = DECL_STRUCT_FUNCTION (callee->decl); > >> + > >> + if (callee->count > 0) > >> +continue; > >> + if (DECL_COMDAT (callee->decl) && fn && fn->cfg > >> + && profile_status_for_function (fn) == PROFILE_READ) > > > > Perhaps we can check here maybe_hot_bb_p for the caller and rely on profile > > estimate > > to give us false only for really known to be unlikely paths? (i.e. EH > > handling, noreturns > > etc.) > > Ok, let me try this. > > >> +{ > >> + /* Since there are no non-0 call counts to this function, > >> + we don't know for sure whether it is hot. Indicate to > >> + the drop_profile routine that function should be marked > >> + normal, rather than hot. */ > >> + drop_profile (node, false); > >> + worklist.safe_push (callee); > >> +} > >> +} > >> +} > >> + worklist.release (); > > > > I would add a pointer set so we avoid duplicates in worklist. This is > > potentially quadratic > > for large programs. > > I'll add a visited_nodes set to keep track of processed nodes so they > don't get added to the worklist multiple times. Perhaps you can also track this by testing profile_status_for_function. Both solutions are fine for me. Honza > > Thanks, > Teresa > > > > > OK, with these changes. > > > > Honza > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [RFA][PATCH] Isolate erroneous paths optimization
On 11/11/13 02:33, Eric Botcazou wrote: However, that brings up an couple interesting questions. Let's say we find a NULL pointer which reaches a return statement in a function which is marked as returns_nonnull. In that case there is no dereference. Presumably for that kind of scenario we'll just keep the builtin trap. Similarly, assume we extend this pass to detect out-of-bounds array indexing. It's fairly simple to do and has always been in my plan. In that case leaving in the array indexing won't necessarily generate a fault. For those presumably we'll just want the builtin_trap as well? Again, I don't mind inserting a *0, I just want to have a plan for the other undefined behaviours we currently detect and those which I plan on catching soon. The more general problem is that, with -fnon-call-exceptions, we really expect a fully-fledged exception to be raised when something goes wrong. Inserting __builtin_trap doesn't work because it's simply not equivalent to a throw. In other words, if __builtin_throw would be inserted instead of __builtin_trap with -fnon-call-exceptions, things would probably be acceptable as-is. Hmm, maybe that's a better soultion then. When non-call-exceptions is active, throw rather than trap. Seems fairly reasonable. Jeff
Re: [RFA][PATCH] Isolate erroneous paths optimization
On Mon, Nov 11, 2013 at 09:24:27AM -0700, Jeff Law wrote: > On 11/11/13 02:33, Eric Botcazou wrote: > >>However, that brings up an couple interesting questions. > >> > >>Let's say we find a NULL pointer which reaches a return statement in a > >>function which is marked as returns_nonnull. In that case there is no > >>dereference. Presumably for that kind of scenario we'll just keep the > >>builtin trap. > >> > >>Similarly, assume we extend this pass to detect out-of-bounds array > >>indexing. It's fairly simple to do and has always been in my plan. In > >>that case leaving in the array indexing won't necessarily generate a > >>fault. For those presumably we'll just want the builtin_trap as well? > >> > >>Again, I don't mind inserting a *0, I just want to have a plan for the > >>other undefined behaviours we currently detect and those which I plan on > >>catching soon. > > > >The more general problem is that, with -fnon-call-exceptions, we really > >expect > >a fully-fledged exception to be raised when something goes wrong. Inserting > >__builtin_trap doesn't work because it's simply not equivalent to a throw. > >In > >other words, if __builtin_throw would be inserted instead of __builtin_trap > >with -fnon-call-exceptions, things would probably be acceptable as-is. > Hmm, maybe that's a better soultion then. When non-call-exceptions > is active, throw rather than trap. But throw what? It is up to the runtimes of -fnon-call-exceptions languages to decide if they actually want to throw or do something else in the signal handlers, and what exactly to throw. Jakub
Re: [RFA][PATCH] Isolate erroneous paths optimization
On Mon, Nov 11, 2013 at 8:27 AM, Jakub Jelinek wrote: > On Mon, Nov 11, 2013 at 09:24:27AM -0700, Jeff Law wrote: >> On 11/11/13 02:33, Eric Botcazou wrote: >> >>However, that brings up an couple interesting questions. >> >> >> >>Let's say we find a NULL pointer which reaches a return statement in a >> >>function which is marked as returns_nonnull. In that case there is no >> >>dereference. Presumably for that kind of scenario we'll just keep the >> >>builtin trap. >> >> >> >>Similarly, assume we extend this pass to detect out-of-bounds array >> >>indexing. It's fairly simple to do and has always been in my plan. In >> >>that case leaving in the array indexing won't necessarily generate a >> >>fault. For those presumably we'll just want the builtin_trap as well? >> >> >> >>Again, I don't mind inserting a *0, I just want to have a plan for the >> >>other undefined behaviours we currently detect and those which I plan on >> >>catching soon. >> > >> >The more general problem is that, with -fnon-call-exceptions, we really >> >expect >> >a fully-fledged exception to be raised when something goes wrong. Inserting >> >__builtin_trap doesn't work because it's simply not equivalent to a throw. >> >In >> >other words, if __builtin_throw would be inserted instead of __builtin_trap >> >with -fnon-call-exceptions, things would probably be acceptable as-is. >> Hmm, maybe that's a better soultion then. When non-call-exceptions >> is active, throw rather than trap. > > But throw what? It is up to the runtimes of -fnon-call-exceptions languages > to decide if they actually want to throw or do something else in the signal > handlers, and what exactly to throw. Yes. At least for Go it would work fine to call a language defined function for a nil pointer dereference (in fact I already have one you could call), but then LTO might be an issue. Ian
Re: PING Cilk Plus failures on non-LTO targets
On Mon, Nov 4, 2013 at 11:06 AM, David Edelsohn wrote: > Balaji, > > I am seeing a large number of libcilkrts failures on AIX. These all > are of the form: > > Executing on host: /tmp/20131103/gcc/xgcc -B/tmp/20131103/gcc/ > /nasfarm/edelsohn > /src/src/gcc/testsuite/c-c++-common/cilk-plus/CK/sync_wo_spawn.c > -fno-diagnosti > cs-show-caret -fdiagnostics-color=never -O0 -flto -g -fcilkplus > -L/tmp/2013110 > 3/powerpc-ibm-aix7.1.0.0/./libcilkrts/.libs -fcilkplus -S -o sync_wo_spawn.s > (timeout = 300) > cc1: error: LTO support has not been enabled in this configuration > compiler exited with status 1 > > If the feature / functionality requires LTO, the tests should check > that LTO is supported with check_effective_target_lto, either in each > relevant test or in cilk-plus.exp. Balaji, I continue to see extra Cilk Plus failures on AIX. Tests that require LTO should check that the target supports LTO. Thanks, David
Re: [patch] Add CONSTRUCTOR_NO_CLEARING flag
> Ok. Can you update doc/generic.texi? Thanks, done (it was still talking about TREE_LIST). -- Eric Botcazou
Re: [PATCH] decide edge's hotness when there is profile info
On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka wrote: >> I have a warning like that already in drop_profile(). Is that > > I think it should be warning (or silent) for COMDAT and error/note for > other functions (depending on flag_profile_correction). > I guess drop_profile is better place for it indeed. I don't want to warn for COMDAT since this is currently an expected issue in that case, so I'm afraid it will be too noisy (but there is a note emitted to the dump file to track how often this occurs). So currently the warning is only emitted in cases where we don't expect it to occur (e.g. non-comdat). > >> sufficient? Also, Steven Bosscher suggested putting that warning under >> OPT_Wdisabled_optimization instead of '0', what do you prefer for >> that? > > It is a bit different case than other disabled optimizations we have (where > the optimization does not happen because of --param tunable limits), but I > am fine with both alternatives. > The warning may end up quite noisy so having way to silence it definitely > makes sense. I didn't find any warnings being emitted when running this for spec or internal benchmarks, so how about instead of a warning, I handle it like you suggest above: depending on the setting of flag_profile_correction either error or emit a note to the dump file under MSG_MISSED_OPTIMIZATION? >> >> >> +} >> >> + >> >> + /* Propagate the profile dropping to other 0-count COMDATs that are >> >> + potentially called by COMDATs we already dropped the profile on. */ >> >> + while (worklist.length () > 0) >> >> +{ >> >> + struct cgraph_edge *e; >> >> + >> >> + node = worklist.pop (); >> >> + for (e = node->callees; e; e = e->next_caller) >> >> +{ >> >> + struct cgraph_node *callee = e->callee; >> >> + struct function *fn = DECL_STRUCT_FUNCTION (callee->decl); >> >> + >> >> + if (callee->count > 0) >> >> +continue; >> >> + if (DECL_COMDAT (callee->decl) && fn && fn->cfg >> >> + && profile_status_for_function (fn) == PROFILE_READ) >> > >> > Perhaps we can check here maybe_hot_bb_p for the caller and rely on >> > profile estimate >> > to give us false only for really known to be unlikely paths? (i.e. EH >> > handling, noreturns >> > etc.) >> >> Ok, let me try this. >> >> >> +{ >> >> + /* Since there are no non-0 call counts to this function, >> >> + we don't know for sure whether it is hot. Indicate to >> >> + the drop_profile routine that function should be marked >> >> + normal, rather than hot. */ >> >> + drop_profile (node, false); >> >> + worklist.safe_push (callee); >> >> +} >> >> +} >> >> +} >> >> + worklist.release (); >> > >> > I would add a pointer set so we avoid duplicates in worklist. This is >> > potentially quadratic >> > for large programs. >> >> I'll add a visited_nodes set to keep track of processed nodes so they >> don't get added to the worklist multiple times. > > Perhaps you can also track this by testing profile_status_for_function. Both > solutions are fine for me. Ah - I just realized I am already checking profile_status_for_function above and adding to the worklist if it is still PROFILE_READ. Since I call drop_profile before adding to the worklist, we can't end up adding multiple times. So I don't think there is currently an issue with this. Thanks, Teresa > > Honza >> >> Thanks, >> Teresa >> >> > >> > OK, with these changes. >> > >> > Honza >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [patch] Fix PR ada/35998
> Due to the different interfaces of int_size_in_bytes and > simple_type_size_in_bits (and 'size' in add_byte_size_attribute being > unsigned and not [unsigned] HWI) it would be cleaner to > add an early return after the call to int_size_in_bytes if its > return value is -1 (and make sure the return value doesn't > overflow an unsigned int - likewise for simple_type_size_in_bits, > not sure why that case doesn't use int_size_in_bytes as well ...)? Both calls are present in the first version of the function, but I agree that the discrepancy looks strange. Revised version attached, tested {GCC,GDB} on x86_64-suse-linux. 2013-11-11 Eric Botcazou PR ada/35998 * dwarf2out.c (add_byte_size_attribute): Use int_size_in_bytes also for fields. Do not add the attribute if the size is negative. -- Eric BotcazouIndex: dwarf2out.c === --- dwarf2out.c (revision 20) +++ dwarf2out.c (working copy) @@ -16319,11 +16319,13 @@ add_subscript_info (dw_die_ref type_die, } } +/* Add a DW_AT_byte_size attribute to DIE with TREE_NODE's size. */ + static void add_byte_size_attribute (dw_die_ref die, tree tree_node) { dw_die_ref decl_die; - unsigned size; + HOST_WIDE_INT size; switch (TREE_CODE (tree_node)) { @@ -16347,7 +16349,7 @@ add_byte_size_attribute (dw_die_ref die, generally given as the number of bytes normally allocated for an object of the *declared* type of the member itself. This is true even for bit-fields. */ - size = simple_type_size_in_bits (field_type (tree_node)) / BITS_PER_UNIT; + size = int_size_in_bytes (field_type (tree_node)); break; default: gcc_unreachable (); @@ -16356,8 +16358,9 @@ add_byte_size_attribute (dw_die_ref die, /* Note that `size' might be -1 when we get to this point. If it is, that indicates that the byte size of the entity in question is variable. We have no good way of expressing this fact in Dwarf at the present time, - so just let the -1 pass on through. */ - add_AT_unsigned (die, DW_AT_byte_size, size); + when location description was not used by the caller code instead. */ + if (size >= 0) +add_AT_unsigned (die, DW_AT_byte_size, size); } /* For a FIELD_DECL node which represents a bit-field, output an attribute
Re: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
Hi Yufeng, The idea is a good one but I don't like your implementation of adding an extra expression parameter to look at on the find_basis_for_candidate lookup. This goes against the design of the pass and may not be sufficiently general (might there be situations where a third possible basis could exist?). The overall design is set up to have alternate interpretations of candidates in the candidate table to handle this sort of ambiguity. The goal for your example is create a second candidate (chained to the first one by way of the next_interp field) so that the candidate table looks like this: 8 [2] *_10[j_7(D)] = 2; REF : _10 + ((sizetype) j_7(D) * 4) + 0 : int[20] * basis: 0 dependent: 0 sibling: 0 next-interp: 9 dead-savings: 0 9 [2] *_10[j_7(D)] = 2; REF : _5 + ((sizetype) j_7(D) * 4) + 800 : int[20] * basis: 5 dependent: 0 sibling: 0 next-interp: 0 dead-savings: 0 This will in turn allow subsequent candidates to be seen in terms of either _5 or _10, which may be necessary to avoid missed opportunities. There may be a subsequent REF _15 +... that can be an affine expression of either of these, for example. If you fail to find a basis for a candidate with its first interpretation, you can then follow the next-interp chain to look for a basis for the next one, without the messy passing of extra possibilities to the find-basis routine. I haven't read the patch in detail, but I think this should give you enough to work with to re-design the idea to fit better with the existing framework. Please let me know if you need more information, or if you feel I've misunderstood something. Thanks, Bill On Mon, 2013-11-04 at 18:41 +, Yufeng Zhang wrote: > Hi, > > This patch extends the slsr pass to optionally use an alternative base > expression in finding basis for CAND_REFs. Currently the pass uses > hash-based algorithm to match the base_expr in a candidate. Given a > test case like the following, slsr will not be able to recognize the two > CAND_REFs have the same basis, as their base_expr are of different > SSA_NAMEs: > > typedef int arr_2[20][20]; > > void foo (arr_2 a2, int i, int j) > { >a2[i][j] = 1; >a2[i + 10][j] = 2; > } > > The gimple dump before slsr is like the following (using an > arm-none-eabi gcc): > >i.0_2 = (unsigned int) i_1(D); >_3 = i.0_2 * 80; >_5 = a2_4(D) + _3; >*_5[j_7(D)] = 1; < >_9 = _3 + 800; >_10 = a2_4(D) + _9; >*_10[j_7(D)] = 2; < > > Here are the dumps for the two CAND_REFs generated for the two > statements pointed by the arrows: > > >4 [2] _5 = a2_4(D) + _3; > ADD : a2_4(D) + (80 * i_1(D)) : int[20] * > basis: 0 dependent: 0 sibling: 0 > next-interp: 0 dead-savings: 0 > >8 [2] *_10[j_7(D)] = 2; > REF : _10 + ((sizetype) j_7(D) * 4) + 0 : int[20] * > basis: 5 dependent: 0 sibling: 0 > next-interp: 0 dead-savings: 0 > > As mentioned previously, slsr cannot establish that candidate 4 is the > basis for the candidate 8, as they have different base_exprs: a2_4(D) > and _10, respectively. However, the two references actually only differ > by an immediate offset (800). > > This patch uses the tree affine combination facilities to create an > optional alternative base expression to be used in finding (as well as > recording) the basis. It calls tree_to_aff_combination_expand on > base_expr, reset the offset field of the generated aff_tree to 0 and > generate a tree from it by calling aff_combination_to_tree. > > The new tree is recorded as a potential basis, and when > find_basis_for_candidate fails to find a basis for a CAND_REF in its > normal approach, it searches again using a tree expanded in such way. > Such an expanded tree usually discloses the expression behind an > SSA_NAME. In the example above, instead of seeing the strength > reduction candidate chains like this: > >_5 -> 5 >_10 -> 8 > > we are now having: > >_5 -> 5 >_10 -> 8 >a2_4(D) + (sizetype) i_1(D) * 80 -> 5 -> 8 > > With the candidates 5 and 8 linked to the same tree expression (a2_4(D) > + (sizetype) i_1(D) * 80), slsr is now able to establish that 5 is the > basis of 8. > > The patch doesn't attempt to change the content of any CAND_REF though. > It only enables CAND_REFs which (1) have the same stride and (2) have > the underlying expressions of their base_expr only differ in immediate > offsets, to be recognized to have the same basis. The statements with > such CAND_REFs will be lowered to MEM_REFs, and later on the RTL > expander shall be able to fold and re-associate the immediate offsets to > the rightmost side of the addressing expression, and therefore exposes > the common sub-expression successfully. > > The code-gen difference of the example code on arm with -O2 > -mcpu=cortex-15 is: > > mov r3, r1, asl #6 > - add ip, r0, r2, asl #2 >