DWZ 0.14 released

2021-03-08 Thread Tom de Vries
Hi,

DWZ 0.14 has been released.

You can download dwz from the sourceware FTP server here:

https://sourceware.org/ftp/dwz/releases/
ftp://sourceware.org/pub/dwz/releases/

The vital stats:

  Size    md5sum                            Name
  184KiB  cf60e4a65d9cc38c7cdb366e9a29ca8e  dwz-0.14.tar.gz
  144KiB  1f1225898bd40d63041d54454fcda5b6  dwz-0.14.tar.xz

There is a web page for DWZ at:

https://sourceware.org/dwz/

DWZ 0.14 includes the following changes and enhancements:

* DWARF 5 support. The tool now handles most of DWARF version 5
  (at least everything emitted by GCC when using -gdwarf-5).

  Not yet supported are DW_UT_type units (DWARF 4 .debug_types
  are supported), .debug_names (.gdb_index is supported) and some
  forms and sections that are only emitted by GCC when
  generating Split DWARF (DW_FORM_strx and .debug_str_offsets,
  DW_FORM_addrx and .debug_addr, DW_FORM_rnglistx and
  DW_FORM_loclistx). https://sourceware.org/PR24726

* .debug_sup support. DWARF Supplementary Object Files
  (DWARF 5, section 7.3.6) can now be generated when using
  the --dwarf-5 option.  To keep compatibility with existing DWARF
  consumers, this isn't the default yet.

  Without the --dwarf-5 option, instead of a .debug_sup section dwz
  will generate a .gnu_debugaltlink section, and will use
  DW_FORM_GNU_strp_alt and DW_FORM_GNU_ref_alt instead of
  DW_FORM_strp_sup and DW_FORM_ref_sup.

* An experimental optimization has been added that exploits the
  One-Definition-Rule of C++.  It's enabled using the --odr option and
  is off by default.  This optimization causes struct/union/class DIEs
  with the same name to be considered equal.  The optimization can be
  set to a lower aggressiveness level using --odr-mode=basic, making it
  possible to work around problems without having to switch off the
  optimization altogether.

* The clean-up of temporary files in hardlink mode has been fixed.

* The DIE limits --low-mem-die-limit / -l and --max-die-limit / -L
  can now be disabled using -l none and -L none, respectively.  Note
  that -l none disables the limit, whereas -l 0 sets the limit to
  zero.

* The usage message has been:
  - updated to show that -r and -M are exclusive.
  - updated to show that -v and -? cannot be combined with other options.
  - extended to list all options in detail.
  - restyled to wrap at 80 chars.

* An option --no-import-optimize was added that switches off an
  optimization that attempts to reduce the number of
  DW_TAG_imported_unit DIEs.  This can be used f.i. in case the
  optimization takes too long.

* A heuristic has been added that claims more memory earlier (without
  increasing the peak memory usage) to improve compression time.

* A heuristic has been added that estimates whether one of the two DIE
  limits will be hit.  If so, it will do an exact DIE count to verify
  this.  If the exact DIE count finds that the low-mem DIE limit is
  indeed hit, processing is done in low-mem mode from the start, rather
  than processing in regular mode first.  If the exact DIE count finds
  that the max DIE limit is indeed hit, processing is skipped
  altogether.

* Various other performance improvements.

* A case where we would previously either hit the assertion
  "dwz: dwz.c:9461: write_die: Assertion `refd != NULL' failed" (in
  regular mode) or a segmentation fault (in low-mem mode) is now
  handled with the error "dwz: Couldn't find DIE at DW_FORM_ref_addr
  offset 0x".

* A case where a reference from a partial unit to a compile unit was
  generated has been fixed.  This could happen if a DIE was referenced
  using a CU-relative DWARF operator.

* A case has been fixed for low-mem mode where, instead of issuing
  "dwz: Couldn't find DIE referenced by DW_OP_GNU_implicit_pointer",
  dwz would run into a segfault.

* A multi-file case where we run into ".debug_line reference above end
  of section" has been fixed.

* The following assertion failures were fixed:
  - dwz: dwz.c:9310: write_die: Assertion
    `value && refdcu->cu_kind != CU_ALT' failed.
  - dwz: dwz.c:9920: recompute_abbrevs: Assertion
    `off == cu_size' failed.

* The condition of the following assertion has been fixed:
  - write_types: Assertion `ref && ref->die_dup == NULL'.


Duplicate constraints in ipa-pta

2015-10-28 Thread Tom de Vries

Richard,

when compiling this testcase:
...
static int __attribute__((noinline, noclone))
foo (int *a, int *b)
{
  *b = 1;
  *a = 2;
  return *b;
}

int __attribute__((noinline, noclone))
bar (int *a, int *b)
{
  return foo (a, b);
}
...

with -O2 -fipa-pta we find in the pta dumpfile:
...
Generating constraints for bar (bar)

bar.arg0 = &NONLOCAL
bar.arg1 = &NONLOCAL
bar.arg1 = &NONLOCAL
...

The reason for the duplicate last two constraints is that with -fipa-pta, 
in create_function_info_for we link the function arguments in a next chain.


And in intra_create_variable_infos two iteration mechanisms are used:
- the loop over the function arguments
- the loop over the vi_next (p) for each function argument p

So when processing argument a, we generate constraints for both a and 
its next, b.

And when processing argument b, we generate constraints for b again.
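
The interaction can be modeled with a small self-contained C program (my 
illustration, not GCC code; the struct models variable_info and its vi_next 
chain):

...
#include <stdio.h>

/* Models struct variable_info and its vi_next chain.  */
struct varinfo { const char *name; struct varinfo *next; };

int
main (void)
{
  struct varinfo b = { "bar.arg1", NULL };
  struct varinfo a = { "bar.arg0", &b };   /* a's next chain includes b */
  struct varinfo *args[] = { &a, &b };     /* the loop over the arguments */

  for (int i = 0; i < 2; i++)
    /* the loop over the vi_next (p) for each function argument p */
    for (struct varinfo *p = args[i]; p; p = p->next)
      printf ("%s = &NONLOCAL\n", p->name);

  return 0;
}
...

Running this prints bar.arg0 once and bar.arg1 twice, matching the dump above.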

Is this a known issue? Should we fix this?

Thanks,
- Tom


Re: Duplicate constraints in ipa-pta

2015-10-28 Thread Tom de Vries

On 28/10/15 12:10, Richard Biener wrote:

On Wed, 28 Oct 2015, Tom de Vries wrote:


Richard,

when compiling this testcase:
...
static int __attribute__((noinline, noclone))
foo (int *a, int *b)
{
   *b = 1;
   *a = 2;
   return *b;
}

int __attribute__((noinline, noclone))
bar (int *a, int *b)
{
   return foo (a, b);
}
...

with -O2 -fipa-pta we find in the pta dumpfile:
...
Generating constraints for bar (bar)

bar.arg0 = &NONLOCAL
bar.arg1 = &NONLOCAL
bar.arg1 = &NONLOCAL
...

The reason for the duplicate last two constraints is that with -fipa-pta, in
create_function_info_for we link the function arguments in a next chain.

And in intra_create_variable_infos two iteration mechanisms are used:
- the loop over the function arguments
- the loop over the vi_next (p) for each function argument p

So when processing argument a, we generate constraints for both a and its
next, b.
And when processing argument b, we generate constraints for b again.

Is this a known issue? Should we fix this?


Didn't see that yet.  Yes, we should fix it.

Index: gcc/tree-ssa-structalias.c
===================================================================
--- gcc/tree-ssa-structalias.c  (revision 229481)
+++ gcc/tree-ssa-structalias.c  (working copy)
@@ -5913,6 +6009,8 @@ intra_create_variable_infos (struct func
 	    make_constraint_from_global_restrict (p, "PARM_RESTRICT");
 	  else if (p->may_have_pointers)
 	    make_constraint_from (p, nonlocal_id);
+	  if (p->is_full_var)
+	    break;
 	}
     }
 }

does for me.  Pre-approved if it passes testing.


Tested and committed. Posting with ChangeLog entry.

Thanks,
- Tom
Generate constraints only once in intra_create_variable_infos

2015-10-28  Tom de Vries  

	* tree-ssa-structalias.c (intra_create_variable_infos): Don't iterate
	into vi_next of a full_var.
---
 gcc/tree-ssa-structalias.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 07ec4a5..06415a2 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -5924,6 +5924,8 @@ intra_create_variable_infos (struct function *fn)
 		make_constraint_from_global_restrict (p, "PARM_RESTRICT", true);
 	  else if (p->may_have_pointers)
 		make_constraint_from (p, nonlocal_id);
+	  if (p->is_full_var)
+		break;
 	}
 	}
 }
-- 
1.9.1



Re: REG_CALL_DECL notes

2016-01-12 Thread Tom de Vries

On 12/01/16 17:10, Jakub Jelinek wrote:

Hi!

What is the reason for these notes?


From https://gcc.gnu.org/ml/gcc-patches/2013-03/msg01316.html:
...
Using the reg-note we are able to easily link call_insns to their 
corresponding declaration, even after the calls may have been split into 
an insn (set register to function address) and a call_insn (call 
register), which can happen for f.i. sh, and mips with -mabicalls.

...


I mean, for indirect calls usually the argument is NULL, so at least for
that case I'd say
(expr_list:REG_CALL_DECL (nil)
is just a waste of RTL memory, because nothing will really make use of it:

static tree
get_call_fndecl (rtx_insn *insn)
{
  rtx note, datum;

  note = find_reg_note (insn, REG_CALL_DECL, NULL_RTX);
  if (note == NULL_RTX)
    return NULL_TREE;

  datum = XEXP (note, 0);
  if (datum != NULL_RTX)
    return SYMBOL_REF_DECL (datum);

  return NULL_TREE;
}

will return the same thing whether there is REG_CALL_DECL (nil) or no note at
all.


Agreed.


But even for normal calls, on most targets the decl is often embedded
already somewhere else in the call instruction:
(call_insn 6 5 7 2 (call (mem:QI (symbol_ref:DI ("exit") [flags 0x41] 
) [0 __builtin_exit S1 A8])
 (const_int 0 [0])) mnau.c:1 -1
  (expr_list:REG_CALL_DECL (symbol_ref:DI ("exit") [flags 0x41] )
So, why doesn't, say, get_call_fndecl use REG_CALL_DECL if present, and
otherwise, if flag_ipa_ra, look up the CALL rtx in the instruction and the
symbol_ref in its MEM?


AFAIU, the targets for which this reg-note matters are sh, and mips with 
-mabicalls. So we could drop the reg-note for all other targets.


If we want to reduce the number of reg-notes also for those two targets, 
we can initially do without, but then once we split the call insn, we'll 
have to add the reg-note.
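
For reference, a hedged sketch of the fallback Jakub describes (an 
assumption, not existing code; get_call_rtx_from, MEM_P and SYMBOL_REF_DECL 
are real GCC helpers, the function name and exact logic are illustrative):

...
/* Hypothetical fallback: if there's no REG_CALL_DECL note, dig the
   SYMBOL_REF out of the CALL pattern itself.  */
static tree
get_call_fndecl_from_pattern (rtx_insn *insn)
{
  rtx call = get_call_rtx_from (insn);   /* the (call (mem ...) ...) rtx */

  if (call != NULL_RTX && MEM_P (XEXP (call, 0))
      && GET_CODE (XEXP (XEXP (call, 0), 0)) == SYMBOL_REF)
    return SYMBOL_REF_DECL (XEXP (XEXP (call, 0), 0));

  return NULL_TREE;
}
...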


Thanks,
- Tom


Then, what calls.c could do is instead of:
   last = last_call_insn ();
   add_reg_note (last, REG_CALL_DECL, datum);
first call get_call_fndecl (last) and if that returns datum, don't add any
note.




Re: Building gcc with graphite

2016-04-12 Thread Tom de Vries

[ cc-ing gcc ml ]

On 12/04/16 11:22, Kumar, Venkataramanan wrote:

Hi,

I am trying to build gcc with graphite enabled both on trunk and the
graphite branch.



I don't know anything about the graphite branch.


Do I need to build and install CLooG, ISL, PPL, etc.?



Trunk needs ISL.


Is there any general recommended steps?


https://gcc.gnu.org/install/prerequisites.html :
...
isl Library version 0.16, 0.15, or 0.14.

Necessary to build GCC with the Graphite loop optimizations. It can be 
downloaded from ftp://gcc.gnu.org/pub/gcc/infrastructure/. If an isl 
source distribution is found in a subdirectory of your GCC sources named 
isl, it will be built together with GCC. Alternatively, the --with-isl 
configure option should be used if isl is not installed in your default 
library search path.

...

Thanks,
- Tom


INSN_CODE used on jump_table_data

2014-04-29 Thread Tom de Vries

Denis,

when building gcc for avr with --enable-checking=yes,rtl, I run into the 
following error:

...
/home/vries/gcc_versions/devel/src/libgcc/unwind-c.c: In function 
‘__gcc_personality_sj0’:
/home/vries/gcc_versions/devel/src/libgcc/unwind-c.c:234:1: internal compiler 
error: RTL check: expected elt 6 type 'i' or 'n', have '0' (rtx jump_table_data) 
in recog_memoized, at recog.h:154

 }
 ^
0xbcb709 rtl_check_failed_type2(rtx_def const*, int, int, int, char const*, int, 
char const*)

/home/vries/gcc_versions/devel/src/gcc/rtl.c:764
0xf85f36 recog_memoized
/home/vries/gcc_versions/devel/src/gcc/recog.h:154
0xf9ccaa avr_adjust_insn_length(rtx_def*, int)
/home/vries/gcc_versions/devel/src/gcc/config/avr/avr.c:7780
0x84c2a9 shorten_branches(rtx_def*)
/home/vries/gcc_versions/devel/src/gcc/final.c:1198
0x85cbc2 rest_of_handle_shorten_branches
/home/vries/gcc_versions/devel/src/gcc/final.c:4519
0x85cc10 execute
/home/vries/gcc_versions/devel/src/gcc/final.c:4549
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
...

AFAIU, the problem is that avr_adjust_insn_length uses recog_memoized, which 
uses INSN_CODE on a jump_table_data.
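
A hedged sketch of the kind of guard that seems to be missing (my assumption, 
not the actual fix; JUMP_TABLE_DATA_P is the real predicate, and the signature 
matches the backtrace above):

...
int
avr_adjust_insn_length (rtx insn, int len)
{
  /* jump_table_data has no INSN_CODE field, so don't feed it to
     recog_memoized.  */
  if (JUMP_TABLE_DATA_P (insn))
    return len;

  /* ... existing body, which calls recog_memoized (insn) ... */
  return len;
}
...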


Thanks,
- Tom


Re: GIMPLE tree dumping of, for example, GIMPLE_OMP_PARALLEL's CHILD_FN

2014-05-14 Thread Tom de Vries
On 21/03/14 17:30, Thomas Schwinge wrote:
> Hi!
> 
> Certain GIMPLE codes, such as OpenMP ones, have a structured block
> attached to them, for example, gcc/gimple.def:GIMPLE_OMP_PARALLEL:
> 
> /* GIMPLE_OMP_PARALLEL  represents
> 
>#pragma omp parallel [CLAUSES]
>BODY
> 
>    BODY is the sequence of statements to be executed by all threads.
> [...]
>CHILD_FN is set when outlining the body of the parallel region.
>All the statements in BODY are moved into this newly created
>function when converting OMP constructs into low-GIMPLE.
> [...]
> DEFGSCODE(GIMPLE_OMP_PARALLEL, "gimple_omp_parallel", 
> GSS_OMP_PARALLEL_LAYOUT)
> 
> Using -ftree-dump-all, I can see this structured block (BODY) getting
> dumped, but it then "disappears" in the ompexp pass', and "reappears" (as
> function main._omp_fn.0) in the next ssa pass' dump.
> 
> If I'm correctly understanding the GCC sources as well as operating GDB,
> in the gimple pass we get main._omp_fn.0 dumped because
> gcc/cgraphunit.c:analyze_functions iterates over all functions
> (analyze_function -> dump_function).  In the following passes,
> presumably, this is not done anymore: omplower, lower, eh, cfg.  In
> ompexp, the GIMPLE_OMP_PARALLEL is expanded into a
> »__builtin_GOMP_parallel (main._omp_fn.0)« call, but the main._omp_fn.0
> is not dumped (and there is no BODY anymore to dump).  In the next ssa
> pass, main._omp_fn.0 again is being dumped, by means of
> gcc/passes.c:do_per_function_toporder (execute_pass_list ->
> execute_one_pass -> execute_function_dump -> dump_function_to_file), as I
> understand it.  What do I need to modify to get main._omp_fn.0 included
> in the dumps before the ssa pass, too?

Hi Thomas,

I think the answer to your question lies in two pieces of code.

1. gcc/omp-low.c:expand_omp_taskreg:
...
  /* Inform the callgraph about the new function.  */
  DECL_STRUCT_FUNCTION (child_fn)->curr_properties = cfun->curr_properties;
  cgraph_add_new_function (child_fn, true);
...
Note, the second parameter of cgraph_add_new_function is 'lowered' and set to 
true:

2.  gcc/cgraphunit.c:analyze_function:
...
  /* Make sure to gimplify bodies only once.  During analyzing a
 function we lower it, which will require gimplified nested
 functions, so we can end up here with an already gimplified
 body.  */
  if (!gimple_has_body_p (decl))
gimplify_function_tree (decl);
  dump_function (TDI_generic, decl);

  /* Lower the function.  */
  if (!node->lowered)
{
  if (node->nested)
lower_nested_functions (node->decl);
  gcc_assert (!node->nested);

  gimple_register_cfg_hooks ();
  bitmap_obstack_initialize (NULL);
  execute_pass_list (cfun, g->get_passes ()->all_lowering_passes);
  free_dominance_info (CDI_POST_DOMINATORS);
  free_dominance_info (CDI_DOMINATORS);
  compact_blocks ();
  bitmap_obstack_release (NULL);
  node->lowered = true;
}
...

The code marked by the parallel directive travels through the passes omplower,
lower, eh, and cfg as a part of main.

In ompexp, it's split off into a new function in expand_omp_taskreg. That new
function is marked as already being lowered.

When encountering the new function in analyze_function (after running the
lowering passes on main), we don't lower the code again. The confusing thing is
that we dump the lowered code in the gimplify dump, which suggests that the
function goes 'missing' in the dumps for a while.

Perhaps it would make more sense in this scenario to dump the new function to
the expand_omp dump.
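
A hedged sketch of that idea (an assumption, not committed code; 
dump_function_to_file is the real helper), placed right after the callgraph 
registration quoted above:

...
  /* Inform the callgraph about the new function.  */
  DECL_STRUCT_FUNCTION (child_fn)->curr_properties = cfun->curr_properties;
  cgraph_add_new_function (child_fn, true);

  /* Hypothetical: make the split-off function visible in the ompexp dump.  */
  if (dump_file)
    dump_function_to_file (child_fn, dump_file, dump_flags);
...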

Thanks,
- Tom


combination of read/write and earlyclobber constraint modifier

2014-07-01 Thread Tom de Vries

Vladimir,

There are a few patterns which use both the read/write constraint modifier (+) 
and the earlyclobber constraint modifier (&):

...
$ grep -c 'match_operand.*+.*&' gcc/config/*/* | grep -v :0
gcc/config/aarch64/aarch64-simd.md:1
gcc/config/arc/arc.md:1
gcc/config/arm/ldmstm.md:30
gcc/config/rs6000/spe.md:8
...

F.i., this one in gcc/config/aarch64/aarch64-simd.md:
...
(define_insn "vec_pack_trunc_"
 [(set (match_operand: 0 "register_operand" "+&w")
   (vec_concat:
 (truncate: (match_operand:VQN 1 "register_operand" "w"))
 (truncate: (match_operand:VQN 2 "register_operand" "w"]
...

The documentation ( 
https://gcc.gnu.org/onlinedocs/gccint/Modifiers.html#Modifiers ) states:

...
‘&’ does not obviate the need to write ‘=’.
...
which seems to state that '&' implies '='.

An earlyclobber operand is defined as 'modified before the instruction is 
finished using the input operands'. AFAIU that would indeed exclude the 
possibility that the earlyclobber operand is an input/output operand itself, 
but perhaps I misunderstand.


So my question is: is the combination of '&' and '+' supported?  If so, what
are the exact semantics?  If not, should we warn or give an error?
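
To make the question concrete, here's a minimal compilable example of the 
combination (my illustration; the empty asm template is just so the 
constraint string can be seen in isolation):

...
int
test (int acc, int in)
{
  /* acc is read/write ('+') and earlyclobber ('&') at the same time;
     the open question is what this should mean.  */
  asm ("" : "+&r" (acc) : "r" (in));
  return acc;
}
...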


Thanks,
- Tom


Re: combination of read/write and earlyclobber constraint modifier

2014-07-01 Thread Tom de Vries

On 01-07-14 21:58, Marc Glisse wrote:

So my question is: is the combination of '&' and '+' supported?  If so,
what are the exact semantics?  If not, should we warn or give an error?

I don't think we can define any reasonable semantics for &+.  My
recommendation would be for this to be considered a hard error.


Uh? The doc explicitly says "An input operand can be tied to an earlyclobber
operand" and goes on to explain why that is useful. It avoids using the same
register for other input when they are identical.


Hi Marc,

That part of the doc refers to the mulsi3 insn for ARM as example:
...
;; Use `&' and then `0' to prevent the operands 0 and 1 being the same
(define_insn "*arm_mulsi3"
  [(set (match_operand:SI  0 "s_register_operand" "=&r,&r")
(mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
 (match_operand:SI 1 "s_register_operand" "%0,r")))]
  "TARGET_32BIT && !arm_arch6"
  "mul%?\\t%0, %2, %1"
  [(set_attr "type" "mul")
   (set_attr "predicable" "yes")]
)
...

Note that there's no combination of & and + here.

AFAIU, the 'tie' established here is from input operand 1 to an earlyclobber 
output operand 0 using the '0' matching constraint.


Having said that, I don't understand the comment. AFAIU it should be: 'Use '0' 
to make sure operands 0 and 1 are the same, and use '&' to make sure operands 0 
and 2 are not the same.'


Thanks,
- Tom


Re: combination of read/write and earlyclobber constraint modifier

2014-07-01 Thread Tom de Vries

On 02-07-14 08:23, Marc Glisse wrote:

On Tue, 1 Jul 2014, Tom de Vries wrote:


On 01-07-14 21:58, Marc Glisse wrote:

So my question is: is the combination of '&' and '+' supported?  If so,
what are the exact semantics?  If not, should we warn or give an error?

I don't think we can define any reasonable semantics for &+.  My
recommendation would be for this to be considered a hard error.


Uh? The doc explicitly says "An input operand can be tied to an earlyclobber
operand" and goes on to explain why that is useful. It avoids using the same
register for other input when they are identical.


Hi Marc,

That part of the doc refers to the mulsi3 insn for ARM as example:
...
;; Use `&' and then `0' to prevent the operands 0 and 1 being the same
(define_insn "*arm_mulsi3"
 [(set (match_operand:SI  0 "s_register_operand" "=&r,&r")
   (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
(match_operand:SI 1 "s_register_operand" "%0,r")))]
 "TARGET_32BIT && !arm_arch6"
 "mul%?\\t%0, %2, %1"
 [(set_attr "type" "mul")
  (set_attr "predicable" "yes")]
)
...

Note that there's no combination of & and + here.


I think it could have used (match_dup 0) instead of operand 1, if there had been
only the first alternative. And then the constraint would have been +&.



Marc,

isn't that explicitly listed as unsupported here ( 
https://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html#index-match_005fdup-3244 ):

...
Note that match_dup should not be used to tell the compiler that a particular 
register is being used for two operands (example: add that adds one register to 
another; the second register is both an input operand and the output operand). 
Use a matching constraint (see Simple Constraints) for those. match_dup is for 
the cases where one operand is used in two places in the template, such as an 
instruction that computes both a quotient and a remainder, where the opcode 
takes two input operands but the RTL template has to refer to each of those 
twice; once for the quotient pattern and once for the remainder pattern.

...
?

Thanks,
- Tom


Re: combination of read/write and earlyclobber constraint modifier

2014-07-02 Thread Tom de Vries

On 02-07-14 09:02, Marc Glisse wrote:

Still, the meaning of +&, in inline asm for instance, seems relatively clear, 
no?


I can't find any testsuite examples using this construct.

Furthermore, I'd expect the same semantics and restrictions for constraints in 
rtl templates and inline asm.


So I'm not sure what you mean.

Thanks,
- Tom


Re: combination of read/write and earlyclobber constraint modifier

2014-07-02 Thread Tom de Vries

On 02-07-14 11:36, Marc Glisse wrote:

(did you drop the lists on purpose?)



That was a glitch, sorry.
[ Adds list back ]

Thanks,
- Tom


On Wed, 2 Jul 2014, Tom de Vries wrote:


An earlyclobber operand X prevents *other* input operands from using the same
register, but that does not include X itself (if it is using +) or operands
explicitly using a matching constraint for X. At least that's how I
understand it.


Right, that's another interpretation, which would require a clarification in
the documentation.


Sure, improving the doc is always good.


I'm fine with either forbidding &= (as proposed here:
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00094.html ) or clarifying the
semantics in the documentation.


+ is essentially a shortcut for = with a matching constraint in the input
operand, so I don't think it is the right level to forbid anything.
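
That equivalence can be illustrated with a small compilable example (my 
illustration, not from the thread); both asm statements below express the 
same contract:

...
void
f (int *x)
{
  asm ("" : "+r" (*x));             /* read/write operand */
  asm ("" : "=r" (*x) : "0" (*x));  /* output tied to input via "0" */
}
...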





constraints in define_expand

2014-07-04 Thread Tom de Vries

[ was: Re: combination of read/write and earlyclobber constraint modifier ]

On 02-07-14 17:52, Jeff Law wrote:

(by the way, in the same aarch64-simd.md file, I noticed some
define_expand with constraints, that looks strange)

It sometimes happens when a define_insn is converted into a define_expand --
folks just forget to remove the pointless constraints.


I happened to come across some code that is trying to check for this case in 
validate_pattern in genrecog.c:

...
/* In DEFINE_EXPAND, DEFINE_SPLIT, and DEFINE_PEEPHOLE2, we
   don't use the MATCH_OPERAND constraint, only the predicate.
   This is confusing to folks doing new ports, so help them
   not make the mistake.  */
if (GET_CODE (insn) == DEFINE_EXPAND
|| GET_CODE (insn) == DEFINE_SPLIT
|| GET_CODE (insn) == DEFINE_PEEPHOLE2)
  {
if (constraints0)
  error_with_line (pattern_lineno,
   "constraints not supported in %s",
   rtx_name[GET_CODE (insn)]);
  }
...

But it isn't triggered for define_expand, because validate_pattern is only 
used in genrecog.c, and define_expand is ignored in the main function.


Thanks,
- Tom


[gomp4] openacc kernels directive support

2014-08-06 Thread Tom de Vries

Jakub,

I've looked into how to implement the openacc kernels directive in gcc.

In order to map the loop nests marked by the kernels directive efficiently onto 
accelerator hardware, we need parallelization and vectorization.


Focusing on parallelization for the moment, a possibility is to use the 
parloops pass. The parloops pass identifies loops that can be parallelized 
with a factor n, splits off the n-reduced loop into a function, and issues 
the function in n parallel threads.


A problem with using parloops for the kernels directive is that the parloops 
pass is placed after lto's gimple-stream read/write point, so the parloops pass 
is executed during the accelerator-specific compilation. So while the resulting 
function with the reduced loop is compiled for the accelerator as required, the 
code issuing the function in parallel threads is also generated for the 
accelerator. While f.i. newer cuda with dynamic parallelism supports launching 
accelerator kernels from within accelerator kernels, I guess that might not 
hold in general.


I've investigated moving the parloops pass up in the pass list, using the 
attached example kernels.c.  It contains 4 loops: 2 loops that set arrays, one 
loop that does a vector addition, and one loop that does a reduction.


First, I compile the example using upstream trunk:
...
$ gcc -ftree-parallelize-loops=32 -fdump-tree-all-all -O2 kernels.c -std=c99 
-Wl,-rpath,$(pwd -P)/lean-c/install/lib64

$ ./a.out ; echo $?
sum: 4293394432
0
...

All 4 loops are recognized as parallel by parloops:
...
$ egrep 'SUCCES|FAIL' kernels.c.*parloops
   SUCCESS: may be parallelized
   SUCCESS: may be parallelized
   SUCCESS: may be parallelized
   SUCCESS: may be parallelized
...

Using the attached patch, I manage the same with parloops placed after 
pass_build_ealias, with some additional passes in between:

...
  NEXT_PASS (pass_build_ealias);
  NEXT_PASS (pass_ch);
  NEXT_PASS (pass_ccp);
  NEXT_PASS (pass_lim_aux);
  NEXT_PASS (pass_parallelize_loops);
 ...

The pass_lim_aux in front is needed because otherwise the loads of pointers a, b 
and c stay in the loop and prevent parallelization.


The pass_ccp is to get rid of:
...
phi is i_5 = PHI <0(3)>
arg of phi to exit:   value 0 used outside loop
  checking if it a part of reduction pattern:
  FAILED: it is not a part of reduction.
...

The pass_tree_ch is to get rid of:
...
phi is sum_3 = PHI 
arg of phi to exit:   value sum_1 used outside loop
  checking if it a part of reduction pattern:
  FAILED: it is not a part of reduction.
...

The place after build_ealias is early enough to be before the lto-stream 
write/read. I don't see how we can do this earlier. Before ealias, there's no 
alias info, and one of the loops fails to be recognized as parallel. 
Furthermore, pass_ch, pass_ccp, pass_lim_aux and pass_parloops are written to 
work on cfg/ssa code, which we don't have at omp_low/omp_exp time.


We could insert a pass-group here that only deals with functions that have the 
kernels directive, and do the auto-par thing in a pass_oacc_kernels (which 
should share the majority of the infrastructure with the parloops pass):

...
  NEXT_PASS (pass_build_ealias);
  INSERT_PASSES_AFTER/WITHIN (passes_oacc_kernels)
 NEXT_PASS (pass_ch);
 NEXT_PASS (pass_ccp);
 NEXT_PASS (pass_lim_aux);
 NEXT_PASS (pass_oacc_par);
  POP_INSERT_PASSES ()
...

Any comments, ideas or suggestions?

Thanks,
- Tom

#include <stdio.h>
#include <stdlib.h>

#define N (1024 * 512)
#define N_REF 4293394432

unsigned int *__restrict a;
unsigned int *__restrict b;
unsigned int *__restrict c;

void
init_input (void)
{
  for (unsigned int i = 0; i < N; i++)
a[i] = i * 2;

  for (unsigned int i = 0; i < N; i++)
b[i] = i * 4;
}

void
check_output (void)
{
  unsigned int sum = 0;

  for (unsigned int i = 0; i < N; i++)
sum += c[i];

  printf ("sum: %u\n", sum);

  if (sum != N_REF)
abort ();
}

int
main (void)
{
  unsigned int i;

  a = malloc (N * sizeof (unsigned int));
  b = malloc (N * sizeof (unsigned int));
  c = malloc (N * sizeof (unsigned int));

  init_input ();

  for (int ii = 0; ii < N; ii++)
c[ii] = a[ii] + b[ii];

  check_output ();

  free (a);
  free (b);
  free (c);

  return 0;
}

diff --git a/gcc/passes.def b/gcc/passes.def
index f13df6c..b501d2f 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -72,6 +72,10 @@ along with GCC; see the file COPYING3.  If not see
 	  /* pass_build_ealias is a dummy pass that ensures that we
 	 execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
+	  NEXT_PASS (pass_ch);
+	  NEXT_PASS (pass_ccp);
+	  NEXT_PASS (pass_lim_aux);
+	  NEXT_PASS (pass_parallelize_loops);
 	  NEXT_PASS (pass_fre);
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
@@ -159,7 +163,6 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_tree_if

Re: [gomp4] openacc kernels directive support

2014-08-18 Thread Tom de Vries

On 06-08-14 17:10, Tom de Vries wrote:

The place after build_ealias is early enough to be before the lto-stream
write/read. I don't see how we can do this earlier. Before ealias, there's no
alias info, and one of the loops fails to be recognized as parallel.
Furthermore, pass_ch, pass_ccp, pass_lim_aux and pass_parloops are written to
work on cfg/ssa code, which we don't have at omp_low/omp_exp time.



Slight correction: we do have cfg at omp_exp time.


We could insert a pass-group here that only deals with functions that have the
kernels directive, and do the auto-par thing in a pass_oacc_kernels (which
should share the majority of the infrastructure with the parloops pass):
...
   NEXT_PASS (pass_build_ealias);
   INSERT_PASSES_AFTER/WITHIN (passes_oacc_kernels)
  NEXT_PASS (pass_ch);
  NEXT_PASS (pass_ccp);
  NEXT_PASS (pass_lim_aux);
  NEXT_PASS (pass_oacc_par);
   POP_INSERT_PASSES ()
...

Any comments, ideas or suggestions ?


I've experimented with implementing this on top of gomp-4_0-branch, and I ran 
into PR46032.


PR46032 is about vectorization failure on a function split off by omp 
parallelization. The vectorization fails due to aliasing constraints in the 
split off function, which are not present in the original code.


In the gomp-4_0-branch, the code marked by the openacc kernels directive is 
split off during omp_expand. The generated code has the same additional aliasing 
constraints, and in pass_oacc_par the parallelization fails.


PR46032 contains a tentative patch by Richard Biener, which applies cleanly 
on top of 4.6 (I haven't yet reached a level of understanding of 
tree-ssa-structalias.c to be able to resolve the conflict in 
intra_create_variable_infos when applying on 4.7). The tentative patch involves 
running ipa-pta, which is also a pass run after the point where we write out the 
lto stream. I'm not sure whether it makes sense to run the ipa-pta pass as part 
of the pass_oacc_kernels pass list.


I see three ways of continuing from here:
- take the tentative patch and make it work, including running ipa-pta during
  passes_oacc_kernels
- same, but try somehow to manage without running ipa-pta.
- try to postpone splitting of the function until the end of pass_oacc_par.

Some advice on how to continue from here would be *highly* appreciated. My hunch 
atm is to investigate the last option.


Thanks,
- Tom



non-reproducible g++.dg/ubsan/align-2.C -Os execution failure

2014-09-04 Thread Tom de Vries

Hi,

I ran into this non-reproducible failure while testing a non-bootstrap build on 
x86_64:

...
PASS: g++.dg/ubsan/align-2.C   -Os  (test for excess errors)
Setting LD_LIBRARY_PATH to 
.:/data/vries/test-fix-fuse-caller-save-s390/with/nobootstrap/build/x86_64-unknown-linux-gnu/./libstdc++-v3/src/.libs:/dat\

a/vries/test-fix-fuse-caller-save-s390/with/nobootstrap/build/x86_64-unknown-linux-gnu/./libstdc++-v3/src/.libs:/home/vries/gcc_versions/data/test-fi\
x-fuse-caller-save-s390/with/nobootstrap/build/gcc:/home/vries/gcc_versions/data/test-fix-fuse-caller-save-s390/with/nobootstrap/build/gcc/32:/data/v\
ries/test-fix-fuse-caller-save-s390/with/nobootstrap/build/x86_64-unknown-linux-gnu/./libsanitizer/ubsan/.libs:.:/data/vries/test-fix-fuse-caller-sav\
e-s390/with/nobootstrap/build/x86_64-unknown-linux-gnu/./libstdc++-v3/src/.libs:/data/vries/test-fix-fuse-caller-save-s390/with/nobootstrap/build/x86\_64-unknown-linux-gnu/./libstdc++-v3/src/.libs:/home/vries/gcc_versions/data/test-fix-fuse-caller-save-s390/with/nobootstrap/build/gcc:/home/vries/gc\
c_versions/data/test-fix-fuse-caller-save-s390/with/nobootstrap/build/gcc/32:/data/vries/test-fix-fuse-caller-save-s390/with/nobootstrap/build/x86_64\
-unknown-linux-gnu/./libsanitizer/ubsan/.libs:/home/vries/gcc_versions/infra/lib
spawn [open ...]^M
/home/vries/gcc_versions/data/test-fix-fuse-caller-save-s390/with/src/gcc/testsuite/g++.dg/ubsan/align-2.C:16:13: 
runtime error: reference binding to misaligned address 0x00600fe9 for type 
'int', which requires 4 byte alignment

0x00600fe9: note: pointer points here
 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 
 00 00
FAIL: g++.dg/ubsan/align-2.C   -Os  execution test

...

The sources used were r214879 + a trivial patch ( 
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00253.html ) which should not make 
a difference on x86_64.


Configure line:
...
Configured with: 
/home/vries/gcc_versions/data/test-fix-fuse-caller-save-s390/with/src/configure 
--prefix=/home/vries/gcc_versions/data/test-fix-fuse-caller-save-s390/with/nobootstrap/install 
--with-cloog=/home/vries/gcc_versions/infra 
--with-ppl=/home/vries/gcc_versions/infra 
--with-gmp=/home/vries/gcc_versions/infra 
--with-mpfr=/home/vries/gcc_versions/infra 
--with-mpc=/home/vries/gcc_versions/infra --disable-bootstrap 
--enable-checking=yes,rtl --enable-languages=c,fortran,ada,java,objc,c++

...

I'm posting it here for reference.

Thanks,
- Tom


Re: [gomp4] openacc kernels directive support

2014-09-09 Thread Tom de Vries

On 18-08-14 14:16, Tom de Vries wrote:

On 06-08-14 17:10, Tom de Vries wrote:

We could insert a pass-group here that only deals with functions that have the
kernels directive, and do the auto-par thing in a pass_oacc_kernels (which
should share the majority of the infrastructure with the parloops pass):
...
   NEXT_PASS (pass_build_ealias);
   INSERT_PASSES_AFTER/WITHIN (passes_oacc_kernels)
  NEXT_PASS (pass_ch);
  NEXT_PASS (pass_ccp);
  NEXT_PASS (pass_lim_aux);
  NEXT_PASS (pass_oacc_par);
   POP_INSERT_PASSES ()
...

Any comments, ideas or suggestions ?


I've experimented with implementing this on top of gomp-4_0-branch, and I ran
into PR46032.

PR46032 is about vectorization failure on a function split off by omp
parallelization. The vectorization fails due to aliasing constraints in the
split off function, which are not present in the original code.

In the gomp-4_0-branch, the code marked by the openacc kernels directive is
split off during omp_expand. The generated code has the same additional aliasing
constraints, and in pass_oacc_par the parallelization fails.

The PR46032 contains a tentative patch by Richard Biener, which applies cleanly
on top of 4.6 (I haven't yet reached a level of understanding of
tree-ssa-structalias.c to be able to resolve the conflict in
intra_create_variable_infos when applying on 4.7). The tentative patch involves
running ipa-pta, which is also a pass run after the point where we write out the
lto stream. I'm not sure whether it makes sense to run the pta-ipa pass as part
of the pass_oacc_kernels pass list.

I see three ways of continuing from here:
- take the tentative patch and make it work, including running pta-ipa during
   passes_oacc_kernels
- same, but try somehow to manage without running pta-ipa.
- try to postpone splitting of the function until the end of pass_oacc_par.

Some advice on how to continue from here would be *highly* appreciated. My hunch
atm is to investigate the last option.



Jakub,
Richard,

I've investigated the last option, and published the current state in git-only 
branch vries/oacc-kernels ( 
https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/vries/oacc-kernels ).


The current state at commit 9255cadc5b6f8f7f4e4506e65a6be7fb3c00cd35 is that:
- a simple loop marked with the oacc kernels directive is analyzed for
   parallelization,
- the loop is then rewritten using oacc parallel and oacc loop directives
- these oacc directives are expanded using omp_expand_local
- this results in the loop being split off into a separate function, while
   the loop is replaced with a GOACC_parallel call
- all this is done before writing out the lto stream
- no support yet for reductions, nested loops, more than one loop nest in
  kernels region

At toplevel, the added pass list looks like this:
...
  NEXT_PASS (pass_build_ealias);
  /* Pass group that runs when there are oacc kernels in the
 function.  */
  NEXT_PASS (pass_oacc_kernels);
  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
  NEXT_PASS (pass_ch_oacc_kernels);
  NEXT_PASS (pass_tree_loop_init);
  NEXT_PASS (pass_lim);
  NEXT_PASS (pass_ccp);
  NEXT_PASS (pass_parallelize_loops_oacc_kernels);
  NEXT_PASS (pass_tree_loop_done);
  POP_INSERT_PASSES ()
 ...

The main question I'm currently facing is the following: when to do lowering (in 
other words, rewriting of variable accesses in terms of .omp_data) of the kernels 
region.  There are basically 2 passes that contain code to do this:

- pass_lower_omp (on pre-ssa code)
- pass_parallelize_loops (on ssa code)

Atm I'm using pass_lower_omp, and I've added a patch that handles omp-lowered 
code conservatively in ccp and forwprop in order for the lowering to remain 
until arriving at pass_parallelize_loops_oacc_kernels.


But it might turn out to be easier/necessary to handle this in 
pass_parallelize_loops_oacc_kernels instead.


Any advice on this issue, and on the current implementation is welcome.

Thanks,
- Tom



Re: [gomp4] openacc kernels directive support

2014-09-16 Thread Tom de Vries

On 09-09-14 12:56, Richard Biener wrote:

On Tue, 9 Sep 2014, Tom de Vries wrote:


On 18-08-14 14:16, Tom de Vries wrote:

On 06-08-14 17:10, Tom de Vries wrote:

We could insert a pass-group here that only deals with functions that have
the
kernels directive, and do the auto-par thing in a pass_oacc_kernels (which
should share the majority of the infrastructure with the parloops pass):
...
NEXT_PASS (pass_build_ealias);
INSERT_PASSES_AFTER/WITHIN (passes_oacc_kernels)
   NEXT_PASS (pass_ch);
   NEXT_PASS (pass_ccp);
   NEXT_PASS (pass_lim_aux);
   NEXT_PASS (pass_oacc_par);
POP_INSERT_PASSES ()
...

Any comments, ideas or suggestions ?


I've experimented with implementing this on top of gomp-4_0-branch, and I
ran
into PR46032.

PR46032 is about vectorization failure on a function split off by omp
parallelization. The vectorization fails due to aliasing constraints in the
split off function, which are not present in the original code.


Heh.  At least the omp-low.c parts from comment #1 should be pushed
to trunk...



Hi Richard,

Right, but the intra_create_variable_infos part does not apply cleanly, and I 
don't know yet how to resolve that.



In the gomp-4_0-branch, the code marked by the openacc kernels directive is
split off during omp_expand. The generated code has the same additional
aliasing
constraints, and in pass_oacc_par the parallelization fails.

The PR46032 contains a tentative patch by Richard Biener, which applies
cleanly
on top of 4.6 (I haven't yet reached a level of understanding of
tree-ssa-structalias.c to be able to resolve the conflict in
intra_create_variable_infos when applying on 4.7). The tentative patch
involves
running ipa-pta, which is also a pass run after the point where we write out
the
lto stream. I'm not sure whether it makes sense to run the pta-ipa pass as
part
of the pass_oacc_kernels pass list.


No, that's not even possible I think.



OK, thanks for confirming that.


I see three ways of continuing from here:
- take the tentative patch and make it work, including running pta-ipa
during
passes_oacc_kernels
- same, but try somehow to manage without running pta-ipa.
- try to postpone splitting of the function until the end of pass_oacc_par.


I don't understand the last option?  What is the actual issue you run
into?  You split oacc kernels off and _then_ run "autopar" on the
split-off function (and get additional kernels)?



Let me try to reiterate the problem in more detail.

We're trying to implement the auto-parallelization part of the oacc kernels 
directive using the existing parloops pass. The source starting point is the 
gomp-4_0-branch.  The gomp-4_0-branch has a dummy implementation of the oacc 
kernels directive, analogous to the oacc parallel directive.


So the current gomp-4_0-branch does the following steps for oacc 
parallel/kernels directives:

1. pass_lower_omp/scan_omp:
   - create record type with rewrite vars (.omp_data_t).
   - declare function with arg with type pointer to .omp_data_t.
2. pass_lower_omp/lower_omp:
   - rewrite region in terms of rewrite vars
   - add omp_return at end
3. pass_expand_omp:
   - split off the region into a separate function
   - replace region with call to GOACC_parallel/GOACC_kernels, with function
 pointer as argument

I wrote an example with a single oacc kernels region containing a simple vector 
addition loop, and tried to make auto-parallelization work.


The first problem I ran into was that the parloops pass failed to analyze the 
dependencies in a vector addition example, due to the fact that the region was 
already split off into a separate function, similar to PR46032.


I looked briefly into the patch set in PR46032, but I realized that even if I 
fix it, the next problem I run into will be that the parloops pass is run after 
the lto stream read/write point. So any changes the parloops pass makes at that 
point are in the accelerator compile flow, in other words we're talking about 
launching an accelerator kernel from the accelerator. While that is possible 
with recent cuda accelerators, I guess in general we should not expect that to 
be possible.
[ I also thought of a fancy scheme where we don't split off a new function, but 
manipulate the body of the already split off function, and emit a c file from 
the accelerator compiler containing the parameters that the host compiler should 
use to launch the accelerator kernel... but I guess that would be a last resort. ]


So in order to solve the lto stream read/write point problem, I moved the 
parloops pass (well, a copy called pass_oacc_par or similar) up in the pass 
list, to before lto stream read/write point. That precludes solving the alias 
problem with the PR46032 patch set, since we need ipa for that.


I solved (well, rather prevented) the alias problem by disabling pass_omp_expand

Re: [gomp4] openacc kernels directive support

2014-09-30 Thread Tom de Vries

On 22-09-14 10:28, Richard Biener wrote:

On Tue, 16 Sep 2014, Tom de Vries wrote:


On 09-09-14 12:56, Richard Biener wrote:

On Tue, 9 Sep 2014, Tom de Vries wrote:


On 18-08-14 14:16, Tom de Vries wrote:

On 06-08-14 17:10, Tom de Vries wrote:

We could insert a pass-group here that only deals with functions that
have
the
kernels directive, and do the auto-par thing in a pass_oacc_kernels
(which
should share the majority of the infrastructure with the parloops
pass):
...
 NEXT_PASS (pass_build_ealias);
 INSERT_PASSES_AFTER/WITHIN (passes_oacc_kernels)
NEXT_PASS (pass_ch);
NEXT_PASS (pass_ccp);
NEXT_PASS (pass_lim_aux);
NEXT_PASS (pass_oacc_par);
 POP_INSERT_PASSES ()
...

Any comments, ideas or suggestions ?


I've experimented with implementing this on top of gomp-4_0-branch, and
I
ran
into PR46032.

PR46032 is about vectorization failure on a function split off by omp
parallelization. The vectorization fails due to aliasing constraints in
the
split off function, which are not present in the original code.


Heh.  At least the omp-low.c parts from comment #1 should be pushed
to trunk...



Hi Richard,

Right, but the intra_create_variable_infos part does not apply cleanly, and I
don't know yet how to resolve that.


In the gomp-4_0-branch, the code marked by the openacc kernels directive
is
split off during omp_expand. The generated code has the same additional
aliasing
constraints, and in pass_oacc_par the parallelization fails.

The PR46032 contains a tentative patch by Richard Biener, which applies
cleanly
on top of 4.6 (I haven't yet reached a level of understanding of
tree-ssa-structalias.c to be able to resolve the conflict in
intra_create_variable_infos when applying on 4.7). The tentative patch
involves
running ipa-pta, which is also a pass run after the point where we write
out
the
lto stream. I'm not sure whether it makes sense to run the pta-ipa pass
as
part
of the pass_oacc_kernels pass list.


No, that's not even possible I think.



OK, thanks for confirming that.


I see three ways of continuing from here:
- take the tentative patch and make it work, including running pta-ipa
during
 passes_oacc_kernels
- same, but try somehow to manage without running pta-ipa.
- try to postpone splitting of the function until the end of
pass_oacc_par.


I don't understand the last option?  What is the actual issue you run
into?  You split oacc kernels off and _then_ run "autopar" on the
split-off function (and get additional kernels)?



Let me try to reiterate the problem in more detail.

We're trying to implement the auto-parallelization part of the oacc kernels
directive using the existing parloops pass. The source starting point is the
gomp-4_0-branch.  The gomp-4_0-branch has a dummy implementation of the oacc
kernels directive, analogous to the oacc parallel directive.

So the current gomp-4_0-branch does the following steps for oacc
parallel/kernels directives:
1. pass_lower_omp/scan_omp:
- create record type with rewrite vars (.omp_data_t).
- declare function with arg with type pointer to .omp_data_t.
2. pass_lower_omp/lower_omp:
- rewrite region in terms of rewrite vars
- add omp_return at end
3. pass_expand_omp:
- split off the region into a separate function
- replace region with call to GOACC_parallel/GOACC_kernels, with function
  pointer as argument

I wrote an example with a single oacc kernels region containing a simple
vector addition loop, and tried to make auto-parallelization work.


Ah, so the "target" OACC directive tells it to vectorize only, not to
parallelize?


Hi Richard,

I'm trying to make auto-parallelization work, not vectorization.


And we split off the kernel only because we have to
ship it to the accelerator.


The first problem I ran into was that the parloops pass failed to analyze the
dependencies in an vector addition example, due to the fact that the region
was already split off into a separate function, similar to PR46032.

I looked briefly into the patches set in PR46032, but I realized that even if
I fix it, the next problem I run into will be that the parloops pass is run
after the lto stream read/write point. So any changes the parloops pass makes
at that point are in the accelerator compile flow, in other words we're
talking about launching an accelerator kernel from the accelerator. While that
is possible with recent cuda accelerators, I guess in general we should not
expect that to be possible.


HSA also supports that btw.



OK, good to know.


[ I also thought of a fancy scheme where we don't split off a new function,
but manipulate the body of the already split off function, and emit a c file
from the accelerator compiler containing the parameters that the host compiler
should use to launch the accelerator kernel... but I guess that would be a
last r

oacc kernels directive -- reductions

2014-10-14 Thread Tom de Vries

Hi,

in this email I'm trying to explain in detail what problem I'm running into with 
reductions in an oacc kernels region, and how I think it could be solved.


Any advice is welcome.


OVERALL PROBLEM

The overall problem I'm trying to solve is to implement the oacc kernels 
directive in gcc, reusing pass_parallelize_loops.



OACC KERNELS

The oacc kernels region is a region with a series of loop nests, which are 
intended to run on the accelerator. The compiler needs to offload each loop nest 
to the accelerator, in the way most optimal for the accelerator.



PASS_PARALLELIZE_LOOPS

The pass analyzes loops. If the loop iterations are independent, and it looks 
beneficial to parallelize the loop, the loop is transformed.


A copy of the loop is made that deals with:
- small iteration counts, for which the overhead of starting several threads
  would be too big, or
- the fixup iterations that are left over in case the number of iterations is
  not divisible by the parallelization factor.

The original loop is transformed:
- References to local variables are replaced with dereferences of new
  variables, which are initialized at loop entry with the addresses of the
  original variables (eliminate_local_variables)
- Loop-non-local variables are copied to a structure, and references are
  replaced with loads from a pointer to another (similar) structure
  (separate_decls_in_region; see the sketch after this list)
- The loop is replaced with a GIMPLE_OMP_FOR (with an empty body) and
  GIMPLE_OMP_CONTINUE
- The loop region is enveloped with GIMPLE_OMP_PARALLEL and GIMPLE_OMP_RETURN
- The loop region is omp-expanded using omp_expand_local
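
Here's the sketch referred to above: a hedged, hand-written model of the 
separate_decls_in_region step (illustrative names, not the exact GCC output), 
for the reduction loop from kernels.c:

...
/* sum is loop-non-local, so it travels in and out of the split-off
   region through a structure.  */
struct clsn_data { unsigned int sum; unsigned int *c; int n; };

static void
reduction_loopfn (struct clsn_data *d)
{
  unsigned int sum = d->sum;    /* load from the structure on entry */
  for (int i = 0; i < d->n; i++)
    sum += d->c[i];
  d->sum = sum;                 /* store back on exit */
}
...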


STATUS

I've created an initial implementation in vries/oacc-kernels, on top of the 
gomp-4_0-branch.



GOMP-4_0-BRANCH

In the gomp-4_0-branch, the kernels directive is translated as a copy of the 
oacc parallel directive.  So, the following stages are done:

- pass_lower_omp/scan_omp:
  - scan directive body for variables.
  - build up omp_context datastructures.
  - declare struct with fields corresponding to scanned variables.
  - declare function with pointer to struct
- pass_lower_omp/lower_omp:
  - declare struct
  - assign values to struct fields
  - declare pointer to struct
  - rewrite body in terms of struct fields using pointer to struct.
- omp_expand:
  - build up omp_region data-structures
  - split off region in separate function
  - replace region with call to oacc runtime function while passing function
pointer to split off function


VRIES/OACC-KERNELS

The current mechanism of offloading (compiling a function for a different 
architecture) uses lto streaming. The parloops pass is located after the 
lto-streaming point, which is too late. OTOH, the parloops pass needs alias info, 
which is only available after pass_build_ealias. So a copy of the parloops pass 
specialized for oacc kernels has been added after pass_build_ealias (plus a 
couple of passes to compensate for moving the pass up in the pass list).


The new pass does not use the lowering (first 2 steps of loop transform) of 
parloops. The lowering is already done by pass_omp_lower.


The omp-expansion of the oacc-kernels region (done in gomp-4_0-branch) is 
skipped, to allow first the alias analysis to work on the scope of the intact 
function, and the new pass to do the omp-expansion.


So, the new pass:
- analyses the loop for dependences
- if independent, transforms the loop:
  - The loop is replaced with a GIMPLE_OMP_FOR (kind_oacc_loop, with an empty
body) and GIMPLE_OMP_CONTINUE
  - The GIMPLE_OACC_KERNELS is replaced with GIMPLE_OACC_PARALLEL
  - the loop region is omp-expanded using omp_expand_local

The gotchas of the implementation are:
- no support for reductions, nested loops, more than one loop nest in
  kernels region
- the fixup/low-iteration-count loop copy is still generated _inside_ the
  split off function


PROBLEM WITH REDUCTIONS

In the vries/oacc-kernels implementation, the lowering of oacc kernels (in 
pass_lower_omp) is done before any loop analysis. For reductions, that's not 
possible anymore, since that would mean that detection of reductions comes after 
handling of reductions.


The problem we're running into here is that:
- on one hand, the oacc lowering is done on high gimple (scopes still intact
  because GIMPLE_BINDs are still present, no bbs and cfgs, eh not expanded, no
  ssa),
- otoh, loop analysis is done on low ssa gimple (bbs, cfgs, ssa, no scopes, eh
  expanded)

The parloops pass is confronted with a similar problem.

AFAIU, ideal pass reuse for parloops would go something like this: on ssa, you 
do loop analysis. You then insert omp pragmas that indicate what transformations 
you want. Then you go back from ssa gimple to high gimple representation, and 
you run omp-lower and omp-expand to do the actual transformations.


Things have been solved like this in parloops: the lowering of omp-lower is not 
reused in parloops, but instead  a different (but similar) lowering has been 
added. Wha

Re: [PATCH] gcc parallel make check

2014-11-25 Thread Tom de Vries

On 15-09-14 18:05, Jakub Jelinek wrote:

libstdc++-v3/
* testsuite/Makefile.am (check_p_numbers0, check_p_numbers1,
check_p_numbers2, check_p_numbers3, check_p_numbers4,
check_p_numbers5, check_p_numbers6, check_p_numbers,
check_p_subdirs): New variables.
(check_DEJAGNU_normal_targets): Use check_p_subdirs.
(check-DEJAGNU): Rewritten so that for parallelized
testing each job runs all the *.exp files, with
GCC_RUNTEST_PARALLELIZE_DIR set in environment.
* testsuite/Makefile.in: Regenerated.
* testsuite/lib/libstdc++.exp (gcc_parallel_test_run_p,
gcc_parallel_test_enable): New procedures.  If
GCC_RUNTEST_PARALLELIZE_DIR is set in environment, override
runtest_file_p to invoke also gcc_parallel_test_run_p.
* testsuite/libstdc++-abi/abi.exp: Run all the tests serially
by the first parallel runtest encountering it.  Fix up path
of the extract_symvers script.
* testsuite/libstdc++-xmethods/xmethods.exp: Run all the tests
serially by the first parallel runtest encountering it.  Run
dg-finish even in case of error.


When comparing test results of patch builds with test results of reference 
builds, the only differences I'm seeing are random differences in the number of 
'UNSUPPORTED: prettyprinters.exp' results.


This patch fixes that by ensuring that we print that unsupported message only 
once.

The resulting test result comparison diff is:
...
--- without/FAIL  2014-11-24 17:46:32.202673282 +0100
+++ with/FAIL 2014-11-25 13:45:15.636131571 +0100
 libstdc++-v3/testsuite/libstdc++.sum:UNSUPPORTED: prettyprinters.exp
-libstdc++-v3/testsuite/libstdc++.sum:UNSUPPORTED: prettyprinters.exp
-libstdc++-v3/testsuite/libstdc++.sum:UNSUPPORTED: prettyprinters.exp
-libstdc++-v3/testsuite/libstdc++.sum:UNSUPPORTED: prettyprinters.exp
-libstdc++-v3/testsuite/libstdc++.sum:UNSUPPORTED: prettyprinters.exp
 libstdc++-v3/testsuite/libstdc++.sum:UNSUPPORTED: xmethods.exp
...

Furthermore, the patch adds a dg-finish in case the prettyprinters.exp file is 
unsupported, which AFAIU is also required in that case.


Bootstrapped and reg-tested on x86_64.

OK for trunk/stage3?

Thanks,
- Tom


2014-11-25  Tom de Vries  

	* testsuite/libstdc++-prettyprinters/prettyprinters.exp: Add missing
	dg-finish.  Only print unsupported message once.
---
 libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp b/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp
index a57660f..e5be5b5 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp
@@ -30,7 +30,14 @@ if ![info exists ::env(GUALITY_GDB_NAME)] {
 }
 
 if {! [gdb_version_check]} {
+dg-finish
+# Only print unsupported message in one instance.
+if ![gcc_parallel_test_run_p prettyprinters] {
+	return
+}
+gcc_parallel_test_enable 0
 unsupported "prettyprinters.exp"
+gcc_parallel_test_enable 1
 return
 }
 
-- 
1.9.1



fn spec attribute on builtin function in fortran

2014-12-01 Thread Tom de Vries

Hi,

I've been adding an fn spec function attribute to some openacc builtin 
functions:
...
diff --git a/gcc/builtin-attrs.def b/gcc/builtin-attrs.def
index 9c05a94..4e34192 100644
--- a/gcc/builtin-attrs.def
+++ b/gcc/builtin-attrs.def
@@ -64,6 +64,7 @@ DEF_ATTR_FOR_INT (6)
   DEF_ATTR_TREE_LIST (ATTR_LIST_##ENUM, ATTR_NULL, \
  ATTR_##ENUM, ATTR_NULL)
 DEF_ATTR_FOR_STRING (STR1, "1")
+DEF_ATTR_FOR_STRING (DOT_DOT_DOT_r_r_r, "...rrr")
 #undef DEF_ATTR_FOR_STRING

 /* Construct a tree for a list of two integers.  */
@@ -127,6 +128,8 @@ DEF_ATTR_TREE_LIST (ATTR_PURE_NOTHROW_LIST, ATTR_PURE,\
ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_PURE_NOTHROW_LEAF_LIST, ATTR_PURE,\
ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_FNSPEC_DOT_DOT_DOT_NOCLOB_NOCLOB_NOCLOB_NOTHROW_LIST,\
+   ATTR_FNSPEC, ATTR_LIST_DOT_DOT_DOT_r_r_r, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LIST, ATTR_NORETURN, \
ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\
...

That worked well for C. When compiling the Fortran compiler, I ran into this 
error:
...
In file included from gcc/fortran/f95-lang.c:1194:0:
gcc/fortran/../oacc-builtins.def: In function 'void 
gfc_init_builtin_functions()':
gcc/fortran/../oacc-builtins.def:32:1: error: 
'ATTR_FNSPEC_DOT_DOT_DOT_NOCLOB_NOCLOB_NOCLOB_NOTHROW_LIST' was not declared in 
this scope

make[2]: *** [fortran/f95-lang.o] Error 1
...

In Fortran, attributes are modelled as integers:
...
/* So far we need just these 7 attribute types.  */
#define ATTR_NULL   0
#define ATTR_LEAF_LIST  (ECF_LEAF)
#define ATTR_NOTHROW_LEAF_LIST  (ECF_NOTHROW | ECF_LEAF)
#define ATTR_NOTHROW_LEAF_MALLOC_LIST   (ECF_NOTHROW | ECF_LEAF | ECF_MALLOC)
#define ATTR_CONST_NOTHROW_LEAF_LIST(ECF_NOTHROW | ECF_LEAF | ECF_CONST)
#define ATTR_PURE_NOTHROW_LEAF_LIST (ECF_NOTHROW | ECF_LEAF | ECF_PURE)
#define ATTR_NOTHROW_LIST   (ECF_NOTHROW)
#define ATTR_CONST_NOTHROW_LIST (ECF_NOTHROW | ECF_CONST)
...

And the attribute ints are passed to gfc_define_builtin:
...
static void
gfc_define_builtin (const char *name, tree type, enum built_in_function code,
		    const char *library_name, int attr)
{
  tree decl;

  decl = add_builtin_function (name, type, code, BUILT_IN_NORMAL,
			       library_name, NULL_TREE);
  set_call_expr_flags (decl, attr);

  set_builtin_decl (code, decl, true);
}
...

which passes it to set_call_expr_flags:
...
void
set_call_expr_flags (tree decl, int flags)
{
  if (flags & ECF_NOTHROW)
    TREE_NOTHROW (decl) = 1;
  if (flags & ECF_CONST)
    TREE_READONLY (decl) = 1;
  if (flags & ECF_PURE)
    DECL_PURE_P (decl) = 1;
  if (flags & ECF_LOOPING_CONST_OR_PURE)
    DECL_LOOPING_CONST_OR_PURE_P (decl) = 1;
  if (flags & ECF_NOVOPS)
    DECL_IS_NOVOPS (decl) = 1;
  if (flags & ECF_NORETURN)
    TREE_THIS_VOLATILE (decl) = 1;
  if (flags & ECF_MALLOC)
    DECL_IS_MALLOC (decl) = 1;
  if (flags & ECF_RETURNS_TWICE)
    DECL_IS_RETURNS_TWICE (decl) = 1;
  if (flags & ECF_LEAF)
    DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("leaf"),
					NULL, DECL_ATTRIBUTES (decl));
  if ((flags & ECF_TM_PURE) && flag_tm)
    apply_tm_attr (decl, get_identifier ("transaction_pure"));
  /* Looping const or pure is implied by noreturn.
     There is currently no way to declare looping const or looping pure
     alone.  */
  gcc_assert (!(flags & ECF_LOOPING_CONST_OR_PURE)
	      || ((flags & ECF_NORETURN) && (flags & (ECF_CONST | ECF_PURE))));
}
...

Note that in case of ECF_LEAF, we set an actual attribute.

So, is this the moment to define ECF_FNSPEC_DOT_DOT_DOT_r_r_r in tree-core.h 
and handle it in set_call_expr_flags, or should we add more generic attribute 
handling in f95-lang.c?


Thanks,
- Tom


Re: fn spec attribute on builtin function in fortran

2014-12-01 Thread Tom de Vries

On 01-12-14 09:43, Jakub Jelinek wrote:

On Mon, Dec 01, 2014 at 09:35:25AM +0100, Tom de Vries wrote:

I've been adding an fn spec function attribute to some openacc builtin 
functions:
...
diff --git a/gcc/builtin-attrs.def b/gcc/builtin-attrs.def
index 9c05a94..4e34192 100644
--- a/gcc/builtin-attrs.def
+++ b/gcc/builtin-attrs.def
@@ -64,6 +64,7 @@ DEF_ATTR_FOR_INT (6)
DEF_ATTR_TREE_LIST (ATTR_LIST_##ENUM, ATTR_NULL, \
  ATTR_##ENUM, ATTR_NULL)
  DEF_ATTR_FOR_STRING (STR1, "1")
+DEF_ATTR_FOR_STRING (DOT_DOT_DOT_r_r_r, "...rrr")
  #undef DEF_ATTR_FOR_STRING

  /* Construct a tree for a list of two integers.  */
@@ -127,6 +128,8 @@ DEF_ATTR_TREE_LIST (ATTR_PURE_NOTHROW_LIST, ATTR_PURE,\
ATTR_NULL, ATTR_NOTHROW_LIST)
  DEF_ATTR_TREE_LIST (ATTR_PURE_NOTHROW_LEAF_LIST, ATTR_PURE,\
ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_FNSPEC_DOT_DOT_DOT_NOCLOB_NOCLOB_NOCLOB_NOTHROW_LIST,\
+   ATTR_FNSPEC, ATTR_LIST_DOT_DOT_DOT_r_r_r, ATTR_NOTHROW_LIST)
  DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LIST, ATTR_NORETURN, \
ATTR_NULL, ATTR_NOTHROW_LIST)
  DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\
...

That worked well for C. When compiling the Fortran compiler, I ran into this 
error:
...
In file included from gcc/fortran/f95-lang.c:1194:0:
gcc/fortran/../oacc-builtins.def: In function 'void 
gfc_init_builtin_functions()':
gcc/fortran/../oacc-builtins.def:32:1: error:
'ATTR_FNSPEC_DOT_DOT_DOT_NOCLOB_NOCLOB_NOCLOB_NOTHROW_LIST' was not declared
in this scope
make[2]: *** [fortran/f95-lang.o] Error 1


Fortran FE uses gfc_build_library_function_decl_with_spec to build these.



Hi Jakub,

Thanks for the pointer, that's useful. That's for library functions though, I 
need a builtin.


I'm now trying the approach where I specify the attributes in two formats:
...
DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC_DATA_START, "GOACC_data_start",
 BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR,
 ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST,
 ATTR_NOTHROW_LIST,
 "...rrr")
...

In gcc/builtins.def, we use the first format (ATTRS):
...
#undef DEF_GOACC_BUILTIN_FNSPEC
#define DEF_GOACC_BUILTIN_FNSPEC(ENUM, NAME, TYPE, ATTRS, ATTRS2, FNSPEC) \
  DEF_GOACC_BUILTIN(ENUM, NAME, TYPE, ATTRS)
...

And in gcc/fortran/f95-lang.c, we use the second format (ATTRS2, FNSPEC) and a 
new function gfc_define_builtin_with_spec:

...
#undef DEF_GOACC_BUILTIN_FNSPEC
#define DEF_GOACC_BUILTIN_FNSPEC(code, name, type, attr, attr2, fnspec) \
  gfc_define_builtin_with_spec ("__builtin_" name, builtin_types[type], \
   code, name, attr2, fnspec);
...

Where gfc_define_builtin_with_spec borrows from 
gfc_build_library_function_decl_with_spec:

...
+static void
+gfc_define_builtin_with_spec (const char *name, tree fntype,
+			      enum built_in_function code,
+			      const char *library_name, int attr,
+			      const char *fnspec)
+{
+  if (fnspec)
+    {
+      tree attr_args = build_tree_list (NULL_TREE,
+					build_string (strlen (fnspec), fnspec));
+      tree attrs = tree_cons (get_identifier ("fn spec"),
+			      attr_args, TYPE_ATTRIBUTES (fntype));
+      fntype = build_type_attribute_variant (fntype, attrs);
+    }
+
+  gfc_define_builtin (name, fntype, code, library_name, attr);
+}
...

Thanks,
- Tom



Unconfirmed boehm-gc test failure

2015-01-13 Thread Tom de Vries

Hi Kai,

I encountered a test failure in boehm-gc ( 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64042 'FAIL: boehm-gc.c/gctest.c 
-O2 execution test' ).


I would like to ask somebody to confirm the PR, which hopefully should be as 
simple as patching a .exp to run a single test repeatedly (see comment 5), 
and running the boehm-gc test suite.


But I'm not sure who to ask, since there's no maintainer listed in MAINTAINERS. 
Any idea?


Thanks,
- Tom


pass_stdarg problem when run after pass_lim

2015-01-29 Thread Tom de Vries

Jakub,

consider the attached patch, which adds pass_lim after fre1 (a simplification of my 
oacc kernels patch series).


The included testcase lim-before-stdarg.c fails.

The first sign of trouble is in lim-before-stdarg.c.088t.stdarg (attached):
...
gen_rtvec: va_list escapes 0, needs to save 0 GPR units and 0 FPR units.
...

Because of the 'needs to save 0 GPR units', at expand time no prologue is 
generated to dump the varargs passed in registers onto the stack.


However, the varargs are still read from the stack and are therefore undefined, 
which valgrind observes:

...
==6254== Conditional jump or move depends on uninitialised value(s)
==6254==at 0x4005AB: gen_rtvec (in a.out)
==6254==by 0x400411: main (in a.out)
...
and as a result the test executable aborts.

AFAIU, stdarg recognizes a va_arg item by looking for 'ap[0].field' references 
(in our example, p.gp_offset) of the form 'ap[0].field = temp' and 'temp = 
ap[0].field'.


With -fno-tree-loop-im, we find both read and write references in the loop:
...
  :
  # i_28 = PHI 
  _12 = p.gp_offset;<<<
  if (_12 > 47)
goto ;
  else
goto ;

  :
  _13 = p.reg_save_area;
  _14 = (sizetype) _12;
  addr.0_15 = _13 + _14;
  _16 = _12 + 8;
  p.gp_offset = _16;<<<
  goto ;

  :
  _18 = p.overflow_arg_area;
  _19 = _18 + 8;
  p.overflow_arg_area = _19;

  :
  # addr.0_3 = PHI 
  _21 = MEM[(void * * {ref-all})addr.0_3];
  rt_val_11->elem[i_28] = _21;
  i_23 = i_28 + 1;
  if (n_9(D) > i_23)
goto ;
  else
goto ;
...

But with -ftree-loop-im, that's no longer the case. We just find one reference, 
before the loop, a read:

...
  :
  __builtin_va_start (&p, 0);
  if (n_8(D) == 0)
goto ;
  else
goto ;

  :
  __builtin_va_end (&p);
  goto ;

  :
  rt_val_12 = rtvec_alloc (n_8(D));
  p_gp_offset_lsm.4_31 = p.gp_offset;   <<<
  _15 = p.reg_save_area;
  p_overflow_arg_area_lsm.6_33 = p.overflow_arg_area;
  if (n_8(D) > 0)
goto ;
  else
goto ;
...

pass_stdarg recognizes the reference as a read in va_list_counter_struct_op, and 
calls va_list_counter_op. But since it's a read that is only executed once, 
there's no effect on cfun->va_list_gpr_size:

...
va_list_counter_op (si=0x7fffd7f0, ap=0x76963540, var=0x7696b948, 
gpr_p=true, write_p=false)

at src/gcc/tree-stdarg.c:323
323   if (si->compute_sizes < 0)
(gdb) n
325   si->compute_sizes = 0;
(gdb)
326   if (si->va_start_count == 1
(gdb)
327   && reachable_at_most_once (si->bb, si->va_start_bb))
(gdb)
326   if (si->va_start_count == 1
(gdb)
328 si->compute_sizes = 1;
(gdb)
330   if (dump_file && (dump_flags & TDF_DETAILS))
(gdb)
339   && (increment = va_list_counter_bump (si, ap, var, gpr_p)) + 1 > 
1)
(gdb)
337   if (write_p
(gdb)
354   if (write_p || !si->compute_sizes)
(gdb)
361 }
...

Do I understand correctly that the assumptions of pass_stdarg are that:
- the reads and writes occur in pairs (I'm guessing that because the read above
  seems to be ignored; also PR41089 seems to hint at this), and
- the related memref occurs at the same loop nesting level as the pair?

Any advice on how to fix this, or work around it?

Thanks,
- Tom
Run pass_lim after fre1

---
 gcc/passes.def   |  3 ++
 gcc/testsuite/gcc.dg/lim-before-stdarg.c | 67 
 gcc/tree-ssa-loop.c  |  2 +
 3 files changed, 72 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/lim-before-stdarg.c

diff --git a/gcc/passes.def b/gcc/passes.def
index 2bc5dcd..03d749e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -86,6 +86,9 @@ along with GCC; see the file COPYING3.  If not see
 	 execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
 	  NEXT_PASS (pass_fre);
+	  NEXT_PASS (pass_tree_loop_init);
+	  NEXT_PASS (pass_lim);
+	  NEXT_PASS (pass_tree_loop_done);
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
 	  NEXT_PASS (pass_early_ipa_sra);
diff --git a/gcc/testsuite/gcc.dg/lim-before-stdarg.c b/gcc/testsuite/gcc.dg/lim-before-stdarg.c
new file mode 100644
index 0000000..c7a6f03
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lim-before-stdarg.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+#include <stdarg.h>
+
+typedef void *rtx;
+
+struct rtvec
+{
+  rtx elem[100];
+};
+typedef struct rtvec *rtvec;
+
+#define NULL_RTVEC ((void *)0)
+
+rtvec __attribute__((noinline,noclone))
+rtvec_alloc (int n)
+{
+  static struct rtvec v;
+
+  if (n != 2)
+__builtin_abort ();
+
+  return &v;
+}
+
+rtvec __attribute__((noinline,noclone))
+gen_rtvec (int n, ...)
+{
+  int i;
+  rtvec rt_val;
+  va_list p;
+
+  va_start (p, n);
+
+  if (n == 0)
+{
+  va_end (p);
+  return NULL_RTVEC;
+}
+
+  rt_val = rtvec_alloc (n);
+
+  for (i = 0; i < n; i++)
+rt_val->elem[i] = va_arg (p, rtx);
+
+  va_end (p);
+  return rt_val;
+}
+
+int
+main ()
+{
+  int a;
+  in

Re: pass_stdarg problem when run after pass_lim

2015-01-29 Thread Tom de Vries

On 29-01-15 18:25, Jakub Jelinek wrote:

The stdarg pass can't grok too heavy optimizations, so if at all possible,
don't schedule such passes early, and if you for some reason do, avoid
optimizing in there the va_list related accesses.


This patch works for the example.

In pass_lim1, I get:
...
;; Function gen_rtvec (gen_rtvec, funcdef_no=1, decl_uid=1841, cgraph_uid=1, 
symbol_order=1)


va_list_related_stmt_p: no simple_mem_ref
_15 = p.gp_offset;
va_list_related_stmt_p: no simple_mem_ref
_16 = p.reg_save_area;
va_list_related_stmt_p: no simple_mem_ref
p.gp_offset = _21;
va_list_related_stmt_p: no simple_mem_ref
_23 = p.overflow_arg_area;
va_list_related_stmt_p: no simple_mem_ref
p.overflow_arg_area = _25;
va_list_related_stmt_p: MOVE_IMPOSSIBLE
_15 = p.gp_offset;
va_list_related_stmt_p: MOVE_IMPOSSIBLE
_16 = p.reg_save_area;
va_list_related_stmt_p: MOVE_IMPOSSIBLE
_23 = p.overflow_arg_area;
gen_rtvec (int n)
...

Thanks,
- Tom
Handle va_list conservatively in pass_lim

---
 gcc/tree-ssa-loop-im.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 9aba79b..2520fa2 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -70,6 +70,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-propagate.h"
 #include "trans-mem.h"
 #include "gimple-fold.h"
+#include "target.h"
+#include "gimple-walk.h"
 
 /* TODO:  Support for predicated code motion.  I.e.
 
@@ -289,6 +291,32 @@ enum move_pos
   };
 
 
+static tree
+va_list_related_tree_p (tree *t, int *walk_subtrees ATTRIBUTE_UNUSED,
+			void *data ATTRIBUTE_UNUSED)
+{
+  tree cfun_va_list = targetm.fn_abi_va_list (cfun->decl);
+  tree c1, c2, type;
+  if (!DECL_P (*t))
+    return NULL_TREE;
+  type = TREE_TYPE (*t);
+  c1 = TYPE_CANONICAL (type);
+  c2 = TYPE_CANONICAL (cfun_va_list);
+
+  if (c1 == c2)
+    return *t;
+
+  return NULL_TREE;
+}
+
+bool
+va_list_related_stmt_p (gimple stmt)
+{
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  tree res = walk_gimple_stmt (&gsi, NULL, va_list_related_tree_p, NULL);
+  return res != NULL_TREE;
+}
+
 /* If it is possible to hoist the statement STMT unconditionally,
returns MOVE_POSSIBLE.
If it is possible to hoist the statement STMT, but we must avoid making
@@ -384,6 +412,15 @@ movement_possibility (gimple stmt)
 	}
 }
 
+  if (va_list_related_stmt_p (stmt))
+    {
+      if (dump_file)
+	{
+	  fprintf (dump_file, "va_list_related_stmt_p: MOVE_IMPOSSIBLE\n");
+	  print_gimple_stmt (dump_file, stmt, 2, 0);
+	}
+      return MOVE_IMPOSSIBLE;
+    }
   return ret;
 }
 
@@ -593,6 +630,16 @@ simple_mem_ref_in_stmt (gimple stmt, bool *is_store)
   if (!gimple_assign_single_p (stmt))
 return NULL;
 
+  if (va_list_related_stmt_p (stmt))
+    {
+      if (dump_file)
+	{
+	  fprintf (dump_file, "va_list_related_stmt_p: no simple_mem_ref\n");
+	  print_gimple_stmt (dump_file, stmt, 2, 0);
+	}
+      return NULL;
+    }
+
   lhs = gimple_assign_lhs_ptr (stmt);
   rhs = gimple_assign_rhs1_ptr (stmt);
 
-- 
1.9.1



Re: pass_stdarg problem when run after pass_lim

2015-01-30 Thread Tom de Vries

On 30-01-15 09:41, Richard Biener wrote:

I don't like adding more hacks to aid the stdarg pass.
It's not required
for GCC 5 anyway and for GCC 6 we should push the lowering change.


Richard,

I agree that that's the best solution (the posted patch is just a solution that 
helps me along for now).



Maybe you want to pick up the work?


In principle yes, depending on the amount of work (at this point I have no idea 
what remains to be done and how long that would take me).


Michael, are your patches posted somewhere?

Thanks,
- Tom



Re: pass_stdarg problem when run after pass_lim

2015-02-02 Thread Tom de Vries

On 30-01-15 14:11, Michael Matz wrote:

Hi,

On Fri, 30 Jan 2015, Tom de Vries wrote:


Maybe you want to pick up the work?


In principle yes, depending on the amount of work (at this point I have no
idea what remains to be done and how long that would take me).

Michael, are your patches posted somewhere?


I don't think I ever sent them.  Pasted below, from somewhen October last
year.  This essentially moves expanding va_arg to pass_stdarg.  But it
does not yet make use of the possibilities this would bring, namely
throwing away a whole lot of fragile code in pass_stdarg that tries to
recover from expanding va_arg too early.

To avoid having to touch each backend it retains expanding va_arg as a
tree expression that needs to go through the gimplifier, which can create
new basic blocks that need to be discovered after the fact, so there's
some shuffling of code in tree-cfg as well.

I also seem to remember that there was a problem with my using temporaries
of the LHS for the new va_arg internal call, some types can't be copied
and hence no temporaries can be created.  I can't seem to trigger this
right now, but this needs to be dealt with somehow I think (but that
requires the final lvalue be available when lowering the VA_ARG_EXPR).

I think that's about it, hence, updating to current compiler, fixing the
above problem (if it's still one), and then cleaning up pass_stdarg to
make use of the availability of IFN_VA_ARG.



Hi Michael,

thanks for the patch.

FYI I've:
- added -ftree-stdarg-opt to be able to skip the va_list_gpr/fpr_size
  optimization (at least useful for developing this patch series)
- split off the internal-fn.def part (since it triggers rebuild for a lot of
  files)
- split off the part that is just refactoring (to get the patch containing
  the actual changes as small as possible)
- pushed the series to vries/expand-va-arg-at-pass-stdarg

Atm, at least these tests are failing:
...
FAIL: gcc.target/x86_64/abi/callabi/vaarg-4a.c execution test
FAIL: gcc.target/x86_64/abi/callabi/vaarg-5a.c execution test
...

I've minimized the vaarg-4a.c failure, and added it as testcase to the patch 
series as gcc.target/x86_64/abi/callabi/vaarg-4.c.


The problem is in this code:
...
  e = va_arg (argp, char *);
  e = va_arg (argp, char *);
...

which is translated into:
...
  :
  argp.1 = argp_3(D);

  :
  argp.12_11 = &argp.1;
  _12 = *argp.12_11;
  _13 = _12 + 8;
  *argp.12_11 = _13;

  :
  argp.3 = argp_3(D);

  :
  argp.13_15 = &argp.3;
  _16 = *argp.13_15;
  _17 = _16 + 8;
  *argp.13_15 = _17;
  _19 = MEM[(char * *)_16];
  e_8 = _19;
...

We copy the value of argp to a temp (bb2), take the address of the temp, and use 
it to read and then increment the value of the temp (bb5).


However, subsequently we copy the _unmodified_ value of argp to a second temp 
(bb6), take the address of that temp, and use it to read and increment (bb7).


Obviously, the first and second read return the same value, while they 
shouldn't.

Thanks,
- Tom



Re: pass_stdarg problem when run after pass_lim

2015-02-03 Thread Tom de Vries

On 02-02-15 16:47, Michael Matz wrote:

Hi,

On Mon, 2 Feb 2015, Tom de Vries wrote:


I've minimized the vaarg-4a.c failure, and added it as testcase to the patch
series as gcc.target/x86_64/abi/callabi/vaarg-4.c.

The problem is in this code:
...
   e = va_arg (argp, char *);
   e = va_arg (argp, char *);
...

which is translated into:
...
   :
   argp.1 = argp_3(D);

   :
   argp.12_11 = &argp.1;
   _12 = *argp.12_11;
   _13 = _12 + 8;
   *argp.12_11 = _13;

   :
   argp.3 = argp_3(D);

   :
   argp.13_15 = &argp.3;
   _16 = *argp.13_15;
   _17 = _16 + 8;
   *argp.13_15 = _17;
   _19 = MEM[(char * *)_16];
   e_8 = _19;
...


That looks like non-x86-64 ABI code.  It builds with -mabi=ms, and it
seems the particular path taken therein doesn't write back to the aplist
if it's not locally created with va_start, but rather given as argument.
Or rather, if it is not addressable (like with x86-64 ABI, where it's
either addressable because of va_start, or is a pointer to struct due to
array decay).  The std_gimplify_va_arg_expr might need more changes.



I've managed to fix that, using these lines in std_gimplify_va_arg_expr:
...
  if (TREE_CODE (tmp) == ADDR_EXPR
  && TREE_OPERAND (tmp, 0) != valist)
{
+  /* If we're passing the address of a temp, instead of the address of
 valist, we need to copy back the value of the temp to valist.  */
  assign = gimple_build_assign (valist, TREE_OPERAND (tmp, 0));
  gimple_seq_add_stmt (pre_p, assign);
}
...
[ I've pushed the current state (now based on a current commit) to 
vries/expand-va-arg-at-pass-stdarg again. ]


Ironically, that fix breaks the va_list_gpr/fpr_size optimization, so I've 
disabled that by default for now.


I've done a non-bootstrap and bootstrap build using all languages.

The non-bootstrap test shows (at least) two classes of real failures:
- gcc.c-torture/execute/20020412-1.c, gcc.target/i386/memcpy-strategy-4.c and
  gcc.dg/lto/20090706-1_0.c.
  These are test-cases with vla as va_arg argument. It ICEs in
  force_constant_size with call stack
  gimplify_va_arg_expr -> create_tmp_var -> gimple_add_tmp_var ->
  force_constant_size
- most/all va_arg tests with -flto, f.i. gcc.c-torture/execute/stdarg-1.c.
  It segfaults in lto1 during pass_stdarg, in gimplify_va_arg_internal when
  accessing have_va_type which is NULL_TREE after
  'have_va_type = targetm.canonical_va_list_type (have_va_type)'.

I don't think the flto issue is difficult to fix.  But the vla issue probably 
needs more time than I have available right now.


Thanks,
- Tom



Postpone expanding va_arg until pass_stdarg

2015-02-10 Thread Tom de Vries

[ was: Re: pass_stdarg problem when run after pass_lim ]

On 03-02-15 14:36, Michael Matz wrote:

Hi,

On Tue, 3 Feb 2015, Tom de Vries wrote:


Ironically, that fix breaks the va_list_gpr/fpr_size optimization, so
I've disabled that by default for now.

I've done a non-bootstrap and bootstrap build using all languages.

The non-bootstrap test shows (at least) two classes of real failures:
- gcc.c-torture/execute/20020412-1.c, gcc.target/i386/memcpy-strategy-4.c and
   gcc.dg/lto/20090706-1_0.c.
   These are test-cases with vla as va_arg argument. It ICEs in
   force_constant_size with call stack
   gimplify_va_arg_expr -> create_tmp_var -> gimple_add_tmp_var ->
   force_constant_size


Hah, yeah, that's the issue I remembered with create_tmp_var.  This needs
a change in how to represent the va_arg "call", because the LHS can't be a
temporary that's copied to the real LHS afterwards.


- most/all va_arg tests with -flto, f.i. gcc.c-torture/execute/stdarg-1.c.
   It segfaults in lto1 during pass_stdarg, in gimplify_va_arg_internal when
   accessing have_va_type which is NULL_TREE after
   'have_va_type = targetm.canonical_va_list_type (have_va_type)'.

I don't think the flto issue is difficult to fix.  But the vla issue
probably needs more time than I have available right now.


I'll think about this.



A status update. I've worked a bit more on this patch series, latest version 
available at vries/expand-va-arg-at-pass-stdarg (and last patch in series attached).


I've done a non-bootstrap x86_64 build for all languages and ran the regression 
testsuite for unix/ and unix/-m32. I'm left with one failing test-case.
[ Of course there are a bunch of scan-dump-tree failures because the 
va_list_gpr/fpr_size optimization is switched off. ]


The patch series handles things now as follows. At gimplify_va_arg_expr, the 
VA_ARG expr is not gimplified, but replaced by the internal function call.


That is passed upwards to gimplify_modify_expr, which does the actual 
gimplification. In this function we have a sufficiently complete view of the 
problem to deal with it.


I've added two modifications to gimplify_modify_expr:
- the WITH_SIZE_EXPR in which the CALL_TREE is wrapped, is dropped after
  gimplification, but we need the size expression at expansion in pass_stdarg.
  So I added the size expression as argument to the internal function.
  [ And at pass_stdarg::execute, we wrap the result of gimplify_va_arg_internal
  in a WITH_SIZE_EXPR before generating the assign to the lhs ]
- we detect after gimplify_arg (&ap) whether it created a copy ap.1 of ap,
  rather than use ap itself, and if so, we copy the value back from ap.1 to ap
  after va_arg.

I've worked around the issue of targetm.canonical_va_list_type (have_va_type) 
returning NULL_TREE in gimplify_va_arg_internal during lto1, by simply working 
with the original type in that case:

...
  tree have_va_type = TREE_TYPE (valist);
  tree cano_type = targetm.canonical_va_list_type (have_va_type);

  if (cano_type != NULL_TREE)
have_va_type = cano_type;
...

I'm not really sure yet why std_gimplify_va_arg_expr has a part commented out. 
Michael, can you comment?



The single failing testcase (both with and without -m32) is 
g++.dg/torture/pr45843.C:

...
./gcc/testsuite/g++/g++.sum:FAIL: g++.dg/torture/pr45843.C   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (internal compiler error)

...

The failure looks like this (it happens during the gimplify_assign after calling 
gimplify_va_arg_internal):

...
src/gcc/testsuite/g++.dg/torture/pr45843.C: In function ‘foo(int, ...)’:
src/gcc/testsuite/g++.dg/torture/pr45843.C:11:1: internal compiler error: 
Segmentation fault

0x10a5b04 crash_signal
src/gcc/toplev.c:383
0x6a8985 tree_check(tree_node*, char const*, int, char const*, tree_code)
src/gcc/tree.h:2845
0x7c2f6a is_really_empty_class(tree_node*)
src/gcc/cp/class.c:7923
0x923855 cp_gimplify_expr(tree_node**, gimple_statement_base**, 
gimple_statement_base**)

src/gcc/cp/cp-gimplify.c:625
0xd34641 gimplify_expr(tree_node**, gimple_statement_base**, 
gimple_statement_base**, bool (*)(tree_node*), int)

src/gcc/gimplify.c:7843
0xd2a04d gimplify_stmt(tree_node**, gimple_statement_base**)
src/gcc/gimplify.c:5551
0xd173e3 gimplify_and_add(tree_node*, gimple_statement_base**)
src/gcc/gimplify.c:419
0xd39c94 gimplify_assign(tree_node*, tree_node*, gimple_statement_base**)
src/gcc/gimplify.c:9452
0x130ad18 execute
src/gcc/tree-stdarg.c:779
...

The testcase contains this struct:
...
struct S { struct T { } a[14]; char b; };
...

and uses that struct S as type in va_arg:
...
  arg = va_arg (ap, struct S);
...

The segfault happens because we're calling is_really_empty_class for struct S, 
and TYPE_BINFO is NULL_TREE, which causes BINFO_BASE_ITERATE to segfault. I'm 
not sure yet what this issue is or how this is supposed to be fixed.

Re: Postpone expanding va_arg until pass_stdarg

2015-02-10 Thread Tom de Vries

On 10-02-15 11:10, Richard Biener wrote:

The single failing testcase (both with and without -m32) is
>g++.dg/torture/pr45843.C:
>...
>./gcc/testsuite/g++/g++.sum:FAIL: g++.dg/torture/pr45843.C   -O2 -flto
>-fno-use-linker-plugin -flto-partition=none  (internal compiler error)
>...
>
>The failure looks like this (it happens during the gimplify_assign after
>calling gimplify_va_arg_internal):
>...
>src/gcc/testsuite/g++.dg/torture/pr45843.C: In function ‘foo(int, ...)’:
>src/gcc/testsuite/g++.dg/torture/pr45843.C:11:1: internal compiler error:
>Segmentation fault
>0x10a5b04 crash_signal
> src/gcc/toplev.c:383
>0x6a8985 tree_check(tree_node*, char const*, int, char const*, tree_code)
> src/gcc/tree.h:2845
>0x7c2f6a is_really_empty_class(tree_node*)
> src/gcc/cp/class.c:7923
>0x923855 cp_gimplify_expr(tree_node**, gimple_statement_base**,
>gimple_statement_base**)
> src/gcc/cp/cp-gimplify.c:625
>0xd34641 gimplify_expr(tree_node**, gimple_statement_base**,
>gimple_statement_base**, bool (*)(tree_node*), int)
> src/gcc/gimplify.c:7843
>0xd2a04d gimplify_stmt(tree_node**, gimple_statement_base**)
> src/gcc/gimplify.c:5551
>0xd173e3 gimplify_and_add(tree_node*, gimple_statement_base**)
> src/gcc/gimplify.c:419
>0xd39c94 gimplify_assign(tree_node*, tree_node*, gimple_statement_base**)
> src/gcc/gimplify.c:9452
>0x130ad18 execute
> src/gcc/tree-stdarg.c:779
>...
>
>The testcase contains this struct:
>...
>struct S { struct T { } a[14]; char b; };
>...
>
>and uses that struct S as type in va_arg:
>...
>   arg = va_arg (ap, struct S);
>...
>
>The segfault happens because we're calling is_really_empty_class for struct
>S, and TYPE_BINFO is NULL_TREE, which causes BINFO_BASE_ITERATE to segfault.
>I'm not sure yet what this issue is or how this is supposed to be fixed.

That's probably free_lang_data being more aggressive after Honza
fiddled with BINFOs?  That is - the gimplifications called from tree-stdarg.c
(and others from the middle-end) should never call back into the frontend
via langhooks...


Hmm, that 'should never' sounds like a missing gcc_assert.

This patch is a way of ensuring that gimplification doesn't call the actual 
gimplify_expr langhook, and it fixes the failure. But I'm guessing that's not 
the proper way to fix this.


Thanks,
- Tom

diff --git a/gcc/tree-stdarg.c b/gcc/tree-stdarg.c
index 443f6d3..d6cd52d 100644
--- a/gcc/tree-stdarg.c
+++ b/gcc/tree-stdarg.c
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "input.h"
 #include "function.h"
 #include "langhooks.h"
+#include "langhooks-def.h"
 #include "gimple-pretty-print.h"
 #include "target.h"
 #include "bitmap.h"
@@ -734,6 +735,11 @@ pass_stdarg::execute (function *fun)
   const char *funcname = NULL;
   tree cfun_va_list;
   unsigned int retflags = 0;
+  int (*save_gimplify_expr) (tree *, gimple_seq *, gimple_seq *);
+
+  /* Ensure we don't call language hooks from gimplification.  */
+  save_gimplify_expr = lang_hooks.gimplify_expr;
+  lang_hooks.gimplify_expr = lhd_gimplify_expr;

   /* Expand va_arg.  */
   /* TODO: teach pass_stdarg how process the va_arg builtin, and reverse the
@@ -796,6 +802,9 @@ pass_stdarg::execute (function *fun)
}
 }

+  /* Restore language hook.  */
+  lang_hooks.gimplify_expr = save_gimplify_expr;
+
   if (retflags)
 {
   free_dominance_info (CDI_DOMINATORS);



Re: Postpone expanding va_arg until pass_stdarg

2015-02-11 Thread Tom de Vries

On 10-02-15 17:57, Michael Matz wrote:

Hi,

On Tue, 10 Feb 2015, Tom de Vries wrote:


I've added two modifications to gimplify_modify_expr:
- the WITH_SIZE_EXPR in which the CALL_TREE is wrapped, is dropped after
   gimplification, but we need the size expression at expansion in pass_stdarg.
   So I added the size expression as argument to the internal function.
   [ And at pass_stdarg::execute, we wrap the result of
gimplify_va_arg_internal
   in a WITH_SIZE_EXPR before generating the assign to the lhs ]


Hmm, why do you need the WITH_SIZE_EXPR actually?  For variable-sized
types returned by va_arg?



Yep.


- we detect after gimplify_arg (&ap) whether it created a copy ap.1 of ap,
   rather than use ap itself, and if so, we copy the value back from ap.1 to ap
   after va_arg.


My idea was to not generate temporaries and hence copies for
non-scalar types, but rather construct the "result" of va_arg directly
into the original LHS (that would then also trivially solve the problem of
non-copyable types).



The copy mentioned here is of ap, not of the result of va_arg.

In gimplify_modify_expr, we're already making an effort not to generate 
temporaries for a call's lhs; see the comment:

...
 To
 prevent gimplify_expr from trying to create a new temporary for
 foo's LHS, we tell it that it should only gimplify until it
 reaches the CALL_EXPR.  On return from gimplify_expr, the newly
 created GIMPLE_CALL  will be the last statement in *PRE_P
 and all we need to do here is set 'a' to be its LHS
...


I'm not really sure yet why std_gimplify_va_arg_expr has a part
commented out. Michael, can you comment?


I think I did that because of SSA form.  The old sequence calculated

   vatmp = valist;
   vatmp = vatmp + boundary-1
   vatmp = vatmp & -boundary

(where the local variable in that function 'valist_tmp' is the tree
VAR_DECL 'vatmp') and then continue to use valist_tmp.  When in SSA form
the gimplifier will rewrite this into:

   vatmp_1 = valist;
   vatmp_2 = vatmp_1 + boundary-1
   vatmp_3 = vatmp_2 & -boundary

but the local valist_tmp variable will continue to be the VAR_DECL, not
the vatmp_3 ssa name.  Basically whenever one gimplifies a MODIFY_EXPR
while in SSA form it's suspicious.  So the new code simply builds the
expression:

   ((valist + bound-1) & -bound)

gimplifies that into an rvalue (most probably an SSA name) and uses that
to go on generating code by making valist_tmp be that returned rvalue.

I think you'll find that removing that code will make the SSA verifier
scream or generate invalid code with -m32 when that hook is used.



Thanks for the detailed explanation. I'm not sure I understand the problem well 
enough, so I'll try to trigger it and investigate.


Thanks,
- Tom




Re: Postpone expanding va_arg until pass_stdarg

2015-02-11 Thread Tom de Vries

On 10-02-15 14:46, Richard Biener wrote:

This patch is a way to achieve that gimplification doesn't call the actual
>gimplify_expr langhook, and it fixes the failure. But I'm guessing that's
>not the proper way to fix this.

More like

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 220578)
+++ gcc/tree.c  (working copy)
@@ -5815,6 +5815,7 @@ free_lang_data (void)
   still be used indirectly via the get_alias_set langhook.  */
lang_hooks.dwarf_name = lhd_dwarf_name;
lang_hooks.decl_printable_name = gimple_decl_printable_name;
+  lang_hooks.gimplify_expr = lhd_gimplify_expr;
/* We do not want the default decl_assembler_name implementation,
   rather if we have fixed everything we want a wrapper around it
   asserting that all non-local symbols already got their assembler



That worked, and allowed me to do a bootstrap on x86_64 for all languages and 
regtest for unix/ and unix/-m32 without any issues (other than scan-dump-tree 
failures for stdarg, since the va_list_gpr/fpr_size optimization is switched off).


That leaves just teaching the va_list_gpr/fpr_size optimization to recognize 
ifn_va_arg.


Thanks,
- Tom


Re: Postpone expanding va_arg until pass_stdarg

2015-02-12 Thread Tom de Vries

On 12-02-15 14:57, Michael Matz wrote:

Hi,

On Wed, 11 Feb 2015, Tom de Vries wrote:


My idea was to not generate temporaries and hence copies for
non-scalar types, but rather construct the "result" of va_arg directly
into the original LHS (that would then also trivially solve the
problem of non-copyable types).


The copy mentioned here is of ap, not of the result of va_arg.


Whoops, I misread, yes.  Thanks.



Hi,

Btw, I'm not happy about the ap copies, but I haven't been able to get rid of 
them.


I'm not really sure yet why std_gimplify_va_arg_expr has a part
commented out. Michael, can you comment?


I think I did that because of SSA form.  The old sequence calculated

vatmp = valist;
vatmp = vatmp + boundary-1
vatmp = vatmp & -boundary

(where the local variable in that function 'valist_tmp' is the tree
VAR_DECL 'vatmp') and then continue to use valist_tmp.  When in SSA form
the gimplifier will rewrite this into:

vatmp_1 = valist;
vatmp_2 = vatmp_1 + boundary-1
vatmp_3 = vatmp_2 & -boundary

but the local valist_tmp variable will continue to be the VAR_DECL, not
the vatmp_3 ssa name.  Basically whenever one gimplifies a MODIFY_EXPR
while in SSA form it's suspicious.  So the new code simply builds the
expression:

((valist + bound-1) & -bound)

gimplifies that into an rvalue (most probably an SSA name) and uses that
to go on generating code by making valist_tmp be that returned rvalue.

I think you'll find that removing that code will make the SSA verifier
scream or generate invalid code with -m32 when that hook is used.



Thanks for the detailed explanation. I'm not sure I understand the
problem well enough, so I'll try to trigger it and investigate.


Actually the above fails to mention what the real problem is :-)  The
problem is that the local variable valist_tmp will be used to generate
further code after the above expression is generated.  Without my patch it
will continue to point to the VAR_DECL, not to the SSA name that actually
holds the computed value in the generated code.



I have not been able to reproduce this problem (with a bootstrap build on x86_64 
for all languages, and {unix/,unix/-m32} testing), so I've dropped this bit for now.


I've pushed the latest status to vries/expand-va-arg-at-pass-stdarg.

-ftree-stdarg-opt (the va_list_gpr/fpr_size optimization) has been re-enabled 
again. I needed the patch "Always check phi-ops in optimize_va_list_gpr_fpr_size" 
for that.


With a similar bootstrap and reg-test as described above, there's only one 
failure left:

...
FAIL: gcc.dg/tree-ssa/stdarg-2.c scan-tree-dump stdarg "f15: va_list escapes 0, 
needs to save [148] GPR units and [1-9][0-9]* FPR units"

...
And this is due to the ap copy, which is classified as an escape.

[ We're still expanding ifn_va_arg before the va_list_gpr/fpr_size 
optimization. ]

Thanks,
- Tom


Re: Postpone expanding va_arg until pass_stdarg

2015-02-13 Thread Tom de Vries

On 13-02-15 09:57, Richard Biener wrote:

[ We're still expanding ifn_va_arg before the va_list_gpr/fpr_size
>optimization. ]

Yeah, and the point of the exercise was of course to change that;)


Well, there are two parts.

The first is postponing expansion of va_arg to before the va_list_gpr/fpr_size 
optimization. This eliminates interference between the expansion and the 
optimization, and makes pass_stdarg more robust. This allows us to insert 
optimizations before pass_stdarg without breaking pass_stdarg.


The second is to simplify pass_stdarg by handling va_arg rather than the 
expansion of va_arg.


I think the first part by itself (which is done now) is already worthwhile on 
its own.


Thanks,
- Tom


Re: Postpone expanding va_arg until pass_stdarg

2015-02-16 Thread Tom de Vries

On 12-02-15 23:51, Tom de Vries wrote:

On 12-02-15 14:57, Michael Matz wrote:



I'm not really sure yet why std_gimplify_va_arg_expr has a part
commented out. Michael, can you comment?


I think I did that because of SSA form.  The old sequence calculated

vatmp = valist;
vatmp = vatmp + boundary-1
vatmp = vatmp & -boundary

(where the local variable in that function 'valist_tmp' is the tree
VAR_DECL 'vatmp') and then continue to use valist_tmp.  When in SSA form
the gimplifier will rewrite this into:

vatmp_1 = valist;
vatmp_2 = vatmp_1 + boundary-1
vatmp_3 = vatmp_2 & -boundary

but the local valist_tmp variable will continue to be the VAR_DECL, not
the vatmp_3 ssa name.  Basically whenever one gimplifies a MODIFY_EXPR
while in SSA form it's suspicious.  So the new code simply builds the
expression:

((valist + bound-1) & -bound)

gimplifies that into an rvalue (most probably an SSA name) and uses that
to go on generating code by making valist_tmp be that returned rvalue.

I think you'll find that removing that code will make the SSA verifier
scream or generate invalid code with -m32 when that hook is used.



Thanks for the detailed explanation. I'm not sure I understand the
problem well enough, so I'll try to trigger it and investigate.


Actually the above fails to mention what the real problem is :-)  The
problem is that the local variable valist_tmp will be used to generate
further code after the above expression is generated.  Without my patch it
will continue to point to the VAR_DECL, not to the SSA name that actually
holds the computed value in the generated code.



I have not been able to reproduce this problem (with a bootstrap build on x86_64
for all languages, and {unix/,unix/-m32} testing), so I've dropped this bit for
now.


Hi Michael,

Just to double-check, I added an assert to detect triggering of the 'dynamic 
alignment' bit in std_gimplify_va_arg_expr.


The assert hit during a bootstrap in libquadmath/printf/quadmath-printf.c, in 
flt128_va with -m32.


The ifn_va_arg statement we start with is:
...
# .MEM_3 = VDEF <.MEM_1(D)>
d.19_4 = VA_ARG (ap_2(D), 0B); [return slot optimization]
...

The code generated by the gimplification of ifn_va_arg is:
...
(gdb) call debug_bb_n (3)
:
D.5165 = *ap_2(D);
D.5165 = D.5165 + 15;
D.5165 = D.5165 & 4294967280B;
D.5166 = D.5165 + 16;
*ap_2(D) = D.5166;
d.19_4 = MEM[(__float128 *)D.5165];
...

and after the 'update_ssa (TODO_update_ssa)', it looks like:
...
(gdb) call debug_bb_n (3)
:
_8 = *ap_2(D);
_9 = _8 + 15;
_10 = _9 & 4294967280B;
_11 = _10 + 16;
*ap_2(D) = _11;
d.19_4 = MEM[(__float128 *)_10];
...

All this looks ok to me, so I see no need to re-instate this dropped bit.

Thanks,
- Tom


broken link for Programming Languages Software Award on gcc homepage

2015-04-15 Thread Tom de Vries

Hi,

the link for 'ACM SIGPLAN Programming Languages Software Award' in the news list 
on gcc.gnu.org is http://www.sigplan.org/node/231, as discussed here ( 
https://gcc.gnu.org/ml/gcc/2014-06/msg00136.html ).


Following the link gives me:
...
Page Not Found

The page you were looking for was not found. Sorry.

You were looking for

http://www.sigplan.org/node#231

Please email the SIGPLAN Information Director, if you think the file should be 
here, and we will fix it.

...

A generic link mentioning the award is: 
http://www.sigplan.org/Awards/Software/ .

Thanks,
- Tom


[gomp4] bootstrap broken, function enclosing_target_ctx defined but not used

2015-05-18 Thread Tom de Vries

Thomas,

I ran into this bootstrap failure with branch gomp-4_0-branch:
...
src/gcc-gomp-4_0-branch/gcc/omp-low.c:2897:1: error: 'omp_context* 
enclosing_target_ctx(omp_context*)' defined but not used [-Werror=unused-function]

 enclosing_target_ctx (omp_context *ctx)
 ^
cc1plus: all warnings being treated as errors
make[3]: *** [omp-low.o] Error 1
...

Thanks,
- Tom


[RFC] Update Stage 4 description

2019-01-09 Thread Tom de Vries
[ To revisit https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00385.html ]

The current formulation for the description of Stage 4 here (
https://gcc.gnu.org/develop.html ) is:
...
During this period, the only (non-documentation) changes that may be
made are changes that fix regressions.

Other changes may not be done during this period.

Note that the same constraints apply to release branches.

This period lasts until stage 1 opens for the next release.
...

This updated formulation was proposed by Richi (with a request for
review of wording):
...
 During this period, the only (non-documentation) changes that may
 be made are changes that fix regressions.

-Other changes may not be done during this period.
+Other important bugs like wrong-code, rejects-valid or build issues may
+be fixed as well.  All changes during this period should be done with
+extra care on not introducing new regressions - fixing bugs at all cost
+is not wanted.

 Note that the same constraints apply to release branches.

 This period lasts until stage 1 opens for the next release.
...

If a text can be agreed upon, then I can prepare a patch for wwwdocs.

Thanks,
- Tom


[wwwdocs, committed] Update Stage 4 description

2019-01-09 Thread Tom de Vries
[ was: Re: [RFC] Update Stage 4 description ]

On 09-01-19 09:47, Richard Biener wrote:
> On Wed, 9 Jan 2019, Tom de Vries wrote:
> 
>> [ To revisit https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00385.html ]
>>
>> The current formulation for the description of Stage 4 here (
>> https://gcc.gnu.org/develop.html ) is:
>> ...
>> During this period, the only (non-documentation) changes that may be
>> made are changes that fix regressions.
>>
>> Other changes may not be done during this period.
>>
>> Note that the same constraints apply to release branches.
>>
>> This period lasts until stage 1 opens for the next release.
>> ...
>>
>> This updated formulation was proposed by Richi (with a request for
>> review of wording):
>> ...
>>  During this period, the only (non-documentation) changes that may
>>  be made are changes that fix regressions.
>>
>> -Other changes may not be done during this period.
>> +Other important bugs like wrong-code, rejects-valid or build issues may
>> +be fixed as well.  All changes during this period should be done with
>> +extra care on not introducing new regressions - fixing bugs at all cost
>> +is not wanted.
>>
>>  Note that the same constraints apply to release branches.
>>
>>  This period lasts until stage 1 opens for the next release.
>> ...
>>
>> If a text can be agreed upon, then I can prepare a patch for wwwdocs.
> 
> The proposed text sounds good, please post a patch and apply!

Attached patch committed.

Thanks,
- Tom
Index: htdocs/develop.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/develop.html,v
retrieving revision 1.190
diff -r1.190 develop.html
135,138c135,140
< be made are changes that fix regressions.  Other changes may not be
< done during this period.  Note that the same constraints apply
< to release branches.  This period lasts until stage 1 opens for
< the next release.
---
> be made are changes that fix regressions.  Other important bugs
> like wrong-code, rejects-valid or build issues may be fixed as well.
> All changes during this period should be done with extra care on
> not introducing new regressions - fixing bugs at all cost is not
> wanted.  Note that the same constraints apply to release branches.
> This period lasts until stage 1 opens for the next release.


Re: Transformation of contrib/check_GNU_style.sh to a python script

2017-05-15 Thread Tom de Vries

On 05/15/2017 03:55 PM, Martin Liška wrote:

... check_GNU_style.sh script. The script works
quite well, but it's very unpleasant that it reports problematic lines in the
patch, not in the original files.


Agreed.


I decided to substitute part of the functionality with a Python
script that uses a library that parses patches, so that reported errors can be
easily converted to a quickfix list for VIM. That makes navigation very easy.

I'm attaching a simple version that I made in a couple of minutes, and I would 
like to ask whether the bash script is broadly used


I use it regularly.


and whether the community would be interested
in a transformation of the script?


I think it's a good idea.

Thanks,
- Tom



Re: Transformation of contrib/check_GNU_style.sh to a python script

2017-05-19 Thread Tom de Vries

On 05/19/2017 11:51 AM, Martin Liška wrote:

Hello.

I'm sending the final (slightly updated) version of the script. I'm also adding 
Jakub, because I remember he's got other regex patterns he's using for the 
review process. Would it be fine to just remove the old *.sh script, or is it 
preferred to have them both living next to each other for some time?



I'd like to keep the old script around for a while, to make comparison 
between the two scripts easier.



Thanks,
Martin


check_GNU_style.py


#!/usr/bin/env python3
#
# Checks some of the GNU style formatting rules in a set of patches.
#
# This file is part of GCC.
#
# GCC is free software; you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free
# Software Foundation; either version 3, or (at your option) any later
# version.
#
# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
# WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License
# along with GCC; see the file COPYING3.  If not see
# <http://www.gnu.org/licenses/>.  */
#
# The script requires following python packages
# (can be installed via "pip 3 install"):


pip3 (there's a space in between pip and 3)


#   unidiff
#   termcolor



I'd prefer a formulation that can be copy-pasted to the command line, f.i.:
...
# The script requires python packages, which can be installed via pip3
# like this:
# $ pip3 install unidiff termcolor
...

I'm not sure if it's possible in Python, but it would be even better to 
detect the missing packages at runtime and print a message explaining 
how to install the missing packages.



Anyway, using your explanation I now managed to install the dependencies 
and run the script. [ Earlier I ran into the missing package error, 
googled the package, found https://pypi.python.org/pypi/unidiff, ran the 
suggested install line '$ pip install unidiff', and found that the 
script still was not working. ]


Thanks,
- Tom


Re: Transformation of contrib/check_GNU_style.sh to a python script

2017-05-22 Thread Tom de Vries

On 05/19/2017 03:47 PM, Martin Liška wrote:

+if __name__ == '__main__':
+if len(sys.argv) > 1:
+main()
+else:
+unittest.main()


Hi,

when specifying no arguments to the script, I see:
...
$ ./contrib/check_GNU_style.py
.
--
Ran 1 test in 0.000s

OK
...
In other words, the unit tests are run.

I was expecting some form of help message.


Attached patch splits off a lib file from the script. When running the 
lib file as a script, we run the unit tests. When running the script 
without args, we see:

...
$ ./contrib/check_GNU_style.py
usage: check_GNU_style.py [-h] [-f {stdio,quickfix}] file
check_GNU_style.py: error: the following arguments are required: file
...

OK?

Thanks,
- Tom
check_GNU_style.py: print usage if no file specified

2017-05-22  Tom de Vries  

	* check_GNU_style_lib.py: New file, factored out of ...
	* check_GNU_style.py: ... here.  Call main unconditionally.

---
 contrib/check_GNU_style.py | 249 +-
 contrib/check_GNU_style_lib.py | 267 +
 2 files changed, 270 insertions(+), 246 deletions(-)

diff --git a/contrib/check_GNU_style.py b/contrib/check_GNU_style.py
index b236e93..6970ddf 100755
--- a/contrib/check_GNU_style.py
+++ b/contrib/check_GNU_style.py
@@ -19,198 +19,9 @@
 # You should have received a copy of the GNU General Public License
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.  */
-#
-# The script requires python packages, which can be installed via pip3
-# like this:
-# $ pip3 install unidiff termcolor 
 
-import sys
-import re
 import argparse
-import unittest
-
-try:
-from termcolor import colored
-except ImportError:
-print('termcolor module is missing (run: pip3 install termcolor)')
-exit(3)
-
-try:
-from unidiff import PatchSet
-except ImportError:
-print('unidiff module is missing (run: pip3 install unidiff)')
-exit(3)
-
-from itertools import *
-
-ws_char = '█'
-ts = 8
-
-def error_string(s):
-return colored(s, 'red', attrs = ['bold'])
-
-class CheckError:
-def __init__(self, filename, lineno, console_error, error_message,
-column = -1):
-self.filename = filename
-self.lineno = lineno
-self.console_error = console_error
-self.error_message = error_message
-self.column = column
-
-def error_location(self):
-return '%s:%d:%d:' % (self.filename, self.lineno,
-self.column if self.column != -1 else -1)
-
-class LineLengthCheck:
-def __init__(self):
-self.limit = 80
-self.expanded_tab = ' ' * ts
-
-def check(self, filename, lineno, line):
-line_expanded = line.replace('\t', self.expanded_tab)
-if len(line_expanded) > self.limit:
-return CheckError(filename, lineno,
-line_expanded[:self.limit]
-+ error_string(line_expanded[self.limit:]),
-'lines should not exceed 80 characters', self.limit)
-
-return None
-
-class SpacesCheck:
-def __init__(self):
-self.expanded_tab = ' ' * ts
-
-def check(self, filename, lineno, line):
-i = line.find(self.expanded_tab)
-if i != -1:
-return CheckError(filename, lineno,
-line.replace(self.expanded_tab, error_string(ws_char * ts)),
-'blocks of 8 spaces should be replaced with tabs', i)
-
-class TrailingWhitespaceCheck:
-def __init__(self):
-self.re = re.compile('(\s+)$')
-
-def check(self, filename, lineno, line):
-m = self.re.search(line)
-if m != None:
-return CheckError(filename, lineno,
-line[:m.start(1)] + error_string(ws_char * len(m.group(1)))
-+ line[m.end(1):],
-'trailing whitespace', m.start(1))
-
-class SentenceSeparatorCheck:
-def __init__(self):
-self.re = re.compile('\w\.(\s|\s{3,})\w')
-
-def check(self, filename, lineno, line):
-m = self.re.search(line)
-if m != None:
-return CheckError(filename, lineno,
-line[:m.start(1)] + error_string(ws_char * len(m.group(1)))
-+ line[m.end(1):],
-'dot, space, space, new sentence', m.start(1))
-
-class SentenceEndOfCommentCheck:
-def __init__(self):
-self.re = re.compile('\w\.(\s{0,1}|\s{3,})\*/')
-
-def check(self, filename, lineno, line):
-m = self.re.search(line)
-if m != None:
-return CheckError(filename, lineno,
-line[:m.start(1)] + error_string(ws_char * len(m.group(1)))
-+ line[m.end(1):],
-'dot, space, space, end of comment', m.start(1))

Re: Tom de Vries appointed nvptx maintainer

2017-09-25 Thread Tom de Vries

On 09/22/2017 08:30 PM, David Edelsohn wrote:

I am pleased to announce that the GCC Steering Committee has
appointed Tom de Vries as nvptx maintainer.


Thank you for your trust.


Tom, please update your listing in the MAINTAINERS file.


Committed as attached below.


Happy hacking!


Will do :)

Thanks,
- Tom
Add myself as nvptx maintainer

2017-09-25  Tom de Vries  

	* MAINTAINERS (CPU Port Maintainers): Add myself as nvptx maintainer.

---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 99babdc..7e6e08c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -85,6 +85,7 @@ nds32 port		Chung-Ju Wu		
 nds32 port		Shiva Chen		
 nios2 port		Chung-Lin Tang		
 nios2 port		Sandra Loosemore	
+nvptx port		Tom de Vries		
 pdp11 port		Paul Koning		
 picochip port		Daniel Towner		
 powerpcspe port		Andrew Jenner		


xlr/xlp __atomic builtins using ldadd and swap

2011-12-29 Thread Tom de Vries
Richard,

I'm interested in implementing (some of) the new __atomic builtins using the
xlr/xlp atomic instructions ldadd and swap.
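
To make that concrete: assuming ldadd is a fetch-and-add and swap an atomic
exchange, these are the kind of builtins that could then each map onto a
single instruction (illustrative usage only):
...
/* Candidate for ldadd: atomically add v to *p and return the old value.  */
int
fetch_add (int *p, int v)
{
  return __atomic_fetch_add (p, v, __ATOMIC_SEQ_CST);
}

/* Candidate for swap: atomically store v in *p and return the old value.  */
int
exchange (int *p, int v)
{
  return __atomic_exchange_n (p, v, __ATOMIC_SEQ_CST);
}
...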

Do you perhaps have work in progress there?

Thanks,
- Tom


::gets has not been declared

2012-01-05 Thread Tom de Vries
Hi,

I just ran into the following gcc build failure during a gcc+glibc build:
...
libtool: compile:
/home/vries/local/glibc-arm/base/obj/gcc-mainline-0-arm-none-linux-gnueabi-i686-pc-linux-gnu/./gcc/xgcc
-shared-libgcc -B/home/vries/local/glibc-arm/base/obj/\
gcc-mainline-0-arm-none-linux-gnueabi-i686-pc-linux-gnu/./gcc -nostdinc++
-L/home/vries/local/glibc-arm/base/obj/gcc-mainline-0-arm-none-linux-gnueabi-i686-pc-linux-gnu/arm-none-linux-gnueabi/libstdc++-v3/s\
rc
-L/home/vries/local/glibc-arm/base/obj/gcc-mainline-0-arm-none-linux-gnueabi-i686-pc-linux-gnu/arm-none-linux-gnueabi/libstdc++-v3/src/.libs
-B/home/vries/local/glibc-arm/base/install/arm-none-linux-gnue\
abi/bin/ -B/home/vries/local/glibc-arm/base/install/arm-none-linux-gnueabi/lib/
-isystem /home/vries/local/glibc-arm/base/install/arm-none-linux-gnueabi/include
-isystem /home/vries/local/glibc-arm/base/ins\
tall/arm-none-linux-gnueabi/sys-include
--sysroot=/home/vries/local/glibc-arm/base/install/arm-none-linux-gnueabi/libc
-I/home/vries/local/glibc-arm/base/src/gcc-mainline/libstdc++-v3/../libgcc
-I/home/vrie\
s/local/glibc-arm/base/obj/gcc-mainline-0-arm-none-linux-gnueabi-i686-pc-linux-gnu/arm-none-linux-gnueabi/libstdc++-v3/include/arm-none-linux-gnueabi
-I/home/vries/local/glibc-arm/base/obj/gcc-mainline-0-ar\
m-none-linux-gnueabi-i686-pc-linux-gnu/arm-none-linux-gnueabi/libstdc++-v3/include
-I/home/vries/local/glibc-arm/base/src/gcc-mainline/libstdc++-v3/libsupc++
--sysroot=/home/vries/local/glibc-arm/base/insta\
ll/arm-none-linux-gnueabi/libc -fno-implicit-templates -Wall -Wextra
-Wwrite-strings -Wcast-qual -fdiagnostics-show-location=once -ffunction-sections
-fdata-sections -frandom-seed=vterminate.lo -g -O2 -D_GN\
U_SOURCE -c
/home/vries/local/glibc-arm/base/src/gcc-mainline/libstdc++-v3/libsupc++/vterminate.cc
 -fPIC -DPIC -o vterminate.o
In file included from
/home/vries/local/glibc-arm/base/src/gcc-mainline/libstdc++-v3/libsupc++/vterminate.cc:32:0:
/home/vries/local/glibc-arm/base/obj/gcc-mainline-0-arm-none-linux-gnueabi-i686-pc-linux-gnu/arm-none-linux-gnueabi/libstdc++-v3/include/cstdio:118:11:
error: '::gets' has not been declared
...

My hunch is that this recent glibc change causes/triggers the error:
...
[BZ #13528]
* libio/stdio.h: Do not declare gets for ISO C11 and _GNU_SOURCE.
...

I see this both on ARM and MIPS.

Should I file this as a problem in gcc or glibc?

Thanks,
- Tom


Re: ::gets has not been declared

2012-01-05 Thread Tom de Vries
On 05/01/12 18:40, Jonathan Wakely wrote:
> On 5 January 2012 16:33, Marc Glisse wrote:
>> On Thu, 5 Jan 2012, Joseph S. Myers wrote:
>>
>>> If the final C++11 still requires gets in <cstdio>, despite it being
>>> removed in C11, that's probably also a bug in C++11.  (At least the most
>>> recent draft I have to hand still has gets in <cstdio>.)
>>
>>
>> It still has it. And it is based more on C99 than C11 (which didn't exist at
>> the time), even if they did try to synchronize on some features.
> 
> Yep, it still requires it; this is a glibc bug.
> 
> Glibc should define gets if __cplusplus <= 201103L

Filed http://sourceware.org/bugzilla/show_bug.cgi?id=13566 .
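
For reference, the guard suggested above would look roughly like this in
glibc's stdio.h (a sketch on my part; the exact glibc-internal macro
spelling may differ):
...
/* gets was removed from C11, but C++ (through C++11) still requires it.  */
#if !defined __USE_ISOC11 \
    || (defined __cplusplus && __cplusplus <= 201103L)
extern char *gets (char *__s);
#endif
...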

Thanks,
- Tom


question about if_marked construct

2010-06-23 Thread Tom de Vries

Hi,

In the context of bug 31230, I have a question about the if_marked  
construct.


[DOC http://gcc.gnu.org/onlinedocs/gccint/GTY-Options.html]
if_marked ("expression")
Suppose you want some kinds of object to be unique, and so you  
put them in a hash table. If garbage collection marks the hash table,  
these objects will never be freed, even if the last other reference  
to them goes away. GGC has special handling to deal with this: if you  
use the if_marked option on a global hash table, GGC will call the  
routine whose name is the parameter to the option on each hash table  
entry. If the routine returns nonzero, the hash table entry will be  
marked as usual. If the routine returns zero, the hash table entry  
will be deleted.


The routine ggc_marked_p can be used to determine if an element has  
been marked already; in fact, the usual case is to use if_marked  
("ggc_marked_p").

[/DOC]
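
For reference, the option is attached to the table's GTY marker; the usual
case from the docs looks something like this (the table name is made up):
...
static GTY ((if_marked ("ggc_marked_p"), param_is (struct type_hash)))
     htab_t some_hash_table;
...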

Suppose we have a tree for type A and a tree for type B, called A and
B for short, with both A and B having entries in the type_hash_table,
called EA and EB, and as a complication that A references B.


Suppose also that we have the following function type_hash_marked_p()  
as the if_marked function:

...
static int prop(const_tree type)
{
  return type == A;
}

static int type_hash_marked_p (const void *p) {
 const_tree const type = ((const struct type_hash *) p)->type;
 return ggc_marked_p (type) || prop (type);
}
...

During ggc_mark_roots(), when A and B are not live, 2 scenarios can  
happen:

I.
- A and B are not marked
- EA is visited, and since prop(A) holds, EA is marked, then A, then B
- EB is visited, and since B is marked, it is not deleted
II.
- A and B are not marked
- EB is visited, and since prop(B) does not hold, EB is deleted
- EA is visited, and since prop(A) holds, EA is marked, then A, then B

The problem is that depending on the order in which we visit the hash
table entries, EB is either marked or deleted.
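
To make the order dependence tangible, here is a small stand-alone C
model of the two scenarios (toy code, not GCC internals):
...
#include <stdio.h>

/* Two objects; A references B.  */
struct obj { const char *name; struct obj *ref; int marked; };

static struct obj B = { "B", 0, 0 };
static struct obj A = { "A", &B, 0 };

/* Marking an object marks everything it references.  */
static void mark (struct obj *o)
{
  for (; o && !o->marked; o = o->ref)
    o->marked = 1;
}

static int prop (struct obj *o) { return o == &A; }

/* The if_marked test: keep (and mark) an entry if its object is already
   marked or prop holds; otherwise delete it.  */
static int visit_entry (struct obj *o)
{
  if (o->marked || prop (o))
    {
      mark (o);
      return 1;  /* kept */
    }
  return 0;      /* deleted */
}

int main (void)
{
  /* Scenario I: EA first, then EB; both entries survive.  */
  A.marked = B.marked = 0;
  printf ("I:  EA %s, ", visit_entry (&A) ? "kept" : "deleted");
  printf ("EB %s\n", visit_entry (&B) ? "kept" : "deleted");

  /* Scenario II: EB first, then EA; EB is deleted.  */
  A.marked = B.marked = 0;
  printf ("II: EB %s, ", visit_entry (&B) ? "kept" : "deleted");
  printf ("EA %s\n", visit_entry (&A) ? "kept" : "deleted");
  return 0;
}
...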

I see 2 possible approaches to make the behavior predictable:
1. prop() needs to be transitively closed; in other words, prop(A) and
A references B need to imply prop(B)
2. the garbage collector needs to calculate the transitive closure of
prop() before deleting any hash table entries.


Approach 1 seems error-prone to me, but that does seem to be the
de-facto choice right now.


Can somebody please comment?

Regards
  Tom


Re: question about if_marked construct

2010-06-23 Thread Tom de Vries

On Jun 23, 2010, at 16:49, Ian Lance Taylor wrote:


Tom de Vries  writes:


static int prop(const_tree type)
{
  return type == A;
}

static int type_hash_marked_p (const void *p) {
 const_tree const type = ((const struct type_hash *) p)->type;
 return ggc_marked_p (type) || prop (type);
}


I would like to question your premise.  The gcc garbage collector is
not some general purpose library.  It's specifically for the use of
gcc.  Why, in gcc, would you want to write such an if_marked property?
Is there some simpler and clearer way to express what you actually
want to have happen?

Ian


Hi Ian,

Thanks for your reaction.

What I am really trying to do is a bootstrap debug build of 4.3.5.
However, I ran into bug 31230. I minimized the testcase, did an
analysis, created a naive patch to test my hypothesis, tested the patch
and put it in the bug report. Now I'm looking for feedback.


In the question I asked on this mailing list I tried to abstract away
from the specific case I analyzed, to get a more conceptual discussion
about the garbage collector, but maybe that was a mistake; sorry for
the confusion.


My analysis, which is what I'm trying to get confirmed, is that the
actual type_hash_marked_p() in gcc is 'such an if_marked property':

...
static int type_hash_marked_p (const void *p) {
 tree type = ((struct type_hash *) p)->type;
 return ggc_marked_p (type) || TYPE_SYMTAB_POINTER (type);
}
...

Regards
  Tom



Re: question about if_marked construct

2010-06-23 Thread Tom de Vries

On Jun 23, 2010, at 19:40, Ian Lance Taylor wrote:


Basile Starynkevitch  writes:


On Wed, 2010-06-23 at 08:56 -0700, Ian Lance Taylor wrote:

Tom de Vries  writes:


What I am really trying to do is a bootstrap debug build of 4.3.5.
However, I ran into bug 31230. I minimized the testcase, did an
analysis, created a naive patch to test my hypothesis, tested the
patch and put it in the bug report. Now I'm looking for feedback.

In the question I asked on this mailing list I tried to abstract away
from the specific case I analyzed, to get a more conceptual discussion
about the garbage collector, but maybe that was a mistake; sorry for
the confusion.

My analysis, which is what I'm trying to get confirmed, is that the
actual type_hash_marked_p() in gcc is 'such an if_marked property':
...
static int type_hash_marked_p (const void *p) {
 tree type = ((struct type_hash *) p)->type;
 return ggc_marked_p (type) || TYPE_SYMTAB_POINTER (type);
}


Interesting.  My first reaction is that this is an invalid use of the
garbage collector.  I think there is really only one valid function
that can be used as an if_marked function: one which checks
ggc_marked_p on the structure.



A plugin providing its own GC above GGC, like MELT does, also could
use that feature. So don't remove it, please.


I'm not proposing removing any feature.  I'm stating my belief that
using if_marked with a function which does anything other than test
ggc_marked_p will not work.  I don't think it works today, and I don't
think we should put effort into making it work.

Ian


Going from your assessment that the current implementation of
type_hash_marked_p() is incorrect, I think that for 4.4, 4.5 and trunk
it's enough just to remove the TYPE_SYMTAB_POINTER from the clause
(-funit-at-a-time is hardcoded and makes sure the types between
functions stay alive). Fixing this by building a separate garbage
collection root as you suggest would only be needed for 4.3 and
earlier.
However, my feeling is that the problem does not qualify as a
regression, so I suppose we'd only fix it for the trunk anyway. I'll
try to test a patch for the trunk.


Tom



Re: question about if_marked construct

2010-07-05 Thread Tom de Vries

Interesting.  My first reaction is that this is an invalid use of the
garbage collector.  I think there is really only one valid function
that can be used as an if_marked function: one which checks
ggc_marked_p on the structure.


Then how about tree_map_base_marked_p, the if_marked function for
value_expr_for_decl?


tree.h:
...
struct GTY(()) tree_map_base {
  tree from;
};

struct GTY(()) tree_decl_map {
  struct tree_map_base base;
  tree to;
};

#define tree_decl_map_marked_p tree_map_base_marked_p
...

tree.c:
...
static GTY ((if_marked ("tree_decl_map_marked_p"),
	     param_is (struct tree_decl_map)))
  htab_t value_expr_for_decl;

int
tree_map_base_marked_p (const void *p)
{
  return ggc_marked_p (((const struct tree_map_base *) p)->from);
}
...

The tree_map_base_marked_p checks ggc_marked_p on the from field.
During ggc_scan_cache_tab, if the from field is live, the to field is
also marked live.
I wrote some code to do sanity testing and found a similar scenario as
before:

- a register attribute is not marked live during root marking
- reg_attrs_htab is traversed, and the hash table entry corresponding
to the register attribute is removed
- value_expr_for_decl is traversed, a from field is found live, so the
to field is also marked live, marking the register attribute live.


Is this valid use of the garbage collector?


Re: question about if_marked construct

2010-07-05 Thread Tom de Vries

Hi,

The tree_map_base_marked_p checks ggc_marked_p on the from field.
During ggc_scan_cache_tab, if the from field is live, the to field is
also marked live.
I wrote some code to do sanity testing and found a similar scenario as
before:
- a register attribute is not marked live during root marking
- reg_attrs_htab is traversed, and the hash table entry corresponding
to the register attribute is removed
- value_expr_for_decl is traversed, a from field is found live, so the
to field is also marked live, marking the register attribute live.

Is this valid use of the garbage collector?


Originally the if_marked hook was supposed to be used only for
caching purposes.  So it doesn't matter whether an element is
collected or not for correctness.  If we now have accumulated other
uses we indeed have to worry about this scenario (and I think it
won't work very well there).

Richard.



For my understanding: is it correct if I translate 'to be used only for
caching purposes' to 'the compiler is free to ignore the if_marked
function and remove all if_marked hash table entries'?
I just tried that in a bootstrap build, and that breaks already in
stage 1.


From looking at all the if_marked functions in gcc, a typical use case
seems to be the one of uniqueness (also the use case described in the
docs): making sure there is only a single object with certain
properties, such that a test for structural equality can be replaced
with a pointer equality comparison. This is well supported by the
current implementation, but correctness does depend on whether a hash
table element is collected or not.


What is not well supported is marking something other than hash table
entries live during ggc_scan_cache_tab. Following the scenario I
mention above, we can end up with 2 structurally equal objects, while
the code assumes that's not possible, and tests structural equality by
pointer comparison. This is the scenario I worry about.


I can imagine a few ways to go from here:
- leave as is, fix this when it really bothers us (risk: exchange a
known problem for unknown hard-to-debug and/or hard-to-reproduce
problems)
- instrument if_marked functions like the one for value_expr_for_decl
to assert if the from field is live and the to field is not live, and
fix the asserts
- extend the garbage collector to handle the problematic case (problem:
more runtime and/or memory usage for garbage collection)

Any suggestions on which way to go?

Regards
  Tom


Re: question about if_marked construct

2010-07-06 Thread Tom de Vries

Hi Richard,


I can imagine a few ways to go from here:
- leave as is, fix this when it really bothers us (risk: exchange a
known problem for unknown hard-to-debug and/or hard-to-reproduce
problems)
- instrument if_marked functions like the one for value_expr_for_decl
to assert if the from field is live and the to field is not live, and
fix the asserts
- extend the garbage collector to handle the problematic case (problem:
more runtime and/or memory usage for garbage collection)
Any suggestions on which way to go?


Or make sure to walk all if_marked roots last (isn't the problem an
ordering one only?)


That is already done. The problem is not what happens after that walk,
but during that walk. The code at that point assumes that the set of
marked non-hashtable-entry objects is already stable, while the
problematic if_marked functions have the effect that that set is
enlarged during that walk.
If we want to fix that in the garbage collector, we could walk the
problematic if_marked roots iteratively (without deleting unmarked
entries), until a fixed point is reached. After that we would walk (and
delete unmarked entries) for both problematic and normal if_marked
roots. However, I don't have a good idea how we can iterate to a fixed
point efficiently.
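
Roughly, the shape I have in mind is this (toy code with made-up types,
not GCC internals):
...
/* A hash table entry; live == the object it holds is marked.  */
struct entry { int live; };

/* The if_marked test; as a side effect it may mark objects that were
   not marked before, which is what forces the iteration.  */
extern int if_marked_test (struct entry *);

void
walk_if_marked_roots (struct entry **entries, int n)
{
  /* Phase 1: iterate without deleting anything, until no new objects
     get marked.  */
  int changed = 1;
  while (changed)
    {
      changed = 0;
      for (int i = 0; i < n; i++)
        if (entries[i] && !entries[i]->live && if_marked_test (entries[i]))
          {
            entries[i]->live = 1;
            changed = 1;
          }
    }

  /* Phase 2: the marked set is now stable; delete what is still dead.  */
  for (int i = 0; i < n; i++)
    if (entries[i] && !entries[i]->live)
      entries[i] = 0;
}
...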


Tom


Re: question about if_marked construct

2010-07-08 Thread Tom de Vries



I can imagine a few ways to go from here:
- leave as is, fix this when it really bothers us (risk: exchange a
known problem for unknown hard-to-debug and/or hard-to-reproduce
problems)
- instrument if_marked functions like the one for value_expr_for_decl
to assert if the from field is live and the to field is not live, and
fix the asserts
- extend the garbage collector to handle the problematic case (problem:
more runtime and/or memory usage for garbage collection)
Any suggestions on which way to go?


Or make sure to walk all if_marked roots last (isn't the problem an
ordering one only?)


That is already done. The problem is not what happens after that walk,
but during that walk. The code at that point assumes that the set of
marked non-hashtable-entry objects is already stable, while the
problematic if_marked functions have the effect that that set is
enlarged during that walk.


Hm, indeed - I know that this happens and it is not easy to avoid.


If we want to fix that in the garbage collector, we could walk the
problematic if_marked roots iteratively (without deleting unmarked
entries), until a fixed point is reached. After that we would walk (and
delete unmarked entries) for both problematic and normal if_marked
roots. However, I don't have a good idea how we can iterate to a fixed
point efficiently.


Me neither.  I suppose it would be nice to avoid the situation by
dropping if_marked from problematic hashes.  Can we at least somehow
figure out which ones those are?  (for example by doing inefficient
iteration with ENABLE_CHECKING and ICEing if the 2nd iteration changes
anything?)


I also considered that check, and implemented it, but later wondered
whether that method would only detect problems which surface given the
actual order of traversal of hash tables and entries, and leave other
problems lurking.
So I created the following check: besides in_use_p and save_in_use_p, I
created two other per-page_entry bitmaps: root_marked_p and
mark_locked_p. in_use_p is copied to root_marked_p after traversal of
the root tab.
During traversal of the if_marked roots, whenever an if_marked field is
tested and found unmarked, it is locked to unmarked by setting
mark_locked_p.

This allows us to detect:
- when an object that is locked to unmarked is marked after all (an
entry was found dead and deleted, but is later found live)
- when an if_marked field is tested and found marked, but not
root-marked (an entry is live only thanks to the specific order in
which we traverse the hash tables and hash table entries)
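
In code, the two detections look roughly like this (a toy sketch with
hypothetical names, not the actual ggc-page implementation):
...
/* Per-object flags standing in for the per-page bitmaps described
   above.  */
struct obj { int marked_p, root_marked_p, mark_locked_p; };

extern void report_locked_object_marked (struct obj *);
extern void report_order_dependence (struct obj *);

/* Wrapper around marking an object.  */
void
set_mark (struct obj *o)
{
  /* Detection 1: the object was found dead (and its entry deleted),
     but is marked after all.  */
  if (o->mark_locked_p)
    report_locked_object_marked (o);
  o->marked_p = 1;
}

/* Testing an if_marked field during the if_marked walk.  */
int
test_if_marked_field (struct obj *o)
{
  if (!o->marked_p)
    {
      o->mark_locked_p = 1;  /* lock to unmarked */
      return 0;
    }
  /* Detection 2: marked, but not marked via the root tab, so the
     entry is live only due to traversal order.  */
  if (!o->root_marked_p)
    report_order_dependence (o);
  return 1;
}
...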


Tom


Handling labels in delay-slot scheduling

2010-11-18 Thread Tom de Vries
I'm working on improving delay-slot scheduling and would appreciate
advice on a problem I encountered.

The problem is: how to add support for placing a CODE_LABEL on an
instruction in a delay slot?

My impression is that this is not supported currently. One way to
implement this would be to allow labels in the sequence insns which
represent the delay slots. Another way could be to keep some state
external to the rtl representation that indicates the presence of a
label.
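
To make the first option concrete, the delay-slot SEQUENCE would have
to be able to carry the label, roughly like this (not valid RTL today;
a sketch modeled on reorg dumps):
...
(insn (sequence [
        (jump_insn (set (pc) ...))  ;; the branch
        (code_label 7)              ;; <-- currently not allowed here
        (insn (set ...))            ;; the insn in the delay slot
      ]))
...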

To illustrate why I think that would be useful, let's look at 2 related
examples of MIPS code, for which delay slot filling is currently not
done.

Note: The MIPS has a single delay slot, possibly annulling (annulling
jumps are called branch likely insns for MIPS).

The first example looks like this:
...
beq   $2,$0,$L5
nop
lw    $3,4($4)
addiu $2,$2,1
...
$L5:
addiu $2,$2,1
...
...
where the beq owns the target thread $L5; in other words, the beq is
the only way into $L5. Note that the beq also owns the fall-through
thread (starting at the lw insn).

The duplicate insn 'addiu $2,$2,1' can be hoisted into the delay slot.
This already happens when branch likely insns are enabled. The
mechanism works as follows: first the code is transformed into:
...
beql  $2,$0,$L5
addiu $2,$2,1
lw    $3,4($4)
addiu $2,$2,1
...
$L5:
...
...
using an annulling jump (beql).

and only then into:
...
beq   $2,$0,$L5
addiu $2,$2,1
lw    $3,4($4)
...
$L5:
...
...
by try_merge_delay_insns.

A problem with newer MIPSes is that the branch likely instruction has a
performance penalty, and is deprecated. However, if we disable the
branch likely instruction, the transformation above no longer happens.

I wrote some code that detects the duplicate in this case, and
implements the transformation by deleting the insn in the fall-through
thread and importing the other insn into the delay slot. This
transformation happens independently of branch likely insns, and it
happens in a single step.

However, that doesn't work for the second example:
...
beq  $3,$0,$L14
nop
$L7:
andi $2,$2,0x
...
bne  $3,$0,$L7
nop
$L14:
andi $2,$2,0x
...
...
What is different from the first example is that here the beq owns
neither the fall-through thread ($L7) nor the target thread ($L14).
Same for the bne. In the first example, the jump owns both threads.

We can think of this transformation:
...
beq  $3,$0,$L14new
$L7:
andi $2,$2,0x
...
bne  $3,$0,$L7
nop
andi $2,$2,0x
$L14new:
...
...
but here the label $L7 ends up in the delay slot together with the andi.

Subsequently we transform the second nop in normal fashion:
...
beq  $3,$0,$L14new
andi $2,$2,0x
$L7new:
...
bne  $3,$0,$L7new
andi $2,$2,0x
$L14new:
...
...

So, how easy is it to support this 'label in delay slot' in reorg.c?
Or is there an easier way to achieve the filling of the delay slots in
the second example?

I thought of enabling branch likely insns for the duration of reorg.c,
and transforming leftover branch likely insns back to normal insns
after the reorg pass, but that comes (sometimes) at a penalty.

Thanks,
- Tom



Re: Handling labels in delay-slot scheduling

2010-11-18 Thread Tom de Vries

Hi Jeff,


However, that doesn't work for the second example:
...
beq  $3,$0,$L14
nop
$L7:
andi $2,$2,0x
...
bne  $3,$0,$L7
nop
$L14:
andi $2,$2,0x
...
...
What is different from the first example is that here the beq owns
neither the fall-through thread ($L7) nor the target thread ($L14).
Same for the bne. In the first example, the jump owns both threads.

We can think of this transformation:
...
beq  $3,$0,$L14new
$L7:
andi $2,$2,0x
...
bne  $3,$0,$L7
nop
andi $2,$2,0x
$L14new:

Could you instead make it:

beq  $3,$0,$L14a
andi $2,$2,0x
$L7:
andi $2,$2,0x
...
bne  $3,$0,$L7
nop
$L14:
andi $2,$2,0x
$L14a:
...

[ Copy the insn from the L14 target into the delay slot of the first
branch. ]


That is indeed possible in this specific example, because executing
'andi $2,$2,0x' once more does not change the value of $2, but that
does not always work (f.i., not for addi $2,$2,1). This might be an ok
intermediate solution though, thanks for the idea.



Step #2

beq  $3,$0,$L14a
andi $2,$2,0x
$L7:
andi $2,$2,0x
$L7a:
...
bne  $3,$0,$L7a
andi $2,$2,0x
$L14:
andi $2,$2,0x
$L14a:
...

Same transformation, copying the insn from the L7 target into the
delay slot of the second branch.


Then after reorg has completed (so you don't have to teach reorg about
code labels in sequences), squish the redundant insns together and
insert the code label into the SEQUENCE, resulting in:


beq  $3,$0,$L14a
$L7:
andi $2,$2,0x
$L7a:
...
bne  $3,$0,$L7a
$L14:
andi $2,$2,0x
$L14a:
...

You'd still have to deal with the fallout of code labels in sequences
post-reorg, so maybe it's not that big of a win to delay having the
code label appear in the sequence until after reorg.c has completed.


Right.



The other question I'd ask is: what's the real penalty these days in
not filling the slots?  I know that on later out-of-order PA chips
filling slots was barely worth the effort; I guess it's still
profitable on the low-end embedded MIPS chips?


About the penalty, I don't really know. But since the optimization is
both filling delay slots and removing duplicate code, it looks like a
good idea to me.

Thanks,
- Tom


question about target info cache in resource.c

2011-01-11 Thread Tom de Vries
Hi all,

I would like to know if the attached patch resource-check.patch is a
good sanity check or not.

I have been working in reorg.c and running into trouble with the target
info cache in mark_target_live_regs, so I decided to write a patch that
checks consistency between cached values and recomputed values. Soon I
started running into issues that were unrelated to my changes in reorg.c.

As an example of the kind of situations the sanity check detects,
consider fixed-bit.c.212r.alignments. During pass_machine_reorg, the
following scenario happens:

1. a copy of insn 78 is imported into the delay slot of jump_insn 14.

(insn 106 13 49
/scratch/vries/henry7/mips/src/gcc-trunk-4.5/libgcc/../gcc/config/fixed-bit.c:183
(sequence [
(jump_insn 14 13 78
/scratch/vries/henry7/mips/src/gcc-trunk-4.5/libgcc/../gcc/config/fixed-bit.c:183
(set (pc)
(if_then_else (ne (reg:SI 2 $2 [200])
(const_int 0 [0x0]))
(label_ref:SI 105)
(pc))) 442 {*branch_equalitysi_micromips}
(expr_list:REG_DEAD (reg:SI 2 $2 [200])
(expr_list:REG_BR_PROB (const_int 6100 [0x17d4])
(nil)))
 -> 105)
(insn/s 78 14 49 (set (reg:SI 3 $3 [orig:196 z+4 ] [196])
(const_int 0 [0x0])) 284 {*movsi_internal} (nil))
]) -1 (nil))


It's a copy since jump_insn 14 does not own the thread containing insn
78. The copy is marked with INSN_FROM_TARGET_P (insn/s) to indicate that
it was imported from the target of the jump.

2. we call mark_target_live_regs for insn 17. We scan all insns from the
start of the function to insn 17. However, we ignore the copy of insn 78,
since it's marked with INSN_FROM_TARGET_P, so we conclude that $3 is not
live at insn 17. This info is stored in the target info cache.

3. redundant_insn finds that the original insn 78 is redundant (since
the copy of insn 78 occurs on all paths towards insn 78) and decides to
remove the original. The INSN_FROM_TARGET_P of the copy is cleared, to
indicate that $3 is now live on the fall-through path of jump_insn
14 as well.

4. we call mark_target_live_regs for insn 17 again. We get the value
from the cache and conclude that $3 is not live at insn 17. Then my
patch recomputes the live info, which now takes the copy of insn 78 into
account since INSN_FROM_TARGET_P has been cleared, and concludes that $3
is live. And we assert.

The following fix makes sure that the cached live info is invalidated:
...
@@ -1865,6 +1880,7 @@ redundant_insn (rtx insn, rtx target, rt
{
  /* Show that this insn will be used in the sequel.  */
  INSN_FROM_TARGET_P (candidate) = 0;
+ incr_ticks_for_insn (candidate);
  return candidate;
}

...
and the assert is not triggered anymore.

So my questions are:
- is the consistency check correct? Does it make sense to fix all the
  cases where it triggers?
- Is my analysis of the example and the fix correct?

Thanks,
- Tom
Index: gcc/resource.c
===
--- gcc/resource.c	(revision 310935)
+++ gcc/resource.c	(working copy)
@@ -878,6 +878,12 @@ mark_target_live_regs (rtx insns, rtx ta
   rtx jump_insn = 0;
   rtx jump_target;
   HARD_REG_SET scratch;
+#if defined ENABLE_RUNTIME_CHECKING
+  HARD_REG_SET cached;
+  bool cached_valid = false;
+  int fb = -1;
+#endif
+
   struct resources set, needed;
 
   /* Handle end of function.  */
@@ -918,6 +924,13 @@ mark_target_live_regs (rtx insns, rtx ta
 
   if (b == -1)
 b = find_basic_block (target, MAX_DELAY_SLOT_LIVE_SEARCH);
+#if defined ENABLE_RUNTIME_CHECKING
+  else
+{
+  fb = find_basic_block (target, MAX_DELAY_SLOT_LIVE_SEARCH);
+  gcc_assert (fb == -1 || fb == b);
+}
+#endif
 
   if (target_hash_table != NULL)
 {
@@ -927,8 +940,13 @@ mark_target_live_regs (rtx insns, rtx ta
 	 update it below.  */
 	  if (b == tinfo->block && b != -1 && tinfo->bb_tick == bb_ticks[b])
 	{
+#if defined ENABLE_RUNTIME_CHECKING
+	  cached_valid = true;
+	  COPY_HARD_REG_SET (cached, tinfo->live_regs);
+#else
 	  COPY_HARD_REG_SET (res->regs, tinfo->live_regs);
 	  return;
+#endif
 	}
 	}
   else
@@ -1126,6 +1144,10 @@ mark_target_live_regs (rtx insns, rtx ta
 {
   COPY_HARD_REG_SET (tinfo->live_regs, res->regs);
 }
+#if defined ENABLE_RUNTIME_CHECKING
+  if (cached_valid)
+gcc_assert (hard_reg_set_equal_p (cached, tinfo->live_regs));
+#endif
 }
 
 /* Initialize the resources required by mark_target_live_regs ().

;; Function __ussubudq3 (__ussubudq3)

(note 1 0 6 NOTE_INSN_DELETED)

(note 6 1 80 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(note 80 6 4 2 NOTE_INSN_PROLOGUE_END)

(note 4 80 9 2 NOTE_INSN_FUNCTION_BEG)

(debug_insn 9 4 11 2 
/scratch/vries/henry7/mips/src/gcc-trunk-4.5/libgcc/../gcc/config/fixed-bit.c:180
 (var_location:

Question about conds attribute for *thumb2_alusi3_short

2013-06-24 Thread Tom de Vries
Richard,

I've noticed that f.i. *thumb2_alusi3_short has no explicit setting of the conds
attribute, which means the value of the conds attribute for this insn is nocond:
...
(define_insn "*thumb2_alusi3_short"
  [(set (match_operand:SI  0 "s_register_operand" "=l")
(match_operator:SI 3 "thumb_16bit_operator"
 [(match_operand:SI 1 "s_register_operand" "0")
  (match_operand:SI 2 "s_register_operand" "l")]))
   (clobber (reg:CC CC_REGNUM))]
  "TARGET_THUMB2 && reload_completed
   && GET_CODE(operands[3]) != PLUS
   && GET_CODE(operands[3]) != MINUS"
  "%I3%!\\t%0, %1, %2"
  [(set_attr "predicable" "yes")
   (set_attr "length" "2")]
)
...

AFAIU, this insn is either:
- conditional, and does not modify cc, or
- unconditional, and sets cc.
So the clobber of CC in the RTL conservatively describes both cases.

It seems to me the logical conds setting is nocond for the conditional
case, and set (or perhaps clob) for the unconditional case. So, is this a
more accurate value of conds for this insn:
...
   (set (attr "conds")
(if_then_else
  (match_test "GET_CODE (PATTERN (insn)) == COND_EXEC")
  (const_string "nocond")
  (const_string "set")))]
...
?

Is there a generic need to have this attribute accurate for all insns?

Thanks,
- Tom


Fix line number data for PIC register setup code

2013-10-03 Thread Tom de Vries
Richard,

( see also related discussion
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01570.html )

Consider break.c (minimized from gdb/testsuite/gdb.base/break.c):
...
void *v;
void a (void *x) { }
void b (void) { }

int
main (int argc)
{
  if (argc == 12345)
{
  a (v);
  return 1;
}
  b ();

  return 0;
}
...

We compile like this with -fPIC:
...
$ arm-none-linux-gnueabi-gcc break.c -g -fPIC
...

and run to a breakpoint in main:
...
(gdb) b main
Breakpoint 1 at 0x8410: file break.c, line 7.
(gdb) c
Continuing.

Breakpoint 1, main (argc=1) at break.c:7
7   {
...

When we compile with -fno-PIC, we break at another line:
...
(gdb) b main
Breakpoint 1 at 0x83f8: file break.c, line 8.
(gdb) continue
Continuing.

Breakpoint 1, main (argc=1) at break.c:8
8 if (argc == 12345)
...

AFAIU, the correct line number is 8, so the case with -fPIC needs to be fixed.


The assembly looks like this:
...
main:
.LFB0:
.file 1 "break.c"
.loc 1 88 0
.cfi_startproc
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 1, uses_anonymous_args = 0
stmfd   sp!, {fp, lr}
.cfi_def_cfa_offset 8
.cfi_offset 11, -8
.cfi_offset 14, -4
add fp, sp, #4
.cfi_def_cfa 11, 4
sub sp, sp, #8
str r0, [fp, #-8]
.loc 1 88 0
ldr r2, .L4
.LPIC0:
add r2, pc, r2
.loc 1 89 0
ldr r1, [fp, #-8]
ldr r3, .L4+4
cmp r1, r3
bne .L2
.loc 1 91 0
...

From the point of view of the debugger, in the presence of .loc info,
the prologue is the code in between the first 2 .loc markers. See this
comment in gdb/arm-tdep.c:arm_skip_prologue:
...
  /* GCC always emits a line note before the prologue and another
 one after, even if the two are at the same address or on the
 same line.  Take advantage of this so that we do not need to
 know every instruction that might appear in the prologue.
...

Manually removing the second '.loc 1 88 0' (in the pic register setup generated
by require_pic_register) from the assembly gives us the required behaviour.

An easy way to achieve this in the compiler is this patch:
...
Index: config/arm/arm.c
===
--- config/arm/arm.c(revision 202646)
+++ config/arm/arm.c(working copy)
@@ -5542,9 +5542,12 @@ require_pic_register (void)
  seq = get_insns ();
  end_sequence ();

  for (insn = seq; insn; insn = NEXT_INSN (insn))
if (INSN_P (insn))
- INSN_LOCATION (insn) = prologue_location;
+ INSN_LOCATION (insn) = UNKNOWN_LOCATION;

  /* We can be called during expansion of PHI nodes, where
 we can't yet emit instructions directly in the final
...

However, it looks to me somewhat contradictory that when we mark this
code as UNKNOWN_LOCATION gdb recognizes it as part of the prologue, but
when we mark it with prologue_location, gdb doesn't.


Looking in more detail why we're emitting the .loc, it's because the pic
register setup follows the FUNCTION_BEG insn-note, which forces a .loc on the
next insn using the final.c:force_source_line variable.

According to this comment in dwarf2out_source_line:
...
 ... NOTE_INSN_FUNCTION_BEG, i.e. the first
 insn that corresponds to something the user wrote.
...
NOTE_INSN_FUNCTION_BEG marks the beginning of user code. To me it seems that we
have to make a choice here: Either the pic register setup code generated by
require_pic_register is user code, or it is prologue code.

If it's prologue code, we need to emit it before the FUNCTION_BEG insn-note
(rough proof-of-concept patch attached), such that no .loc will be generated for
it. Gdb will recognize it as part of the prologue, and the breakpoint will have
the correct line number.

If it's user code, we need to mark it with the first user line, such that a .loc
with the first user line will be generated. Gdb then won't count it as part of
the prologue, and the breakpoint will have the correct line number.


My preference would be to mark it as prologue code, since that's the case for
other uses of arm_load_pic_register.

What is the proper way to fix this?

Thanks,
- Tom

2013-09-15  Tom de Vries  

gcc/
* cfgexpand.c (gimple_expand_cfg): Emit insns_before_parm_birth_insn.
* function.h (struct rtl_data): Add x_insns_before_parm_birth_insn
field.
(insns_before_parm_birth_insn): Define new macro.
* config/arm/arm.c (require_pic_register): Use
insns_before_parm_birth_insn.
Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c (revision 418548)
+++ gcc/cfgexpand.c (working copy)
@@ -4609,6 +4609,10 @@ g

Re: Fix line number data for PIC register setup code

2013-10-13 Thread Tom de Vries
On 03/10/13 17:17, Tom de Vries wrote:
> we need to emit it before the FUNCTION_BEG insn-note
> (rough proof-of-concept patch attached), such that no .loc will be
> generated for it.

I investigated further, and now I think it's a regression caused by the fix for
PR47028.

Attached patch works for the testcase. I'll test the patch, and if successful do
a write-up and submit to gcc-patches.

Thanks,
- Tom

2013-10-13  Tom de Vries  

* cfgexpand.c (gimple_expand_cfg): Don't commit insertions after
NOTE_INSN_FUNCTION_BEG.

* gcc.target/arm/require-pic-register-loc.c: New test.

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c (revision 421892)
+++ gcc/cfgexpand.c (working copy)
@@ -4618,14 +4618,19 @@ gimple_expand_cfg (void)
 	  if (e->insns.r)
 	{
 	  rebuild_jump_labels_chain (e->insns.r);
-	  /* Avoid putting insns before parm_birth_insn.  */
+	  /* Put insns after parm birth, but before
+		 NOTE_INSN_FUNCTION_BEG.  */
 	  if (e->src == ENTRY_BLOCK_PTR
 		  && single_succ_p (ENTRY_BLOCK_PTR)
 		  && parm_birth_insn)
 		{
 		  rtx insns = e->insns.r;
 		  e->insns.r = NULL_RTX;
-		  emit_insn_after_noloc (insns, parm_birth_insn, e->dest);
+		  if (NOTE_P (parm_birth_insn)
+		  && NOTE_KIND (parm_birth_insn) == NOTE_INSN_FUNCTION_BEG)
+		emit_insn_before_noloc (insns, parm_birth_insn, e->dest);
+		  else
+		emit_insn_after_noloc (insns, parm_birth_insn, e->dest);
 		}
 	  else
 		commit_one_edge_insertion (e);
Index: gcc/testsuite/gcc.target/arm/require-pic-register-loc.c
===
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/require-pic-register-loc.c (revision 0)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-g -fPIC" } */
+
+void *v;
+void a (void *x) { }
+void b (void) { }
+   /* line 7.  */
+int/* line 8.  */
+main (int argc)/* line 9.  */
+{  /* line 10.  */
+  if (argc == 12345)   /* line 11.  */
+{
+  a (v);
+  return 1;
+}
+  b ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "\.loc 1 7 0" } } */
+/* { dg-final { scan-assembler-not "\.loc 1 8 0" } } */
+/* { dg-final { scan-assembler-not "\.loc 1 9 0" } } */
+
+/* The loc at the start of the prologue.  */
+/* { dg-final { scan-assembler-times "\.loc 1 10 0" 1 } } */
+
+/* The loc at the end of the prologue, with the first user line.  */
+/* { dg-final { scan-assembler-times "\.loc 1 11 0" 1 } } */


Re: Git mirror: asan branch

2013-10-23 Thread Tom de Vries
On 04/07/13 14:59, Thomas Schwinge wrote:
> Hi!
> 
> On Wed, 3 Jul 2013 09:54:58 -0700, Jason Merrill  wrote:
>> On 07/03/2013 02:47 AM, Thomas Schwinge wrote:
>>> OK, that of course works, but from the wiki page I got the idea that it
>>> explicitly was meant to merge these together.  So assuming this used to
>>> work in the past, I wonder what changed so that it no longer does; such
>>> as Git having allowed such duplicate merging in the past, and/or was the
>>> intersection of refs/remotes/* and refs/heads/* meant to be the empty set
>>> (then I assume the merging would work, too), but no longer is?
>>
>> Hmm, it looks like I wrote that up without actually doing it myself, 
>> oops.  I'll correct the wiki.
> 
> Hmm, seems the change you've done:
> 
> fetch = refs/heads/*:refs/remotes/origin/*
> fetch = refs/remotes/*:refs/remotes/origin/remotes/*
> 
> ..., is not ideal either: using »git fetch --verbose --prune« I now see
> all the refs being downloaded -- and then immediately pruned again.  :-/
> 
> Would the following be an appropriate variant?  Seems to work fine, but
> "disturbs" the regular Git refs namespace a bit?
> 
> fetch = refs/heads/*:refs/remotes/upstream/*
> fetch = refs/remotes/*:refs/remotes/upstream-remotes/*
> 

Jason,

AFAIU you addressed this issue in the gcc wiki r108 (
http://gcc.gnu.org/wiki/GitMirror?action=diff&rev1=107&rev2=108 ) on 2013-07-11.

However, I ran into the same problem with 'The various release branches':
...
$ for f in 4_8 ; do git config --add remote.origin.fetch \
    refs/remotes/gcc-$f-branch:refs/remotes/origin/gcc-$f-branch; done
$ git remote update
Fetching origin
fatal: refs/remotes/origin/gcc-4_8-branch tracks both refs/heads/gcc-4_8-branch
and refs/remotes/gcc-4_8-branch
error: Could not fetch origin
...

Following the changes you made in 'Fetching all branches', the command becomes:
...
git config --add remote.origin.fetch \
    refs/remotes/gcc-$f-branch:refs/remotes/svn/gcc-$f-branch
...

I could try that out, but apparently something more is needed:
...
Unfortunately, this doesn't work well with the git-svn configuration set up
above, because it expects the branches to be in remotes/origin. One way to deal
with this is to switch git-svn to use remotes/svn instead and adjust your local
branches to use remotes/svn for their upstream.
...
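My own guess would be something along these lines (untested; the init
command with --prefix=svn/ is what a fresh setup would use, and the two
config lines would re-point an existing local branch):
...
# Make git-svn put its tracking refs under refs/remotes/svn/
# instead of refs/remotes/origin/.
git svn init -Ttrunk --prefix=svn/ svn+ssh://gcc.gnu.org/svn/gcc
# Re-point an existing local branch at the svn copy of its upstream.
git config branch.gcc-4_8-branch.remote origin
git config branch.gcc-4_8-branch.merge refs/remotes/gcc-4_8-branch
...
But I'm not sure that's what is meant.
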
Can you translate the last sentence into shell/git command(s)?

Thanks
- Tom




Re: Git mirror: asan branch

2013-10-29 Thread Tom de Vries
On 24/10/13 07:05, Andi Kleen wrote:
> Tom de Vries  writes:
>> ...
>> Can you translate the last sentence into shell/git command(s)?
> 
> It would be far better to just centrally mirror all branches in SVN as 
> standard git branches. Then all these problems wouldn't occur.
> 
> As far as I can tell all the workarounds proposed so far 
> have some nasty drawbacks.
> 

Andi,

AFAIU you're suggesting the 'Fetching all branches' approach?

I've tried that now:
...
$ git clone git://gcc.gnu.org/git/gcc.git
$ git svn init -Ttrunk --prefix=svn/ svn+ssh://vr...@gcc.gnu.org/svn/gcc
$ git svn fetch
$ git svn show-ignore >> .git/info/exclude
$ git config remote.origin.fetch 'refs/heads/*:refs/remotes/origin/*'
$ git config --add remote.origin.fetch refs/remotes/*:refs/remotes/svn/*
$ git remote update
...
 * [new branch]  gomp-branch -> svn/gomp-branch
 ! [rejected]trunk  -> svn/trunk  (non-fast-forward)
...
error: Could not fetch origin
...

Is this what I should expect to see?

Thanks,
- Tom



libcilkrts breaks non-bootstrap build

2013-11-04 Thread Tom de Vries
Hi,

When configuring a gcc build with "--disable-bootstrap --enable-languages=c" I
run into this error:
...
libtool: compile:  g++
-B/home/vries/gcc_versions/devel/lean-c/install/x86_64-unknown-linux-gnu/bin/
-B/home/vries/gcc_versions/devel/lean-c/install/x86_64-unknown-linux-gnu/lib/
-isystem
/home/vries/gcc_versions/devel/lean-c/install/x86_64-unknown-linux-gnu/include
-isystem
/home/vries/gcc_versions/devel/lean-c/install/x86_64-unknown-linux-gnu/sys-include
"-DPACKAGE_NAME=\"Cilk Runtime Library\""
-DPACKAGE_TARNAME=\"cilk-runtime-library\" -DPACKAGE_VERSION=\"2.0\"
"-DPACKAGE_STRING=\"Cilk Runtime Library 2.0\""
-DPACKAGE_BUGREPORT=\"c...@intel.com\" -DPACKAGE_URL=\"\"
-DPACKAGE=\"cilk-runtime-library\" -DVERSION=\"2.0\" -DSTDC_HEADERS=1
-DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
-DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1
-DHAVE_UNISTD_H=1 -DHAVE_ALLOCA_H=1 -DHAVE_ALLOCA=1 -DHAVE_DLFCN_H=1
-DLT_OBJDIR=\".libs/\" -I. -I/home/vries/gcc_versions/devel/src/libcilkrts
-I/home/vries/gcc_versions/devel/src/libcilkrts/include
-I/home/vries/gcc_versions/devel/src/libcilkrts/runtime
-I/home/vries/gcc_versions/devel/src/libcilkrts/runtime/config/x86
-DIN_CILK_RUNTIME=1 -D_Cilk_spawn= -D_Cilk_sync= -D_Cilk_for=for -fcilkplus -g3
-O0 -dH -D_GNU_SOURCE -MT bug.lo -MD -MP -MF .deps/bug.Tpo -c
/home/vries/gcc_versions/devel/src/libcilkrts/runtime/bug.cpp  -fPIC -DPIC -o
.libs/bug.o
g++: error: unrecognized command line option ‘-fcilkplus’
make[2]: *** [bug.lo] Error 1
make[2]: Leaving directory
`/home/vries/gcc_versions/devel/lean-c/build/x86_64-unknown-linux-gnu/libcilkrts'
...

The error occurs because the compiler doesn't support -fcilkplus.

Should configure disable libcilkrts when c++ is not enabled?

I see the same error with "--disable-bootstrap --enable-languages=c,c++".
Should configure test the compiler for support of -fcilkplus and disable
libcilkrts if not?

Thanks,
- Tom


Re: libcilkrts breaks non-bootstrap build

2013-11-04 Thread Tom de Vries
On 04/11/13 21:23, Iyer, Balaji V wrote:
> Hi Tom,
>   This is what I tried for --enable-languages=c,c++
> 
> ../trunk-gcc/configure --disable-bootstrap --enable-languages="c,c++" 
> --prefix=/home/install_dir/trunk-install-disable-bootstrap
> 
> And it seems to compile fine. Did you pass any other flags to configure?
> 

Balaji,

I've just tried it myself again, and didn't manage to reproduce the error. I
think I must have seen a side effect of a patch I was trying out to fix the
other problem.

Sorry for the confusion.

Thanks,
- Tom

> Thanks,
> 
> Balaji V. Iyer.
> 
>> -Original Message-
>> From: Tom de Vries [mailto:tom_devr...@mentor.com]
>> Sent: Monday, November 4, 2013 2:15 PM
>> To: gcc@gcc.gnu.org
>> Cc: Iyer, Balaji V
>> Subject: libcilkrts breaks non-bootstrap build
>>
>> Hi,
>>
>> When configuring a gcc build with "--disable-bootstrap --enable-
>> languages=c" I run into this error:
>> ...
>> [long compile command elided; quoted in full in the original message above]
>> g++: error: unrecognized command line option '-fcilkplus'
>> make[2]: *** [bug.lo] Error 1
>> make[2]: Leaving directory
>> `/home/vries/gcc_versions/devel/lean-c/build/x86_64-unknown-linux-
>> gnu/libcilkrts'
>> ...
>>
>> The error occurs because the compiler doesn't support -fcilkplus.
>>
>> Should configure disable libcilkrts when c++ is not enabled?
>>
>> I see the same error with "--disable-bootstrap --enable-languages=c,c++".
>> Should configure test the compiler for support of -fcilkplus and disable
>> libcilkrts if not?
>>
>> Thanks,
>> - Tom



Re: [PATCH] RE: libcilkrts breaks non-bootstrap build

2013-11-05 Thread Tom de Vries
On 05/11/13 05:17, Iyer, Balaji V wrote:
> Is the following patch OK to fix this issue? 
> 

Balaji,

the patch fixes the problem for me, thanks.

I can't approve your patch, but it looks good to me.

FWIW, I stumbled upon this text at http://gcc.gnu.org/codingconventions.html
which is related to this patch, and also to the top-level part of your commit
r204173, which is missing in the src repository:
...
Top-level configure.ac, configure, Makefile.in, config-ml.in, config.if and most
other top-level shell-scripts: Please try to keep these files in sync with the
corresponding files in the src repository at sourceware.org. Some people hope to
eventually merge these trees into a single repository; keeping them in sync
helps this goal. When you check in a patch to one of these files, please check
it in the src tree too, or ask someone else with write access there to do so.
...

Thanks,
- Tom

> Thanks,
> 
> Balaji V. Iyer.
> 
> Index: configure.ac
> ===
> --- configure.ac(revision 204381)
> +++ configure.ac(working copy)
> @@ -2061,7 +2061,7 @@
>  case ,${enable_languages}, in
>*,c++,*) ;;
>*)
> -noconfigdirs="$noconfigdirs target-libitm target-libsanitizer target-libvtv"
> +noconfigdirs="$noconfigdirs target-libcilkrts target-libitm target-libsanitizer target-libvtv"
>  ;;
>  esac
> 
> Index: ChangeLog
> ===
> --- ChangeLog   (revision 204381)
> +++ ChangeLog   (working copy)
> @@ -1,3 +1,9 @@
> +2013-11-04  Balaji V. Iyer  
> +
> +   * configure.ac: Added libcilkrts to noconfig list when C++ is not
> +   supported.
> +   * configure: Regenerated.
> +
>  2013-11-01  Trevor Saunders  
> 
>  * MAINTAINERS (Write After Approval): Add myself.
> Index: configure
> ===
> --- configure   (revision 204381)
> +++ configure   (working copy)
> @@ -6630,7 +6630,7 @@
>  case ,${enable_languages}, in
>*,c++,*) ;;
>*)
> -noconfigdirs="$noconfigdirs target-libitm target-libsanitizer target-libvtv"
> +noconfigdirs="$noconfigdirs target-libcilkrts target-libitm target-libsanitizer target-libvtv"
>  ;;
>  esac



Re: [PATCH] RE: libcilkrts breaks non-bootstrap build

2013-11-05 Thread Tom de Vries
On 05/11/13 10:37, Gopalasubramanian, Ganesh wrote:
> My non-bootstrap build fails with the following message
> 
> /bin/bash: ./libtool: No such file or directory
> make: *** [cilk-abi-vla.lo] Error 127
> 
> I have my libtool installed in /usr/bin.
> 
> I configured the build with 
> configure --prefix=../dailybuild/usr/Nov_05_2013 
> --enable-languages=c,c++,fortran --disable-bootstrap
> 
> Am I missing something?

Ganesh,

I've just completed a build of r204382 with the same configure flags. I can't
reproduce that failure.

What is not clear from your message is whether ./libtool is indeed
missing, or whether it can't be found even though it's present.

If ./libtool is not present, it's supposed to be generated by
config.status, so check your config.log; perhaps there's a related
error there?

Thanks,
- Tom


question about REG_PARM_STACK_SPACE usage in expand_call

2013-12-12 Thread Tom de Vries

Honza,

in calls.c:expand_call, I see the following code:
...
#ifdef REG_PARM_STACK_SPACE
  /* If outgoing reg parm stack space changes, we can not do sibcall.  */
  || (OUTGOING_REG_PARM_STACK_SPACE (funtype)
  != OUTGOING_REG_PARM_STACK_SPACE (TREE_TYPE (current_function_decl)))
  || (reg_parm_stack_space != REG_PARM_STACK_SPACE (fndecl))
#endif
...

I don't understand the last line. reg_parm_stack_space is initialized like this:
...
  reg_parm_stack_space = REG_PARM_STACK_SPACE (!fndecl ? fntype : fndecl);
...

If fndecl is non-null, reg_parm_stack_space is thus equal to
REG_PARM_STACK_SPACE (fndecl), making the comparison trivially false.
Was this meant perhaps?
...
  || (reg_parm_stack_space != REG_PARM_STACK_SPACE (current_function_decl))
...

Thanks,
- Tom


Re: nvptx multilib setup (was: [Bug target/104364] [12 Regression] OpenMP/nvptx regressions after "[nvptx] Add some support for .local atomics")

2022-02-04 Thread Tom de Vries via Gcc

On 2/4/22 08:21, Thomas Schwinge wrote:

Hi Tom!

Taking this one to the mailing list; not directly related to PR104364:

On 2022-02-03T13:35:55+, "vries at gcc dot gnu.org via Gcc-bugs" 
 wrote:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104364



I've tested this using (recommended) driver 470.94 on boards:


(As not every user will be using the recommended/latest driver, I too
am doing some testing on oldish Nvidia/CUDA Driver versions.)
Combinatorial explosion is a problem, of course...



I am starting to suspect that I misinterpreted the nvidia website.  When 
asking for a driver for a board, I get some driver, which I took to be 
the recommended one.


But I started to notice changes in the recommended version from 470.x
to 510.x, which suggests the recommended one is just the latest one
they've updated in the set of recommended drivers.  So it seems
inaccurate to talk about "the" recommended driver.


Thanks for the testing, much appreciated. I'm currently testing with 390.x.


while iterating over dimensions { -mptx=3.1 , -mptx=6.3 } x { GOMP_NVPTX_JIT=-O0, 
 }.


Do you use separate (nvptx-none offload target only?) builds for
different '-mptx' variants (likewise: '-misa'), or have you hacked up the
multilib configuration? 


Neither, I'm using --target_board=unix/foffload= for that.

 ('gcc/config/nvptx/t-nvptx:MULTILIB_OPTIONS'

etc., I suppose?)  Should we add a few representative configurations to
be built by default?  And/or, should we have a way to 'configure' per
user needs (I suppose: '--with-multilib-list=[...]', as supported for a
few other targets?)?  (I see there's also a new
'--with-multilib-generator=[...]', haven't looked in detail.)  No matter
which way: again, combinatorial explosion is a problem, of course...



As far as I know, the gcc build doesn't finish when switching the
default to higher than sm_35, so there's little point in going to a
multilib setup at this point.  But once we fix that, we could
reconsider; otherwise, things are likely to regress again.
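
For reference, my guess is that such a setup would eventually amount to
something like this in gcc/config/nvptx/t-nvptx (untested, and the
option values are just examples):
...
# Build one multilib per -misa / -mptx combination.
MULTILIB_OPTIONS = misa=sm_53/misa=sm_70 mptx=3.1/mptx=6.3
...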


Thanks,
- Tom