Re: Question about strange register calling truncates to SImode on x86_64

2006-10-12 Thread Michael Matz
Hi,

On Thu, 12 Oct 2006, Kai Tietz wrote:

> thanks for description. I wasn't aware, that the upper 32 bits are 
> zeroed. Does this means that even the stack has to be in the first 4 Gb, 
> too.

Why should it?  I.e. no, it doesn't have to.

> Or does this mov instruction does a sign-extention.

Which mov instruction?


Ciao,
Michael.


Re: Unwinding CFI gcc practice of assumed `same value' regs

2006-12-13 Thread Michael Matz
Hi,

On Tue, 12 Dec 2006, Andrew Haley wrote:

>  > > In practice, %ebp either points to a call frame -- not necessarily 
>  > > the most recent one -- or is null.  I don't think that having an 
>  > > optional frame pointer mees you can use %ebp for anything random at 
>  > > all, but we need to make a clarification request of the ABI.
>  > 
>  > I don't see that as feasible.  If %ebp/%rbp may be used as a general 
>  > callee-saved register, then it can hold any value.
> 
> Sure, we already know that, as has been clear.  The question is *if* 
> %rbp may be used as a general callee-saved register that can hold any 
> value.

Yes, of course it was meant to be used as such.  The ABI actually only 
gives a recommendation that %rbp should be zero in the outermost frame; 
it's not a must.  The ABI _requires_ proper .eh_frame descriptors when 
unwinding is desired; so it's useless (and wrong) for any unwinder to 
look at %rbp to determine if it should stop.

Alternatively (though not sanctioned by the ABI) all functions through 
which unwinding is desired but for which no unwind info is created _have_ 
to use %rbp as a frame pointer and not as a general register.  In that 
case the zeroing of %rbp would be a usable stop condition for functions 
without unwind info.  But that's already outside the ABI.


Ciao,
Michael.


Re: Unwinding CFI gcc practice of assumed `same value' regs

2006-12-13 Thread Michael Matz
Hi,

On Mon, 11 Dec 2006, Jan Kratochvil wrote:

> currently (on x86_64) the gdb backtrace does not properly stop at the 
> outermost
> frame:
> 
> #3  0x0036ddb0610a in start_thread () from /lib64/tls/libpthread.so.0
> #4  0x0036dd0c68c3 in clone () from /lib64/tls/libc.so.6
> #5  0x in ?? ()
> 
> Currently it relies only on clearing %rbp (0x above is
> unrelated to it, it got read from uninitialized memory).
> 
> http://sourceware.org/ml/gdb/2004-08/msg00060.html suggests frame 
> pointer 0x0 should be enough for a debugger not finding CFI to stop 
> unwinding, still it is a heuristic.  In the -fno-frame-pointer compiled 
> code there is no indication the frame pointer register became a regular 
> one and 0x0 is its valid value.

Right.  Unwinding through functions (without frame pointer) requires CFI.  
If there is CFI for a function, the unwinder must not look at %rbp as a 
stop condition.  If there's no CFI for a function, it can't be unwound 
(strictly per ABI).  If one relaxes that and wants to unwind through 
CFI-less functions, those functions have to have a frame pointer.  In 
that case a zero in that frame pointer could indicate the outermost frame 
(_if_ the suggestion in the ABI is adhered to, which no one is required 
to do).


Ciao,
Michael.


Re: Unwinding CFI gcc practice of assumed `same value' regs

2006-12-13 Thread Michael Matz
Hi,

On Tue, 12 Dec 2006, Ulrich Drepper wrote:

> > Really?  Well, that's one interpretation.  I don't believe that, 
> > though.  It's certainly an inconsistency in the specification, which 
> > says that null-termination is supported, and this implies that you 
> > can't put a zero in there.
> 
> Again, this is just because the "authors" of the ABI didn't think.

[Blaeh, Ulrich talk] No, I think it's because the "readers" of the ABI 
can't read.


Ciao,
Michael.


Re: Error help

2005-03-02 Thread Michael Matz
Hi,

On Wed, 2 Mar 2005, Rajkishore Barik wrote:

> Checked out a version of new-regalloc code and patched some of my code. It 
> crashes in the bootstrapping
> part where "xgcc" runs some of my code on __dtor* modules. Is there a way 
> to avoid this 
> bootstrapping? I do not really care about these errors.

Somehow I got this mail already separately, not over the gcc list.  
Perhaps my answer didn't get through, so here it is again:

Avoid the process of bootstrapping itself?  Then just do a plain make 
after configuring instead of a "make bootstrap".  Note that this will 
still build the runtime libraries, so if the compiler crashes on them you 
would have to fix it before having an installable version.  But during 
development I normally don't care either, and just do make without 
install, and debug the generated cc1 directly.  I do usually at least fix 
the bugs that show up during the runtime lib build when I see them.


Ciao,
Michael.


Re: matching constraints in asm operands question

2005-03-08 Thread Michael Matz
Hi,

On Sat, 5 Mar 2005 [EMAIL PROTECTED] wrote:

> > Well, I assumed the same thing when I started poking at that code, but
> > then someone pointed out that it didn't actually work that way, and as
> > I recall the code does in fact assume a register.  I certainly would
> > not object to making '+' work properly for memory operands, but simply
> > asserting that it already does is wrong.
> 
> The code in reload to make non-matching operands match assumes a
> register. However, a match from a plus should always kept in sync
> (except temporarily half-way through a substitution, since we now
> unshare).  If it isn't, that's a regression.

In former times an in-out constraint was simply translated to a matching 
constraint.  So it broke when no register was allowed for it.

Jason added a warning to that effect in 2003
  http://gcc.gnu.org/ml/gcc-patches/2003-12/msg01491.html
RTH fixed the problem with this translation in tree-ssa and hence also 
removed the warning
  http://gcc.gnu.org/ml/gcc-patches/2004-05/msg00438.html

I now see that I had an objection to that one, as he also applied the 
removal of the warning to 3.4 without me seeing that we were doing the 
right thing as Richard claimed, but that seems to have fallen through the 
cracks.

> Do you have a testcase, and/or can point out the code that introduces
> the inconsistency in the rtl?


Ciao,
Michael.


Re: Installation into 'const' directory

2005-03-16 Thread Michael Matz
Hi,

On Wed, 16 Mar 2005, Art Haas wrote:

> This morning's build of GCC mainline installed various files into a
> directory named 'const' instead of a version number like name. The c++
> headers from yesterday's build were in a '4.1.0' directory, for example.
> A perusal of the ChangeLog suggests yesterday's patchset that revamped
> the build to use BASE-VER, DATESTAMP, and DEV-PHASE files is likely to
> have made this (unanticipated?) change.

See also http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01542.html .

I've hacked around this locally with the below patch for the toplevel 
configure.


Ciao,
Michael.
-- 
Index: configure
===
RCS file: /cvs/gcc/gcc/configure,v
retrieving revision 1.206
diff -u -p -r1.206 configure
--- configure   28 Feb 2005 13:24:57 -  1.206
+++ configure   16 Mar 2005 14:07:10 -
@@ -858,14 +858,16 @@ extra_host_args=
 if test "${with_gcc_version_trigger+set}" = set; then
   gcc_version_trigger=$with_gcc_version_trigger
 else
-  gcc_version_trigger=$topsrcdir/gcc/version.c
+  gcc_version_trigger=$topsrcdir/gcc/BASE-VER
 fi
 if test -f "${gcc_version_trigger}"; then
-  gcc_version_full=`grep version_string "${gcc_version_trigger}" | sed -e 's/.*"\([^"]*\)".*/\1/'`
+  #gcc_version_full=`grep version_string "${gcc_version_trigger}" | sed -e 's/.*"\([^"]*\)".*/\1/'`
+  gcc_version_full=`cat "${gcc_version_trigger}"`
 else
   gcc_version_full=`$CC -v 2>&1 | sed -n 's/^gcc version //p'`
 fi
-gcc_version=`echo ${gcc_version_full} | sed -e 's/\([^ ]*\) .*/\1/'`
+#gcc_version=`echo ${gcc_version_full} | sed -e 's/\([^ ]*\) .*/\1/'`
+gcc_version=${gcc_version_full}
 
 
 


Re: Strange build errors compiling SPEC with mainline

2005-03-18 Thread Michael Matz
Hi,

On Fri, 18 Mar 2005, Diego Novillo wrote:

> Starting around 2005-03-17, I haven't been able to compile
> several SPEC tests with mainline.  Has there been any change in
> the pre-processor that might explain these errors?
> 
> I'm pretty sure my installation is correct because this worked
> until 2005-03-15, the system header files are all there and I get
> no such errors from the runs with tree-cleanup-branch (merged
> 2005-02-23).
> 
> Any ideas?
> 
> Thanks.  Diego.
> 
> -
> /home/cygnus/dnovillo/perf/sbox/gcc/local.i686/inst.tobiano/bin/gcc -c -o 
> bits.o -O3 -march=i686 bits.c
> Error from make 'specmake  build 2> make.err | tee make.out':
> In file included from gzip.h:37,
>  from bits.c:55:
> /usr/include/stdio.h:34:21: error: stddef.h: No such file or directory

stddef.h is a header installed by GCC into 
lib/gcc//4.1.0/include/stddef.h.  If it can't be found it means that 
it's not installed there, which might be due to Zack's changes.  You 
should look whether you have a 'const' directory instead of the 4.1.0 
one.  If yes, then this is the problem, and Zack's latest patches fix it.


Ciao,
Michael.


Re: about new_regalloc

2005-04-01 Thread Michael Matz
Hi,

On Thu, 31 Mar 2005, zouq wrote:

> in gcc3.4.1,i found rest_of_new_handle_regalloc
> why in gcc4.0, it has been removed?

It was removed from gcc 4 because it bitrotted and broke on all kinds of 
code.  If you want to see a more recent and more working version look at 
the new-regalloc-branch.


Ciao,
Michael.


Re: unreducable cp_tree_equal ICE in gcc-4.0.0-20050410

2005-04-13 Thread Michael Matz
Hi,

On Wed, 13 Apr 2005, Nick Rasmussen wrote:

> I'm running into an ICE in the prerelease, that is proving to be
> very difficult in reducing to a small testcase.  If I preprocess
> the source (via -E or -save-temps) the code successfully compiles.
>  If I minimally change the source file in some ways(like adding a
> static integer in the global scope) the code compiles.  I've been
> able to delete some lines from the source file that's triggering
> the bug, including some from within the function that is
> triggering the ICE, but I'm down to a point where I can't easily
> reduce it further, or even get it into a single source file.

If possible you could tar everything needed together with a small 
Makefile, for others to look at.  Otherwise it will be difficult.  You 
can also compile cc1plus with debugging (just do a "make CFLAGS=-g" after 
configuring gcc, no bootstrap), and try to come up with some more info 
yourself.  For instance by printing the two trees in question, by 
going to the cp_tree_equal frame and doing
  (gdb) p debug_tree(t1)
in gdb (and likewise for t2).

> /dept/rnd/vendor/gcc-4.0.0pre1-amd64/bin/g++ -fno-builtin -O2 -g 
> -DHAVE_X86_64 -DHAVE_LITTLE_ENDIAN -DHAVE_BYTESWAP_H -DHAVE_64BIT_POINTER 
> -DHAVE_VA_COPY -DHAVE_XINERAMA -DPLATFORM_LINUX -D_FILE_OFFSET_BITS=64 
> -DHAVE_STL_LIMITS -DHAVE_IOS_BASE -Drestrict=__restrict__ 
> -DPLATFORM_LINUX_AMD64 -DPLATFORM=LINUX_AMD64 
> -DBUILD=LINUX_AMD64_GCC400pre1_OPT_DEBUG -DDISTRO_SUSE -DDISTRO=SUSE 
> -DDISTRO_VERSION=91 -DNVIDIA_VERSION_6111 -DNVIDIA_VERSION=6111 
> -DGCC_VERSION_400pre1 -DNDEBUG -I/usr/share/doc/NVIDIA_GLX-1.0/include -I. 
> -I/dept/rnd/home/nick/work/build-zeno2/SUSE_AMD64_GCC400pre1_OPT_DEBUG/include
>  -I/usr/X11R6/include -c bug.C -o /dev/null
> bug.C: In member function 'void EzFleshMesh::buildNodeArrays()':
> bug.C:411: internal compiler error: in cp_tree_equal, at cp/tree.c:1552

And give the context of that line.  Although it seems more like memory 
being overwritten inside the compiler or something like that, as 
otherwise it would be more deterministic.

> #2  0x00473171 in cp_tree_equal (t1=0x2a9a1c94b0, t2=0x2a9a1cbb90) at 
> ../../gcc-4.0.0-20050410/gcc/cp/tree.c:1552
> #3  0x00473198 in cp_tree_equal (t1=0x2a9a013820, t2=0x2a9a031460) at 
> ../../gcc-4.0.0-20050410/gcc/cp/tree.c:1543
> #4  0x004590b3 in comptypes (t1=0x2a9a00ea90, t2=0x2a9a02bdd0, 
> strict=Variable "strict" is not available.
> ) at ../../gcc-4.0.0-20050410/gcc/cp/typeck.c:912
> #5  0x00458ede in comptypes (t1=0x2a9a00ec30, t2=0x2a9a032000, 
> strict=0) at ../../gcc-4.0.0-20050410/gcc/cp/typeck.c:1034
> #6  0x0047b0b9 in cxx_types_compatible_p (x=0x2a9a00ec30, 
> y=0x2a9a032000) at ../../gcc-4.0.0-20050410/gcc/cp/cp-objcp-common.c:173
> #7  0x007776bd in expressions_equal_p (e1=0x2a9a1c9640, 
> e2=0x2a9a1cbd20) at ../../gcc-4.0.0-20050410/gcc/tree-vn.c:127
> #8  0x007776f7 in val_expr_pair_expr_eq (p1=0xc16b90, p2=0xcd25e0) at 
> ../../gcc-4.0.0-20050410/gcc/tree-vn.c:153
> #9  0x007ffcba in htab_find_slot_with_hash (htab=0xca98a0, 
> element=0x81c3b9, hash=8504885, insert=INSERT) at 
> ../../gcc-4.0.0-20050410/libiberty/hashtab.c:660
> #10 0x0077794f in vn_add (expr=0x2a9a1cbd20, val=0x2a9a240330, 
> vuses=0x0) at ../../gcc-4.0.0-20050410/gcc/tree-vn.c:199
> #11 0x004e09c1 in execute_pre (do_fre=0 '\0') at 
> ../../gcc-4.0.0-20050410/gcc/tree-ssa-pre.c:1742

So PRE is trying to compare two types, and they contain something which 
can't be handled.  Either because they were silently overwritten, or 
because of a logical error.


Ciao,
Michael.


Re: Should there be a GCC 4.0.1 release quickly?

2005-05-02 Thread Michael Matz
Hi,

On Thu, 28 Apr 2005, Mark Mitchell wrote:

> I'd rather not rush to a 4.0.1 release.

I'm a fan of release early, release often.  Really.  Even if this means 
we would end up with a 4.0.20 after half a year.  Basically I can think 
of only one reason _not_ to release after a critical bug is fixed, and 
that reason is a very good one: resources.  Resources to prepare the 
release, write the announcement, test-build the RC tarballs, and so on.

But given the resource constraints I think one should release as often as
possible.  Every two weeks after serious bugs are fixed seems not
unreasonable to me.  I realize that is extreme, but I still think it makes
sense.  Certainly I feel that the planned two months until 4.0.1 are much
too long for the number of critical bugs 4.0.0 had.


Ciao,
Michael.


Re: Backporting to 4_0 the latest friend bits

2005-05-02 Thread Michael Matz
Hi,

On Sat, 30 Apr 2005, Kriang Lerdsuwanakij wrote:

> Sure, this code compiles with 4.1 and 3.4 but doesn't compile with 4.0.
> Although the code is valid, I'd bet it doesn't work the way the
> programmer of the above code (or other 99% who doesn't track
> the standard closely) would expect.

Note that this was a reduced testcase from the original file.  I 
optimized only for triggering the non-compilation, not for preserving the 
author's initial intent.  For instance it may very well be possible that 
this friend declaration was not necessary at all and was only put there 
by the author out of confusion; or it was initially needed, then later 
this need was removed but the friend decl was forgotten.

So, the basic facts which interest me for the purpose of this discussion 
are:
  1) the program in its original form can be compiled with 3.3 and 3.4
 _and_ worked there (for whatever reasons it worked)
  2) does not compile with 4.0
  3) does compile with 4.1 (and presumably also works)

What I would find ideal under these circumstances is that the patch which 
made it work in 4.1, _if it's not too intrusive_, be included in 4.0, even 
if it doesn't fix a regression in the strict sense.  If you will, I want 
the bar for regressions lowered (case by case) to also include code which 
was accepted before (even if for the wrong reasons), is now not compiled 
at all, and for which a fix exists in 4.1.

Basically I don't want defects in 3.x compilers to prevent backporting of 
bugfixes from 4.1 to 4.0, if possible.


Ciao,
Michael.


Re: Backporting to 4_0 the latest friend bits

2005-05-03 Thread Michael Matz
Hi,

On Mon, 2 May 2005, Mark Mitchell wrote:

> At the same time, if the code in question doesn't mean what the person
> who wrote it wants it to mean (e.g., if it implicitly declares classes
> in the scope of the friendly class, rather than nominating other classes
> as friends), then that code should still be fixed.

No disagreement from me here.

> It's certainly in the long-term interest of KDE not to have spurious
> friend declarations around, and I'd expect that as a KDE distributor you
> would want to encourage them to use the syntax that means what they
> want, even in parallel to possibly fixing the compiler.

Yep.  /us fighting in many places ;-)


Ciao,
Michael.


Re: restrict and char pointers

2005-05-06 Thread Michael Matz
Hi,

On Thu, 5 May 2005, Daniel Berlin wrote:

> You can do it, but apparently restrict isn't as simple as "a and b are
> both restrict pointers and therefore can never alias", because that's
> not the actual definition of restrict. It says stuff about pointers
> "based on" restricted objects, etc.
> 
> Joseph Myers has shown some very weird but legal things you can do with
> restrict that make it all but pointless to try to handle, at least for
> the non function argument case.

Disagreement here.  Some of Joseph's weird examples were things like:
  void f(int *restrict a, int *restrict b) {
    for (int i = 0; i < N; i += 2) a[i] = 2*b[i];
  }
  int array[N];
  f (&array[0], &array[1]);

I.e. interleaving the pointers in a way that they both point into the 
same object.  This is not forbidden by restrict, but it also poses no 
problem.  The two restrict pointers still may not actually point to the 
same memory objects (individual array elements here), so the compiler can 
still trivially apply the "two restrict pointers don't alias" rule.

Then there is the problem of "based on".  This allows a non-restrict 
pointer to sometimes alias a restrict pointer.  One could ignore this 
rule at first, for ease of implementation, and say that an unrestricted 
pointer can alias everything again.  Restricted pointers themselves can 
only be based on other restricted pointers under very narrow 
circumstances, which effectively prevents them again from being aliased 
at their points of dereference.

Users actually using restrict in their programs will most probably use 
restrict for each pointer possible, so just handling that two restrict 
pointers can't alias would probably already catch 90% of all cases.  
(I've actually had one user who was very confused that the vectorizer 
didn't do anything on his very simple two-line function, although he had 
made all three pointer arguments restricted.)

As you said the most important would probably be to handle restrict 
pointers in function arguments.  I would add to that also those pointers 
which are not used in any RHS of any statement (except for being 
dereferenced of course).  This would ensure that no other pointer is based 
on them.  And it would allow users to write code like:
  void f(int* a, int *b, int n) {
if (a > b+n || b > a+n) {  // arrays don't overlap
  int * restrict ar = a;
  int * restrict br = b;
  /* Make this nonaliasing known to the compiler */
  for (int i = 0; i < n; i++) ar[i] += br[i];
} else {
  /* They overlap, can't optimize as much */
  for (int i = 0; i < n; i++) a[i] += b[i];
}
  }


Ciao,
Michael.


RFA: Integrate ABI testsuite in GCC

2005-06-13 Thread Michael Matz
Hi,

I have this x86-64 ABI testsuite I worked on lately again (after some 
years of lingering around; it was first written back when we were still 
doing the port on simulators).  It currently lies on cvs.x86-64.org in 
the 'abitest' module, for the curious (it has anoncvs too).

I would like to somehow integrate this into GCC, so that it is run 
automatically when doing a make check.  I've pondered several ways to do 
this:
  1) add something like --with-abitest=/dir/to/abitest to GCC's configure;
     make check could then use this external path to run the ABI testsuite
  2) mirror the testsuite somehow inside GCC's CVS to be tightly integrated

I'm not sure which way is best.  The second one has the advantage that you 
can't miss running it when just checking out GCC and developing on it.
The first one has the advantage that GCC's CVS would not be cluttered with 
something external and very architecture-specific.

Currently the testsuite contains some testcase generators written in C, 
and some hand-written testcases.  I'm thinking about rewriting at least 
the generators in something better suited to text manipulation, e.g. 
bash or perl.  What would be the feeling about having that kind of stuff 
in GCC's CVS?

Basically I'm looking for some consensus on how to make my above goal 
happen.  So, anyone: any suggestions, ideas, flames?


Ciao,
Michael.


Re: Function Inlining for FORTRAN

2005-07-22 Thread Michael Matz
Hi,

On Wed, 20 Jul 2005, Steven Bosscher wrote:

> On Wednesday 20 July 2005 17:22, Paul Brook wrote:
> > To implement (b) this needs to be changed to:
> >
> > - Do everything up until gfc_generate{,_module}_code as normal.
> > - Save the results somewhere and repeat for each PU.
> > - Identify calls for procedures for which we have definitions, and link
> > them together somehow. It 's probably worth maintaining some sort of global
> > symbol table and building these associations incrementally during
> > resolution.
> 
> This is what I was working on, but I never finished it.  I encountered
> some memory corruption issues (procedure names disappearing underneath
> me) that I never found time for to investigate.
> 
> I've appended the last incarnation of my hack that I could find in my
> local mail archive.  This was supposed to help implement the first two
> points of (b).  Actually linking things together is something I never
> got to do.

And I had once written a hack to make whole-program mode work with 
gfortran (which in the end worked well enough for the Fortran programs in 
SPEC2k).  Its purpose is the merging of decls, so that a real call graph 
can be generated.  As I don't know much Fortran, the actual inlining 
enabled by this might generate wrong code in cases like Paul mentioned.  
If so, then at least SPEC2k does not contain such cases ;-)  The patch is 
below; perhaps it's of use to anyone.  It's against an old version of the 
tree-profiling branch.


Ciao,
Michael.
-- 
diff -urp -x CVS -x '*.orig' gcc.jh/gcc/fortran/f95-lang.c gcc/gcc/fortran/f95-lang.c
--- gcc.jh/gcc/fortran/f95-lang.c   2005-03-12 21:30:09.0 +0100
+++ gcc/gcc/fortran/f95-lang.c  2005-03-14 11:50:08.0 +0100
@@ -534,6 +534,22 @@ pushdecl_top_level (tree x)
   return t;
 }
 
+tree find_fndecl (tree name);
+tree
+find_fndecl (tree name)
+{
+  struct binding_level *b = current_binding_level;
+  while (b)
+{
+  tree t;
+  for (t = b->names; t; t = TREE_CHAIN (t))
+if (TREE_CODE (t) == FUNCTION_DECL
+   && DECL_NAME (t) == name)
+ return t;
+  b = b->level_chain;
+}
+  return NULL_TREE;
+}
 
 /* Clear the binding stack.  */
 static void
diff -urp -x CVS -x '*.orig' gcc.jh/gcc/fortran/trans.c gcc/gcc/fortran/trans.c
--- gcc.jh/gcc/fortran/trans.c  2005-03-12 21:30:09.0 +0100
+++ gcc/gcc/fortran/trans.c 2005-03-14 11:50:10.0 +0100
@@ -658,6 +658,8 @@ gfc_generate_code (gfc_namespace * ns)
   /* Main program subroutine.  */
   if (!ns->proc_name)
 {
+  /* Let backend know that this is the main entry point to the program.  */
+  main_identifier_node = get_identifier ("MAIN__");
   /* Lots of things get upset if a subroutine doesn't have a symbol, so we
  make one now.  Hopefully we've set all the required fields.  */
   gfc_get_symbol ("MAIN__", ns, &main_program);
diff -urp -x CVS -x '*.orig' gcc.jh/gcc/fortran/trans-decl.c gcc/gcc/fortran/trans-decl.c
--- gcc.jh/gcc/fortran/trans-decl.c 2005-03-12 21:30:09.0 +0100
+++ gcc/gcc/fortran/trans-decl.c 2005-03-14 11:50:09.0 +0100
@@ -45,6 +45,7 @@ Software Foundation, 59 Temple Place - S
 
 #define MAX_LABEL_VALUE 9
 
+extern tree find_fndecl (tree);
 
 /* Holds the result of the function if no result variable specified.  */
 
@@ -917,54 +918,58 @@ gfc_get_extern_function_decl (gfc_symbol
   mangled_name = gfc_sym_mangled_function_id (sym);
 }
 
-  type = gfc_get_function_type (sym);
-  fndecl = build_decl (FUNCTION_DECL, name, type);
+  fndecl = find_fndecl (name);
+  if (!fndecl || TREE_CODE (fndecl) != FUNCTION_DECL)
+{
+  type = gfc_get_function_type (sym);
+  fndecl = build_decl (FUNCTION_DECL, name, type);
 
-  SET_DECL_ASSEMBLER_NAME (fndecl, mangled_name);
-  /* If the return type is a pointer, avoid alias issues by setting
- DECL_IS_MALLOC to nonzero. This means that the function should be
- treated as if it were a malloc, meaning it returns a pointer that
- is not an alias.  */
-  if (POINTER_TYPE_P (type))
-DECL_IS_MALLOC (fndecl) = 1;
+  SET_DECL_ASSEMBLER_NAME (fndecl, mangled_name);
+  /* If the return type is a pointer, avoid alias issues by setting
+DECL_IS_MALLOC to nonzero. This means that the function should be
+treated as if it were a malloc, meaning it returns a pointer that
+is not an alias.  */
+  if (POINTER_TYPE_P (type))
+   DECL_IS_MALLOC (fndecl) = 1;
 
-  /* Set the context of this decl.  */
-  if (0 && sym->ns && sym->ns->proc_name)
-{
-  /* TODO: Add external decls to the appropriate scope.  */
-  DECL_CONTEXT (fndecl) = sym->ns->proc_name->backend_decl;
-}
-  else
-{
-  /* Global declaration, e.g. intrinsic subroutine.  */
-  DECL_CONTEXT (fndecl) = NULL_TREE;
-}
+  /* Set the context of this decl.  */
+  if (0 && sym->ns && sym->ns->proc_name)
+   {
+ /* TODO: Add external decls to the app

Re: Link-time optimzation

2005-11-18 Thread Michael Matz
Hi,

On Thu, 17 Nov 2005, Kenneth Zadeck wrote:

> A stack machine representation was chosen for the same reason.  Tree
> gimple is a series of statements each statement being a tree.

IMHO we should follow that path of thinking.  The representation of 
GIMPLE that we do most optimizations on (i.e. tree-ssa) is implemented as 
GCC trees, that's true.  But this is just an implementation detail, and 
one which hopefully will be changed at some point in the future.  In 
essence GIMPLE is a rather flat intermediate form, most of the time just 
three-address form.  I think it would be a mistake in the long run if we 
now used a stack-based external representation just because right now 
gimple is implemented via trees.  For instance the gimple statement

  a = b + c

would need to be encoded along the lines of
  push id_b
  push id_c
  add
  pop id_a

The expansion of this trivial operation into four stack ops is horrible 
to read (think of reading debug dumps).  Additionally the change of 
representation might introduce hard-to-overcome issues due to mismatches 
in expressiveness.  We would possibly need a mini stack optimizer just 
for reading this form back into gimple.

I think writing out gimple directly, i.e. using a register machine and 
three-address code, is the better way.  I could even imagine some custom 
extensions to the three-address form to easily represent nested 
constructs which still happen in gimple (e.g. type conversions, address 
taking, etc.).

> 1) Do some register allocation of the temps so that they are reused.
>This is non trivial to undo (but truely doable), especially where
>you wish to not adversely impact debugging.
> 
> 2) Just generate a lot of temps and hope that some other form of
>compression will save the day.

In the above light I would go for 2), together with perhaps a relatively 
trivial form of 1) (e.g. reusing temps per gimple statement, which 
reduces the overall need for temps to the maximum Sethi-Ullman number of 
the statements to be converted, most of the time lower than, let's say, 20).

OTOH it might be a good idea to pursue both strategies at first (i.e. a 
gimple writer/reader based on a stack machine and one based on a register 
machine), and then see which feels better.  Perhaps even a merger of both 
approaches is sensible: three-address form for most simple gimple 
statements, falling back to stack encoding for deeply nested operands.


Ciao,
Michael.


Re: Link-time optimzation

2005-11-18 Thread Michael Matz
Hi,

On Fri, 18 Nov 2005, Steven Bosscher wrote:

> On Friday 18 November 2005 17:31, Michael Matz wrote:
> > Perhaps even a merger of both
> > approaches is sensible, three address form for most simple gimple
> > statements with falling back to stack encoding for deeply nested operands.
> 
> That would be a bad violation of the KISS principle.

Of course.  It was just an idea coming to my mind, you don't have to start 
with that.  And sometimes one shouldn't avoid complexity at all cost, if 
the gain is high enough ;)


Ciao,
Michael.


Re: Register Allocation

2005-11-23 Thread Michael Matz
Hi,

On Tue, 22 Nov 2005, Peter Bergner wrote:

> Spill Location Optimizer [page(s) 11]:
> * The IBM iSeries backend has this optimization.  During spilling,
>   it inserts copies to/from spill pseudos (essentially like another
>   register class) which represent the stores/loads from the stack.
>   These spill pseudos can then be dead code eliminated, coalesced
>   and colored (using an interference graph) just like any other
>   normal pseudo.  Normal Chaitin coloring (using k = infinity) does
>   a good job of minimizing the amount of stack space used for
>   spilled pseudos.

This is btw. also done by the new-ra branch.  Instead of spilling to the 
stack directly it spills to special new pseudo regs.  The obvious problem 
with that is a phase-ordering problem, namely that as long as you only 
have pseudo stack locations (the pseudo regs in this case) you don't know 
the final insn sequence (e.g. if the final stack offset happens to be 
unrepresentable, so that extra insns are necessary to actually construct 
the stack address for the load/store).  That's why new-ra leaves the 
stack slots as pseudos only for one round, and then assigns real stack 
positions to them (and recolors the possibly new insns and affected 
pseudos).

> Spill Cost Engine [page(s) 26-29]:
> * The register allocator should not be estimating the execution
>   frequency of a basic block as 10^nesting level.  That information
>   should be coming from the cfg which comes from profile data or
>   from a good static profile.  The problem with 10^loop nesting
>   level is that we can overestimate the spill costs for some
>   pseudos.  For example:
>   while (...) {
> 
> if (...)
>   
> else
>  }
>   In the code above, "b"'s spill cost will be twice that of "a",
>   when they really should have the same spill cost.

Nearly.  "b" _is_ more costly to spill, code size wise.  All else being 
equal it's better to spill "a" in this case.  But the cost is of course 
not twice as large, as you say.  I.e. I agree with you that the metric 
should be based exclusively on the BB frequencies attached to the CFG, not 
any nesting level.  Also like in new-ra ;)


Ciao,
Michael.


Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Michael Matz
Hello,

On Mon, 3 Feb 2020, Richard Earnshaw (lists) wrote:

> Where does your '50 chars' limit come from?  It's not in the glibc text, 
> and it's not in the linux kernel text either.  AFAICT this is your 
> invention and you seem to be the only person proposing it.

Nope, it's fairly common, so much so that it's included in the "commonly 
accepted rules" that googling for "git subject lines" gives you (as a 
snippet glimpse from some website), and that vim changes color when 
spilling over 50 chars.  I actually thought it was universal and obvious 
until this thread (which is why I admittedly only did the above google 
right before writing this mail).  As for the rationale: 'git log 
--oneline' with hash and author or date should fit the usual 72-char 
limit.  (An 11-character hash plus a space alone would leave 60 chars 
for the subject.)

That's also the reason why some people (certainly me) are nervous about or 
dislike all the "tags" in the subject line.  E.g. what essential 
information (and subjects are for essential info, right?) is "[committed]" 
(or, even worse "[patch]") supposed to transport?  If the rest of the 
subject doesn't interest me, I don't care if something was committed or 
not; if it _does_ interest me, then I'm going to look at the mail/patch 
either way, if committed or not; at which point the info if the author 
required review or has already committed it could be given in the body as 
well.  Similar for some other metainfo tags.  (The "subsystem:" is good, 
though).

And if we must have these tags, then why not at least short ones?  Why 
isn't "[cmt]" or something enough?  There will be very few tags, so they 
become mnemonic pretty much immediately.  What becomes clearer when 
writing "[patch v2 1/13]" in comparison to "[v2 1/13]"?


Ciao,
Michael.



Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Michael Matz
Hi,

On Mon, 3 Feb 2020, Richard Earnshaw (lists) wrote:

> The idea is that the [...] part is NOT part of the commit, only part of 
> the email.

I understand that, but the subject line of this thread says "e-mail 
subject lines", so I thought we were talking about, well, exactly that; 
and I see no value of these tags in e-mails either.

(They might have a low but non-zero value for projects that use 
a single mailing list for patches and generic discussion, but we are not 
such a project)

Basically: if they are deemed to clutter the git log for whatever reason, 
then there must be a very good argument for why they do not also clutter 
e-mail subject lines, but instead are essential to have there, 
but not in the log.

> 'git am' would strip leading [...] automatically unless 
> you've configured, or asked git to do otherwise.  So that leading part 
> is not counted for the length calculation.

There's still e-mail netiquette which also should be obeyed, or at least 
not contradicted by git netiquette.


Ciao,
Michael.


Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Michael Matz
Hello,

On Mon, 3 Feb 2020, Jakub Jelinek wrote:

> > > The idea is that the [...] part is NOT part of the commit, only part of 
> > > the email.
> > 
> > I understand that, but the subject line of this thread says "e-mail 
> > subject lines", so I thought we were talking about, well, exactly that; 
> > and I see no value of these tags in e-mails either.
> 
> In email, they do carry information that is useful there, the distinction
> whether a patch has been committed already and doesn't need review from
> others, or whether it is a patch intended for patch review, or just a RFC
> patch that is not yet ready for review, but submitter is looking for some
> feedback.

For tags like [cmt] or [rfc] I don't have much gripe, though I do think 
that info could be given in the body, and that e.g. in e-mail archives 
(where the tags are not removed automatically) they carry the same value 
as in git log, namely zero.

But suggesting that using the subject line for tagging is recommended can 
lead to subjects like

 [PATCH][GCC][Foo][component] Fix foo component bootstrap failure

in an e-mail directed to gcc-patc...@gcc.gnu.org (from somewhen last year, 
where Foo/foo was an architecture; I'm really not trying to single out the 
author).  That is, _none_ of the four tags carried any informational 
content.


Ciao,
Michael.


Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Michael Matz
Hello,

On Mon, 3 Feb 2020, Richard Earnshaw (lists) wrote:

> Well, I'd review a patch differently depending on whether or not it was 
> already committed, a patch requiring review or an RFC looking for more 
> general comments, so I *do* think such an email prefix is useful.

As I said: a very good argument must be made; it might be that rfc falls 
into the useful-tag category.

> >> 'git am' would strip leading [...] automatically unless
> >> you've configured, or asked git to do otherwise.  So that leading part
> >> is not counted for the length calculation.
> > 
> > There's still e-mail netiquette which also should be obeyed, or at least
> > not contradicted by git netiquette.
> 
> The 50 char limit seems to come from wanting git log --oneline to not wrap in
> an 80 column terminal.  Whilst laudable, I'm not sure that such a limit
> doesn't become too restrictive and then lead to hard-to-understand summaries.

In my experience hard-to-understand summaries are more related to people 
writing them than to length, IOW, I fear a larger limit like 72 characters 
won't help that.  And, as Segher put it, we aren't really talking about 
limits, only about suggestions, if you _really_ have to mention 
that 40-character function name in which you fixed something in your 
subject, then yeah, you'll go over the 50 chars.  But as recommendation 
the 50 chars make more sense than the 72 chars, IMHO.


Ciao,
Michael.


Re: Question about undefined functions' parameters during LTO

2020-03-13 Thread Michael Matz
Hello,

On Fri, 13 Mar 2020, Erick Ochoa wrote:

> +for (tree parm = DECL_ARGUMENTS (undefined_function->decl); parm; parm =
> DECL_CHAIN (parm))
> + {
> +   tree type = TREE_TYPE(parm);
> +   if (dump_file) fprintf(dump_file, "I want the type, do I have it?
> %s\n", type ? "true" : "false");
> + }
> +  }
> +  return 0;
> +}
> 
> I have added the complete patch below, however the function iphw_execute
> encapsulates the logic I am trying at the moment.
> 
> The problem is that while this program runs, DECL_ARGUMENTS returns NULL and
> therefore the loop is never entered. This is true for functions that have
> arguments, such as puts/malloc/... and others in glibc.

As argument types conceptually belong to the function's type (not its 
decl), you should look at the function decl's type, not at DECL_ARGUMENTS.  
See the FOREACH_FUNCTION_ARGS iterator and its helpers.  Note that you 
need to pass it TREE_TYPE(funcdecl).

(DECL_ARGUMENTS is the list of formal parameters viewed from the function 
body's perspective, so without a body it isn't filled.)
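A GCC-internals sketch of the suggested approach (this only compiles 
inside the GCC source tree; `undefined_function` is taken from the quoted 
patch, the rest is illustrative):

```c
/* Iterate over the argument types recorded in the function *type*,
   which is populated even for bodyless declarations like fclose.  */
tree fntype = TREE_TYPE (undefined_function->decl);
function_args_iterator iter;
tree argtype;
FOREACH_FUNCTION_ARGS (fntype, argtype, iter)
  {
    if (dump_file)
      print_generic_expr (dump_file, argtype);
  }
```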


Ciao,
Michael.


Re: Not usable email content encoding

2020-03-18 Thread Michael Matz
Hi,

On Wed, 18 Mar 2020, Frank Ch. Eigler via Gcc wrote:

> > > The key here is to realize that the raw message is not what you get
> > > back from the mailing list reflector, and also not the raw message
> > > that was sent by the sender.  In this day of mta intermediaries,
> > > proxies, reflectors, it may be time to revisit that suggestion.
> > 
> > But these largely are new problems.  It used to work flawlessly.
> 
> I understand that's frustrating.  But these workflows were counting on
> literally unspecified behaviours not changing, or outright standards
> violations continuing.

Wut?  How is "not mangle the mail body" in any way violating standards?  
You're talking about rewriting or adding headers (where the former is Real 
Bad, no matter what DMARC wants to impose), but the suggestion is based on 
not rewriting the body.  If the body (including attachments) is rewritten 
in any way then that simply is a bug.

> > Patch reencoding problems go back to the redhat.com changes last
> > November (I understand the responsible vendor is working on a fix,
> > but I'm not up-to-date on the current developments).
> 
> This one is a standards-compliant reencoding.  Even if mimecast (?)
> stops doing it, we can't be sure nothing else will.
> 
> > Since the sourceware.org Mailman migration, the From: header is being
> > rewritten, without any compelling reason.  I certainly do not do any
> > DMARC checking here, so the rewriting does not benefit me.
> 
> It benefits you because more and more email services are rejecting or
> interfering with mail that is not clean enough.  If you want to
> receive mail reliably, or send and have confidence that it is
> received, clean mail benefits you.

Depends on your definition of "clean".  If by that you mean rewriting mail 
bodies then I'm not sure what to say.


Ciao,
Michael.


Re: Not usable email content encoding

2020-03-18 Thread Michael Matz
Hello,

On Wed, 18 Mar 2020, Frank Ch. Eigler via Gcc wrote:

> > > The From: header rewriting for DMARC participants is something sourceware
> > > is doing now.
> > 
> > Out of curiousity, is this rewriting you are talking about the cause for a
> > lot of mails showing up as "From: GCC List" rather than their real senders?
> > This has become very annoying recently.
> 
> Yes, for emails from domains with declared interest in email
> cleanliness, via DMARC records in DNS.  We have observed mail
> -blocked- at third parties, even just days ago, when we failed to
> sufficiently authenticate outgoing reflected emails.

Was this blocking also a problem before mailman (i.e. two weeks ago)?  
Why did nobody scream for not having received mail?  Or why is it blocked 
now, but wasn't before?  Can it be made so again, like it was with ezmlm?

(And DMARC's requirement of having to rewrite From: headers should make it 
clear to everyone that it's stupid).


Ciao,
Michael.


Re: Question about undefined functions' parameters during LTO

2020-04-07 Thread Michael Matz
Hello,

On Tue, 7 Apr 2020, Erick Ochoa wrote:

> Thanks for this lead! It is almost exactly what I need. I do have one more
> question about this. It seems that the types obtained via
> FOR_EACH_FUNCTION_ARGS and TREE_TYPE are different pointers when compiled with
> -flto.
> 
> What do I mean by this? Consider the following code:
> 
> #include 
> int main(){
>   FILE *f = fopen("hello.txt", "w");
>   fclose(f);
>   return 0;
> }
> 
> The trees corresponding to types FILE* and FILE obtained via the variable f
> are different from the trees obtained from the argument to fclose.

Yes, quite possible.

> However, when we are compiling the simple C program via
> /path/to/gcc -flto a.c -fdump-ipa-hello-world -fipa-hello-world
> /path/to/gcc -flto -flto-partition=none -fipa-hello-world a.c -o a.out
> one can see that the pointers are different:
> 
> pointers 0x79ee1c38 =?= 0x79ee0b28
> records 0x79ee1b90 =?= 0x79ee0a80
> 
> Do you, or anyone else for that matter, know if it would be possible to 
> keep the trees pointing to the same address? Or, in case it can be 
> possible with some modifications, where could I start looking to modify 
> the source code to make these addresses match? The other alternative for 
> me would be to make my own type comparison function, which is something 
> I can do. But I was wondering about this first.

So, generally type equality can't be established by pointer equality in 
GCC, even more so with LTO; there are various reasons why the "same" type 
(same as in language equality) is represented by different trees, and 
those reasons are amplified with LTO.  We try to unify some equal types to 
the same trees when reading in LTO bytecode, but that's only an 
optimization mostly.

So, when you want to compare types use useless_type_conversion_p (for 
equivalence you need useless(a,b) && useless(b,a)).  In particular, for 
record types T it's TYPE_CANONICAL(T) that needs to be pointer-equal.  
(I.e. you could hard-code that as well, but it's probably better to use 
the existing predicates we have).  Note that useless_type_conversion_p is 
for the middle-end type system (it's actually one part of the definition 
of that type system), i.e. it's language agnostic.  If you need language 
specific equality you would have to use a different approach, but given 
that you're adding IPA passes you probably don't worry about that.
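A GCC-internals sketch of what such a comparison could look like (helper 
names are my own; useless_type_conversion_p, TYPE_CANONICAL and 
TYPE_STRUCTURAL_EQUALITY_P are real GCC predicates/macros, and this only 
compiles inside the GCC source tree):

```c
/* Middle-end type equivalence as described above: a conversion is
   useless in both directions.  */
static bool
types_equivalent_p (tree a, tree b)
{
  return useless_type_conversion_p (a, b)
         && useless_type_conversion_p (b, a);
}

/* For record types with TYPE_CANONICAL set, pointer equality of the
   canonical types is the aliasing-relevant comparison; types without
   one need structural comparison.  */
static bool
records_same_p (tree a, tree b)
{
  if (TYPE_STRUCTURAL_EQUALITY_P (a) || TYPE_STRUCTURAL_EQUALITY_P (b))
    return types_equivalent_p (a, b);   /* fall back to the predicate */
  return TYPE_CANONICAL (a) == TYPE_CANONICAL (b);
}
```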


Ciao,
Michael.


Re: Question about undefined functions' parameters during LTO

2020-04-07 Thread Michael Matz
Hello,

On Tue, 7 Apr 2020, Erick Ochoa wrote:

> > So, when you want to compare types use useless_type_conversion_p (for 
> > equivalence you need useless(a,b) && useless(b,a)).  In particular, 
> > for record types T it's TYPE_CANONICAL(T) that needs to be 
> > pointer-equal. (I.e. you could hard-code that as well, but it's 
> > probably better to use the existing predicates we have).  Note that 
> > useless_type_conversion_p is for the middle-end type system (it's 
> > actually one part of the definition of that type system), i.e. it's 
> > language agnostic.  If you need language specific equality you would 
> > have to use a different approach, but given that you're adding IPA 
> > passes you probably don't worry about that.
> 
> I've been using TYPE_MAIN_VARIANT(T) as opposed to TYPE_CANONICAL(T). 
> This was per the e-mail thread: 
> https://gcc.gnu.org/legacy-ml/gcc/2020-01/msg00077.html .

Well, Honza correctly said that TYPE_MAIN(a) == TYPE_MAIN(b) implies a and 
b to have the same representation.  But that doesn't imply the reverse 
direction, so that hint was somewhat misleading.

> I am not 100% sure what the differences are between these two yet,

Basically TYPE_MAIN_VARIANT "removes" qualifiers, i.e. the main variant is 
always the one without const/volatile.

TYPE_CANONICAL is defined for record types and being the same means they 
have the same representation as well, and are regarded the same from a 
type aliasing p.o.v.  In comparison to MAIN_VARIANT a non-equal CANONICAL 
pointer does imply non-equivalence of the types, so you can infer 
something from comparing CANONICAL.  That is true for the types that do 
have TYPE_CANONICAL set, the others need structural comparison.  See the 
docu of TYPE_CANONICAL and TYPE_STRUCTURAL_EQUALITY_P in tree.h.

> but I think TYPE_CANONICAL(T) was not helpful because of typedefs? I 
> might be wrong here, it has been a while since I did the test to see 
> what worked.
> 
> Using TYPE_MAIN_VARIANT(T) has gotten us far in an optimization we are 
> working on, but I do think that a custom type comparison is needed now.

Well, it really depends on what specific definition of type 
equality/equivalence/compatibility you need, and for what.  Do you want to 
differ between typedefs or not, do you regard structs of same members but 
different tag as equal or not, and so on.

> I do not believe I can use useless_type_conversion_p because I need a 
> partial order in order to place types in a set.

Apart from the fact that useless_type_conversion_p _is_ a partial order, 
do you mean the internal requirement of a set implementation relying on 
some partial order?  Well, yeah, I wouldn't necessarily expect you can use 
predicates defining a semantic order on items to be usable as an internal 
implementation requirement of some random data structure.  I don't know 
what exactly you need the sets for, so I don't know if you could just use 
the usual pointer sets that would then hold possibly multiple "same" 
trees, where the same-ness would only be used later when pulling elements 
out of the set.


Ciao,
Michael.


Re: Not usable email content encoding

2020-04-07 Thread Michael Matz
Hello,

On Tue, 7 Apr 2020, Jonathan Wakely via Gcc wrote:

> On Mon, 6 Apr 2020 at 23:00, Maciej W. Rozycki via Gcc  
> wrote:
> >  And can certainly score a positive though not a definite rating in spam
> > qualification.  I don't think we ought to encourage bad IT management
> > practices by trying to adapt to them too hard and hurting ourselves (our
> > workflow) in the process.
> 
> What you call "bad IT management practices" includes how Gmail works,
> which a HUGE number of people use.
> 
> A number of lists I'm on switched to our current style of minging a
> year or two ago, because gmail users were not receiving mail, because
> gmail was rejecting the mail.

I find that unconvincing, because even googlegroup email lists don't 
mangle From: from sender domains that are now mangled by sourceware.org 
:-/

Can we please switch it off?  It's not like we really had a problem before 
the switch to mailman.


Ciao,
Michael.


Re: Not usable email content encoding

2020-04-07 Thread Michael Matz
Hello,

On Tue, 7 Apr 2020, Frank Ch. Eigler wrote:

> > I find that unconvincing, because even googlegroup email lists don't 
> > mangle From: from sender domains that are now mangled by sourceware.org 
> > :-/
> 
> It turns out receiving mail FROM google-groups mail is itself
> sometimes at risk because it fails to do this From: mangling, and its
> ARC/DKIM re-signature of mail requires even more software to process
> and bless.  (Its current behaviour on some groups-gmail lists I'm on
> are DMARC non-compliant.)

In a way that's amusing and just reinforces my p.o.v. that DMARC is 
bollocks.

> > Can we please switch it off?  It's not like we really had a problem
> > before the switch to mailman.
> 
> We have offered some first-hand evidence that there were problems,
> just they were worked on by people in the background.

Okay, now the question is, are those problems offsetting the current 
problems?  IMHO they don't, but of course I'm heavily biased, not having 
had those old problems  :)


Ciao,
Michael.


Re: Not usable email content encoding

2020-04-08 Thread Michael Matz
Hello,

On Wed, 8 Apr 2020, Mark Wielaard wrote:

> On Tue, 2020-04-07 at 11:53 +0200, Florian Weimer via Overseers wrote:
> > Gmail can drop mail for any reason.  It's totally opaque, so it's a
> > poor benchmark for any mailing list configuration changes because it's
> > very hard to tell if a particular change is effective or not.
> > 
> > Many mailing lists have *not* made such changes and continue to work
> > just fine in the face of restrictive DMARC sender policies and
> > enforcement at the recipient.
> > 
> > In general, mail drop rates due to DMARC seem to increase in these two
> > cases if the original From: header is preserved:
> > 
> > * The sender (i.e., the domain mentioned in the From: header)
> >   publishes a restrictive DMARC policy and the mailing list strips the
> >   DKIM signature.
> > 
> > * The sender signs parts of the message that the mailing list alters,
> >   and the mailing list does not strip the DKIM signature.
> > 
> > If neither scenario applies, it's safe to pass through the message
> > without munging.  The mailing list software can detect this and
> > restricting the From: header munging to those cases.
> > 
> > I doubt Mailman 2.x can do this, so it is simply a poor choice as
> > mailing list software at this point.
> 
> Earlier versions of Mainman2 had some issues which might accidentally 
> change some headers. But the latest fixes make this possible. It is how 
> the FSF handles DMARC for various GNU mailinglists (by NOT modifying the 
> headers and body and passing through the DKIM signatures): 
> https://lists.gnu.org/archive/html/savannah-hackers-public/2019-06/msg00018.html

Oh, that would be nice to have at sourceware.org.  Please?  :-)


Ciao,
Michael.


Re: Not usable email content encoding

2020-04-13 Thread Michael Matz
Hello,

On Mon, 13 Apr 2020, Christopher Faylor wrote:

> On Wed, Apr 08, 2020 at 04:15:27PM -0500, Segher Boessenkool wrote:
> >On Wed, Apr 08, 2020 at 01:50:51PM +, Michael Matz wrote:
> >>On Wed, 8 Apr 2020, Mark Wielaard wrote:
> >>>Earlier versions of Mainman2 had some issues which might accidentally
> >>>change some headers.  But the latest fixes make this possible.  It is
> >>>how the FSF handles DMARC for various GNU mailinglists (by NOT
> >>>modifying the headers and body and passing through the DKIM
> >>>signatures):
> >>>https://lists.gnu.org/archive/html/savannah-hackers-public/2019-06/msg00018.html
> >>
> >>Oh, that would be nice to have at sourceware.org.  Please?  :-)
> >
> >Yes, please please please, can we have this?
> 
> In case it isn't obvious, we are already running the latest available
> version of mailman 2.

I think that means that dmarc_moderation_action: "Munge From" can simply 
be switched off then (at least I don't see which other headers e.g. gcc@ 
is rewriting that would cause DMARC to scream; and if there are any, then 
it would be better to disable those as well.  Same with any potential 
body rewriting that might still happen).

I would offer help testing that this doesn't cause delivery issues, e.g. 
on some test email list, but it seems none of my domains is DMARC-infected 
:-/


Ciao,
Michael.

P.S: I wonder btw. why the From munging is enabled also for p=none domains 
like redhat.com.  The RFC says this is to be used for gathering DMARC 
feedback, not requiring any specific action for the mail text itself on 
the sender or receiver.  But an answer to this would be moot with the 
above non-munging of From.


Re: GCC optimizations with O3

2020-04-22 Thread Michael Matz
Hello,

On Wed, 22 Apr 2020, Erick Ochoa wrote:

> in order for me to debug my issue, I'm going to have to refactor passes 
> which directly reference optimize.

For debugging you can also work backwards: use -O3 and add -fno-xy 
options.  At least you then know (after disabling all O3 passes) that it's 
one of those places that explicitly key off the opt level.

> I am planning on refactoring them by creating a "$pass_opt_level". This 
> variable can be set via command line or somewhere in opts.c. I can then 
> substitute the references to optimize with $pass_opt_level.

I think for local decisions in passes that currently use 'optimize' the 
better strategy is to determine the underlying cause for having that test, 
and add a flag for that (or reuse an existing one, e.g. if the reason was 
"don't disturb debugging experience" then create or use a flag specific to 
that role).

For the global decisions mentioned by Jakub: that's by nature not specific 
to a pass, hence a per-pass opt level wouldn't help.


Ciao,
Michael.


Re: Should ARMv8-A generic tuning default to -moutline-atomics

2020-04-30 Thread Michael Matz
Hello,

On Wed, 29 Apr 2020, Florian Weimer via Gcc wrote:

> Distributions are receiving requests to build things with
> -moutline-atomics:
> 
>   
> 
> Should this be reflected in the GCC upstream defaults for ARMv8-A
> generic tuning?  It does not make much sense to me if every distribution
> has to overide these flags, either in their build system or by patching
> GCC.

Yep, same here.  It would be nicest if upstream would switch to 
outline-atomics by default on armv8-a :-)  (the problem with build system 
overrides is that some compilers don't understand the option, complicating 
the overrides; and patching GCC package would create a deviation from 
upstream also for users)


Ciao,
Michael.


Re: New mklog script

2020-05-19 Thread Michael Matz
Hello,

On Tue, 19 May 2020, Martin Liška wrote:

> > The common problems I remember is that e.g. when changing a function comment
> > above some function, it is attributed to the previous function rather than
> > following, labels in function confusing it:
> >   void
> >   foo ()
> >   {
> > ...
> >   label:
> > ...
> > -  ...
> > +  ...
> >   }
> 
> I've just tested that and it will take function for patch context
> (sem_variable::equals):
> @@ -1875,6 +1875,7 @@ sem_variable::equals (tree t1, tree t2)
>  default:
>return return_false_with_msg ("Unknown TREE code reached");
>  }
> +
>  }

No, the problem happens when the label is at column 0, like function names 
are.  Basically diff -p uses a regexp morally equivalent to 
'^[[:alpha:]$_]' to detect function headers, and git diff -p and friends 
followed suit.  But it should use something like
'^[[:alpha:]$_].*[^:]$' to rule out things ending with ':'.  See also diff 
-F for GNU diff.


Ciao,
Michael.


Re: New mklog script

2020-05-19 Thread Michael Matz
Hello,

On Tue, 19 May 2020, Jakub Jelinek wrote:

> On Tue, May 19, 2020 at 05:21:16PM +0100, Richard Earnshaw wrote:
> > This is really a wart in the GNU coding style.  And one reason why I
> > tend to indent such labels by a single space.  It particularly affects
> > things like class definitions where public, private, etc statements
> > often appear in column 0.
> > 
> > IMO, it would be nice to get an official change in the coding style for
> > this, it's really irritating.
> 
> It doesn't have to be just label,
> void
> foo ()
> {
>   ...
> #define X ...
>   ...
> #undef X
>   ...
> }
> does the similar thing for mklog.

That particular one would be a mere bug in mklog then.  diff -p regards 
only members of [[:alpha:]$_] as acceptable start characters of function 
names (i.e. indeed things that can start a C identifier (ignoring details 
like non-base characters) with the '$' extension), of which '#' is none.


Ciao,
Michael.


RE: New x86-64 micro-architecture levels

2020-07-23 Thread Michael Matz
Hello,

On Wed, 22 Jul 2020, Mallappa, Premachandra wrote:

> > That's deliberate, so that we can use the same x86-* names for 32-bit 
> > library selection (once we define matching micro-architecture levels there).
> 
> Understood.
> 
> > If numbers are out, what should we use instead?
> > x86-sse4, x86-avx2, x86-avx512?  Would that work?
> 
> Yes please, I think we have to choose somewhere, above would be more 
> descriptive

And IMHO that's exactly the problem.  These names should _not_ be 
descriptive, because any description invokes a false sense of precision.  
E.g. what Florian already mentioned: sse4 - does it imply 4.1 and 4.2, or 
avx512: what of F, CD, ER, PF, VL, DQ, BW, IFMA, VBMI, 4VNNIW, 4FMAPS, 
VPOPCNTDQ, VNNI, VBMI2, BITALG, VP2INTERSECT, GFNI, VPCLMULQDQ, VAES does 
that one imply (rhetorical question, list shown just to make the silliness 
explicit).

Regarding precision: I think we should rule out any mathematically correct 
scheme, e.g. one in which every ISA subset gets an index and the directory 
name contains a hexnumber constructed by bits with the corresponding index 
being one or zero, depending on if the ISA subset is required or not: I 
think we're currently at about 40 ISA subsets, and hence would end up in 
names like x86-32001afff and x86-22001afef (the latter missing two subsets 
compared to the former).

No, IMHO the non-vendor names should be non-descript, and either be 
numbers or characters, of which I would vote for characters, i.e. A, B, C.  
Obviously, as already mentioned here, the mapping of level to feature set 
needs to be described in documentation somewhere, and should be maintained 
by either glibc, glibc/gcc/llvm or psABI people.

I don't have many suggestions about vendor names, be they ISA-subset 
market names, core names, or company names.  I will just note that using 
such names has led to an explosion in the number of names without very good 
separation between them.  As long as we're only talking about -march= 
cmdline flags that may have been okay, if silly, but under this proposal 
every such name is potentially a subdirectory containing many shared 
libraries, and one that potentially needs to be searched on every library 
lookup in the dynamic linker; so it's prudent to limit the size of this 
name set as well.

As for which subsets should or shouldn't be required in which level: I 
think the current suggestions all sound good, ultimately it's always going 
to be some compromise.


Ciao,
Michael.


Re: Problems with changing the type of an ssa name

2020-07-27 Thread Michael Matz
Hello,

On Sat, 25 Jul 2020, Gary Oblock via Gcc wrote:

>   if ( TYPE_P ( type) )
> {
>TREE_TYPE ( ssa_name) = TYPE_MAIN_VARIANT ( type);
>if ( ssa_defined_default_def_p ( ssa_name) )
>   {
>  // I guessing which I know is a terrible thing to do...
>  SET_SSA_NAME_VAR_OR_IDENTIFIER ( ssa_name, TYPE_MAIN_VARIANT ( 
> type));

As the macro name indicates this takes a VAR_DECL, or an IDENTIFIER_NODE.  
You put in a type, that won't work.

You also simply override the type of the SSA name, without also caring for 
the type of the underlying variable that is (potentially!) associated with 
the SSA name; if those two disagree then issues will arise, you have to 
replace either the variables type (not advisable!), or the associated 
variable, with either nothing or a new variable (of the appropriate type), 
or an mere identifier.  Generally you can't modify SSA names in place like 
this, i.e. as Richi says, create new SSA names, replace all occurences of 
one with the other.
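A GCC-internals sketch of the create-and-replace approach (only compiles 
inside the GCC source tree; `old_name`/`new_type` are placeholders, the 
iterator macros are the real immediate-use API):

```c
/* Instead of mutating old_name in place, create a fresh SSA name of
   the desired type and rewrite all uses of the old one.  */
tree new_name = make_ssa_name (new_type);
imm_use_iterator imm_iter;
gimple *use_stmt;
use_operand_p use_p;
FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, old_name)
  {
    FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
      SET_USE (use_p, new_name);
    update_stmt (use_stmt);
  }
/* The defining statement must then be changed to define new_name
   (or new_name given its own definition) before old_name is released.  */
```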


Ciao,
Michael.


Re: [libgcc2.c] Implementation of __bswapsi2()

2020-11-12 Thread Michael Matz
Hello,

On Thu, 12 Nov 2020, Stefan Kanthak wrote:

> Does GCC generate (unoptimised) code there, similar to the following i386
> assembly, using 4 loads, 4 shifts, 2 ands plus 3 ors?

Try for yourself.  '-m32 -O2 -march=i386' is your friend.


Ciao,
Michael.

Spoiler: it's generating:

movl4(%esp), %eax
rolw$8, %ax
roll$16, %eax
rolw$8, %ax
ret



Re: [RFC] Increase libstdc++ line length to 100(?) columns

2020-11-30 Thread Michael Matz
Hello,

On Sun, 29 Nov 2020, Allan Sandfeld Jensen wrote:

> On Sonntag, 29. November 2020 18:38:15 CET Florian Weimer wrote:
> > * Allan Sandfeld Jensen:
> > > If you _do_ change it. I would suggest changing it to 120, which is next
> > > common step for a lot of C++ projects.
> > 
> > 120 can be problematic for a full HD screen in portrait mode.  Nine
> > pixels per character is not a lot (it's what VGA used), and you can't
> > have any window decoration.  With a good font and screen, it's doable.
> > But if the screen isn't quite sharp, then I think you wouldn't be able
> > to use portrait mode anymore.
> 
> Using a standard condensed monospace font of 9px, it has a width of 7px, 120 

A char width of 7px implies a cell width of at least 8px (so 960px for 120 
chars), more often of 9px.  With your cell width of 7px your characters 
will be max 6px, symmetric characters will be 5px, which is really small.

> char would take up 940px fitting two windows in horizontal mode and one in 
> vertical. 9px isn't fuzzy, and 8px variants are even narrower.

Well, and if you're fine with a 5px cell-width font then you can even fit 
216 chars on a line in HD portrait mode.  But Florian posited a width of 
9px, and I agree with him that it's not a lot (if my monitor weren't as 
big as it is I would need to use an even wider font for comfortable 
reading, as it is 9px width are exactly right for me, I'm not using 
portrait, though).  So, it's the question if the line lengths should or 
should not cater for this situation.

> Sure using square monospace fonts might not fit, but that is an unusual 
> configuration and easily worked around by living with a non-square monospace 
> font, or accepting occational line overflow. Remember nobody is suggesting 
> every line should be that long, just allowing it to allow better structural 
> indentation.

The occasional line overflow will automatically become the usual case with 
time; space allowed to be filled will eventually be filled.


Ciao,
Michael.


Re: [RFC] Increase libstdc++ line length to 100(?) columns

2020-11-30 Thread Michael Matz
Hello,

On Mon, 30 Nov 2020, Allan Sandfeld Jensen wrote:

> > > On Sonntag, 29. November 2020 18:38:15 CET Florian Weimer wrote:
> > > > * Allan Sandfeld Jensen:
> > > > > If you _do_ change it. I would suggest changing it to 120, which is
> > > > > next
> > > > > common step for a lot of C++ projects.
> > > > 
> > > > 120 can be problematic for a full HD screen in portrait mode.  Nine
> > > > pixels per character is not a lot (it's what VGA used), and you can't
> > > > have any window decoration.  With a good font and screen, it's doable.
> > > > But if the screen isn't quite sharp, then I think you wouldn't be able
> > > > to use portrait mode anymore.
> > > 
> > > Using a standard condensed monospace font of 9px, it has a width of 7px,
> > > 120
> > A char width of 7px implies a cell width of at least 8px (so 960px for 120
> > chars), more often of 9px.  With your cell width of 7px your characters
> > will be max 6px, symmetric characters will be 5px, which is really small.
> > 
> I was talking about the full cell width. I tested it before commenting, 
> measuring the width in pixels of a line of text.

Yes, and I was saying that a cell width of 7px is very narrow because the 
characters themselves will only be using 5px or 6px max (to leave room for 
inter-character spacing in normal words).  You might be fine with such 
narrow characters, but not everyone will be.


Ciao,
Michael.


RE: [EXTERNAL] Re: DWARF Debug Info Relocations (.debug_str STRP references)

2020-12-03 Thread Michael Matz
Hello,

On Tue, 1 Dec 2020, Bill Messmer via Gcc wrote:

> Thank you very much for the help.  I was so fixated on the fact that the 
> .rela.debug* sections were there that I didn't pay attention to the 
> e_type in the ELF header.  Apparently, neither did the library that I 
> was using to parse the DWARF data.
> 
> Interestingly, I have seen other non-RedHat kernel debug images where 
> the kernel is ET_EXEC

vmlinux is always final-linked.

> and there are still .rela.debug* sections present 
> in the image.

Depending on configuration vmlinux is linked with --emit-relocs, which 
causes all relocations, no matter if applied or not, to also be emitted in 
a final link.  That has its uses, but it also confuses most tools, as 
they blindly apply relocations again, even if they aren't from loadable 
segments.

As not much other software uses --emit-relocs, and even in Linux it's 
optional and non-default, you see these confused tools occurring in the wild 
instead of being fixed.


Ciao,
Michael.

> Though the effect of applying those relocs has always 
> been nil (the data in the original .debug* section is already the same 
> as what the .rela.debug* section indicates to alter).
> 
> Sincerely,
> 
> Bill Messmer
> 
> -Original Message-
> From: Mark Wielaard  
> Sent: Monday, November 30, 2020 6:39 PM
> To: Bill Messmer 
> Cc: gcc@gcc.gnu.org
> Subject: Re: [EXTERNAL] Re: DWARF Debug Info Relocations (.debug_str STRP 
> references)
> 
> Hi Bill,
> 
> On Mon, Nov 30, 2020 at 10:22:34PM +, Bill Messmer wrote:
> 
> > I'm still a bit confused here.  And the reason I ask this is because I 
> > open this particular vmlinux image with an OSS ELF/DWARF library...  
> > which gives me the *WRONG* names for various DWARF DIEs.
> > I stepped through the library...  and the reason the names are wrong 
> > is because the library applies all of the relocations in 
> > .rela.debug_info to the sections it opens.  I thought there might be a 
> > bug in the library somewhere, so I started down looking at the DWARF 
> > data with standard Linux tools and in hex dumps...  and it seemed 
> > incorrect to my -- admittedly limited -- understanding...
> >
> > Yes, I am using llvm-dwarfdump to textualize the DWARF data
> > (llvm-dwarfdump-10 --verbose vmlinux) and I would assume(?) this 
> > applies the relocations as necessary.  And I am using readelf to get 
> > the section data (readelf -S vmlinux) and the relocation data (readelf 
> > -r vmlinuix); however, the hex data I show is just a flat hexdump of 
> > the image (hexdump -C vmlinux -n ... -s ...).
> 
> I traced your steps and did the same on a local vmlinux copy and got the same 
> results as you. That didn't make sense to me. Till I realized my original 
> assumption, that the vmlinux image, like kernel modules were partially linked 
> and so ET_REL files that still needed relocation applied, seemed wrong. The 
> vmlinux file isn't actually ET_REL, but it is ET_EXEC (see readelf -h 
> vmlinux). In which case other tools don't apply the relocations. And so your 
> observation is correct. The offset to the .debug_str table is right in the 
> .debug_info section, the relocations are meaningless. That is surprising.
> 
> > Either both that library and my understanding are incorrect, there is 
> > something wrong with that relocation data, or it flat isn't supposed 
> > to be applied...
> 
> It is the last thing, they aren't supposed to be applied because it is an 
> ET_EXEC file (which isn't supposed to have .rela.debug sections, but 
> apparently has).
> 
> > I also tried what you suggested "eu-strip -- reloc-debug-sections vmlinux 
> > -f stripped" and looked at the resulting output:
> > 
> > "readelf -S stripped" still shows the reloc sections:
> > 
> >   [65] .debug_info   PROGBITS   00059e50
> >0c458644     0 0 1
> >   [66] .rela.debug_info  RELA   0c4b2498
> >1288ae68  0018   I  7865 8
> > 
> > And that relocation is still there via "readelf -r stripped":
> 
> Which now also makes sense, because as the --help text says "only relevant 
> for ET_REL files".
> 
> So you did find a real mystery, for some reason the way the vmlinux image is 
> created does get relocations correctly applied, but they (or at least some) 
> are still left behind in the ELF image even though they are no longer needed 
> (and if you do try to use/apply them, you get wrong results). We should 
> probably find out if this happened during the upstream build or during distro 
> packaging.
> 
> Cheers,
> 
> Mark
> 


Re: 'walk_stmt_load_store_addr_ops' for non-'gimple_assign_single_p (stmt)'

2021-03-16 Thread Michael Matz
Hello,

On Tue, 16 Mar 2021, Thomas Schwinge wrote:

> >>Indeed, given (Fortran) 'zzz = 1', we produce GIMPLE:
> >>
> >>gimple_assign 
> >>
> >>..., and calling 'walk_stmt_load_store_addr_ops' on that, I see, as
> >>expected, the 'visit_store' callback invoked, with 'rhs' and 'arg':
> >>''.
> >>
> >>However, given (Fortran) 'zzz = r + r2', we produce GIMPLE:
> >>
> >>gimple_assign 

But that's pre-SSA form.  After rewriting into SSA form, 'zzz' will be replaced by 
an SSA name, and the actual store into 'zzz' will happen in a store 
instruction.

> >>..., and calling 'walk_stmt_load_store_addr_ops' on that, I see,
> >>unexpectedly, no callback at all invoked: neither 'visit_load', nor
> >>'visit_store' (nor 'visit_address', obviously).
> >
> > The variables involved are registers. You only get called on memory 
> > operands.
> 
> How would I have told that from the 'walk_stmt_load_store_addr_ops'
> function description?  (How to improve that one "to reflect reality"?)
> 
> But 'zzz' surely is the same in 'zzz = 1' vs. 'zzz = r + r2' -- for the
> former I *do* see the 'visit_store' callback invoked, for the latter I
> don't?

The walk_gimple functions are intended to be used on the SSA form of 
gimple (i.e. the form it is in most of the time).  And in that it's 
not the case that 'zzz = 1' and 'zzz = r + r2' are similar.  The former 
can have memory as the lhs (that includes static variables, or indirection 
through pointers), the latter can not.  The lhs of a binary statement is 
always an SSA name.  A write to an SSA name is not a store, which is why 
it's not walked for walk_stmt_load_store_addr_ops.

Maybe it helps to look at simple C examples:

% cat x.c
int zzz;
void foo(void) { zzz = 1; }
void bar(int i) { zzz = i + 1; }
% gcc -c x.c -fdump-tree-ssa-vops
% cat x.c.*ssa
foo ()
{
   :
  # .MEM_2 = VDEF <.MEM_1(D)>
  zzz = 1;
  # VUSE <.MEM_2>
  return;
}

bar (int i)
{
  int _1;

   :
  _1 = i_2(D) + 1;
  # .MEM_4 = VDEF <.MEM_3(D)>
  zzz = _1;
  # VUSE <.MEM_4>
  return;

}

See how the instruction writing to zzz (a global, and hence memory) is 
going through a temporary for the addition in bar?  This will always be 
the case when the expression is arithmetic.

In SSA form gimple only very few instruction types can be stores, namely 
calls and stores like above (where the RHS is a unary tree).  If you want 
to capture writes into SSA names as well (which are more appropriately 
thought of as 'setting the ssa name' or 'associating the ssa name with the 
rhs value') you need the per-operand callback indeed.  But that depends on 
what you actually want to do.


Ciao,
Michael.


Re: 'walk_stmt_load_store_addr_ops' for non-'gimple_assign_single_p (stmt)'

2021-03-17 Thread Michael Matz
Hello,

On Wed, 17 Mar 2021, Richard Biener wrote:

> > The walk_gimple functions are intended to be used on the SSA form of 
> > gimple (i.e. the one that it is in most of the time).
> 
> Actually they are fine to use pre-SSA.

Structurally, sure.

> They just even pre-SSA distinguish between registers and memory.  

And that's of course the thing.

I probably should have used a different term, but used "SSA rewriting" to 
name the point where this distinction really starts to matter.  Before it 
a binary gimple statement could conceivably contain a non-register in the 
LHS (perhaps not right now, but there's nothing that would inherently 
break with that), and then would include a store that 
walk_stmt_load_store_addr_ops would "miss".

So, yeah, using SSA as the name for that was sloppy; it's gimple itself that 
has the invariant of only registers in binary statements.


Ciao,
Michael.

> That's what gimplification honors as well, in 'zzz = r + r2' all 
> operands are registers, otherwise GIMPLE requires loads and stores split 
> into separate stmts not doing any computation.
> 
> It's just less obivous in the dumps (compared to SSA name dumping).
> 
> Richard.
> 
> >  And in that it's
> > not the case that 'zzz = 1' and 'zzz = r + r2' are similar.  The former
> > can have memory as the lhs (that includes static variables, or indirection
> > through pointers), the latter can not.  The lhs of a binary statement is
> > always an SSA name.  A write to an SSA name is not a store, which is why
> > it's not walked for walk_stmt_load_store_addr_ops.
> >
> > Maybe it helps to look at simple C examples:
> >
> > % cat x.c
> > int zzz;
> > void foo(void) { zzz = 1; }
> > void bar(int i) { zzz = i + 1; }
> > % gcc -c x.c -fdump-tree-ssa-vops
> > % cat x.c.*ssa
> > foo ()
> > {
> >:
> >   # .MEM_2 = VDEF <.MEM_1(D)>
> >   zzz = 1;
> >   # VUSE <.MEM_2>
> >   return;
> > }
> >
> > bar (int i)
> > {
> >   int _1;
> >
> >:
> >   _1 = i_2(D) + 1;
> >   # .MEM_4 = VDEF <.MEM_3(D)>
> >   zzz = _1;
> >   # VUSE <.MEM_4>
> >   return;
> >
> > }
> >
> > See how the instruction writing to zzz (a global, and hence memory) is
> > going through a temporary for the addition in bar?  This will always be
> > the case when the expression is arithmetic.
> >
> > In SSA form gimple only very few instruction types can be stores, namely
> > calls and stores like above (where the RHS is a unary tree).  If you want
> > to capture writes into SSA names as well (which are more appropriately
> > thought of as 'setting the ssa name' or 'associating the ssa name with the
> > rhs value') you need the per-operand callback indeed.  But that depends on
> > what you actually want to do.
> >
> >
> > Ciao,
> > Michael.
> 


Re: dwarf DW_AT_decl_name: system headers vs source files?

2015-06-22 Thread Michael Matz
Hi,

On Sat, 20 Jun 2015, DJ Delorie wrote:

> Note that the DW_AT_decl_file refers to "dj.h" and not "dj.c".  If you 
> remove the "3" from the '# 1 "dj.h" 1 3' line, the DW_AT_decl_file 
> instead refers to "dj.c".  It's been this way for many releases.
> 
> Is this intentional?

I think it came in with r137873, aka 
  https://gcc.gnu.org/ml/gcc-patches/2008-07/msg01061.html

> If so, what is the rationalization for it?

I can't really grok the situation Rafael wanted to improve.


Ciao,
Michael.


Re: Repository for the conversion machinery

2015-09-17 Thread Michael Matz
Hi,

On Thu, 17 Sep 2015, Eric S. Raymond wrote:

> All I can say is every time I've tried this it's been a nightmare, and 
> when you say "apart from CVS imported revisions" my hair stands on end.  
> And the GCC history is two and a half times the size of the next largest 
> repo I've tried this on.
> 
> If you want to try writing the program to do this data analysis, go 
> right ahead.

A start would be:
svn diff -c50004 | sed -ne \
'/^+++.*ChangeLog/,/^Index/s/^+.*[0-9] *\([^0-9]*[(<].*@.*[)>]\).*$/\1/p'

Sometimes (e.g. for some CVS imported commits) the commit to ChangeLog 
files was done in a different revision than the changes themselves (it 
wasn't a very good CVS to subversion conversion), so for that the above 
doesn't find the address (it will be the revision before or after that 
touches ChangeLog, but no other files).  But it's fairly reasonable for 
newer revisions.  Might need adjustments for other date or email 
address formats.  Feeding it all revisions when you have extracted them 
already should give a reasonable estimate of who the real author was.


Ciao,
Michael.


Re: Repository for the conversion machinery

2015-09-17 Thread Michael Matz
Hi,

On Thu, 17 Sep 2015, Richard Earnshaw wrote:

> None of this has any chance of working for any commits to the pre-egcs 
> sources.  In those days there was no version control on the ChangeLog 
> file.
> 
> My feeling is we could spend months ratholing on this particular problem 
> rather than making real progress on moving forward.  If it will help to 
> move things forward, I'm happy to accept that for the purposes of 
> conversion we should just use 'committer id' and drop any attempt to 
> reconstruct 'author id' for each patch.

I don't see why the problems with some ranges of commits should prevent us 
doing better for those ranges where there are no such problems.  I thought 
reposurgeon (I haven't looked at it) had the ability to postprocess 
commits and dig out additional info, and assumed it would be more or less 
trivial to add such sed-greping.


Ciao,
Michael.


Re: C++ order of evaluation of operands, arguments

2015-11-25 Thread Michael Matz
Hi,

On Tue, 24 Nov 2015, Richard Biener wrote:

> On Tue, Nov 24, 2015 at 12:01 AM, Jason Merrill  wrote:
> > There's a proposal working through the C++ committee to define the order of
> > evaluation of subexpressions that previously had unspecified ordering:
> >
> > http://www.open-std.org/Jtc1/sc22/wg21/docs/papers/2015/p0145r0.pdf
> >
> > I agree with much of this, but was concerned about the proposal to define
> > order of evaluation of function arguments as left-to-right, since GCC does
> > right-to-left on PUSH_ARGS_REVERSED targets, including x86_64.

Actually the most natural order for a (normal, stack-passing) 
implementation that supports varargs is right-to-left, so the proposal has 
it exactly backwards.  (The reason being that named arguments then have 
constant offsets from the stack pointer.)

Right-to-left is also the better order for all ABIs that pass (at least 
some) arguments on stack but give left arguments smaller offsets from 
top-of-stack (as most ABIs do).

> The reason we have PUSH_ARGS_REVERSED is to get more optimal
> code generation for calls reducing lifetime of argument computation
> results.  Like when you have
> 
>  foo (bar(), baz())
> 
> then generate
> 
>   reg = bar();
>   push reg;
>   reg = baz();
>   push reg;
>   foo ();
> 
> instead of
> 
>   reg = baz();
>   spill reg;
>   reg = bar();
>   push reg;
>   push spilled reg;
>   foo ();

Your examples are switched, but the idea is the correct one (result of bar 
needs to be pushed last in the x86 ABI).

> I would like to get rid of PUSH_ARGS_REVERSED (as used in 
> gimplification) because it's also one source of very early IL 
> differences across archs.

Actually most other targets should also act as if PUSH_ARGS_REVERSED, not 
only x86_64.


Ciao,
Michael.


Re: C++ order of evaluation of operands, arguments

2015-11-26 Thread Michael Matz
Hi,

On Thu, 26 Nov 2015, David Brown wrote:

> That is all true - but if you have to pick an order that makes sense to 
> users, especially of functions that are not varargs (i.e., most 
> functions), then left-to-right is the only logical, natural order - at 
> least for those of use who use left-to-right languages.

Exactly, what's feeling humanly natural here is cultural (and I'd argue 
there's a fairly large percentage of the population who are neither ltr 
nor rtl, but rather top-to-bottom), and therefore ...

> really going to be a big issue?  One should not limit the language just 
> because of a tiny efficiency issue with rarely-used cases.

... this is the only objectively measurable dimension, and hence _that_ is 
exactly how the language should be limited in such cases, if it must be 
limited at all (which seems indeed a bit dubious after 30 years as the 
proposal mentions itself) [1].


Ciao,
Michael.

[1] I'm trying (but failing) to not even start the argument that if the 
goal is to make C++ "make more sense to users" that this already failed in 
'98, got worse with later revisions, and fixating evaluation order is 
helping just so slightly (if at all), that it all seems a bit ridiculous 
to me.


Re: ivopts vs. garbage collection

2016-01-11 Thread Michael Matz
Hi,

On Fri, 8 Jan 2016, Richard Biener wrote:

> > The only solution here is for ivopts to keep a pointer to the array, 
> > not a pointer to some location near, but outside of the array.
> 
> Yes, the solution is to make IVOPTs not do this (eventually controlled 
> by a parameter because clearly it thinks doing this is beneficial 
> cost-wise).

Well, that's a hack.  A solution is to design something that works 
generally for garbage collected languages with such requirements instead 
of arbitrarily limiting transformations here and there.  It could be 
something like the notion of derived pointers, where the base pointer 
needs to stay alive as long as the derived pointers are.  All potential GC 
points where a derived pointer is alive also need to have the base 
pointer be alive (they could explicitly add uses of the base pointers, or 
alternatively anything computing liveness needs to deal with this).  For 
normal transformations that don't deal with explicit liveness sets (i.e. 
most of our SSA transforms) it's enough if the instruction creating the 
derived pointer from the base can't be looked through, i.e. is an 
optimization barrier.


Ciao,
Michael.


Re: ivopts vs. garbage collection

2016-01-11 Thread Michael Matz
Hi,

On Mon, 11 Jan 2016, Ian Lance Taylor wrote:

> > Well, that's a hack.  A solution is to design something that works 
> > generally for garbage collected languages with such requirements 
> > instead of arbitrarily limiting transformations here and there.  It 
> > could be something like the notion of derived pointers, where the base 
> > pointer needs to stay alive as long as the derived pointers are.  All 
> > potential GC points where a derived pointer is alive also need to 
> > have the base pointer be alive (they could explicitly add uses of the 
> > base pointers, or alternatively anything computing liveness needs to 
> > deal with this).  For normal transformations that don't deal with 
> > explicit liveness sets (i.e. most of our SSA transforms) it's enough 
> > if the instruction creating the derived pointer from the base can't be 
> > looked through, i.e. is an optimization barrier.
> 
> What do you suggest we do for GCC 6?

Realistically?  The hack of course.

> Your suggestion of every derived pointer keeping the base pointer alive 
> sounds too strong to me; it certainly sounds too strong for Go. For Go 
> it should be sufficient to ensure that every derived pointer points 
> within the bounds of the object.

Okay, that's a certain type of GC.  Others might need exact offset-zero 
base pointers.

> It's also not obvious to me that making a pointer transformation into
> an optimization barrier would be a win overall.

It will almost always be a pessimization (the additional live value needs 
at least a place on the stack).

> For something like
> ivopts it seems better to simply not introduce the pointer
> transformation--to apply ivopts only to non-pointers when GC matters

Of course that deals only with ivopts.  In practice that might be enough, 
even for a long time, but it's not really a full solution, there are other 
transformations on pointers (e.g. the vectorizer fiddling with alignment), 
some of them creating out of object pointers (e.g. chopping off the lowest 
few bits).

Another problem is to define "when GC matters".  With LTO you can't rely 
on anything from the frontend, so it needs to be encoded in the IL, or at 
the very least in a per-function item, with corresponding avoidance of 
inlining of GC into non-GC aware functions.

> (I'm still puzzled by the fact that Java has apparently not encountered 
> this problem in all these years.)

Probably it just so happened that the base pointer for some derived 
pointer was lying around in some memory location.  It's not very likely 
that the only reference is in a register; it can happen only shortly after 
allocating the new block, but before it's actually stored into some other 
structure (or freed, i.e. when it's only temporary, but then the base 
pointer would be live as well), so perhaps this pattern doesn't happen very 
often in Java code.  Obviously it does in Go.  Perhaps we can limit the 
ivopts hack also to pointers that are problematic.  Only if the base 
pointer comes from a new allocation (is not loaded from memory) and isn't 
stored into memory before use do we need to avoid manipulating it too 
much.


Ciao,
Michael.


Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct

2016-02-11 Thread Michael Matz
Hi,

On Thu, 11 Feb 2016, Jonathan Wakely wrote:

> On 11 February 2016 at 12:40, Matthijs van Duin wrote:
> > You never define "POD for the purposes of layout", and I can only
> > interpret it as being equivalent to "standard-layout".
> 
> As Richard pointed out, it's defined in the C++ ABI.

Which is C++y as well (and hence doesn't in itself solve the C/C++ 
compatibility we should strive for in the ABI).  I'll concur with Matthijs 
and think that trivially copyable is the correct distinction for passing 
without registers (in addition to being clearer than a strangely defined 
concept of "POD-but-not-quite-POD").  Do you think differently?  Are there 
non-trivially copyable examples that we'd wish to pass without registers 
as well?


Ciao,
Michael.


Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct

2016-02-11 Thread Michael Matz
Hi,

On Thu, 11 Feb 2016, H.J. Lu wrote:

> Any suggestions on new wording, something like
> 
> 1. "class type".  A class type is a structure, union or C++ class.
> 2. "empty type".  An empty type is a type where it and all of its
> subobjects are of class or array type.
> 
> Does it cover
> 
> struct A { };
> struct B { };
> struct C : A, B { };

I think this is covered by the above points.  But without further 
restriction I don't see how e.g. the above example with ctors and dtors 
would be ruled out (except if you regard a ctor as a sub-object).  For 
that you seem to need trivially-copyable, or that POD-ly thing.  So, 
perhaps simply amend (2) "... is a trivially copyable type where it ...".


Ciao,
Michael.


Re: gnu-gabi group

2016-02-12 Thread Michael Matz
Hi,

On Thu, 11 Feb 2016, Mark Wielaard wrote:

> If we could ask overseers to setup a new group/list gnu-gabi on 
> sourceware where binutils, gcc, gdb, glibc and other interested parties 
> could join to maintain these extensions and ask for clarifications that 
> would be wonderful. I am not a big fan of google groups mailinglists, 
> they seem to make it hard to subscribe and don't have easy to access 
> archives. Having a local gnu-gabi group on sourceware.org would be 
> better IMHO.

Agreed.


Ciao,
Michael.


Re: gengtype: conditional GTY ? (to add before GCC 6 release)

2016-02-15 Thread Michael Matz
Hi,

On Fri, 12 Feb 2016, Richard Biener wrote:

> >What do you think about refactoring iterators in GCC 7?
> 
> I think refactoring towards STL style iterators would be welcome.  It 
> may be different for the actual instances though.

Oh God, please, for the love of all kittens, no.  If anything, implement 
and use a range idiom like in D.


Ciao,
Michael.


Re: gengtype: conditional GTY ? (to add before GCC 6 release)

2016-02-16 Thread Michael Matz
Hi,
On Tue, 16 Feb 2016, Mikhail Maltsev wrote:

> > If anything, implement and use a range idiom like in D.
> > 
> Could you please elaborate on that?

Motivation:
  http://accu.org/content/conf2009/AndreiAlexandrescu_iterators-must-go.pdf
Detailed intro of the concept:
  http://www.informit.com/articles/article.aspx?p=1407357
A bit of discussion and a smaller overview:
http://www.digitalmars.com/d/archives/digitalmars/D/Ranges_and_versus_iterators_107975.html

I believe ranges are implementable in C++; I'm not sure if efficiently 
already in C++98.  But whatever we implement as an iteration scheme should 
IMHO look more like ranges than explicit iterators (hell, I even much 
prefer the FOR_EACH macros over iterators).


Ciao,
Michael.


Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct

2016-02-18 Thread Michael Matz
Hi,

On Tue, 16 Feb 2016, H.J. Lu wrote:

> Here is the new definition:
> 
> An empty type is a type where it and all of its subobjects (recursively) 
> are of class, structure, union, or array type.  No memory slot nor 
> register should be used to pass or return an object of empty type.

The trivially copyable is gone again.  Why is it not necessary?


Ciao,
Michael.


Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct

2016-02-19 Thread Michael Matz
Hi,

On Thu, 18 Feb 2016, Richard Smith wrote:

> >> An empty type is a type where it and all of its subobjects 
> >> (recursively) are of class, structure, union, or array type.  No 
> >> memory slot nor register should be used to pass or return an object 
> >> of empty type.
> >
> > The trivially copyable is gone again.  Why is it not necessary?
> 
> The C++ ABI doesn't defer to the C psABI for types that aren't 
> trivially-copyable. See 
> http://mentorembedded.github.io/cxx-abi/abi.html#normal-call

Hmm, yes, but we don't want to define something for only C and C++, but 
language independent (as far as possible).  And given only the above 
language I think this type:

struct S {
  S() {something();}
};

would be an empty type, and that's not what we want.  "Trivially copyable" 
is a reasonably common abstraction (if in doubt we could even define it in 
the ABI), and captures well the idea that we need (namely that a bit-copy 
is enough).


Ciao,
Michael.


Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct

2016-02-19 Thread Michael Matz
Hi,

On Thu, 18 Feb 2016, H.J. Lu wrote:

> >> An empty type is a type where it and all of its subobjects 
> >> (recursively) are of class, structure, union, or array type.  No 
> >> memory slot nor register should be used to pass or return an object 
> >> of empty type.
> >
> > The trivially copyable is gone again.  Why is it not necessary?
> 
> I think we want to cover
> 
> struct
> {
>   unsigned int : 8;
> };
> 
> but not
> 
> struct
> {
>   unsigned int  i :8;
> };
> 
> " trivially copyable" applies to both.

Correct, but I'm not suggesting to use only the trivially copyable 
definition, I want to have it added as condition for not requiring a 
register or memory slot.  I.e. "an object of empty type that's trivially 
copyable".


Ciao,
Michael.


Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct

2016-02-22 Thread Michael Matz
Hi,

On Fri, 19 Feb 2016, Richard Smith wrote:

> >> > The trivially copyable is gone again.  Why is it not necessary?
> >>
> >> The C++ ABI doesn't defer to the C psABI for types that aren't
> >> trivially-copyable. See
> >> http://mentorembedded.github.io/cxx-abi/abi.html#normal-call
> >
> > Hmm, yes, but we don't want to define something for only C and C++, but
> > language independend (so far as possible).  And given only the above
> > language I think this type:
> >
> > struct S {
> >   S() {something();}
> > };
> >
> > would be an empty type, and that's not what we want.
> 
> Yes it is. Did you mean to give S a copy constructor, copy assignment
> operator, or destructor instead?

Er, yes, I did mean to :-)


Ciao,
Michael.


Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct

2016-02-22 Thread Michael Matz
Hi,

On Sat, 20 Feb 2016, Richard Smith wrote:

> > An empty type is a type where it and all of its subobjects 
> > (recursively) are of class, structure, union, or array type.
> >
> > doesn't cover "trivially-copyable".
> 
> That's correct. Whether a type is trivially copyable is unrelated to 
> whether it is empty.

I would still feel more comfortable to include the restriction to 
trivially copyable types, not in the part of definition of empty type, of 
course, but as part of the restrictions of when a type can be passed in no 
registers.  Basically to clarify the intent in the psABI if there's any 
doubt.  I.e. like so:

---
An empty type is a type where it and all of its subobjects (recursively)
are of class, structure, union, or array type.  No memory slot nor 
register should be used to pass or return an object of empty type that's 
trivially copyable.
---

(With possibly a self-sufficient definition of trivially copyable, that's 
language agnostic)


Ciao,
Michael.


Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct

2016-02-23 Thread Michael Matz
Hi,

On Tue, 23 Feb 2016, H.J. Lu wrote:

> > ---
> > An empty type is a type where it and all of its subobjects (recursively)
> > are of class, structure, union, or array type.  No memory slot nor
> > register should be used to pass or return an object of empty type that's
> > trivially copyable.
> > ---
> >
> > (With possibly a self-sufficient definition of trivially copyable, that's
> > language agnostic)
> >
> 
> Do you have an example in which an empty type defined above isn't
> "trivially copyable"?

The ones we've always talked about: empty C++ types with non-trivial copy 
ctors or dtors.  Yes, I'm aware of the fact that the Itanium C++ ABI 
doesn't invoke the underlying psABI for these types (or better, it 
specifies them to be passed by reference).  But first, there are other 
languages that have such constructs, but don't necessarily have an 
written-down ABI (OO fortran anyone? Ada?).  Second, there may be other 
C++ ABIs that don't contain such language (which would be an ommission, 
but well, happens).  And third even for our C++ needs (based on the 
Itanium ABI) I feel it's simply more clear and self-sufficient to be 
explicit about this restriction.

It's not that we have any sort of upper bound on the number of words we're 
allowed to use in the psABI, and I also don't think anything is gained by 
being as terse as possible.  Succinct, sure, but not as arcane as we can 
make it while still being correct.

So, question back: can you imagine any cases where the "restriction" to 
trivially copyable would _not_ do the thing we want?


Ciao,
Michael.


Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct

2016-02-23 Thread Michael Matz
Hi,

On Tue, 23 Feb 2016, H.J. Lu wrote:

> I thought
> 
> ---
> An empty type is a type where it and all of its subobjects (recursively)
> are of class, structure, union, or array type.
> ---
> 
> excluded
> 
> struct empty
> {
> empty () = default;
> };


Why would that be excluded?  There are no subobjects, hence all of them 
are of class, structure, union or array type, hence this is an empty type. 
(And that's good, it indeed looks quite empty to me).  Even if you would 
add a non-trivial copy ctor, making this thing not trivially copyable 
anymore, it would still be empty.  Hence, given your proposed language in 
the psABI, without reference to any other ABI (in particular not to the 
Itanium C++ ABI), you would then need to pass it without registers.  That 
can't be done, and that's exactly why I find that wording incomplete.  It 
needs implicit references to other languages' ABIs to work.

> Adding "trivially copyable" extends, not limiting, the scope of
> empty type.

Huh?  Adding (as in ANDing, not ORing) anything to a positive condition 
necessarily restricts it.  But also note, that my wording does _not_ add 
the restriction to the definition of "empty type", but rather only to when 
they can be passed/returned by nothing.


Ciao,
Michael.


Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct

2016-02-29 Thread Michael Matz
Hi,

On Fri, 26 Feb 2016, H.J. Lu wrote:

> >> It is clear to me now.  Let's go with
> >>
> >> ---
> >> An empty type is a type where it and all of its subobjects (recursively)
> >> are of class, structure, union, or array type.  No memory slot nor
> >> register should be used to pass or return an object of empty type that's
> >> trivially copyable.
> >> ---
> >>
> >> Any comments?
> >
> > Yes. "trivially copyable" is the wrong restriction. See
> > http://mentorembedded.github.io/cxx-abi/abi.html#normal-call for the
> > actual Itanium C++ ABI rule.
> 
> I looked it up.  " trivially copyable" is covered by C++ ABI.
> 
> > It's also completely nonsensical to mention this as a special case in
> > relation to empty types. The special case applies to all function
> > parameters, irrespective of whether they're empty -- this rule applies
> > *long* before you consider whether the type is empty. For instance, in
> > the x86-64 psABI, this should go right at the start of section 2.2.3
> > ("Parameter Passing and Returning Values"). But please don't add it
> > there -- it's completely redundant, as section 5.1 already says that
> > the Itanium C++ ABI is used, so it's not necessary to duplicate rules
> > from there.
> 
> Here is the final wording:
> 
> An empty type is a type where it and all of its subobjects (recursively)
> are of class, structure, union, or array type.  No memory slot nor register
> should be used to pass or return an object of empty type.
> 
> Footnote: Array of empty type can only be passed by reference in C and C++.
> 
> Michael, can you put it in x86-64 psABI?  I will update i386 and IA MCU
> psABIs.

Not without further discussion, sorry.  If we want to invoke the C++ ABI 
to not have to worry about trivially copyable (and yeah, of course it  
would be placed better at the beginning for the whole argument passing 
section), then we need to also look at its other rules.  In particular at:


3.1.3 Empty Parameters 
* Empty classes will be passed no differently from ordinary classes. If 
  passed in registers the NaT bit must not be set on all registers that 
  make up the class. 
* The contents of the single byte parameter slot are unspecified, and the 
  callee may not depend on any particular value. On Itanium, the 
  associated NaT bit must not be set if the parameter slot is associated 
  with a register.


So, in C++, empty classes will be passed as above, not as no registers.  
The new rule would create a conflict between "no registers/slots" and "the 
single byte parameter slot".

Have you thought about this?

(I'll also note that putting in this rule might interact with "2.2 POD 
Data Types / If the base ABI does not specify rules for empty classes, 
then an empty class has size and alignment 1." because we now do specify 
rules for empty classes, though not for size and alignment explicitly).

Also this insistence that all of "trivially copyable" is 
already quite nicely specified in the C++ ABI is still not really relevant 
because C++ _is not the only language out there_.  I'm not sure how often 
I have to repeat this until people get it.


Ciao,
Michael.


Re: [isocpp-parallel] Proposal for new memory_order_consume definition

2016-02-29 Thread Michael Matz
Hi,

On Sun, 28 Feb 2016, Linus Torvalds wrote:

> > So the kernel obviously is already using its own C dialect, that is 
> > pretty far from standard C. All these options also have a negative 
> > impact on the performance of the generated code.
> 
> They really don't.

They do.

> Have you ever seen code that cared about signed integer overflow?
> 
> Yeah, getting it right can make the compiler generate an extra ALU
> instruction once in a blue moon, but trust me - you'll never notice.
> You *will* notice when you suddenly have a crash or a security issue
> due to bad code generation, though.

No, that's not at all the important piece of making signed overflow 
undefined.  The important part is with induction variables controlling 
loops:

  short i;  for (i = start; i < end; i++)
vs.
  unsigned short u; for (u = start; u < end; u++)

For the former you're allowed to assume that the loop will terminate, and 
that its iteration count is easily computable.  For the latter you get 
modulo arithmetic and (if start/end are of larger type than u, say 'int') 
it might not even terminate at all.  That has direct consequences for 
the vectorizability of such loops (or the profitability of such a transformation) 
and hence quite important performance implications in practice.  Not for 
the kernel of course.  Now we can endlessly debate how (non)practical it 
is to write HPC code in C or C++, but there we are.

> The fact is, undefined compiler behavior is never a good idea. Not for 
> serious projects.

Perhaps if these undefinednesses wouldn't have been put into the standard, 
people wouldn't have written HPC code, and if that were so the world would 
be a nicer place sometimes (certainly for the compiler).  Alas, it isn't.


Ciao,
Michael.


Re: [isocpp-parallel] Proposal for new memory_order_consume definition

2016-02-29 Thread Michael Matz
Hi,

On Sat, 27 Feb 2016, Paul E. McKenney wrote:

> But we do already have something very similar with signed integer
> overflow.  If the compiler can see a way to generate faster code that
> does not handle the overflow case, then the semantics suddenly change
> from twos-complement arithmetic to something very strange.  The standard
> does not specify all the ways that the implementation might deduce that
> faster code can be generated by ignoring the overflow case, it instead
> simply says that signed integer overflow invoked undefined behavior.
> 
> And if that is a problem, you use unsigned integers instead of signed
> integers.
> 
> So it seems that we should be able to do something very similar here.

For this case the important piece of information to convey one or the other 
meaning in source code is the _type_ of involved entities, not annotations 
on the operations.  signed type -> undefined overflow, unsigned type -> 
modulo arithmetic; easy, and it nicely carries automatically through 
operation chains (and pointers) without any annotations.

I feel much of the complexity in the memory order specifications, also 
with your recent (much better) wording to explain dependency chains, would 
be much easier if the 'carries-dependency' would be encoded into the types 
of operands.  For purpose of example, let's call the marker "blaeh" (not 
atomic to not confuse with existing use :) ):

int foo;
blaeh int global;
int *somep;
blaeh int *blaehp;
f () {
  blaehp = &foo;  // might be okay, adds restrictions on accesses through 
  // blaehp, but not through 'foo' directly
  blaehp = &global;
  if (somep == blaehp)
{
  /* Even though the value is equal ... */
  ... *blaehp ... /* ... a compiler can't rewrite this into *somep */
}
}

A "carries-dependency" on some operation (e.g. a call) would be added by 
using a properly typed pointer at those arguments (or return type) where 
it matters.  You can't give a blaeh pointer to something only accepting 
non-blaeh pointers (without cast).

Pointer addition and similar transformations involving a blaeh pointer and 
some integer would still give a blaeh pointer, and hence by default also 
solve the problem of cancellations.

Such marking via types would not solve all problems in an optimal way if 
you had two overlapping but independent dependency chains (all of them 
would collapse to one chain and hence be made dependent, which still is 
conservatively correct).

OTOH introducing new type qualifiers is a much larger undertaking, so I 
can understand one wants to avoid this.  I think it'd ultimately be 
clearer, though.


Ciao,
Michael.


Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct

2016-03-01 Thread Michael Matz
Hi,

On Mon, 29 Feb 2016, Jason Merrill wrote:

> > Also this insistence that all of "trivially copyable" is already quite 
> > nicely specified in the C++ ABI is still not really relevant because 
> > C++ _is not the only language out there_.  I'm not sure how often I 
> > have to repeat this until people get it.
> 
> Other language ABIs can handle language specific calling conventions as 
> appropriate for them.  The psABI can only talk about things that are in 
> its domain.

Naturally.  How far to follow that road, though?  Remove the word "class" 
from the description of empty types again?  Why is that in-domain and the 
notion of trivially copyable isn't?


Ciao,
Michael.


Re: Importance of transformations that turn data dependencies into control dependencies?

2016-03-01 Thread Michael Matz
Hi,

On Tue, 1 Mar 2016, Richard Biener wrote:

> > What about the example I gave above?  Is it unrealistic for compilers 
> > do ever do something like this, or is it just unlikely to gain much 
> > performance, or is it just that GCC does not do this today?
> 
> GCC does not do this today with the exception of value-profiling.  GCC 
> in other cases does not establish equivalences but only relations (a < 
> b, etc.) that are not a problem as far as I can see because those do not 
> allow to change expressions using a to use b.

Made up example using relations:

int32 a, b;
a = (b >> 31) & 1;

-->

if (b < 0)
  a = 1;
else
  a = 0;

data-dep to control-dep and only relations :)  (I think this is taken care 
of by Paul's wording, ignoring the fact that these aren't pointers anyway 
and hence don't carry a dependency through them, only onto them at most)


Ciao,
Michael.


Re: [gimplefe] [gsoc16] Gimple Front End Project

2016-03-14 Thread Michael Matz
Hi,

On Thu, 10 Mar 2016, Richard Biener wrote:

> Then I'd like to be able to re-construct SSA without jumping through 
> hoops (usually you can get close but if you require copies propagated in 
> a special way you are basically lost for example).
> 
> Thus my proposal to make the GSoC student attack the unit-testing 
> problem by doing modifications to the pass manager and "extending" an 
> existing frontend (C for simplicity).

I think it's wrong to try to shoehorn the gimple FE into the C FE.  C is 
fundamentally different from gimple and you'd have to sprinkle 
gimple_dialect_p() all over the place, and maintaining that while 
developing future C improvements will turn out to be much work.  Some 
differences of C and gimple:

* C has recursive expressions, gimple is n-op stmts, no expressions at all
* C has type promotions, gimple is explicit
* C has all other kinds of automatic conversion (e.g. pointer decay)
* C has scopes, gimple doesn't (well, global and local only), i.e. symbol 
  lookup is much more complicated
* C doesn't have exceptions
* C doesn't have class types, gimple has
* C doesn't have SSA (yes, I'm aware of your suggestions for that)
* C doesn't have self-referential types
* C FE generates GENERIC, not GIMPLE (so you'd need to go through the
  gimplifier and again would feed gimple directly into the passes)

I really don't think changing the C FE to accept gimple is a useful way 
forward.

I agree with others that the gimple FE is rather more similar to a textual 
form of LTO bytecode.  In contrast to the bytecode, its syntax has to be 
defined and kept stable, and yes, it's sensible to get inspiration from C 
syntax.

> It's true that in the ideal world a "GIMPLE frontend" has to work 
> differently but then to get a 100% GIMPLE frontend you indeed arrive at 
> what LTO does and then I agree we should attack that from the LTO side 
> and not develop another thing besides it.  But this is a _much_ larger 
> task and I don't see anyone finishing that.

So, you think the advantage of starting with the C FE would be faster 
return on investment?  Even if it were true (and I'm not sure it is), what 
would it help if it ultimately wouldn't be acceptable for inclusion in GCC 
proper?

> Also keep in mind that the LTO side will likely simplify a lot by 
> stripping out the ability to stream things that are only required for 
> debug information (I hopefully will finish LTO early debug for GCC 7).

Btw. I'm not suggesting to fit the gimple FE into the LTO infrastructure, 
that's also leading to a mess IMO.  But I do think creating a gimple FE 
from scratch is easier than fitting it into the C FE.  Taking some 
infrastructure from the LTO frontend (namely all the tree/gimple building 
routines) and some from the C frontend (namely parts of the parsers, at 
least for types).

> So - can we please somehow focus on the original question about a GSoC 
> project around the "GIMPLE frontend"?  Of course you now can take Davids 
> promise of spending some development time on this into consideration.  
> Try drafting something that can be reasonably accomplished with giving 
> us something to actually use in the GCC 7 timeframe.

Okay, here's a rough plan:

1) take the LTO frontend, remove all bytecode 
   routines, partitioning stuff, lto symtab merging stuff, tree merging 
   stuff, WPA stuff; i.e. retain only the few pieces needed for creating 
   tree and those that every frontend needs to provide.  Now that 
   frontend should be able to generate an empty compilation unit.
2) implement two stub functions: parsetype and lookuptype, the first 
   always generates a type named 'int32' with the obvious INTEGER_TYPE 
   tree, the second always gives back that type.  Now hack on the frontend
   long enough that this simple input can be parsed and the obvious asm 
   file is generated: "int32 i;".  Then "int32 i = 0;".

   Now the frontend knows global decls and initializers, but the type
   parsing problem is deferred.
3) implement function parsing.  Hack on the frontend long enough
   that this program is accepted and generates the obvious gimple:
 int32 main(int32 argc) {
   int32 a;
   a_1 = argc_0;
   _2 = a_1 - argc_0;
   return _2;
 }
   While doing this, think about how to represent default defs, and 
   generally SSA names.
4) implement some more operators for one and two-op statements
5) implement syntax for conditional code, for instance:
 {
   L0: if (argc_0 == 1) goto L1; else goto L2;
   L1: $preds (L0)
   return 1;
   L2: $preds (L0)
   return 2;
 }
6) think about and implement syntax for PHI nodes, e.g.:
 {
   int a;
   L0: if (argc_0 == 1) goto L1; else goto L2;
   L1: $preds (L0)
   a_1 = 1;
   goto L3;
   L2: $preds (L0)
   a_2 = 42 + argc_0;
   goto L3;
   L3: $preds (L1, L2)
   a_3 = $phi (a_1, a_2);
   return a_3;
 }

   See for instance how it'

Re: [gimplefe] [gsoc16] Gimple Front End Project

2016-03-15 Thread Michael Matz
Hi,

On Tue, 15 Mar 2016, Richard Biener wrote:

> So I am most worried about replicating all the complexity of types and 
> decl parsing for the presumably nice and small function body parser.
> 
> In private discussion we somewhat agreed (Micha - correct me ;)) that 
> iff the GIMPLE FE would replace the C FE function body parsing 
> completely (re-using name lookup infrastructure of course) and iff the 
> GIMPLE FE would emit GIMPLE directly (just NULL DECL_SAVED_TREE and a 
> GIMPLE seq in DECL_STRUCT_FUNCTION->gimple_body) then "re-using" the C 
> FE would be a way to greatly speed up success.

Yeah, that's a fair characterization of our discussion.  What I'm most 
worried about with mixing C and gimple parsing are several things:
* silently accepting C-like code that actually isn't supposed to be 
  gimple, i.e. I fear we muddle the water by attaching something to an 
  existing large blob without a very clear separation
* uglifying the C parser so much that the changes become unacceptable
* Parsing gimple, but going though GENERIC; I want to directly create
  GIMPLE

Separating the type/decl parsing and function body parsing would help with 
all three things, and will give you a working type parser without actually 
copying code around, so that's a plus.

(Of course, putting it into an existing front-end might also be less fun 
than writing one from scratch, but that's not my main point :) ).

> The other half of the project would then be to change the pass manager 
> to do something sensible with the produced GIMPLE as well as making our 
> dumps parseable by the GIMPLE FE.

Definitely the dumping part needs to be developed somewhat in lock-step 
with the parser; the pass manager infrastructure should be started 
somewhat halfway into the project, yes.


Ciao,
Michael.


Re: Aggressive load in gcc when accessing escaped pointer?

2016-03-21 Thread Michael Matz
Hi,

On Sat, 19 Mar 2016, Cy Cheng wrote:

> But I don't understand why &c - 8 is invalid? Which rule in C99 does it violate?

&x points to the start of object x, and &x - something (something != 0) 
points outside object x.  'c' was a complete object, so &c-8 points 
outside any object, hence the formation of that pointer is already 
invalid (as is its dereference).


Ciao,
Michael.


Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]

2016-04-18 Thread Michael Matz
Hi,

On Mon, 18 Apr 2016, H.J. Lu wrote:

> > reason is DSO code (also handcoded assembly) may reasonably expect to 
> > be able to load the address with a PC-relative load-address type 
> > instruction (ADDIUPC, LEA, MOVAB, etc.) and the target may not even 
> > have suitable dynamic relocations available to apply any load-time 
> > fixup if the symbol referred turns up outside of the DSO.  The 
> > instruction used may have a PC-relative range limit too.
> 
> That is why protected visibility is such a mess.

Not mess, but it comes with certain limitations.  And that's okay.  It's 
intended as an optimization, and it should do that optimization if 
requested, and error out if it can't be done for whatever reason.

E.g. one limitation might very well be that function pointer comparison 
for protected functions doesn't work (gives different outcomes if the 
pointer is built from inside the exe or from a shared lib).  (No matter 
how it's built, it will still _work_ when called).  Alternatively we can 
make comparison work (by using the exe PLT slot), in which case Alan's 
testcase will need more complications to show that protected visibility 
currently is broken.  Alan's testcase will work right now (as in showing 
protected being broken) on data symbols.


Ciao,
Michael.


Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]

2016-04-19 Thread Michael Matz
Hi,

On Tue, 19 Apr 2016, Richard Biener wrote:

> So with all this it sounds that current protected visibility is just 
> broken and we should forgo with it, making it equal to default 
> visibility?

Like how?  You mean in GCC regarding protected as default visibility?  No, 
that's just throwing out the baby with the water.  We should make 
protected do what it was intended to do and accept that not all invariants 
that are true for default visible symbols are also true for protected 
symbols, possibly by ...

> At least I couldn't decipher a solution that solves all of the issues 
> with protected visibility apart from trying to error at link-time (or 
> runtime?) for the cases that are tricky (impossible?) to solve.

... this.


Ciao,
Michael.


Re: GCC 6 symbol poisoning and c++ header usage is fragile

2016-04-21 Thread Michael Matz
Hi,

On Thu, 21 Apr 2016, Szabolcs Nagy wrote:

> there is also , ,  usage and go-system.h is special. 
> (and gmp.h includes  when built with c++)
> 
> so i can prepare a patch with INCLUDE_{MAP,SET,LIST} and remove the 
> explicit libc/libstdc++ includes.

This.

> >> auto-profile.c
> >> diagnostic.c
> >> graphite-isl-ast-to-gimple.c
> >> ipa-icf.c
> >> ipa-icf-gimple.c
> >> pretty-print.c
> >> toplev.c

But check if they really are required at all in the sources.  E.g.  
is useless in ipa-icf-gimple.c.


Ciao,
Michael.


Re: SafeStack proposal in GCC

2016-05-09 Thread Michael Matz
Hi,

On Sat, 7 May 2016, Rich Felker wrote:

> > > * sigaltstack and swapcontext are broken too.
> > 
> > We have prototype that supports swapcontext that we're happy to 
> > release, but it clearly requires more work before being ready to merge 
> > upstream.
> 
> The *context APIs are deprecated and I'm not sure they're worth 
> supporting with this. It would be a good excuse to get people to stop 
> using them.

How?  POSIX decided to remove the facilities without any adequate 
replacement (threads aren't).


Ciao,
Michael.


Re: SafeStack proposal in GCC

2016-05-09 Thread Michael Matz
Hi,

On Mon, 9 May 2016, Rich Felker wrote:

> > > The *context APIs are deprecated and I'm not sure they're worth 
> > > supporting with this. It would be a good excuse to get people to 
> > > stop using them.
> > 
> > How?  POSIX decided to remove the facilities without any adequate 
> > replacement (threads aren't).
> 
> Threads work just as well as the ucontext api for coroutines. Due to the 
> requirement to save/restore signal masks, the latter requires a syscall, 
> making it no faster than a voluntary context switch via futex syscall.

Uhm, no.  If you disregard efficiency, sure, POSIX threads are sometimes a 
replacement on some platforms.  They still have completely different 
activation models (being synchronous with *context, for which you need 
even further slow synchronization in a threading model).

> Most of the other hacks people used the ucontext API for were complete 
> hacks with undefined behavior, anyway.

Sure, that doesn't imply the facility should be removed.  I can misuse all 
kinds of stuff.

> BTW it's not even possible to implement makecontext on most targets due 
> to the wacky variadic calling convention it uses -- in most ABIs, 
> there's simply no way to shift the variadic args into the right slots 
> for calling the start function for the new context without knowing their 
> types, and the implementation has no way to know the types. So it's 
> really an unusably broken API.

Of course.  But _that_ implies that a workable replacement should have 
been put in place, not the unrealistic stance POSIX took with the removal:
  makecontext2(ucontext_t *ucp, void (*func)(void*), void* cookie);
Done.  I never understood why they left in the hugely 
unuseful {sig,}{set,long}jmp() but removed the actually useful *context()
(amended somehow like above).


Ciao,
Michael.


Re: SafeStack proposal in GCC

2016-05-09 Thread Michael Matz
Hi,

On Mon, 9 May 2016, Rich Felker wrote:

> > Done.  I never understood why they left in the hugely unuseful 
> > {sig,}{set,long}jmp() but removed the actually useful *context() 
> > (amended somehow like above).
> 
> Because those are actually part of the C language

Sure.  Same QoI bug in my book.  (And I'm not motivated enough to find out 
if the various C standards weren't just following POSIX when setjmp was 
included, or really the other way around).

> (the non-sig versions, but the sig versions are needed to work around 
> broken unices that made the non-sig versions save/restore signal mask 
> and thus too slow to ever use). They're also much more useful for 
> actually reasonable code (non-local exit across functions that were 
> badly designed with no error paths)

Trivially obtainable with getcontext/setcontext as well.

> as opposed to just nasty hacks that 
> are mostly/entirely UB anyway (coroutines, etc.).

Well, we differ in the definition of reasonable :)  And I certainly don't 
see any material difference in undefined behaviour between both classes of 
functions.  Both are "special" regarding compilers (e.g. returning 
multiple times) and usage.  But since the *jmp() functions can be implemented 
with *context(), and not the other way around, it automatically follows 
(to me!) that the latter are more useful, if for nothing else than basic 
building blocks.  (there are coroutine libs that try to emulate a real 
makecontext with setjmp/longjmp on incapable architectures.  As this is 
impossible for all corner cases they are broken and generally awful on 
them :) )


Ciao,
Michael.


Re: SafeStack proposal in GCC

2016-05-10 Thread Michael Matz
Hi,

On Tue, 10 May 2016, Szabolcs Nagy wrote:

> setjmp is defined so that the compiler can treat it
> specially and the caller has to make sure certain
> objects are volatile, cannot appear in arbitrary
> places (e.g. in the declaration of a vla), longjmp
> must be in same thread etc.
> 
> all those requirements that make setjmp implementible
> at all were missing from the getcontext specs, so you
> can call it through a function pointer and access
> non-volatile modified local state after the second
> return, etc. (the compiler treating "getcontext"
> specially is a hack, not justified by any standard.)

Because _it was removed from the standards_, which is the point of this 
sub-thread.  If it had stayed it would have gotten the same caveats that 
setjmp/longjmp got over time to "make it implementable" ...

> i think both gccgo and qemu can setcontext into another
> thread, so when getcontext returns all tls object
> addresses are wrong.. the semantics of this case was
> not properly defined anywhere (and there are
> implementation internal objects with thread local
> storage duration like fenv so this matters even if
> the caller does not use tls). this is unlikely to
> work correctly with whatever safestack implementation.

... like saying something about this ...

> if setcontext finishes executing the last linked
> context in the main thread it was not clearly
> specified what cleanups will be performed.

... or this, or any of the other use-cases.  And don't claim that 
setjmp/longjmp were completely specified always.  E.g. the interaction 
with threads (naturally) was only put in when threads were put in.  The 
mentioning of the abstract machine state was only there when that concept 
was added, and so on.

> there is just a never ending list of issues with
> these apis, so unless there is an actual proposal
> how to tighten their specification, any caller of
> the context apis rely on undefined semantics.

TBH, that's probably fighting against windmills.  Architectures that care 
about co-routines being implementable in a working way will have a sensible 
*context implementation, others won't.  Too bad for them.


Ciao,
Michael.


Re: Deprecating basic asm in a function - What now?

2016-06-20 Thread Michael Matz
Hi,

On Sun, 19 Jun 2016, David Wohlferd wrote:

> All basic asm in trunk: 1,105 instances.
> - Exclude 273 instances with empty strings leaving 832.
> - Exclude 271 instances for boehm-gc project leaving 561.
> - Exclude 202 instances for testsuite project leaving 359.
> - Exclude 282 instances that are (apparently) top-level leaving
> 
> ~77 instances of basic-asm-in-a-function to be fixed for gcc builds.  
> Most of these are in gcc/config or libgcc/config with just a handful per 
> platform. Lists available upon request.

Well, I think this quite clearly shows how bad an idea it would be to 
deprecate basic asm.  We are just one project, and we ourselves and our 
dependencies already have 77+271 uses of them, not counting the testsuite 
which also reflects some real world usage.

I see zero gain by deprecating them and only churn.  What would be the 
advantage again?


Ciao,
Michael.


Re: Deprecating basic asm in a function - What now?

2016-06-20 Thread Michael Matz
Hi,

On Mon, 20 Jun 2016, Andrew Haley wrote:

> On 20/06/16 18:36, Michael Matz wrote:
> > I see zero gain by deprecating them and only churn.  What would be the 
> > advantage again?
> 
> Correctness.

As said in the various threads about basic asms, all correctness 
problems can be solved by making GCC more conservative in handling them 
(or better said: not making it less conservative).

If you talk about cases where basic asms diddle registers expecting GCC to 
have placed e.g. local variables into specific ones (without using local 
reg vars, or extended asm) I won't believe any claims ...

> It is very likely that many of these basic asms are not
> robust

... of them being very likely without proof.  They will have stopped 
working with every change in compilation options or compiler version.  In 
contrast I think those that did survive a couple years in software very 
likely _are_ correct, under the then documented (or implicit) assumptions.
Those usually are: clobbers and uses memory, processor state and fixed 
registers.

> in the face of compiler changes because they don't declare their 
> dependencies and therefore work only by accident.

Then the compiler better won't change into less conservative handling of 
basic asms.

You see, the experiment shows that there's a gazillion uses of basic asms 
out there.  Deprecating them means that each and every one of them (for us 
alone that's 540 something, including testsuite and boehm) has to be 
changed from asm("body") into asm("body" : : : "memory") (give or take 
some syntax for also clobbering flags).  Alternatively rewrite the 
body to actually make use of extended asm.  I guarantee you that a 
non-trivial percentage will be wrong _then_ while they work fine now.  
Even if it weren't so it still would be silly if GCC simply could regard 
the former as the latter internally.  It would just be change for the sake 
of it and affecting quite many users without gain.


Ciao,
Michael.


Re: Deprecating basic asm in a function - What now?

2016-06-21 Thread Michael Matz
Hi,

On Tue, 21 Jun 2016, Andrew Haley wrote:

> > As said in the various threads about basic asms, all correctness 
> > problems can be solved by making GCC more conservative in handling 
> > them (or better said: not making it less conservative).
> 
> Well, yes.  That's exactly why we've agreed to change basic asms to make 
> them clobber memory, i.e. to make GCC more conservative.

Exactly.  But this thread is about something else, see subject.

> Well, maybe.  It's also fairly likely that many work by accident.  IMO 
> this is more of a statement of hope than any kind of reasonable 
> expectation.

Like yours, of course.

> > Then the compiler better won't change into less conservative handling 
> > of basic asms.
> 
> Repeat, repeat: the change being made is to make gcc MORE
> conservative.

This thread is about deprecating basic asms.  That's not more 
conservative, it's simply breaking backward compatibility for many users.

> > they work fine now.  Even if it weren't so it still would be silly if 
> > GCC simply could regard the former as the latter internally.
> 
> That's what we're doing.

Currently.  But not the proposed patch in this thread, and the general 
idea of deprecating the basic syntax.


Ciao,
Michael.


Re: [GSoC] writing test-case

2014-05-19 Thread Michael Matz
Hi,

On Thu, 15 May 2014, Richard Biener wrote:

> To me predicate (and capture without expression or predicate)
> differs from expression in that predicate is clearly a leaf of the
> expression tree while we have to recurse into expression operands.
> 
> Now, if we want to support applying predicates to the midst of an
> expression, like
> 
> (plus predicate(minus @0 @1)
> @2)
> (...)
> 
> then this would no longer be true.  At the moment you'd write
> 
> (plus (minus@3 @0 @1)
> @2)
>   if (predicate (@3))
> (...)
> 
> which makes it clearer IMHO (with the decision tree building
> you'd apply the predicates after matching the expression tree
> anyway I suppose, so code generation would be equivalent).

Syntaxwise I had this idea for adding generic predicates to expressions:

(plus (minus @0 @1):predicate
  @2)
(...)

Whether it's prefix or suffix doesn't matter much, but using a different syntax
to separate expression from predicate seems to make things clearer.  
Optionally adding things like and/or for predicates might also make sense:

(plus (minus @0 @1):positive_p(@0) || positive_p(@1)
  @2)
(...)


Ciao,
Michael.


Re: [GSoC] writing test-case

2014-05-20 Thread Michael Matz
Hi,

On Tue, 20 May 2014, Richard Biener wrote:

> > Syntaxwise I had this idea for adding generic predicates to expressions:
> >
> > (plus (minus @0 @1):predicate
> >   @2)
> > (...)
> 
> So you'd write
> 
>  (plus @0 :integer_zerop)
> 
> instead of
> 
>  (plus @0 integer_zerop)
> 
> ?

plus is binary, where is your @1?  If you want to not capture the second 
operand but still have it tested for a predicate, then yes, it would be 
the first form.

> 
> > If prefix or suffix doesn't matter much, but using a different syntax
> > to separate expression from predicate seems to make things clearer.
> > Optionally adding things like and/or for predicates might also make sense:
> >
> > (plus (minus @0 @1):positive_p(@0) || positive_p(@1)
> >   @2)
> > (...)
> 
> negation whould be more useful I guess.  You open up a can of
> worms with ordering though:
> 
> (plus (minus @0 @1) @2:operand_equal_p (@1, @2, 0))
> 
> which might be declared invalid or is equivalent to

It wouldn't necessarily be invalid; the predicate would apply to @2,
but check operands 1 and 0 as well, which might be surprising.  In this 
case it might indeed be equivalent to:

> (plus (minus @0 @1) @2):operand_equal_p (@1, @2, 0)



> Note that your predicate placement doesn't match placement of
> captures for non-innermost expressions.  capturing the outer
> plus would be
> 
> (plus@3 (minus @0 @1) @2)


You're right, I'd allow placing the predicate directly behind the capture, 
i.e.:

(plus@3:predicate (minus @0 @1) @2)

> But I still think that doing all predicates within a if-expr makes the 
> pattern less convoluted.

I think it simply depends on the scope of the predicate.  If it's a 
predicate applying to multiple operands from different nesting levels an 
if-expr is clearer (IMHO).  If it applies to one operand it seems more 
natural to place it directly next to that operand.  I.e.:

(minus @0 @1:non_negative) // better

vs.

(minus @0 @1)
  (if (non_negative (@1))

But:

(plus@3 (minus @0 @1) @2)  // better
  (if (operand_equal_p (@1, @2, 0))

vs:

(plus@3:operand_equal_p (@1, @2, 0) (minus @0 @1) @2)

That is we could require that predicates that are applied with ':' need to 
be unary and apply to the one expression to which they are bound.

> Enabling/disabling a whole set of patterns with a common condition
> might still be a worthwhile addition.

Right, but that seems orthogonal to the above?


Ciao,
Michael.


Re: [GSoC] decision tree first steps

2014-06-16 Thread Michael Matz
Hi,

On Mon, 16 Jun 2014, Richard Biener wrote:

> For
> 
> (match_and_simplify
>   (MINUS_EXPR @2 (PLUS_EXPR@2 @0 @1))
>   @1)

Btw, this just caught my eye.  So with attaching the predicate to the 
capture without special separator syntax, it means that there's a 
difference between "minus_expr @2" and "minus_expr@2" with meaningful 
whitespace (despite 'r' and '@' already forming a natural word boundary), 
which seems less than ideal.  Just mentioning :)


Ciao,
Michael.


Re: GCC 4.9.1 Status Report (2014-07-10)

2014-07-14 Thread Michael Matz
Hi,

On Mon, 14 Jul 2014, Franzi Edo. wrote:

> It is like if gcc do not take in account of my BUILD_CFLAGS="-g -O2 
> -fbracket-depth=1024“

GCC is meanwhile compiled with C++.  Instead of CFLAGS use CXXFLAGS, i.e. 
BUILD_CXXFLAGS, and so on.  No guarantees, but at least foobar_CFLAGS 
alone will not be enough.


Ciao,
Michael.

Re: Eliminated function return values in retarget

2014-10-14 Thread Michael Matz
Hi,

On Tue, 14 Oct 2014, Jamie Iles wrote:

>   int foo(void)
>   {
>   if (getreturn() != 0)
>   return -1;
>   
>   return 0;
>   }

So if getreturn() returns zero it can simply reuse that return value ...

> but at -O1 I get
> 
>   10: fb ff ff 40 call0 
>   10: R_OLDLAND_PC24  getreturn-0x4
>   14: 01 00 00 3c mov $r1, 0x0
>   18: 01 01 00 0a sub $r1, $r1, $r0
>   1c: 00 01 00 2a or  $r0, $r1, $r0
>   20: 00 f0 01 38 asr $r0, $r0, 0x1f

... and if I'm interpreting the mnemonics correctly that is what seems to 
happen here.  If $r0 is zero, then:

 mov $r1, 0x0  $r1 = 0
 sub $r1, $r1, $r0 $r1 = 0 - 0 = 0
 or  $r0, $r1, $r0 $r0 = 0 | 0 = 0
 asr $r0, $r0, 0x1f$r0 = 0 >> 31 = 0

Voila, zero is correctly returned.  Any non-zero value will be transformed 
into -1 (if positive the high bit will be set due to subtraction, if 
negative the high bit will be set due to the 'or', and the shift 
replicates the high bit into the lower ones, yielding -1).


Ciao,
Michael.


Re: What is R_X86_64_GOTPLT64 used for?

2014-11-13 Thread Michael Matz
Hi,

On Thu, 13 Nov 2014, H.J. Lu wrote:

> x86-64 psABI has
> 
> name@GOT: specifies the offset to the GOT entry for the symbol name
> from the base of the GOT.
> 
> name@GOTPLT: specifies the offset to the GOT entry for the symbol name
> from the base of the GOT, implying that there is a corresponding PLT entry.
> 
> But GCC never generates name@GOTPLT and assembler fails to assemble
> it:

I've added the implementation for the large model, but only dimly remember 
how it got added to the ABI in the first place.  The additional effect of 
using that reloc was supposed to be that the GOT slot was to be placed 
into .got.plt, and this might hint at the reasoning for this reloc:

If you take the address of a function and call it, you need both a GOT 
slot and a PLT entry (where the existence of GOT slot is implied by the 
PLT of course).  Now, if you use the normal @GOT64 reloc for the 
address-taking operation that would create a slot in .got.  For the call 
instruction you'd use @PLT (or variants thereof, like PLTOFF), which 
creates the PLT slot _and_ a slot in .got.plt.  So, now we've ended up 
with two GOT slots for the same symbol, where one should be enough (the 
address taking operation can just as well use the slot in .got.plt).  So 
if the compiler would emit @GOTPLT64 instead of @GOT64 for all address 
references to symbols where it knows that it's a function it could save 
one GOT slot.

So, I think it was supposed to be a small optimization hint.  But it never 
was used in the compiler ...

> [hjl@gnu-6 pr17598]$ cat x.S
> movabs $foo@GOTPLT,%rax
> [hjl@gnu-6 pr17598]$ gcc -c x.S
> x.S: Assembler messages:
> x.S:1: Error: relocated field and relocation type differ in signedness

... and now seems to have bit-rotted.

> [hjl@gnu-6 pr17598]$
> 
> It certainly isn't needed on data symbols.  I couldn't find any possible
> usage for this relocation on function symbols.

The longer I think about it, the more I'm sure it was meant as the above 
optional optimization.


Ciao,
Michael.


Re: What is R_X86_64_GOTPLT64 used for?

2014-11-17 Thread Michael Matz
Hi,

On Thu, 13 Nov 2014, H.J. Lu wrote:

> @GOTPLT will create a PLT entry, but it doesn't mean PLT entry will be 
> used.

Correct.  The compiler was supposed to somehow make a good decision (e.g. 
if there were calls and address-takings in the same unit).

> Only @PLTOFF will use PLT entry.  Linker should be smart enough to use 
> only one GOT slot, regardless if @GOTPLT or @GOT is used to take 
> function address and call via PLT.

For @GOT the respective GOT slot needs to resolve to the final address (to 
provide stable function pointers).  For @GOTPLT it could at first resolve 
to the PLT slot (which would be only a relative reloc, not a symbol based 
one), like Richard said.  So there still would be a difference.  Apart 
from that I agree that the linker should ideally only use one GOT slot, 
which would remove that particular advantage of @GOTPLT (note that it 
doesn't do so currently contrary to what you said downthread, I'll respond 
separately).

> I'd like to propose
> 
> 1. Update psABI to remove R_X86_64_GOTPLT64.

I don't have much against this, but would like others to say something.


Ciao,
Michael.


Re: What is R_X86_64_GOTPLT64 used for?

2014-11-17 Thread Michael Matz
Hi,

On Thu, 13 Nov 2014, H.J. Lu wrote:

> Linker does:
> 
> ... code that looks like it might create just one GOT slot ...
> 
> So if  a symbol is accessed by both @GOT and @PLTOFF, its
> needs_plt will be true and its got.plt entry will be used for
> both @GOT and @GOTPLT.  @GOTPLT has no advantage
> over @GOT, but potentially wastes a PLT entry.

The above is not correct.  Had you tried you'd see this:

% cat x.c
extern void foo (void);
void main (void)
{
  void (*f)(void) = foo;
  f();
  foo();
}
% gcc -fPIE -mcmodel=large -S x.c; cat x.s
...
movabsq $foo@GOT, %rax
...
movabsq $foo@PLTOFF, %rax
...

So, foo is accessed via @GOT offset and @PLTOFF.  Then,

% cat y.c
void foo (void) {}
% gcc -o liby.so -shared -fPIC y.c
% gcc -fPIE -mcmodel=large x.s liby.so
% readelf -r a.out
...
00600ff8  00040006 R_X86_64_GLOB_DAT  foo + 0
...
00601028  00040007 R_X86_64_JUMP_SLO  foo + 0
...

The first one (to 600ff8) is the normal GOT slot, the second one the GOT 
slot for the PLT entry.  Both are actually used:

004005f0 :
  4005f0:   ff 25 32 0a 20 00   jmpq   *0x200a32(%rip)# 601028 
<_GLOBAL_OFFSET_TABLE_+0x28>

That uses the second GOT slot, and:

004006ec :
  4006ec:   55  push   %rbp
  4006ed:   48 89 e5mov%rsp,%rbp
  4006f0:   53  push   %rbx
  4006f1:   48 83 ec 18 sub$0x18,%rsp
  4006f5:   48 8d 1d f9 ff ff fflea-0x7(%rip),%rbx# 4006f5 

  4006fc:   49 bb 0b 09 20 00 00movabs $0x20090b,%r11
  400703:   00 00 00 
  400706:   4c 01 dbadd%r11,%rbx
  400709:   48 b8 f8 ff ff ff ffmovabs $0xfff8,%rax
  400710:   ff ff ff 
  400713:   48 8b 04 03 mov(%rbx,%rax,1),%rax

This uses the first slot at 0x600ff8.

So, no, currently GOT and GOTPLT (at least how it's supposed to be 
implemented) are not equivalent.

> Here is a patch to mark relocation 30 (R_X86_64_GOTPLT64) as reserved.  
> I pushed updated x86-64 psABI changes to
> 
> https://github.com/hjl-tools/x86-64-psABI/tree/hjl/master
> 
> I will update linker to keep accepting relocation 30 and treat it the 
> same as R_X86_64_GOT64.

That seems a bit premature given the above.


Ciao,
Michael.


Re: What is R_X86_64_GOTPLT64 used for?

2014-11-18 Thread Michael Matz
Hi,

On Mon, 17 Nov 2014, H.J. Lu wrote:

> It has nothing to do with large model.

Yes, I didn't say so.  I've used it only to force GCC to emit @GOT relocs 
(otherwise it would have used @GOTPCREL) to disprove your claim.

> The same thing happens to small model.  We may be to able optimize it, 
> independent of GOTPLT.

Yes, if we were to optimize this, the difference between GOT and GOTPLT 
would be very minor.

> In any case, -mcmodel=large shouldn't change program behavior.

No, it shouldn't of course.


Ciao,
Michael.


Re: What is R_X86_64_GOTPLT64 used for?

2014-11-20 Thread Michael Matz
Hi,

On Wed, 19 Nov 2014, H.J. Lu wrote:

> > The first one (to 600ff8) is the normal GOT slot, the second one the GOT
> > slot for the PLT entry.  Both are actually used:
> >
> > 004005f0 :
> >   4005f0:   ff 25 32 0a 20 00   jmpq   *0x200a32(%rip)# 
> > 601028 <_GLOBAL_OFFSET_TABLE_+0x28>
> 
> They are not:

Huh?  I said both GOT slots are used and I proved it in the disasm dumps.

> => 0x004005aa <+29>: movabs $0xfff8,%rax
>0x004005b4 <+39>: mov(%rbx,%rax,1),%rax

Here it's using one of the GOT slots (namely the one not associated with 
the PLT entry) ...

> Breakpoint 2, 0x004005c0 in main () at main.c:6
> 6  f();
> (gdb) p $rax
> $5 = 140737352012384
> (gdb) disass $rax
> Dump of assembler code for function foo:

... which is why %rax contains the final address of foo, being loaded from 
the appropriate GOT slot that was just relocated with a GLOB_DAT reloc.

> Breakpoint 3, 0x004005cf in main () at main.c:7
> 7  foo();
> (gdb) p $rax
> $6 = 4195472
> (gdb) disass $rax
> Dump of assembler code for function foo@plt:
>0x00400490 <+0>: jmpq   *0x200552(%rip)# 0x6009e8
> 

And here it's using the other GOT slot (associated with this PLT entry), 
unequal to the one used above and initially pointing to the first PLT 
stub.  So why do you say that not both are used?  You can clearly see 
they are.

> One way to optimize it is to make PLT entry to use the normal GOT
> slot:

Exactly.  As a symbol lookup needs to be done anyway for the GLOB_DAT 
reloc, going through the dynamic linker for the lazy lookup later when a 
call occurs doesn't make sense.

> jmp *name@GOTPCREL(%rip)
> 8 byte nop

You mean replacing the PLT slot with the above?  Yep, something like that.  
Even better of course would be to not use the PLT slot at all, it's just a 
useless indirection.  It would be even cooler to rewrite the call insn 
from
  call foo@PLT
into
  call *foo@GOTPCREL(%rip)

(in the small model here).  Unfortunately the latter is one byte larger 
than the former (six bytes for the indirect call vs. five for the direct 
one).  But perhaps GCC could already emit the latter form when it knows a 
certain function symbol has its address taken (or more precisely if a 
GLOB_DAT reloc is going to be emitted for it).

> where name@GOTPCREL points to the normal GOT slot
> updated by R_X86_64_GLOB_DAT relocation at run-time.
> Should I give it a try?

Frankly, I have no idea if it's worth it.  Address takings of function 
symbols don't occur very often, except in vtables, and those don't use 
GOT slots.  Vtables should be handled in a completely different way 
anyway: as the entries aren't usually used for address comparisons they 
should point to the PLT slots, so that it's only RELATIVE relocs, not 
symbol based ones, so that also virtual calls can be resolved lazily.


Ciao,
Michael.


Re: [ping] Re: proper name of i386/x86-64/etc targets

2015-01-20 Thread Michael Matz
Hi,

On Mon, 19 Jan 2015, Sandra Loosemore wrote:

> > I'd be happy to work on a patch to bring the manual to using a common 
> > naming convention, but what should it be?  Wikipedia seems to use 
> > "x86" (lowercase) to refer to the entire family of architectures 
> > (including the original 16-bit variants), "IA-32" for the 32-bit 
> > architecture (I believe that is Intel's official name), and "x86-64" 
> > (with a dash instead of underscore) for the 64-bit architecture.  But 
> > of course the target maintainers should have the final say on what 
> > names to use.
> 
> Ping?  Any thoughts?

ia32 is confusing because ia64 (a well-known term) sounds related but 
couldn't be farther from it, and it's also vendor specific.  Our 
traditional i386 seems better to me (although it has its own problems, 
I'm not aware of any better abbreviation in the wild that's vendor 
neutral and specifically means the 32-bit incarnation of the x86 
architecture).


Ciao,
Michael.


Re: [ping] Re: proper name of i386/x86-64/etc targets

2015-01-20 Thread Michael Matz
Hi,

On Tue, 20 Jan 2015, H.J. Lu wrote:

> > ia32 is confusing because ia64 (a well known term) sounds related but 
> > can't be farther away from it, and it's also vendor specific.  Our 
> > traditional i386 seems better to me (although it has its own problems, 
> > but I'm not aware of any better abbreviation in the wild that's vendor 
> > neutral and specifically means the 32bit incarnation of the x86 
> > architecture).
> >
> 
> The problem with i386 is it is a real processor.  When someone says 
> i386, it isn't clear if it means the processor or 32-bit x86.

That's what I meant with its own problems :)  But ia32 still seems worse 
than that to me.


Ciao,
Michael.


Re: [ping] Re: proper name of i386/x86-64/etc targets

2015-01-20 Thread Michael Matz
Hi,

On Tue, 20 Jan 2015, Uros Bizjak wrote:

> > At least, IA-32 is clear, although IA-64 may be confusing :-).  FWIW, 
> > i386 is also vendor specific.
> 
> Wikipedia agrees [1]:

I wouldn't use a Wikipedia article that only cites sources from after 2008 
(most of them actually after the after-the-fact invention of "ia32") as a 
sensible source supporting either point of view for an architecture that 
has existed since 1985 ;-)  It totally lacks references to config.guess, 
which IMHO is a much better source of "how to call an architecture" :)

Anyway, I've said the things I had to say, crouching back under my stone 
:)


Ciao,
Michael.


Re: Why is floor() only compiled to roundsd when using -funsafe-math-optimizations?

2015-01-28 Thread Michael Matz
Hi,

On Wed, 28 Jan 2015, Tobias Burnus wrote:

> I first want to point to POSIX, which has:
> 
> "floor, floorf, floorl - floor function" [...]
> "An application wishing to check for error situations should set errno to
>  zero and call feclearexcept(FE_ALL_EXCEPT) before calling  these  functions.
>  On  return,  if  errno  is  non-zero  or  fetestexcept(FE_INVALID |
>  FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW) is non-zero, an error has 
> occurred."
> 
> No one seems to care about the "errno" handling (and prefers to use the 
> trapping information directly), thus, I wouldn't be surprised if most 
> libc do not set errno.

That is because the error conditions that POSIX allows simply can't happen 
with the IEEE float formats that are in use.  From the glibc manpage for 
floor:

  NOTES
   SUSv2 and POSIX.1-2001 contain text about  overflow  (which  might  set
   errno  to ERANGE, or raise an FE_OVERFLOW exception).  In practice, the
   result cannot overflow on any current machine, so  this  error-handling
   stuff is just nonsense.  (More precisely, overflow can happen only when
   the maximum value of the exponent is smaller than the  number  of  man-
   tissa bits.  For the IEEE-754 standard 32-bit and 64-bit floating-point
   numbers the maximum value of the exponent is 128 (respectively,  1024),
   and the number of mantissa bits is 24 (respectively, 53).)

Hence, GCC guarding the transformation is simply a bit too cautious.


Ciao,
Michael.


Re: pass_stdarg problem when run after pass_lim

2015-01-30 Thread Michael Matz
Hi,

On Fri, 30 Jan 2015, Tom de Vries wrote:

> > Maybe you want to pick up the work?
> 
> In principle yes, depending on the amount of work (at this point I have no
> idea what remains to be done and how long that would take me).
> 
> Michael, are your patches posted somewhere?

I don't think I ever sent them.  Pasted below, from sometime in October 
last year.  This essentially moves expanding va_arg to pass_stdarg.  But it 
does not yet make use of the possibilities this would bring, namely 
throwing away a whole lot of fragile code in pass_stdarg that tries to 
recover from expanding va_arg too early.

To avoid having to touch each backend it retains expanding va_arg as a 
tree expression that needs to go through the gimplifier, which can create 
new basic blocks that need to be discovered after the fact, so there's 
some shuffling of code in tree-cfg as well.

I also seem to remember that there was a problem with my using temporaries 
of the LHS for the new va_arg internal call, some types can't be copied 
and hence no temporaries can be created.  I can't seem to trigger this 
right now, but this needs to be dealt with somehow I think (but that 
requires the final lvalue be available when lowering the VA_ARG_EXPR).

I think that's about it, hence, updating to current compiler, fixing the 
above problem (if it's still one), and then cleaning up pass_stdarg to 
make use of the availability of IFN_VA_ARG.


Ciao,
Michael.

Index: gimplify.c
===
--- gimplify.c  (revision 216512)
+++ gimplify.c  (working copy)
@@ -9001,6 +9001,39 @@ dummy_object (tree type)
   return build2 (MEM_REF, type, t, t);
 }
 
+/* Call the target expander for evaluating a va_arg call of VALIST
+   and TYPE.  */
+
+tree
+gimplify_va_arg_internal (tree valist, tree type, location_t loc,
+ gimple_seq *pre_p, gimple_seq *post_p)
+{
+  tree have_va_type = TREE_TYPE (valist);
+  have_va_type = targetm.canonical_va_list_type (have_va_type);
+
+  /* Make it easier for the backends by protecting the valist argument
+ from multiple evaluations.  */
+  if (TREE_CODE (have_va_type) == ARRAY_TYPE)
+{
+  /* For this case, the backends will be expecting a pointer to
+TREE_TYPE (abi), but it's possible we've
+actually been given an array (an actual TARGET_FN_ABI_VA_LIST).
+So fix it.  */
+  if (TREE_CODE (TREE_TYPE (valist)) == ARRAY_TYPE)
+   {
+ tree p1 = build_pointer_type (TREE_TYPE (have_va_type));
+ valist = fold_convert_loc (loc, p1,
+build_fold_addr_expr_loc (loc, valist));
+   }
+
+  gimplify_expr (&valist, pre_p, post_p, is_gimple_val, fb_rvalue);
+}
+  else
+gimplify_expr (&valist, pre_p, post_p, is_gimple_min_lval, fb_lvalue);
+
+  return targetm.gimplify_va_arg_expr (valist, type, pre_p, post_p);
+}
+
 /* Gimplify __builtin_va_arg, aka VA_ARG_EXPR, which is not really a
builtin function, but a very special sort of operator.  */
 
@@ -9027,8 +9060,7 @@ gimplify_va_arg_expr (tree *expr_p, gimp
 
   /* Generate a diagnostic for requesting data of a type that cannot
  be passed through `...' due to type promotion at the call site.  */
-  if ((promoted_type = lang_hooks.types.type_promotes_to (type))
-  != type)
+  if ((promoted_type = lang_hooks.types.type_promotes_to (type)) != type)
 {
   static bool gave_help;
   bool warned;
@@ -9062,36 +9094,28 @@ gimplify_va_arg_expr (tree *expr_p, gimp
   *expr_p = dummy_object (type);
   return GS_ALL_DONE;
 }
-  else
+  else if (optimize && !optimize_debug)
 {
-  /* Make it easier for the backends by protecting the valist argument
-from multiple evaluations.  */
-  if (TREE_CODE (have_va_type) == ARRAY_TYPE)
+  tree tmp, tag;
+  gimple call;
+  tmp = build_fold_addr_expr_loc (loc, valist);
+  if (gimplify_arg (&tmp, pre_p, loc) == GS_ERROR)
+   return GS_ERROR;
+  tag = build_int_cst (build_pointer_type (type), 0);
+  call = gimple_build_call_internal (IFN_VA_ARG, 2, tmp, tag);
+  gimple_seq_add_stmt (pre_p, call);
+  if (VOID_TYPE_P (type))
{
- /* For this case, the backends will be expecting a pointer to
-TREE_TYPE (abi), but it's possible we've
-actually been given an array (an actual TARGET_FN_ABI_VA_LIST).
-So fix it.  */
- if (TREE_CODE (TREE_TYPE (valist)) == ARRAY_TYPE)
-   {
- tree p1 = build_pointer_type (TREE_TYPE (have_va_type));
- valist = fold_convert_loc (loc, p1,
-build_fold_addr_expr_loc (loc, 
valist));
-   }
-
- gimplify_expr (&valist, pre_p, post_p, is_gimple_val, fb_rvalue);
+ *expr_p = NULL;
+ return GS_ALL_DONE;
}
-  else
-   gimplify_expr (&valist, pre_p, post_p, is_gimple_min_lval, fb_lvalue

Re: pass_stdarg problem when run after pass_lim

2015-02-02 Thread Michael Matz
Hi,

On Mon, 2 Feb 2015, Tom de Vries wrote:

> I've minimized the vaarg-4a.c failure, and added it as testcase to the patch
> series as gcc.target/x86_64/abi/callabi/vaarg-4.c.
> 
> The problem is in this code:
> ...
>   e = va_arg (argp, char *);
>   e = va_arg (argp, char *);
> ...
> 
> which is translated into:
> ...
>   :
>   argp.1 = argp_3(D);
> 
>   :
>   argp.12_11 = &argp.1;
>   _12 = *argp.12_11;
>   _13 = _12 + 8;
>   *argp.12_11 = _13;
> 
>   :
>   argp.3 = argp_3(D);
> 
>   :
>   argp.13_15 = &argp.3;
>   _16 = *argp.13_15;
>   _17 = _16 + 8;
>   *argp.13_15 = _17;
>   _19 = MEM[(char * *)_16];
>   e_8 = _19;
> ...

That looks like non-x86-64 ABI code.  It builds with -mabi=ms, and it 
seems the particular path taken therein doesn't write back to the aplist 
if it's not locally created with va_start, but rather given as argument.  
Or rather, if it is not addressible (like with x86-64 ABI, where it's 
either addressible because of va_start, or is a pointer to struct due to 
array decay).  The std_gimplify_va_arg_expr might need more changes.


Ciao,
Michael.


Re: pass_stdarg problem when run after pass_lim

2015-02-03 Thread Michael Matz
Hi,

On Tue, 3 Feb 2015, Tom de Vries wrote:

> Ironically, that fix breaks the va_list_gpr/fpr_size optimization, so 
> I've disabled that by default for now.
> 
> I've done a non-bootstrap and bootstrap build using all languages.
> 
> The non-bootstrap test shows (at least) two classes of real failures:
> - gcc.c-torture/execute/20020412-1.c, gcc.target/i386/memcpy-strategy-4.c and
>   gcc.dg/lto/20090706-1_0.c.
>   These are test-cases with vla as va_arg argument. It ICEs in
>   force_constant_size with call stack
>   gimplify_va_arg_expr -> create_tmp_var -> gimple_add_tmp_var ->
>   force_constant_size

Hah, yeah, that's the issue I remembered with create_tmp_var.  This needs 
a change in how to represent the va_arg "call", because the LHS can't be a 
temporary that's copied to the real LHS afterwards.

> - most/all va_arg tests with flto, f.i. gcc.c-torture/execute/stdarg-1.c.
>   It segfaults in lto1 during pass_stdarg, in gimplify_va_arg_internal when
>   accessing have_va_type which is NULL_TREE after
>   'have_va_type = targetm.canonical_va_list_type (have_va_type)'.
> 
> I don't think the flto issue is difficult to fix.  But the vla issue 
> probably needs more time than I have available right now.

I'll think about this.


Ciao,
Michael.


Re: pass_stdarg problem when run after pass_lim

2015-02-03 Thread Michael Matz
Hi,

On Tue, 3 Feb 2015, Jakub Jelinek wrote:

> It can be lowered during gimplification to some internal call.

That's what my patch does :)

> What arguments and return values will it have can be decided based on 
> what will be most suitable for the lowering.

And that as well, just the concrete choice I did doesn't work for all 
cases, so that needs some change.


Ciao,
Michael.


Re: Postpone expanding va_arg until pass_stdarg

2015-02-10 Thread Michael Matz
Hi,

On Tue, 10 Feb 2015, Tom de Vries wrote:

> I've added two modifications to gimplify_modify_expr:
> - the WITH_SIZE_EXPR in which the CALL_TREE is wrapped, is dropped after
>   gimplification, but we need the size expression at expansion in pass_stdarg.
>   So I added the size expression as argument to the internal function.
>   [ And at pass_stdarg::execute, we wrap the result of
> gimplify_va_arg_internal
>   in a WITH_SIZE_EXPR before generating the assign to the lhs ]

Hmm, why do you need the WITH_SIZE_EXPR actually?  For variable-sized 
types returned by va_arg?

> - we detect after gimplify_arg (&ap) whether it created a copy ap.1 of ap,
>   rather than use ap itself, and if so, we copy the value back from ap.1 to ap
>   after va_arg.

My idea was to not generate temporaries and hence copies for 
non-scalar types, but rather construct the "result" of va_arg directly 
into the original LHS (that would then also trivially solve the problem of 
non-copyable types).

> I'm not really sure yet why std_gimplify_va_arg_expr has a part 
> commented out. Michael, can you comment?

I think I did that because of SSA form.  The old sequence calculated

  vatmp = valist;
  vatmp = vatmp + boundary-1
  vatmp = vatmp & -boundary

(where the local variable in that function 'valist_tmp' is the tree 
VAR_DECL 'vatmp') and then continue to use valist_tmp.  When in SSA form 
the gimplifier will rewrite this into:

  vatmp_1 = valist;
  vatmp_2 = vatmp_1 + boundary-1
  vatmp_3 = vatmp_2 & -boundary

but the local valist_tmp variable will continue to be the VAR_DECL, not 
the vatmp_3 ssa name.  Basically whenever one gimplifies a MODIFY_EXPR 
while in SSA form it's suspicious.  So the new code simply builds the 
expression:

  ((valist + bound-1) & -bound)

gimplifies that into an rvalue (most probably an SSA name) and uses that 
to go on generating code by making valist_tmp be that returned rvalue.

I think you'll find that removing that code will make the SSA verifier 
scream or generate invalid code with -m32 when that hook is used.


Ciao,
Michael.

