Git question: Rebasing a user branch

2020-02-04 Thread Bill Schmidt
I'm having a little difficulty with my workflow, and I'm hoping someone 
can spot the problem.


I have a user branch set up with the contrib/git-add-user-branch.sh 
script.  Here are the relevant portions of my .git/config:


[remote "users/wschmidt"]
    url = git+ssh://wschm...@gcc.gnu.org/git/gcc.git
    fetch = +refs/users/wschmidt/heads/*:refs/remotes/users/wschmidt/*
    fetch = +refs/users/wschmidt/tags/*:refs/tags/users/wschmidt/*
    push = refs/heads/wschmidt/*:refs/users/wschmidt/heads/*
[branch "wschmidt/builtins"]
    remote = users/wschmidt
    merge = refs/users/wschmidt/heads/builtins

I originally created the branch from master.  I then made 15 local 
commits, and pushed these upstream.


I now want to rebase my branch from master, and reflect this state 
upstream.  My recipe is:


git checkout master
git pull
git checkout wschmidt/builtins
git rebase master
git push --dry-run users/wschmidt +wschmidt/builtins

After the rebase step, git status shows:

On branch wschmidt/builtins
Your branch and 'users/wschmidt/builtins' have diverged,
and have 39 and 15 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)

nothing to commit, working tree clean

Looks fine to me, so let's try the force push:

wschmidt@marlin:~/newgcc/gcc/config/rs6000$ git push --dry-run 
users/wschmidt +wschmidt/builtins

To git+ssh://gcc.gnu.org/git/gcc.git
 * [new branch]  wschmidt/builtins -> wschmidt/builtins

Well, that's odd, why is it trying to create a new branch?

If I inadvisedly attempt to push without --dry-run, I am stopped from 
creating the new branch:


remote: *** Shared development branches should be named devel/*, and 
should be documented in https://gcc.gnu.org/git.html .

remote: error: hook declined to update refs/heads/wschmidt/builtins
To git+ssh://gcc.gnu.org/git/gcc.git
 ! [remote rejected] wschmidt/builtins -> wschmidt/builtins 
(hook declined)
error: failed to push some refs to 
'git+ssh://wschm...@gcc.gnu.org/git/gcc.git'


It seems wrong that it is trying to update refs/heads/wschmidt/builtins 
(thus creating a new branch).  It seems like there may be a missing 
"users/" needed someplace.  But I am not at all confident that's 
correct.  I'm a little suspicious of the push spec in my config.


Can someone with strong git-fu give me any suggestions?

Best regards,
Bill



Re: Git question: Rebasing a user branch

2020-02-04 Thread Bill Schmidt

On 2/4/20 4:31 PM, Andreas Schwab wrote:

On Feb 04 2020, Bill Schmidt wrote:


wschmidt@marlin:~/newgcc/gcc/config/rs6000$ git push --dry-run
users/wschmidt +wschmidt/builtins
To git+ssh://gcc.gnu.org/git/gcc.git
  * [new branch]  wschmidt/builtins -> wschmidt/builtins

Well, that's odd, why is it trying to create a new branch?

You told it so, with the refspec you used.  Instead you want to push to
users/wschmidt/builtins on the remote side,
ie. +wschmidt/builtins:users/wschmidt/builtins.



Hm.  If I'm understanding you correctly, this still attempts to create a 
new branch:


wschmidt@marlin:~/newgcc/gcc/config/rs6000$ git push --dry-run 
users/wschmidt +wschmidt/builtins:users/wschmidt/builtins

To git+ssh://gcc.gnu.org/git/gcc.git
 * [new branch]  wschmidt/builtins -> users/wschmidt/builtins

I expect I've misunderstood, though.

Thanks!
Bill



Andreas.



Re: Git question: Rebasing a user branch

2020-02-04 Thread Bill Schmidt

On 2/4/20 5:09 PM, Andreas Schwab wrote:

On Feb 04 2020, Bill Schmidt wrote:


Hm.  If I'm understanding you correctly, this still attempts to create a
new branch:

wschmidt@marlin:~/newgcc/gcc/config/rs6000$ git push --dry-run
users/wschmidt +wschmidt/builtins:users/wschmidt/builtins

Sorry, that needs to be fully qualified, as it is not under refs/heads
(and /heads/ was missing on the remote side):

+wschmidt/builtins:refs/users/wschmidt/heads/builtins



Thanks!  That worked:

git push users/wschmidt 
+wschmidt/builtins:refs/users/wschmidt/heads/builtins


Regarding your other suggestion, I don't like to use -f given that I've 
noticed it will sometimes attempt to push other local branches as well.  
But it looks to be unnecessary with this method.


Much obliged!
Bill



Andreas.



Power ELFv2 ABI now openly published

2015-08-24 Thread Bill Schmidt
At Cauldron this year, several people complained to me that our latest
ABI document was behind a registration wall.  I'm happy to say that
we've finally gotten past the issues that were holding it there, and it
is now openly available at:

https://members.openpowerfoundation.org/document/dl/576  

Thanks,
Bill



Some aliasing questions

2016-04-08 Thread Bill Schmidt
Hi,

I ran into a couple of aliasing issues with a project I'm working on,
and have some questions.

The first is an issue with TOC-relative addresses on PowerPC.  These are
symbolic addresses that are to be loaded from a fixed slot in the table
of contents, as addressed by the TOC pointer (r2).  In the RTL phases
prior to register allocation, these are described in an UNSPEC that
looks like this for an example store:

(set (mem/c:DI (unspec:DI [
   (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
   (reg:DI 2 2)
  ] UNSPEC_TOCREL) [1 svul+0 S8 A128])
 (reg:DI 178))

The UNSPEC helps keep track of the r2 reference until this is split into
two or more insns depending on the memory model.

I discovered that alias.c:memrefs_conflict_p is unable to make
must-alias decisions about these, because it doesn't see into the UNSPEC
to find the symbol_ref.  Thus it returns -1 (no information) when
comparing the above with:

(set (reg/f:DI 177)
 (unspec:DI [
(symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
(reg:DI 2 2)
  ] UNSPEC_TOCREL))
(set (reg:V2DI 159)
 (mem:V2DI (and:DI (reg/f:DI 177)
   (const_int -16 [0xfff0])) [4 *_11+0 S16 
A128]))

But clearly the two addresses overlap.

I added the following hack, and the code then returns 1 (must-alias),
without regressing anything in the test suite.

Index: gcc/alias.c
===================================================================
--- gcc/alias.c (revision 234726)
+++ gcc/alias.c (working copy)
@@ -2213,6 +2213,12 @@ memrefs_conflict_p (int xsize, rtx x, int ysize, r
 	}
     }
 
+  /* Some targets may hide a base address in an UNSPEC.  Peel that away.  */
+  if (GET_CODE (x) == UNSPEC)
+    return memrefs_conflict_p (xsize, XVECEXP (x, 0, 0), ysize, y, c);
+  if (GET_CODE (y) == UNSPEC)
+    return memrefs_conflict_p (xsize, x, ysize, XVECEXP (y, 0, 0), c);
+
   if (CONSTANT_P (x))
     {
       if (CONST_INT_P (x) && CONST_INT_P (y))

Now, this is probably not right for a real fix, since it assumes that
any UNSPEC is ok for this, and that the base address will be found
recursively in the first position.  I don't know whether any other
targets have similar issues.  So:

(1) What is the best way to handle this?  Would it be better to have
some sort of target hook?

(2) Are there other places in the aliasing infrastructure where this
UNSPEC use could be getting us into trouble?

Another issue I see involves disjoint alias sets.  If you look closely
at the rtx's above, they have been marked as disjoint, belonging to
alias sets 1 and 4, respectively:

[1 svul+0 S8 A128]
[4 *_11+0 S16 A128]

The gimple involved is:

  svul[0] = 0;
  svul[1] = 1;
  svul.1_9 = (sizetype) &svul;
  _10 = svul.1_9 & 18446744073709551600;  // i.e., -16 or 0xfff...f0
  _11 = (__vector unsigned long *) _10;
  vul.2_12 = *_11;

where svul is file-scope:

  static unsigned long long svul[2] __attribute__ ((aligned (16)));

Here I am exposing the semantics of the vec_ld built-in, which aligns
the address to a 16-byte boundary by masking the low-order four bits.
But bitwise AND only works in an integer type, so some casting there may
be responsible for losing track of the fact that *_11 aliases svul.
However, it seems odd to imply that *_11 definitely does not alias svul.
So:

(3) Am I doing something wrong to expose the address masking this way?

(4) Are the alias sets bogus, or am I misinterpreting this?  If they are
wrong, please point me to where they are computed and I can debug
further.

Thanks for any help!  I haven't dug deeply into the aliasing analysis
before.

Bill



Re: Some aliasing questions

2016-04-08 Thread Bill Schmidt
On Fri, 2016-04-08 at 13:41 -0700, Richard Henderson wrote:
> On 04/08/2016 11:10 AM, Bill Schmidt wrote:
> > The first is an issue with TOC-relative addresses on PowerPC.  These are
> > symbolic addresses that are to be loaded from a fixed slot in the table
> > of contents, as addressed by the TOC pointer (r2).  In the RTL phases
> > prior to register allocation, these are described in an UNSPEC that
> > looks like this for an example store:
> > 
> > (set (mem/c:DI (unspec:DI [
> >(symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
> >(reg:DI 2 2)
> >   ] UNSPEC_TOCREL) [1 svul+0 S8 A128])
> >  (reg:DI 178))
> > 
> > The UNSPEC helps keep track of the r2 reference until this is split into
> > two or more insns depending on the memory model.
> 
> 
> That's why Alpha uses LO_SUM for pre-reload tracking of such things.
> 
> Even though that's a bit of a liberty, since there's no HIGH to go along with
> the LO_SUM.  But at least it allows the middle-end to continue to find the 
> symbol.

Yes, that seems like a better way to handle this.  I'll put this on the
to-do list for GCC 7.  The fewer UNSPECs the better; ironically, I ran
into this while trying to remove some other UNSPECs...

> 
> > (1) What is the best way to handle this?  Would it be better to have
> > some sort of target hook?
> 
> Perhaps, yes.
> 
> > Another issue I see involves disjoint alias sets.  If you look closely
> > at the rtx's above, they have been marked as disjoint, belonging to
> > alias sets 1 and 4, respectively:
> ...
> >   _11 = (__vector unsigned long *) _10;
> ...
> >   static unsigned long long svul[2] __attribute__ ((aligned (16)));
> 
> Be consistent about unsigned long vs unsigned long long and this will be 
> fixed.

Right -- we have some history here with these built-ins and their
signatures, which should have been designed to use long long all the
time to avoid inconsistencies between 32-bit and 64-bit targets.
Because long and long long are essentially synonyms on ppc64, I'm a bit
blind to spotting these situations.  I think I will have to use
may-alias types for these during the expansion, as Richi suggested.

Thanks to both of you for the help!

Bill

> 
> 
> r~
> 




Re: Some aliasing questions

2016-04-12 Thread Bill Schmidt
On Tue, 2016-04-12 at 10:00 +0930, Alan Modra wrote:
> On Fri, Apr 08, 2016 at 01:41:05PM -0700, Richard Henderson wrote:
> > On 04/08/2016 11:10 AM, Bill Schmidt wrote:
> > > The first is an issue with TOC-relative addresses on PowerPC.  These are
> > > symbolic addresses that are to be loaded from a fixed slot in the table
> > > of contents, as addressed by the TOC pointer (r2).  In the RTL phases
> > > prior to register allocation, these are described in an UNSPEC that
> > > looks like this for an example store:
> > > 
> > > (set (mem/c:DI (unspec:DI [
> > >(symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
> > >(reg:DI 2 2)
> > >   ] UNSPEC_TOCREL) [1 svul+0 S8 A128])
> > >  (reg:DI 178))
> > > 
> > > The UNSPEC helps keep track of the r2 reference until this is split into
> > > two or more insns depending on the memory model.
> > 
> > 
> > That's why Alpha uses LO_SUM for pre-reload tracking of such things.
> > 
> > Even though that's a bit of a liberty, since there's no HIGH to go along 
> > with
> > the LO_SUM.  But at least it allows the middle-end to continue to find the 
> > symbol.
> 
> I wish I'd been made aware of the problem with alias analysis when I
> invented this scheme for -mcmodel=medium code..

It's certainly subtle.  I had to be pretty lucky to discover it, as the
only effect is to rather harmlessly say "who knows" rather than giving a
definite answer.

> 
> Back in gcc-4.3 days, when small-model code was the only option, we
> used to generate
>   mem (plus ((reg 2) (const (minus ((symbol_ref)
> (symbol_ref toc_base))
> for a toc mem reference, which accurately reflects the addressing.
> 
> The problem is that when splitting this to a high/lo_sum you lose the
> r2 reference in the lo_sum, and that allows r2 to die prematurely,
> breaking an important linker code editing optimisation.
> 
> Hmm.  Maybe if we rewrote the mem to
>   mem (plus ((symbol_ref toc_base) (const (minus ((symbol_ref)
>   (reg 2))
> It might look odd, but is no lie.  r2 is equal to toc_base.  Or
> perhaps we could lie a litte and simply omit the plus and toc_base
> reference?
> 
> Either way, when we split to
>   set (reg tmp) (high (const (minus ((symbol_ref) (reg 2)
>   .. mem (lo_sum (reg tmp) (const (minus ((symbol_ref) (reg 2)
> both high and lo_sum reference r2 and the linker could happily replace
> rtmp in the lo_sum insn with r2 when the high address is known to be
> zero.

Yes, this sounds promising.  And it really helps to know the history
here -- you saved me a lot of digging through the archives, since I
didn't want to rediscover the issue behind the present design.

> 
> Bill, do you have test cases for the alias problem?  Is this something
> that needs fixing for gcc-6?
> 

Last question first ... no, I don't think it does.  It's generally fine
for the structural aliasing to report "I don't know" and let other
checks decide whether aliasing can exist; it just isn't optimal.  I only
spotted this because getting past this check allowed me to run into a
problem in my code that was exposed in the TBAA checks afterwards.

I ran into this with an experimental patch for GCC 7.  I can send you a
copy of the patch, and point you to the test in the test suite that
exhibits the problem when that patch is applied.  I'll do that offline.

Thanks, Alan!

Bill



Re: possibly dead call from slsr_process_phi () to gimple_bb ()

2016-07-25 Thread Bill Schmidt

> On Jul 25, 2016, at 4:04 AM, Richard Biener  wrote:
> 
> On Mon, 25 Jul 2016, Prathamesh Kulkarni wrote:
> 
>> Hi,
>> I am trying to write a WIP patch to warn for dead function calls,
>> and incidentally it caught the following dead call to gimple_bb() from
>> slsr_process_phi () in gimple-ssa-strength-reduction.c:
>> 
>> if (SSA_NAME_IS_DEFAULT_DEF (arg))
>>   arg_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>> else
>>   gimple_bb (SSA_NAME_DEF_STMT (arg));
>> 
>> Presumably it should be:
>> arg_bb = gimple_bb (SSA_NAME_DEF_STMT (arg)) ?
> 
> Looks like so.  Bill should know.

Certainly looks that way.  I'll get that cleaned up.  Thanks for the report!

Bill

> 
> Richard.
> 
>> Thanks,
>> Prathamesh
>> 
>> 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)
> 



vec_ld versus vec_vsx_ld on power8

2015-03-13 Thread Bill Schmidt
Hi Tim,

I'll discuss the loads here for simplicity; the situation for stores is
analogous.

There are a couple of differences between lvx and lxvd2x.  The most
important one is that lxvd2x supports unaligned loads, while lvx does
not.  You'll note that lvx will zero out the lower 4 bits of the
effective address in order to force an aligned load.

lxvd2x loads two doublewords into a vector register using big-endian
element order, regardless of whether the processor is running in
big-endian or little-endian mode.  That is, the first doubleword from
memory goes into the high-order bits of the vector register, and the
second doubleword goes into the low-order bits.  This is semantically
incorrect for little-endian, so the xxpermdi swaps the doublewords in
the register to correct for this.

At optimization -O1 and higher, gcc will remove many of the xxpermdi
instructions that are added to correct for LE semantics.  In many vector
computations, the lanes where the computations are performed do not
matter, so we don't have to perform the swaps.

For unaligned loads where we are unable to remove the swaps, this is
still better than the alternative using lvx.  An unaligned load requires
a four-instruction sequence to load the two aligned quadwords that
contain the desired data, set up a permutation control vector, and
combine the desired pieces of the two aligned quadwords into a vector
register.  This can be pipelined in a loop so that only one load occurs
per loop iteration, but that requires additional vector copies.  The
four-instruction sequence takes longer and increases vector register
pressure more than an lxvd2x/xxpermdi.

When the data is known to be aligned, lvx is equivalent to lxvd2x
performance if we are able to remove the permutes, and is preferable to
lxvd2x if not.

There are cases where we do not yet use lvx in lieu of lxvd2x when we
could do so and improve performance.  For example, saving and restoring
of vector parameters in a function prolog and epilog does not yet always
use lvx.  This is a performance opportunity we plan to improve in the
future.

A rule of thumb for your purposes is that if you can guarantee that you
are using aligned data, you should use vec_ld and vec_st, and otherwise
you should use vec_vsx_ld and vec_vsx_st.  Depending on your
application, it may be worthwhile to copy your data into an aligned
buffer before performing vector calculations on it.  GCC provides
attributes that will allow you to specify alignment on a 16-byte
boundary.

Note that the above discussion presumes POWER8, which is the only POWER
hardware that currently supports little-endian distributions and
applications.  Unaligned load/store performance on earlier processors
was less efficient, so the tradeoffs differ.

I hope this is helpful!

Bill Schmidt, Ph.D.
IBM Linux Technology Center

You wrote:

> I have an issue/question using VMX/VSX on a Power8 processor on a little endian 
> system.
> Using intrinsic functions, if I perform an operation with vec_vsx_ld() - 
> vec_vsx_st(), the compiler will add
> a permutation, and then perform the operations (memory correctly aligned)

> lxvd2x ...
> xxpermdi ...
> operations ...
> xxpermdi
> stxvd2x ...

> If I use vec_ld() - vec_st()

> lvx
> operations ...
> stvx

> Reading the ISA, I do not see a real difference between these 2 instructions ( 
> or I missed it)

> So my 3 questions are:
 
> Why do I have permutations ?
> What is the cost of these permutations ?
> What is the difference between vec_vsx_ld and vec_ld for performance ?





Re: vec_ld versus vec_vsx_ld on power8

2015-03-13 Thread Bill Schmidt
Hi Tim,

Actually, I left out another very good reason why you may want to use
vec_vsx_ld/st.  Sorry for forgetting this.

As you saw, vec_ld translates into the lvx instruction.  This
instruction loads a sequence of 16 bytes into a vector register.  For
big endian, the first byte in memory is loaded into the high order byte
of the register.  For little endian, the first byte in memory is loaded
into the low order byte of the register.

This is fine if the data you are loading is arrays of characters, but is
not so fine if you are loading arrays of larger items.  Suppose you are
loading four integers {1, 2, 3, 4} into a register with lvx.  In big
endian you will see:

  00 00 00 01  00 00 00 02  00 00 00 03  00 00 00 04

In little endian you will see:

  04 00 00 00  03 00 00 00  02 00 00 00  01 00 00 00

But for this to be interpreted as a vector of integers ordered for
little endian, what you really want is:

  00 00 00 04  00 00 00 03  00 00 00 02  00 00 00 01

If you use vec_vsx_ld, the compiler will generate an lxvd2x instruction
followed by an xxpermdi that swaps the doublewords.  After the lxvd2x
you will have:

  00 00 00 02  00 00 00 01  00 00 00 04  00 00 00 03

because the two LE doublewords are loaded in BE (reversed) order.
Swapping the two doublewords restores sanity:

  00 00 00 04  00 00 00 03  00 00 00 02  00 00 00 01

So, even if your data is properly aligned, the use of vec_ld = lvx is
only correct if you are loading arrays of bytes.  Arrays of anything
larger must use vec_vsx_ld to avoid errors.

Again, sorry for my previous omission!

Thanks,

Bill Schmidt, Ph.D.
IBM Linux Technology Center

On Fri, 2015-03-13 at 15:42 +, Ewart Timothée wrote:
> thank you very much for this answer.
> I know my memory is aligned so I will use vec_ld/st only.
> 
> best
> 
> Tim
> 
> 
> 
> 
> 




Re: vec_ld versus vec_vsx_ld on power8

2015-03-13 Thread Bill Schmidt
Hi Tim,

Sorry to have confused you.  This stuff is a bit mind-boggling the first
200 times you look at it...

For both 32-bit and 64-bit floating-point, you should use vec_vsx_ld on
both BE and LE machines, and the compiler will take care of doing the
right thing for you in both cases.  You do not have to add any swaps
yourself.

When compiling for big-endian, vec_vsx_ld will translate into either
lxvw4x (for 32-bit floating-point) or lxvd2x (for 64-bit
floating-point).  The values will be loaded into the register from
left-to-right (BE ordering).

When compiling for little-endian, vec_vsx_ld will translate into lxvd2x
followed by xxpermdi for both 32-bit and 64-bit floating-point.  This
does the right thing in both cases.  The values will be loaded into the
register from right-to-left (LE ordering).

The vector programming model is set up to allow you to usually code the
same way for both BE and LE.  This is discussed more in Chapter 6 of the
ELFv2 ABI manual, which can be obtained from the OpenPOWER Connect
website (free registration required):

https://www-03.ibm.com/technologyconnect/tgcm/TGCMServlet.wss?alias=OpenPOWER&linkid=1n

Bill


On Fri, 2015-03-13 at 17:11 +, Ewart Timothée wrote:
> Hello,
> 
> I am super confuse now
> 
> scenario 1, what I have in m code:
> machine boots in LE.
> 
> 1) memory: LE
> 2) I load (ld_vec)
> 3) register : LE
> 4) VSU compute in LE
> 5) I store (st_vec)
> 6) memory: LE
> 
> scenario 2: ( I did not test but it is what I get if I order gcc to compiler 
> in BE)
> machine boot in BE
> 
> 1) memory: BE
> 2) I load (ld_vsx_vec)
> 3) register : BE
> 4) VSU compute in BE 
> 5) I store (st_vsx_vec)
> 6) memory: BE
> 
> At this point the VUS compute in both order
> 
> chimera scenario 3, what I understand:
> 
> machine boot in LE
> 
> 1) memory: LE
> 2) I load (ld_vsx_vec)  (the load swap the element)
> 3) register : BE
> 4) swap : LE
> 5) VSU compute in LE
> 6) swap : BE 
> 5) I store (st_vsx_vec) (the store swap the element)
> 6) memory: BE
> 
> I understand  ld/st_vsx_vec load/store from LE/BE, but as the VXU can compute
> in both mode what should I swap (I precise I am working with 32/64 bits float)
> 
> Best,
> 
> Tim
> 
> Timothée Ewart, Ph. D. 
> http://www.linkedin.com/in/tewart
> timothee.ew...@epfl.ch
> 
> 
> 
> 
> 
> 
> > On 13 Mar 2015, at 17:50, Bill Schmidt  wrote:
> > 
> > Hi Tim,
> > 
> > Actually, I left out another very good reason why you may want to use
> > vec_vsx_ld/st.  Sorry for forgetting this.
> > 
> > As you saw, vec_ld translates into the lvx instruction.  This
> > instruction loads a sequence of 16 bytes into a vector register.  For
> > big endian, the first byte in memory is loaded into the high order byte
> > of the register.  For little endian, the first byte in memory is loaded
> > into the low order byte of the register.
> > 
> > This is fine if the data you are loading is arrays of characters, but is
> > not so fine if you are loading arrays of larger items.  Suppose you are
> > loading four integers {1, 2, 3, 4} into a register with lvx.  In big
> > endian you will see:
> > 
> >  00 00 00 01  00 00 00 02  00 00 00 03  00 00 00 04
> > 
> > In little endian you will see:
> > 
> >  04 00 00 00  03 00 00 00  02 00 00 00  01 00 00 00
> > 
> > But for this to be interpreted as a vector of integers ordered for
> > little endian, what you really want is:
> > 
> >  00 00 00 04  00 00 00 03  00 00 00 02  00 00 00 01
> > 
> > If you use vec_vsx_ld, the compiler will generate an lxvd2x instruction
> > followed by an xxpermdi that swaps the doublewords.  After the lxvd2x
> > you will have:
> > 
> >  00 00 00 02  00 00 00 01  00 00 00 04  00 00 00 03
> > 
> > because the two LE doublewords are loaded in BE (reversed) order.
> > Swapping the two doublewords restores sanity:
> > 
> >  00 00 00 04  00 00 00 03  00 00 00 02  00 00 00 01
> > 
> > So, even if your data is properly aligned, the use of vec_ld = lvx is
> > only correct if you are loading arrays of bytes.  Arrays of anything
> > larger must use vec_vsx_ld to avoid errors.
> > 
> > Again, sorry for my previous omission!
> > 
> > Thanks,
> > 
> > Bill Schmidt, Ph.D.
> > IBM Linux Technology Center
> > 
> > On Fri, 2015-03-13 at 15:42 +, Ewart Timothée wrote:
> >> thank you very much for this answer.
> >> I know my memory is aligned so I will use vec_ld/st only.
> >> 
> >> best
> >> 
> >> Tim
> >> 
> >> 
> >> 
> >> 
> >> 
> > 
> > 
> 




Re: [RFC][GCC][rs6000] Remaining work for inline expansion of strncmp/strcmp/memcmp for powerpc

2018-12-03 Thread Bill Schmidt
On 12/3/18 8:34 AM, Florian Weimer wrote:
> * Aaron Sawdey:
>
>> If you are aware of any real world code that is faster when built
>> with -fno-builtin-strcmp and/or -fno-builtin-strncmp, please let me know
>> so I can look at avoiding those situations.
> Sorry, I have not tried to benchmark this.
>
> One more question: There's a hardware erratum on POWER9 DD2.1 related to
> VSX load instructions causing memory corruption when accessing
> cache-inhibited memory.  It may not be very likely that strncmp is used
> on such memory, but memcpy and memmove definitely need to take that into
> account, and perhaps memset and memcmp as well.
>
> In the past, I did not receive positive feedback for my suggestion that
> we should have a separate family of string functions for device memory.
> (This is a general problem that is not specific to POWER.)  So we still
> have the problem that at least some of the string functions in glibc
> need to be compatible with device memory.
>
> My concern here is that the GCC inline expansion could essentially
> disable the workaround we have in glibc memcpy and memmove for the
> hardware erratum.

I don't think we have a real concern here.  DD2.1 is used in a particular
situation where GCC 4.8.5 is the supported compiler, and not used elsewhere.
So I'd prefer not to cripple the compiler for this specific use case.  If
the customer with DD2.1 hardware chooses to use GCC 8 or later, and runs
into this problem, they can use -fno-builtin-mem{set,cmp} as a workaround.
Do you feel that's satisfactory?

We can also have a private discussion if you feel that's warranted.

Thanks,
Bill

>
> Thanks,
> Florian
>



Re: -Wformat-diag: floating-point or floating point?

2019-05-21 Thread Bill Schmidt
On 5/21/19 11:47 AM, Martin Sebor wrote:
> The GCC coding style says to use "floating-point" as an adjective
> rather than "floating point."  After enhancing the -Wformat-diag
> checker to detect this I found a bunch of uses of the latter, such
> as in:
>
>   gcc/c/c-decl.c:10944
>   gcc/c/c-parser.c:9423, 9446, 9450, etc.
>   gcc/convert.c:418, 422
>   gcc/cp/call.c:5070
>   gcc/cp/cvt.c:886
>
> Before I fix them all and adjust the tests, I want to make sure
> we really want to follow this rule.  The C standard uses both
> interchangeably.  With just one exception, the C++ standard uses
> the hyphenated form.
The hyphenated form is correct English, so I certainly prefer it. :-)

Bill
>
> Thanks
> Martin
>



Re: -Wformat-diag: floating-point or floating point?

2019-05-22 Thread Bill Schmidt
On 5/22/19 5:19 AM, Richard Earnshaw (lists) wrote:
> On 21/05/2019 21:18, Bill Schmidt wrote:
>> On 5/21/19 11:47 AM, Martin Sebor wrote:
>>> The GCC coding style says to use "floating-point" as an adjective
>>> rather than "floating point."  After enhancing the -Wformat-diag
>>> checker to detect this I found a bunch of uses of the latter, such
>>> as in:
>>>
>>>   gcc/c/c-decl.c:10944
>>>   gcc/c/c-parser.c:9423, 9446, 9450, etc.
>>>   gcc/convert.c:418, 422
>>>   gcc/cp/call.c:5070
>>>   gcc/cp/cvt.c:886
>>>
>>> Before I fix them all and adjust the tests, I want to make sure
>>> we really want to follow this rule.  The C standard uses both
>>> interchangeably.  With just one exception, the C++ standard uses
>>> the hyphenated form.
>> The hyphenated form is correct English, so I certainly prefer it. :-)
>>
> It's not quite as simple as that.  Hyphens should be used to make it
> clear what is the adjective and what is the noun:
>
>A floating-point number (hyphenated) is a number with a
>floating point (no hyphen).
>
> In the first case 'floating-point' is the adjective and qualifies
> number.  In the second case 'floating' is the adjective and qualifies
> 'point'.
>
> But this is English, so there are probably some exceptions even then -
> but not in this case, I think.  :-)

English is always fun, agreed -- Martin cited the requirement to use
"floating-point" when it's used as an adjective, which is certainly correct.

There's a more interesting question around cavalier usage such as,
"We should use floating point."  I would argue that there is an implied
noun "arithmetic" modified here, so this should also be hyphenated,
but I daresay there would be people on both sides of this one...

This is why grammar police usually die from friendly fire. :-)

Bill
>
> R.
>



Re: -Wformat-diag: floating-point or floating point?

2019-05-22 Thread Bill Schmidt
On 5/22/19 9:58 AM, Martin Sebor wrote:
> On 5/22/19 6:27 AM, Richard Earnshaw (lists) wrote:
>> On 22/05/2019 13:17, Bill Schmidt wrote:
>>> On 5/22/19 5:19 AM, Richard Earnshaw (lists) wrote:
>>>> On 21/05/2019 21:18, Bill Schmidt wrote:
>>>>> On 5/21/19 11:47 AM, Martin Sebor wrote:
>>>>>> The GCC coding style says to use "floating-point" as an adjective
>>>>>> rather than "floating point."  After enhancing the -Wformat-diag
>>>>>> checker to detect this I found a bunch of uses of the latter, such
>>>>>> as in:
>>>>>>
>>>>>>    gcc/c/c-decl.c:10944
>>>>>>    gcc/c/c-parser.c:9423, 9446, 9450, etc.
>>>>>>    gcc/convert.c:418, 422
>>>>>>    gcc/cp/call.c:5070
>>>>>>    gcc/cp/cvt.c:886
>>>>>>
>>>>>> Before I fix them all and adjust the tests, I want to make sure
>>>>>> we really want to follow this rule.  The C standard uses both
>>>>>> interchangeably.  With just one exception, the C++ standard uses
>>>>>> the hyphenated form.
>>>>> The hyphenated form is correct English, so I certainly prefer it. :-)
>>>>>
>>>> It's not quite as simple as that.  Hyphens should be used to make it
>>>> clear what is the adjective and what is the noun:
>>>>
>>>>     A floating-point number (hyphenated) is a number with a
>>>>     floating point (no hyphen).
>>>>
>>>> In the first case 'floating-point' is the adjective and qualifies
>>>> number.  In the second case 'floating' is the adjective and qualifies
>>>> 'point'.
>>>>
>>>> But this is English, so there are probably some exceptions even then -
>>>> but not in this case, I think.  :-)
>>>
>>> English is always fun, agreed -- Martin cited the requirement to use
>>> "floating-point" when it's used as an adjective, which is certainly
>>> correct.
>>>
>>> There's a more interesting question around cavalier usage such as,
>>> "We should use floating point."  I would argue that there is an implied
>>> noun "arithmetic" modified here, so this should also be hyphenated,
>>> but I daresay there would be people on both sides of this one...
>>
>> I would argue that leaving out "arithmetic" is the error. :-)
>
> I agree.  Unfortunately, there are a few cases like that among
> the diagnostics that my script has already fixed:
>
>   decimal floating point not supported
>   comparing floating point with %<==%> or %<!=%> is unsafe
>   ISO C does not support decimal floating point
>
> They probably should read
>
>   decimal floating point types not supported
>   comparing floating-point values with %<==%> or %<!=%> is unsafe
>   ISO C does not support decimal floating point types

"decimal floating point types" does use "floating-point" to modify
"types", so if you change those they should probably remain hyphenated. 
Technically the whole phrase "decimal floating point" modifies types
together, and should be hyphenated together, but that just looks fussy
and isn't common practice.  None of those details are going to solve
world hunger. :-)  Thanks for fixing the general problem!  For the edge
cases, Richard's optimization looks better and better. :-P

Bill
>
> I think they can be adjusted later if we think it's important,
> after the checker is finalized and committed.  I don't want to
> complicate it too much by having to differentiate between
> adjectives and nouns.  The vast majority of the "floating point"
> instances it has found are adjectives.
>
> Martin
>
>
>>> This is why grammar police usually die from friendly fire. :-)
>>>
>>
>> Sticking your head above the parapet is always fraught with danger :)
>>
>>
>> R.
>>
>



Re: SPEC 2017 profiling question (502.gcc_r and 505.mcf_r fail)

2019-10-04 Thread Bill Schmidt

On 10/4/19 10:13 AM, Steve Ellcey wrote:

I am curious if anyone has tried running 'peak' SPEC 2017 numbers using
profiling.  Now that the cactus lto bug has been fixed I can run all
the SPEC intrate and fprate benchmarks with '-Ofast -flto -march=native'
on my aarch64 box and get accurate results but when I try to use these
options along with -fprofile-generate/-fprofile-use I get two
verification errors: 502.gcc_r and 505.mcf_r. The gcc benchmark is
generating different assembly language for some of its tests and mcf is
generating different numbers that look too large to just be due to
unsafe math optimizations.

Has anyone else seen these failures?



Have you tried -fno-strict-aliasing?  There is a known issue with 
spec_qsort() that affects both of these benchmarks.  See 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201.


Hope this helps,

Bill



Steve Ellcey
sell...@marvell.com


Re: GCC 7.5 Release Candidate available from gcc.gnu.org

2019-11-07 Thread Bill Schmidt

That second set of failures occurs already on 7.4.1...

On 11/7/19 5:48 AM, Matthias Klose wrote:

On 05.11.19 13:45, Richard Biener wrote:


The first release candidate for GCC 7.5 is available from

  https://gcc.gnu.org/pub/gcc/snapshots/7.5.0-RC-20191105/

and shortly its mirrors.  It has been generated from SVN revision 
277823.


I have so far bootstrapped and tested the release candidate on
{x86_64,i586,ppc64le,s390x,aarch64}-linux.  Please test it
and report any issues to bugzilla.

If all goes well, I'd like to release 7.5 on Thursday, November 14th.


With a distribution build (Ubuntu) on amd64, i386, armhf, arm64, 
ppc64el and s390x, I don't see any regressions in the GCC testsuite 
(compared to 7.4.0), except for two issues on ppc64el:


FAIL: gcc.target/powerpc/pr87532.c (test for excess errors)
Excess errors:
/build/gcc-7-8odB_r/gcc-7-7.4.0/src/gcc/testsuite/gcc.target/powerpc/pr87532.c:45:27: 
warning: format '%d' expects argument of type 'int', but argument 2 
has type 'size_t {aka long unsigned int}' [-Wformat=]


is a new test, and only caused by default hardening settings.

PASS: gcc.dg/vect/slp-perm-4.c execution test
FAIL: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "vectorized 1 
loops" 1
PASS: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "gaps 
requires scalar epilogue loop" 0
FAIL: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "vectorizing 
stmts using SLP" 1


Matthias


Re: GCC 7.5 Release Candidate available from gcc.gnu.org

2019-11-07 Thread Bill Schmidt
Er, sorry, I guess that is saying the same thing as it is broken in 
7.5.  Oops.


On 11/7/19 9:24 AM, Bill Schmidt wrote:

That second set of failures occurs already on 7.4.1...

On 11/7/19 5:48 AM, Matthias Klose wrote:

On 05.11.19 13:45, Richard Biener wrote:


The first release candidate for GCC 7.5 is available from

  https://gcc.gnu.org/pub/gcc/snapshots/7.5.0-RC-20191105/

and shortly its mirrors.  It has been generated from SVN revision 
277823.


I have so far bootstrapped and tested the release candidate on
{x86_64,i586,ppc64le,s390x,aarch64}-linux.  Please test it
and report any issues to bugzilla.

If all goes well, I'd like to release 7.5 on Thursday, November 14th.


With a distribution build (Ubuntu) on amd64, i386, armhf, arm64, 
ppc64el and s390x, I don't see any regressions in the GCC testsuite 
(compared to 7.4.0), except for two issues on ppc64el:


FAIL: gcc.target/powerpc/pr87532.c (test for excess errors)
Excess errors:
/build/gcc-7-8odB_r/gcc-7-7.4.0/src/gcc/testsuite/gcc.target/powerpc/pr87532.c:45:27: 
warning: format '%d' expects argument of type 'int', but argument 2 
has type 'size_t {aka long unsigned int}' [-Wformat=]


is a new test, and only caused by default hardening settings.

PASS: gcc.dg/vect/slp-perm-4.c execution test
FAIL: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "vectorized 
1 loops" 1
PASS: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "gaps 
requires scalar epilogue loop" 0
FAIL: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "vectorizing 
stmts using SLP" 1


Matthias


Re: GCC 7.5 Release Candidate available from gcc.gnu.org

2019-11-08 Thread Bill Schmidt

On 11/7/19 5:48 AM, Matthias Klose wrote:

On 05.11.19 13:45, Richard Biener wrote:


The first release candidate for GCC 7.5 is available from

  https://gcc.gnu.org/pub/gcc/snapshots/7.5.0-RC-20191105/

and shortly its mirrors.  It has been generated from SVN revision 
277823.


I have so far bootstrapped and tested the release candidate on
{x86_64,i586,ppc64le,s390x,aarch64}-linux.  Please test it
and report any issues to bugzilla.

If all goes well, I'd like to release 7.5 on Thursday, November 14th.


With a distribution build (Ubuntu) on amd64, i386, armhf, arm64, 
ppc64el and s390x, I don't see any regressions in the GCC testsuite 
(compared to 7.4.0), except for two issues on ppc64el:


FAIL: gcc.target/powerpc/pr87532.c (test for excess errors)
Excess errors:
/build/gcc-7-8odB_r/gcc-7-7.4.0/src/gcc/testsuite/gcc.target/powerpc/pr87532.c:45:27: 
warning: format '%d' expects argument of type 'int', but argument 2 
has type 'size_t {aka long unsigned int}' [-Wformat=]


is a new test, and only caused by default hardening settings.

PASS: gcc.dg/vect/slp-perm-4.c execution test
FAIL: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "vectorized 1 
loops" 1
PASS: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "gaps 
requires scalar epilogue loop" 0
FAIL: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "vectorizing 
stmts using SLP" 1



I finally bisected this to r275208:


2019-08-30  Richard Biener  <rguenther@suse.de>

Backport from mainline
2019-05-27  Richard Biener  <rguenther@suse.de>

PR tree-optimization/90637
* tree-ssa-sink.c (statement_sink_location): Honor the
computed sink location for single-uses.

* gcc.dg/gomp/pr90637.c: New testcase.

2019-06-21  Richard Biener  <rguenther@suse.de>

PR tree-optimization/90930
* tree-ssa-reassoc.c (rewrite_expr_tree_parallel): Set visited
flag on new stmts to avoid re-processing them.

2019-05-15  Richard Biener  <rguenther@suse.de>

PR c/90474
* c-common.c (c_common_mark_addressable_vec): Also mark
a COMPOUND_LITERAL_EXPR_DECL addressable similar to
c_mark_addressable.

2019-04-25  Richard Biener  <rguenther@suse.de>

PR middle-end/90194
* match.pd: Add pattern to simplify view-conversion of an
empty constructor.

* g++.dg/torture/pr90194.C: New testcase.

2019-04-24  Richard Biener  <rguenther@suse.de>

PR middle-end/90213
* gimple-fold.c (fold_const_aggregate_ref_1): Do multiplication
by size and BITS_PER_UNIT on poly-wide-ints.

2019-04-15  Richard Biener  <rguenther@suse.de>

PR tree-optimization/90071
* tree-ssa-reassoc.c (init_range_entry): Do not pick up
abnormal operands from def stmts.

* gcc.dg/torture/pr90071.c: New testcase.

2019-03-13  Richard Biener  <rguenther@suse.de>

PR middle-end/89677
* tree-scalar-evolution.c (simplify_peeled_chrec): Do not
throw FP expressions at tree-affine.

* gcc.dg/torture/pr89677.c: New testcase.


This looks rather familiar, actually.  I seem to recall an SLP 
degradation from a change to tree-ssa-sink.c on trunk this release.  
Richi, could there be a missing backport here?



Bill




Matthias


Re: GCC 7.5 Release Candidate available from gcc.gnu.org

2019-11-11 Thread Bill Schmidt

On 11/11/19 7:26 AM, Richard Biener wrote:

On Fri, 8 Nov 2019, Bill Schmidt wrote:


On 11/7/19 5:48 AM, Matthias Klose wrote:

On 05.11.19 13:45, Richard Biener wrote:

The first release candidate for GCC 7.5 is available from

   https://gcc.gnu.org/pub/gcc/snapshots/7.5.0-RC-20191105/

and shortly its mirrors.  It has been generated from SVN revision 277823.

I have so far bootstrapped and tested the release candidate on
{x86_64,i586,ppc64le,s390x,aarch64}-linux.  Please test it
and report any issues to bugzilla.

If all goes well, I'd like to release 7.5 on Thursday, November 14th.

With a distribution build (Ubuntu) on amd64, i386, armhf, arm64, ppc64el and
s390x, I don't see any regressions in the GCC testsuite (compared to 7.4.0),
except for two issues on ppc64el:

FAIL: gcc.target/powerpc/pr87532.c (test for excess errors)
Excess errors:
/build/gcc-7-8odB_r/gcc-7-7.4.0/src/gcc/testsuite/gcc.target/powerpc/pr87532.c:45:27:
warning: format '%d' expects argument of type 'int', but argument 2 has type
'size_t {aka long unsigned int}' [-Wformat=]

is a new test, and only caused by default hardening settings.

PASS: gcc.dg/vect/slp-perm-4.c execution test
FAIL: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "vectorized 1
loops" 1
PASS: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "gaps requires
scalar epilogue loop" 0
FAIL: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "vectorizing
stmts using SLP" 1


I finally bisected this to r275208:


2019-08-30  Richard Biener  <rguenther@suse.de>

Backport from mainline
2019-05-27  Richard Biener  <rguenther@suse.de>

PR tree-optimization/90637
* tree-ssa-sink.c (statement_sink_location): Honor the
computed sink location for single-uses.

* gcc.dg/gomp/pr90637.c: New testcase.

2019-06-21  Richard Biener  <rguenther@suse.de>

PR tree-optimization/90930
* tree-ssa-reassoc.c (rewrite_expr_tree_parallel): Set visited
flag on new stmts to avoid re-processing them.

2019-05-15  Richard Biener  <rguenther@suse.de>

PR c/90474
* c-common.c (c_common_mark_addressable_vec): Also mark
a COMPOUND_LITERAL_EXPR_DECL addressable similar to
c_mark_addressable.

2019-04-25  Richard Biener  <rguenther@suse.de>

PR middle-end/90194
* match.pd: Add pattern to simplify view-conversion of an
empty constructor.

* g++.dg/torture/pr90194.C: New testcase.

2019-04-24  Richard Biener  <rguenther@suse.de>

PR middle-end/90213
* gimple-fold.c (fold_const_aggregate_ref_1): Do multiplication
by size and BITS_PER_UNIT on poly-wide-ints.

2019-04-15  Richard Biener  <rguenther@suse.de>

PR tree-optimization/90071
* tree-ssa-reassoc.c (init_range_entry): Do not pick up
abnormal operands from def stmts.

* gcc.dg/torture/pr90071.c: New testcase.

2019-03-13  Richard Biener  <rguenther@suse.de>

PR middle-end/89677
* tree-scalar-evolution.c (simplify_peeled_chrec): Do not
throw FP expressions at tree-affine.

* gcc.dg/torture/pr89677.c: New testcase.


This looks rather familiar, actually.  I seem to recall an SLP degradation
from a change to tree-ssa-sink.c on trunk this release.  Richi, could there be
a missing backport here?

Not sure - it's reassoc that messes up things here and a
--param tree-reassoc-width=1 "fixes" the failure.  For PR90930 I
restricted this to the last pass instance (but only on trunk).
Does it also fail on the GCC 8 and 9 branches?  Ah, on GCC 8 at least
the default target setting for this seems to be 1 (it's non-FP,
maybe you changed that), with explicit --param tree-reassoc-width={2,3,4}
it also fails the same way.



OK; yes, I think one of our team did some refining of the reassoc 
parameters in that timeframe, so this makes sense.




It's a bit late to try thinking about backporting this change
but I'll now consider it for GCC 9 at least.

So IMHO a latent issue, somehow the rev. triggered "inconsistent"
reassoc for the testcase.  I'm going to leave it as-is for GCC 7.5
(with the testsuite regression).

Are you fine with that?  An explicit --param tree-reassoc-width=1
on the testcase also would work for me if you prefer that.



I am fine with leaving the testcase regressed; we have a good 
explanation and this isn't a serious issue for users.  Thanks for 
investigating!


Bill



Thanks,
Richard.


Re: -fsanitize=thread support on ppc64

2017-01-23 Thread Bill Schmidt
TSan support was contributed to LLVM by a student working at one of the US 
National Labs a while back.  I helped him with some of the PPC assembly
programming.  To my knowledge this is working, but I haven't tested this with
GCC.  Do you think we want to change the configuration for GCC this late in the
release?  I can run a quick test with TSan turned on to see where we're at.

-- Bill

Bill Schmidt, Ph.D.
GCC for Linux on Power
Linux on Power Toolchain
IBM Linux Technology Center
wschm...@linux.vnet.ibm.com

> On Jan 23, 2017, at 6:53 AM, Maxim Ostapenko  wrote:
> 
> Hi,
> 
> On 23/01/17 14:33, Jakub Jelinek wrote:
>> Hi!
>> 
>> I've noticed today there is tsan_rtl_ppc64.S file since the latest
>> merge from upstream.  Does that mean tsan is supposed to work
>> on ppc64?  Just powerpc64le-*-linux*, or powerpc64-*-linux* too?
> 
> FWIW LLVM has build bots for both ppc64le-linux and ppc64be-linux, see:
> http://lab.llvm.org:8011/builders/sanitizer-ppc64le-linux
> http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux
> 
> Thus TSan is supposed to work on ppc64, I guess.
> 
> -Maxim
> 
>> If yes, then libsanitizer/configure.tgt should be changed to reflect that
>> change.
>> 
>>  Jakub
>> 
>> 
> 



Re: -fsanitize=thread support on ppc64

2017-01-23 Thread Bill Schmidt

> On Jan 23, 2017, at 8:32 AM, Jakub Jelinek  wrote:
> 
> On Mon, Jan 23, 2017 at 08:22:30AM -0600, Bill Schmidt wrote:
>> TSan support was contributed to LLVM by a student working at one of the US 
>> National Labs a while back.  I helped him with some of the PPC assembly
>> programming.  To my knowledge this is working, but I haven't tested this with
>> GCC.  Do you think we want to change the configuration for GCC this late in 
>> the
>> release?  I can run a quick test with TSan turned on to see where we're at.
> 
> I think it should be enabled if it works, even this late.
> I bet we need something like the following patch on top of
> the PR79168 patch.
> 
> I'll test both patches on both ppc64le and ppc64.

Sounds good, thanks!  Let me know if I can help in any way.

> 
> Another question is, it seems upstream has s390{,x}-*-linux* support for
> asan/ubsan, does that work?  In that case we should add it to configure.tgt
> too (similarly to the sparc*-*-linux* entry).

CCing Uli for the s390 question.

Bill

> 
> 2017-01-23  Jakub Jelinek  
> 
>   * configure.tgt: Enable tsan and lsan on powerpc64{,le}-*-linux*.
> 
> --- libsanitizer/configure.tgt.jj 2016-11-09 15:22:50.0 +0100
> +++ libsanitizer/configure.tgt2017-01-23 15:25:21.059399613 +0100
> @@ -1,5 +1,5 @@
> # -*- shell-script -*-
> -#   Copyright (C) 2012 Free Software Foundation, Inc.
> +#   Copyright (C) 2012-2017 Free Software Foundation, Inc.
> 
> # This program is free software; you can redistribute it and/or modify
> # it under the terms of the GNU General Public License as published by
> @@ -31,6 +31,11 @@ case "${target}" in
>   fi
>   ;;
>   powerpc*-*-linux*)
> + if test x$ac_cv_sizeof_void_p = x8; then
> + TSAN_SUPPORTED=yes
> + LSAN_SUPPORTED=yes
> + TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_ppc64.lo
> + fi
>   ;;
>   sparc*-*-linux*)
>   ;;
> 
> 
>   Jakub
> 



Re: -fsanitize=thread support on ppc64

2017-01-23 Thread Bill Schmidt

> On Jan 23, 2017, at 8:32 AM, Jakub Jelinek  wrote:
> 
> On Mon, Jan 23, 2017 at 08:22:30AM -0600, Bill Schmidt wrote:
>> TSan support was contributed to LLVM by a student working at one of the US 
>> National Labs a while back.  I helped him with some of the PPC assembly
>> programming.  To my knowledge this is working, but I haven't tested this with
>> GCC.  Do you think we want to change the configuration for GCC this late in 
>> the
>> release?  I can run a quick test with TSan turned on to see where we're at.
> 
> I think it should be enabled if it works, even this late.
> I bet we need something like the following patch on top of
> the PR79168 patch.
> 
> I'll test both patches on both ppc64le and ppc64.
> 
> Another question is, it seems upstream has s390{,x}-*-linux* support for
> asan/ubsan, does that work?  In that case we should add it to configure.tgt
> too (similarly to the sparc*-*-linux* entry).
> 
> 2017-01-23  Jakub Jelinek  
> 
>   * configure.tgt: Enable tsan and lsan on powerpc64{,le}-*-linux*.
> 
> --- libsanitizer/configure.tgt.jj 2016-11-09 15:22:50.0 +0100
> +++ libsanitizer/configure.tgt2017-01-23 15:25:21.059399613 +0100
> @@ -1,5 +1,5 @@
> # -*- shell-script -*-
> -#   Copyright (C) 2012 Free Software Foundation, Inc.
> +#   Copyright (C) 2012-2017 Free Software Foundation, Inc.
> 
> # This program is free software; you can redistribute it and/or modify
> # it under the terms of the GNU General Public License as published by
> @@ -31,6 +31,11 @@ case "${target}" in
>   fi
>   ;;
>   powerpc*-*-linux*)

I think you want a separate entry for powerpc64*-*-linux* -- IIRC, the 
existing code will definitely not work for 32-bit due to TLS differences.
Thus be sure we don't enable TSAN for powerpc-*-linux.

Bill

> + if test x$ac_cv_sizeof_void_p = x8; then
> + TSAN_SUPPORTED=yes
> + LSAN_SUPPORTED=yes
> + TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_ppc64.lo
> + fi
>   ;;
>   sparc*-*-linux*)
>   ;;
> 
> 
>   Jakub
> 



Re: -fsanitize=thread support on ppc64

2017-01-23 Thread Bill Schmidt

> On Jan 23, 2017, at 8:47 AM, Jakub Jelinek  wrote:
> 
> On Mon, Jan 23, 2017 at 08:45:16AM -0600, Bill Schmidt wrote:
>>> 2017-01-23  Jakub Jelinek  
>>> 
>>> * configure.tgt: Enable tsan and lsan on powerpc64{,le}-*-linux*.
>>> 
>>> --- libsanitizer/configure.tgt.jj   2016-11-09 15:22:50.0 +0100
>>> +++ libsanitizer/configure.tgt  2017-01-23 15:25:21.059399613 +0100
>>> @@ -1,5 +1,5 @@
>>> # -*- shell-script -*-
>>> -#   Copyright (C) 2012 Free Software Foundation, Inc.
>>> +#   Copyright (C) 2012-2017 Free Software Foundation, Inc.
>>> 
>>> # This program is free software; you can redistribute it and/or modify
>>> # it under the terms of the GNU General Public License as published by
>>> @@ -31,6 +31,11 @@ case "${target}" in
>>> fi
>>> ;;
>>>  powerpc*-*-linux*)
>> 
>> I think you want a separate entry for powerpc64*-*-linux* -- IIRC, the 
>> existing code will definitely not work for 32-bit due to TLS differences.
>> Thus be sure we don't enable TSAN for powerpc-*-linux.
> 
> That is handled by the
> 
>>> +   if test x$ac_cv_sizeof_void_p = x8; then
> 
> test (similarly how for both i?86-*-linux* and x86_64-*-linux* it is enabled
> only for LP64 multilib and not others).  We want to enable it only for the
> 64-bit multilib, not 32-bit.

Ah, quite right.  Sorry for the sloppy reading.

Bill

> 
>>> +   TSAN_SUPPORTED=yes
>>> +   LSAN_SUPPORTED=yes
>>> +   TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_ppc64.lo
>>> +   fi
>>> ;;
>>>  sparc*-*-linux*)
>>> ;;
> 
>   Jakub
> 



Re: lvx versus lxvd2x on power8

2017-04-11 Thread Bill Schmidt
Hi Igor,

(Apologies for not threading this, I haven't received my digest for this
list yet)

You wrote:

>I recently checked this old discussion about when/why to use lxvd2x instead of 
>lvsl/lvx/vperm/lvx to load elements from memory to vector: 
>https://gcc.gnu.org/ml/gcc/2015-03/msg00135.html

>I had the same doubt, and I was also concerned about how performance differs
>between these approaches. So I created the following project to check which
>one is faster and how memory alignment can influence the results:

>https://github.com/PPC64/load_vec_cmp

>This is simple code in which many loads (using both approaches) are executed
>in a simple loop in order to measure which implementation is slower. The
>project also considers alignment.

>As can be seen in this plot
>(https://raw.githubusercontent.com/igorsnunes/load_vec_cmp/master/doc/LoadVecCompare.png),
>an unaligned load using lxvd2x takes more time.

>The previous discussion (as far as I could see) claims that lxvd2x performs
>better than lvsl/lvx/vperm/lvx in all cases. Is that correct? Is my analysis
>wrong?

>This issue concerned me, since lxvd2x is heavily used in compiled code.

One problem with your analysis is that you are forcing the use of the xxswapd
following the lxvd2x.  Although this is technically required for a load in
isolation to place elements in the correct lanes, in practice the compiler is
able to remove almost all of the xxswapd instructions during optimization.  Most
SIMD code does not care about which lanes are used for calculation, so long as
results in memory are placed properly.  For computations that do care, we can
often adjust the computations to still allow the swaps to be removed.  So your
analysis does not show anything about how code is produced in practice.

Another issue is that you're throwing away the results of the loads, which isn't
a particularly useful way to measure the costs of the latencies of the
instructions.  Typically with the pipelined lvx implementation, you will have
an lvx feeding the vperm feeding at least one use of the loaded value in each 
iteration of the loop, while with lxvd2x and optimization you will only have an 
lxvd2x feeding the use(s).  The latter is easier for the scheduler to cover 
latencies in most cases.

Finally, as a rule of thumb, these kinds of "loop kernels" are really bad for
predicting performance, particularly on POWER.

In the upcoming POWER9 processors, the swap issue goes away entirely, as we will
have true little-endian unaligned loads (the indexed-form lxvx to replace
lxvd2x/xxswapd, and the offset-form lxv to reduce register pressure).

Now, you will of course see slightly worse unaligned performance for lxvd2x
versus aligned performance for lxvd2x.  This happens at specific crossing
points where the hardware has to work a bit harder.

I hate to just say "trust me" but I want you to understand that we have been
looking at these kinds of performance issues for several years.  This does
not mean that there are no cases where the pipelined lvx solution works better
for a particular loop, but if you let the compiler optimize it (or do similar
optimization in your own assembly code), lxvd2x is almost always better.

Thanks,
Bill



Re: Question on Gimple canonicalization

2013-04-12 Thread Bill Schmidt
On Fri, 2013-04-12 at 15:51 +0100, Sofiane Naci wrote:
> Hi,
> 
> Consider the following sequence, which computes 2 addresses to access an
> array:
> 
>   _2 = (long unsigned int) i_1(D);
>   _3 = _2 * 200;
>   _4 = _3 + 1000;
>   _6 = A2_5(D) + _4;
>   *_6[0] = 1;
>   _9 = _3 + 2000;
>   _10 = A2_5(D) + _9;
>   _11 = _2 * 4;
>   _13 = A1_12(D) + _11;
>   _14 = *_13;
>   *_10[0] = _14;
> 
> 
> There is an opportunity for optimization here that the compiler misses,
> probably due to the order of Gimple statements. If we rewrite
> 
>   _3 = _2 * 200;
>   _4 = _3 + 1000;
>   _6 = A2_5(D) + _4;
>   ...
>   _9 = _3 + 2000;
>   _10 = A2_5(D) + _9;
> 
> as
> 
>   _3 = _2 * 200;
>   _4 = _3 + A2_5(D);
>   _6 = 1000 + _4;
>   ...
>   _9 = _3 + A2_5(D);
>   _10 = 1000 + _9;
> 
> We can clearly omit instruction _9.
> 
> As the widening multiply pass has been improved to consider constant
> operands [1], this opportunity for optimization is lost as the widening
> multiply pass converts the sequence into:
> 
>   _3 = i_1(D) w* 200;
>   _4 = WIDEN_MULT_PLUS_EXPR <i_1(D), 200, 1000>;
>   _6 = A2_5(D) + _4;
>   ...
>   _9 = WIDEN_MULT_PLUS_EXPR <i_1(D), 200, 2000>;
>   _10 = A2_5(D) + _9;
> 
> 
> With this particular example, this causes a Dhrystone regression at the
> AArch64 back end.
> 
> Where in the front end could such an optimization take place? 
> 
> Bill, is this something that your Strength Reduction work [2] could be
> addressing?

Hm, no, this isn't really a strength reduction issue.  You're not
wanting to remove an unwanted multiply.  This is more of a
reassociation/value numbering concern.  Essentially you have:

  = A2 + ((i * 200) + 1000)
  = A2 + ((i * 200) + 2000)

which you'd like to reassociate into

  = (A2 + (i * 200)) + 1000
  = (A2 + (i * 200)) + 2000

so the parenthesized expression can be seen as available by value
numbering:

  T = A2 + (i * 200)
= T + 1000
= T + 2000

But reassociation just looks at a single expression tree and doesn't
know about the potential optimization.  I'm not sure this fits well into
any of the existing passes, but others will have more authoritative
answers than me...

Bill

> 
> Thanks
> Sofiane
> 
> -
> 
> [1] http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01751.html
> [2]
> http://gcc.gnu.org/wiki/cauldron2012?action=AttachFile&do=get&target=wschmid
> t.pdf
> 
> 
> 
> 
> 



Re: Question on Gimple canonicalization

2013-04-12 Thread Bill Schmidt
On Fri, 2013-04-12 at 11:18 -0500, Bill Schmidt wrote:
> On Fri, 2013-04-12 at 15:51 +0100, Sofiane Naci wrote:
> > Hi,
> > 
> > Consider the following sequence, which computes 2 addresses to access an
> > array:
> > 
> >   _2 = (long unsigned int) i_1(D);
> >   _3 = _2 * 200;
> >   _4 = _3 + 1000;
> >   _6 = A2_5(D) + _4;
> >   *_6[0] = 1;
> >   _9 = _3 + 2000;
> >   _10 = A2_5(D) + _9;
> >   _11 = _2 * 4;
> >   _13 = A1_12(D) + _11;
> >   _14 = *_13;
> >   *_10[0] = _14;
> > 
> > 
> > There is an opportunity for optimization here that the compiler misses,
> > probably due to the order of Gimple statements. If we rewrite
> > 
> >   _3 = _2 * 200;
> >   _4 = _3 + 1000;
> >   _6 = A2_5(D) + _4;
> >   ...
> >   _9 = _3 + 2000;
> >   _10 = A2_5(D) + _9;
> > 
> > as
> > 
> >   _3 = _2 * 200;
> >   _4 = _3 + A2_5(D);
> >   _6 = 1000 + _4;
> >   ...
> >   _9 = _3 + A2_5(D);
> >   _10 = 1000 + _9;
> > 
> > We can clearly omit instruction _9.
> > 
> > As the widening multiply pass has been improved to consider constant
> > operands [1], this opportunity for optimization is lost as the widening
> > multiply pass converts the sequence into:
> > 
> >   _3 = i_1(D) w* 200;
> >   _4 = WIDEN_MULT_PLUS_EXPR <i_1(D), 200, 1000>;
> >   _6 = A2_5(D) + _4;
> >   ...
> >   _9 = WIDEN_MULT_PLUS_EXPR <i_1(D), 200, 2000>;
> >   _10 = A2_5(D) + _9;
> > 
> > 
> > With this particular example, this causes a Dhrystone regression at the
> > AArch64 back end.
> > 
> > Where in the front end could such an optimization take place? 
> > 
> > Bill, is this something that your Strength Reduction work [2] could be
> > addressing?
> 
> Hm, no, this isn't really a strength reduction issue.  You're not
> wanting to remove an unwanted multiply.  This is more of a
> reassociation/value numbering concern.  Essentially you have:
> 
>   = A2 + ((i * 200) + 1000)
>   = A2 + ((i * 200) + 2000)
> 
> which you'd like to reassociate into
> 
>   = (A2 + (i * 200)) + 1000
>   = (A2 + (i * 200)) + 2000
> 
> so the parenthesized expression can be seen as available by value
> numbering:
> 
>   T = A2 + (i * 200)
> = T + 1000
> = T + 2000
> 
> But reassociation just looks at a single expression tree and doesn't
> know about the potential optimization.  I'm not sure this fits well into
> any of the existing passes, but others will have more authoritative
> answers than me...

All this said, it's not completely foreign to how the strength reduction
pass is structured.  The problem is that strength reduction looks for
candidates of very restricted patterns, which keeps compile time down
and avoids deep searching:  (a * x) + b or a * (x + b).  Your particular
case adds only one more addend, but the number of ways that can be
reassociated immediately adds a fair amount of complexity.  If your
example is extended to a two-dimensional array case, it becomes more
complex still.  So the methods used by strength reduction don't scale
well to these more general problems.

I imagine the existing canonical form for address calculations is good
for some things, but not for others.  Hopefully someone with more
history in that area can suggest something.

> 
> Bill
> 
> > 
> > Thanks
> > Sofiane
> > 
> > -
> > 
> > [1] http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01751.html
> > [2]
> > http://gcc.gnu.org/wiki/cauldron2012?action=AttachFile&do=get&target=wschmid
> > t.pdf
> > 
> > 
> > 
> > 
> > 



History question: Thread-safe profiling instrumentation

2013-04-22 Thread Bill Schmidt
Six years ago, Michael Matz proposed a patch for generating profile
instrumentation in a thread-safe manner:

http://gcc.gnu.org/ml/gcc-patches/2007-03/msg00950.html

Reading through the thread, I saw a few minor objections, but nothing to
indicate the patch should be withdrawn.  However, apparently the changes
were never made.

I'm curious about the history here.  What was the reason for abandoning
this?  Was a better alternative found?  Were all the counters made
thread-local?

My reason for asking involves a large heavily-threaded application that
is improved by feedback-directed optimization on some platforms, but not
on others.  One theory is that a defective profile is generated due to
counter dropouts from contention.  I'm somewhat skeptical about this
given that some platforms seem to do well with it, but it's possible.
I'm hopeful that knowing why the thread-safe profiling patch wasn't
implemented will give us more of a clue.

Thanks for any help!

Bill



Re: History question: Thread-safe profiling instrumentation

2013-04-22 Thread Bill Schmidt
On Mon, 2013-04-22 at 13:13 -0700, Xinliang David Li wrote:
> There is a similar patch (in google branches) from Rong Xu which
> enables atomic profile counter update.
> 
> http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00072.html

Thanks, David!  We'll take a look.

> 
> On Mon, Apr 22, 2013 at 12:59 PM, Bill Schmidt
>  wrote:
> > Six years ago, Michael Matz proposed a patch for generating profile
> > instrumentation in a thread-safe manner:
> >
> > http://gcc.gnu.org/ml/gcc-patches/2007-03/msg00950.html
> >
> > Reading through the thread, I saw a few minor objections, but nothing to
> > indicate the patch should be withdrawn.  However, apparently the changes
> > were never made.
> >
> > I'm curious about the history here.  What was the reason for abandoning
> > this?  Was a better alternative found?  Were all the counters made
> > thread-local?
> >
> > My reason for asking involves a large heavily-threaded application that
> > is improved by feedback-directed optimization on some platforms, but not
> > on others.  One theory is that a defective profile is generated due to
> > counter dropouts from contention.  I'm somewhat skeptical about this
> > given that some platforms seem to do well with it, but it's possible.
> 
> Do you see lots of messages about insane profile data (under
> -fprofile-correction)? Most of such messages we see (in our large
> multi-threaded applications) show only marginal errors.
> 

We're still waiting on that data.  We don't have direct access to the
application so we're having to work at some remove.  When we asked for
it the first time, the build logs had unfortunately been purged.

> 
> > I'm hopeful that knowing why the thread-safe profiling patch wasn't
> > implemented will give us more of a clue.
> 
> You can try out Rong's patch. He plans to submit it to trunk soon.

Much obliged!
Bill

> 
> thanks,
> 
> David
> 
> >
> > Thanks for any help!
> >
> > Bill
> >
> 



Re: [RFC] vector subscripts/BIT_FIELD_REF in Big Endian.

2013-08-09 Thread Bill Schmidt
On Mon, 2013-08-05 at 11:47 +0100, Tejas Belagod wrote:
> Hi,
> 
> I'm looking for some help understanding how BIT_FIELD_REFs work with 
> big-endian.
> 
> Vector subscripts in this example:
> 
> #define vector __attribute__((vector_size(sizeof(int)*4) ))
> 
> typedef int vec vector;
> 
> int foo(vec a)
> {
>return a[0];
> }
> 
> gets lowered into array accesses by c-typeck.c
> 
> ;; Function foo (null)
> {
>return *(int *) &a;
> }
> 
> and gets gimplified into BIT_FIELD_REFs a bit later.
> 
> foo (vec a)
> {
>int _2;
> 
>   <bb 2>:
>   _2 = BIT_FIELD_REF <a, 32, 0>;
>return _2;
> 
> }
> 
> What's interesting to me here is the bitpos - does this not need
> BYTES_BIG_ENDIAN correction? This seems to be inconsistent with what happens
> with reduction operations in the autovectorizer, where the scalar result in
> the reduction epilogue gets extracted with a BIT_FIELD_REF but the bitpos
> there is corrected for BIG_ENDIAN.

a[0] is at the left end of the array in BIG_ENDIAN, and big-endian
machines number bits from the left, so bit position 0 is correct.
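To make that concrete, here is a small sketch (the vec typedef matches the one in the quoted test case; the helper function is my own, for illustration): element 0 of a vector occupies the first bytes of its storage on either endianness, and big-endian bit numbering also starts at that end, so bitpos 0 selects a[0] without correction.

```c
#include <assert.h>
#include <string.h>

#define vector __attribute__((vector_size(sizeof(int) * 4)))
typedef int vec vector;

/* Reading the first sizeof(int) bytes of the vector - bit positions
   [0, 32) in big-endian numbering - yields a[0] on any endianness,
   because vector elements are laid out in memory in index order.  */
int first_element_via_bytes(vec a)
{
    int r;
    memcpy(&r, &a, sizeof r);
    return r;
}
```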

> 
> ... from tree-vect-loop.c:vect_create_epilog_for_reduction ()
> 
>/* 2.4  Extract the final scalar result.  Create:
>s_out3 = extract_field   */
> 
>if (extract_scalar_result)
>  {
>tree rhs;
> 
>if (dump_enabled_p ())
>  dump_printf_loc (MSG_NOTE, vect_location,
>"extract scalar result");
> 
>if (BYTES_BIG_ENDIAN)
>  bitpos = size_binop (MULT_EXPR,
>   bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
>   TYPE_SIZE (scalar_type));
>else
>  bitpos = bitsize_zero_node;
> 
> 
> For eg:
> 
> int foo(int * a)
> {
>int i, sum = 0;
> 
>for (i=0;i<16;i++)
> sum += a[i];
> 
>return sum;
> }
> 
> gets autovectorized into:
> 
> ...
>vect_sum_9.17_74 = [reduc_plus_expr] vect_sum_9.15_73;
>stmp_sum_9.16_75 = BIT_FIELD_REF <vect_sum_9.17_74, 32, 96>;
>sum_76 = stmp_sum_9.16_75 + sum_47;
> 
> the BIT_FIELD_REF here seems to have been corrected for BYTES_BIG_ENDIAN

Yes, because something else is going on here.  This is a reduction
operation where the sum ends up in the rightmost element of a vector
register that contains four 32-bit integers.  This is at position 96
from the left end of the register according to big-endian numbering.
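In other words, the epilogue code quoted above computes the big-endian bit position of the last vector element. A sketch of that arithmetic (my own helper, not a GCC function):

```c
#include <assert.h>

/* Bit position of the scalar reduction result within the vector:
   the last element on big-endian targets, counting bit positions from
   the left end, and element 0 otherwise.  For a vector of 4 x 32-bit
   integers on big-endian this gives 96, matching the dump above.  */
unsigned reduction_result_bitpos(unsigned nunits, unsigned elt_bits,
                                 int bytes_big_endian)
{
    return bytes_big_endian ? (nunits - 1) * elt_bits : 0;
}
```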

> 
> If vec_extract is defined in the back-end, how does one figure out if the 
> BIT_FIELD_REF is a product of the gimplifier's indirect ref folding or the 
> vectorizer's bit-field extraction and apply the appropriate correction in 
> vec_extract's expansion? Or am I missing something that corrects 
> BIT_FIELD_REFs 
> between the gimplifier and the RTL expander?

There is no inconsistency here.

Hope this helps!
Bill

> 
> Thanks,
> Tejas.
> 



Re: [RFC] vector subscripts/BIT_FIELD_REF in Big Endian.

2013-08-12 Thread Bill Schmidt
On Mon, 2013-08-12 at 11:54 +0100, Tejas Belagod wrote:
> >> What's interesting to me here is the bitpos - does this not need
> >> BYTES_BIG_ENDIAN correction? This seems to be inconsistent with what
> >> happens with reduction operations in the autovectorizer, where the scalar
> >> result in the reduction epilogue gets extracted with a BIT_FIELD_REF but
> >> the bitpos there is corrected for BIG_ENDIAN.
> > 
> > a[0] is at the left end of the array in BIG_ENDIAN, and big-endian
> > machines number bits from the left, so bit position 0 is correct.
> > 
> >>
> >> ...
> >>vect_sum_9.17_74 = [reduc_plus_expr] vect_sum_9.15_73;
> >>stmp_sum_9.16_75 = BIT_FIELD_REF <vect_sum_9.17_74, 32, 96>;
> >>sum_76 = stmp_sum_9.16_75 + sum_47;
> >>
> >> the BIT_FIELD_REF here seems to have been corrected for BYTES_BIG_ENDIAN
> > 
> > Yes, because something else is going on here.  This is a reduction
> > operation where the sum ends up in the rightmost element of a vector
> > register that contains four 32-bit integers.  This is at position 96
> > from the left end of the register according to big-endian numbering.
> > 
> 
> Thanks for your reply.
> 
> Sorry, I'm still a bit confused here. The reduc_splus_<mode> documentation says
> 
> "Compute the sum of the signed elements of a vector. The vector is operand 1,
> and the scalar result is stored in the least significant bits of operand 0
> (also a vector)."
> 
> Shouldn't this mean the scalar result should be in bitpos 0 which is the left 
> end of the register in BIG ENDIAN?

No.  The least significant bits of any register are the rightmost bits,
and big-endian numbering begins at the left.  (I don't really like the
commentary, since "least significant bits" isn't a very good term to use
with vectors.)  Analogously, a 64-bit integer is numbered with 0 on the
left being the most significant bit, and 63 on the right being the least
significant bit.
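To pin down the convention, here is a small helper (illustrative, not from GCC) that extracts bit n of a 64-bit value under big-endian numbering, where bit 0 is the most significant bit and bit 63 the least significant:

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian (IBM-style) bit numbering: bit 0 is the leftmost (most
   significant) bit of a 64-bit value, bit 63 the rightmost (least
   significant).  */
unsigned bit_be(uint64_t x, unsigned n)
{
    return (unsigned)((x >> (63 - n)) & 1);
}
```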

Thanks,
Bill

> 
> Thanks,
> Tejas
> 
> >> If vec_extract is defined in the back-end, how does one figure out if the 
> >> BIT_FIELD_REF is a product of the gimplifier's indirect ref folding or the 
> >> vectorizer's bit-field extraction and apply the appropriate correction in 
> >> vec_extract's expansion? Or am I missing something that corrects 
> >> BIT_FIELD_REFs 
> >> between the gimplifier and the RTL expander?
> > 
> > There is no inconsistency here.
> > 
> > Hope this helps!
> > Bill
> > 
> >> Thanks,
> >> Tejas.
> >>
> > 
> > 
> 
> 



Generating minimum libstdc++ symbols for a new platform

2014-01-09 Thread Bill Schmidt
Hi,

It was recently pointed out to me that our new powerpc64le-linux-gnu
target does not yet have a corresponding directory in
libstdc++-v3/config/abi/post/ to hold a baseline_symbols.txt for the platform.
I've been looking around and haven't found any documentation for how the
minimum baseline symbols file should be generated.  Can someone please
enlighten me about the process?

Thanks,
Bill



Re: documentation of powerpc64{,le}-linux-gnu as primary platform

2020-07-09 Thread Bill Schmidt via Gcc

On 7/9/20 12:13 PM, Richard Biener via Gcc wrote:

On July 9, 2020 3:43:19 PM GMT+02:00, David Edelsohn via Gcc wrote:

On Thu, Jul 9, 2020 at 9:07 AM Matthias Klose  wrote:

On 7/9/20 1:58 PM, David Edelsohn via Gcc wrote:

On Thu, Jul 9, 2020 at 7:03 AM Matthias Klose wrote:

https://gcc.gnu.org/gcc-8/criteria.html lists the little endian platform first
as a primary target, however it's not mentioned for GCC 9 and GCC 10.  Just an
omission?

https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg00854.html suggests that
the little endian platform should be mentioned, and maybe the big endian
platform should be dropped?

Jakub suggested to fix that for GCC 9 and GCC 10, and get a consensus for
GCC 11.

Why are you so insistent to drop big endian?  No.  Please leave this alone.

No, I don't leave this alone.  The little endian target is dropped in GCC 9
and GCC 10.  Is this really what you intended to do?

No, it's not dropped.  Some people are being pedantic about the name,
which is why Bill added {,le}.  powerpc64-unknown-linux-gnu means
everything.  If you want to add {,le} back, that's fine.  But there
always is some variant omitted, and that doesn't mean it is ignored.
The more that one over-specifies and enumerates some variants, the
more that it implies the other variants intentionally are ignored.

I would appreciate that we would separate the discussion about
explicit reference to {,le} from the discussion about dropping the big
endian platform.

I think for primary platforms it is important to be as specific as possible 
since certain regressions are supposed to block a release. That's less of an 
issue for secondary platforms but it's still a valid concern there as well for 
build issues.



Sorry, I've been on vacation and am a little late to this discussion.  I 
obviously agree with specifying both, since I did that for GCC 8 (and 
assumed it would be propagated forward).  I had forgotten I did this 
when I subsequently noticed for GCC 9 that it was only 
powerpc64-unknown-linux-gnu.  I brought it up then and, IIRC, was told 
by the maintainers that "LE is implied as well, don't worry about it."  
It looks like thoughts on this have changed, so certainly I would agree 
with putting "{,le}" back at this time.


I agree with David that BE isn't going anywhere anytime soon, so 
anything that implies it should be a secondary platform is wrong. We 
continue to support it and test it.


Matthias, if you want to post a patch for GCC 9 and GCC 10, I'm sure 
that would be accepted (though I do not have the power to pre-approve 
it).  Or I can put it on my list for later in the summer when my life 
settles down.  Your choice.


Bill




Richard.


Thanks, David


Re: documentation of powerpc64{,le}-linux-gnu as primary platform

2020-07-13 Thread Bill Schmidt via Gcc

On 7/13/20 7:08 AM, Florian Weimer wrote:

* Bill Schmidt via Gcc:


Matthias, if you want to post a patch for GCC 9 and GCC 10, I'm sure
that would be accepted (though I do not have the power to pre-approve
it).  Or I can put it on my list for later in the summer when my life
settles down.  Your choice.

I posted a patch:

   <https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549947.html>



Thanks, Florian!

Bill



Thanks,
Florian



Re: 10-12% performance decrease in benchmark going from GCC8 to GCC9

2020-08-10 Thread Bill Schmidt via Gcc



On 8/10/20 3:30 AM, Jonathan Wakely via Gcc wrote:

Hi Matt,

The best thing to do here is file a bug report with the code to reproduce it:
https://gcc.gnu.org/bugzill

Thanks



Also, be sure to follow the instructions at https://gcc.gnu.org/bugs/.

Bill



On Sat, 8 Aug 2020 at 23:01, Soul Studios  wrote:

Hi all,
recently have been working on a new version of the plf::colony container
(plflib.org) and found GCC9 was giving 10-12% worse performance on a
given benchmark than GCC8.

Previous versions of the colony container did not experience this
performance loss going from GCC8 to GCC9.
However Clang 6 and MSVC2019 show no performance loss going from the old
colony version to the new version.

The effect is repeatable across architectures - I've tested on xubuntu,
windows running nuwen mingw, and on Core2 and Haswell CPUs, with and
without -march=native specified.

Compiler flags are: -O2;-march=native;-std=c++17

Code is attached with an absolute minimum use-case - other benchmarks
have not shown such strong performance differences - including both
simpler and more complex tests.
So I cannot reduce further, please do not ask me to do so.

The benchmark in question inserts into a container initially then
iterates over container elements repeatedly, randomly erasing and/or
inserting new elements.


In addition I've attached the assembly output under both GCC8 and GCC9.
In this case I have output from 8.2 and 9.2 respectively, but the same
effects apply to 8.4 and 9.3. The output for 8 is a lot larger than 9,
wondering if there's more unrolling occurring.

Any questions let me know. I will help where I can, but my knowledge of
assembly is limited. If supplying the older version of colony is useful
I'm happy to do so.

Nanotimer is a ~nanosecond-precision sub-timeslice cross-platform timer.
Colony is a bucket-array-like unordered sequence container.
Thanks,
Matt




Installing a generated header file

2020-11-12 Thread Bill Schmidt via Gcc

Hi!  I'm working on a project where it's desirable to generate a
target-specific header file while building GCC, and install it with the rest
of the target-specific headers (i.e., in lib/gcc/<target>/11.0.0/include).
Today it appears that only those headers listed in "extra_headers" in
config.gcc will be placed there, and those are assumed to be found in
gcc/config/<target>.  In my case, the header file will end up in my build
directory instead.

Questions:

* Has anyone tried something like this before?  I didn't find anything.
* If so, can you please point me to an example?
* Otherwise, I'd be interested in advice about providing new infrastructure to 
support
  this.  I'm a relative noob with respect to the configury code, and I'm sure my
  initial instincts will be wrong. :)

Thanks for any help!

Bill



Re: Installing a generated header file

2020-11-12 Thread Bill Schmidt via Gcc

Thanks for the pointer!  I'll have a look at this.

Much obliged,

Bill

On 11/12/20 9:54 AM, Jonathan Wakely wrote:

On Thu, 12 Nov 2020 at 15:39, Bill Schmidt via Gcc  wrote:

Hi!  I'm working on a project where it's desirable to generate a
target-specific header file while building GCC, and install it with the rest
of the target-specific headers (i.e., in lib/gcc/<target>/11.0.0/include).
Today it appears that only those headers listed in "extra_headers" in
config.gcc will be placed there, and those are assumed to be found in
gcc/config/<target>.  In my case, the header file will end up in my build
directory instead.

Questions:

* Has anyone tried something like this before?  I didn't find anything.
* If so, can you please point me to an example?
* Otherwise, I'd be interested in advice about providing new infrastructure to 
support
this.  I'm a relative noob with respect to the configury code, and I'm sure 
my
initial instincts will be wrong. :)

I don't know how relevant it is to your requirement, but libstdc++
creates a target-specific $target/bits/c++config.h header for each
multilib target, but it installs them alongside the rest of the C++
library headers, not in lib/gcc/<target>/<version>/.

It's done with a bunch of shell commands that takes the
autoconf-generated config.h file, combines it with a template file
that's in the source repo (libstdc++-v3/include/bits/c++config) and
then modifies it with sed. See the ${host_builddir}/c++config.h target
in libstdc++-v3/include/Makefile.am for the gory details. The other
make targets below it (for gthr-single.h and gthr-posix.h) are also
target-specific.

Those headers are listed in the ${allcreated} variable which is a
prerequisite of the all-local target, and then in the install target
they get copied into place.


Re: Installing a generated header file

2020-11-12 Thread Bill Schmidt via Gcc

On 11/12/20 10:06 AM, Marc Glisse wrote:

On Thu, 12 Nov 2020, Bill Schmidt via Gcc wrote:

Hi!  I'm working on a project where it's desirable to generate a 
target-specific header file while building GCC, and install it with 
the rest of the target-specific headers (i.e., in 
lib/gcc/<target>/11.0.0/include).  Today it appears that only those 
headers listed in "extra_headers" in config.gcc will be placed there, 
and those are assumed to be found in gcc/config/<target>.  In my 
case, the header file will end up in my build directory instead.


Questions:

* Has anyone tried something like this before?  I didn't find anything.
* If so, can you please point me to an example?
* Otherwise, I'd be interested in advice about providing new 
infrastructure to support
 this.  I'm a relative noob with respect to the configury code, and 
I'm sure my

 initial instincts will be wrong. :)


Does the i386 mm_malloc.h file match your scenario?

Ah, that looks promising indeed, and perhaps very simple!  Marc, thanks 
for the pointer!


Bill


Re: Installing a generated header file

2020-11-12 Thread Bill Schmidt via Gcc



On 11/12/20 10:15 AM, Bill Schmidt via Gcc wrote:

On 11/12/20 10:06 AM, Marc Glisse wrote:


Does the i386 mm_malloc.h file match your scenario?

Ah, that looks promising indeed, and perhaps very simple!  Marc, 
thanks for the pointer!


And indeed, with this example it was a two-line change to do what I 
needed.  Thanks again. :)


Bill



gengtype and automatically generated files

2021-01-04 Thread Bill Schmidt via Gcc
Hi!  I'm attempting to do something that may not have been done before, 
so I'm looking for advice, or a pointer to where, in fact, it has been 
done before. :)


I'm automatically generating a back-end header file that declares some 
structures that include trees, and a bunch of global variables that are 
also trees.  I've marked everything up appropriately, but I also need to 
teach the garbage collector that this file exists.


Most back-end files are automatically scanned by gengtype.  Per the 
documentation, anything that isn't handled automatically needs to be 
added to target_gtfiles in config.gcc.  However, I can't come up with a 
syntax for describing a file in the gcc/ build subdirectory.  Some 
places in config.gcc allow "./filename" as shorthand for "filename" 
being in the current build directory, but that doesn't seem to work for 
adding something to gtyp-input.list.


Any recommendations on what I should do next?  At the moment it looks 
like I might have to hack on gengtype to invent a way to scan a file in 
the build directory, but I have a mild amount of hope that someone has 
solved this before.  Thanks for any help!


Bill




Re: gengtype and automatically generated files

2021-01-04 Thread Bill Schmidt via Gcc
Actually, the "./filename" syntax works fine.  I was missing a 
dependency in my t-rs6000 to make the header file appear available.


Sorry for the noise!

Bill

On 1/4/21 11:40 AM, Bill Schmidt wrote:
Hi! I'm attempting to do something that may not have been done before, 
so I'm looking for advice, or a pointer to where, in fact, it has been 
done before. :)


I'm automatically generating a back-end header file that declares some 
structures that include trees, and a bunch of global variables that 
are also trees.  I've marked everything up appropriately, but I also 
need to teach the garbage collector that this file exists.


Most back-end files are automatically scanned by gengtype.  Per the 
documentation, anything that isn't handled automatically needs to be 
added to target_gtfiles in config.gcc.  However, I can't come up with 
a syntax for describing a file in the gcc/ build subdirectory.  Some 
places in config.gcc allow "./filename" as shorthand for "filename" 
being in the current build directory, but that doesn't seem to work 
for adding something to gtyp-input.list.


Any recommendations on what I should do next?  At the moment it looks 
like I might have to hack on gengtype to invent a way to scan a file 
in the build directory, but I have a mild amount of hope that someone 
has solved this before.  Thanks for any help!


Bill




Re: gengtype and automatically generated files

2021-01-05 Thread Bill Schmidt via Gcc



On 1/4/21 1:36 PM, Jeff Law wrote:


On 1/4/21 10:40 AM, Bill Schmidt via Gcc wrote:

Hi!  I'm attempting to do something that may not have been done
before, so I'm looking for advice, or a pointer to where, in fact, it
has been done before. :)

I'm automatically generating a back-end header file that declares some
structures that include trees, and a bunch of global variables that
are also trees.  I've marked everything up appropriately, but I also
need to teach the garbage collector that this file exists.

Most back-end files are automatically scanned by gengtype.  Per the
documentation, anything that isn't handled automatically needs to be
added to target_gtfiles in config.gcc.  However, I can't come up with
a syntax for describing a file in the gcc/ build subdirectory.  Some
places in config.gcc allow "./filename" as shorthand for "filename"
being in the current build directory, but that doesn't seem to work
for adding something to gtyp-input.list.

Any recommendations on what I should do next?  At the moment it looks
like I might have to hack on gengtype to invent a way to scan a file
in the build directory, but I have a mild amount of hope that someone
has solved this before.  Thanks for any help!

Yea, I don't see any indication this has ever been done before.  I'm a
bit surprised that ./ doesn't work since gengtype runs from
the build directory and has to reference things in the source directory
and ./ would seem to naturally reference the build directory

Jeff

I've gotten this working, with a little hacking on gengtype needed.  
I'll propose that patch next stage 1.


Bill



Re: On US corporate influence over Free Software and the GCC Steering Committee

2021-04-20 Thread Bill Schmidt via Gcc

On 4/20/21 7:42 AM, Richard Kenner via Gcc wrote:

Troubling indeed, but this might just be an overzealous manager.
IBM, like other corporations, has made significant technical
contributions to GCC over the years, for example the scheduler and
the vectorizer, and thus has assigned the copyright of these
contributions to the FSF.

Yes, as long as the employee is doing it as part of their work for IBM,
which was the case in your examples.  What's never been OK for IBM are
their employees doing software development "on their own time" because
they've taken the position that such doesn't exist.


It amazes me how many people who don't work for IBM want to assert IBM's 
policies.


There is certainly ability to work on projects on your own time that 
don't conflict with IBM's business.  You simply have to be open about it 
and make sure your management is aware.


Bill



Re: Enable the vectorizer at -O2 for GCC 12

2021-08-30 Thread Bill Schmidt via Gcc

On 8/30/21 8:04 AM, Florian Weimer wrote:

There has been a discussion, both off-list and on the gcc-help mailing
list (“Why vectorization didn't turn on by -O2”, spread across several
months), about enabling the auto-vectorizer at -O2, similar to what
Clang does.

I think the review concluded that the very cheap cost model should be
used for that.

Are there any remaining blockers?


Hi Florian,

I don't think I'd characterize it as having blockers, but we are 
continuing to investigate small performance issues that arise with 
very-cheap, including some things that regressed in GCC 12.  Kewen Lin 
is leading that effort.  Kewen, do you feel we have any major remaining 
concerns with this plan?


Thanks,
Bill



Thanks,
Florian



Re: libgfortran.so SONAME and powerpc64le-linux ABI changes

2021-10-14 Thread Bill Schmidt via Gcc


On 10/5/21 12:43 PM, Segher Boessenkool wrote:
> Hi Joseph,
>
> On Mon, Oct 04, 2021 at 07:24:31PM +, Joseph Myers wrote:
>> On Mon, 4 Oct 2021, Segher Boessenkool wrote:
>>> Some current Power GCC targets support neither.  Some support only
>>> double-double.  Making IEEE QP float work on those (that is, those that
>>> are < power8) will require some work still: it should use libquadmath or
>>> similar, but that needs to be put into the ABIs, to define how parameter
>>> passing works for those types.  Just treating it like a struct or an
>>> array of ints will work fine, but it needs to be written down.  This is
>>> more than just Fortran.
>> Is the 64-bit BE (ELFv1) ABI maintained somewhere?  (The 32-bit ABI, 
>> covering both hard float and soft float, is 
>>  - no activity lately, but I think 
>> Ryan said he'd given write access to someone still involved with Power.)

Just FYI, that person is me.  I've never tried to use my powers, either for good
or evil, since no proposals for updates have arisen since Ryan left; but my
credentials should still work.

> The last release (version 1.9) was in 2004.  If there is interest in
> making updates to it that coulde be done of course, it is GFDL, there is
> no red tape getting in the way.
>
> Maybe this could be maintained in the same repository even?

Well, I'm not sure it's quite this easy; when developing ELFv2, there was enough
doubt about the provenance/ownership of ELFv1 that we weren't comfortable 
borrowing
language from it.  That may have been an excess of caution, or it may not...

That said, with enough diligence I would hope we would be able to create
modifications to the ELFv1 document, but we might incur some paperwork.

Bill

>
>
> Segher


Re: libgfortran.so SONAME and powerpc64le-linux ABI changes

2021-10-15 Thread Bill Schmidt via Gcc
Thanks, Jakub, for starting this discussion, and to everyone who weighed in.  
The conversation
went in a number of different directions, so I'd like to summarize my 
understanding of points
where I think there was agreement.  I'd also like to separate out short-term 
considerations
for powerpc64le and GCC 12 from other topics like supporting more targets.

===

First, for the short-term.  For powerpc64le only (little-endian, ELFv2 ABI) 
Thomas suggested
that Fortran's best course of action is:
 - Change KIND=16 in GCC 12 from double-double to IEEE QP just for affected 
targets
 - Bump the SONAME just for affected targets
 - Have a preprocessor flag to help #ifdef out irrelevant code (which Jakub 
asserted exists)
 - Deal with binary (unformatted) I/O with a CONVERT option for OPEN, and/or an 
envvar, to
   allow selection between the two formats

There was some discussion of dual-mangling names for Fortran, but this didn't 
seem practical
because of a number of complicating factors.

There is an open question about possibly using KIND=15 or KIND=17 to represent 
double-double
going forward.  It's not clear whether or not this is necessary, but some C 
compatibility
scenarios were cited as possible motivations.

There was some concern about SONAME numbers differing across architectures, but 
consensus
seems to be that this can be handled.

Summary:  I didn't see any serious pushback to Thomas's suggested course of 
action, and the
only major open question is about maintaining a KIND to represent double-double.

===

Longer term, we have the question of supporting more Power targets.  AIX will 
continue to
use only double-double.  It is agreed that it would be useful for 32- and 
64-bit BE Linux
to support IEEE QP as well, on some future timeline.  The first step towards 
this is to
develop and document ABI for IEEE QP on those targets.  The simplest approach 
that everyone
seemed to like is for these ABIs to require AltiVec support in order for IEEE 
QP to be
supported.  This allows parameters and return values to always be passed in 
vector registers,
whether implemented with hardware instructions or a soft-float library.  
libquadmath can
be built for these targets.

[Sidebar: The ELFv1 document needs a new home, as the last version was 
published by the
now-defunct POWER.org.  But we can deal with that.]

Beyond ABI and compiler support, glibc would also need to support IEEE QP for 
these other
targets.  Currently we only have support for powerpc64le.

===

Is this a fair summary of the results of the discussion?

Thanks again!
Bill



Re: libgfortran.so SONAME and powerpc64le-linux ABI changes (work in progress patches)

2021-11-01 Thread Bill Schmidt via Gcc
Would starting from Advance Toolchain 15 with the most recent glibc make things 
easier for Thomas to test?

Thanks,
Bill

On 10/29/21 4:06 PM, Michael Meissner via Gcc wrote:
> On Fri, Oct 29, 2021 at 09:07:38PM +0200, Thomas Koenig wrote:
>> Hi Michael,
>>
>> I tried this out on the one POWER machine where I can get something
>> installed :-) It runs Ubuntu 20.04, but does not appear to have the
>> right glibc version; it has
>>
>> $ lsb_release -a
>> No LSB modules are available.
>> Distributor ID: Ubuntu
>> Description:Ubuntu 20.04.1 LTS
>> Release:20.04
>> Codename:   focal
>> $ ldd --version
>> ldd (Ubuntu GLIBC 2.31-0ubuntu9.1) 2.31
>>
>> Configure was
>>
>> ./trunk/configure --prefix=$HOME --enable-languages=c,c++,fortran
>> --with-advance-toolchain=at15.0
>> --with-native-system-header-dir=/opt/at15.0/include
>> --with-long-double-format=ieee
>>
>> and the error message
>>
>> msgfmt -o fr.mo ../../../../trunk/libstdc++-v3/po/fr.po
>> msgfmt: /lib/powerpc64le-linux-gnu/libm.so.6: version `GLIBC_2.32' not found
>> (required by 
>> /home/ig25/trunk-bin/powerpc64le-unknown-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6)
>> msgfmt: /lib/powerpc64le-linux-gnu/libc.so.6: version `GLIBC_2.33' not found
>> (required by 
>> /home/ig25/trunk-bin/powerpc64le-unknown-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6)
>> msgfmt: /lib/powerpc64le-linux-gnu/libc.so.6: version `GLIBC_2.34' not found
>> (required by 
>> /home/ig25/trunk-bin/powerpc64le-unknown-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6)
>> msgfmt: /lib/powerpc64le-linux-gnu/libc.so.6: version `GLIBC_2.32' not found
>> (required by 
>> /home/ig25/trunk-bin/powerpc64le-unknown-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6)
>> msgfmt: /lib/powerpc64le-linux-gnu/libc.so.6: version `GLIBC_2.34' not found
>> (required by /home/ig25/trunk-bin/./gcc/libgcc_s.so.1)
>>
>> and so on.
>>
>> Since gcc135 is also too old, that exhausts my possibilities at testing.
>>
>> Any hints on how best to proceed?
>>
>> Best regards
> As I've delved into it, it looks glibc 2.34 is really only needed for 
> switching
> long double over to IEEE 128-bit, since it has all of the F128 functions that
> would be needed.  That is because Fortran uses the 'q' names which are in
> libquadmath (that should be built).
>
> I built the original version with:
>
> --prefix=/home/meissner/fsf-install-ppc64le/fortran-orig \
> --enable-languages=c,c++,fortran \
> --disable-plugin \
> --enable-checking \
> --enable-stage1-checking \
> --enable-gnu-indirect-function \
> --disable-libgomp \
> --enable-decimal-float \
> --enable-secureplt \
> --enable-threads=posix \
> --enable-__cxa_atexit \
> --with-long-double-128 \
> --with-long-double-format=ibm \
> --with-cpu=power9 \
> --with-as=/opt/at12.0/bin/as \
> --with-ld=/opt/at12.0/bin/ld \
> --with-gnu-as=/opt/at12.0/bin/as \
> --with-gnu-ld=/opt/at12.0/bin/ld \
> --with-gmp=/home/meissner/tools-compiler/ppc64le \
> --with-mpfr=/home/meissner/tools-compiler/ppc64le \
> --with-mpc=/home/meissner/tools-compiler/ppc64le \
> --without-ppl \
> --without-cloog \
> --without-isl
>
> I needed to build my own version of mpfr, mpc, and gmp.  I built them without
> shared libraries, because I get messages like you get.
>
> I have a new version of the patch that makes new target hooks to allow the
> backend to specify KIND numbers for types.  I choose kind=16 to always be IEEE
> 128-bit, and kind=15 to be long double if long double is IBM (since I
> discovered yesterday, Fortran needs to be able to deal with long double).  I'm
> in the middle of the build an on internal IBM system, and I will start the
> build on gcc135 shortly.
>


Re: [power-ieee128] What should the math functions be annotated with?

2021-12-03 Thread Bill Schmidt via Gcc
Hi!

On 12/3/21 5:56 AM, Thomas Koenig wrote:
>
> Hi Jakub,
>
>> Note, we want to test both building gcc on ppc64le with older glibc
>> and newer glibc (and that libgfortran will have the same ABI between both
>> and one can move gcc including libgfortran and libquadmath from the older
>> glibc setup to newer and make -mabi=ieeelongdouble work in Fortran too).
>
> Using an older glibc is no problem - we can use gcc135 on the compile
> farm for that.
>
> As far as the other options you outlined, I think I'll defer to people
> who know more about setting up libraries than I do. I have root access,
> but chances are I would just mess up the virtual machine :-)

Easiest is probably to install the advance toolchain.  Mike said he'll work on
that later this morning.

Thanks!
Bill

>
> Regards
>
> Thomas