Re: duplicate arm test results?

2020-10-05 Thread Christophe Lyon via Gcc
On Thu, 24 Sep 2020 at 14:12, Christophe Lyon
 wrote:
>
> On Wed, 23 Sep 2020 at 17:50, Christophe Lyon
>  wrote:
> >
> > On Wed, 23 Sep 2020 at 17:33, Martin Sebor  wrote:
> > >
> > > On 9/23/20 2:54 AM, Christophe Lyon wrote:
> > > > On Wed, 23 Sep 2020 at 01:47, Martin Sebor  wrote:
> > > >>
> > > >> On 9/22/20 9:15 AM, Christophe Lyon wrote:
> > > >>> On Tue, 22 Sep 2020 at 17:02, Martin Sebor  wrote:
> > > 
> > >  Hi Christophe,
> > > 
> > >  While checking recent test results I noticed many posts with results
> > >  for various flavors of arm that at high level seem like duplicates
> > >  of one another.
> > > 
> > >  For example, the batch below all have the same title, but not all
> > >  of the contents are the same.  The details (such as test failures)
> > >  on some of the pages are different.
> > > 
> > >  Can you help explain the differences?  Is there a way to avoid
> > >  the duplication?
> > > 
> > > >>>
> > > >>> Sure, I am aware that many results look the same...
> > > >>>
> > > >>>
> > > >>> If you look at the top of the report (~line 5), you'll see:
> > > >>> Running target myarm-sim
> > > >>> Running target 
> > > >>> myarm-sim/-mthumb/-mcpu=cortex-m3/-mfloat-abi=soft/-march=armv7-m
> > > >>> Running target 
> > > >>> myarm-sim/-mthumb/-mcpu=cortex-m0/-mfloat-abi=soft/-march=armv6s-m
> > > >>> Running target 
> > > >>> myarm-sim/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
> > > >>> Running target 
> > > >>> myarm-sim/-mthumb/-mcpu=cortex-m7/-mfloat-abi=hard/-march=armv7e-m+fp.dp
> > > >>> Running target 
> > > >>> myarm-sim/-mthumb/-mcpu=cortex-m4/-mfloat-abi=hard/-march=armv7e-m+fp
> > > >>> Running target 
> > > >>> myarm-sim/-mthumb/-mcpu=cortex-m33/-mfloat-abi=hard/-march=armv8-m.main+fp+dsp
> > > >>> Running target 
> > > >>> myarm-sim/-mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd
> > > >>> Running target 
> > > >>> myarm-sim/-mthumb/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
> > > >>>
> > > >>> For all of these, the first line of the report is:
> > > >>> LAST_UPDATED: Tue Sep 22 09:39:18 UTC 2020 (revision
> > > >>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c)
> > > >>> TARGET=arm-none-eabi CPU=default FPU=default MODE=default
> > > >>>
> > > >>> I have other combinations where I override the configure flags, eg:
> > > >>> LAST_UPDATED: Tue Sep 22 11:25:12 UTC 2020 (revision
> > > >>> r9-8928-gb3043e490896ea37cd0273e6e149c3eeb3298720)
> > > >>> TARGET=arm-none-linux-gnueabihf CPU=cortex-a9 FPU=neon-fp16 MODE=thumb
> > > >>>
> > > >>> I tried to see if I could fit something in the subject line, but that
> > > >>> didn't seem convenient (would be too long, and I fear modifying the
> > > >>> awk script)
> > > >>
> > > >> Without some indication of a difference in the title there's no way
> > > >> to know what result to look at, and checking all of them isn't really
> > > >> practical.  The duplication (and the sheer number of results) also
> > > >> make it more difficult to find results for targets other than arm-*.
> > > >> There are about 13,000 results for September and over 10,000 of those
> > > >> for arm-* alone.  It's good to have data but when there's this much
> > > >> of it, and when the only form of presentation is as a running list,
> > > >> it's too cumbersome to work with.
> > > >>
> > > >
> > > > To help me track & report regressions, I build higher level reports 
> > > > like:
> > > > https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/0latest/report-build-info.html
> > > > where it's more obvious what configurations are tested.
> > >
> > > That looks awesome!  The regression indicator looks especially
> > > helpful.  I really wish we had an overview like this for all
> > > results.  I've been thinking about writing a script to scrape
> > > gcc-testresults and format an HTML table kind of like this for
> > > years.  With that, the number of posts sent to the list wouldn't
> > > be a problem (at least not for those using the page).  But it
> > > would require settling on a standard format for the basic
> > > parameters of each run.
> > >
> >
> > It's probably easier to detect regressions and format reports from the
> > .sum files rather than extracting them from the mailing-list.
> > But your approach has the advantage that you can detect regressions
> > from reports sent by other people, not only by you.
> >
> >
> > > >
> > > > Each line of such reports can send a message to gcc-testresults.
> > > >
> > > > I can control when such emails are sent, independently for each line:
> > > > - never
> > > > - for daily bump
> > > > - for each validation
> > > >
> > > > So, I can easily reduce the amount of emails (by disabling them for
> > > > some configurations),
> > > > but that won't make the subject more informative.
> > > > I included the short revision (rXX-) in the title to make it 
> > > > clearer.
> > > >
> > > > The number of configurations has grown o

Multilib Hierarchy

2020-10-05 Thread CHIGOT, CLEMENT via Gcc
Hi everyone, 

Recently, with David, we have introduced FAT library support on AIX to enable 
64-bit as the default target (called gcc64 here). Currently, 32-bit is the default 
(gcc32) and 64-bit is just a multilib linked to the -maix64 option. These FAT 
libraries are archives containing both 32-bit and 64-bit shared objects. To create 
them, we retrieve the shared object of the missing architecture from its 
multilib directory (ppc32 for 32-bit or ppc64 for 64-bit) and add it to the 
default library/archive. Here is, for example, the code for libatomic: 
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libatomic/config/t-aix;h=08607727574f3c376a43e9460c81ae6fbee0fc5f;hb=HEAD
The aim of these FAT libraries is to let programs created by gcc32 run on 
gcc64's runtime and vice versa. For example, the "ppc32" directory is not 
present in gcc32, so a 32-bit program built with gcc64 (i.e. with -maix32 
and the use of the ppc32 directory) must be able to fall back on the toplevel 
FAT libraries. 

By default, gcc almost does this already. The directories searched for 
libraries by a gcc64 32-bit program will be 
"/usr/lib/gcc/$target/$version/ppc32:/usr/lib/gcc/$target/$version/." or 
something similar. Thus, it will fall back on the FAT libraries under "/." if 
"/ppc32" isn't there. 

However, as AIX needs special handling for threaded programs, we have two sets 
of FAT libraries, one in the toplevel directory "/." and one in the thread 
multilib directory "/pthread". The problem is that a gcc64 program compiled 
with both -pthread and -maix32 will have a library path similar to 
"/pthread/ppc32:/.". Thus, the fallback is made on the toplevel FAT library and 
not the pthread one.  

We have tried several solutions to avoid this problem. First (and it's 
still the case on master), we set "MULTILIB_MATCHES = .=ppc32" for gcc64. 
That way, we force 32-bit programs to use the fallback directly instead of 
using their ppc32 directory. However, it creates too many problems, as the 
toplevel directory is 64-bit: the includes picked up are sometimes wrong, the 
tests aren't using the right files, etc. 
I've also tried "MULTILIB_REUSE = pthread/ppc32=pthread", as the "Target Fragment" 
doc page says "And for some targets it is better to reuse an existing multilib 
than to fall back to default multilib when there is no corresponding multilib. 
This can be done by adding reuse rules to MULTILIB_REUSE.". But it doesn't 
change the search path as expected. 
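For concreteness, the two approaches tried can be sketched as a target
makefile fragment. This is a hedged illustration built only from the values
quoted in this message, not the actual t-aix contents:

```make
# Sketch of the two approaches tried (variable values taken from this
# message; this is not the real t-aix fragment).

# Approach 1 (currently on master): map -maix32 code onto the default
# (toplevel) directory instead of its own ppc32 directory:
MULTILIB_MATCHES = .=ppc32

# Approach 2 (attempted, per the "Target Fragment" docs, but it did not
# change the search path as expected):
# MULTILIB_REUSE = pthread/ppc32=pthread
```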

Thus, I'm wondering if there is a way to create a multilib hierarchy with 
several layers. Currently, there are only two layers: "use the multilib directory, 
then use the default directory". What we would like is at least three levels: 
"use the multilib directory, then use another multilib directory, then use the 
default one". 
If it's not possible, I have several ideas about how it could be introduced and 
which parts of the GCC driver would need to be modified. But I want to be sure I'm 
not misusing MULTILIB_REUSE first. 

Sincerely, 
Clément Chigot




Re: static inline giving multiple definition errors with "unity" builds

2020-10-05 Thread Nathan Sidwell

On 10/4/20 1:10 PM, Paul Smith wrote:

On Sun, 2020-10-04 at 03:36 -0400, Paul Smith wrote:

I have a templated class C that required its type T to have operator
bool() defined in order to work properly.


Never mind, I think there was some local error where things were not
being recompiled when they should be.  I don't know why that might be
but a full clean/rebuild fixed it.  I've never had this problem
before... so odd.

Sorry for the noise!


heh, it was an amusing story :)

  'the bug must be over there.    oops, no it wasn't'


nathan
--
Nathan Sidwell


[Patch] Overflow-trapping integer arithmetic routines' code: bloated and slooooow

2020-10-05 Thread Stefan Kanthak
The implementation of the functions __absv?i2(), __addv?i3() etc. for
trapping integer overflow provided in libgcc2.c is rather bad.
The same goes for __cmp?i2() and __ucmp?i2().

GCC creates awful to horrible code for them (at least for AMD64 and
i386 processors): see 
for some examples.
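For context, the contract of these routines is simple: do the arithmetic and
trap on signed overflow. A minimal C sketch of the __addvsi3 semantics, using
__builtin_add_overflow for illustration (this is neither the libgcc2.c source
nor the attached patch):

```c
#include <stdlib.h>

/* Sketch of __addvsi3 semantics: return a + b, trapping (here via
   abort) when the signed addition overflows.  Illustrative only. */
int addv_si3 (int a, int b)
{
  int r;
  if (__builtin_add_overflow (a, b, &r))
    abort ();
  return r;
}
```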

The attached diff/patch provides better implementations.

Stefan

libgcc2.diff
Description: Binary data


Re: [Patch] Overflow-trapping integer arithmetic routines' code: bloated and slooooow

2020-10-05 Thread Jonathan Wakely via Gcc
On Mon, 5 Oct 2020 at 17:12, Stefan Kanthak  wrote:
>
> The implementation of the functions __absv?i2(), __addv?i3() etc. for
> trapping integer overflow provided in libgcc2.c is rather bad.
> Same for __cmp?i2() and __ucmp?i2()
>
> GCC creates awful to horrible code for them (at least for AMD64 and
> i386 processors): see 
> for some examples.
>
> The attached diff/patch provides better implementations.


Patches should go to the gcc-patches list, see
https://gcc.gnu.org/contribute.html#patches


Re: Git rejecting branch merge

2020-10-05 Thread Joel Brobecker
> > > I wonder I can get the branch moved, so I can do the benchmarking :)
> > > Any suggestions how to do that?
> 
> I just installed a small patch, hot-fix style which I am hoping will
> fix your problem. Can you try it? It passes the testsuite, so the change
> should be safe.

And now, the fix that was actually pushed has also been deployed.

> Let me know how it goes. I will finish the work over the weekend
> so as to replace the local diff by an actual commit (after review
> from a coworker of mine).

-- 
Joel


Loop question

2020-10-05 Thread Jakub Jelinek via Gcc
Hi!

Compiling the following testcase with -O2 -fopenmp:
int a[1][128];

__attribute__((noipa)) void
foo (void)
{
  #pragma omp for simd schedule (simd: dynamic, 32) collapse(2)
  for (int i = 0; i < 1; i++)
for (int j = 0; j < 128; j++)
  a[i][j] += 3;
}

int
main ()
{
  for (int i = 0; i < 1; i++)
for (int j = 0; j < 128; j++)
  {
asm volatile ("" : : "r" (&a[0][0]) : "memory");
a[i][j] = i + j;
  }
  foo ();
  for (int i = 0; i < 1; i++)
for (int j = 0; j < 128; j++)
  if (a[i][j] != i + j + 3)
__builtin_abort ();
  return 0;
}
doesn't seem to result in the vectorization I was hoping to see.
As was changed recently, I'm now only trying to vectorize the
innermost loop of the collapse, with the outer loops around it being normal
scalar loops like those written in the source. With only omp simd
it works fine, but for the combined constructs the current thread gets
assigned some range of logical iterations, so I get a pair of
starting values (in this case for i and j).

At the end of ompexp I have:
...
  D.2106 = (unsigned int) D.2105;
  D.2107 = MIN_EXPR ;
  D.2103 = D.2107 + .iter.4;
  goto ; [INV]
;;succ:   5

;;   basic block 4, loop depth 2
;;pred:   5
  i = i.0;
  j = j.1;
  _1 = a[i][j];
  _2 = _1 + 3;
  a[i][j] = _2;
  .iter.4 = .iter.4 + 1;
  j.1 = j.1 + 1;
;;succ:   5

;;   basic block 5, loop depth 2
;;pred:   4
;;3
;;7
  if (.iter.4 < D.2103)
goto ; [87.50%]
  else
goto ; [12.50%]
;;succ:   4
;;6

;;   basic block 6, loop depth 2
;;pred:   5
  i.0 = i.0 + 1;
  if (i.0 < 1)
goto ; [87.50%]
  else
goto ; [12.50%]
;;succ:   8
;;7

;;   basic block 7, loop depth 2
;;pred:   6
  j.1 = 0;
  D.2108 = D.2099 - .iter.4;
  D.2109 = MIN_EXPR ;
  D.2103 = D.2109 + .iter.4;
  goto ; [INV]

I was really hoping bbs 4 and 5 would be one loop (the one I set safelen
and force_vectorize etc. for) and that basic blocks 6 and 7 would,
together with that inner loop, form another loop, but apparently loop discovery
thinks it is just one loop.
Any ideas what I'm doing wrong, or is there a way to make it two loops
(in a way that would also survive all the cfg cleanups until vectorization)?

Essentially, in C I'm trying to have:
int a[1][128];
void get_me_start_end (int *, int *);
void
foo (void)
{
  int start, end, curend, i, j;
  get_me_start_end (&start, &end);
  i = start / 128;
  j = start % 128;
  curend = start + (end - start > 128 - j ? 128 - j : end - start);
  goto doit;
  for (i = 0; i < 1; i++)
{
  j = 0;
  curend = start + (end - start > 128 ? 128 : end - start);
  doit:;
  /* I'd use start < curend && j < 128 as condition here, but
 the vectorizer doesn't like that either.  So I went to
 using a single IV.  */
  for (; start < curend; start++, j++)
a[i][j] += 3;
}
}

This isn't vectorized with -O3 either for the same reason.

Jakub



Re: Navigational corrections

2020-10-05 Thread Alejandro Colomar via Gcc

Hi Michael,

On 2020-10-03 13:39, Michael Kerrisk (man-pages) wrote:

Hi Alex,

[...]


off_t would be great.

In case you are looking for some other candidates, some others
that I would be interested to see go into the page would be

fd_set
clock_t
clockid_t
and probably dev_t


Great!

off_t is almost done.  I think I have too many references in "See also".

I'll send you the patch, and trim as you want :)




Thanks,

Michael



Cheers,

Alex


Re: Loop question

2020-10-05 Thread Richard Biener
On Mon, 5 Oct 2020, Jakub Jelinek wrote:

> Hi!
> 
> Compiling the following testcase with -O2 -fopenmp:
> int a[1][128];
> 
> __attribute__((noipa)) void
> foo (void)
> {
>   #pragma omp for simd schedule (simd: dynamic, 32) collapse(2)
>   for (int i = 0; i < 1; i++)
> for (int j = 0; j < 128; j++)
>   a[i][j] += 3;
> }
> 
> int
> main ()
> {
>   for (int i = 0; i < 1; i++)
> for (int j = 0; j < 128; j++)
>   {
>   asm volatile ("" : : "r" (&a[0][0]) : "memory");
>   a[i][j] = i + j;
>   }
>   foo ();
>   for (int i = 0; i < 1; i++)
> for (int j = 0; j < 128; j++)
>   if (a[i][j] != i + j + 3)
>   __builtin_abort ();
>   return 0;
> }
> doesn't seem result in the vectorization I was hoping to see.
> As has been changed recently, I'm only trying to vectorize now the
> innermost loop of the collapse with outer loops around it being normal
> scalar loops like those written in the source and with only omp simd
> it works fine, but for the combined constructs the current thread gets
> assigned some range of logical iterations, therefore I get a pair of
> in this case i and j starting values.
> 
> At the end of ompexp I have:
> ...
>   D.2106 = (unsigned int) D.2105;
>   D.2107 = MIN_EXPR ;
>   D.2103 = D.2107 + .iter.4;
>   goto ; [INV]
> ;;succ:   5
> 
> ;;   basic block 4, loop depth 2
> ;;pred:   5
>   i = i.0;
>   j = j.1;
>   _1 = a[i][j];
>   _2 = _1 + 3;
>   a[i][j] = _2;
>   .iter.4 = .iter.4 + 1;
>   j.1 = j.1 + 1;
> ;;succ:   5
> 
> ;;   basic block 5, loop depth 2
> ;;pred:   4
> ;;3
> ;;7
>   if (.iter.4 < D.2103)
> goto ; [87.50%]
>   else
> goto ; [12.50%]
> ;;succ:   4
> ;;6
> 
> ;;   basic block 6, loop depth 2
> ;;pred:   5
>   i.0 = i.0 + 1;
>   if (i.0 < 1)
> goto ; [87.50%]
>   else
> goto ; [12.50%]
> ;;succ:   8
> ;;7
> 
> ;;   basic block 7, loop depth 2
> ;;pred:   6
>   j.1 = 0;
>   D.2108 = D.2099 - .iter.4;
>   D.2109 = MIN_EXPR ;
>   D.2103 = D.2109 + .iter.4;
>   goto ; [INV]
> 
> I was really hoping bbs 4 and 5 would be one loop (the one I set safelen
> and force_vectorize etc. for) and that basic blocks 6 and 7 would be
> together with that inner loop another loop, but apparently loop discovery
> thinks it is just one loop.
> Any ideas what I'm doing wrong or is there any way how to make it two loops
> (that would also survive all the cfg cleanups until vectorization)?

The early CFG looks like we have a common header with two latches
so it boils down to how we disambiguate those in the end (we seem
to unify the latches via a forwarder).  IIRC OMP lowering builds
loops itself, could it not do the appropriate disambiguation itself?

Richard.

> Essentially, in C I'm trying to have:
> int a[1][128];
> void get_me_start_end (int *, int *);
> void
> foo (void)
> {
>   int start, end, curend, i, j;
>   get_me_start_end (&start, &end);
>   i = start / 128;
>   j = start % 128;
>   curend = start + (end - start > 128 - j ? 128 - j : end - start);
>   goto doit;
>   for (i = 0; i < 1; i++)
> {
>   j = 0;
>   curend = start + (end - start > 128 ? 128 : end - start);
>   doit:;
>   /* I'd use start < curend && j < 128 as condition here, but
>the vectorizer doesn't like that either.  So I went to
>using a single IV.  */
>   for (; start < curend; start++, j++)
> a[i][j] += 3;
> }
> }
> 
> This isn't vectorized with -O3 either for the same reason.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend