Re: duplicate arm test results?

2020-10-05 Thread Christophe Lyon via Gcc
On Thu, 24 Sep 2020 at 14:12, Christophe Lyon
 wrote:
>
> On Wed, 23 Sep 2020 at 17:50, Christophe Lyon
>  wrote:
> >
> > On Wed, 23 Sep 2020 at 17:33, Martin Sebor  wrote:
> > >
> > > On 9/23/20 2:54 AM, Christophe Lyon wrote:
> > > > On Wed, 23 Sep 2020 at 01:47, Martin Sebor  wrote:
> > > >>
> > > >> On 9/22/20 9:15 AM, Christophe Lyon wrote:
> > > >>> On Tue, 22 Sep 2020 at 17:02, Martin Sebor  wrote:
> > > 
> > >  Hi Christophe,
> > > 
> > >  While checking recent test results I noticed many posts with results
> > >  for various flavors of arm that at high level seem like duplicates
> > >  of one another.
> > > 
> > >  For example, the batch below all have the same title, but not all
> > >  of the contents are the same.  The details (such as test failures)
> > >  on some of the pages are different.
> > > 
> > >  Can you help explain the differences?  Is there a way to avoid
> > >  the duplication?
> > > 
> > > >>>
> > > >>> Sure, I am aware that many results look the same...
> > > >>>
> > > >>>
> > > >>> If you look at the top of the report (~line 5), you'll see:
> > > >>> Running target myarm-sim
> > > >>> Running target 
> > > >>> myarm-sim/-mthumb/-mcpu=cortex-m3/-mfloat-abi=soft/-march=armv7-m
> > > >>> Running target 
> > > >>> myarm-sim/-mthumb/-mcpu=cortex-m0/-mfloat-abi=soft/-march=armv6s-m
> > > >>> Running target 
> > > >>> myarm-sim/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
> > > >>> Running target 
> > > >>> myarm-sim/-mthumb/-mcpu=cortex-m7/-mfloat-abi=hard/-march=armv7e-m+fp.dp
> > > >>> Running target 
> > > >>> myarm-sim/-mthumb/-mcpu=cortex-m4/-mfloat-abi=hard/-march=armv7e-m+fp
> > > >>> Running target 
> > > >>> myarm-sim/-mthumb/-mcpu=cortex-m33/-mfloat-abi=hard/-march=armv8-m.main+fp+dsp
> > > >>> Running target 
> > > >>> myarm-sim/-mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd
> > > >>> Running target 
> > > >>> myarm-sim/-mthumb/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
> > > >>>
> > > >>> For all of these, the first line of the report is:
> > > >>> LAST_UPDATED: Tue Sep 22 09:39:18 UTC 2020 (revision
> > > >>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c)
> > > >>> TARGET=arm-none-eabi CPU=default FPU=default MODE=default
> > > >>>
> > > >>> I have other combinations where I override the configure flags, eg:
> > > >>> LAST_UPDATED: Tue Sep 22 11:25:12 UTC 2020 (revision
> > > >>> r9-8928-gb3043e490896ea37cd0273e6e149c3eeb3298720)
> > > >>> TARGET=arm-none-linux-gnueabihf CPU=cortex-a9 FPU=neon-fp16 MODE=thumb
> > > >>>
> > > >>> I tried to see if I could fit something in the subject line, but that
> > > >>> didn't seem convenient (would be too long, and I fear modifying the
> > > >>> awk script)
> > > >>
> > > >> Without some indication of a difference in the title there's no way
> > > >> to know what result to look at, and checking all of them isn't really
> > > >> practical.  The duplication (and the sheer number of results) also
> > > >> make it more difficult to find results for targets other than arm-*.
> > > >> There are about 13,000 results for September and over 10,000 of those
> > > >> for arm-* alone.  It's good to have data but when there's this much
> > > >> of it, and when the only form of presentation is as a running list,
> > > >> it's too cumbersome to work with.
> > > >>
> > > >
> > > > To help me track & report regressions, I build higher level reports 
> > > > like:
> > > > https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/0latest/report-build-info.html
> > > > where it's more obvious what configurations are tested.
> > >
> > > That looks awesome!  The regression indicator looks especially
> > > helpful.  I really wish we had an overview like this for all
> > > results.  I've been thinking about writing a script to scrape
> > > gcc-testresults and format an HTML table kind of like this for
> > > years.  With that, the number of posts sent to the list wouldn't
> > > be a problem (at least not for those using the page).  But it
> > > would require settling on a standard format for the basic
> > > parameters of each run.
> > >
> >
> > It's probably easier to detect regressions and format reports from the
> > .sum files rather than extracting them from the mailing-list.
> > But your approach has the advantage that you can detect regressions
> > from reports sent by other people, not only by you.
> >
> >
> > > >
> > > > Each line of such reports can send a message to gcc-testresults.
> > > >
> > > > I can control when such emails are sent, independently for each line:
> > > > - never
> > > > - for daily bump
> > > > - for each validation
> > > >
> > > > So, I can easily reduce the amount of emails (by disabling them for
> > > > some configurations),
> > > > but that won't make the subject more informative.
> > > > I included the short revision (rXX-) in the title to make it 
> > > > clearer.
> > > >
> > > > The number of configurations has grown o

Multilib Hierarchy

2020-10-05 Thread CHIGOT, CLEMENT via Gcc
Hi everyone, 

Recently, with David, we have introduced FAT library support on AIX to enable 
64-bit as the default target (called gcc64 here). Currently, 32-bit is the default 
(gcc32) and 64-bit is just a multilib linked to the -maix64 option. These FAT 
libraries are archives containing both 32-bit and 64-bit shared objects. To create 
them, we retrieve the shared object of the missing architecture from its 
multilib directory (ppc32 for 32-bit or ppc64 for 64-bit) and add it to the 
default library/archive. Here is, for example, the code for libatomic: 
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libatomic/config/t-aix;h=08607727574f3c376a43e9460c81ae6fbee0fc5f;hb=HEAD
The aim of these FAT libraries is to let programs created by gcc32 run on 
gcc64's runtime and vice versa. For example, the "ppc32" directory is not 
present in gcc32, so a 32-bit program built with gcc64 (i.e. with -maix32 
and the use of the ppc32 directory) must be able to fall back on the toplevel 
FAT libraries. 

By default, gcc almost does this already. The directories searched for 
libraries by a gcc64 32-bit program will be 
"/usr/lib/gcc/$target/$version/ppc32:/usr/lib/gcc/$target/$version/." or 
something similar. Thus, it will fall back on the FAT libraries under "/." if 
"/ppc32" isn't there. 

However, as AIX needs special handling for threaded programs, we have two sets 
of FAT libraries, one in the toplevel directory "/." and one in the thread 
multilib directory "/pthread". The problem is that a gcc64 program compiled 
with both -pthread and -maix32 will have a library path similar to 
"/pthread/ppc32:/.". Thus, the fallback is made on the toplevel FAT library and 
not the pthread one.  

We have tried several solutions to avoid this problem. First (and it's 
still the case on master), we set "MULTILIB_MATCHES = .=ppc32" for gcc64. 
That way, we force 32-bit programs to use the fallback directly instead of 
using their ppc32 directory. However, it creates too many problems, as the 
toplevel directory is 64-bit: the includes picked up are sometimes wrong, the 
tests aren't using the right files, etc. 
I've also tried "MULTILIB_REUSE = pthread/ppc32=pthread", as the "Target Fragment" 
doc page says "And for some targets it is better to reuse an existing multilib 
than to fall back to default multilib when there is no corresponding multilib. 
This can be done by adding reuse rules to MULTILIB_REUSE.". But it doesn't 
change the search path as expected. 
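For concreteness, the two approaches tried can be sketched as a target
makefile fragment. This is a hedged illustration built only from the values
quoted in this message, not the actual t-aix contents:

```make
# Sketch of the two approaches tried (variable values taken from this
# message; this is not the real t-aix fragment).

# Approach 1 (currently on master): map -maix32 code onto the default
# (toplevel) directory instead of its own ppc32 directory:
MULTILIB_MATCHES = .=ppc32

# Approach 2 (attempted, per the "Target Fragment" docs, but it did not
# change the search path as expected):
# MULTILIB_REUSE = pthread/ppc32=pthread
```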

Thus, I'm wondering if there is a way to create a multilib hierarchy with 
several layers. Currently, there are only two layers: "use the multilib directory, 
then use the default directory". What we would like is at least three levels: 
"use the multilib directory, then use another multilib directory, then use the 
default one". 
If it's not possible, I have several ideas about how it could be introduced and 
which parts of the GCC driver would need to be modified. But I want to be sure I'm 
not misusing MULTILIB_REUSE first. 

Sincerely, 
Clément Chigot




Re: static inline giving multiple definition errors with "unity" builds

2020-10-05 Thread Nathan Sidwell

On 10/4/20 1:10 PM, Paul Smith wrote:

On Sun, 2020-10-04 at 03:36 -0400, Paul Smith wrote:

I have a templated class C that required its type T to have operator
bool() defined in order to work properly.


Never mind, I think there was some local error where things were not
being recompiled when they should be.  I don't know why that might be
but a full clean/rebuild fixed it.  I've never had this problem
before... so odd.

Sorry for the noise!


heh, it was an amusing story :)

  'the bug must be over there.    oops, no it wasn't'


nathan
--
Nathan Sidwell


[Patch] Overflow-trapping integer arithmetic routines' code: bloated and slooooow

2020-10-05 Thread Stefan Kanthak
The implementation of the functions __absv?i2(), __addv?i3() etc. for
trapping integer overflow provided in libgcc2.c is rather bad.
The same goes for __cmp?i2() and __ucmp?i2().

GCC creates awful to horrible code for them (at least for AMD64 and
i386 processors): see 
for some examples.
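For context, the contract of these routines is simple: do the arithmetic and
trap on signed overflow. A minimal C sketch of the __addvsi3 semantics, using
__builtin_add_overflow for illustration (this is neither the libgcc2.c source
nor the attached patch):

```c
#include <stdlib.h>

/* Sketch of __addvsi3 semantics: return a + b, trapping (here via
   abort) when the signed addition overflows.  Illustrative only. */
int addv_si3 (int a, int b)
{
  int r;
  if (__builtin_add_overflow (a, b, &r))
    abort ();
  return r;
}
```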

The attached diff/patch provides better implementations.

Stefan

libgcc2.diff
Description: Binary data


Re: [Patch] Overflow-trapping integer arithmetic routines' code: bloated and slooooow

2020-10-05 Thread Jonathan Wakely via Gcc
On Mon, 5 Oct 2020 at 17:12, Stefan Kanthak  wrote:
>
> The implementation of the functions __absv?i2(), __addv?i3() etc. for
> trapping integer overflow provided in libgcc2.c is rather bad.
> Same for __cmp?i2() and __ucmp?i2()
>
> GCC creates awful to horrible code for them (at least for AMD64 and
> i386 processors): see 
> for some examples.
>
> The attached diff/patch provides better implementations.


Patches should go to the gcc-patches list, see
https://gcc.gnu.org/contribute.html#patches


Re: Git rejecting branch merge

2020-10-05 Thread Joel Brobecker
> > > I wonder I can get the branch moved, so I can do the benchmarking :)
> > > Any suggestions how to do that?
> 
> I just installed a small patch, hot-fix style which I am hoping will
> fix your problem. Can you try it? It passes the testsuite, so the change
> should be safe.

And now, the fix that was actually pushed has also been deployed.

> Let me know how it goes. I will finish the work over the weekend
> so as to replace the local diff by an actual commit (after review
> from a coworker of mine).

-- 
Joel


Loop question

2020-10-05 Thread Jakub Jelinek via Gcc
Hi!

Compiling the following testcase with -O2 -fopenmp:
int a[1][128];

__attribute__((noipa)) void
foo (void)
{
  #pragma omp for simd schedule (simd: dynamic, 32) collapse(2)
  for (int i = 0; i < 1; i++)
for (int j = 0; j < 128; j++)
  a[i][j] += 3;
}

int
main ()
{
  for (int i = 0; i < 1; i++)
for (int j = 0; j < 128; j++)
  {
asm volatile ("" : : "r" (&a[0][0]) : "memory");
a[i][j] = i + j;
  }
  foo ();
  for (int i = 0; i < 1; i++)
for (int j = 0; j < 128; j++)
  if (a[i][j] != i + j + 3)
__builtin_abort ();
  return 0;
}
doesn't seem to result in the vectorization I was hoping to see.
As was changed recently, I'm now only trying to vectorize the
innermost loop of the collapse, with the outer loops around it being normal
scalar loops like those written in the source. With only omp simd
it works fine, but for the combined constructs the current thread gets
assigned some range of logical iterations, so I get a pair of
starting values (in this case for i and j).

At the end of ompexp I have:
...
  D.2106 = (unsigned int) D.2105;
  D.2107 = MIN_EXPR ;
  D.2103 = D.2107 + .iter.4;
  goto ; [INV]
;;succ:   5

;;   basic block 4, loop depth 2
;;pred:   5
  i = i.0;
  j = j.1;
  _1 = a[i][j];
  _2 = _1 + 3;
  a[i][j] = _2;
  .iter.4 = .iter.4 + 1;
  j.1 = j.1 + 1;
;;succ:   5

;;   basic block 5, loop depth 2
;;pred:   4
;;3
;;7
  if (.iter.4 < D.2103)
goto ; [87.50%]
  else
goto ; [12.50%]
;;succ:   4
;;6

;;   basic block 6, loop depth 2
;;pred:   5
  i.0 = i.0 + 1;
  if (i.0 < 1)
goto ; [87.50%]
  else
goto ; [12.50%]
;;succ:   8
;;7

;;   basic block 7, loop depth 2
;;pred:   6
  j.1 = 0;
  D.2108 = D.2099 - .iter.4;
  D.2109 = MIN_EXPR ;
  D.2103 = D.2109 + .iter.4;
  goto ; [INV]

I was really hoping bbs 4 and 5 would be one loop (the one I set safelen
and force_vectorize etc. for) and that basic blocks 6 and 7 would,
together with that inner loop, form another loop, but apparently loop discovery
thinks it is just one loop.
Any ideas what I'm doing wrong, or is there a way to make it two loops
(in a way that would also survive all the cfg cleanups until vectorization)?

Essentially, in C I'm trying to have:
int a[1][128];
void get_me_start_end (int *, int *);
void
foo (void)
{
  int start, end, curend, i, j;
  get_me_start_end (&start, &end);
  i = start / 128;
  j = start % 128;
  curend = start + (end - start > 128 - j ? 128 - j : end - start);
  goto doit;
  for (i = 0; i < 1; i++)
{
  j = 0;
  curend = start + (end - start > 128 ? 128 : end - start);
  doit:;
  /* I'd use start < curend && j < 128 as condition here, but
 the vectorizer doesn't like that either.  So I went to
 using a single IV.  */
  for (; start < curend; start++, j++)
a[i][j] += 3;
}
}

This isn't vectorized with -O3 either for the same reason.

Jakub



Re: Navigational corrections

2020-10-05 Thread Alejandro Colomar via Gcc

Hi Michael,

On 2020-10-03 13:39, Michael Kerrisk (man-pages) wrote:

Hi Alex,

[...]


off_t would be great.

In case you are looking for some other candidates, some others
that I would be interested to see go into the page would be

fd_set
clock_t
clockid_t
and probably dev_t


Great!

off_t is almost done.  I think I have too many references in "See also".

I'll send you the patch, and trim as you want :)




Thanks,

Michael



Cheers,

Alex


Re: Loop question

2020-10-05 Thread Richard Biener
On Mon, 5 Oct 2020, Jakub Jelinek wrote:

> Hi!
> 
> Compiling the following testcase with -O2 -fopenmp:
> int a[1][128];
> 
> __attribute__((noipa)) void
> foo (void)
> {
>   #pragma omp for simd schedule (simd: dynamic, 32) collapse(2)
>   for (int i = 0; i < 1; i++)
> for (int j = 0; j < 128; j++)
>   a[i][j] += 3;
> }
> 
> int
> main ()
> {
>   for (int i = 0; i < 1; i++)
> for (int j = 0; j < 128; j++)
>   {
>   asm volatile ("" : : "r" (&a[0][0]) : "memory");
>   a[i][j] = i + j;
>   }
>   foo ();
>   for (int i = 0; i < 1; i++)
> for (int j = 0; j < 128; j++)
>   if (a[i][j] != i + j + 3)
>   __builtin_abort ();
>   return 0;
> }
> doesn't seem result in the vectorization I was hoping to see.
> As has been changed recently, I'm only trying to vectorize now the
> innermost loop of the collapse with outer loops around it being normal
> scalar loops like those written in the source and with only omp simd
> it works fine, but for the combined constructs the current thread gets
> assigned some range of logical iterations, therefore I get a pair of
> in this case i and j starting values.
> 
> At the end of ompexp I have:
> ...
>   D.2106 = (unsigned int) D.2105;
>   D.2107 = MIN_EXPR ;
>   D.2103 = D.2107 + .iter.4;
>   goto ; [INV]
> ;;succ:   5
> 
> ;;   basic block 4, loop depth 2
> ;;pred:   5
>   i = i.0;
>   j = j.1;
>   _1 = a[i][j];
>   _2 = _1 + 3;
>   a[i][j] = _2;
>   .iter.4 = .iter.4 + 1;
>   j.1 = j.1 + 1;
> ;;succ:   5
> 
> ;;   basic block 5, loop depth 2
> ;;pred:   4
> ;;3
> ;;7
>   if (.iter.4 < D.2103)
> goto ; [87.50%]
>   else
> goto ; [12.50%]
> ;;succ:   4
> ;;6
> 
> ;;   basic block 6, loop depth 2
> ;;pred:   5
>   i.0 = i.0 + 1;
>   if (i.0 < 1)
> goto ; [87.50%]
>   else
> goto ; [12.50%]
> ;;succ:   8
> ;;7
> 
> ;;   basic block 7, loop depth 2
> ;;pred:   6
>   j.1 = 0;
>   D.2108 = D.2099 - .iter.4;
>   D.2109 = MIN_EXPR ;
>   D.2103 = D.2109 + .iter.4;
>   goto ; [INV]
> 
> I was really hoping bbs 4 and 5 would be one loop (the one I set safelen
> and force_vectorize etc. for) and that basic blocks 6 and 7 would be
> together with that inner loop another loop, but apparently loop discovery
> thinks it is just one loop.
> Any ideas what I'm doing wrong or is there any way how to make it two loops
> (that would also survive all the cfg cleanups until vectorization)?

The early CFG looks like we have a common header with two latches
so it boils down to how we disambiguate those in the end (we seem
to unify the latches via a forwarder).  IIRC OMP lowering builds
loops itself, could it not do the appropriate disambiguation itself?

Richard.

> Essentially, in C I'm trying to have:
> int a[1][128];
> void get_me_start_end (int *, int *);
> void
> foo (void)
> {
>   int start, end, curend, i, j;
>   get_me_start_end (&start, &end);
>   i = start / 128;
>   j = start % 128;
>   curend = start + (end - start > 128 - j ? 128 - j : end - start);
>   goto doit;
>   for (i = 0; i < 1; i++)
> {
>   j = 0;
>   curend = start + (end - start > 128 ? 128 : end - start);
>   doit:;
>   /* I'd use start < curend && j < 128 as condition here, but
>the vectorizer doesn't like that either.  So I went to
>using a single IV.  */
>   for (; start < curend; start++, j++)
> a[i][j] += 3;
> }
> }
> 
> This isn't vectorized with -O3 either for the same reason.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend