RE: Warning: unpredictable: identical transfer and status registers --`stxr w4,x5,[x4] using aarch64 poky gcc 8.3

2019-02-14 Thread Peng Fan

> -----Original Message-----
> From: Michael Matz [mailto:m...@suse.de]
> Sent: 13 February 2019 22:45
> To: Peng Fan 
> Cc: gcc@gcc.gnu.org; james.greenha...@arm.com; n...@arm.com;
> jailhouse-...@googlegroups.com; will.dea...@arm.com; Catalin Marinas
> 
> Subject: RE: Warning: unpredictable: identical transfer and status registers
> --`stxr w4,x5,[x4] using aarch64 poky gcc 8.3
> 
> Hi,
> 
> On Wed, 13 Feb 2019, Peng Fan wrote:
> 
> > So the fix should be the following, right?
> 
> Yup.

Thanks for your help.

Thanks,
Peng.

> 
> 
> Ciao,
> Michael.


riscv64 dep. computation

2019-02-14 Thread Paulo Matos
Hello,

While working on a private port of riscv, I noticed that upstream shows
the same behaviour.

For the code:
#define TYPE unsigned short

struct foo_t
{
  TYPE a;
  TYPE b;
  TYPE c;
};

void
func (struct foo_t *x, struct foo_t *y)
{
  y->a = x->a;
  y->b = x->b;
  y->c = x->c;
}

If I compile this with -O2, sched1 groups all loads and all stores
together. That's perfect. However, if I change TYPE to unsigned char and
recompile, the stores and loads are interleaved.

Further investigation shows that for unsigned char there are extra
dependencies that block the scheduler from grouping stores and loads.

For example, there's a dependency between:
(insn 8 3 9 2 (set (mem:QI (reg/v/f:DI 76 [ yD.1533 ]) [0 y_6(D)->aD.1529+0 S1 A8])
        (subreg/s/v:QI (reg:DI 72 [ _1 ]) 0)) "test.c":13:8 142 {*movqi_internal}
     (expr_list:REG_DEAD (reg:DI 72 [ _1 ])
        (nil)))

and
(insn 11 10 12 2 (set (reg:DI 74 [ _3 ])
        (zero_extend:DI (mem:QI (plus:DI (reg/v/f:DI 75 [ xD.1532 ])
                    (const_int 2 [0x2])) [0 x_5(D)->cD.1531+0 S1 A8]))) "test.c":15:11 89 {zero_extendqidi2}
     (expr_list:REG_DEAD (reg/v/f:DI 75 [ xD.1532 ])
        (nil)))

which didn't exist in the `unsigned short' case.

I can't find where this dependency comes from, and I can't justify it
either, so it looks like a bug to me. Is there a reason for this to
happen that I might not be aware of?

While I am at it, debugging compute_block_dependencies in sched-rgn.c is
a massive pain. This calls sched_analyze which receives a struct
deps_desc that tracks the dependencies in the insn list. Is there a way
to pretty print this structure in gdb?

Kind regards,

-- 
Paulo Matos


Re: riscv64 dep. computation

2019-02-14 Thread Jim Wilson
On 2/14/19 3:13 AM, Paulo Matos wrote:

> If I compile this with -O2, sched1 groups all loads and all stores
> together. That's perfect. However, if I change TYPE to unsigned char and
> recompile, the stores and loads are interleaved.
>
> Further investigation shows that for unsigned char there are extra
> dependencies that block the scheduler from grouping stores and loads.


The ISO C standard says that anything can be cast to char *, and char 
* can be cast to anything.  Hence, a char * pointer aliases everything.


If you look at the alias set info in the MEMs, you can see that the char 
* references are in alias set 0, which means that they alias everything. 
 The short * references are in alias set 2 which means they only alias 
other stuff in alias set 2.  The difference here is that short * does 
not alias the structure pointers, but char * does.  I haven't tried 
debugging your example, but this is presumably where the difference 
comes from.
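
As a rough illustration of the alias-set difference described above, the
sketch below contrasts a store through unsigned char * with a store
through short *.  The function names are made up for this example and do
not come from the thread.  The comments describe what the language rules
let the compiler assume, not what any particular GCC version emits.

```c
/* Hypothetical helpers for illustration only. */

/* A store through unsigned char * may legally modify any object,
   including *p, so *p has to be reloaded after the store. */
int changed_after_char_store(int *p, unsigned char *c)
{
    int before = *p;
    *c = 0;
    return before != *p;
}

/* A store through short * is assumed not to modify an int object,
   so the compiler may reuse the earlier load of *p. */
int changed_after_short_store(int *p, short *s)
{
    int before = *p;
    *s = 0;
    return before != *p;
}
```

Passing a short * that points into *p to the second function would be
undefined behaviour, which is why the compiler is free to skip the
reload there.  The first function may be called with c pointing into *p.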


Because x and y are pointer parameters, the compiler must assume that 
they might alias.  And because char * aliases everything, the char 
references alias them too.  If you change x and y to global variables, 
then they no longer alias each other, and the compiler will schedule all 
of the loads first, even for char.
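
The phrase "global variables" can be read in two ways, and the sketch
below shows both.  This is an assumption about what might be meant, not
something stated in the thread, and the function and variable names are
invented for the example.  Whether sched1 groups the loads in either
case has not been checked here.

```c
#define TYPE unsigned char

struct foo_t
{
  TYPE a;
  TYPE b;
  TYPE c;
};

/* Reading 1: two distinct global objects.  They occupy different
   storage, so a store to gy cannot change gx. */
struct foo_t gx, gy;

void
func_global_objects (void)
{
  gy.a = gx.a;
  gy.b = gx.b;
  gy.c = gx.c;
}

/* Reading 2: two global pointers.  Nothing stops them from pointing
   at the same object, so the stores may still alias the loads. */
struct foo_t *px, *py;

void
func_global_pointers (void)
{
  py->a = px->a;
  py->b = px->b;
  py->c = px->c;
}
```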


Jim


GCC missing -flto optimizations? SPEC lbm benchmark

2019-02-14 Thread Steve Ellcey
I have a question about SPEC CPU 2017 and what GCC can and cannot do
with -flto.  As part of some SPEC analysis I am doing I found that with
-Ofast, ICC and GCC were not that far apart (especially on SPEC int rate;
the gap on SPEC fp rate was slightly larger).

But when I added -ipo to the ICC command and -flto to the GCC command,
the difference got larger.  In particular, 519.lbm_r was more than
twice as fast with ICC and -ipo, but -flto did not help GCC at all.

There are other tests that also show this type of improvement with -ipo
like 538.imagick_r, 544.nab_r, 525.x264_r, 531.deepsjeng_r, and
548.exchange2_r, but none are as dramatic as 519.lbm_r.  Anyone have
any idea on what ICC is doing that GCC is missing?  Is GCC just not
aggressive enough with its inlining?

Steve Ellcey
sell...@marvell.com


gcc-7-20190214 is now available

2019-02-14 Thread gccadmin
Snapshot gcc-7-20190214 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20190214/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch 
revision 268907

You'll find:

 gcc-7-20190214.tar.xz                Complete GCC

  SHA256=fd4aa146e4354b847a3c073f6b7ebce83462d236a0107a66b930c566d85a5762
  SHA1=406ecabeee26380bf1fe22e7130ae3bf99d47157

Diffs from 7-20190207 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: riscv64 dep. computation

2019-02-14 Thread Paulo Matos


On 14/02/2019 19:56, Jim Wilson wrote:
> On 2/14/19 3:13 AM, Paulo Matos wrote:
>> If I compile this with -O2, sched1 groups all loads and all stores
>> together. That's perfect. However, if I change TYPE to unsigned char and
>> recompile, the stores and loads are interleaved.
>>
>> Further investigation shows that for unsigned char there are extra
>> dependencies that block the scheduler from grouping stores and loads.
> 
> The ISO C standard says that anything can be cast to char *, and char
> * can be cast to anything.  Hence, a char * pointer aliases everything.
> 
> If you look at the alias set info in the MEMs, you can see that the char
> * references are in alias set 0, which means that they alias everything.
>  The short * references are in alias set 2 which means they only alias
> other stuff in alias set 2.  The difference here is that short * does
> not alias the structure pointers, but char * does.  I haven't tried
> debugging your example, but this is presumably where the difference
> comes from.
>

OK, that seems to make sense. Indeed if I use restrict on the argument
pointers, the compiler will sort itself out and group the loads and stores.
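
For reference, a minimal sketch of the restrict variant, assuming the
qualifier is applied to both parameters of the original func with TYPE
set to unsigned char.  The name func_restrict is made up here, and the
exact source that was compiled is not shown in the thread.

```c
#define TYPE unsigned char

struct foo_t
{
  TYPE a;
  TYPE b;
  TYPE c;
};

/* restrict promises that, within this function, the object reached
   through x is not also accessed through y.  The stores to *y then
   cannot affect the later loads from *x. */
void
func_restrict (struct foo_t *restrict x, struct foo_t *restrict y)
{
  y->a = x->a;
  y->b = x->b;
  y->c = x->c;
}
```

The promise is the caller's responsibility: calling this with x and y
pointing at the same object would be undefined behaviour.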

> Because x and y are pointer parameters, the compiler must assume that
> they might alias.  And because char * aliases everything, the char
> references alias them too.  If you change x and y to global variables,
> then they no longer alias each other, and the compiler will schedule all
> of the loads first, even for char.
> 

Are global variables not supposed to alias each other?
If I indeed do that, gcc still won't group loads and stores:
https://cx.rv8.io/g/rFjGLa

-- 
Paulo Matos