RE: Warning: unpredictable: identical transfer and status registers --`stxr w4,x5,[x4] using aarch64 poky gcc 8.3
> -Original Message-
> From: Michael Matz [mailto:m...@suse.de]
> Sent: 13 February 2019 22:45
> To: Peng Fan
> Cc: gcc@gcc.gnu.org; james.greenha...@arm.com; n...@arm.com;
> jailhouse-...@googlegroups.com; will.dea...@arm.com; Catalin Marinas
> Subject: RE: Warning: unpredictable: identical transfer and status registers
> --`stxr w4,x5,[x4] using aarch64 poky gcc 8.3
>
> Hi,
>
> On Wed, 13 Feb 2019, Peng Fan wrote:
>
> > So the fix should be the following, right?
>
> Yup.

Thanks for your help.

Thanks,
Peng.

>
>
> Ciao,
> Michael.
riscv64 dep. computation
Hello,

While working on a private port of riscv, I noticed that upstream shows the same behaviour.

For the code:

#define TYPE unsigned short

struct foo_t
{
  TYPE a;
  TYPE b;
  TYPE c;
};

void
func (struct foo_t *x, struct foo_t *y)
{
  y->a = x->a;
  y->b = x->b;
  y->c = x->c;
}

If I compile this with -O2, sched1 groups all loads and all stores together. That's perfect. However, if I change TYPE to unsigned char and recompile, the stores and loads are interleaved.

Further investigation shows that for unsigned char there are extra dependencies that block the scheduler from grouping stores and loads. For example, there's a dependency between:

(insn 8 3 9 2 (set (mem:QI (reg/v/f:DI 76 [ yD.1533 ]) [0 y_6(D)->aD.1529+0 S1 A8])
        (subreg/s/v:QI (reg:DI 72 [ _1 ]) 0)) "test.c":13:8 142 {*movqi_internal}
     (expr_list:REG_DEAD (reg:DI 72 [ _1 ])
        (nil)))

and

(insn 11 10 12 2 (set (reg:DI 74 [ _3 ])
        (zero_extend:DI (mem:QI (plus:DI (reg/v/f:DI 75 [ xD.1532 ])
                    (const_int 2 [0x2])) [0 x_5(D)->cD.1531+0 S1 A8]))) "test.c":15:11 89 {zero_extendqidi2}
     (expr_list:REG_DEAD (reg/v/f:DI 75 [ xD.1532 ])
        (nil)))

which didn't exist in the `unsigned short' case.

I can't find where this dependency is coming from but also can't justify it so it seems like a bug to me. Is there a reason for this to happen that I might not be aware of?

While I am at it, debugging compute_block_dependencies in sched-rgn.c is a massive pain. This calls sched_analyze which receives a struct deps_desc that tracks the dependencies in the insn list. Is there a way to pretty print this structure in gdb?

Kind regards,

--
Paulo Matos
Re: riscv64 dep. computation
On 2/14/19 3:13 AM, Paulo Matos wrote:
> If I compile this with -O2, sched1 groups all loads and all stores
> together. That's perfect. However, if I change TYPE to unsigned char and
> recompile, the stores and loads are interleaved.
>
> Further investigation shows that for unsigned char there are extra
> dependencies that block the scheduler from grouping stores and loads.

The ISO C standard says that anything can be cast to char *, and char * can be cast to anything. Hence, a char * pointer aliases everything.

If you look at the alias set info in the MEMs, you can see that the char * references are in alias set 0, which means that they alias everything. The short * references are in alias set 2, which means they only alias other stuff in alias set 2. The difference here is that short * does not alias the structure pointers, but char * does. I haven't tried debugging your example, but this is presumably where the difference comes from.

Because x and y are pointer parameters, the compiler must assume that they might alias. And because char * aliases everything, the char references alias them too. If you change x and y to global variables, then they no longer alias each other, and the compiler will schedule all of the loads first, even for char.

Jim
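
To illustrate the kind of overlap a compiler has to allow for when two pointer parameters may alias, here is a minimal C sketch. It is not the code from the thread: `copy3` and `overlap_demo` are hypothetical names, and plain `unsigned char *` parameters stand in for the struct pointers so that the overlapping call is plainly well-defined. It does not reproduce the exact dependency GCC computed; it only shows that, with overlapping arguments, an earlier store can feed a later load, so hoisting all loads above the stores would change the result.

```c
#include <assert.h>

/* Hypothetical stand-in for func() from the thread, using raw
   unsigned char pointers instead of struct foo_t pointers.  */
void
copy3 (unsigned char *x, unsigned char *y)
{
  y[0] = x[0];
  y[1] = x[1];
  y[2] = x[2];   /* may read a byte written by the first store */
}

/* Call copy3 with y pointing two bytes into x's buffer, so that
   y[0] and x[2] are the same byte.  Returns the last byte written.  */
int
overlap_demo (void)
{
  unsigned char buf[5] = { 1, 2, 3, 0, 0 };

  copy3 (buf, buf + 2);

  /* In program order: buf[2] = 1, buf[3] = 2, buf[4] = buf[2] = 1.
     If all three loads were done first, buf[4] would be 3 instead.  */
  assert (buf[2] == 1);
  assert (buf[3] == 2);
  assert (buf[4] == 1);
  return buf[4];
}
```

Calling `overlap_demo ()` from a small `main` is enough to exercise it. The point of the design is only that the third load depends on the first store whenever the two regions overlap, which is the situation a may-alias answer forces the scheduler to respect.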
GCC missing -flto optimizations? SPEC lbm benchmark
I have a question about SPEC CPU 2017 and what GCC can and cannot do with -flto. As part of some SPEC analysis I am doing, I found that with -Ofast, ICC and GCC were not that far apart (especially on SPEC int rate; SPEC fp rate showed a slightly larger difference).

But when I added -ipo to the ICC command and -flto to the GCC command, the difference got larger. In particular, 519.lbm_r was more than twice as fast with ICC and -ipo, but -flto did not help GCC at all.

There are other tests that also show this type of improvement with -ipo, like 538.imagick_r, 544.nab_r, 525.x264_r, 531.deepsjeng_r, and 548.exchange2_r, but none are as dramatic as 519.lbm_r. Does anyone have any idea what ICC is doing that GCC is missing? Is GCC just not aggressive enough with its inlining?

Steve Ellcey
sell...@marvell.com
gcc-7-20190214 is now available
Snapshot gcc-7-20190214 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20190214/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch revision 268907

You'll find:

 gcc-7-20190214.tar.xz                Complete GCC

  SHA256=fd4aa146e4354b847a3c073f6b7ebce83462d236a0107a66b930c566d85a5762
  SHA1=406ecabeee26380bf1fe22e7130ae3bf99d47157

Diffs from 7-20190207 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: riscv64 dep. computation
On 14/02/2019 19:56, Jim Wilson wrote:
> On 2/14/19 3:13 AM, Paulo Matos wrote:
>> If I compile this with -O2, sched1 groups all loads and all stores
>> together. That's perfect. However, if I change TYPE to unsigned char and
>> recompile, the stores and loads are interleaved.
>>
>> Further investigation shows that for unsigned char there are extra
>> dependencies that block the scheduler from grouping stores and loads.
>
> The ISO C standard says that anything can be casted to char *, and char
> * can be casted to anything. Hence, a char * pointer aliases everything.
>
> If you look at the alias set info in the MEMs, you can see that the char
> * references are in alias set 0, which means that they alias everything.
> The short * references are in alias set 2 which means they only alias
> other stuff in alias set 2. The difference here is that short * does
> not alias the structure pointers, but char * does. I haven't tried
> debugging your example, but this is presumably where the difference
> comes from.
>

OK, that seems to make sense. Indeed if I use restrict on the argument pointers, the compiler will sort itself out and group the loads and stores.

> Because x and y are pointer parameters, the compiler must assume that
> they might alias. And because char * aliases everything, the char
> references alias them too. If you change x and y to global variables,
> then they no longer alias each other, and the compiler will schedule all
> of the loads first, even for char.
>

Are global variables not supposed to alias each other? If I indeed do that, gcc still won't group loads and stores:
https://cx.rv8.io/g/rFjGLa

--
Paulo Matos
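
For reference, a restrict-qualified variant of the function might look like the sketch below. The message does not show the exact code that was compiled, so this is an assumption: `func_restrict` and `restrict_demo` are hypothetical names, and the struct is the one from the original report with TYPE set to unsigned char. The `restrict` qualifier (C99) is a promise by the caller that the objects accessed through x and y do not overlap, which is what allows a compiler to drop the may-alias dependencies; the sketch makes no claim about the schedule any particular GCC version produces.

```c
#include <assert.h>

struct foo_t
{
  unsigned char a;
  unsigned char b;
  unsigned char c;
};

/* Hypothetical restrict-qualified version of func() from the thread.
   The caller promises that *x and *y do not overlap.  */
void
func_restrict (struct foo_t *restrict x, struct foo_t *restrict y)
{
  y->a = x->a;
  y->b = x->b;
  y->c = x->c;
}

/* Copy between two distinct objects, which honours the restrict
   promise.  Returns the sum of the copied fields.  */
int
restrict_demo (void)
{
  struct foo_t src = { 1, 2, 3 };
  struct foo_t dst = { 0, 0, 0 };

  func_restrict (&src, &dst);

  assert (dst.a == 1 && dst.b == 2 && dst.c == 3);
  return dst.a + dst.b + dst.c;
}
```

Passing overlapping objects to `func_restrict` would break the promise and is undefined behaviour, so this variant is only appropriate when the caller can guarantee distinct source and destination.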