Question about creating global variable during IPA PASS.

2023-12-13 Thread Hanke Zhang via Gcc
Hi, I'm trying to create a global variable in my own PASS, which is
located in LATE_IPA_PASSES. (I'm using GCC 10.3.0.)

After creating it, I set its attributes as follows.

// 1. create the var
tree new_name = get_identifier (xx);
tree new_type = build_pointer_type (xx);
tree new_var = build_decl (UNKNOWN_LOCATION, VAR_DECL, new_name, new_type);
add_attributes (new_var);

static void
add_attributes (tree var)
{
  DECL_ARTIFICIAL (var) = 1;
  DECL_EXTERNAL (var) = 0;
  TREE_STATIC (var) = 1;
  TREE_PUBLIC (var) = 1;
  TREE_USED (var) = 1;
  DECL_CONTEXT (var) = NULL_TREE;
  TREE_THIS_VOLATILE (var) = 0;
  TREE_ADDRESSABLE (var) = 0;
  TREE_READONLY (var) = 0;
  if (is_global_var (var))
    set_decl_tls_model (var, TLS_MODEL_NONE);
}

But when I try to compile some example files with -flto, an error occurs:

/usr/bin/ld: xxx.ltrans0.ltrans.o: in function `xxx':
xxx.c: undefined reference to `glob_var'
xxx.c: undefined reference to `glob_var'
xxx.c: undefined reference to `glob_var'

Here `glob_var' is the global variable created in my PASS.

I would like to ask, am I using some attributes incorrectly?

Thanks
Hanke Zhang


Switching x86_64-linux-gnu to GNU2 TLS descriptors by default

2023-12-13 Thread Florian Weimer via Gcc
I feel like I have asked this before.  Currently, GCC uses calls to
__tls_get_addr to obtain the address of global-dynamic TLS variables.
On other architectures with support for GNU2 TLS descriptors, those are
used by default.

Should we flip the default to GNU2 descriptors?  Support has been
available in glibc for a long, long time.  Is there any other reason for
not doing this?  On the glibc side, the behavior regarding lazy
initialization and symbol binding does not change whether the old or new
interface is used.

Thanks,
Florian
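
For readers unfamiliar with the two access schemes, here is a minimal sketch of the kind of code affected. The names `tls_counter` and `bump_counter` are illustrative assumptions, not taken from the thread. An externally visible thread-local variable accessed from position-independent code typically uses the global-dynamic model; on x86-64, `-mtls-dialect=gnu` makes GCC emit a call to `__tls_get_addr` for it, while `-mtls-dialect=gnu2` selects TLS descriptors.

```c
/* Hypothetical example; the names are illustrative only.  */

/* Externally visible thread-local variable: each thread has its own copy.  */
__thread int tls_counter;

/* Increment the calling thread's copy and return the new value.  When this
   file is compiled with -fPIC, the access to tls_counter is the kind of
   global-dynamic TLS access discussed above.  */
int
bump_counter (void)
{
  return ++tls_counter;
}
```

Compiling this with `gcc -O2 -fPIC -S` once with `-mtls-dialect=gnu` and once with `-mtls-dialect=gnu2`, then comparing the assembly, should show the two access sequences side by side.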



Re: Switching x86_64-linux-gnu to GNU2 TLS descriptors by default

2023-12-13 Thread H.J. Lu via Gcc
On Wed, Dec 13, 2023 at 6:19 AM Florian Weimer via Gcc  wrote:
>
> I feel like I have asked this before.  Currently, GCC uses calls to
> __tls_get_addr to obtain the address of global-dynamic TLS variables.
> On other architectures with support for GNU2 TLS descriptors, those are
> used by default.
>
> Should we flip the default to GNU2 descriptors?  Support has been
> available in glibc for a long, long time.  Is there any other reason for
> not doing this?  On the glibc side, the behavior regarding lazy
> initialization and symbol binding does not change whether the old or new
> interface is used.
>
> Thanks,
> Florian
>

It sounds good to me.

Thanks.

-- 
H.J.


Build breakage

2023-12-13 Thread Jerry D via Gcc

I am getting this failure to build from clean trunk.

In file included from ../../../../trunk/libgomp/config/linux/allocator.c:31:
../../../../trunk/libgomp/config/linux/allocator.c: In function ‘linux_memspace_alloc’:
../../../../trunk/libgomp/config/linux/allocator.c:70:26: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
   70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
  |  ^
   71 |   " memory (ulimit too low?)\n", size);
  |  
  |  |
  |  size_t {aka unsigned int}
../../../../trunk/libgomp/libgomp.h:186:29: note: in definition of macro ‘gomp_debug’
  186 |   (gomp_debug) ((KIND), __VA_ARGS__); \
  | ^~~
../../../../trunk/libgomp/config/linux/allocator.c:70:52: note: format string is defined here
   70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
  |  ~~^
  ||
  |long int
  |  %d



Re: Build breakage

2023-12-13 Thread Thomas Schwinge
Hi!

On 2023-12-13T11:15:54-0800, Jerry D via Gcc  wrote:
> I am getting this failure to build from clean trunk.

This is due to commit r14-6499-g348874f0baac0f22c98ab11abbfa65fd172f6bdd
"libgomp: basic pinned memory on Linux", which supposedly was only tested
with '--disable-multilib' or so.  As Andrew's now on vacations --
conveniently ;-P -- I'll soon push a fix.

(To restore your build, you may locally disable the 'gomp_debug' call, or
cast 'size' into '(long) size', for example.)
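
The diagnostic appears only on the 32-bit multilib because `size_t` is `unsigned int` there, whereas on x86_64 it is `unsigned long` and `%ld` is accepted. A minimal standalone sketch of two portable ways to print a `size_t` follows. The helpers `format_pin_failure` and `format_pin_failure_z` are hypothetical, are not libgomp code, and do not necessarily match what the eventual fix does.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of libgomp): write a message about SIZE
   bytes into BUF.  Option 1: cast to a type whose conversion specifier is
   the same on every multilib.  */
int
format_pin_failure (char *buf, size_t buflen, size_t size)
{
  return snprintf (buf, buflen, "failed to pin %lu bytes",
                   (unsigned long) size);
}

/* Option 2 (C99 and later): %zu is the conversion specifier for size_t,
   so no cast is needed.  */
int
format_pin_failure_z (char *buf, size_t buflen, size_t size)
{
  return snprintf (buf, buflen, "failed to pin %zu bytes", size);
}
```

Both variants compile cleanly with `-Wformat` regardless of whether `size_t` is `unsigned int` or `unsigned long`. The cast variant would truncate on a target where `size_t` is wider than `unsigned long`.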


Grüße
 Thomas


> In file included from ../../../../trunk/libgomp/config/linux/allocator.c:31:
> ../../../../trunk/libgomp/config/linux/allocator.c: In function
> ‘linux_memspace_alloc’:
> ../../../../trunk/libgomp/config/linux/allocator.c:70:26: error: format
> ‘%ld’ expects argument of type ‘long int’, but argument 3 has type
> ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
> 70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
>|  ^
> 71 |   " memory (ulimit too low?)\n", size);
>|  
>|  |
>|  size_t
> {aka unsigned int}
> ../../../../trunk/libgomp/libgomp.h:186:29: note: in definition of macro
> ‘gomp_debug’
>186 |   (gomp_debug) ((KIND), __VA_ARGS__); \
>| ^~~
> ../../../../trunk/libgomp/config/linux/allocator.c:70:52: note: format
> string is defined here
> 70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
>|  ~~^
>||
>|long int
>|  %d
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: Switching x86_64-linux-gnu to GNU2 TLS descriptors by default

2023-12-13 Thread Sam James via Gcc


Florian Weimer via Gcc  writes:

> I feel like I have asked this before.  Currently, GCC uses calls to
> __tls_get_addr to obtain the address of global-dynamic TLS variables.
> On other architectures with support for GNU2 TLS descriptors, those are
> used by default.
>
> Should we flip the default to GNU2 descriptors?  Support has been
> available in glibc for a long, long time.  Is there any other reason for
> not doing this?  On the glibc side, the behavior regarding lazy
> initialization and symbol binding does not change whether the old or new
> interface is used.
>

I was planning on bringing this up but I was worried there was some
reason we hadn't done it given it's been so long. Thank you!




Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld' format string mismatch (was: Build breakage)

2023-12-13 Thread Thomas Schwinge
Hi!

On 2023-12-13T20:36:40+0100, I wrote:
> On 2023-12-13T11:15:54-0800, Jerry D via Gcc  wrote:
>> I am getting this failure to build from clean trunk.
>
> This is due to commit r14-6499-g348874f0baac0f22c98ab11abbfa65fd172f6bdd
> "libgomp: basic pinned memory on Linux", which supposedly was only tested
> with '--disable-multilib' or so.  As Andrew's now on vacations --
> conveniently ;-P -- I'll soon push a fix.

Pushed to master branch commit 5445ff4a51fcee4d281f79b5f54b349290d0327d
"Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld' format string mismatch",
see attached.


Grüße
 Thomas


>> In file included from ../../../../trunk/libgomp/config/linux/allocator.c:31:
>> ../../../../trunk/libgomp/config/linux/allocator.c: In function
>> ‘linux_memspace_alloc’:
>> ../../../../trunk/libgomp/config/linux/allocator.c:70:26: error: format
>> ‘%ld’ expects argument of type ‘long int’, but argument 3 has type
>> ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
>> 70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
>>|  ^
>> 71 |   " memory (ulimit too low?)\n", size);
>>|  
>>|  |
>>|  size_t
>> {aka unsigned int}
>> ../../../../trunk/libgomp/libgomp.h:186:29: note: in definition of macro
>> ‘gomp_debug’
>>186 |   (gomp_debug) ((KIND), __VA_ARGS__); \
>>| ^~~
>> ../../../../trunk/libgomp/config/linux/allocator.c:70:52: note: format
>> string is defined here
>> 70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
>>|  ~~^
>>||
>>|long int
>>|  %d


From 5445ff4a51fcee4d281f79b5f54b349290d0327d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 13 Dec 2023 17:48:11 +0100
Subject: [PATCH] Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld'
 format string mismatch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fix-up for commit 348874f0baac0f22c98ab11abbfa65fd172f6bdd
"libgomp: basic pinned memory on Linux", which may result in build failures
as follow, for example, for the '-m32' multilib of x86_64-pc-linux-gnu:

In file included from [...]/source-gcc/libgomp/config/linux/allocator.c:31:
[...]/source-gcc/libgomp/config/linux/allocator.c: In function ‘linux_memspace_alloc’:
[...]/source-gcc/libgomp/config/linux/allocator.c:70:26: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
   70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
  |  ^
   71 |   " memory (ulimit too low?)\n", size);
  |  
  |  |
  |  size_t {aka unsigned int}
[...]/source-gcc/libgomp/libgomp.h:186:29: note: in definition of macro ‘gomp_debug’
  186 |   (gomp_debug) ((KIND), __VA_ARGS__); \
  | ^~~
[...]/source-gcc/libgomp/config/linux/allocator.c:70:52: note: format string is defined here
   70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
  |  ~~^
  ||
  |long int
  |  %d
cc1: all warnings being treated as errors
make[9]: *** [allocator.lo] Error 1
make[9]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/32/libgomp'
[...]

Fix this in the same way as used elsewhere in libgomp.

	libgomp/
	* config/linux/allocator.c (linux_memspace_alloc): Fix 'size_t'
	vs. '%ld' format string mismatch.
---
 libgomp/config/linux/allocator.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index 269d0d607d8..6ffa2417913 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -50,6 +50,9 @@
 #include 
 #includ

Re: Build breakage

2023-12-13 Thread Jonathan Wakely via Gcc
On Wed, 13 Dec 2023, 19:37 Thomas Schwinge,  wrote:

> Hi!
>
> On 2023-12-13T11:15:54-0800, Jerry D via Gcc  wrote:
> > I am getting this failure to build from clean trunk.
>
> This is due to commit r14-6499-g348874f0baac0f22c98ab11abbfa65fd172f6bdd
> "libgomp: basic pinned memory on Linux", which supposedly was only tested
> with '--disable-multilib' or so.  As Andrew's now on vacations --
> conveniently ;-P -- I'll soon push a fix.
>
> (To restore your build, you may locally disable the 'gomp_debug' call, or
> cast 'size' into '(long) size', for example.)
>

Wouldn't --disable-werror work too?


Re: Build breakage

2023-12-13 Thread Thomas Schwinge
Hi!

On 2023-12-13T20:27:44+, Jonathan Wakely  wrote:
> On Wed, 13 Dec 2023, 19:37 Thomas Schwinge,  wrote:
>> On 2023-12-13T11:15:54-0800, Jerry D via Gcc  wrote:
>> > I am getting this failure to build from clean trunk.
>>
>> This is due to commit r14-6499-g348874f0baac0f22c98ab11abbfa65fd172f6bdd
>> "libgomp: basic pinned memory on Linux", which supposedly was only tested
>> with '--disable-multilib' or so.  As Andrew's now on vacations --
>> conveniently ;-P -- I'll soon push a fix.
>>
>> (To restore your build, you may locally disable the 'gomp_debug' call, or
>> cast 'size' into '(long) size', for example.)
>>
>
> Wouldn't --disable-werror work too?

I suppose so, but that comes with re-'configure'ing, re-starting the
build from scratch, or otherwise manually fiddling with 'Makefile's etc.,
whereas after editing the source file as indicated, you may simply resume
'make'.


Grüße
 Thomas


Re: Build breakage

2023-12-13 Thread Jonathan Wakely via Gcc
On Wed, 13 Dec 2023 at 20:56, Thomas Schwinge  wrote:
>
> Hi!
>
> On 2023-12-13T20:27:44+, Jonathan Wakely  wrote:
> > On Wed, 13 Dec 2023, 19:37 Thomas Schwinge,  wrote:
> >> On 2023-12-13T11:15:54-0800, Jerry D via Gcc  wrote:
> >> > I am getting this failure to build from clean trunk.
> >>
> >> This is due to commit r14-6499-g348874f0baac0f22c98ab11abbfa65fd172f6bdd
> >> "libgomp: basic pinned memory on Linux", which supposedly was only tested
> >> with '--disable-multilib' or so.  As Andrew's now on vacations --
> >> conveniently ;-P -- I'll soon push a fix.
> >>
> >> (To restore your build, you may locally disable the 'gomp_debug' call, or
> >> cast 'size' into '(long) size', for example.)
> >>
> >
> > Wouldn't --disable-werror work too?
>
> I suppose so, but that comes with re-'configure'ing, re-starting the
> build from scratch, or otherwise manually fiddling with 'Makefile's etc.,
> whereas after editing the source file as indicated, you may simply resume
> 'make'.

True. Sometimes starting a completely new build with a different config is
preferable to changing the sources. Sometimes it isn't :-)


Re: Switching x86_64-linux-gnu to GNU2 TLS descriptors by default

2023-12-13 Thread Andrew Pinski via Gcc
On Wed, Dec 13, 2023 at 6:19 AM Florian Weimer via Gcc  wrote:
>
> I feel like I have asked this before.  Currently, GCC uses calls to
> __tls_get_addr to obtain the address of global-dynamic TLS variables.
> On other architectures with support for GNU2 TLS descriptors, those are
> used by default.
>
> Should we flip the default to GNU2 descriptors?  Support has been
> available in glibc for a long, long time.  Is there any other reason for
> not doing this?  On the glibc side, the behavior regarding lazy
> initialization and symbol binding does not change whether the old or new
> interface is used.

Just FYI, the last time this was asked was 6 years ago but maybe
things have changed since:
https://inbox.sourceware.org/gcc-patches/came9rop_68qpdlz25poha1ewb6pgquvv_+h5bxgfhu05mh9...@mail.gmail.com/

Thanks,
Andrew

>
> Thanks,
> Florian
>


Re: Switching x86_64-linux-gnu to GNU2 TLS descriptors by default

2023-12-13 Thread Andrew Pinski via Gcc
On Wed, Dec 13, 2023 at 1:08 PM Andrew Pinski  wrote:
>
> On Wed, Dec 13, 2023 at 6:19 AM Florian Weimer via Gcc wrote:
> >
> > I feel like I have asked this before.  Currently, GCC uses calls to
> > __tls_get_addr to obtain the address of global-dynamic TLS variables.
> > On other architectures with support for GNU2 TLS descriptors, those are
> > used by default.
> >
> > Should we flip the default to GNU2 descriptors?  Support has been
> > available in glibc for a long, long time.  Is there any other reason for
> > not doing this?  On the glibc side, the behavior regarding lazy
> > initialization and symbol binding does not change whether the old or new
> > interface is used.
>
> Just FYI, the last time this was asked was 6 years ago but maybe
> things have changed since:
> https://inbox.sourceware.org/gcc-patches/came9rop_68qpdlz25poha1ewb6pgquvv_+h5bxgfhu05mh9...@mail.gmail.com/

Oh, I noticed that there was a bug filed before that, asking for testcases to
be added for it on x86_64, but it looks like that was not implemented:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48868

So it might even be broken.

Thanks,
Andrew


>
> Thanks,
> Andrew
>
> >
> > Thanks,
> > Florian
> >


Re: Deprecating nds32-*-linux-* target for GCC 14 (and removing it for GCC 15)

2023-12-13 Thread Chung-Ju Wu via Gcc

On 2023/12/12 07:43 UTC+8, Jeff Law via Gcc wrote:
> On 12/11/23 16:19, Andrew Pinski via Gcc wrote:
>> nds32 support in Linux was removed last year:
>> https://www.phoronix.com/news/Andes-Tech-NDS32-Removal
>>
>> The support for glibc never made it upstream as far as I can tell either.
>>
>> What are others thoughts on this?
>
> I believe the architecture is dead, so I wouldn't lose any sleep if it got
> deprecated across the board.  While my tester includes nds32le-elf and
> nds32be-elf, there's no gdbsim, so those targets don't provide much, if any,
> additional coverage over targets in the tester.
>
> Jeff


Hi Jeff & Andrew,

After a brief discussion with the previous maintainers of
the Linux nds32 port, it is reasonable to deprecate nds32-linux
for GCC 14 and remove it for GCC 15.

However, considering that there are still customers relying on
nds32le-elf and nds32be-elf for their embedded systems,
it would be great to keep nds32le-elf and nds32be-elf in the
GCC/binutils-gdb/newlib-cygwin repository.

Regards,
jasonwucj



Re: Deprecating nds32-*-linux-* target for GCC 14 (and removing it for GCC 15)

2023-12-13 Thread Chung-Ju Wu via Gcc

On 2023/12/12 07:43 UTC+8, Jeff Law via Gcc wrote:
> On 12/11/23 16:19, Andrew Pinski via Gcc wrote:
>> nds32 support in Linux was removed last year:
>> https://www.phoronix.com/news/Andes-Tech-NDS32-Removal
>>
>> The support for glibc never made it upstream as far as I can tell either.
>>
>> What are others thoughts on this?
>
> I believe the architecture is dead, so I wouldn't lose any sleep if it got
> deprecated across the board.  While my tester includes nds32le-elf and
> nds32be-elf, there's no gdbsim, so those targets don't provide much, if any,
> additional coverage over targets in the tester.
>
> Jeff


Hi Jeff,

As for gdbsim/openocd, I remember that we did have nds32 contributions
previously:
  - gdb: http://sourceware.org/ml/gdb-patches/2013-07/msg00223.html
  - openocd: http://openocd.zylin.com/1259

I suppose they have recently been dropped from the current gdb/openocd
repository due to the lack of maintenance.  I will reach out to Andes members
to inquire whether the architecture is still considered to be available in
gdb/openocd.

Regards,
jasonwucj



Re: Switching x86_64-linux-gnu to GNU2 TLS descriptors by default

2023-12-13 Thread Sam James via Gcc


Florian Weimer via Gcc  writes:

> [...]
> On other architectures with support for GNU2 TLS descriptors, those are
> used by default.
>

It looks like arm32 defaults to gnu, not gnu2. andrew mentioned fdpic
will be an issue there but maybe we could carve that out.

> Should we flip the default to GNU2 descriptors?
> [...]

thanks,
sam


Re: Discussion about arm/aarch64 testcase failures seen with patch for PR111673

2023-12-13 Thread Surya Kumari Jangala via Gcc
Hi Richard,
Thanks a lot for your response!

Another failure reported by the Linaro CI is as follows:

Running gcc:gcc.dg/dg.exp ...
FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"
FAIL: gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"

I analyzed the failures and the root cause is the same for both the failures.

The test pr10474.c is as follows:

void f(int *i)
{
  if (!i)
    return;
  else
    {
      __builtin_printf("Hi");
      *i=0;
    }
}


With the patch (for PR111673), x1 (a volatile, i.e. call-clobbered, register) is
assigned to hold the value of x0 (the first parameter). Since x1 is volatile and
there is a call later on, x1 is saved to the stack. The save is generated in the
LRA pass, in the entry basic block. Because the entry bb then uses the stack
pointer, the testcase fails to be shrink-wrapped.

The reason why LRA generates the store insn in the entry bb is as follows:
LRA emits insns to save volatile registers in the inheritance/splitting pass.
In this pass, LRA builds EBBs (Extended Basic Blocks) and traverses the insns in
each EBB in reverse order, from the last insn to the first insn. When LRA sees a
write to a pseudo (that has been assigned a volatile register), and there is a
read following the write, with an intervening call insn between the write and
the read, then LRA generates a spill immediately after the write and a restore
immediately before the read. In the above test, there is an EBB containing the
entry bb and the bb with the printf call. In the entry bb, there is a write to
x1 (basically a copy from x0 to x1) and in the printf bb, there is a read of x1
after the call insn. So LRA generates a spill in the entry bb.

Without the patch, x19 is chosen to hold the value of x0. Since x19 is
non-volatile, the input RTL to the shrink-wrap pass does not have any code to
save x19 to the stack. Only the insn that copies x0 to x19 is present in the
entry bb. In the shrink-wrap pass, this insn is moved down the CFG to the bb
containing the call to printf, thereby allowing the prologue to be emitted only
where needed. Thus shrink-wrapping succeeds.


Shrink wrap can be made to succeed if the save of x1 occurs just before the call
insn, instead of generating it after the write to x1. This will ensure that the
spill does not occur in the entry bb. In fact, it is more efficient if the save
occurs only in the path containing the printf call instead of occurring in the
entry bb.
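
The difference between the two placements can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it is not LRA code, and the types and names (`struct insn`, `save_bb`, the `save_policy` values) are hypothetical. It scans one EBB backwards for a write that is followed by a call and then a read, and reports the basic block in which the save would land under each policy.

```c
#include <stddef.h>

/* Toy model only; none of these names exist in GCC.  */
enum insn_kind { INSN_WRITE, INSN_CALL, INSN_READ, INSN_OTHER };

struct insn
{
  enum insn_kind kind;  /* Effect on the one register being tracked.  */
  int bb;               /* Index of the basic block holding the insn.  */
};

enum save_policy { SAVE_AFTER_WRITE, SAVE_BEFORE_CALL };

/* Walk the N insns of an EBB from last to first.  If a write is followed
   by a call and then a read, return the basic block where the save would
   be placed under POLICY; otherwise return -1.  */
int
save_bb (const struct insn *insns, size_t n, enum save_policy policy)
{
  int seen_read = 0;   /* A read lies after the current position.  */
  int call_bb = -1;    /* Block of the earliest call seen before that read.  */

  for (size_t i = n; i-- > 0; )
    switch (insns[i].kind)
      {
      case INSN_READ:
        seen_read = 1;
        break;
      case INSN_CALL:
        if (seen_read)
          call_bb = insns[i].bb;
        break;
      case INSN_WRITE:
        if (call_bb >= 0)
          return policy == SAVE_AFTER_WRITE ? insns[i].bb : call_bb;
        seen_read = 0;  /* Later reads belong to this write, not earlier ones.  */
        break;
      default:
        break;
      }
  return -1;
}
```

Modelling pr10474.c as a write in block 0 followed by a call and a read in block 1, `SAVE_AFTER_WRITE` puts the save in block 0 (the entry bb), while `SAVE_BEFORE_CALL` puts it in block 1, leaving the entry bb free of stack accesses.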

I have a patch (bootstrapped and regtested on powerpc) that makes changes in
LRA to save volatile registers before a call instead of after the write to the
volatile. With this patch, both the above tests pass.

Since the patch for PR111673 has been approved by Vladimir, I plan to
commit the patch to trunk. And I will fix the test failures after doing the
commit.

Regards,
Surya



On 28/11/23 7:18 pm, Richard Earnshaw wrote:
> 
> 
> On 28/11/2023 12:52, Surya Kumari Jangala wrote:
>> Hi Richard,
>> Thanks a lot for your response!
>>
>> Another failure reported by the Linaro CI is as follows :
>> (Note: I am planning to send a separate mail for each failure, as this will
>> make the discussion easy to track)
>>
>> FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve -moverride=tune=none  check-function-bodies dup_x0_m
>>
>> Expected code:
>>
>>    ...
>>    add (x[0-9]+), x0, #?1
>>    mov (p[0-7])\.b, p15\.b
>>    mov z0\.d, \2/m, \1
>>    ...
>>    ret
>>
>>
>> Code obtained w/o patch:
>>  addvl   sp, sp, #-1
>>  str p15, [sp]
>>  add x0, x0, 1
>>  mov p3.b, p15.b
>>  mov z0.d, p3/m, x0
>>  ldr p15, [sp]
>>  addvl   sp, sp, #1
>>  ret
>>
>> Code obtained w/ patch:
>>  addvl   sp, sp, #-1
>>  str p15, [sp]
>>  mov p3.b, p15.b
>>  add x0, x0, 1
>>  mov z0.d, p3/m, x0
>>  ldr p15, [sp]
>>  addvl   sp, sp, #1
>>  ret
>>
>> As we can see, with the patch, the following two instructions are 
>> interchanged:
>>  add x0, x0, 1
>>  mov p3.b, p15.b
> 
> Indeed, both look acceptable results to me, especially given that we don't 
> schedule results at -O1.
> 
> There's two ways of fixing this:
> 1) Simply swap the order to what the compiler currently generates (which is a 
> little fragile, since it might flip back someday).
> 2) Write the test as
> 
> 
> ** (
> **   add (x[0-9]+), x0, #?1
> **   mov (p[0-7])\.b, p15\.b
> **   mov z0\.d, \2/m, \1
> ** |
> **   mov (p[0-7])\.b, p15\.b
> **   add (x[0-9]+), x0, #?1
> **   mov z0\.d, \1/m, \2
> ** )
> 
> Note, we need to swap the match names in the third insn to account for the 
> different order of the earlier instructions.
> 
> Neither is ideal, but the second is