https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Torbjorn SVENSSON from comment #10)
> With the change introduced in comment 8, I now see the following failures
> for arm-none-eabi:
>
> FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0
> FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_INSERT_EXPR" 0
> FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0
> FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized "BIT_INSERT_EXPR" 1
>
> The tests fail only for Cortex-M55/M85 with -mfloat-abi=hard, not when
> using -mfloat-abi=soft.
>
>
>
> The content of forwprop-40.c.273t.optimized for Cortex-M55
> (thumb/arch=armv8.1-m.main+mve.fp+fp.dp/tune=cortex-m55/float-abi=hard/
> fpu=auto) with r16-8253-geb50d28a9353e9 is:
> ;; Function g (g, funcdef_no=0, decl_uid=7945, cgraph_uid=1, symbol_order=0)
>
> vector(4) int g (vector(4) int a)
> {
> int b;
>
> <bb 2> [local count: 1073741824]:
> b_2 = BIT_FIELD_REF <a_1(D), 32, 0>;
> a_3 = BIT_INSERT_EXPR <a_1(D), b_2, 0 (32 bits)>;
> return a_3;
> }
>
>
> The content of forwprop-41.c.273t.optimized for Cortex-M55
> (thumb/arch=armv8.1-m.main+mve.fp+fp.dp/tune=cortex-m55/float-abi=hard/
> fpu=auto) with r16-8253-geb50d28a9353e9 is:
> ;; Function g (g, funcdef_no=0, decl_uid=7946, cgraph_uid=1, symbol_order=0)
>
> vector(4) int g (vector(4) int a, int c)
> {
> int b;
>
> <bb 2> [local count: 1073741824]:
> b_2 = BIT_FIELD_REF <a_1(D), 32, 64>;
> a_3 = BIT_INSERT_EXPR <a_1(D), b_2, 64 (32 bits)>;
> a_5 = BIT_INSERT_EXPR <a_3, c_4(D), 32 (32 bits)>;
> return a_5;
> }
>
>
> Should the tests be XFAILed, just like they are for s390, or is this a
> bug in GCC?
>
> The same tests pass on x86_64-pc-linux-gnu.

This means that your target cannot perform the required constant vector
permute.
Whether that's a true incapability or just a missing pattern, I cannot say.
I suggest XFAILing the tests on the relevant targets and trying to fix the
backend in stage1 if possible (on x86, even two- or three-instruction
sequences are generated for constant permutes; the middle-end never tries
to decompose those into target-supported pieces).
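
For reference, the quoted dumps are consistent with source along the following lines. This is only a sketch reconstructed from the dumps, not a copy of the testcases: the typedef name v4si and the function names g40 and g41 are made up here (both dumps call the function g). The extract-then-reinsert of the same lane is a no-op, so forwprop is expected to remove the BIT_FIELD_REF entirely, leaving no BIT_INSERT_EXPR in the first function and one in the second.

```c
/* Sketch reconstructed from the quoted dumps; names are hypothetical.
   Uses the GNU C vector extension (vector_size, element subscripting).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Shape of the forwprop-40.c dump: read lane 0 and write it straight
   back.  Semantically the vector is unchanged.  */
v4si
g40 (v4si a)
{
  int b = a[0];   /* BIT_FIELD_REF <a, 32, 0> */
  a[0] = b;       /* BIT_INSERT_EXPR <a, b, 0> */
  return a;
}

/* Shape of the forwprop-41.c dump: the lane 2 round trip is redundant,
   only the insert of c into lane 1 has an effect.  */
v4si
g41 (v4si a, int c)
{
  int b = a[2];   /* BIT_FIELD_REF <a, 32, 64> */
  a[2] = b;       /* BIT_INSERT_EXPR <a, b, 64> */
  a[1] = c;       /* BIT_INSERT_EXPR <a, c, 32> */
  return a;
}
```

Compiling something like this with -O2 -fdump-tree-optimized on the affected multilib and on x86_64 should make it easy to compare where the two targets diverge.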