[r15-4709 Regression] FAIL: 23_containers/vector/cons/from_range.cc -std=gnu++26 (test for excess errors) on Linux/x86_64

2024-10-27 Thread haochen.jiang
On Linux/x86_64,

b281e13ecad12d07209924a7282c53be3a1c3774 is the first bad commit
commit b281e13ecad12d07209924a7282c53be3a1c3774
Author: Jonathan Wakely 
Date:   Tue Oct 8 21:15:18 2024 +0100

libstdc++: Add P1206R7 from_range members to std::vector [PR111055]

caused

FAIL: 23_containers/vector/cons/from_range.cc  -std=gnu++23 (test for excess errors)
FAIL: 23_containers/vector/cons/from_range.cc  -std=gnu++26 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-4709/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=23_containers/vector/cons/from_range.cc --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=23_containers/vector/cons/from_range.cc --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=23_containers/vector/cons/from_range.cc --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=23_containers/vector/cons/from_range.cc --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command 
line might work around them.)
(However, please make sure that there are no potential problems with AVX512.)


[PATCH] Match: Fold pow calls to ldexp when possible [PR57492]

2024-10-27 Thread Soumya AR
This patch transforms the following POW calls to equivalent LDEXP calls, as
discussed in PR57492:

powi (2.0, i) -> ldexp (1.0, i)

a * powi (2.0, i) -> ldexp (a, i)

2.0 * powi (2.0, i) -> ldexp (1.0, i + 1)

pow (powof2, i) -> ldexp (1.0, i * log2 (powof2))

powof2 * pow (2, i) -> ldexp (1.0, i + log2 (powof2))
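
For concreteness, a minimal C sketch of the second form above (my own
illustration, not taken from the patch or its testcase):

double
scale (double a, int i)
{
  /* __builtin_powi (2.0, i) computes 2**i, so the whole product is
     equivalent to ldexp (a, i).  */
  return a * __builtin_powi (2.0, i);
}

With the new match.pd rules, this product should collapse into a single
ldexp (a, i) call.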

This is especially helpful for SVE architectures as LDEXP calls can be
implemented using the FSCALE instruction, as seen in the following patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/664160.html

SPEC2017 was run with this patch; while there are no noticeable improvements,
there are no non-noise regressions either.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR 

gcc/ChangeLog:
PR target/57492
* match.pd: Added patterns to fold certain calls to pow to ldexp.

gcc/testsuite/ChangeLog:
PR target/57492
* gcc.dg/tree-ssa/pow-to-ldexp.c: New test.



0001-Match-Fold-pow-calls-to-ldexp-when-possible-PR57492.patch
Description: 0001-Match-Fold-pow-calls-to-ldexp-when-possible-PR57492.patch


[PATCH v2] [aarch64] Fix function multiversioning dispatcher link error with LTO

2024-10-27 Thread Yangyu Chen
We forgot to apply DECL_EXTERNAL to the __init_cpu_features_resolver decl. When
building with LTO, the linker cannot find the
__init_cpu_features_resolver.lto_priv* symbol, causing a link error.

This patch fixes that by adding DECL_EXTERNAL to the decl. To avoid a "used but
never defined" warning for this symbol, we also set TREE_PUBLIC on the decl.

Minimal steps to reproduce the bug:

echo '__attribute__((target_clones("default", "aes"))) void func1() { }' > 1.c
echo '__attribute__((target_clones("default", "aes"))) void func2() { }' > 2.c
echo 'void func1();void func2();int main(){func1();func2();return 0;}' > main.c
gcc -flto -c 1.c 2.c
gcc -flto main.c 1.o 2.o

Fixes: 0cfde688e213 ("[aarch64] Add function multiversioning support")

gcc/ChangeLog:

* config/aarch64/aarch64.cc (dispatch_function_versions): Add
DECL_EXTERNAL and TREE_PUBLIC to the __init_cpu_features_resolver decl.
---
 gcc/config/aarch64/aarch64.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5770491b30c..37123befeaf 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -20437,6 +20437,8 @@ dispatch_function_versions (tree dispatch_decl,
   tree init_fn_id = get_identifier ("__init_cpu_features_resolver");
   tree init_fn_decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL,
  init_fn_id, init_fn_type);
+  DECL_EXTERNAL (init_fn_decl) = 1;
+  TREE_PUBLIC (init_fn_decl) = 1;
   tree arg1 = DECL_ARGUMENTS (dispatch_decl);
   tree arg2 = TREE_CHAIN (arg1);
   ifunc_cpu_init_stmt = gimple_build_call (init_fn_decl, 2, arg1, arg2);
-- 
2.47.0



[PATCH v2] Fix MV clones can not redirect to specific target on some targets

2024-10-27 Thread Yangyu Chen
Following the implementation of commit b8ce8129a5 ("Redirect call
within specific target attribute among MV clones (PR ipa/82625)"),
we can now optimize calls by invoking a versioned function callee
from a caller that shares the same target attribute. However, on
targets that define TARGET_HAS_FMV_TARGET_ATTRIBUTE to zero, meaning
they use the "target_version" attribute instead of "target", this
optimization is not feasible. Currently, the only target affected
by this limitation is AArch64.

This commit resolves the issue by not directly using "target" with
lookup_attribute. Instead, it checks the TARGET_HAS_FMV_TARGET_ATTRIBUTE
macro to decide between using the "target" or "target_version"
attribute.

Fixes: 79891c4cb5 ("Add support for target_version attribute")

gcc/ChangeLog:

* multiple_target.cc (redirect_to_specific_clone): Fix redirection
not working on targets without TARGET_HAS_FMV_TARGET_ATTRIBUTE.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mvc-redirect.C: New test.
---
 gcc/multiple_target.cc|  8 +++---
 .../g++.target/aarch64/mvc-redirect.C | 25 +++
 2 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/mvc-redirect.C

diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index d2c9671fc1b..a1c18f4a3a7 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -446,8 +446,10 @@ redirect_to_specific_clone (cgraph_node *node)
   cgraph_function_version_info *fv = node->function_version ();
   if (fv == NULL)
 return;
+  const char *fmv_attr = (TARGET_HAS_FMV_TARGET_ATTRIBUTE
+ ? "target" : "target_version");
 
-  tree attr_target = lookup_attribute ("target", DECL_ATTRIBUTES (node->decl));
+  tree attr_target = lookup_attribute (fmv_attr, DECL_ATTRIBUTES (node->decl));
   if (attr_target == NULL_TREE)
 return;
 
@@ -458,7 +460,7 @@ redirect_to_specific_clone (cgraph_node *node)
   if (!fv2)
continue;
 
-  tree attr_target2 = lookup_attribute ("target",
+  tree attr_target2 = lookup_attribute (fmv_attr,
DECL_ATTRIBUTES (e->callee->decl));
 
   /* Function is not calling proper target clone.  */
@@ -472,7 +474,7 @@ redirect_to_specific_clone (cgraph_node *node)
  for (; fv2 != NULL; fv2 = fv2->next)
{
  cgraph_node *callee = fv2->this_node;
- attr_target2 = lookup_attribute ("target",
+ attr_target2 = lookup_attribute (fmv_attr,
   DECL_ATTRIBUTES (callee->decl));
  if (attr_target2 != NULL_TREE
  && attribute_value_equal (attr_target, attr_target2))
diff --git a/gcc/testsuite/g++.target/aarch64/mvc-redirect.C 
b/gcc/testsuite/g++.target/aarch64/mvc-redirect.C
new file mode 100644
index 000..f29cc3745a3
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/mvc-redirect.C
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+__attribute__((target_clones("default", "dotprod", "sve+sve2")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target_clones("default", "dotprod", "sve+sve2")))
+int bar()
+{
+  return foo ();
+}
+
+/* { dg-final { scan-assembler-times "\n_Z3foov\.default:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._Mdotprod:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._MsveMsve2:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl\t_Z3foov.default\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl\t_Z3foov._Mdotprod\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl\t_Z3foov._MsveMsve2\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3foov, 
%gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z3foov,_Z3foov\.resolver\n" 
1 } } */
-- 
2.47.0



Re: [PATCH] Add COBOL to gcc (was: Add 'cobol' to Makefile.def)

2024-10-27 Thread James K. Lowden
On Wed, 23 Oct 2024 15:12:19 +0200
Richard Biener  wrote:

> The rest of the changes look OK to me.

Below is a revised patch incorporating recent feedback.  Changes:

*  remove blank lines at EOF
*  add gcc/cobol/lang.opt.urls 
*  simplify gcc/cobol/config-lang.in (and FE requires C++)
*  add stub gcc/cobol/ChangeLog
*  group ChangeLog entries by directory
*  support  --enable-generated-files-in-srcdir
*  remove reference to --fdump-generic-nodes option

> This would say
> 
>* configure: Regenerated.

done.

The previous patch reported "9 files" but contained only 8.  We added
2, so the total is now 10.  

As before, this patch comprises all the "meta files" needed for the
Cobol front end, including every existing file that we modified.  

1.  It does not interfere with --languages=c,c++, etc
2.  It does not work with --languages=cobol because the source files
are missing.  

I have not tested with git-gcc-verify because I don't know how to use
it.  It does apply cleanly with "git am" (on my end, at least). 

--jkl

[snip]
From be8c3d34ad7f8a92f4e1679dbbe411b4bcb04d0fbld.patch 4 Oct 2024 12:01:22 -0400
From: "James K. Lowden" 
Date: Sat 26 Oct 2024 06:41:52 PM EDT
Subject: [PATCH]  Add 'cobol' to 10 files

ChangeLog
* Makefile.def: Add libgcobol module and cobol language.
* configure: Regenerated.
* configure.ac: Add libgcobol module and cobol language.

gcc/ChangeLog
* gcc/common.opt: Add libgcobol module and cobol language.

gcc/cobol/ChangeLog
* gcc/cobol/ChangeLog: Add gcc/cobol/ChangeLog
* gcc/cobol/LICENSE: Add gcc/cobol/LICENSE
* gcc/cobol/Make-lang.in: Add gcc/cobol/Make-lang.in
* gcc/cobol/config-lang.in: Add gcc/cobol/config-lang.in
* gcc/cobol/lang.opt: Add gcc/cobol/lang.opt
* gcc/cobol/lang.opt.urls: Add gcc/cobol/lang.opt.urls

---
Makefile.def | ++-
configure | +-
configure.ac | +-
gcc/cobol/ChangeLog | ++-
gcc/cobol/LICENSE | +-
gcc/cobol/Make-lang.in | -
gcc/cobol/config-lang.in | +++-
gcc/cobol/lang.opt | -
gcc/cobol/lang.opt.urls | +-
gcc/common.opt |
10 files changed, 479 insertions(+), 10 deletions(-)
diff --git a/Makefile.def b/Makefile.def
index 19954e7d731..1192e852c7a 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -209,6 +209,7 @@ target_modules = { module= libgomp; bootstrap= true; 
lib_path=.libs; };
 target_modules = { module= libitm; lib_path=.libs; };
 target_modules = { module= libatomic; bootstrap=true; lib_path=.libs; };
 target_modules = { module= libgrust; };
+target_modules = { module= libgcobol; };
 
 // These are (some of) the make targets to be done in each subdirectory.
 // Not all; these are the ones which don't have special options.
@@ -324,6 +325,7 @@ flags_to_pass = { flag= CXXFLAGS_FOR_TARGET ; };
 flags_to_pass = { flag= DLLTOOL_FOR_TARGET ; };
 flags_to_pass = { flag= DSYMUTIL_FOR_TARGET ; };
 flags_to_pass = { flag= FLAGS_FOR_TARGET ; };
+flags_to_pass = { flag= GCOBOL_FOR_TARGET ; };
 flags_to_pass = { flag= GFORTRAN_FOR_TARGET ; };
 flags_to_pass = { flag= GOC_FOR_TARGET ; };
 flags_to_pass = { flag= GOCFLAGS_FOR_TARGET ; };
@@ -655,6 +657,7 @@ lang_env_dependencies = { module=libgcc; no_gcc=true; 
no_c=true; };
 // built newlib on some targets (e.g. Cygwin).  It still needs
 // a dependency on libgcc for native targets to configure.
 lang_env_dependencies = { module=libiberty; no_c=true; };
+lang_env_dependencies = { module=libgcobol; cxx=true; };
 
 dependencies = { module=configure-target-fastjar; on=configure-target-zlib; };
 dependencies = { module=all-target-fastjar; on=all-target-zlib; };
@@ -690,6 +693,7 @@ dependencies = { module=install-target-libvtv; 
on=install-target-libgcc; };
 dependencies = { module=install-target-libitm; on=install-target-libgcc; };
 dependencies = { module=install-target-libobjc; on=install-target-libgcc; };
 dependencies = { module=install-target-libstdc++-v3; on=install-target-libgcc; 
};
+dependencies = { module=install-target-libgcobol; 
on=install-target-libstdc++-v3; };
 
 // Target modules in the 'src' repository.
 lang_env_dependencies = { module=libtermcap; };
@@ -727,6 +731,8 @@ languages = { language=d;   gcc-check-target=check-d;
lib-check-target=check-target-libphobos; };
 languages = { language=jit;gcc-check-target=check-jit; };
 languages = { language=rust;   gcc-check-target=check-rust; };
+languages = { language=cobol;  gcc-check-target=check-cobol;
+   lib-check-target=check-target-libg

[PATCH 1/6] PR 117048: simplify-rtx: Simplify (X << C1) [+,^] (X >> C2) into ROTATE

2024-10-27 Thread Kyrylo Tkachov
Hi all,

simplify-rtx can transform (X << C1) | (X >> C2) into ROTATE (X, C1) when
C1 + C2 == mode-width.  But the transformation is also valid for PLUS and XOR.
Indeed GIMPLE can also do the fold.  Let's teach RTL to do it too.
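
As a scalar illustration of why PLUS and XOR work just as well as IOR here
(my own sketch, not part of the patch or its tests):

unsigned int
rotl9 (unsigned int x)
{
  /* The non-zero bits of (x << 9) and (x >> 23) never overlap, so
     combining them with |, ^ or + yields the same value: x rotated
     left by 9.  */
  return (x << 9) ^ (x >> 23);
}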

The motivating testcase for this is in AArch64 intrinsics:

uint64x2_t G2(uint64x2_t a, uint64x2_t b) {
uint64x2_t c = veorq_u64(a, b);
return veorq_u64(vaddq_u64(c, c), vshrq_n_u64(c, 63));
}

which I was hoping to fold to a single XAR (a ROTATE+XOR instruction) but
GCC was failing to detect the rotate operation for two reasons:
1) The combination of the two arms of the expression is done under XOR rather
than IOR that simplify-rtx currently supports.
2) The ASHIFT operation is actually a (PLUS X X) operation and thus is not
detected as the LHS of the two arms we require.

The patch fixes both issues.  The analysis of the two arms of the rotation
expression is factored out into a common helper simplify_rotate which is
then used in the PLUS, XOR, IOR cases in simplify_binary_operation_1.

The check-assembly testcase for this is added in the following patch because
it needs some extra AArch64 backend work, but I've added self-tests in this
patch to validate the transformation.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov 

PR target/117048
* simplify-rtx.cc (extract_ashift_operands_p): Define.
(simplify_rotate_op): Likewise.
(simplify_context::simplify_binary_operation_1): Use the above in
the PLUS, IOR, XOR cases.
(test_vector_rotate): Define.
(test_vector_ops): Use the above.




v3-0001-PR-117048-simplify-rtx-Simplify-X-C1-X-C2-into-ROTAT.patch
Description: v3-0001-PR-117048-simplify-rtx-Simplify-X-C1-X-C2-into-ROTAT.patch


[PATCH 2/6] aarch64: Use canonical RTL representation for SVE2 XAR and extend it to fixed-width modes

2024-10-27 Thread Kyrylo Tkachov
Hi all,

The MD pattern for the XAR instruction in SVE2 is currently expressed with
non-canonical RTL by using a ROTATERT code with a constant rotate amount.
Fix it by using the left ROTATE code.  This necessitates adjusting the rotate
amount during expand. 

Additionally, as the SVE2 XAR instruction is unpredicated and can handle all
element sizes from .b to .d, it is a good fit for implementing the XOR+ROTATE
operation for Advanced SIMD modes where the TARGET_SHA3 cannot be used
(that can only handle V2DImode operands).  Therefore let's extend the accepted
modes of the SVE2 pattern to include the Advanced SIMD integer modes.
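
For reference, XAR is an "XOR then rotate" operation; roughly, in GNU C
vector notation (my own sketch, not code from the patch):

typedef unsigned int v4si __attribute__ ((vector_size (16)));

v4si
xar_like (v4si a, v4si b)
{
  v4si c = a ^ b;
  /* XAR rotates each element of the XORed value; here a rotate right
     by 9, written with the usual shift/or idiom.  */
  return (c >> 9) | (c << 23);
}

Extending the SVE2 pattern to the Advanced SIMD integer modes lets code
like this use XAR even when TARGET_SHA3 is not available.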

This leads to some tests for the svxar* intrinsics to fail because they now
simplify to a plain EOR when the rotate amount is the width of the element.
This simplification is desirable (EOR instructions have better or equal
throughput than XAR, and they are non-destructive of their input) so the
tests are adjusted.

For V2DImode XAR operations we should prefer the Advanced SIMD version when
it is available (TARGET_SHA3) because it is non-destructive, so restrict the
SVE2 pattern accordingly.  Tests are added to confirm this.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov 

gcc/

* config/aarch64/iterators.md (SVE_ASIMD_FULL_I): New mode iterator.
* config/aarch64/aarch64-sve2.md (@aarch64_sve2_xar):
Use SVE_ASIMD_FULL_I modes.  Use ROTATE code for the rotate step.
Adjust output logic.
* config/aarch64/aarch64-sve-builtins-sve2.cc (svxar_impl): Define.
(svxar): Use the above.

gcc/testsuite/

* gcc.target/aarch64/xar_neon_modes.c: New test.
* gcc.target/aarch64/xar_v2di_nonsve.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_s16.c: Scan for EOR rather than
XAR.
* gcc.target/aarch64/sve2/acle/asm/xar_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_u8.c: Likewise.



v3-0002-aarch64-Use-canonical-RTL-representation-for-SVE2-XA.patch
Description: v3-0002-aarch64-Use-canonical-RTL-representation-for-SVE2-XA.patch


[PATCH 3/6] PR 117048: aarch64: Add define_insn_and_split for vector ROTATE

2024-10-27 Thread Kyrylo Tkachov
The ultimate goal in this PR is to match the XAR pattern that is represented
as a (ROTATE (XOR X Y) VCST) from the ACLE intrinsics code in the testcase.
The first blocker for this was the missing recognition of ROTATE in
simplify-rtx, which is fixed in the previous patch.
The next problem is that once the ROTATE has been matched from the shifts
and orr/xor/plus, it will try to match it in an insn before trying to combine
the XOR into it.  But as we don't have a backend pattern for a vector ROTATE
this recog fails and combine does not try the followup XOR+ROTATE combination
which would have succeeded.

This patch solves that by introducing a sort of "scaffolding" pattern for
vector ROTATE, which allows it to be combined into the XAR.
If it fails to be combined into anything the splitter will break it back
down into the SHL+USRA sequence that it would have emitted.
By having this splitter we can special-case some rotate amounts in the future
to emit more specialised instructions e.g. from the REV* family.
This can be done if the ROTATE is not combined into something else.

This optimisation is done in the next patch in the series.

Bootstrapped and tested on aarch64-none-linux-gnu.
I’ll push this if the prerequisites are approved.
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov 

gcc/

PR target/117048
* config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm):
New define_insn_and_split.

gcc/testsuite/

PR target/117048
* gcc.target/aarch64/simd/pr117048.c: New test.



v3-0003-PR-117048-aarch64-Add-define_insn_and_split-for-vect.patch
Description: v3-0003-PR-117048-aarch64-Add-define_insn_and_split-for-vect.patch


[PATCH 4/6] expmed, aarch64: Optimize vector rotates as vector permutes where possible

2024-10-27 Thread Kyrylo Tkachov
Hi all,

Some vector rotate operations can be implemented in a single instruction
rather than using the fallback SHL+USRA sequence.
In particular, when the rotate amount is half the bitwidth of the element
we can use a REV64,REV32,REV16 instruction.
More generally, rotates by a byte amount can be implemented using vector
permutes.
This patch adds such a generic routine in expmed.cc called
expand_rotate_as_vec_perm that calculates the required permute indices
and uses the expand_vec_perm_const interface.
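
As a little-endian illustration of the idea (my own sketch in GNU C vector
extensions, not code from the patch), rotating every 32-bit lane by 16 bits
is the byte permute {2,3,0,1, 6,7,4,5, 10,11,8,9, 14,15,12,13}:

typedef unsigned int v4si __attribute__ ((vector_size (16)));
typedef unsigned char v16qi __attribute__ ((vector_size (16)));

v4si
rot16_as_permute (v4si x)
{
  /* Swap the two half-words of each 32-bit lane.  */
  v16qi sel = { 2, 3, 0, 1, 6, 7, 4, 5, 10, 11, 8, 9, 14, 15, 12, 13 };
  return (v4si) __builtin_shuffle ((v16qi) x, sel);
}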

On aarch64 this ends up generating the single-instruction sequences above
where possible and can use LDR+TBL sequences too, which are a good choice.

With help from Richard, the routine should be VLA-safe.
However, the only use of expand_rotate_as_vec_perm introduced in this patch
is in aarch64-specific code that for now only handles fixed-width modes.

A runtime aarch64 test is added to ensure the permute indices are not messed
up.

Bootstrapped and tested on aarch64-none-linux-gnu.
Richard had approved these changes in the previous iteration, but I’ll only push
this after the prerequisites in the series.

Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov 

gcc/

* expmed.h (expand_rotate_as_vec_perm): Declare.
* expmed.cc (expand_rotate_as_vec_perm): Define.
* config/aarch64/aarch64-protos.h (aarch64_emit_opt_vec_rotate):
Declare prototype.
* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Implement.
* config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm):
Call the above.

gcc/testsuite/

* gcc.target/aarch64/vec-rot-exec.c: New test.
* gcc.target/aarch64/simd/pr117048_2.c: New test.



v3-0004-aarch64-Optimize-vector-rotates-as-vector-permutes-w.patch
Description: v3-0004-aarch64-Optimize-vector-rotates-as-vector-permutes-w.patch


[PATCH 6/6] simplify-rtx: Simplify ROTATE:HI (X:HI, 8) into BSWAP:HI (X)

2024-10-27 Thread Kyrylo Tkachov
Hi all,

With recent patch to improve detection of vector rotates at RTL level
combine now tries matching a V8HImode rotate by 8 in the example in the
testcase.  We can teach AArch64 to emit a REV16 instruction for such a rotate
but really this operation corresponds to the RTL code BSWAP, for which we
already have the right patterns.  BSWAP is arguably a simpler representation
than ROTATE here because it has only one operand, so let's teach simplify-rtx
to generate it.

With this patch the testcase now generates the simplest form:
.L2:
ldr q31, [x1, x0]
rev16   v31.16b, v31.16b
str q31, [x0, x2]
add x0, x0, 16
cmp x0, 2048
bne .L2

instead of the previous:
.L2:
ldr q31, [x1, x0]
shl v30.8h, v31.8h, 8
usra    v30.8h, v31.8h, 8
str q30, [x0, x2]
add x0, x0, 16
cmp x0, 2048
bne .L2

IMO ideally the bswap detection would have been done during vectorisation
time and used the expanders for that, but teaching simplify-rtx to do this
transformation is fairly straightforward and, unlike at tree level, we have
the native RTL BSWAP code.  This change is not enough to generate the
equivalent sequence in SVE, but that is something that should be tackled
separately.
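
For the scalar intuition (my own sketch; the actual testcase is in the
attached patch), rotating a 16-bit value by 8 in either direction is
exactly a byte swap:

#include <stdint.h>

uint16_t
rot8 (uint16_t x)
{
  /* (x << 8) | (x >> 8) swaps the two bytes, i.e. __builtin_bswap16 (x).  */
  return (uint16_t) ((x << 8) | (x >> 8));
}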

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov 

gcc/

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Simplify (rotate:HI x:HI, 8) -> (bswap:HI x:HI).

gcc/testsuite/

* gcc.target/aarch64/rot_to_bswap.c: New test.



v3-0006-simplify-rtx-Simplify-ROTATE-HI-X-HI-8-into-BSWAP-HI.patch
Description: v3-0006-simplify-rtx-Simplify-ROTATE-HI-X-HI-8-into-BSWAP-HI.patch


[PATCH 5/6] aarch64: Emit XAR for vector rotates where possible

2024-10-27 Thread Kyrylo Tkachov
Hi all,

We can make use of the integrated rotate step of the XAR instruction
to implement most vector integer rotates, as long we zero out one
of the input registers for it.  This allows for a lower-latency sequence
than the fallback SHL+USRA, especially when we can hoist the zeroing operation
away from loops and hot parts.  This should be safe to do for 64-bit vectors
as well even though the XAR instructions operate on 128-bit values, as the
bottom 64-bit result is later accessed through the right subregs.

This strategy is used whenever we have XAR instructions, the logic
in aarch64_emit_opt_vec_rotate is adjusted to resort to
expand_rotate_as_vec_perm only when it's expected to generate a single REV*
instruction or when XAR instructions are not present.

With this patch we can generate for the input:
v4si
G1 (v4si r)
{
return (r >> 23) | (r << 9);
}

v8qi
G2 (v8qi r)
{
  return (r << 3) | (r >> 5);
}
the assembly for +sve2:
G1:
movi    v31.4s, 0
xar z0.s, z0.s, z31.s, #23
ret

G2:
movi    v31.4s, 0
xar z0.b, z0.b, z31.b, #5
ret

instead of the current:
G1:
shl v31.4s, v0.4s, 9
usra    v31.4s, v0.4s, 23
mov v0.16b, v31.16b
ret
G2:
shl v31.8b, v0.8b, 3
usra    v31.8b, v0.8b, 5
mov v0.8b, v31.8b
ret

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov 

gcc/

* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Add
generation of XAR sequences when possible.

gcc/testsuite/

* gcc.target/aarch64/rotate_xar_1.c: New test.



v3-0005-aarch64-Emit-XAR-for-vector-rotates-where-possible.patch
Description: v3-0005-aarch64-Emit-XAR-for-vector-rotates-where-possible.patch


Re: [PATCH 4/6] aarch64: Optimize vector rotates into REV* instructions where possible

2024-10-27 Thread Kyrylo Tkachov


> On 25 Oct 2024, at 15:25, Richard Sandiford  wrote:
> 
> Kyrylo Tkachov  writes:
>>> On 25 Oct 2024, at 13:46, Richard Sandiford  
>>> wrote:
>>> 
>>> Kyrylo Tkachov  writes:
 Thank you for the suggestions! I’m trying them out now.
 
>> +  if (rotamnt % BITS_PER_UNIT != 0)
>> +return NULL_RTX;
>> +  machine_mode qimode;
>> +  if (!qimode_for_vec_perm (mode).exists (&qimode))
>> +return NULL_RTX;
>> +
>> +  vec_perm_builder builder;
>> +  unsigned nunits = GET_MODE_SIZE (GET_MODE_INNER (mode));
> 
> simpler as GET_MODE_UNIT_SIZE
> 
>> +  unsigned total_units;
>> +  /* TODO: Handle VLA vector rotates?  */
>> +  if (!GET_MODE_SIZE (mode).is_constant (&total_units))
>> +return NULL_RTX;
> 
> Yeah.  I think we can do that by changing:
> 
>> +  builder.new_vector (total_units, 1, total_units);
> 
> to:
> 
> builder.new_vector (total_units, 3, units);
 
 I think units here is the size in units of the fixed-width component of 
 the mode? So e.g. 16 for V4SI and VNx4SI but 8 for V4HI and VN4HI?
>>> 
>>> Ah, no, sorry, I meant "nunits" rather than "units", with "nunits"
>>> being the same as for your code.  So for V4SI and VNx4SI we'd push
>>> 12 elements total, as 4 (nunits) "patterns" of 3 elements each.
>>> The first argument (total_units) is just GET_MODE_SIZE (mode)
>>> in all its poly_int glory.
>> 
>> Hmm, I’m afraid I’m lost again. For V4SI we have a vector of 16 bytes, how 
>> can 12 indices be enough to describe the permute?
>> With this scheme we do end up pushing 12 elements, in the order: 
>> 2,3,0,1,6,7,4,5,10,11,8,9 .
>> In the final RTX emitted in the instruction stream this seems to end up as:
>>(const_vector:V16QI [
>>(const_int 2 [0x2])
>>(const_int 3 [0x3])
>>(const_int 0 [0])
>>(const_int 1 [0x1])
>>(const_int 6 [0x6])
>>(const_int 7 [0x7])
>>(const_int 4 [0x4])
>>(const_int 5 [0x5])
>>(const_int 10 [0xa])
>>(const_int 11 [0xb])
>>(const_int 8 [0x8])
>>(const_int 9 [0x9]) repeated x2
>>(const_int 14 [0xe])
>>(const_int 7 [0x7])
>>(const_int 0 [0])
>>])
>> 
>> So the first 12 elements are indeed correct, but the last 4 elements are not.
> 
> Gah, sorry, I got the arguments the wrong way around.  It should be:
> 
>   builder.new_vector (GET_MODE_SIZE (mode), nunits, 3);
> 
> (4 patterns, 3 elements per pattern)
> 

Thanks! That works.
I’ve resubmitted a fixed patch with 
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/08.html (along with 
other updates in the series)
Kyrill

> Thanks,
> Richard



Re: [pushed] doc, fortran: Add a missing menu item.

2024-10-27 Thread Thomas Koenig

On 27.10.24 at 00:15, Iain Sandoe wrote:

Tested on x86_64-darwin21 and linux, with makeinfo 6.7 pushed to trunk,
thanks


Thanks!

For the record, makeinfo 6.8 did not show this as an error.

Best regards

Thomas



[PATCH] Match: Optimize log (x) CMP CST and exp (x) CMP CST operations

2024-10-27 Thread Soumya AR
This patch implements transformations for the following optimizations.

logN(x) CMP CST -> x CMP expN(CST)
expN(x) CMP CST -> x CMP logN(CST)

For example:

int
foo (float x)
{
  return __builtin_logf (x) < 0.0f;
}

can just be:

int
foo (float x)
{
  return x < 1.0f;
} 
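
Similarly, for the exp side (my own sketch, not taken from the patch's
testcase):

int
bar (float x)
{
  return __builtin_expf (x) < 1.0f;
}

can become:

int
bar (float x)
{
  return x < 0.0f;
}

since expf is monotonic and expf (0.0f) == 1.0f.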

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR 

gcc/ChangeLog:

* match.pd: Fold logN(x) CMP CST -> x CMP expN(CST)
and expN(x) CMP CST -> x CMP logN(CST)

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/log_exp.c: New test.



0001-Match-Optimize-log-x-CMP-CST-and-exp-x-CMP-CST-opera.patch
Description: 0001-Match-Optimize-log-x-CMP-CST-and-exp-x-CMP-CST-opera.patch


Re: [pushed] doc, fortran: Add a missing menu item.

2024-10-27 Thread Iain Sandoe



> On 27 Oct 2024, at 08:08, Thomas Koenig  wrote:
> 
> On 27.10.24 at 00:15, Iain Sandoe wrote:
>> Tested on x86_64-darwin21 and linux, with makeinfo 6.7 pushed to trunk,
>> thanks

> For the record, makeinfo 6.8 did not show this as an error.

Hmm that’s maybe a regression in texinfo 6.8 then, because the entry was, 
indeed,
missing.  According to our installation pages we only require >= 4.7 (although, 
for
some reason, I was under the impression that had been bumped up recently).

Anyway .. resolved for now
cheers
Iain



[PATCH] Fix MV clones can not redirect to specific target on some targets

2024-10-27 Thread Yangyu Chen
Following the implementation of commit b8ce8129a5 ("Redirect call
within specific target attribute among MV clones (PR ipa/82625)"),
we can now optimize calls by invoking a versioned function callee
from a caller that shares the same target attribute. However, on
targets that define TARGET_HAS_FMV_TARGET_ATTRIBUTE to zero, meaning
they use the "target_version" attribute instead of "target", this
optimization is not feasible. Currently, the only target affected
by this limitation is AArch64.

This commit resolves the issue by not directly using "target" with
lookup_attribute. Instead, it checks the TARGET_HAS_FMV_TARGET_ATTRIBUTE
macro to decide between using the "target" or "target_version"
attribute.

Fixes: 79891c4cb5 ("Add support for target_version attribute")

gcc/ChangeLog:

* multiple_target.cc (redirect_to_specific_clone): Fix redirection
not working on targets without TARGET_HAS_FMV_TARGET_ATTRIBUTE.
---
 gcc/multiple_target.cc | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index d2c9671fc1b..a1c18f4a3a7 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -446,8 +446,10 @@ redirect_to_specific_clone (cgraph_node *node)
   cgraph_function_version_info *fv = node->function_version ();
   if (fv == NULL)
 return;
+  const char *fmv_attr = (TARGET_HAS_FMV_TARGET_ATTRIBUTE
+ ? "target" : "target_version");
 
-  tree attr_target = lookup_attribute ("target", DECL_ATTRIBUTES (node->decl));
+  tree attr_target = lookup_attribute (fmv_attr, DECL_ATTRIBUTES (node->decl));
   if (attr_target == NULL_TREE)
 return;
 
@@ -458,7 +460,7 @@ redirect_to_specific_clone (cgraph_node *node)
   if (!fv2)
continue;
 
-  tree attr_target2 = lookup_attribute ("target",
+  tree attr_target2 = lookup_attribute (fmv_attr,
DECL_ATTRIBUTES (e->callee->decl));
 
   /* Function is not calling proper target clone.  */
@@ -472,7 +474,7 @@ redirect_to_specific_clone (cgraph_node *node)
  for (; fv2 != NULL; fv2 = fv2->next)
{
  cgraph_node *callee = fv2->this_node;
- attr_target2 = lookup_attribute ("target",
+ attr_target2 = lookup_attribute (fmv_attr,
   DECL_ATTRIBUTES (callee->decl));
  if (attr_target2 != NULL_TREE
  && attribute_value_equal (attr_target, attr_target2))
-- 
2.47.0



[PATCH] vec-lowering: Fix ABSU lowering [PR111285]

2024-10-27 Thread Andrew Pinski
ABSU_EXPR lowering incorrectly used the resulting type for the new
expression, but in the case of ABSU the resulting type is an unsigned
type, and an ABSU of an unsigned operand gets folded away. The fix
is to use a signed type for the operand instead.
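
For reference, ABSU is the absolute-value operation whose result has the
corresponding unsigned type; roughly (my own sketch):

unsigned int
absu (int a)
{
  /* Unlike abs (), the result type is unsigned, so absu (INT_MIN) is
     well defined.  */
  return a < 0 ? -(unsigned int) a : (unsigned int) a;
}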

Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/111285

gcc/ChangeLog:

* tree-vect-generic.cc (do_unop): Use a signed type for the
operand if the operation was ABSU_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/torture/vect-absu-1.C: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/torture/vect-absu-1.C | 29 ++
 gcc/tree-vect-generic.cc   | 10 +++-
 2 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/vect-absu-1.C

diff --git a/gcc/testsuite/g++.dg/torture/vect-absu-1.C 
b/gcc/testsuite/g++.dg/torture/vect-absu-1.C
new file mode 100644
index 000..0b2035f638f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/vect-absu-1.C
@@ -0,0 +1,29 @@
+// { dg-do run }
+// PR middle-end/111285
+
+// The lowering of vect absu was done incorrectly
+
+#define vect1 __attribute__((vector_size(sizeof(int))))
+
+#define negabs(a) a < 0 ? a : -a
+
+__attribute__((noinline))
+int s(int a)
+{
+  return negabs(a);
+}
+__attribute__((noinline))
+vect1 int v(vect1 int a)
+{
+  return negabs(a);
+}
+
+int main(void)
+{
+for(int i = -10; i < 10; i++)
+{
+  vect1 int t = {i};
+  if (v(t)[0] != s(i))
+__builtin_abort();
+}
+}
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index ef7d2dd259d..21d906e9c55 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -168,7 +168,15 @@ do_unop (gimple_stmt_iterator *gsi, tree inner_type, tree 
a,
 tree b ATTRIBUTE_UNUSED, tree bitpos, tree bitsize,
 enum tree_code code, tree type ATTRIBUTE_UNUSED)
 {
-  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  tree rhs_type = inner_type;
+
+  /* For ABSU_EXPR, use the signed type for the rhs if the rhs was signed. */
+  if (code == ABSU_EXPR
+  && ANY_INTEGRAL_TYPE_P (TREE_TYPE (a))
+  && !TYPE_UNSIGNED (TREE_TYPE (a)))
+rhs_type = signed_type_for (rhs_type);
+
+  a = tree_vec_extract (gsi, rhs_type, a, bitsize, bitpos);
   return gimplify_build1 (gsi, code, inner_type, a);
 }
 
-- 
2.43.0



[PATCH] phiopt: Move check for maybe_undef_p slightly earlier

2024-10-27 Thread Andrew Pinski
This moves the check for maybe_undef_p in match_simplify_replacement
slightly earlier, before figuring out the true/false arg, using arg0/arg1
directly instead.
In most cases this makes no difference in compile time; only when one of
the args is undefined is there a slight compile-time improvement, since
there is no longer any reason to figure out which arg corresponds to the
true/false side of the conditional.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (match_simplify_replacement): Move
check for maybe_undef_p earlier.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index f8b119ea836..cffafe101a4 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -943,6 +943,13 @@ match_simplify_replacement (basic_block cond_bb, 
basic_block middle_bb,
  stmt_to_move_alt))
 return false;
 
+  /* Do not make conditional undefs unconditional.  */
+  if ((TREE_CODE (arg0) == SSA_NAME
+   && ssa_name_maybe_undef_p (arg0))
+  || (TREE_CODE (arg1) == SSA_NAME
+ && ssa_name_maybe_undef_p (arg1)))
+return false;
+
 /* At this point we know we have a GIMPLE_COND with two successors.
  One successor is BB, the other successor is an empty block which
  falls through into BB.
@@ -982,13 +989,6 @@ match_simplify_replacement (basic_block cond_bb, 
basic_block middle_bb,
   arg_false = arg0;
 }
 
-  /* Do not make conditional undefs unconditional.  */
-  if ((TREE_CODE (arg_true) == SSA_NAME
-   && ssa_name_maybe_undef_p (arg_true))
-  || (TREE_CODE (arg_false) == SSA_NAME
- && ssa_name_maybe_undef_p (arg_false)))
-return false;
-
   tree type = TREE_TYPE (gimple_phi_result (phi));
   {
 auto_flow_sensitive s1(stmt_to_move);
-- 
2.43.0



Re: counted_by attribute and type compatibility

2024-10-27 Thread Martin Uecker
On Friday, 2024-10-25 at 14:03, Qing Zhao wrote:
> 
> > On Oct 25, 2024, at 08:13, Martin Uecker  wrote:
> > 
> > > > I agree, and error makes sense.  What worries me a little bit
> > > > is tying this to a semantic change in type compatibility.
> > > > 
> > > > typedef struct foo { int n; int m; 
> > > > [[gnu::counted_by(n)]] char buf[]; } aaa_t;
> > > > 
> > > > void foo()
> > > > {
> > > > struct foo { int n; int m;
> > > > [[gnu::counted_by(m)]] char buf[]; } *b;
> > > > 
> > > > ... = _Generic(b, aaa_t*: 1, default: 0); 
> > > > }
> > > > 
> > > > would go into the default branch for compilers supporting 
> > > > the attribute but go into the first branch for others.  Also
> > > > it affects ailasing rules.
> > > 
> > > So, they are in separate compilations? Then the compiler is not able to 
> > > catch such
> > > inconsistency during compilation time. 
> > 
> > I am not entirely sure what you mean by this. 
> > 
> > These are two different types in different scopes, so they
> > are allowed to be different.
> 
> Okay, so the two types, aaa_t and the “struct foo” inside the function “foo”, 
> are two different types. 
> And this is legal. 
> > 
> > But _Generic then tests whether they are compatible and
> > takes the attribute into account for GCC.
> 
> Then, these two types are not compatible due to the attribute, is this 
> correct?

Correct.

> 
> >  But for
> > earlier GCC or other compilers that do not support the
> > attribute the result would be different.
> For a compiler that does not support the “counted_by” attribute, if the
> compiler reports error for the unsupported attribute, then the user needs to
> modify the source code to eliminate the unsupported attribute, then the 
> problem
> should be resolved by the user?

All compilers I know only emit a warning for unknown attributes.

> If the compiler just ignores the unsupported attribute, then these two types
> will be treated compatible types by the compiler. Will doing this cause any
> issue? Since the “counted-by” attribute is not supported by the compiler and
> is ignored by the compiler, these two types should be compatible from my
> understanding, do I miss anything obvious here?

For standard attributes, there is a policy that the attribute should
be ignorable, i.e. removing it from a valid program should not cause
any change in semantics. 

For GCC's attributes this is not necessarily the case, but I still
think it is a good policy in general.  The reason is that as a reviewer
of code you do not need to take subtle effects of attribute into
account.  You can just pretend those do not exist when analyzing
core semantics, which reduces cognitive load and specific knowledge
one has to have to understand what is going on.

I do not think it is a big issue, but I think it would be better
if removing / ignoring the attribute would *not* cause a change in 
program semantics.

Martin

> 
> Qing
> > 
> > So maybe instead of changing the return value of comptypes,
> > we simply set different_types_p (which would prevent
> > redeclaration in the same scope) and also set another flag 
> > similar to enum_and_int_p (e.g. inconsistent_counted_by_p)
> > and emit an error in the callers at some appropriate places.
> > 
> > > > 
> > > > But maybe this is not a problem.
> > > This does look like an issue to me…
> > > Not sure how to resolve such issue at this moment.
> > > 
> > > Or, only when the “counted_by” information is included into the TYPE, 
> > > such issue can be resolved?
> > 
> > > 
> > > > 
> > > > > > 
> > > > > > But I was thinking about the case where you have a type with
> > > > > > a counted_by attribute and one without. Using them together
> > > > > > seems useful, e.g. to add a counted_by in your local version
> > > > > > of a type which needs to be compatible to some API.
> > > > > 
> > > > > For API compatibility purpose, yes, I agree here. 
> > > > > A stupid question here: if one is defined locally, the other one
> > > > > is NOT defined locally, can such inconsistency be caught by the
> > > > > same compilation (is this the LTO compilation?)
> > > > 
> > > > If there is separate compilation this is not catched. LTO
> > > > has a much coarser notion of types and would not notice
> > > > either (I believe).
> > > 
> > > Okay. Then such inconsistency will not be caught during compilation time.
> > 
> > Yeah, but here we will miss many other inconsistencies too...
> > 
> > > 
> > > > 
> > > > > Suppose we can catch such inconsistency in the same compilation,
> > > > > which version we should keep? I guess that we should keep the
> > > > > version without the counted_by attribute? 
> > > > > 
> > > > I would keep the one with the attribute, because this is the
> > > > one which has more information. 
> > > Make sense to me
> > > 
> > 
> > 
> > Martin
> > 
> > > .
> > > 
> > > Thanks.
> > > Qing
> > > > 
> > > > 
> > > > Martin
> > > 
> > > 
> > 
> > -- 
> > Univ.-Prof. Dr. rer. nat. Martin Uecker
> > Graz Univers

[patch, Fortran] Introduce unsigned versions of MASKL and MASKR

2024-10-27 Thread Thomas Koenig

Hello world,

MASKR and MASKL are obvious candidates for unsigned, too; in the
previous version of the doc patch, I had promised that these would
take unsigned arguments in the future. What I had in mind was
they could take an unsigned argument and return an unsigned result.

Thinking about this a bit more, I realized that this was actually a
bad idea; nowhere else do we allow UNSIGNED for bit counting, and things
like checking for negative number of bits (which is illegal) would not
work.

Hence, two new intrinsics, UMASKL and UMASKR.  Regression-tested
(and this time, I added the intrinsics to the list, so no trouble
expected there :-)

OK for trunk?

Best regards

Thomas

gcc/fortran/ChangeLog:

* check.cc (gfc_check_mask): Handle BT_UNSIGNED.
* gfortran.h (enum gfc_isym_id): Add GFC_ISYM_UMASKL and
GFC_ISYM_UMASKR.
* gfortran.texi: List UMASKL and UMASKR, remove unsigned future
unsigned arguments for MASKL and MASKR.
* intrinsic.cc (add_functions): Add UMASKL and UMASKR.
* intrinsic.h (gfc_simplify_umaskl): New function.
(gfc_simplify_umaskr): New function.
(gfc_resolve_umasklr): New function.
* intrinsic.texi: Document UMASKL and UMASKR.
* iresolve.cc (gfc_resolve_umasklr): New function.
* simplify.cc (gfc_simplify_umaskr): New function.
(gfc_simplify_umaskl): New function.

gcc/testsuite/ChangeLog:

* gfortran.dg/unsigned_39.f90: New test.diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index 304ca1b9ae8..2d4af8e7df3 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -4466,7 +4466,12 @@ gfc_check_mask (gfc_expr *i, gfc_expr *kind)
 {
   int k;
 
-  if (!type_check (i, 0, BT_INTEGER))
+  if (flag_unsigned)
+{
+  if (!type_check2 (i, 0, BT_INTEGER, BT_UNSIGNED))
+	return false;
+}
+  else if (!type_check (i, 0, BT_INTEGER))
 return false;
 
   if (!nonnegative_check ("I", i))
@@ -4478,7 +4483,7 @@ gfc_check_mask (gfc_expr *i, gfc_expr *kind)
   if (kind)
 gfc_extract_int (kind, &k);
   else
-k = gfc_default_integer_kind;
+k = i->ts.type == BT_UNSIGNED ? gfc_default_unsigned_kind : gfc_default_integer_kind;
 
   if (!less_than_bitsizekind ("I", i, k))
 return false;
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index dd599bc97a2..309095d74d5 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -699,6 +699,8 @@ enum gfc_isym_id
   GFC_ISYM_UBOUND,
   GFC_ISYM_UCOBOUND,
   GFC_ISYM_UMASK,
+  GFC_ISYM_UMASKL,
+  GFC_ISYM_UMASKR,
   GFC_ISYM_UNLINK,
   GFC_ISYM_UNPACK,
   GFC_ISYM_VERIFY,
diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 3b2691649b0..429d8461f8f 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -2825,16 +2825,11 @@ The following intrinsics take unsigned arguments:
 The following intinsics are enabled with @option{-funsigned}:
 @itemize @bullet
 @item @code{UINT}, @pxref{UINT}
+@item @code{UMASKL}, @pxref{UMASKL}
+@item @code{UMASKR}, @pxref{UMASKR}
 @item @code{SELECTED_UNSIGNED_KIND}, @pxref{SELECTED_UNSIGNED_KIND}
 @end itemize
 
-The following intrinsics will take unsigned arguments
-in the future:
-@itemize @bullet
-@item @code{MASKL}, @pxref{MASKL}
-@item @code{MASKR}, @pxref{MASKR}
-@end itemize
-
 The following intrinsics are not yet implemented in GNU Fortran,
 but will take unsigned arguments once they have been:
 @itemize @bullet
diff --git a/gcc/fortran/intrinsic.cc b/gcc/fortran/intrinsic.cc
index 83b65d34e43..3fb1c63bbd4 100644
--- a/gcc/fortran/intrinsic.cc
+++ b/gcc/fortran/intrinsic.cc
@@ -2568,6 +2568,22 @@ add_functions (void)
 
   make_generic ("maskr", GFC_ISYM_MASKR, GFC_STD_F2008);
 
+  add_sym_2 ("umaskl", GFC_ISYM_UMASKL, CLASS_ELEMENTAL, ACTUAL_NO,
+	 BT_INTEGER, di, GFC_STD_F2008,
+	 gfc_check_mask, gfc_simplify_umaskl, gfc_resolve_umasklr,
+	 i, BT_INTEGER, di, REQUIRED,
+	 kind, BT_INTEGER, di, OPTIONAL);
+
+  make_generic ("umaskl", GFC_ISYM_UMASKL, GFC_STD_F2008);
+
+  add_sym_2 ("umaskr", GFC_ISYM_UMASKR, CLASS_ELEMENTAL, ACTUAL_NO,
+	 BT_INTEGER, di, GFC_STD_F2008,
+	 gfc_check_mask, gfc_simplify_umaskr, gfc_resolve_umasklr,
+	 i, BT_INTEGER, di, REQUIRED,
+	 kind, BT_INTEGER, di, OPTIONAL);
+
+  make_generic ("umaskr", GFC_ISYM_UMASKR, GFC_STD_F2008);
+
   add_sym_2 ("matmul", GFC_ISYM_MATMUL, CLASS_TRANSFORMATIONAL, ACTUAL_NO, BT_REAL, dr, GFC_STD_F95,
 	 gfc_check_matmul, gfc_simplify_matmul, gfc_resolve_matmul,
 	 ma, BT_REAL, dr, REQUIRED, mb, BT_REAL, dr, REQUIRED);
diff --git a/gcc/fortran/intrinsic.h b/gcc/fortran/intrinsic.h
index ea29219819d..61d85eedc69 100644
--- a/gcc/fortran/intrinsic.h
+++ b/gcc/fortran/intrinsic.h
@@ -434,6 +434,8 @@ gfc_expr *gfc_simplify_transpose (gfc_expr *);
 gfc_expr *gfc_simplify_trim (gfc_expr *);
 gfc_expr *gfc_simplify_ubound (gfc_expr *, gfc_expr *, gfc_expr *);
 gfc_expr *gfc_simplify_ucobound (gfc_expr *, gfc_expr *, gfc_expr *);

[Patch, fortran] [13-15 regressions] PR115070 & 115348

2024-10-27 Thread Paul Richard Thomas
Pushed as 'obvious' in commit r15-4702. This patch has been on my tree
since July so I thought to get it out of the way before it died of bit-rot.
Will backport in a week.

Fortran: Fix regressions with intent(out) class[PR115070, PR115348].

2024-10-27  Paul Thomas  

gcc/fortran
PR fortran/115070
PR fortran/115348
* trans-expr.cc (gfc_trans_class_init_assign): If all the
components of the default initializer are null for a scalar,
build an empty statement to prevent prior declarations from
disappearing.

gcc/testsuite/
PR fortran/115070
* gfortran.dg/pr115070.f90: New test.

PR fortran/115348
* gfortran.dg/pr115348.f90: New test.

Paul


[r15-4702 Regression] FAIL: gfortran.dg/pr115070.f90 -O (test for excess errors) on Linux/x86_64

2024-10-27 Thread haochen.jiang
On Linux/x86_64,

ed8ca972f8857869d2bb4a416994bb896eb1c34e is the first bad commit
commit ed8ca972f8857869d2bb4a416994bb896eb1c34e
Author: Paul Thomas 
Date:   Sun Oct 27 12:40:42 2024 +

Fortran: Fix regressions with intent(out) class[PR115070, PR115348].

caused

FAIL: gfortran.dg/pr115070.f90   -O  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-4702/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/pr115070.f90 --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/pr115070.f90 --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/pr115070.f90 --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/pr115070.f90 --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command 
line might work around them.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH] xtensa: Define TARGET_DIFFERENT_ADDR_DISPLACEMENT_P target hook

2024-10-27 Thread Max Filippov
On Tue, Oct 22, 2024 at 7:31 PM Takayuki 'January June' Suwa
 wrote:
>
> In commit bc5a9dab55d13f888a3cdd150c8cf5c2244f35e0 ("gcc: xtensa: reorder
> movsi_internal patterns for better code generation during LRA"),  the
> instruction order in "movsi_internal" MD definition was changed to make LRA
> use load/store instructions with larger memory address displacements, but as
> a side effect, it now uses the larger displacements (i.e., the larger
> instructions) even outside of reload operations.
>
> The underlying problem is that LRA assumes by default that there is only one
> maximal legitimate displacement for the same address structure, meaning that
> it has no choice but to use the first load/store instruction it finds.
>
> To fix this, define TARGET_DIFFERENT_ADDR_DISPLACEMENT_P hook to always
> return true.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.cc (TARGET_DIFFERENT_ADDR_DISPLACEMENT_P):
> Add new target hook to always return true.
> * config/xtensa/xtensa.md (movsi_internal):
> Revert the previous changes.
> ---
>   gcc/config/xtensa/xtensa.cc |  3 +++
>   gcc/config/xtensa/xtensa.md | 12 ++--
>   2 files changed, 9 insertions(+), 6 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master

-- 
Thanks.
-- Max


Re: [PATCH v4 2/2] RISC-V: Add testcases for unsigned .SAT_SUB form 2 with IMM = 1.

2024-10-27 Thread Jeff Law




On 10/24/24 7:22 PM, Li Xu wrote:

From: xuli 

form2:
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
{   \
   return x >= (T)IMM ? x - (T)IMM : 0;  \
}
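
Instantiated with T = uint8_t and IMM = 1 (my own expansion of the macro
above, for illustration), form 2 is simply:

#include <stdint.h>

uint8_t __attribute__ ((noinline))
sat_u_sub_imm1_uint8_t_fmt_2 (uint8_t x)
{
  return x >= (uint8_t) 1 ? x - (uint8_t) 1 : 0;
}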

Passed the rv64gcv regression test.

Signed-off-by: Li Xu 
gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_u_sub_imm-run-5.c: add run case for imm=1.
* gcc.target/riscv/sat_u_sub_imm-run-6.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-run-7.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-run-8.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-5_3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6_3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7_3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-8_1.c: New test.

This is fine once the prerequisite patch is installed.

Thanks,
jeff



Re: [PATCH #1/7] allow vuses in ifcombine blocks

2024-10-27 Thread Jeff Law




On 10/25/24 5:50 AM, Alexandre Oliva wrote:


Disallowing vuses in blocks for ifcombine is too strict, and it
prevents usefully moving fold_truth_andor into ifcombine.  That
tree-level folder has long ifcombined loads, absent other relevant
side effects.


for  gcc/ChangeLog

* tree-ssa-ifcombine.c (bb_no_side_effects_p): Allow vuses,
but not vdefs.

OK
jeff



Re: [PATCH #2/7] drop redundant ifcombine_ifandif parm

2024-10-27 Thread Jeff Law




On 10/25/24 5:51 AM, Alexandre Oliva wrote:


In preparation to changes that may modify both inner and outer
conditions in ifcombine, drop the redundant parameter result_inv, that
is always identical to inner_inv.


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (ifcombine_ifandif): Drop redundant
result_inv parm.  Adjust all callers.

OK
jeff



Re: [PATCH 6/6] simplify-rtx: Simplify ROTATE:HI (X:HI, 8) into BSWAP:HI (X)

2024-10-27 Thread Jeff Law




On 10/24/24 12:24 AM, Kyrylo Tkachov wrote:




On 24 Oct 2024, at 07:36, Jeff Law  wrote:



On 10/22/24 2:26 PM, Kyrylo Tkachov wrote:

Hi all,
With recent patch to improve detection of vector rotates at RTL level
combine now tries matching a V8HImode rotate by 8 in the example in the
testcase.  We can teach AArch64 to emit a REV16 instruction for such a rotate
but really this operation corresponds to the RTL code BSWAP, for which we
already have the right patterns.  BSWAP is arguably a simpler representation
than ROTATE here because it has only one operand, so let's teach simplify-rtx
to generate it.
With this patch the testcase now generates the simplest form:
.L2:
 ldr q31, [x1, x0]
 rev16   v31.16b, v31.16b
 str q31, [x0, x2]
 add x0, x0, 16
 cmp x0, 2048
 bne .L2
instead of the previous:
.L2:
 ldr q31, [x1, x0]
 shl v30.8h, v31.8h, 8
 usra    v30.8h, v31.8h, 8
 str q30, [x0, x2]
 add x0, x0, 16
 cmp x0, 2048
 bne .L2
IMO ideally the bswap detection would have been done during vectorisation
time and used the expanders for that, but teaching simplify-rtx to do this
transformation is fairly straightforward and, unlike at tree level, we have
the native RTL BSWAP code.  This change is not enough to generate the
equivalent sequence in SVE, but that is something that should be tackled
separately.
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov
gcc/
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Simplify (rotate:HI x:HI, 8) -> (bswap:HI x:HI).
gcc/testsuite/
* gcc.target/aarch64/rot_to_bswap.c: New test.
v2-0006-simplify-rtx-Simplify-ROTATE-HI-X-HI-8-into-BSWAP-HI.patch
 From 79e6dcf698361eae46d0e99f851077199a8ce43a Mon Sep 17 00:00:00 2001
From: Kyrylo Tkachov
Date: Thu, 17 Oct 2024 06:39:57 -0700
Subject: [PATCH 6/6] simplify-rtx: Simplify ROTATE:HI (X:HI, 8) into BSWAP:HI
  (X)
With recent patch to improve detection of vector rotates at RTL level
combine now tries matching a V8HImode rotate by 8 in the example in the
testcase.  We can teach AArch64 to emit a REV16 instruction for such a rotate
but really this operation corresponds to the RTL code BSWAP, for which we
already have the right patterns.  BSWAP is arguably a simpler representation
than ROTATE here because it has only one operand, so let's teach simplify-rtx
to generate it.
With this patch the testcase now generates the simplest form:
.L2:
 ldr q31, [x1, x0]
 rev16   v31.16b, v31.16b
 str q31, [x0, x2]
 add x0, x0, 16
 cmp x0, 2048
 bne .L2
instead of the previous:
.L2:
 ldr q31, [x1, x0]
 shl v30.8h, v31.8h, 8
 usra    v30.8h, v31.8h, 8
 str q30, [x0, x2]
 add x0, x0, 16
 cmp x0, 2048
 bne .L2
IMO ideally the bswap detection would have been done during vectorisation
time and used the expanders for that, but teaching simplify-rtx to do this
transformation is fairly straightforward and, unlike at tree level, we have
the native RTL BSWAP code.  This change is not enough to generate the
equivalent sequence in SVE, but that is something that should be tackled
separately.
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov
gcc/
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Simplify (rotate:HI x:HI, 8) -> (bswap:HI x:HI).
gcc/testsuite/
* gcc.target/aarch64/rot_to_bswap.c: New test.
---
  gcc/simplify-rtx.cc   |  6 +
  .../gcc.target/aarch64/rot_to_bswap.c | 23 +++
  2 files changed, 29 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/rot_to_bswap.c
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 089e03c2a7a..205a251f005 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -4328,6 +4328,12 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
mode, op0, new_amount_rtx);
   }
  #endif
+  /* ROTATE/ROTATERT:HI (X:HI, 8) is BSWAP:HI (X).  */
+  tem = unwrap_const_vec_duplicate (trueop1);
+  if (GET_MODE_UNIT_BITSIZE (mode) == (2 * BITS_PER_UNIT)
+  && CONST_INT_P (tem) && INTVAL (tem) == BITS_PER_UNIT)
+ return simplify_gen_unary (BSWAP, mode, op0, mode);

So what about other modes?  I haven't really pondered this, but isn't there 
something similar for ROTATE:SI (X:SI, 16)?  I guess the basic question is 
whether or not this really needs to be limited to HImode.



A (ROTATE:SI (X:SI, 16)) would represent a half-word swap, rather than a 
byte-swap. For example, 0x12345678 rotated by 16 gives 0x56781234 whereas a 
bswap would give 0x78563412.
AArch64 does have native operations that perform these half-word (and word) 
swaps, but they are not RTL BSWAP operations unfortunately.
So this pattern effectively only works for HI and vector HI 

[committed] libstdc++: Fix std::vector::emplace to forward parameter

2024-10-27 Thread Jonathan Wakely
If the parameter is not lvalue-convertible to bool then the current code
will fail to compile. The parameter should be forwarded to restore the
original value category.

libstdc++-v3/ChangeLog:

* include/bits/stl_bvector.h (emplace_back, emplace): Forward
parameter pack to preserve value category.
* testsuite/23_containers/vector/bool/emplace_rvalue.cc: New
test.
---
Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/bits/stl_bvector.h   |  4 ++--
 .../vector/bool/emplace_rvalue.cc | 24 +++
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/vector/bool/emplace_rvalue.cc

diff --git a/libstdc++-v3/include/bits/stl_bvector.h 
b/libstdc++-v3/include/bits/stl_bvector.h
index 42261ac5915..70f69b5b5b5 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -1343,7 +1343,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 #endif
emplace_back(_Args&&... __args)
{
- push_back(bool(__args...));
+ push_back(bool(std::forward<_Args>(__args)...));
 #if __cplusplus > 201402L
  return back();
 #endif
@@ -1353,7 +1353,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
_GLIBCXX20_CONSTEXPR
iterator
emplace(const_iterator __pos, _Args&&... __args)
-   { return insert(__pos, bool(__args...)); }
+   { return insert(__pos, bool(std::forward<_Args>(__args)...)); }
 #endif
 
 protected:
diff --git a/libstdc++-v3/testsuite/23_containers/vector/bool/emplace_rvalue.cc 
b/libstdc++-v3/testsuite/23_containers/vector/bool/emplace_rvalue.cc
new file mode 100644
index 000..5dea2426d60
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/vector/bool/emplace_rvalue.cc
@@ -0,0 +1,24 @@
+// { dg-do compile { target c++11 } }
+
+#include <vector>
+
+struct S
+{
+  explicit operator bool() &&;
+};
+
+void
+test_emplace_back()
+{
+  S s;
+  std::vector<bool> v;
+  v.emplace_back(std::move(s));
+}
+
+void
+test_emplace()
+{
+  S s;
+  std::vector<bool> v;
+  v.emplace(v.begin(), std::move(s));
+}
-- 
2.47.0



Re: [PATCH #3/7] introduce ifcombine_replace_cond

2024-10-27 Thread Jeff Law




On 10/25/24 5:52 AM, Alexandre Oliva wrote:


Refactor ifcombine_ifandif, moving the common code from the various
paths that apply the combined condition to a new function.


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Factor out
of...
(ifcombine_ifandif): ... this.
It looks like you also did some simplifications in ifcombine_ifandif. 
Those should be noted in the ChangeLog.  Specifically you no longer make 
the calls to force_gimple_operand_gsi and simplified the equality test.


OK with that change.

jeff




Re: [PATCH] RISC-V: Remove skip of decl in registered_function.

2024-10-27 Thread Jeff Law




On 10/22/24 12:24 AM, KuanLin Chen wrote:

The GTY skip makes GGC wrongly collect the registered functions during LTO.

Example:
riscv64-unknown-elf-gcc -flto gcc/testsuite/gcc.target/riscv/rvv/base/bug-3.c
-O2 -march=rv64gcv

In file included from bug-3.c:2: internal compiler error: Segmentation fault

gcc/ChangeLog:

 * config/riscv/riscv-vector-builtins.cc (registered_function): Remove
 skip of decl.

How was this tested?

I put it through a regression testsuite run and it resulted in about 
4700 new failures for both riscv32-elf and riscv64-elf.


Patches need to be regression tested.

Jeff





Re: [PATCH] RISC-V: Fix rvv builtin function groups registration asynchronously.

2024-10-27 Thread Jeff Law




On 10/22/24 12:26 AM, KuanLin Chen wrote:

Originally, cc1 registers the RVV builtins with all vector sub-extensions
turned on, but lto1 does not.  This makes LTO use out-of-sync
DECL_MD_FUNCTION_CODE values from the lto-objects.

Example:
riscv64-unknown-elf-gcc -flto gcc/testsuite/gcc.target/riscv/rvv/base/bug-3.c
-O2 -march=rv64gcv

bug-3.c: In function 'main':
bug-3.c:10:3: error: invalid argument to built-in function
10 |   __riscv_vse32_v_i32m1 (d, vd, 1);

gcc/ChangeLog:

 * config/riscv/riscv-c.cc
   (riscv_pragma_intrinsic_flags_pollute): Move to
   riscv-vector-builtins.cc
   (riscv_pragma_intrinsic_flags_restore): Ditto
   (riscv_ext_version_value): Remove flags initialization.
 * config/riscv/riscv-vector-builtins.cc:
   (reinit_builtins): Remove handle_pragma_vector in lto_p.
   (riscv_pragma_intrinsic_flags_pollute): Cut from riscv-c.cc.
   (riscv_pragma_intrinsic_flags_restore): Ditto.
   (riscv_vector_push_setting): Backup flags.
   (riscv_vector_pop_setting): Restore flags.
   (handle_pragma_vector): Initialize flags for registering
   builtins.


You need to run the regression testsuite and verify that there are no 
new failures after your patch compared to a run without your patch.


jeff


Re: [PATCH v2] testsuite: Sanitize pacbti test cases for Cortex-M

2024-10-27 Thread Torbjorn SVENSSON




On 2024-10-25 12:30, Richard Earnshaw (lists) wrote:

On 14/10/2024 13:23, Christophe Lyon wrote:



On 10/13/24 19:50, Torbjörn SVENSSON wrote:

Ok for trunk and releases/gcc-14?

Changes since v1:

- Dropped changes to dg- instructions. These will be addressed in a separate 
set of patches later.


LGTM, let's avoid mixing changes.


This is OK, though I think in most (but not all) cases the additional matches 
on a tab are unnecessary when the instruction takes arguments.  The problem 
cases are mostly for instructions that do not take any arguments (or where we 
don't try to match them).


I did not include it in v1, but it was suggested in 
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662113.html and 
I did not see that it would hurt, so I included it.


Pushed as r15-4701-g6ad29a858ba and r14.2.0-317-gec9bd14144a.

Kind regards,
Torbjörn


Re: [PATCH v2 8/8] RISC-V: Add else operand to masked loads [PR115336].

2024-10-27 Thread Jeff Law




On 10/18/24 8:22 AM, Robin Dapp wrote:

This patch adds else operands to masked loads.  Currently the default
else operand predicate accepts "undefined" (i.e. SCRATCH) as well as
all-ones values.

Note that this series introduces a large number of new RVV FAILs for
riscv.  All of them are due to us not being able to elide redundant
vec_cond_exprs.

PR middle-end/115336
PR middle-end/116059

gcc/ChangeLog:

* config/riscv/autovec.md: Add else operand.
* config/riscv/predicates.md (maskload_else_operand): New
predicate.
* config/riscv/riscv-v.cc (get_else_operand): Remove static.
(expand_load_store): Use get_else_operand and adjust index.
(expand_gather_scatter): Ditto.
(expand_lanes_load_store): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr115336.c: New test.
* gcc.target/riscv/rvv/autovec/pr116059.c: New test.

OK once prereqs are resolved.

jeff



Re: [PATCH #4/7] adjust update_profile_after_ifcombine for noncontiguous ifcombine

2024-10-27 Thread Jeff Law




On 10/25/24 5:54 AM, Alexandre Oliva wrote:


Prepare for ifcombining noncontiguous blocks, adding (still unused)
logic to the ifcombine profile updater to handle such cases.


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (known_succ_p): New.
(update_profile_after_ifcombine): Handle noncontiguous blocks.

OK
jeff