date:20220813

Re: [x86 PATCH take #2] Move V1TI shift/rotate lowering from expand to pre-reload split.

2022-08-13 Thread Uros Bizjak via Gcc-patches

On Fri, Aug 12, 2022 at 11:24 PM Roger Sayle wrote:
>
>
> Hi Uros,
> As requested, here's an updated version of my patch that introduces a new
> const_0_to_255_not_mul_8_operand as you've requested.  I think in this
> instance, having mutually exclusive patterns that can appear in any order,
> without imposing implicit ordering constraints, is slightly preferable,
> especially as (thanks to STV)  some related patterns may appear in
> sse.md and others appear in i386.md (making ordering tricky).
>
> This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-08-12  Roger Sayle  
> Uroš Bizjak  
>
> gcc/ChangeLog
> * config/i386/predicates.md (const_0_to_255_not_mul_8_operand):
> New predicate for values between 0/1 and 255, not multiples of 8.
> * config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
> shifts by constant bit counts.
> (*ashlvti3_internal): New define_insn_and_split that lowers
> logical left shifts by constant bit counts, that aren't multiples
> of 8, before reload.
> (lshrv1ti3): Delay lowering of logical right shifts by constant.
> (*lshrv1ti3_internal): New define_insn_and_split that lowers
> logical right shifts by constant bit counts, that aren't multiples
> of 8, before reload.
> (ashrv1ti3): Delay lowering of arithmetic right shifts by
> constant bit counts.
> (*ashrv1ti3_internal): New define_insn_and_split that lowers
> arithmetic right shifts by constant bit counts before reload.
> (rotlv1ti3): Delay lowering of rotate left by constant.
> (*rotlv1ti3_internal): New define_insn_and_split that lowers
> rotate left by constant bit counts before reload.
> (rotrv1ti3): Delay lowering of rotate right by constant.
> (*rotrv1ti3_internal): New define_insn_and_split that lowers
> rotate right by constant bit counts before reload.

OK with a small nit:

+  "TARGET_SSE2
+   && TARGET_64BIT

Please put these target options on one line, as in many examples
throughout i386.md.

Thanks,
Uros.
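
For readers following along, the range test that the quoted ChangeLog describes for const_0_to_255_not_mul_8_operand can be modelled in plain C roughly as below. This is only a sketch: the function name and the use of a long argument are assumptions made for illustration, and the real predicate is written in the machine-description language in predicates.md, which is not reproduced in this thread.

```c
#include <stdbool.h>

/* Hypothetical plain-C model of the range test named in the ChangeLog:
   accept a constant in [0, 255] that is not a multiple of 8.  Since 0 is
   itself a multiple of 8, the smallest accepted value is 1, which matches
   the "0/1" wording in the ChangeLog entry.  */
bool
const_0_to_255_not_mul_8_p (long value)
{
  return value >= 0 && value <= 255 && (value & 7) != 0;
}
```

With this model, 1, 7 and 255 are accepted, while 0, 8, 64 and 256 are rejected.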

[PATCH] Add pattern to convert vector shift + bitwise and + multiply to vector compare in some cases.

2022-08-13 Thread mtsamis

When using SWAR (SIMD in a register) techniques, a comparison operation within
such a register can be made by using a combination of shifts, bitwise AND, and
multiplication. If code using this scheme is vectorized, then there is potential
to replace all these operations with a single vector comparison, by
reinterpreting the vector types to match the width of the SWAR register.

For example, for the test function packed_cmp_16_32, the original generated
code is:

ldr q0, [x0]
add w1, w1, 1
ushr v0.4s, v0.4s, 15
and v0.16b, v0.16b, v2.16b
shl v1.4s, v0.4s, 16
sub v0.4s, v1.4s, v0.4s
str q0, [x0], 16
cmp w2, w1
bhi .L20

With this pattern, the above can be optimized to:

ldr q0, [x0]
add w1, w1, 1
cmlt v0.8h, v0.8h, #0
str q0, [x0], 16
cmp w2, w1
bhi .L20

The effect is similar for x86-64.
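
To make the equivalence concrete, here is a minimal C sketch of the scalar idiom next to a lane-by-lane reference. The test file is truncated in this archive, so the body of packed_cmp_16_32 below is an assumption inferred from the assembly above (shift by 15, mask, multiply by 2^16 - 1, which the compiler emits as a shift left by 16 followed by a subtract); lanewise_lt_zero_16_32 is a hypothetical helper added only for comparison.

```c
#include <stdint.h>

/* SWAR form: treat a 32-bit word as two packed 16-bit lanes and set every
   bit of each lane whose top (sign) bit is set.  Constants are inferred
   from the assembly in the message, not copied from the test file.  */
uint32_t
packed_cmp_16_32 (uint32_t a)
{
  return ((a >> 15) & 0x00010001U) * 0xffffU;
}

/* Reference form (hypothetical helper): the per-lane signed "less than
   zero" test that a vector compare performs, written out lane by lane.  */
uint32_t
lanewise_lt_zero_16_32 (uint32_t a)
{
  uint32_t result = 0;
  for (int lane = 0; lane < 2; lane++)
    {
      uint16_t bits = (uint16_t) (a >> (16 * lane));
      if (bits & 0x8000U)	/* Sign bit set: the lane is negative.  */
	result |= (uint32_t) 0xffffU << (16 * lane);
    }
  return result;
}
```

Both functions return the same value for every input, for instance 0xffff0000 for the input 0x80000001, which is why a single 16-bit lane compare against zero can stand in for the shift, AND, and multiply sequence.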

gcc/ChangeLog:

* match.pd: Simplify vector shift + bit_and + multiply in some cases.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/swar_to_vec_cmp.c: New test.

Signed-off-by: mtsamis 
---
 gcc/match.pd  | 57 +++
 .../gcc.target/aarch64/swar_to_vec_cmp.c  | 72 +++
 2 files changed, 129 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c
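
The constant checks in the match.pd hunk that follows can be hard to read inside the pattern syntax, so here is a stand-alone C sketch of the same arithmetic. It is not a transcription of the patch: is_packed_cmp and its parameters are hypothetical names, and the sketch validates its inputs before computing anything, which is an ordering chosen here for safety rather than taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the constant checks.  Given the shift amount, the
   vector element width in bits, and the AND and multiply constants, decide
   whether they describe a packed "lane < 0" comparison.  */
bool
is_packed_cmp (unsigned shift, unsigned elem_bits,
	       uint64_t and_cst, uint64_t mult_cst)
{
  /* Shifting by N isolates the top bit of lanes that are N + 1 bits wide.  */
  unsigned cmp_bits = shift + 1;

  /* The lane width must be a nonzero power of two, narrower than 64 bits,
     and no wider than one element (the last test is an extra guard added
     in this sketch).  */
  if (elem_bits == 0 || elem_bits > 64
      || cmp_bits == 0 || cmp_bits >= 64
      || (cmp_bits & (cmp_bits - 1)) != 0
      || cmp_bits > elem_bits)
    return false;

  /* The multiplier must fill a whole lane with ones: 2^cmp_bits - 1.  */
  uint64_t want_mult = (UINT64_C (1) << cmp_bits) - 1;

  /* The mask must keep exactly the lowest bit of every lane.  */
  uint64_t want_and = 0;
  for (unsigned i = 0; i < elem_bits / cmp_bits; i++)
    want_and = (want_and << cmp_bits) | 1;

  return mult_cst == want_mult && and_cst == want_and;
}
```

For the packed_cmp_16_32 case from the commit message, is_packed_cmp (15, 32, 0x00010001, 0xffff) holds, whereas a shift of 14 is rejected because 15 is not a power of two.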

diff --git a/gcc/match.pd b/gcc/match.pd
index 8bbc0dbd5cd..5c768a94846 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -301,6 +301,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (view_convert (bit_and:itype (view_convert @0)
 (ne @1 { build_zero_cst (type); })))
 
+/* In SWAR (SIMD in a register) code a comparison of packed data can
+   be constructed with a particular combination of shift, bitwise and,
+   and multiplication by constants.  If that code is vectorized we can
+   convert this pattern into a more efficient vector comparison.  */
+(simplify
+ (mult (bit_and (rshift @0 @1) @2) @3)
+ (with {
+   tree op_type = TREE_TYPE (@0);
+   tree rshift_cst = NULL_TREE;
+   tree bit_and_cst = NULL_TREE;
+   tree mult_cst = NULL_TREE;
+  }
+  /* Make sure we're working with vectors and uniform vector constants.  */
+  (if (VECTOR_TYPE_P (op_type)
+   && (rshift_cst = uniform_integer_cst_p (@1))
+   && (bit_and_cst = uniform_integer_cst_p (@2))
+   && (mult_cst = uniform_integer_cst_p (@3)))
+   /* Compute what constants would be needed for this to represent a packed
+  comparison based on the shift amount denoted by RSHIFT_CST.  */
+   (with {
+ HOST_WIDE_INT vec_elem_bits = vector_element_bits (op_type);
+ HOST_WIDE_INT vec_nelts = TYPE_VECTOR_SUBPARTS (op_type).to_constant ();
+ HOST_WIDE_INT vec_bits = vec_elem_bits * vec_nelts;
+
+ unsigned HOST_WIDE_INT cmp_bits_i, bit_and_i, mult_i;
+ unsigned HOST_WIDE_INT target_mult_i, target_bit_and_i;
+ cmp_bits_i = tree_to_uhwi (rshift_cst) + 1;
+ target_mult_i = (HOST_WIDE_INT_1U << cmp_bits_i) - 1;
+
+ mult_i = tree_to_uhwi (mult_cst);
+ bit_and_i = tree_to_uhwi (bit_and_cst);
+ target_bit_and_i = 0;
+
+ for (unsigned i = 0; i < vec_elem_bits / cmp_bits_i; i++)
+   target_bit_and_i = (target_bit_and_i << cmp_bits_i) | 1U;
+}
+(if ((exact_log2 (cmp_bits_i)) >= 0
+&& cmp_bits_i < HOST_BITS_PER_WIDE_INT
+&& vec_elem_bits <= HOST_BITS_PER_WIDE_INT
+&& tree_fits_uhwi_p (rshift_cst)
+&& tree_fits_uhwi_p (mult_cst)
+&& tree_fits_uhwi_p (bit_and_cst)
+&& target_mult_i == mult_i
+&& target_bit_and_i == bit_and_i)
+ /* Compute the vector shape for the comparison and check if the target is
+   able to expand the comparison with that type.  */
+ (with {
+   tree bool_type = build_nonstandard_boolean_type (cmp_bits_i);
+   int vector_type_nelts = vec_bits / cmp_bits_i;
+   tree vector_type = build_vector_type (bool_type, vector_type_nelts);
+   tree zeros = build_zero_cst (vector_type);
+   tree mask_type = truth_type_for (vector_type);
+  }
+  (if (expand_vec_cmp_expr_p (vector_type, mask_type, LT_EXPR))
+   (view_convert:op_type (lt:mask_type (view_convert:vector_type @0)
+  { zeros; })))))))))
+
 (for cmp (gt ge lt le)
  outp (convert convert negate negate)
  outn (negate negate convert convert)
diff --git a/gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c b/gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c
new file mode 100644
index 000..26f9ad9ef28
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c
@@ -0,0 +1,72 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+typedef unsigned char uint8_t;
+typedef unsigned short uint16_t;
+typedef unsigned int uint32_t;
+
+/* 8-bit SWAR tests.  */
+
+static uint8_t packed_cmp_8_8(uint8_t a)
+{
+  return ((a >> 7) & 0x1U

[committed] testsuite: Disable out-of-bounds checker in analyzer/torture/pr93451.c

2022-08-13 Thread Tim Lange

This patch disables -Wanalyzer-out-of-bounds for analyzer/torture/pr93451.c
and makes the test case pass when compiled with -m32.

The emitted warning is a true positive but only occurs if
sizeof (long int) is less than sizeof (double). I've already discussed a
similar case with Dave in the context of pr96764.c and we came to the
conclusion that we just disable the checker in such cases.

Committed under the "obvious fix" rule.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/torture/pr93451.c:
Disable Wanalyzer-out-of-bounds.

---
 gcc/testsuite/gcc.dg/analyzer/torture/pr93451.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/pr93451.c b/gcc/testsuite/gcc.dg/analyzer/torture/pr93451.c
index 5908bc4b69f..daac745d504 100644
--- a/gcc/testsuite/gcc.dg/analyzer/torture/pr93451.c
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/pr93451.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-Wno-analyzer-out-of-bounds" } */
+
 void
 mt (double);
 
-- 
2.37.1

[PATCH] c: Implement C23 nullptr (N3042)

2022-08-13 Thread Marek Polacek via Gcc-patches

This patch implements the C23 nullptr literal (N3042), which is
intended to replace the problematic definition of NULL, which might be
either of integer type or void*.
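
The ambiguity of NULL can be observed directly with C11 _Generic. The sketch below is illustrative only: TYPE_NAME and the three functions are hypothetical names, and what type_of_null returns depends on how the implementation's <stddef.h> defines NULL.

```c
#include <stddef.h>

/* Name the type of an expression.  Only the types that NULL commonly
   expands to are listed; anything else is reported as "other".  */
#define TYPE_NAME(x) _Generic ((x),		\
    int: "int",					\
    long: "long",				\
    long long: "long long",			\
    void *: "void *",				\
    default: "other")

/* Implementation-dependent: NULL may be an integer constant or a void*.  */
const char *
type_of_null (void)
{
  return TYPE_NAME (NULL);
}

/* The two usual spellings of a null pointer constant have different types.  */
const char *
type_of_zero (void)
{
  return TYPE_NAME (0);
}

const char *
type_of_void_zero (void)
{
  return TYPE_NAME ((void *) 0);
}
```

Because both 0 and (void *) 0 are permitted definitions of NULL, code that passes NULL through a variadic call or a _Generic selection may behave differently across implementations, which is the problem a dedicated nullptr constant avoids.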

Since C++ has had nullptr for over a decade now, it was relatively easy
to just move the built-in node definitions from the C++ FE to the C/C++
common code.  Also, our DWARF emitter already handles NULLPTR_TYPE by
emitting DW_TAG_unspecified_type.  However, I had to handle a lot of
contexts such as ?:, comparison, conversion, etc.

There are some minor differences, e.g. in C you can do

  bool b = nullptr;

but in C++ you have to use direct-initialization:

  bool b{nullptr};

And I think that

  nullptr_t n = 0;

is only valid in C++.
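
As a small illustration of the C side, the sketch below exercises the copy-initialization of bool mentioned above, plus a pointer initialization and comparison. It assumes that a compiler reporting __STDC_VERSION__ >= 202311L provides the nullptr keyword; for older modes a stand-in macro (hypothetical, for illustration only) keeps the sketch compilable. The function names are likewise made up for this example.

```c
#include <stdbool.h>

/* Assumption: modes reporting __STDC_VERSION__ >= 202311L have the nullptr
   keyword.  Otherwise this illustrative shim stands in for it.  */
#if !defined (__STDC_VERSION__) || __STDC_VERSION__ < 202311L
# define nullptr ((void *) 0)
#endif

/* Copy-initializing a bool from nullptr is accepted in C and yields false.  */
bool
bool_from_nullptr (void)
{
  bool b = nullptr;
  return b;
}

/* nullptr converts to an object pointer type and compares equal to a null
   pointer of that type.  */
bool
pointer_is_null (void)
{
  int *p = nullptr;
  return p == nullptr;
}
```

Here bool_from_nullptr returns false and pointer_is_null returns true.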

Of course, C doesn't have to handle mangling, RTTI, substitution,
overloading, ...

This patch also defines nullptr_t in <stddef.h>.  I'm uncertain about
the __STDC_VERSION__ version I should be checking.  Also, I'm not
defining __STDC_VERSION_STDDEF_H__ yet, because I don't know what value
it should be defined to.  Do we know yet?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/c-family/ChangeLog:

* c-common.cc (c_common_reswords): Enable nullptr in C.
(c_common_nodes_and_builtins): Create the built-in node for nullptr.
* c-common.h (enum c_tree_index): Add CTI_NULLPTR, CTI_NULLPTR_TYPE.
(nullptr_node): Define.
(nullptr_type_node): Define.
(NULLPTR_TYPE_P): Define.
* c-pretty-print.cc (c_pretty_printer::simple_type_specifier): Handle
NULLPTR_TYPE.
(c_pretty_printer::direct_abstract_declarator): Likewise.
(c_pretty_printer::constant): Likewise.

gcc/c/ChangeLog:

* c-convert.cc (c_convert) : Handle NULLPTR_TYPE.
Give a better diagnostic when converting to nullptr_t.
* c-decl.cc (c_init_decl_processing): Perform C-specific nullptr
initialization.
* c-parser.cc (c_parser_postfix_expression): Handle RID_NULLPTR.
* c-typeck.cc (null_pointer_constant_p): Return true for NULLPTR_TYPE_P.
(build_unary_op) : Handle NULLPTR_TYPE.
(build_conditional_expr): Handle the case when the second/third operand
is NULLPTR_TYPE and third/second operand is POINTER_TYPE.
(convert_for_assignment): Handle converting an expression of type
nullptr_t to pointer/bool.
(build_binary_op) : Handle NULLPTR_TYPE.
: Likewise.

gcc/cp/ChangeLog:

* cp-tree.h (enum cp_tree_index): Remove CTI_NULLPTR, CTI_NULLPTR_TYPE.
Move it to c_tree_index.
(nullptr_node): No longer define here.
(nullptr_type_node): Likewise.
(NULLPTR_TYPE_P): Likewise.
* decl.cc (cxx_init_decl_processing): Only keep C++-specific nullptr
initialization; move the shared code to c_common_nodes_and_builtins.

gcc/ChangeLog:

* ginclude/stddef.h: Define nullptr_t.

gcc/testsuite/ChangeLog:

* gcc.dg/Wcxx-compat-2.c: Remove nullptr test.
* gcc.dg/c2x-nullptr-1.c: New test.
* gcc.dg/c2x-nullptr-2.c: New test.
* gcc.dg/c2x-nullptr-3.c: New test.
* gcc.dg/c2x-nullptr-4.c: New test.
* gcc.dg/c2x-nullptr-5.c: New test.
---
 gcc/c-family/c-common.cc |  13 +-
 gcc/c-family/c-common.h  |   8 +
 gcc/c-family/c-pretty-print.cc   |   7 +
 gcc/c/c-convert.cc   |  19 ++-
 gcc/c/c-decl.cc  |   6 +
 gcc/c/c-parser.cc|   8 +
 gcc/c/c-typeck.cc|  55 +-
 gcc/cp/cp-tree.h |   8 -
 gcc/cp/decl.cc   |   8 +-
 gcc/ginclude/stddef.h|   8 +
 gcc/testsuite/gcc.dg/Wcxx-compat-2.c |   1 -
 gcc/testsuite/gcc.dg/c2x-nullptr-1.c | 239 +++
 gcc/testsuite/gcc.dg/c2x-nullptr-2.c |   9 +
 gcc/testsuite/gcc.dg/c2x-nullptr-3.c |  62 +++
 gcc/testsuite/gcc.dg/c2x-nullptr-4.c |  10 ++
 gcc/testsuite/gcc.dg/c2x-nullptr-5.c |  11 ++
 16 files changed, 448 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/c2x-nullptr-1.c
 create mode 100644 gcc/testsuite/gcc.dg/c2x-nullptr-2.c
 create mode 100644 gcc/testsuite/gcc.dg/c2x-nullptr-3.c
 create mode 100644 gcc/testsuite/gcc.dg/c2x-nullptr-4.c
 create mode 100644 gcc/testsuite/gcc.dg/c2x-nullptr-5.c

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 6e41ceb38e9..809e7ff5804 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -500,7 +500,7 @@ const struct c_common_resword c_common_reswords[] =
   { "namespace",   RID_NAMESPACE,  D_CXXONLY | D_CXXWARN },
   { "new", RID_NEW,D_CXXONLY | D_CXXWARN },
   { "noexcept",RID_NOEXCEPT,   D_CXXONLY | D_CXX11 | D_CXXWARN },
-  { "nullptr", RID_NULLPTR,D_CXXONLY | D_CXX11 | D_CXXWARN },
+  { "nullptr", RID_NULLPTR,D_CXX11 | D_CXXWARN },
   { "operator",RID_O
