Re: build broke, cris-elf: [committed] libstdc++: Implement C++20 time zone support in

2022-12-23 Thread Jonathan Wakely via Gcc-patches
On Fri, 23 Dec 2022 at 02:15, Hans-Peter Nilsson via Libstdc++
 wrote:
>
> > From: Jonathan Wakely via Gcc-patches 
> > Date: Fri, 23 Dec 2022 00:37:04 +0100
>
> > This is the largest missing piece of C++20 support. Only the cxx11 ABI
> > is supported, due to the use of std::string in the API for time zones.
>
> > libstdc++-v3/ChangeLog:
> >
> >   * acinclude.m4 (GLIBCXX_ZONEINFO_DIR): New macro.
> >   * config.h.in: Regenerate.
> >   * config/abi/pre/gnu.ver: Export new symbols.
> >   * configure: Regenerate.
> >   * configure.ac (GLIBCXX_ZONEINFO_DIR): Use new macro.
> >   * include/std/chrono (utc_clock::from_sys): Correct handling
> >   of leap seconds.
> >   (nonexistent_local_time::_M_make_what_str): Define.
> >   (ambiguous_local_time::_M_make_what_str): Define.
> >   (__throw_bad_local_time): Define new function.
> >   (time_zone, tzdb_list, tzdb): Implement all members.
> >   (remote_version, zoned_time, get_leap_second_info): Define.
> >   * include/std/version: Add comment for __cpp_lib_chrono.
> >   * src/c++20/Makefile.am: Add new file.
> >   * src/c++20/Makefile.in: Regenerate.
> >   * src/c++20/tzdb.cc: New file.
> >   * testsuite/lib/libstdc++.exp: Define effective target tzdb.
> >   * testsuite/std/time/clock/file/members.cc: Check file_time
> >   alias and file_clock::now() member.
> >   * testsuite/std/time/clock/gps/1.cc: Likewise for gps_clock.
> >   * testsuite/std/time/clock/tai/1.cc: Likewise for tai_clock.
> >   * testsuite/std/time/syn_c++20.cc: Uncomment everything except
> >   parse.
> >   * testsuite/std/time/clock/utc/leap_second_info.cc: New test.
> >   * testsuite/std/time/exceptions.cc: New test.
> >   * testsuite/std/time/time_zone/get_info_local.cc: New test.
> >   * testsuite/std/time/time_zone/get_info_sys.cc: New test.
> >   * testsuite/std/time/time_zone/requirements.cc: New test.
> >   * testsuite/std/time/tzdb/1.cc: New test.
> >   * testsuite/std/time/tzdb/leap_seconds.cc: New test.
> >   * testsuite/std/time/tzdb_list/1.cc: New test.
> >   * testsuite/std/time/tzdb_list/requirements.cc: New test.
> >   * testsuite/std/time/zoned_time/1.cc: New test.
> >   * testsuite/std/time/zoned_time/custom.cc: New test.
> >   * testsuite/std/time/zoned_time/deduction.cc: New test.
> >   * testsuite/std/time/zoned_time/req_neg.cc: New test.
> >   * testsuite/std/time/zoned_time/requirements.cc: New test.
> >   * testsuite/std/time/zoned_traits.cc: New test.
>
>
> > +++ b/libstdc++-v3/src/c++20/tzdb.cc
>
> > +  static_assert(sizeof(datetime) == 8 && alignof(datetime) == 4);
>
> This broke build for cris-elf:
> x/autotest/hpautotest-gcc1/gcc/libstdc++-v3/src/c++20/tzdb.cc:451:38: error: static assertion failed
>   451 |   static_assert(sizeof(datetime) == 8 && alignof(datetime) == 4);
>   | ~^~~~
> x/autotest/hpautotest-gcc1/gcc/libstdc++-v3/src/c++20/tzdb.cc:451:38: note: the comparison reduces to '(7 == 8)'
> make[5]: *** [Makefile:562: tzdb.lo] Error 1
>
> (and I don't think "alignof(datetime) == 4" is true either)


Sorry about that, I can just remove the assertion now. I'll commit that ASAP.



[PATCH,WWWDOCS] htdocs: news: GCC BPF in Compiler Explorer

2022-12-23 Thread Jose E. Marchesi via Gcc-patches
This patch adds an entry to the News section in index.html, announcing
the availability of a nightly build of bpf-unknown-none-gcc.
---
 htdocs/index.html | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/htdocs/index.html b/htdocs/index.html
index 655b7373..e91fadf1 100644
--- a/htdocs/index.html
+++ b/htdocs/index.html
@@ -55,6 +55,12 @@ mission statement.
 News
 
 
+https://godbolt.org";>GCC BPF in Compiler Explorer
+ [2022-12-23]
+Support for a nightly build of the bpf-unknown-none-gcc compiler
+  has been contributed to Compiler Explorer (aka godbolt.org) by Marc
+  Poulhiès
+
 https://gcc.gnu.org/wiki/cauldron2022";>GNU Tools Cauldron 
2022
 [2022-09-02]
 Prague, Czech Republic and online, September 16-18 2022
-- 
2.30.2



[committed] libstdc++: Remove problematic static_assert from src/c++20/tzdb.cc

2022-12-23 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

This assertion fails for cris-elf where sizeof(datetime) is only 7, due
to lower alignment requirements. The assertion was used while I was
writing the code to check that the objects were as compact as I wanted,
but it doesn't need to be kept now.
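
The layout dependence described above can be illustrated with a small sketch. The types below are hypothetical stand-ins (the `_sketch` names and their members are assumptions, not the definitions in tzdb.cc). They carry 7 bytes of payload, so a target that aligns a 32-bit integer to 4 bytes would typically round the size up to 8, while a target with byte alignment could report 7. Only properties that hold on every ABI are asserted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are NOT the
// at_time/on_day/datetime definitions from tzdb.cc.
struct at_time_sketch
{
  at_time_sketch() : time(0), indicator(0) { }  // user-provided constructor
  std::int32_t time;        // 4 bytes of payload
  unsigned char indicator;  // 1 byte; many ABIs add tail padding after it
};

struct on_day_sketch
{
  unsigned char month;
  unsigned char day;
};

// Deriving (rather than nesting at_time_sketch as a member) lets some ABIs
// place `day` in the tail padding of the base class.
struct datetime_sketch : at_time_sketch
{
  on_day_sketch day;
};

// Payload bytes that every layout has to hold: 4 + 1 + 2.
constexpr std::size_t payload_bytes = 7;

// Portable checks: a lower bound on the size, and the rule that sizeof is
// always a multiple of alignof.  Exact values such as 8 and 4 are ABI details.
static_assert(sizeof(datetime_sketch) >= payload_bytes,
              "the object must hold all of its members");
static_assert(sizeof(datetime_sketch) % alignof(datetime_sketch) == 0,
              "sizeof is a multiple of alignof");

// Padding bytes the current target adds, for inspection at run time.
constexpr std::size_t padding_bytes()
{
  return sizeof(datetime_sketch) - payload_bytes;
}
```

The value of `padding_bytes()` differs between targets, which is why an assertion on an exact size is not portable.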

libstdc++-v3/ChangeLog:

* src/c++20/tzdb.cc: Remove static_assert.
---
 libstdc++-v3/src/c++20/tzdb.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libstdc++-v3/src/c++20/tzdb.cc b/libstdc++-v3/src/c++20/tzdb.cc
index a0bb03173a9..a02bcd4aec7 100644
--- a/libstdc++-v3/src/c++20/tzdb.cc
+++ b/libstdc++-v3/src/c++20/tzdb.cc
@@ -448,7 +448,6 @@ namespace std::chrono
   // This allows on_day to reuse padding of at_time.
   // This keeps the size to 8 bytes and the alignment to 4 bytes.
   struct datetime : at_time { on_day day; };
-  static_assert(sizeof(datetime) == 8 && alignof(datetime) == 4);
 
   // TODO combining name+letters into a single string (like in ZoneInfo)
   // would save sizeof(string) and make Rule fit in a single cacheline.
-- 
2.38.1



Re: build broke, cris-elf: [committed] libstdc++: Implement C++20 time zone support in

2022-12-23 Thread Jonathan Wakely via Gcc-patches
On Fri, 23 Dec 2022 at 09:29, Jonathan Wakely  wrote:
>
> On Fri, 23 Dec 2022 at 02:15, Hans-Peter Nilsson via Libstdc++
>  wrote:
> >
> > > From: Jonathan Wakely via Gcc-patches 
> > > Date: Fri, 23 Dec 2022 00:37:04 +0100
> >
> > > This is the largest missing piece of C++20 support. Only the cxx11 ABI
> > > is supported, due to the use of std::string in the API for time zones.
> >
> > > libstdc++-v3/ChangeLog:
> > >
> > >   * acinclude.m4 (GLIBCXX_ZONEINFO_DIR): New macro.
> > >   * config.h.in: Regenerate.
> > >   * config/abi/pre/gnu.ver: Export new symbols.
> > >   * configure: Regenerate.
> > >   * configure.ac (GLIBCXX_ZONEINFO_DIR): Use new macro.
> > >   * include/std/chrono (utc_clock::from_sys): Correct handling
> > >   of leap seconds.
> > >   (nonexistent_local_time::_M_make_what_str): Define.
> > >   (ambiguous_local_time::_M_make_what_str): Define.
> > >   (__throw_bad_local_time): Define new function.
> > >   (time_zone, tzdb_list, tzdb): Implement all members.
> > >   (remote_version, zoned_time, get_leap_second_info): Define.
> > >   * include/std/version: Add comment for __cpp_lib_chrono.
> > >   * src/c++20/Makefile.am: Add new file.
> > >   * src/c++20/Makefile.in: Regenerate.
> > >   * src/c++20/tzdb.cc: New file.
> > >   * testsuite/lib/libstdc++.exp: Define effective target tzdb.
> > >   * testsuite/std/time/clock/file/members.cc: Check file_time
> > >   alias and file_clock::now() member.
> > >   * testsuite/std/time/clock/gps/1.cc: Likewise for gps_clock.
> > >   * testsuite/std/time/clock/tai/1.cc: Likewise for tai_clock.
> > >   * testsuite/std/time/syn_c++20.cc: Uncomment everything except
> > >   parse.
> > >   * testsuite/std/time/clock/utc/leap_second_info.cc: New test.
> > >   * testsuite/std/time/exceptions.cc: New test.
> > >   * testsuite/std/time/time_zone/get_info_local.cc: New test.
> > >   * testsuite/std/time/time_zone/get_info_sys.cc: New test.
> > >   * testsuite/std/time/time_zone/requirements.cc: New test.
> > >   * testsuite/std/time/tzdb/1.cc: New test.
> > >   * testsuite/std/time/tzdb/leap_seconds.cc: New test.
> > >   * testsuite/std/time/tzdb_list/1.cc: New test.
> > >   * testsuite/std/time/tzdb_list/requirements.cc: New test.
> > >   * testsuite/std/time/zoned_time/1.cc: New test.
> > >   * testsuite/std/time/zoned_time/custom.cc: New test.
> > >   * testsuite/std/time/zoned_time/deduction.cc: New test.
> > >   * testsuite/std/time/zoned_time/req_neg.cc: New test.
> > >   * testsuite/std/time/zoned_time/requirements.cc: New test.
> > >   * testsuite/std/time/zoned_traits.cc: New test.
> >
> >
> > > +++ b/libstdc++-v3/src/c++20/tzdb.cc
> >
> > > +  static_assert(sizeof(datetime) == 8 && alignof(datetime) == 4);
> >
> > This broke build for cris-elf:
> > x/autotest/hpautotest-gcc1/gcc/libstdc++-v3/src/c++20/tzdb.cc:451:38: error: static assertion failed
> >   451 |   static_assert(sizeof(datetime) == 8 && alignof(datetime) == 4);
> >   | ~^~~~
> > x/autotest/hpautotest-gcc1/gcc/libstdc++-v3/src/c++20/tzdb.cc:451:38: note: the comparison reduces to '(7 == 8)'
> > make[5]: *** [Makefile:562: tzdb.lo] Error 1
> >
> > (and I don't think "alignof(datetime) == 4" is true either)
>
>
> Sorry about that, I can just remove the assertion now. I'll commit that ASAP.

Should be fixed at r13-4871-gdb3c5831f80e67



[PATCH,WWWDOCS] htdocs: add an Atom feed for GCC news

2022-12-23 Thread Jose E. Marchesi via Gcc-patches
This patch adds an Atom feed for GCC news, which can then be easily
aggregated in other sites, such as the GNU planet
(https://planet.gnu.org).

The feed lives in a file news.xml, and this patch initializes it with
the latest entry in News as an example.
---
 htdocs/index.html |  9 -
 htdocs/news.xml   | 28 
 2 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 htdocs/news.xml

diff --git a/htdocs/index.html b/htdocs/index.html
index e91fadf1..2ddee6f6 100644
--- a/htdocs/index.html
+++ b/htdocs/index.html
@@ -6,6 +6,9 @@
 
 GCC, the GNU Compiler Collection
 https://gcc.gnu.org/gcc.css";>
+
 
 
 
@@ -48,7 +51,11 @@ mission statement.
 
 
 
 
diff --git a/htdocs/news.xml b/htdocs/news.xml
new file mode 100644
index ..bebcaa66
--- /dev/null
+++ b/htdocs/news.xml
@@ -0,0 +1,28 @@
+
+
+
+  
+News about the GNU Compiler Collection
+https://gcc.gnu.org
+
+  The GNU Compiler Collection includes front ends for C, C++,
+  Objective-C, Fortran, Ada, Go, and D, as well as libraries for
+  these languages (libstdc++,...). GCC was originally written as
+  the compiler for the GNU operating system. The GNU system was
+  developed to be 100% free software, free in the sense that it
+  respects the user's freedom.
+
+
+
+  GCC BPF in Compiler Explorer
+  https://godbolt.org
+  
+Support for a nightly build of the bpf-unknown-none-gcc
+compiler has been contributed to Compiler Explorer (aka
+godbolt.org) by Marc Poulhiès
+  
+  Fri, 23 December 2022 11:00:00 CET
+
+
+  
+
-- 
2.30.2



[PATCH,WWWDOCS] htdocs: rotate news

2022-12-23 Thread Jose E. Marchesi via Gcc-patches
---
 htdocs/index.html | 24 
 htdocs/news.html  | 24 
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/htdocs/index.html b/htdocs/index.html
index 2ddee6f6..2ab65a95 100644
--- a/htdocs/index.html
+++ b/htdocs/index.html
@@ -92,30 +92,6 @@ mission statement.
 [2022-04-21]
 
 
-https://gcc.gnu.org/wiki/linuxplumbers2021";>GNU Tools @ 
Linux Plumbers Conference 2021
-[2021-09-15]
-Will be held online, September 20-24 2021
-
-GCC 11.2 released
-[2021-07-28]
-
-
-GCC 9.4 released
-[2021-06-01]
-
-
-GCC 8.5 released
-[2021-05-14]
-
-
-GCC 11.1 released
-[2021-04-27]
-
-
-GCC 10.3 released
-[2021-04-08]
-
-
 
 
 
diff --git a/htdocs/news.html b/htdocs/news.html
index e1384852..2a8b7feb 100644
--- a/htdocs/news.html
+++ b/htdocs/news.html
@@ -17,6 +17,30 @@
 
 
 
+https://gcc.gnu.org/wiki/linuxplumbers2021";>GNU Tools @ 
Linux Plumbers Conference 2021
+[2021-09-15]
+Will be held online, September 20-24 2021
+
+GCC 11.2 released
+[2021-07-28]
+
+
+GCC 9.4 released
+[2021-06-01]
+
+
+GCC 8.5 released
+[2021-05-14]
+
+
+GCC 11.1 released
+[2021-04-27]
+
+
+GCC 10.3 released
+[2021-04-08]
+
+
 GCC 10.2 released
 [2020-07-23]
 
-- 
2.30.2



Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-23 Thread Richard Sandiford via Gcc-patches
Alexander Monakov via Gcc-patches  writes:
> On Thu, 22 Dec 2022, Jose E. Marchesi via Gcc-patches wrote:
>
>> The first instruction scheduler pass reorders instructions in the TRY
>> block in a way `b=true' gets executed before the call to the function
>> `f'.  This optimization is wrong, because `main' calls setjmp and `f'
>> is known to call longjmp.
>> 
>> As discussed in BZ 57067, the root cause for this is the fact that
>> setjmp is not properly modeled in RTL, and therefore the backend
>> passes have no normalized way to handle this situation.
>> 
>> As Alexander Monakov noted in the BZ, many RTL passes refuse to touch
>> functions that call setjmp.  This includes for example gcse,
>> store_motion and cprop.  This patch adds the sched1 pass to that list.
>> 
>> Note that the other instruction scheduling passes are still allowed to
>> run on these functions, since they reorder instructions within basic
>> blocks, and therefore they cannot cross function calls.
>> 
>> This doesn't fix the fundamental issue, but at least assures that
>> sched1 won't perform invalid transformations in correct C programs.
>
> I think scheduling across calls in the pre-RA scheduler is simply an oversight:
> we do not look at dataflow information, and with 50% chance we risk extending the
> lifetime of a pseudoregister across a call, causing higher register pressure at
> the point of the call and potentially an extra spill.
>
> Therefore I would suggest to indeed solve the root cause, with (untested):
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 948aa0c3b..343fe2bfa 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -3688,7 +3688,13 @@ deps_analyze_insn (class deps_desc *deps, rtx_insn *insn)
>
>CANT_MOVE (insn) = 1;
>
> -  if (find_reg_note (insn, REG_SETJMP, NULL))
> +  if (!reload_completed)
> +   {
> + /* Do not schedule across calls, this is prone to extending lifetime
> +of a pseudo and causing extra spill later on.  */
> + reg_pending_barrier = MOVE_BARRIER;
> +   }
> +  else if (find_reg_note (insn, REG_SETJMP, NULL))
>  {
>/* This is setjmp.  Assume that all registers, not just
>   hard registers, may be clobbered by this call.  */

+1 for trying this FWIW.  There's still plenty of time to try an
alternative solution if there are unexpected performance problems.

Richard


Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-23 Thread Jose E. Marchesi via Gcc-patches


> Alexander Monakov via Gcc-patches  writes:
>> On Thu, 22 Dec 2022, Jose E. Marchesi via Gcc-patches wrote:
>>
>>> The first instruction scheduler pass reorders instructions in the TRY
>>> block in a way `b=true' gets executed before the call to the function
>>> `f'.  This optimization is wrong, because `main' calls setjmp and `f'
>>> is known to call longjmp.
>>> 
>>> As discussed in BZ 57067, the root cause for this is the fact that
>>> setjmp is not properly modeled in RTL, and therefore the backend
>>> passes have no normalized way to handle this situation.
>>> 
>>> As Alexander Monakov noted in the BZ, many RTL passes refuse to touch
>>> functions that call setjmp.  This includes for example gcse,
>>> store_motion and cprop.  This patch adds the sched1 pass to that list.
>>> 
>>> Note that the other instruction scheduling passes are still allowed to
>>> run on these functions, since they reorder instructions within basic
>>> blocks, and therefore they cannot cross function calls.
>>> 
>>> This doesn't fix the fundamental issue, but at least assures that
>>> sched1 won't perform invalid transformations in correct C programs.
>>
>> I think scheduling across calls in the pre-RA scheduler is simply an oversight:
>> we do not look at dataflow information, and with 50% chance we risk extending the
>> lifetime of a pseudoregister across a call, causing higher register pressure at
>> the point of the call and potentially an extra spill.
>>
>> Therefore I would suggest to indeed solve the root cause, with (untested):
>>
>> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
>> index 948aa0c3b..343fe2bfa 100644
>> --- a/gcc/sched-deps.cc
>> +++ b/gcc/sched-deps.cc
>> @@ -3688,7 +3688,13 @@ deps_analyze_insn (class deps_desc *deps, rtx_insn *insn)
>>
>>CANT_MOVE (insn) = 1;
>>
>> -  if (find_reg_note (insn, REG_SETJMP, NULL))
>> +  if (!reload_completed)
>> +   {
>> + /* Do not schedule across calls, this is prone to extending lifetime
>> +of a pseudo and causing extra spill later on.  */
>> + reg_pending_barrier = MOVE_BARRIER;
>> +   }
>> +  else if (find_reg_note (insn, REG_SETJMP, NULL))
>>  {
>>/* This is setjmp.  Assume that all registers, not just
>>   hard registers, may be clobbered by this call.  */
>
> +1 for trying this FWIW.  There's still plenty of time to try an
> alternative solution if there are unexpected performance problems.

Let me see if Alexander's patch fixes the issue at hand (it must); I
will also do some regression testing.
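
For readers following the thread, the shape of the issue at hand can be sketched as below. This is a hypothetical reduction written for illustration, not the testcase from BZ 57067: the names `env`, `b`, `f` and `run_try_block` are assumptions. The store to `b` follows a call that never returns normally, so it must not be moved ahead of that call.

```cpp
#include <cassert>
#include <csetjmp>

// Hypothetical reduction of the pattern described above; the names are
// illustrative and are not taken from the BZ 57067 testcase.
static std::jmp_buf env;
static bool b = false;  // static storage, so its value is determinate after longjmp

// Never returns normally: control goes back to the setjmp call site.
static void f()
{
  std::longjmp(env, 1);
}

// Returns the value of `b` observed after the longjmp.
bool run_try_block()
{
  b = false;
  if (setjmp(env) == 0)
    {
      f();       // jumps back; the store below must not be hoisted above this call
      b = true;  // never reached
    }
  return b;      // false in a correctly compiled program
}
```

If a scheduler moved `b = true` above the call to `f`, `run_try_block` would return true, which is the invalid transformation under discussion.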


Re: [PATCH] RISC-V: Support VSETVL PASS for RVV support

2022-12-23 Thread Andreas Schwab
How has this been tested?

In file included from ../../gcc/config/riscv/riscv-vsetvl.cc:89:
../../gcc/config/riscv/riscv-vsetvl.h: In member function 'riscv_vector::avl_info riscv_vector::vl_vtype_info::get_avl_info() const':
../../gcc/config/riscv/riscv-vsetvl.h:175:43: error: implicitly-declared 'constexpr riscv_vector::avl_info::avl_info(const riscv_vector::avl_info&)' is deprecated [-Werror=deprecated-copy]
  175 |   avl_info get_avl_info () const { return m_avl; }
  |   ^
../../gcc/config/riscv/riscv-vsetvl.h:131:13: note: because 'riscv_vector::avl_info' has user-provided 'riscv_vector::avl_info& riscv_vector::avl_info::operator=(const riscv_vector::avl_info&)'
  131 |   avl_info &operator= (const avl_info &);
  | ^~~~
../../gcc/config/riscv/riscv-vsetvl.cc: In function 'bool change_insn(rtl_ssa::function_info*, rtl_ssa::insn_change, rtl_ssa::insn_info*, rtx)':
../../gcc/config/riscv/riscv-vsetvl.cc:823:27: error: unquoted whitespace character '\x0a' in format [-Werror=format-diag]
  823 |   pp_printf (&pp, "\n");
  |   ^~~~
../../gcc/config/riscv/riscv-vsetvl.cc:847:27: error: unquoted whitespace character '\x0a' in format [-Werror=format-diag]
  847 |   pp_printf (&pp, "\n");
  |   ^~~~
../../gcc/config/riscv/riscv-vsetvl.cc: In constructor 'riscv_vector::vl_vtype_info::vl_vtype_info(riscv_vector::avl_info, uint8_t, riscv_vector::vlmul_type, uint8_t, bool, bool)':
../../gcc/config/riscv/riscv-vsetvl.cc:905:5: error: implicitly-declared 'constexpr riscv_vector::avl_info::avl_info(const riscv_vector::avl_info&)' is deprecated [-Werror=deprecated-copy]
  905 |   : m_avl (avl_in), m_sew (sew_in), m_vlmul (vlmul_in), m_ratio (ratio_in),
  | ^~
../../gcc/config/riscv/riscv-vsetvl.cc:859:1: note: because 'riscv_vector::avl_info' has user-provided 'riscv_vector::avl_info& riscv_vector::avl_info::operator=(const riscv_vector::avl_info&)'
  859 | avl_info::operator= (const avl_info &other)
  | ^~~~
../../gcc/config/riscv/riscv-vsetvl.cc: In member function 'void riscv_vector::vector_insn_info::dump(FILE*) const':
../../gcc/config/riscv/riscv-vsetvl.cc:1366:27: error: unquoted whitespace character '\x0a' in format [-Werror=format-diag]
 1366 |   pp_printf (&pp, "\n");
  |   ^~~~
 1366 |   pp_printf (&pp, "\n");
  |   ^~~~
cc1plus: all warnings being treated as errors
make[3]: *** [../../gcc/config/riscv/t-riscv:59: riscv-vsetvl.o] Error 1
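
The -Wdeprecated-copy errors in the log above come from a class that has a user-provided copy assignment operator but relies on the implicitly-declared copy constructor. A minimal sketch of one way to avoid that, assuming a hypothetical class `avl_info_sketch` (not the real `avl_info`, and not necessarily the fix that should be applied to the patch), is to declare the copy constructor explicitly:

```cpp
#include <cassert>

// Hypothetical class mirroring the shape named in the diagnostic: a
// user-provided copy assignment operator.  It is not the real avl_info.
class avl_info_sketch
{
public:
  explicit avl_info_sketch(int value) : m_value(value) { }

  // Declaring the copy constructor explicitly (here as defaulted) avoids
  // relying on the deprecated implicit declaration.
  avl_info_sketch(const avl_info_sketch &) = default;

  // User-provided copy assignment; without the line above, copies of this
  // class are what -Wdeprecated-copy reports.
  avl_info_sketch &operator=(const avl_info_sketch &other)
  {
    m_value = other.m_value;
    return *this;
  }

  int value() const { return m_value; }

private:
  int m_value;
};

// Returns by value, like get_avl_info () in the log, so it needs the copy
// constructor.
avl_info_sketch copy_of(const avl_info_sketch &src)
{
  return src;
}
```

Both copy construction and copy assignment then have an explicit declaration, so neither depends on the deprecated implicit one.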

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[PATCH v6 00/11] OpenMP: C/C++ lvalue parsing, C/C++/Fortran "declare mapper" support

2022-12-23 Thread Julian Brown
Following on from here:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608577.html

this is a complete patch series, rebased against mainline.  The final
three patches are the revised C "lvalue"-parsing patches and C and Fortran
"declare mapper" support patches mentioned in that email.  (Several of
the earlier patches are approved already, but dependent or semi-dependent
on other patches that haven't been yet.)

The last three patches have been retested (each, cumulatively) with
offloading to NVPTX.

OK?

Thanks,

Julian

Julian Brown (11):
  OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in
{c_}finish_omp_clause
  OpenMP/OpenACC: Rework clause expansion and nested struct handling
  OpenMP/OpenACC: Refine condition for when map clause expansion happens
  OpenMP: implicitly map base pointer for array-section pointer
components
  OpenMP: Pointers and member mappings
  OpenMP/OpenACC: Unordered/non-constant component offset runtime
diagnostic
  OpenMP: lvalue parsing for map/to/from clauses (C++)
  OpenMP: C++ "declare mapper" support
  OpenMP: lvalue parsing for map clauses (C)
  OpenMP: Support OpenMP 5.0 "declare mapper" directives for C
  OpenMP: Fortran "!$omp declare mapper" support

 gcc/c-family/c-common.h   |   77 +-
 gcc/c-family/c-omp.cc | 1153 +-
 gcc/c-family/c-pretty-print.cc|   12 +
 gcc/c/c-decl.cc   |  169 +
 gcc/c/c-objc-common.h |   12 +
 gcc/c/c-parser.cc |  479 ++-
 gcc/c/c-tree.h|   10 +
 gcc/c/c-typeck.cc |  865 ++---
 gcc/cp/constexpr.cc   |   10 +
 gcc/cp/cp-gimplify.cc |6 +
 gcc/cp/cp-objcp-common.h  |9 +
 gcc/cp/cp-tree.h  |   19 +-
 gcc/cp/decl.cc|   27 +-
 gcc/cp/decl2.cc   |   54 +-
 gcc/cp/error.cc   |   34 +
 gcc/cp/parser.cc  |  508 ++-
 gcc/cp/parser.h   |3 +
 gcc/cp/pt.cc  |   82 +-
 gcc/cp/semantics.cc   | 1277 ---
 gcc/cp/typeck.cc  |   50 +
 gcc/fortran/dependency.cc |  128 +
 gcc/fortran/dependency.h  |1 +
 gcc/fortran/dump-parse-tree.cc|3 +
 gcc/fortran/f95-lang.cc   |7 +
 gcc/fortran/gfortran.h|   56 +-
 gcc/fortran/match.cc  |9 +-
 gcc/fortran/match.h   |1 +
 gcc/fortran/module.cc |  252 +-
 gcc/fortran/openmp.cc |  299 +-
 gcc/fortran/parse.cc  |   15 +-
 gcc/fortran/resolve.cc|2 +
 gcc/fortran/st.cc |2 +-
 gcc/fortran/symbol.cc |   16 +
 gcc/fortran/trans-decl.cc |   30 +-
 gcc/fortran/trans-openmp.cc   |  939 -
 gcc/fortran/trans-stmt.h  |1 +
 gcc/fortran/trans.h   |3 +
 gcc/gimplify.cc   | 2314 +---
 gcc/langhooks-def.h   |   13 +
 gcc/langhooks.cc  |   35 +
 gcc/langhooks.h   |   16 +
 gcc/omp-general.cc|  425 +++
 gcc/omp-general.h |  155 +
 gcc/omp-low.cc|8 +-
 gcc/testsuite/c-c++-common/gomp/clauses-2.c   |2 +-
 .../c-c++-common/gomp/declare-mapper-12.c |   22 +
 .../c-c++-common/gomp/declare-mapper-3.c  |   30 +
 .../c-c++-common/gomp/declare-mapper-4.c  |   78 +
 .../c-c++-common/gomp/declare-mapper-5.c  |   26 +
 .../c-c++-common/gomp/declare-mapper-6.c  |   23 +
 .../c-c++-common/gomp/declare-mapper-7.c  |   29 +
 .../c-c++-common/gomp/declare-mapper-8.c  |   43 +
 .../c-c++-common/gomp/declare-mapper-9.c  |   34 +
 gcc/testsuite/c-c++-common/gomp/map-6.c   |   10 +-
 gcc/testsuite/c-c++-common/gomp/target-50.c   |2 +-
 .../c-c++-common/gomp/target-implicit-map-2.c |3 +-
 gcc/testsuite/g++.dg/gomp/array-section-1.C   |   38 +
 gcc/testsuite/g++.dg/gomp/array-section-2.C   |   63 +
 .../g++.dg/gomp/bad-array-section-1.C |   35 +
 .../g++.dg/gomp/bad-array-section-10.C|   35 +
 .../g++.dg/gomp/bad-array-section-11.C|   36 +
 .../g++.dg/gomp/bad-array-section-2.C |   33 +
 .../g++.dg/gomp/bad-array-section-3.C |   28 +
 .../g++.dg/gomp/bad-array-section-4.C |   50 +
 .../g++.dg/gomp/bad-array-section-5.C |   50 +
 .../g++.dg/gomp/bad-array-section-6.C |   24 +
 .../g++.dg/gomp/ba

[PATCH v6 01/11] OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in {c_}finish_omp_clause

2022-12-23 Thread Julian Brown
This patch trivially adds braces and reindents the
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza in
c_finish_omp_clauses and finish_omp_clauses, in preparation for the
following patch (to clarify the diff a little).

2022-09-13  Julian Brown  

gcc/c/
* c-typeck.cc (c_finish_omp_clauses): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.

gcc/cp/
* semantics.cc (finish_omp_clauses): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.
---
 gcc/c/c-typeck.cc   | 615 +-
 gcc/cp/semantics.cc | 786 ++--
 2 files changed, 706 insertions(+), 695 deletions(-)

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index e06f052eb46a..cedb4d0f8982 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -15346,321 +15346,326 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
case OMP_CLAUSE_TO:
case OMP_CLAUSE_FROM:
case OMP_CLAUSE__CACHE_:
- t = OMP_CLAUSE_DECL (c);
- if (TREE_CODE (t) == TREE_LIST)
-   {
- grp_start_p = pc;
- grp_sentinel = OMP_CLAUSE_CHAIN (c);
+ {
+   t = OMP_CLAUSE_DECL (c);
+   if (TREE_CODE (t) == TREE_LIST)
+ {
+   grp_start_p = pc;
+   grp_sentinel = OMP_CLAUSE_CHAIN (c);
 
- if (handle_omp_array_sections (c, ort))
-   remove = true;
- else
-   {
- t = OMP_CLAUSE_DECL (c);
- if (!omp_mappable_type (TREE_TYPE (t)))
-   {
- error_at (OMP_CLAUSE_LOCATION (c),
-   "array section does not have mappable type "
-   "in %qs clause",
-   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
- remove = true;
-   }
- else if (TYPE_ATOMIC (TREE_TYPE (t)))
-   {
- error_at (OMP_CLAUSE_LOCATION (c),
-   "%<_Atomic%> %qE in %qs clause", t,
-   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
- remove = true;
-   }
- while (TREE_CODE (t) == ARRAY_REF)
-   t = TREE_OPERAND (t, 0);
- if (TREE_CODE (t) == COMPONENT_REF
- && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
-   {
- do
-   {
- t = TREE_OPERAND (t, 0);
- if (TREE_CODE (t) == MEM_REF
- || TREE_CODE (t) == INDIRECT_REF)
-   {
- t = TREE_OPERAND (t, 0);
- STRIP_NOPS (t);
- if (TREE_CODE (t) == POINTER_PLUS_EXPR)
-   t = TREE_OPERAND (t, 0);
-   }
-   }
- while (TREE_CODE (t) == COMPONENT_REF
-|| TREE_CODE (t) == ARRAY_REF);
-
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
- && OMP_CLAUSE_MAP_IMPLICIT (c)
- && (bitmap_bit_p (&map_head, DECL_UID (t))
- || bitmap_bit_p (&map_field_head, DECL_UID (t))
- || bitmap_bit_p (&map_firstprivate_head,
-  DECL_UID (t
-   {
- remove = true;
- break;
-   }
- if (bitmap_bit_p (&map_field_head, DECL_UID (t)))
-   break;
- if (bitmap_bit_p (&map_head, DECL_UID (t)))
-   {
- if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP)
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in motion "
- "clauses", t);
- else if (ort == C_ORT_ACC)
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in data "
- "clauses", t);
- else
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in map "
- "clauses", t);
- remove = true;
-   }
- else
-   {
- bitmap_set_bit (&map_head, DECL_UID (t));
- bitmap_set_bit (&map_field_head, DECL_UID (t));
-   }
-   }
-

[PATCH v6 06/11] OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic

2022-12-23 Thread Julian Brown
This patch adds support for non-constant component offsets in "map"
clauses for OpenMP (and the equivalents for OpenACC), which cannot be
sorted into order at compile time.  Normally struct accesses in
such clauses are gathered together and sorted into increasing address
order after a "GOMP_MAP_STRUCT" node: if we have variable indices,
that is no longer possible.

This version of the patch scales back the previously-posted version to
merely add a diagnostic for incorrect usage of component accesses with
variably-indexed arrays of structs: the only permitted variant is where
we have multiple indices that are the same, but we could not prove so
at compile time.  Rather than silently producing the wrong result for
cases where the indices are in fact different, we error out (e.g.,
"map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8))", for different i/j).

For now, multiple *constant* array indices are still supported (see
map-arrayofstruct-1.c).  That could perhaps be addressed with a follow-up
patch, if necessary.
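
A C++ analogue of the variably-indexed case might look like the sketch below. It assumes a hypothetical element type `elem`, a hypothetical table `dtarr` and a hypothetical function `double_elements`; none of these come from the patch or its tests. The pointer is mapped through index `i` and the array section through index `j`, which the compiler cannot prove equal, so by the description above the check is left to run time.

```cpp
#include <cassert>

// Hypothetical element type and table; the names are illustrative only.
struct elem
{
  int *arrptr;
};

static elem dtarr[4];

// Doubles n elements of dtarr[j].arrptr, mapping the pointer via index i and
// the array section via index j.  The compiler cannot prove i == j, so per
// the description above the check is deferred to run time; callers are
// expected to pass the same value for both.  Without OpenMP enabled the
// pragma is ignored and the loop runs on the host.
void double_elements(int i, int j, int n)
{
#pragma omp target map(dtarr[i].arrptr, dtarr[j].arrptr[0:n])
  {
    for (int k = 0; k < n; ++k)
      dtarr[j].arrptr[k] *= 2;
  }
}
```

Calling `double_elements` with equal indices corresponds to the permitted variant. Passing different indices is the incorrect usage that the diagnostic is meant to catch.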

2022-10-18  Julian Brown  

gcc/
* gimplify.cc (extract_base_bit_offset): Add VARIABLE_OFFSET parameter.
(omp_get_attachment, omp_group_last, omp_group_base,
omp_directive_maps_explicitly): Add GOMP_MAP_STRUCT_UNORD support.
(omp_accumulate_sibling_list): Update calls to extract_base_bit_offset.
Support GOMP_MAP_STRUCT_UNORD.
(omp_build_struct_sibling_lists, gimplify_scan_omp_clauses,
gimplify_adjust_omp_clauses, gimplify_omp_target_update): Add
GOMP_MAP_STRUCT_UNORD support.
* omp-low.cc (lower_omp_target): Add GOMP_MAP_STRUCT_UNORD support.
* tree-pretty-print.cc (dump_omp_clause): Likewise.

include/
* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_STRUCT_UNORD.

libgomp/
* oacc-mem.c (find_group_last, goacc_enter_data_internal,
goacc_exit_data_internal, GOACC_enter_exit_data): Add
GOMP_MAP_STRUCT_UNORD support.
* target.c (gomp_map_vars_internal): Add GOMP_MAP_STRUCT_UNORD support.
Detect incorrect use of variable indexing of arrays of structs.
(GOMP_target_enter_exit_data, gomp_target_task_fn): Add
GOMP_MAP_STRUCT_UNORD support.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c: New test.
* testsuite/libgomp.fortran/map-subarray-5.f90: New test.
---
 gcc/gimplify.cc   | 110 ++
 gcc/omp-low.cc|   1 +
 gcc/tree-pretty-print.cc  |   3 +
 include/gomp-constants.h  |   6 +
 libgomp/oacc-mem.c|   6 +-
 libgomp/target.c  |  60 +-
 .../map-arrayofstruct-1.c |  38 ++
 .../map-arrayofstruct-2.c |  58 +
 .../map-arrayofstruct-3.c |  68 +++
 .../libgomp.fortran/map-subarray-5.f90|  54 +
 10 files changed, 377 insertions(+), 27 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/map-subarray-5.f90

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 1c42f25a317c..d086ab8e4455 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -8913,7 +8913,8 @@ build_omp_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end,
 
 static tree
 extract_base_bit_offset (tree base, poly_int64 *bitposp,
-poly_offset_int *poffsetp)
+poly_offset_int *poffsetp,
+bool *variable_offset)
 {
   tree offset;
   poly_int64 bitsize, bitpos;
@@ -8931,10 +8932,13 @@ extract_base_bit_offset (tree base, poly_int64 *bitposp,
   if (offset && poly_int_tree_p (offset))
 {
   poffset = wi::to_poly_offset (offset);
-  offset = NULL_TREE;
+  *variable_offset = false;
 }
   else
-poffset = 0;
+{
+  poffset = 0;
+  *variable_offset = (offset != NULL_TREE);
+}
 
   if (maybe_ne (bitpos, 0))
 poffset += bits_to_bytes_round_down (bitpos);
@@ -9090,6 +9094,7 @@ omp_get_attachment (omp_mapping_group *grp)
   return error_mark_node;
 
 case GOMP_MAP_STRUCT:
+case GOMP_MAP_STRUCT_UNORD:
 case GOMP_MAP_FORCE_DEVICEPTR:
 case GOMP_MAP_DEVICE_RESIDENT:
 case GOMP_MAP_LINK:
@@ -9175,6 +9180,7 @@ omp_group_last (tree *start_p)
   break;
 
 case GOMP_MAP_STRUCT:
+case GOMP_MAP_STRUCT_UNORD:
   {
unsigned HOST_WIDE_INT num_mappings
  = tree_to_uhwi (OMP_CLAUSE_SIZE (c));
@@ -9334,6 +9340,7 @@ omp_group_base (omp_mapping_group *grp, unsigned int 

[PATCH v6 04/11] OpenMP: implicitly map base pointer for array-section pointer components

2022-12-23 Thread Julian Brown
Following from discussion in:

  https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html

and:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608100.html

and also upstream OpenMP issue 342, this patch changes mapping for array
sections of pointer components on compute regions like this:

  #pragma omp target map(s.ptr[0:10])
  {
...use of 's'...
  }

so the base pointer 's.ptr' is implicitly mapped, and thus pointer
attachment happens.  This is subtly different in the "enter data"
case, e.g:

  #pragma omp target enter data map(s.ptr[0:10])

if 's.ptr' (or the whole of 's') is not present on the target before
the directive is executed, the array section is copied to the target
but pointer attachment does *not* take place, since 's' (or 's.ptr')
is not mapped implicitly for "enter data".

To get a pointer attachment with "enter data", you can do, e.g:

  #pragma omp target enter data map(s.ptr, s.ptr[0:10])

  #pragma omp target
  {
...implicit use of 's'...
  }

That is, once the attachment has happened, implicit mapping of 's'
and uses of 's.ptr[...]' work correctly in the target region.
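
A compilable form of the compute-region case might look like the following sketch. The struct `S` and the function `sum_first_ten` are hypothetical names chosen for illustration, and the extra `map(tofrom: total)` clause is an addition needed only to get the result back to the host.

```cpp
#include <cassert>

// Hypothetical struct with a pointer component, as in the example above.
struct S
{
  int *ptr;
};

// Sums the first ten elements of s.ptr inside a target region.  Only the
// array section is named in the map clause; with the change described above,
// the base pointer s.ptr is expected to be mapped implicitly on a compute
// region so that pointer attachment happens.  Without OpenMP enabled the
// pragma is ignored and the loop runs on the host.
int sum_first_ten(S s)
{
  int total = 0;
#pragma omp target map(s.ptr[0:10]) map(tofrom: total)
  {
    for (int i = 0; i < 10; ++i)
      total += s.ptr[i];
  }
  return total;
}
```

The caller is assumed to set `s.ptr` to an array of at least ten elements before the call.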

ChangeLog

2022-12-12  Julian Brown  

gcc/
* gimplify.cc (omp_accumulate_sibling_list): Don't require
explicitly-mapped base pointer for compute regions.

gcc/testsuite/
* c-c++-common/gomp/target-implicit-map-2.c: Update expected scan output.

libgomp/
* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing
"free".
* testsuite/libgomp.c-c++-common/target-implicit-map-3.c: New test.
* testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test.
* testsuite/libgomp.c/target-22.c: Remove explicit base pointer
mappings.
---
 gcc/gimplify.cc   |  9 ++--
 .../c-c++-common/gomp/target-implicit-map-2.c |  3 +-
 .../target-implicit-map-2.c   |  2 +
 .../target-implicit-map-3.c   | 50 +++
 .../libgomp.c-c++-common/target-map-zlas-1.c  | 36 +
 libgomp/testsuite/libgomp.c/target-22.c   |  3 +-
 6 files changed, 97 insertions(+), 6 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-implicit-map-3.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-map-zlas-1.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index f4eb092f8771..9bad071bae21 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -10617,6 +10617,7 @@ omp_accumulate_sibling_list (enum omp_region_type region_type,
   poly_int64 cbitpos;
   tree ocd = OMP_CLAUSE_DECL (grp_end);
   bool openmp = !(region_type & ORT_ACC);
+  bool target = (region_type & ORT_TARGET) != 0;
   tree *continue_at = NULL;
 
   while (TREE_CODE (ocd) == ARRAY_REF)
@@ -10721,9 +10722,9 @@ omp_accumulate_sibling_list (enum omp_region_type region_type,
}
 
  /* For OpenMP semantics, we don't want to implicitly allocate
-space for the pointer here.  A FRAGILE_P node is only being
-created so that omp-low.cc is able to rewrite the struct
-properly.
+space for the pointer here for non-compute regions (e.g. "enter
+data").  A FRAGILE_P node is only being created so that
+omp-low.cc is able to rewrite the struct properly.
 For references (to pointers), we want to actually allocate the
 space for the reference itself in the sorted list following the
 struct node.
@@ -10731,6 +10732,7 @@ omp_accumulate_sibling_list (enum omp_region_type region_type,
 mapping of the attachment point, but not otherwise.  */
  if (*fragile_p
  || (openmp
+ && !target
  && attach_detach
  && TREE_CODE (TREE_TYPE (ocd)) == POINTER_TYPE
  && !OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED (grp_end)))
@@ -11043,6 +11045,7 @@ omp_accumulate_sibling_list (enum omp_region_type region_type,
 
  if (*fragile_p
  || (openmp
+ && !target
  && attach_detach
  && TREE_CODE (TREE_TYPE (ocd)) == POINTER_TYPE
  && !OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED (grp_end)))
diff --git a/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c b/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c
index 5ba1d7efe08d..72df5b1e 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c
@@ -49,4 +49,5 @@ main (void)
 
 /* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(tofrom:a \[len: [0-9]+\]\[implicit\]\)} "gimple" } } */
 
-/* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(struct:a \[len: 1\]\) map\(alloc:a\.ptr \[len: 0\]\) map\(tofrom:\*_[0-9]+ \[len: [0-9]+\]\) map\(attach:a\.ptr \[bias: 0\]\)} "gimple" } } */
+/* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(struct:a 
\[len:

[PATCH v6 03/11] OpenMP/OpenACC: Refine condition for when map clause expansion happens

2022-12-23 Thread Julian Brown
This patch fixes some cases for OpenACC and OpenMP where map clauses were
being expanded (adding firstprivate_pointer, attach/detach nodes, and so
forth) unnecessarily, after the "OpenMP/OpenACC: Rework clause expansion
and nested struct handling" patch (approved but not yet committed):

  https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603792.html

This is done by introducing a C_ORT_ACC_TARGET region type for OpenACC
compute regions to help distinguish them from non-compute regions that
need different handling, and by passing the region type through to the
clause expansion functions.
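
To make the region-type idea concrete, here is a small stand-alone sketch of
how a combined flag such as C_ORT_ACC_TARGET can be tested.  The enum below is
a hypothetical stand-in, not GCC's c_omp_region_type: the bit values chosen for
the OMP and ACC flags are assumptions, while the DECLARE_SIMD and TARGET bits
and the two combined values follow the enum shown in the diff further down.

```cpp
#include <cassert>

// Hypothetical stand-in for c_omp_region_type.
enum region_type
{
  RT_OMP = 1 << 0,           // assumed bit value
  RT_ACC = 1 << 1,           // assumed bit value
  RT_DECLARE_SIMD = 1 << 2,
  RT_TARGET = 1 << 3,
  RT_OMP_TARGET = RT_OMP | RT_TARGET,
  RT_ACC_TARGET = RT_ACC | RT_TARGET
};

// A compute region is any region type with the TARGET bit set.
bool
is_compute_region (region_type rt)
{
  return (rt & RT_TARGET) != 0;
}

// True for OpenACC regions, whether compute or not.
bool
is_openacc (region_type rt)
{
  return (rt & RT_ACC) != 0;
}
```

Passing the whole region type instead of a single "target" bool lets a callee
distinguish all four combinations with simple bit tests like these.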

The patch also fixes clause expansion for OpenMP TO/FROM clauses, which
need to dereference references but not have any additional mapping nodes.

(These cases showed up due to the gimplification changes in the C++
"declare mapper" patch, but logically belong next to the earlier patch
named above.)

2022-11-30  Julian Brown  

gcc/
* c-family/c-common.h (c_omp_region_type): Add C_ORT_ACC_TARGET.
(c_omp_address_inspector): Pass c_omp_region_type instead of "target"
bool.
* c-family/c-omp.cc (c_omp_address_inspector::expand_array_base):
Adjust clause expansion for OpenACC and non-map (OpenMP to/from)
clauses.
(c_omp_address_inspector::expand_component_selector): Use
c_omp_region_type parameter.  Don't expand OpenMP to/from clauses.
(c_omp_address_inspector::expand_map_clause): Take ORT parameter, pass
to expand_array_base, etc.

gcc/c/
* c-parser.cc (c_parser_oacc_all_clauses): Add TARGET parameter. Use
to select region type for c_finish_omp_clauses call.
(c_parser_oacc_loop): Update calls to c_parser_oacc_all_clauses.
(c_parser_oacc_compute): Likewise.
* c-typeck.cc (handle_omp_array_sections_1): Update for C_ORT_ACC_TARGET
addition and ai.expand_map_clause signature change.
(c_finish_omp_clauses): Likewise.

gcc/cp/
* parser.cc (cp_parser_oacc_all_clauses): Add TARGET parameter. Use
to select region type for finish_omp_clauses call.
(cp_parser_oacc_declare): Update call to cp_parser_oacc_all_clauses.
(cp_parser_oacc_loop): Update calls to cp_parser_oacc_all_clauses.
(cp_parser_oacc_compute): Likewise.
* pt.cc (tsubst_expr): Use C_ORT_ACC_TARGET for call to
tsubst_omp_clauses for compute regions.
* semantics.cc (handle_omp_array_sections_1): Update for
C_ORT_ACC_TARGET addition and ai.expand_map_clause signature change.
(finish_omp_clauses): Likewise.
---
 gcc/c-family/c-common.h | 10 +++--
 gcc/c-family/c-omp.cc   | 90 -
 gcc/c/c-parser.cc   | 15 ---
 gcc/c/c-typeck.cc   | 39 --
 gcc/cp/parser.cc| 15 ---
 gcc/cp/pt.cc|  4 +-
 gcc/cp/semantics.cc | 47 ++---
 7 files changed, 144 insertions(+), 76 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index d935d4b3d7d9..06674e769bd4 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1245,7 +1245,8 @@ enum c_omp_region_type
   C_ORT_DECLARE_SIMD   = 1 << 2,
   C_ORT_TARGET = 1 << 3,
   C_ORT_OMP_DECLARE_SIMD   = C_ORT_OMP | C_ORT_DECLARE_SIMD,
-  C_ORT_OMP_TARGET = C_ORT_OMP | C_ORT_TARGET
+  C_ORT_OMP_TARGET = C_ORT_OMP | C_ORT_TARGET,
+  C_ORT_ACC_TARGET = C_ORT_ACC | C_ORT_TARGET
 };
 
 extern tree c_finish_omp_master (location_t, tree);
@@ -1345,10 +1346,11 @@ public:
   bool maybe_zero_length_array_section (tree);
 
   tree expand_array_base (tree, vec &, tree, unsigned *,
- bool, bool);
+ c_omp_region_type, bool);
   tree expand_component_selector (tree, vec &, tree,
- unsigned *, bool);
-  tree expand_map_clause (tree, tree, vec &, bool);
+ unsigned *, c_omp_region_type);
+  tree expand_map_clause (tree, tree, vec &,
+ c_omp_region_type);
 };
 
 enum c_omp_directive_kind {
diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index d32c2a977304..74c01d8f2a52 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -3370,7 +3370,8 @@ tree
 c_omp_address_inspector::expand_array_base (tree c,
vec &addr_tokens,
tree expr, unsigned *idx,
-   bool target, bool decl_p)
+   c_omp_region_type ort,
+   bool decl_p)
 {
   using namespace omp_addr_tokenizer;
   location_t loc = OMP_CLAUSE_LOCATION (c);
@@ -3380,14 +3381,26 @@ c_omp_address_inspector::expand_array_base (tree c,
   && is_global_var (decl)
   && lookup_attribute ("omp declare target",
  

[PATCH v6 11/11] OpenMP: Fortran "!$omp declare mapper" support

2022-12-23 Thread Julian Brown
This patch implements "omp declare mapper" functionality for Fortran,
following the equivalent support for C and C++.

Fortran differs quite substantially from C and C++ in that "map"
clauses are naturally represented in the gfortran front-end's own
representation rather than as trees. Those are turned into one -- or
several -- OMP_CLAUSE_MAP nodes in gfc_trans_omp_clauses.

The "several nodes" case is problematic for mappers, for a few different
reasons:

 - Firstly, if we're invoking a nested mapper, we need some way of
   keeping those nodes together so they can be replaced "as one" by the
   clauses listed in that mapper. (For C and C++, a single OMP_CLAUSE_MAP
   node is used to represent a map clause early in compilation, which
   is then expanded in c_finish_omp_clauses for C, and similar for C++.
   We process mappers before that function is called.)

 - Secondly, the process of translating FE representation of clauses
   into "tree" mapping nodes can generate preamble code, and we need to
   either defer that generation or else put the preamble code somewhere
   if we're defining a mapper.

 - Thirdly, gfc_trans_omp_clauses needs to examine both the FE
   representation and partially-translated tree codes.  In the case
   where we're instantiating mappers implicitly from the middle end,
   the FE representation is long gone.

The scheme used is as follows.

For the first problem, we introduce a GOMP_MAP_MAPPING_GROUP mapping
kind.  This is used to keep several mapping nodes together in mapper
definitions until instantiation time.  If the group triggers a nested
mapper, the required information can be extracted from it and then it
can be deleted/replaced as a whole.
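
The grouping idea can be pictured with a toy model.  The sketch below is not
the patch's data structure: the "clause" struct, the flat vector and the
replace_group helper are all hypothetical, and only illustrate how a header
node that records its member count lets a run of nodes be swapped out in one
step.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical clause node.  A group header stores how many of the
// following nodes belong to the group; ordinary nodes store 0.
struct clause
{
  std::string text;
  std::size_t group_len;
};

// Replace the group whose header is at index POS (the header plus its
// GROUP_LEN members) with the clauses supplied by a nested mapper.
void
replace_group (std::vector<clause> &list, std::size_t pos,
               const std::vector<clause> &replacement)
{
  assert (pos < list.size ());
  std::size_t n = 1 + list[pos].group_len;
  assert (pos + n <= list.size ());
  list.erase (list.begin () + pos, list.begin () + pos + n);
  list.insert (list.begin () + pos, replacement.begin (),
               replacement.end ());
}
```

Clauses before and after the group keep their relative order, which is the
property the grouping is meant to preserve.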

For the second and third problems, we emit preamble code into a function
wrapping the "omp declare mapper" node.  This extends the scheme currently
under review for C++, and performs inlining of a modified version of the
function whenever a mapper is invoked from the middle-end.  New copies
of variables (e.g. temporary array descriptors or other metadata) are
introduced to copy needed values out of the inlined function to where
they're needed in the mapper instantiation.

For Fortran, we also need to add special-case handling for mapping
derived-type variables that are (a) pointers and (b) trigger a mapper,
in both the explicit mapping and implicit mapping cases.  If we have a
type and a mapper like this:

  type T
  integer, dimension(10) :: iarr
  end type T

  type(T), pointer :: tptr

  !$omp declare mapper (T :: t) map(t%iarr)

  !$omp target map(tptr)
  [...]
  !$omp end target

Here "map(tptr)" maps the pointer itself, and implicitly maps the
pointed-to object as well.  So, when invoking the mapper, rather than
rewriting this as just:

  !$omp target map(tptr%iarr)

we must introduce a new node to map the pointer also, i.e.:

  !$omp target map(alloc:tptr) map(tptr%iarr)

...before the mapping nodes go off to gimplify for processing.

We also need to handle module writing and reading for "declare mappers".
This requires an ABI bump that I noticed one of Tobias's patches also
does, so we'll probably need to synchronize on that somehow.

This version of the patch is rebased wrt. current-ish mainline and
refactorings that are now done higher up this patch series.

2022-12-23  Julian Brown  

gcc/fortran/
* dump-parse-tree.cc (show_attr): Show omp_udm_artificial_var flag.
(show_omp_namelist): Support OMP_MAP_UNSET.
* f95-lang.cc (LANG_HOOKS_OMP_FINISH_MAPPER_CLAUSES,
LANG_HOOKS_OMP_EXTRACT_MAPPER_DIRECTIVE,
LANG_HOOKS_OMP_MAP_ARRAY_SECTION): Define language hooks.
* gfortran.h (gfc_statement): Add ST_OMP_DECLARE_MAPPER.
(symbol_attribute): Add omp_udm_artificial_var attribute.
(gfc_omp_map_op): Add OMP_MAP_UNSET.
(gfc_omp_namelist): Add udm pointer to u2 union.
(gfc_omp_udm): New struct.
(gfc_omp_namelist_udm): New struct.
(gfc_symtree): Add omp_udm pointer.
(gfc_namespace): Add omp_udm_root symtree. Add omp_udm_ns flag.
(gfc_free_omp_namelist): Update prototype.
(gfc_free_omp_udm, gfc_omp_udm_find, gfc_find_omp_udm,
gfc_resolve_omp_udms): Add prototypes.
* match.cc (gfc_free_omp_namelist): Change FREE_NS and FREE_ALIGN
parameters to LIST number, to handle freeing user-defined mapper
namelists safely.
* match.h (gfc_match_omp_declare_mapper): Add prototype.
* module.cc (MOD_VERSION): Bump to 16.
(ab_attribute): Add AB_OMP_DECLARE_MAPPER_VAR.
(attr_bits): Add OMP_DECLARE_MAPPER_VAR.
(mio_symbol_attribute): Read/write AB_OMP_DECLARE_MAPPER_VAR attribute.
Set referenced attr on read.
(omp_map_clause_ops, omp_map_cardinality): New arrays.
(load_omp_udms, check_omp_declare_mappers): New functions.
(read_module): Load and check OMP declare mappers.
(write_omp_udm, write_omp_udms): New functions.
(write_

[PATCH v6 05/11] OpenMP: Pointers and member mappings

2022-12-23 Thread Julian Brown
This patch changes the mapping node arrangement used for array components
of derived types, e.g.:

  type T
  integer, pointer, dimension(:) :: arrptr
  end type T

  type(T) :: tvar
  [...]
  !$omp target map(tofrom: tvar%arrptr)

This will currently be mapped using three mapping nodes:

  GOMP_MAP_TO              tvar%arrptr        (the descriptor)
  GOMP_MAP_TOFROM          *tvar%arrptr%data  (the actual array data)
  GOMP_MAP_ALWAYS_POINTER  tvar%arrptr%data   (a pointer to the array data)

This follows OpenMP 5.0, 2.19.7.1 (or OpenMP 5.2, 5.8.3) "map Clause":

  "If a list item in a map clause is an associated pointer and the
   pointer is not the base pointer of another list item in a map clause
   on the same construct, then it is treated as if its pointer target
   is implicitly mapped in the same clause. For the purposes of the map
   clause, the mapped pointer target is treated as if its base pointer
   is the associated pointer."

However, we can also write this:

  map(to: tvar%arrptr) map(tofrom: tvar%arrptr(3:8))

and then instead we should follow (OpenMP 5.2, 5.8.3 "map Clause"):

  "For map clauses on map-entering constructs, if any list item has a base
   pointer for which a corresponding pointer exists in the data environment
   upon entry to the region and either a new list item or the corresponding
   pointer is created in the device data environment on entry to the region,
   then:
   1. [Fortran] The corresponding pointer variable is associated with
  a pointer target that has the same rank and bounds as the pointer
  target of the original pointer, such that the corresponding list item
  can be accessed through the pointer in a target region.
   2. The corresponding pointer variable becomes an attached pointer
  for the corresponding list item."

With this patch you can write the above mappings, and the mapping nodes
used to map pointers to array sections (with descriptors) now look
like this:

  1) map(to: tvar%arrptr)   -->
  GOMP_MAP_TO [implicit]  *tvar%arrptr%data  (the array data)
  GOMP_MAP_TO_PSET        tvar%arrptr        (the descriptor)
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

  2) map(tofrom: tvar%arrptr(3:8))   -->
  GOMP_MAP_TOFROM         *tvar%arrptr%data(3)  (size 8-3+1, etc.)
  GOMP_MAP_TO_PSET        tvar%arrptr
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data      (bias 3, etc.)

In this case, we can determine in the front-end that the
whole-array/pointer mapping (1) is only needed to map the pointer --
so we drop it entirely.  (Note also that we set -- early -- the
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for whole-array-via-pointer
mappings. See below.)
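
For reference, the "size 8-3+1" figure quoted for the (3:8) section is just
the inclusive element count.  A tiny sketch of that arithmetic follows; the
helper names are hypothetical, and the 4-byte element size used in the usage
note is an assumption about the default integer kind, not something stated in
the patch.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements in an inclusive section (lo:hi); empty if hi < lo.
constexpr std::size_t
section_elems (long lo, long hi)
{
  return hi >= lo ? static_cast<std::size_t> (hi - lo + 1) : 0;
}

// Bytes covered by the section for a given element size.
constexpr std::size_t
section_bytes (long lo, long hi, std::size_t elem_size)
{
  return section_elems (lo, hi) * elem_size;
}
```

So arrptr(3:8) covers section_elems (3, 8) = 6 elements, or 24 bytes if each
element is assumed to occupy 4 bytes.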

In the middle end, we process mappings using the struct sibling-list
handling machinery by moving the "GOMP_MAP_TO_PSET" node from the middle
of the group of three mapping nodes to the proper sorted position after
the GOMP_MAP_STRUCT mapping:

  GOMP_MAP_STRUCT         tvar                  (len: 1)
  GOMP_MAP_TO_PSET        tvar%arrptr           (size: 64, etc.)  <--. moved here
  [...]                                                              |
  GOMP_MAP_TOFROM         *tvar%arrptr%data(3)  ___________________|
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

In another case, if we have an array of derived-type values "dtarr",
and mappings like:

  i = 1
  j = 1
  map(to: dtarr(i)%arrptr) map(tofrom: dtarr(j)%arrptr(3:8))

We still map the same way, but this time we cannot prove that the base
expressions "dtarr(i)" and "dtarr(j)" are the same in the front-end.
So we keep both mappings, but we move the "[implicit]" mapping of the
full-array reference to the end of the clause list in gimplify.cc (by
adjusting the topological sorting algorithm):

  GOMP_MAP_STRUCT         dtarr                     (len: 2)
  GOMP_MAP_TO_PSET        dtarr(i)%arrptr
  GOMP_MAP_TO_PSET        dtarr(j)%arrptr
  [...]
  GOMP_MAP_TOFROM         *dtarr(j)%arrptr%data(3)  (size: 8-3+1)
  GOMP_MAP_ATTACH_DETACH  dtarr(j)%arrptr%data
  GOMP_MAP_TO [implicit]  *dtarr(i)%arrptr%data(1)  (size: whole array)
  GOMP_MAP_ATTACH_DETACH  dtarr(i)%arrptr%data

Always moving "[implicit]" full-array mappings after array-section
mappings (without that bit set) means that we'll avoid copying the whole
array unnecessarily -- even in cases where we can't prove that the arrays
are the same.
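
The effect of that reordering can be sketched outside the compiler.  The patch
achieves it by adjusting the topological sort in gimplify.cc; the snippet
below swaps in a plain std::stable_partition over a hypothetical mapping_group
struct, purely to show the resulting order, and is not how gimplify.cc is
written.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of one mapping group.
struct mapping_group
{
  std::string desc;
  bool implicit;  // set for whole-array-via-pointer mappings
};

// Move "[implicit]" groups after all explicit ones, keeping the relative
// order within each class unchanged.
void
sink_implicit_groups (std::vector<mapping_group> &groups)
{
  std::stable_partition (groups.begin (), groups.end (),
                         [] (const mapping_group &g)
                         { return !g.implicit; });
}
```

With the explicit section processed first, a later implicit whole-array
mapping of the same array can find the data already present.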

This version of the patch fixes some bugs with "enter data" and "exit
data" directives with this new mapping arrangement.  Also now if you
have mappings like this:

  !$omp target enter data map(to: dv, dv%arr(1:20))

The whole of the derived-type variable "dv" is mapped, so the
GOMP_MAP_TO_PSET for the array-section mapping can be dropped:

  GOMP_MAP_TO             dv

  GOMP_MAP_TO             *dv%arr%data
  GOMP_MAP_TO_PSET        dv%arr        <-- deleted (array section mapping)
  GOMP_MAP_ATTACH_DETACH  dv%arr%data

For struct components, the GOMP_MAP_TO_PSET mapping is turned into
GOMP_MAP_RELEASE at gimplify time for "exit data" directives.

2022-12-15  Julian Brown  

gcc/fortran/
* dependency.cc (gfc_omp_expr_prefix_same): New function.
* 

[PATCH v6 09/11] OpenMP: lvalue parsing for map clauses (C)

2022-12-23 Thread Julian Brown
This patch adds support for parsing general lvalues for OpenMP "map", "to"
and "from" clauses to the C front-end, similar to the previously-posted
patch for C++.

This version of the patch incorporates the patch to change uses of
TREE_LIST to the new OMP_ARRAY_SECTION tree code to represent OpenMP
array sections, and rejects array sections in certain expressions where
they make no sense (see new tests).
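
As a rough picture of the three-operand shape (base, low bound, length) that
the pretty-printer hunk further down prints as "base[low:len]", here is a
stand-alone sketch.  The array_section struct and render_section function are
hypothetical illustrations using strings and std::optional (C++17); they do
not use GCC's tree API.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of an array section: either bound may be omitted.
struct array_section
{
  std::string base;
  std::optional<long> low;
  std::optional<long> length;
};

// Render as base[low:length], leaving out whichever operand is absent.
std::string
render_section (const array_section &s)
{
  std::string out = s.base + "[";
  if (s.low)
    out += std::to_string (*s.low);
  out += ":";
  if (s.length)
    out += std::to_string (*s.length);
  return out + "]";
}
```

A dedicated node of this kind keeps the section distinct from an ordinary
subscript, which is what allows it to be rejected in contexts where it makes
no sense.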

2022-12-22  Julian Brown  

gcc/c/
* c-pretty-print.cc (c_pretty_printer::postfix_expression,
c_pretty_printer::expression): Add OMP_ARRAY_SECTION support.
* c-parser.cc (c_parser_braced_init, c_parser_conditional_expression):
Don't allow OpenMP array section.
(c_parser_postfix_expression): Don't allow array section in statement
expression.
(c_parser_postfix_expression_after_primary): Add support
for OpenMP array section parsing.
(c_parser_expr_list): Don't allow OpenMP array section here.
(c_parser_omp_variable_list): Change ALLOW_DEREF parameter to
MAP_LVALUE.  Support parsing of general lvalues in "map", "to" and
"from" clauses.
(c_parser_omp_var_list_parens): Change ALLOW_DEREF parameter to
MAP_LVALUE.  Update call to c_parser_omp_variable_list.
(c_parser_oacc_data_clause, c_parser_omp_clause_to,
c_parser_omp_clause_from): Update calls to
c_parser_omp_var_list_parens.
* c-tree.h (c_omp_array_section_p): Add extern declaration.
(build_omp_array_section): Add prototype.
* c-typeck.cc (c_omp_array_section_p): Add flag.
(mark_exp_read): Support OMP_ARRAY_SECTION.
(build_omp_array_section): Add function.
(build_external_ref): Tweak error path for OpenMP array sections.
(handle_omp_array_sections_1): Use OMP_ARRAY_SECTION tree code instead
of TREE_LIST.  Handle more kinds of expressions.
(c_finish_omp_clauses): Use OMP_ARRAY_SECTION instead of TREE_LIST.
Check for supported expression types.

gcc/testsuite/
* gcc.dg/gomp/bad-array-section-c-1.c: New test.
* gcc.dg/gomp/bad-array-section-c-2.c: New test.
* gcc.dg/gomp/bad-array-section-c-3.c: New test.
* gcc.dg/gomp/bad-array-section-c-4.c: New test.
* gcc.dg/gomp/bad-array-section-c-5.c: New test.
* gcc.dg/gomp/bad-array-section-c-6.c: New test.
* gcc.dg/gomp/bad-array-section-c-7.c: New test.
* gcc.dg/gomp/bad-array-section-c-8.c: New test.

libgomp/
* testsuite/libgomp.c-c++-common/ind-base-4.c: New test.
* testsuite/libgomp.c-c++-common/unary-ptr-1.c: New test.
---
 gcc/c-family/c-pretty-print.cc|  12 ++
 gcc/c/c-parser.cc | 187 +++---
 gcc/c/c-tree.h|   2 +
 gcc/c/c-typeck.cc | 109 --
 .../gcc.dg/gomp/bad-array-section-c-1.c   |  16 ++
 .../gcc.dg/gomp/bad-array-section-c-2.c   |  13 ++
 .../gcc.dg/gomp/bad-array-section-c-3.c   |  24 +++
 .../gcc.dg/gomp/bad-array-section-c-4.c   |  26 +++
 .../gcc.dg/gomp/bad-array-section-c-5.c   |  15 ++
 .../gcc.dg/gomp/bad-array-section-c-6.c   |  16 ++
 .../gcc.dg/gomp/bad-array-section-c-7.c   |  26 +++
 .../gcc.dg/gomp/bad-array-section-c-8.c   |  21 ++
 .../libgomp.c-c++-common/ind-base-4.c |  50 +
 .../libgomp.c-c++-common/unary-ptr-1.c|  16 ++
 14 files changed, 486 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-1.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-2.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-4.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-5.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-6.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-7.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-8.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/ind-base-4.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/unary-ptr-1.c

diff --git a/gcc/c-family/c-pretty-print.cc b/gcc/c-family/c-pretty-print.cc
index c99b2ceffe65..d9954bd2b951 100644
--- a/gcc/c-family/c-pretty-print.cc
+++ b/gcc/c-family/c-pretty-print.cc
@@ -1615,6 +1615,17 @@ c_pretty_printer::postfix_expression (tree e)
   pp_c_right_bracket (this);
   break;
 
+case OMP_ARRAY_SECTION:
+  postfix_expression (TREE_OPERAND (e, 0));
+  pp_c_left_bracket (this);
+  if (TREE_OPERAND (e, 1))
+   expression (TREE_OPERAND (e, 1));
+  pp_colon (this);
+  if (TREE_OPERAND (e, 2))
+   expression (TREE_OPERAND (e, 2));
+  pp_c_right_bracket (this);
+  break;
+
 case CALL_EXPR:
   {
call_expr_arg_iterator iter;
@@ -2664,6 +2675,7 @@ c_pretty_printer::expression (tree e)
   

[PATCH v6 08/11] OpenMP: C++ "declare mapper" support

2022-12-23 Thread Julian Brown
This is a new version of the patch to support OpenMP 5.0 "declare mapper"
functionality for C++.  As with the previously-posted version, arrays
of structs whose elements would be mapped via a user-defined mapper
remain unsupported.
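
For readers unfamiliar with the directive, a minimal user-level sketch of
"declare mapper" follows.  The vec_t type and sum_on_target function are
made up for illustration and are not taken from the patch's tests.  The sketch
assumes a compiler that implements the directive when OpenMP is enabled; built
without -fopenmp the pragmas are ignored and the loop runs on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical struct holding a length and a pointer to that many doubles.
struct vec_t
{
  int n;
  double *data;
};

// Default mapper for vec_t: map the struct itself plus the pointed-to data.
#pragma omp declare mapper(vec_t v) map(v, v.data[0:v.n])

// Sum N copies of 1.5 inside a target region.
double
sum_on_target (int n)
{
  std::vector<double> storage (n, 1.5);
  vec_t v = { n, storage.data () };
  double sum = 0.0;

  // Mapping 'v' is expected to go through the mapper declared above.
#pragma omp target map(tofrom: v) map(tofrom: sum)
  {
    for (int i = 0; i < v.n; i++)
      sum += v.data[i];
  }
  return sum;
}
```

Note that an array of vec_t would fall into the "arrays of structs" case that
this version of the patch still leaves unsupported.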

This version of the patch uses a magic VAR_DECL instead of a magic
FUNCTION_DECL for representing mappers, which simplifies parsing
somewhat, and slightly reduces the number of places that need special-case
handling in the FE.  We use the DECL_INITIAL of the VAR_DECL to hold the
OMP_DECLARE_MAPPER definition.  To make types agree, we use the type of
the object to be mapped for both the var decl and the OMP_DECLARE_MAPPER
node itself.  Hence the OMP_DECLARE_MAPPER looks like a magic constant
of struct type in this particular case.

The magic var decl can go in all the places that the "declare mapper"
function decl went previously: at the top level of the program,
within a class definition (including template classes), and within a
function definition (including template functions).  In the class case
we conceptually use the C++-17-ism of defining the var decl "inline
static", equivalent to e.g.:

   [template ...]
   class bla {
     static inline omp declare mapper ... = #define omp declare mapper ...
   };

(though of course we don't restrict the "declare mapper"-in-class syntax
to C++-17.)
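
For comparison, this is what an ordinary C++17 "static inline" data member
looks like in user code.  The member name and initializer are hypothetical;
the snippet only shows the language feature the analogy refers to, not the
magic VAR_DECL itself.

```cpp
#include <cassert>

template <typename T>
class bla
{
public:
  // C++17 inline variable: declared and defined inside the class body, with
  // no separate out-of-class definition needed.  Each instantiation of the
  // template gets its own copy.
  static inline int mapper_tag = static_cast<int> (sizeof (T));
};
```

The relevant property is that the definition lives inside the class, so no
out-of-class definition has to be found or emitted separately.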

The new representation necessitates some changes to template instantiation
-- declare mappers may trigger implicitly, so we must make sure they
are instantiated before they are needed (see changes to mark_used, etc.).

I've rearranged the processing done by the gimplify_scan_omp_clauses and
gimplify_adjust_omp_clauses functions so the order of the phases can
remain intact in the presence of declared mappers.  To do this, most
gimplification of clauses in gimplify_scan_omp_clauses has been moved
to gimplify_adjust_omp_clauses.  This allows e.g. struct sibling-list
handling and topological clause sorting to work with the non-gimplified
form of clauses in the latter function -- including those that arise
from mapper expansion.  This seems to work well now.

Relative to the last-posted version, this patch brings forward various
refactoring that was previously done by the C and Fortran "declare mapper"
support patches -- aiming to reduce churn.  E.g. nested mapper finding
and mapper instantiation has been moved to c-family/c-omp.cc so it can
be shared between C and C++, and omp_name_type in omp-general.h (used
as the key to hash mapper definitions) is already templatized ready for
Fortran support.

This patch does not synthesize default mappers that map each of a struct's
elements individually: whole-struct mappings are still done by copying
the block of memory containing the struct.  That works fine apart from
cases where a struct has a member that is a reference (to a pointer).
We could fix that by synthesizing a mapper for such cases (only), but
that hasn't been attempted yet.  (I think that means Jakub's concerns
about blow-up of element mappings won't be a problem until that's done.)

New tests added in {gcc,libgomp}/c-c++-common have been restricted to
C++ for now.

2022-11-30  Julian Brown  

gcc/c-family/
* c-common.h (omp_mapper_list): Add forward declaration.
(c_omp_find_nested_mappers, c_omp_instantiate_mappers): Add prototypes.
* c-omp.cc (c_omp_find_nested_mappers): New function.
(remap_mapper_decl_info): New struct.
(remap_mapper_decl_1, omp_instantiate_mapper,
c_omp_instantiate_mappers): New functions.

gcc/cp/
* constexpr.cc (reduced_constant_expression_p): Add OMP_DECLARE_MAPPER
case.
(cxx_eval_constant_expression, potential_constant_expression_1):
Likewise.
* cp-gimplify.cc (cxx_omp_finish_mapper_clauses): New function.
* cp-objcp-common.h (LANG_HOOKS_OMP_FINISH_MAPPER_CLAUSES,
LANG_HOOKS_OMP_MAPPER_LOOKUP, LANG_HOOKS_OMP_EXTRACT_MAPPER_DIRECTIVE,
LANG_HOOKS_OMP_MAP_ARRAY_SECTION): Define langhooks.
* cp-tree.h (lang_decl_base): Add omp_declare_mapper_p field.  Recount
spare bits comment.
(DECL_OMP_DECLARE_MAPPER_P): New macro.
(omp_mapper_id, cp_check_omp_declare_mapper, omp_instantiate_mappers,
cxx_omp_finish_mapper_clauses, cxx_omp_mapper_lookup,
cxx_omp_extract_mapper_directive, cxx_omp_map_array_section): Add
prototypes.
* decl.cc (check_initializer): Add OpenMP declare mapper support.
(cp_finish_decl): Set DECL_INITIAL for OpenMP declare mapper var decls
as appropriate.
* decl2.cc (mark_used): Instantiate OpenMP "declare mapper" magic var
decls.
* error.cc (dump_omp_declare_mapper): New function.
(dump_simple_decl): Use above.
* parser.cc (cp_parser_omp_clause_map): Add KIND parameter.  Support
"mapper" modifier.
(cp_parser_omp_all_clauses): Add KIND argument to
cp_parser_omp_clause_map call.
(cp_parser_omp_ta

[PATCH v6 10/11] OpenMP: Support OpenMP 5.0 "declare mapper" directives for C

2022-12-23 Thread Julian Brown
This patch adds support for "declare mapper" directives (and the "mapper"
modifier on "map" clauses) for C.  As for C++, arrays of custom-mapped
objects are not supported yet.

I've taken hints from the existing C support for "declare reduction"
directives: this works a little differently from C++ for things such as
looking up user-defined reductions (or user-defined mappers, in our case).
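
One ingredient of that lookup is visible in the c-decl.cc hunk below:
c_omp_mapper_id starts from the prefix "omp declare mapper " and uses an
empty name for the unnamed (default) mapper.  The rest of that function is cut
off in this message, so the sketch below is only an assumption about the
general idea (prefix plus mapper name) written with std::string, not a
reconstruction of the patch's code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build a lookup key for a mapper.  An empty MAPPER_ID
// stands for the unnamed default mapper.
std::string
mapper_lookup_name (const std::string &mapper_id)
{
  static const char prefix[] = "omp declare mapper ";
  return std::string (prefix) + mapper_id;
}
```

Because the prefix contains spaces, a key of this form cannot collide with an
ordinary C identifier.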

This version of the patch removes some unnecessary function setup/teardown
code from c_parser_omp_declare_mapper, and has been rebased (hence
simplified) wrt. refactoring done higher up this patch series.

2022-12-23  Julian Brown  

gcc/c/
* c-decl.cc (c_omp_mapper_id, c_omp_mapper_decl, c_omp_mapper_lookup,
c_omp_extract_mapper_directive, c_omp_map_array_section,
c_omp_scan_mapper_bindings_r, c_omp_scan_mapper_bindings): New
functions.
* c-objc-common.h (LANG_HOOKS_OMP_FINISH_MAPPER_CLAUSES,
LANG_HOOKS_OMP_MAPPER_LOOKUP, LANG_HOOKS_OMP_EXTRACT_MAPPER_DIRECTIVE,
LANG_HOOKS_OMP_MAP_ARRAY_SECTION): Define langhooks for C.
* c-parser.cc (c_parser_omp_clause_map): Add KIND parameter.  Handle
mapper modifier.
(c_parser_omp_all_clauses): Update call to c_parser_omp_clause_map with
new kind argument.
(c_parser_omp_target): Instantiate explicit mappers and record bindings
for implicit mappers.
(c_parser_omp_declare_mapper): Parse "declare mapper" directives.
(c_parser_omp_declare): Support "declare mapper".
* c-tree.h (c_omp_finish_mapper_clauses, c_omp_mapper_lookup,
c_omp_extract_mapper_directive, c_omp_map_array_section,
c_omp_mapper_id, c_omp_mapper_decl, c_omp_scan_mapper_bindings,
c_omp_instantiate_mappers): Add prototypes.
* c-typeck.cc (c_finish_omp_clauses): Handle GOMP_MAP_PUSH_MAPPER_NAME
and GOMP_MAP_POP_MAPPER_NAME.
(c_omp_finish_mapper_clauses): New function (langhook).

gcc/testsuite/
* c-c++-common/gomp/declare-mapper-4.c: Enable for C.
* c-c++-common/gomp/declare-mapper-5.c: Likewise.
* c-c++-common/gomp/declare-mapper-6.c: Likewise.
* c-c++-common/gomp/declare-mapper-7.c: Likewise.
* c-c++-common/gomp/declare-mapper-8.c: Likewise.
* c-c++-common/gomp/declare-mapper-9.c: Likewise.
* c-c++-common/gomp/declare-mapper-12.c: Enable for C.
* gcc.dg/gomp/declare-mapper-10.c: New test.
* gcc.dg/gomp/declare-mapper-11.c: New test.

libgomp/
* testsuite/libgomp.c-c++-common/declare-mapper-9.c: Enable for C.
* testsuite/libgomp.c-c++-common/declare-mapper-10.c: Likewise.
* testsuite/libgomp.c-c++-common/declare-mapper-11.c: Likewise.
* testsuite/libgomp.c-c++-common/declare-mapper-12.c: Likewise.
* testsuite/libgomp.c-c++-common/declare-mapper-13.c: Likewise.
* testsuite/libgomp.c-c++-common/declare-mapper-14.c: Likewise.
---
 gcc/c/c-decl.cc   | 169 +++
 gcc/c/c-objc-common.h |  12 +
 gcc/c/c-parser.cc | 277 +-
 gcc/c/c-tree.h|   8 +
 gcc/c/c-typeck.cc |  15 +
 .../c-c++-common/gomp/declare-mapper-12.c |   2 +-
 .../c-c++-common/gomp/declare-mapper-4.c  |   2 +-
 .../c-c++-common/gomp/declare-mapper-5.c  |   2 +-
 .../c-c++-common/gomp/declare-mapper-6.c  |   2 +-
 .../c-c++-common/gomp/declare-mapper-7.c  |   2 +-
 .../c-c++-common/gomp/declare-mapper-8.c  |   2 +-
 .../c-c++-common/gomp/declare-mapper-9.c  |   2 +-
 gcc/testsuite/gcc.dg/gomp/declare-mapper-10.c |  61 
 gcc/testsuite/gcc.dg/gomp/declare-mapper-11.c |  33 +++
 .../libgomp.c-c++-common/declare-mapper-10.c  |   2 +-
 .../libgomp.c-c++-common/declare-mapper-11.c  |   2 +-
 .../libgomp.c-c++-common/declare-mapper-12.c  |   2 +-
 .../libgomp.c-c++-common/declare-mapper-13.c  |   2 +-
 .../libgomp.c-c++-common/declare-mapper-14.c  |   2 +-
 .../libgomp.c-c++-common/declare-mapper-9.c   |   2 +-
 20 files changed, 572 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/declare-mapper-10.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/declare-mapper-11.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index e47ca6718b3e..de5a41ee0c02 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -13084,6 +13084,175 @@ c_check_omp_declare_reduction_r (tree *tp, int *, void *data)
   return NULL_TREE;
 }
 
+/* Return identifier to look up for omp declare mapper.  */
+
+tree
+c_omp_mapper_id (tree mapper_id)
+{
+  const char *p = NULL;
+
+  const char prefix[] = "omp declare mapper ";
+
+  if (mapper_id == NULL_TREE)
+p = "";
+  else if (TREE_CODE (mapper_id) == IDENTIFIER_NODE)
+p = IDENTIFIER_POINTER (mapper_id);
+  else
+return error_mark_node;
+
+  size_t lenp = sizeof (prefix);
+  size_t len = strlen (p);
+  char *name = XALLOCAVEC (char, lenp 

Re: Re: [PATCH] RISC-V: Support VSETVL PASS for RVV support

2022-12-23 Thread 钟居哲
Would you mind telling me how you reproduced these errors?
I failed to reproduce this. Thanks
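
A reduced case may help when trying to reproduce the -Wdeprecated-copy
diagnostics quoted below.  The class here is a hypothetical stand-in, not the
real riscv_vector::avl_info.  As far as I know the warning is part of -Wextra
in GCC 9 and later, so deleting the defaulted copy constructor line and
building with -Wextra (or -Wdeprecated-copy) should give a similar message;
with the line in place the copy is no longer implicitly declared.

```cpp
#include <cassert>

// Hypothetical reduced class with a user-provided copy assignment.
class avl_like
{
public:
  explicit avl_like (int v) : m_value (v) {}

  // Explicitly defaulting the copy constructor avoids relying on the
  // deprecated implicit declaration now that operator= is user-provided.
  avl_like (const avl_like &) = default;

  avl_like &operator= (const avl_like &other)
  {
    m_value = other.m_value;
    return *this;
  }

  int value () const { return m_value; }

private:
  int m_value;
};

// Returning by value from a const reference uses the copy constructor,
// like get_avl_info () in the quoted log.
avl_like
copy_of (const avl_like &a)
{
  return a;
}
```

The warning is only an error in the quoted build because of -Werror, which
may explain why a build without -Werror does not show the failure.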



juzhe.zh...@rivai.ai
 
From: Andreas Schwab
Date: 2022-12-23 18:53
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Support VSETVL PASS for RVV support
How has this been tested?
 
In file included from ../../gcc/config/riscv/riscv-vsetvl.cc:89:
../../gcc/config/riscv/riscv-vsetvl.h: In member function 'riscv_vector::avl_info riscv_vector::vl_vtype_info::get_avl_info() const':
../../gcc/config/riscv/riscv-vsetvl.h:175:43: error: implicitly-declared 'constexpr riscv_vector::avl_info::avl_info(const riscv_vector::avl_info&)' is deprecated [-Werror=deprecated-copy]
  175 |   avl_info get_avl_info () const { return m_avl; }
  |   ^
../../gcc/config/riscv/riscv-vsetvl.h:131:13: note: because 'riscv_vector::avl_info' has user-provided 'riscv_vector::avl_info& riscv_vector::avl_info::operator=(const riscv_vector::avl_info&)'
  131 |   avl_info &operator= (const avl_info &);
  | ^~~~
../../gcc/config/riscv/riscv-vsetvl.cc: In function 'bool change_insn(rtl_ssa::function_info*, rtl_ssa::insn_change, rtl_ssa::insn_info*, rtx)':
../../gcc/config/riscv/riscv-vsetvl.cc:823:27: error: unquoted whitespace character '\x0a' in format [-Werror=format-diag]
  823 |   pp_printf (&pp, "\n");
  |   ^~~~
../../gcc/config/riscv/riscv-vsetvl.cc:847:27: error: unquoted whitespace character '\x0a' in format [-Werror=format-diag]
  847 |   pp_printf (&pp, "\n");
  |   ^~~~
../../gcc/config/riscv/riscv-vsetvl.cc: In constructor 'riscv_vector::vl_vtype_info::vl_vtype_info(riscv_vector::avl_info, uint8_t, riscv_vector::vlmul_type, uint8_t, bool, bool)':
../../gcc/config/riscv/riscv-vsetvl.cc:905:5: error: implicitly-declared 'constexpr riscv_vector::avl_info::avl_info(const riscv_vector::avl_info&)' is deprecated [-Werror=deprecated-copy]
  905 |   : m_avl (avl_in), m_sew (sew_in), m_vlmul (vlmul_in), m_ratio (ratio_in),
  | ^~
../../gcc/config/riscv/riscv-vsetvl.cc:859:1: note: because 'riscv_vector::avl_info' has user-provided 'riscv_vector::avl_info& riscv_vector::avl_info::operator=(const riscv_vector::avl_info&)'
  859 | avl_info::operator= (const avl_info &other)
  | ^~~~
../../gcc/config/riscv/riscv-vsetvl.cc: In member function 'void riscv_vector::vector_insn_info::dump(FILE*) const':
../../gcc/config/riscv/riscv-vsetvl.cc:1366:27: error: unquoted whitespace character '\x0a' in format [-Werror=format-diag]
1366 |   pp_printf (&pp, "\n");
  |   ^~~~
cc1plus: all warnings being treated as errors
make[3]: *** [../../gcc/config/riscv/t-riscv:59: riscv-vsetvl.o] Error 1
 
-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
 


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Jiufu Guo via Gcc-patches
Hi,

Segher Boessenkool  writes:

> On Thu, Dec 22, 2022 at 11:28:01AM +, Richard Biener wrote:
>> On Thu, 22 Dec 2022, Jiufu Guo wrote:
>> > To reduce risk, I'm just drafting straightforward patches for
>> > special cases currently, like:
>> > https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
>> > and this patch.
>> 
>> Heh, yes - though I'm not fond of special-casing things.  RTL
>> expansion is already full of special cases :/
>
> And many of those are not useful at all (would be done by later passes),
> or are actively harmful.  Not to mention that expand is currently one of
> the most impregnable and undebuggable RTL passes.
>
> But there are also many things done during expand that although they
> should be done somewhat later, aren't actually done later at all
> currently.  So that needs fixing.
>
> Maybe things should go via an intermediate step, where all the decisions
> can be made, and then later we just have to translate the "low Gimple"
> or "RTL-Gimple" ("Rimple"?) to RTL.  A format that is looser in many
> ways than either RTL or Gimple.  A bit like Generic in that way.

Thanks for all your great comments!

BR,
Jeff (Jiufu)
>
>
> Segher


[PATCH] RISC-V: Fix ICE for avl_info deprecated copy and pp_print error.

2022-12-23 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (change_insn): Remove pp_print.
(avl_info::avl_info): Add copy function.
(vector_insn_info::dump): Remove pp_print.
* config/riscv/riscv-vsetvl.h: Add copy function.

---
 gcc/config/riscv/riscv-vsetvl.cc | 32 
 gcc/config/riscv/riscv-vsetvl.h  |  1 +
 2 files changed, 9 insertions(+), 24 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 01530c1ae75..a55b5a1c394 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -810,15 +810,6 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn,
   fprintf (dump_file, "\nChange PATTERN of insn %d from:\n",
   INSN_UID (rinsn));
   print_rtl_single (dump_file, PATTERN (rinsn));
-  if (dump_flags & TDF_DETAILS)
-   {
- fprintf (dump_file, "RTL_SSA info:\n");
- pretty_printer pp;
- pp.buffer->stream = dump_file;
- insn->print_full (&pp);
- pp_printf (&pp, "\n");
- pp_flush (&pp);
-   }
 }
 
   insn_change_watermark watermark;
@@ -834,19 +825,16 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn,
 {
   fprintf (dump_file, "\nto:\n");
   print_rtl_single (dump_file, PATTERN (rinsn));
-  if (dump_flags & TDF_DETAILS)
-   {
- fprintf (dump_file, "RTL_SSA info:\n");
- pretty_printer pp;
- pp.buffer->stream = dump_file;
- insn->print_full (&pp);
- pp_printf (&pp, "\n");
- pp_flush (&pp);
-   }
 }
   return true;
 }
 
+avl_info::avl_info (const avl_info &other)
+{
+  m_value = other.get_value ();
+  m_source = other.get_source ();
+}
+
 avl_info::avl_info (rtx value_in, set_info *source_in)
   : m_value (value_in), m_source (source_in)
 {}
@@ -1355,12 +1343,8 @@ vector_insn_info::dump (FILE *file) const
 {
   if (get_insn ())
{
- fprintf (file, "RTL_SSA insn_info=");
- pretty_printer pp;
- pp.buffer->stream = file;
- get_insn ()->print_full (&pp);
- pp_printf (&pp, "\n");
- pp_flush (&pp);
+ fprintf (file, "The real INSN=");
+ print_rtl_single (file, get_insn ()->rtl ());
}
   if (get_dirty_pat ())
{
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index ad9bb27cebf..6f27004fab1 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -125,6 +125,7 @@ private:
 
 public:
   avl_info () : m_value (NULL_RTX), m_source (nullptr) {}
+  avl_info (const avl_info &);
   avl_info (rtx, rtl_ssa::set_info *);
   rtx get_value () const { return m_value; }
   rtl_ssa::set_info *get_source () const { return m_source; }
-- 
2.36.3
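
For readers following the -Wdeprecated-copy part of this fix, here is a minimal sketch of the pattern, not the GCC source. The names avl_like, holder, get_avl and check_copy are hypothetical, and plain ints stand in for the rtx and rtl_ssa::set_info * members of avl_info. The idea, as the build log suggests, is that a class with a user-provided copy assignment operator should also declare its copy constructor, because relying on the implicitly-declared one is deprecated.

```cpp
#include <cassert>

// Hypothetical stand-in for riscv_vector::avl_info: plain ints replace
// the rtx and rtl_ssa::set_info * members of the real class.
class avl_like
{
public:
  avl_like () : m_value (0), m_source (0) {}
  avl_like (int value_in, int source_in)
    : m_value (value_in), m_source (source_in) {}

  // Declaring the copy constructor explicitly is what silences
  // -Wdeprecated-copy; without it, the user-provided operator= below
  // makes the implicitly-declared copy constructor deprecated.
  avl_like (const avl_like &other)
    : m_value (other.m_value), m_source (other.m_source) {}

  avl_like &operator= (const avl_like &other)
  {
    m_value = other.m_value;
    m_source = other.m_source;
    return *this;
  }

  int get_value () const { return m_value; }
  int get_source () const { return m_source; }

private:
  int m_value;
  int m_source;
};

// Hypothetical holder, mirroring vl_vtype_info::get_avl_info ():
// returning the member by value needs the copy constructor.
struct holder
{
  avl_like m_avl;
  avl_like get_avl () const { return m_avl; }
};

// Usage: the copy carries both fields over.
inline void check_copy ()
{
  holder h{avl_like (3, 7)};
  avl_like copy = h.get_avl ();
  assert (copy.get_value () == 3 && copy.get_source () == 7);
}
```

The patch above assigns the members in the constructor body; the sketch uses a member initializer list instead, which should behave the same for a memberwise copy like this one.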



Re: Re: [PATCH] RISC-V: Support VSETVL PASS for RVV support

2022-12-23 Thread 钟居哲
Hi, Andreas. Thank you for reporting this.
Even though I could not reproduce this error, I have an idea for a fix:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609045.html 
Would you mind testing this patch for me before I merge it?
Thanks.


juzhe.zh...@rivai.ai
 


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Jiufu Guo via Gcc-patches
Hi,

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> Richard Biener  writes:
>
>> On Thu, 22 Dec 2022, guojiufu wrote:
>>
>>> Hi,
>>> 
>>> On 2022-12-21 15:30, Richard Biener wrote:
>>> > On Wed, 21 Dec 2022, Jiufu Guo wrote:
>>> > 
>>> >> Hi,
>>> >> 
>>> >> This patch fixes an issue with parameter access when the
>>> >> parameter has struct type, is passed through integer registers, and
>>> >> a floating-point member is accessed, as in the code below:
>>> >> 
>>> >> typedef struct DF {double a[4]; long l; } DF;
>>> >> double foo_df (DF arg){return arg.a[3];}
>>> >> 
>>> >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
>>> >> generated.  While instruction "mtvsrd 1, 6" would be enough for
>>> >> this case.
>>> > 
>>> > So why do we end up spilling for PPC?
>>> 
>>> Good question! According to GCC source code (in function.cc/expr.cc),
>>> it is common behavior: "word_mode" is used to store the parameter to the
>>> stack, and the field's mode (e.g. float mode) is used to load from the stack.
>>> But after some attempts, I failed to construct such cases on many platforms.
>>> So, I converted the fix to a target hook and implemented the rs6000 part
>>> first.
>>> 
>>> > 
>>> > struct X { int i; float f; };
>>> > 
>>> > float foo (struct X x)
>>> > {
>>> >   return x.f;
>>> > }
>>> > 
>>> > does pass the structure in $RDI on x86_64 and we manage (with
>>> > optimization, with -O0 we spill) to generate
>>> > 
>>> > shrq    $32, %rdi
>>> > movd    %edi, %xmm0
>>> > 
>>> > and RTL expansion generates
>>> > 
>>> > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>> > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
>>> > (reg:DI 5 di [ x ])) "t.c":4:1 -1
>>> >  (nil))
>>> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
>>> > (insn 6 3 7 2 (parallel [
>>> > (set (reg:DI 85)
>>> > (ashiftrt:DI (reg/v:DI 83 [ x ])
>>> > (const_int 32 [0x20])))
>>> > (clobber (reg:CC 17 flags))
>>> > ]) "t.c":5:11 -1
>>> >  (nil))
>>> > (insn 7 6 8 2 (set (reg:SI 86)
>>> > (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
>>> >  (nil))
>>> > 
>>> > I would imagine that for the ppc case we only see the subreg here
>>> > which should be even easier to optimize.
>>> > 
>>> > So how's this not fixable by providing proper patterns / subreg
>>> > capabilities?  Looking a bit at the RTL we have the issue might
>>> > be that nothing seems to handle CSE of
>>> > 
>>> 
>>> This case is also related to 'parameter on struct', PR89310 is
>>> just for this case. On trunk, it is fixed.
>>> One difference: the parameter is in DImode, and passed via an
>>> integer register for "{int i; float f;}".
>>> But for "{double a[4]; long l;}", the parameter is in BLKmode,
>>> and stored to stack during the argument setup.
>>
>> OK, so this would be another case for "heuristics" to use
>> sth different than word_mode for storing, but of course
>> the arguments are in integer registers and using different
>> modes can for example prohibit store-multiple instruction use.
>>
>> As I said in the related thread an RTL expansion time "SRA"
>> with the incoming argument assignment in mind could make
>> more optimal decisions for these kind of special-cases.
>
> Thanks a lot for your comments!
>
> Yeap! Using SRA-like analysis during expansion for parameter
> and returns (and may also some field accessing) would be a
> generic improvement for this kind of issue (PR101926 collected
> a lot of them).
> While we may still need some work for various ABIs and different
> targets, to analyze where the 'struct field' comes from
> (int/float/vector/.. registers, or stack) and how the struct
> needs to be handled (keep in pseudo or store in the stack).
> This may imply a fair amount of changes to the param_setup code.
>
> To reduce risk, I'm just drafting straightforward patches for
> special cases currently, like:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
> and this patch.
>
>>
>>> > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>> > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
>>> > (const_int 56 [0x38])) [2 arg+24 S8 A64])
>>> > (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
>>> >  (expr_list:REG_DEAD (reg:DI 6 6)
>>> > (nil)))
>>> > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
>>> > (note 10 7 15 2 NOTE_INSN_DELETED)
>>> > (insn 15 10 16 2 (set (reg/i:DF 33 1)
>>> > (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
>>> > (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40
>>> > 576 {*movdf_hardfloat64}
>>> >  (nil))
>>> > 
>>> > Possibly because the store and load happen in a different mode?  Can
>>> > you see why CSE doesn't handle this (producing a subreg)?  On
>>> 
>>> Yes, exactly! For "{double a[4]; long l;}", because the store and load
>>> are using a different mode, and then CSE does not optimize it.  This
>>> patch makes the store and load using the same mode (DImode), and then
>>> leverage CSE to handle it.
>>
>>

nvptx: Support global constructors/destructors via 'collect2' for offloading (was: nvptx: Support global constructors/destructors via 'collect2')

2022-12-23 Thread Thomas Schwinge
Hi!

On 2022-12-02T14:35:35+0100, I wrote:
> On 2022-12-01T22:13:38+0100, I wrote:
>> I'm working on support for global constructors/destructors with
>> GCC/nvptx
>
> See "nvptx: Support global constructors/destructors via 'collect2'"
> [posted before]

Building on that, attached is now the additional "for offloading" piece:
"nvptx: Support global constructors/destructors via 'collect2' for offloading".
OK to push?

I did manually test this (by putting a few constructors/destructors into
'libgomp/config/nvptx/oacc-parallel.c', and observing them be executed),
and also in my WIP development tree with standard libgfortran
constructors (with 'LIBGFOR_MINIMAL' disabled).


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


nvptx: Support global constructors/destructors via 'collect2' for offloading (was: nvptx: Support global constructors/destructors via 'collect2')

2022-12-23 Thread Thomas Schwinge
Hi!

On 2022-12-23T14:35:16+0100, I wrote:
> On 2022-12-02T14:35:35+0100, I wrote:
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> [posted before]
>
> Building on that, attached is now the additional "for offloading" piece:
> "nvptx: Support global constructors/destructors via 'collect2' for 
> offloading".
> OK to push?

Now really attached.

> I did manually test this (by putting a few constructors/destructors into
> 'libgomp/config/nvptx/oacc-parallel.c', and observing them be executed),
> and also in my WIP development tree with standard libgfortran
> constructors (with 'LIBGFOR_MINIMAL' disabled).


Grüße
 Thomas


From fb67006eeca0c8e2bfdf86576ed3109dacaf6868 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 30 Nov 2022 22:09:35 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'
 for offloading

This extends "nvptx: Support global constructors/destructors via 'collect2'"
for offloading.

	libgcc/
	* config/nvptx/crtstuff.c ["mgomp"]
	(__do_global_ctors__entry__mgomp)
	(__do_global_dtors__entry__mgomp): New.
	[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
	New.
	libgomp/
	* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
	(nvptx_close_device, GOMP_OFFLOAD_load_image)
	(GOMP_OFFLOAD_unload_image): Call it.
---
 libgcc/config/nvptx/crtstuff.c |  64 ++-
 libgomp/plugin/plugin-nvptx.c  | 113 -
 2 files changed, 175 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
index 0823fc49901..8dc80687e0a 100644
--- a/libgcc/config/nvptx/crtstuff.c
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -29,6 +29,14 @@
files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
do so anyway, for symmetry with other configurations.  */
 
+
+/* See 'crt0.c', 'mgomp.c'.  */
+#if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
+extern unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
+#endif
+
+
 #ifdef CRT_BEGIN
 
 void
@@ -37,6 +45,33 @@ __do_global_ctors (void)
   DO_GLOBAL_CTORS_BODY;
 }
 
+/* Need '.entry' wrapper for offloading.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_ctors__entry__mgomp (void *);
+
+void
+__do_global_ctors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __do_global_ctors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_ctors__entry (void);
+
+void
+__do_global_ctors__entry (void)
+{
+  __do_global_ctors ();
+}
+
+# endif
+
 #elif defined(CRT_END) /* ! CRT_BEGIN */
 
 void
@@ -45,7 +80,7 @@ __do_global_dtors (void)
   /* In this configuration here, there's no way that "this routine is run more
  than once [...] when exit is called recursively": for nvptx target, the
  call to '__do_global_dtors' is registered via 'atexit', which doesn't
- re-enter a function already run.
+ re-enter a function already run, and neither does nvptx offload target.
  Therefore, we do *not* "arrange to remember where in the list we left off
  processing".  */
   func_ptr *p;
@@ -53,6 +88,33 @@ __do_global_dtors (void)
 (*p++) ();
 }
 
+/* Need '.entry' wrapper for offloading.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_dtors__entry__mgomp (void *);
+
+void
+__do_global_dtors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __do_global_dtors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_dtors__entry (void);
+
+void
+__do_global_dtors__entry (void)
+{
+  __do_global_dtors ();
+}
+
+# endif
+
 #else /* ! CRT_BEGIN && ! CRT_END */
 #error "One of CRT_BEGIN or CRT_END must be defined."
 #endif
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index fcc97c6e0d5..395639537e8 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -338,6 +338,11 @@ struct ptx_device
 
 static struct ptx_device **ptx_devices;
 
+static bool nvptx_do_global_cdtors (CUmodule, struct ptx_device *,
+const char *);
+static size_t nvptx_stacks_size ();
+static void *nvptx_stacks_acquire (struct ptx_device *, size_t, int);
+
 static inline struct nvptx_thread *
 nvptx_thread (void)
 {
@@ -557,6 +562,17 @@ nvptx_close_device (struct ptx_device *ptx_dev)
   if (!ptx_dev)
 return true;
 
+  bool ret = true;
+
+  

[committed] libstdc++: Fix Darwin bootstrap error in src/c++20/tzdb.cc

2022-12-23 Thread Jonathan Wakely via Gcc-patches
A fix for another bootstrap error caused by yesterday's C++20 time zone
commit, for macOS this time.

I have only tested on x86_64-linux but Iain confirmed this works for his
darwin testers. Pushed to trunk.

-- >8 --

Mach-O requires weak symbols to have a definition, so add a default
implementation of __gnu_cxx::zoneinfo_dir_override.

libstdc++-v3/ChangeLog:

* src/c++20/tzdb.cc [__APPLE__] (zoneinfo_dir_override): Add
definition.
---
 libstdc++-v3/src/c++20/tzdb.cc | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/src/c++20/tzdb.cc b/libstdc++-v3/src/c++20/tzdb.cc
index a02bcd4aec7..5f5c4199f65 100644
--- a/libstdc++-v3/src/c++20/tzdb.cc
+++ b/libstdc++-v3/src/c++20/tzdb.cc
@@ -52,6 +52,10 @@
 # endif
 #endif
 
+#ifndef _GLIBCXX_ZONEINFO_DIR
+# define _GLIBCXX_ZONEINFO_DIR "/usr/share/zoneinfo"
+#endif
+
 namespace __gnu_cxx
 {
 #ifdef _AIX
@@ -59,6 +63,12 @@ namespace __gnu_cxx
   const char* (*zoneinfo_dir_override)() = nullptr;
 #else
   [[gnu::weak]] const char* zoneinfo_dir_override();
+
+#ifdef __APPLE__
+  // Need a weak definition for Mach-O.
+  [[gnu::weak]] const char* zoneinfo_dir_override()
+  { return _GLIBCXX_ZONEINFO_DIR; }
+#endif
 #endif
 }
 
@@ -934,9 +944,6 @@ namespace std::chrono
 return info;
   }
 
-#ifndef _GLIBCXX_ZONEINFO_DIR
-# define _GLIBCXX_ZONEINFO_DIR "/usr/share/zoneinfo"
-#endif
  namespace
  {
 string
-- 
2.38.1
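
As an illustration of the approach in this patch, a minimal standalone sketch of a weak function with a default definition follows. It is not the libstdc++ code: the demo namespace and the zoneinfo_dir wrapper are hypothetical, the path is simply the default shown in the diff, and the [[gnu::weak]] attribute assumes GCC or Clang. A strong definition of zoneinfo_dir_override in another translation unit would be expected to take precedence at link time.

```cpp
#include <cassert>
#include <cstring>

namespace demo
{
  // Weak declaration, as on targets where an undefined weak symbol is fine.
  [[gnu::weak]] const char* zoneinfo_dir_override ();

  // Weak default definition, the shape the patch adds for Mach-O:
  // it is used unless some other object file supplies a strong definition.
  [[gnu::weak]] const char* zoneinfo_dir_override ()
  { return "/usr/share/zoneinfo"; }

  // Hypothetical caller: with a definition always present, the function
  // can be called unconditionally.
  const char* zoneinfo_dir ()
  {
    const char* dir = zoneinfo_dir_override ();
    assert (dir != nullptr);
    return dir;
  }
}
```

With no override linked in, demo::zoneinfo_dir () returns the default path.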



[patch, fortran] ICE on automatic array of derived type with DTIO

2022-12-23 Thread Jerry D via Gcc-patches

I have committed this as obvious and simple.

The master branch has been updated by Jerry DeLisle :

https://gcc.gnu.org/g:7e76cd96950f49ce21246d44780e972d86b2bcdd

commit r13-4862-g7e76cd96950f49ce21246d44780e972d86b2bcdd
Author: Steve Kargl 
Date:   Thu Dec 22 20:38:57 2022 -0800

Remove not needed assert macro which fails.

PR fortran/106731

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_trans_auto_array_allocation): Remove
gcc_assert (!TREE_STATIC()).

gcc/testsuite/ChangeLog:

* gfortran.dg/pr106731.f90: New test.


nvptx: '-mframe-malloc-threshold', '-Wframe-malloc-threshold' (was: Handling of large stack objects in GPU code generation -- maybe transform into heap allocation?)

2022-12-23 Thread Thomas Schwinge
Hi!

On 2022-11-11T15:35:44+0100, Richard Biener via Fortran  
wrote:
> On Fri, Nov 11, 2022 at 3:13 PM Thomas Schwinge  
> wrote:
>> For example, for Fortran code like:
>>
>> write (*,*) "Hello world"
>>
>> ..., 'gfortran' creates:
>>
>> struct __st_parameter_dt dt_parm.0;
>>
>> try
>>   {
>> dt_parm.0.common.filename = 
>> &"source-gcc/libgomp/testsuite/libgomp.oacc-fortran/print-1_.f90"[1]{lb: 1 
>> sz: 1};
>> dt_parm.0.common.line = 29;
>> dt_parm.0.common.flags = 128;
>> dt_parm.0.common.unit = 6;
>> _gfortran_st_write (&dt_parm.0);
>> _gfortran_transfer_character_write (&dt_parm.0, &"Hello 
>> world"[1]{lb: 1 sz: 1}, 11);
>> _gfortran_st_write_done (&dt_parm.0);
>>   }
>> finally
>>   {
>> dt_parm.0 = {CLOBBER(eol)};
>>   }
>>
>> The issue: the stack object 'dt_parm.0' is a half-KiB in size (yes,
>> really! -- there's a lot of state in Fortran I/O apparently).  That's a
>> problem for GPU execution -- here: OpenACC/nvptx -- where typically you
>> have small stacks.  (For example, GCC/OpenACC/nvptx: 1 KiB per thread;
>> GCC/OpenMP/nvptx is an exception, because of its use of '-msoft-stack'
>> "Use custom stacks instead of local memory for automatic storage".)
>>
>> Now, the Nvidia Driver tries to accommodate such largish stack usage,
>> and dynamically increases the per-thread stack as necessary (thereby
>> potentially reducing parallelism) -- if it manages to understand the call
>> graph.  In case of libgfortran I/O, it evidently doesn't.  Not being able
>> to disprove existence of recursion is the common problem, as I've read.
>> At run time, via 'CU_JIT_INFO_LOG_BUFFER' you then get, for example:
>>
>> warning : Stack size for entry function 'MAIN__$_omp_fn$0' cannot be 
>> statically determined
>>
>> That's still not an actual problem: if the GPU kernel's stack usage still
>> fits into 1 KiB.  Very often it does, but if, as happens in libgfortran
>> I/O handling, there is another such 'dt_parm' put onto the stack, the
>> stack then overflows; device-side SIGSEGV.
>>
>> (There is, by the way, some similar analysis by Tom de Vries in
>>  "[nvptx, openacc, openmp, testsuite]
>> Recursive tests may fail due to thread stack limit".)
>>
>> Of course, you shouldn't really be doing I/O in GPU kernels, but people
>> do like their occasional "'printf' debugging", so we ought to make that
>> work (... without pessimizing any "normal" code).
>>
>> I assume that generally reducing the size of 'dt_parm' etc. is out of
>> scope.
>>
>> There is a way to manually set a per-thread stack size, but it's not
>> obvious which size to set: that size needs to work for the whole GPU
>> kernel, and should be as low as possible (to maximize parallelism).
>> I assume that even if GCC did an accurate call graph analysis of the GPU
>> kernel's maximum stack usage, that still wouldn't help: that's before the
>> PTX JIT does its own code transformations, including stack spilling.
>>
>> There exists a 'CU_JIT_LTO' flag to "Enable link-time optimization
>> (-dlto) for device code".  This might help, assuming that it manages to
>> simplify the libgfortran I/O code such that the PTX JIT then understands
>> the call graph.  But: that's available only starting with recent
>> CUDA 11.4, so not a general solution -- if it works at all, which I've
>> not tested.
>>
>> Similarly, we could enable GCC's LTO for device code generation -- but
>> that's a big project, out of scope at this time.  And again, we don't
>> know if that at all helps this case.
>>
>> I see a few options:
>>
>> (a) Figure out what it is in the libgfortran I/O implementation that
>> causes "Stack size [...] cannot be statically determined", and re-work
>> that code to avoid that, or even disable certain things for nvptx, if
>> feasible.

> Shrink st_parameter_dt (it's part of the ABI though, kind of).  Lots of the
> bloat is from things that are unused for simpler I/O cases (so some
> "inheritance" could help), and lots of the bloat is from using
> string/length pairs using char * + size_t for what looks like could be
> encoded a lot more efficiently.
>
> There's probably not much low-hanging fruit.

(Similar comments in Janne's email.)


Well, as had to be expected, libgfortran I/O is really just one example,
but the underlying problem may also be triggered in other ways (via other
newlib/libc functions, for example).

So, really a generic solution seems to be called for.

>> (b) Also for GCC/OpenACC/nvptx use the GCC/OpenMP/nvptx '-msoft-stack'.
>> I don't really want to do that however: it does introduce a bit of
>> complexity in all the generated device code and run-time overhead that we
>> generally would like to avoid.

Directly using '-msoft-stack' isn't actually possible: it does implement
"one stack per 32-threads warp", but for OpenACC we need "one stack per
thread of a warp" (that is, each OpenACC 'vector' independently), and

Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-23 Thread Qing Zhao via Gcc-patches


> On Dec 23, 2022, at 2:33 AM, Alexander Monakov  wrote:
> 
> 
> On Thu, 22 Dec 2022, Qing Zhao wrote:
> 
>>> I think scheduling across calls in the pre-RA scheduler is simply an 
>>> oversight,
>>> we do not look at dataflow information and with 50% chance risk extending
>>> lifetime of a pseudoregister across a call, causing higher register 
>>> pressure at
>>> the point of the call, and potentially an extra spill.
>> 
>> I am a little confused, you mean pre-RA scheduler does not look at the data 
>> flow
>> information at all when scheduling insns across calls currently?
> 
> I think it does not inspect liveness info, and may extend lifetime of a pseudo
> across a call, transforming
> 
>  call foo
>  reg = 1
>  ...
>  use reg
> 
> to
> 
>  reg = 1
>  call foo
>  ...
>  use reg
> 
> but this is undesirable, because now register allocation cannot select a
> call-clobbered register for 'reg'.
Okay, thanks for the explanation.

Then, why not just check the liveness info instead of inhibiting all scheduling 
across calls?

Qing
> 
> Alexander



Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Segher Boessenkool
Hi!

On Fri, Dec 23, 2022 at 08:36:36PM +0800, Jiufu Guo wrote:
> It seems some limitations there. e.g. 1. "subreg:DF on DI register"
> may not work well on pseudo,

It is perfectly normal:
  A hard register may be accessed in various modes throughout one
  function, but each pseudo register is given a natural mode
  and is accessed only in that mode.  When it is necessary to describe
  an access to a pseudo register using a nonnatural mode, a @code{subreg}
  expression is used.

and:
  @code{subreg} expressions are used to refer to a register in a machine
  mode other than its natural one, or to refer to one register of
  a multi-part @code{reg} that actually refers to several registers.

  Each pseudo register has a natural mode.  If it is necessary to
  operate on it in a different mode, the register must be
  enclosed in a @code{subreg}.

and we even have:
  @item hard registers
  It is seldom necessary to wrap hard registers in @code{subreg}s; such
  registers would normally reduce to a single @code{reg} rtx.  This use of
  @code{subreg}s is discouraged and may not be supported in the future.

> and 2. to convert high-part:DI to SF,
> a "shift/rotate" is needed, and then we need to "emit shift insn"
> in cse. I may need to update this patch.

Hrm.  The machine insns to do this are just mtvsrd;xscvspdpn, but for
converting the lowpart it is mtvsrws;xscvspdpn (this needs p9 or
later).  We should arrive at those patterns, and we should try to not
go via the more expensive formulations with shifts, which don't describe
the hardware well, and which overestimate the cost of it.

None of this belongs in generic code at all imo.  At expand time it
should be expanded to something that works and can be optimised well,
so not anything with :BLK (which has to be put in memory, something with
unbounded size cannot be put in registers), not anything specifically
tailored to any cpu, something nice and regular.  Using a subreg (of a
pseudo!) is the standard way of writing a bitcast.

So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
integer bitcast to an IEEE SP number, and our machine description should
make it work nicely.
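
As a source-level analogy only (this is not RTL and not GCC code), the bitcast that (subreg:SF (reg:SI) 0) expresses can be sketched in C++ as below. The function names are hypothetical, a 32-bit IEEE 754 float is assumed, and the high-half helper only mirrors the shift-then-reinterpret shape of the x86_64 sequence quoted earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the 32 bits of an integer as a float without changing them;
// memcpy is the portable way to write a bitcast before C++20.
inline float bits_to_float (std::uint32_t bits)
{
  static_assert (sizeof (float) == sizeof (std::uint32_t),
                 "this sketch assumes a 32-bit float");
  float f;
  std::memcpy (&f, &bits, sizeof f);
  return f;
}

// Hypothetical helper: take the float stored in the high half of a
// 64-bit word by shifting it down and then bitcasting the low 32 bits.
inline float high_half_to_float (std::uint64_t word)
{
  return bits_to_float (static_cast<std::uint32_t> (word >> 32));
}

// Usage: 0x3f800000 and 0x40000000 are the IEEE single-precision
// encodings of 1.0f and 2.0f.
inline void check_bitcast ()
{
  assert (bits_to_float (0x3f800000u) == 1.0f);
  assert (high_half_to_float (0x4000000000000000ull) == 2.0f);
}
```

Whether the shift survives into the final code is up to the machine description, which is the point made above.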


Segher


Re: [PATCH v5 3/4] OpenMP: Pointers and member mappings

2022-12-23 Thread Julian Brown
On Thu, 15 Dec 2022 16:46:50 +
Julian Brown  wrote:

> On Thu, 15 Dec 2022 14:54:58 +
> Julian Brown  wrote:
> 
> > On Wed, 7 Dec 2022 17:31:20 +0100
> > Tobias Burnus  wrote:
> >   
> > > Hi Julian,
> > > 
> > > I think this patch is OK; however, at least for gimplify.cc Jakub
> > > needs to have a second look.
> > 
> > Thanks for the review!  Here's a new version that hopefully
> > addresses your comments.  (The gimplify bits change a bit more in
> > this version!)  
> 
> FYI, this is the current dependency list for this patch:
> 
> (1) "OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in
> {c_}finish_omp_clause"
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603791.html
> Approved.
> 
> (2) "OpenMP/OpenACC: Rework clause expansion and nested struct
> handling"
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603792.html
> Approved, but waiting for *this* patch to avoid regressing Fortran
> pointer-mapping behaviour, and which Tobias noticed an issue with
> prior to committing, addressed by (4).
> 
> (3) "OpenMP/OpenACC: Refine condition for when map clause expansion
> happens"
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607543.html
> Not reviewed (partly OpenACC).
> 
> (4) "OpenMP: implicitly map base pointer for array-section pointer
> components"
> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608318.html
> Not reviewed.
> 
> The following patches also depend on this one and the above:
> 
> (5) "OpenMP: lvalue parsing for map clauses (C++)"
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605367.html
> Mostly approved.
> 
> (6) "OpenMP: C++ "declare mapper" support"
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607544.html
> Revised version unreviewed.
> 
> ...and the to-be-revised "lvalue parsing for C", and C/Fortran
> "declare mapper" patches.

Followup:

https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609031.html




Re: [PATCH] RISC-V: Support VSETVL PASS for RVV support

2022-12-23 Thread Andreas Schwab
On Dec 23 2022, 钟居哲 wrote:

> Would you mind telling me how you reproduce these errors ?

make bootstrap

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: Re: [PATCH] RISC-V: Support VSETVL PASS for RVV support

2022-12-23 Thread 钟居哲
Thank you. Would you mind testing this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609045.html 
to see whether the issue is fixed ?
Thanks



juzhe.zh...@rivai.ai
 
From: Andreas Schwab
Date: 2022-12-23 22:54
To: 钟居哲
CC: gcc-patches; kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Support VSETVL PASS for RVV support
On Dec 23 2022, 钟居哲 wrote:
 
> Would you mind telling me how you reproduce these errors ?
 
make bootstrap
 
-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
 


Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-23 Thread Alexander Monakov via Gcc-patches
On Fri, 23 Dec 2022, Qing Zhao wrote:
> >> I am a little confused, you mean pre-RA scheduler does not look at the
> >> data flow information at all when scheduling insns across calls currently?
> > 
> > I think it does not inspect liveness info, and may extend lifetime of a
> > pseudo across a call, transforming
> > 
> >  call foo
> >  reg = 1
> >  ...
> >  use reg
> > 
> > to
> > 
> >  reg = 1
> >  call foo
> >  ...
> >  use reg
> > 
> > but this is undesirable, because now register allocation cannot select a
> > call-clobbered register for 'reg'.
> Okay, thanks for the explanation.
> 
> Then, why not just check the liveness info instead of inhibiting all 
> scheduling across calls?

Because there's almost nothing to gain from pre-RA scheduling across calls in
the first place. Remember that the call transfers control flow elsewhere and
therefore the scheduler has no idea about the pipeline state after the call
and after the return, so modeling-wise it's a gamble.

For instructions that lie on a critical path such scheduling can be useful when
it substantially reduces the difference between the priority of the call and
nearby instructions of the critical path. But we don't track which instructions
are on critical path(s) and which are not.

(scheduling across calls in sched2 is somewhat dubious as well, but
it doesn't risk register pressure issues, and on VLIW CPUs it at least
can result in better VLIW packing)

Alexander


[committed] tree-ssa-dom: can_infer_simple_equiv fixes [PR108068]

2022-12-23 Thread Jakub Jelinek via Gcc-patches
Hi!

As reported in the PR, tree-ssa-dom.cc uses real_zerop call to find
if a floating point constant is zero and it shouldn't try to infer
equivalences from comparison against it if signed zeros are honored.
This doesn't work at all for decimal types, because real_zerop always
returns false for them (one can have different representations of decimal
zero beyond -0/+0), and it doesn't work for vector compares either,
as real_zerop checks if all elements are zero, while we need to avoid
inferring equivalences from comparison against vector constants which have
at least one zero element in it (if signed zeros are honored).
Furthermore, as mentioned by Joseph, for decimal types many other values
aren't singleton.

So, this patch stops inferring anything if the element mode is decimal, and
otherwise replaces real_zerop with a new function, real_maybe_zerop,
which works even for decimal types and, for complex or vector constants,
returns true if any element is or might be zero (so it returns true
for anything but constants for now).
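
To illustrate why x == 0 must not be turned into "x is the constant 0" when
signed zeros are honored, here is a small sketch.  The function names are
made up for illustration, and it assumes IEEE binary floating point without
-ffast-math:

```c
#include <assert.h>
#include <math.h>

/* Nonzero if A and B compare equal yet differ in sign bit, i.e. the
   case where "a == b" does not justify replacing A by B.  */
int
equal_but_distinct (double a, double b)
{
  return a == b && !signbit (a) != !signbit (b);
}

/* 1.0 / x exposes the sign of a zero: +inf for +0.0, -inf for -0.0.  */
double
reciprocal (double x)
{
  return 1.0 / x;
}

/* Self-check of the two claims above.  */
void
check_signed_zero (void)
{
  assert (equal_but_distinct (0.0, -0.0));
  assert (reciprocal (0.0) == INFINITY);
  assert (reciprocal (-0.0) == -INFINITY);
}
```

So after a successful x == 0.0 test, substituting 0.0 for x would change the
result of 1.0 / x whenever x was actually -0.0.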

Bootstrapped/regtested on x86_64-linux and i686-linux, acked by Richi
in the PR, committed to trunk.

2022-12-23  Jakub Jelinek  

PR tree-optimization/108068
* tree.h (real_maybe_zerop): Declare.
* tree.cc (real_maybe_zerop): Define.
* tree-ssa-dom.cc (record_edge_info): Use it instead of
real_zerop or TREE_CODE (op1) == SSA_NAME || real_zerop.  Always set
can_infer_simple_equiv to false for decimal floating point types.

* gcc.dg/dfp/pr108068.c: New test.

--- gcc/tree.h.jj   2022-12-21 09:03:45.722562726 +0100
+++ gcc/tree.h  2022-12-21 16:34:56.316622678 +0100
@@ -5497,6 +5497,7 @@ extern bool needs_to_live_in_memory (con
 extern tree reconstruct_complex_type (tree, tree);
 extern bool real_onep (const_tree);
 extern bool real_minus_onep (const_tree);
+extern bool real_maybe_zerop (const_tree);
 extern void init_ttree (void);
 extern void build_common_tree_nodes (bool);
 extern void build_common_builtin_nodes (void);
--- gcc/tree.cc.jj  2022-12-21 09:03:45.719562769 +0100
+++ gcc/tree.cc 2022-12-21 16:35:46.567899636 +0100
@@ -3180,6 +3180,35 @@ real_minus_onep (const_tree expr)
 }
 }
 
+/* Return true if T could be a floating point zero.  */
+
+bool
+real_maybe_zerop (const_tree expr)
+{
+  switch (TREE_CODE (expr))
+{
+case REAL_CST:
+  /* Can't use real_zerop here, as it always returns false for decimal
+floats.  And can't use TREE_REAL_CST (expr).cl == rvc_zero
+either, as decimal zeros are rvc_normal.  */
+  return real_equal (&TREE_REAL_CST (expr), &dconst0);
+case COMPLEX_CST:
+  return (real_maybe_zerop (TREE_REALPART (expr))
+ || real_maybe_zerop (TREE_IMAGPART (expr)));
+case VECTOR_CST:
+  {
+   unsigned count = vector_cst_encoded_nelts (expr);
+   for (unsigned int i = 0; i < count; ++i)
+ if (real_maybe_zerop (VECTOR_CST_ENCODED_ELT (expr, i)))
+   return true;
+   return false;
+  }
+default:
+  /* Perhaps for SSA_NAMEs we could query frange.  */
+  return true;
+}
+}
+
 /* Nonzero if EXP is a constant or a cast of a constant.  */
 
 bool
--- gcc/tree-ssa-dom.cc.jj  2022-11-23 09:24:48.781253319 +0100
+++ gcc/tree-ssa-dom.cc 2022-12-21 16:36:37.756163125 +0100
@@ -615,9 +615,9 @@ record_edge_info (basic_block bb)
 {
   tree cond = build2 (code, boolean_type_node, op0, op1);
   tree inverted = invert_truthvalue_loc (loc, cond);
-  bool can_infer_simple_equiv
-= !(HONOR_SIGNED_ZEROS (op0)
-&& real_zerop (op0));
+ bool can_infer_simple_equiv
+   = !(HONOR_SIGNED_ZEROS (op0) && real_maybe_zerop (op0))
+ && !DECIMAL_FLOAT_MODE_P (element_mode (TREE_TYPE (op0)));
  class edge_info *edge_info;
 
  edge_info = new class edge_info (true_edge);
@@ -639,9 +639,9 @@ record_edge_info (basic_block bb)
 {
   tree cond = build2 (code, boolean_type_node, op0, op1);
   tree inverted = invert_truthvalue_loc (loc, cond);
-  bool can_infer_simple_equiv
-= !(HONOR_SIGNED_ZEROS (op1)
-&& (TREE_CODE (op1) == SSA_NAME || real_zerop (op1)));
+ bool can_infer_simple_equiv
+   = !(HONOR_SIGNED_ZEROS (op1) && real_maybe_zerop (op1))
+ && !DECIMAL_FLOAT_MODE_P (element_mode (TREE_TYPE (op1)));
  class edge_info *edge_info;
 
  edge_info = new class edge_info (true_edge);
--- gcc/testsuite/gcc.dg/dfp/pr108068.c.jj  2022-12-21 16:41:45.243738850 +0100
+++ gcc/testsuite/gcc.dg/dfp/pr108068.c 2022-12-21 16:41:38.267839223 +0100
@@ -0,0 +1,14 @@
+/* PR tree-optimization/108068 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int
+main ()
+{
+  _Decimal64 x = -1;
+  while (x != 0)
+x /= 10;
+  double d = x;
+  if (!__builtin_

[PATCH] strlen: do not use cond_expr for boundaries

2022-12-23 Thread Martin Liška
Hi.

We reach cond_expr and then we get an ICE in tree_int_cst_lt.
Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

PR tree-optimization/108137

gcc/ChangeLog:

* tree-ssa-strlen.cc (get_range_strlen_phi): Reject anything
different from INTEGER_CST.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr108137.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr108137.c |  8 
 gcc/tree-ssa-strlen.cc   | 13 +++--
 2 files changed, 15 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr108137.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr108137.c b/gcc/testsuite/gcc.dg/tree-ssa/pr108137.c
new file mode 100644
index 000..f0cb71b2267
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr108137.c
@@ -0,0 +1,8 @@
+// PR tree-optimization/108137
+// { dg-do compile }
+// { dg-options "-Wformat-overflow" }
+
+void f(unsigned short x_port, unsigned int x_host)
+{
+__builtin_printf("missing %s", x_port ? "host" : &"host:port"[x_host ? 5 : 0]);
+}
diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index abec225566d..a2edac4c77f 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -1136,14 +1136,15 @@ get_range_strlen_phi (tree src, gphi *phi,
 
   /* Adjust the minimum and maximum length determined so far and
 the upper bound on the array size.  */
-  if (!pdata->minlen
- || tree_int_cst_lt (argdata.minlen, pdata->minlen))
+  if (TREE_CODE (argdata.minlen) == INTEGER_CST
+ && (!pdata->minlen
+ || tree_int_cst_lt (argdata.minlen, pdata->minlen)))
pdata->minlen = argdata.minlen;
 
-  if (!pdata->maxlen
- || (argdata.maxlen
- && TREE_CODE (argdata.maxlen) == INTEGER_CST
- && tree_int_cst_lt (pdata->maxlen, argdata.maxlen)))
+  if (TREE_CODE (argdata.maxlen) == INTEGER_CST
+ && (!pdata->maxlen
+ || (argdata.maxlen
+ && tree_int_cst_lt (pdata->maxlen, argdata.maxlen))))
pdata->maxlen = argdata.maxlen;
 
   if (!pdata->maxbound
-- 
2.39.0



Re: [PATCH] c++: get_nsdmi in template context [PR108116]

2022-12-23 Thread Patrick Palka via Gcc-patches
On Thu, 22 Dec 2022, Patrick Palka wrote:

> On Thu, 22 Dec 2022, Jason Merrill wrote:
> 
> > On 12/22/22 16:41, Patrick Palka wrote:
> > > On Thu, 22 Dec 2022, Jason Merrill wrote:
> > > 
> > > > On 12/22/22 11:31, Patrick Palka wrote:
> > > > > On Wed, 21 Dec 2022, Jason Merrill wrote:
> > > > > 
> > > > > > On 12/21/22 09:52, Patrick Palka wrote:
> > > > > > > Here during ahead of time checking of C{}, we indirectly call
> > > > > > > get_nsdmi
> > > > > > > for C::m from finish_compound_literal, which in turn calls
> > > > > > > break_out_target_exprs for C::m's (non-templated) initializer,
> > > > > > > during
> > > > > > > which we end up building a call to A::~A and checking
> > > > > > > expr_noexcept_p
> > > > > > > for it (from build_vec_delete_1).  But this is all done with
> > > > > > > processing_template_decl set, so the built A::~A call is templated
> > > > > > > (whose form r12-6897-gdec8d0e5fa00ceb2 recently changed) which
> > > > > > > expr_noexcept_p doesn't expect and we crash.
> > > > > > > 
> > > > > > > In r10-6183-g20afdcd3698275 we fixed a similar issue by guarding a
> > > > > > > expr_noexcept_p call with !processing_template_decl, which works
> > > > > > > here
> > > > > > > too.  But it seems to me since the initializer we obtain in
> > > > > > > get_nsdmi is
> > > > > > > always non-templated, it should be calling break_out_target_exprs
> > > > > > > with
> > > > > > > processing_template_decl cleared since otherwise the function 
> > > > > > > might
> > > > > > > end
> > > > > > > up mixing templated and non-templated trees.
> > > > > > > 
> > > > > > > I'm not sure about this though, perhaps this is not the best fix
> > > > > > > here.
> > > > > > > Alternatively, when processing_template_decl we could make 
> > > > > > > get_nsdmi
> > > > > > > avoid calling break_out_target_exprs at all or something.
> > > > > > > Additionally,
> > > > > > > perhaps break_out_target_exprs should be a no-op more generally 
> > > > > > > when
> > > > > > > processing_template_decl since we shouldn't see any TARGET_EXPRs
> > > > > > > inside
> > > > > > > a template?
> > > > > > 
> > > > > > Hmm.
> > > > > > 
> > > > > > Any time we would call break_out_target_exprs we're dealing with
> > > > > > non-dependent
> > > > > > expressions; if we're in a template, we're building up an 
> > > > > > initializer
> > > > > > or a
> > > > > > call that we'll soon throw away, just for the purpose of checking or
> > > > > > type
> > > > > > computation.
> > > > > > 
> > > > > > Furthermore, as you say, the argument is always a non-template tree,
> > > > > > whether
> > > > > > in get_nsdmi or convert_default_arg.  So having
> > > > > > processing_template_decl
> > > > > > cleared would be correct.
> > > > > > 
> > > > > > I don't think we can get away with not calling 
> > > > > > break_out_target_exprs
> > > > > > at
> > > > > > all
> > > > > > in a template; if nothing else, we would lose immediate invocation
> > > > > > expansion.
> > > > > > However, we could probably skip the bot_manip tree walk, which 
> > > > > > should
> > > > > > avoid
> > > > > > the problem.
> > > > > > 
> > > > > > Either way we end up returning non-template trees, as we do now, and
> > > > > > callers
> > > > > > have to deal with transient CONSTRUCTORs containing such (as we do 
> > > > > > in
> > > > > > massage_init_elt).
> > > > > 
> > > > > Ah I see, makes sense.
> > > > > 
> > > > > > 
> > > > > > Does convert_default_arg not run into the same problem, e.g. when
> > > > > > calling
> > > > > > 
> > > > > > void g(B = {0});
> > > > > 
> > > > > In practice it seems not, because we don't call convert_default_arg
> > > > > when processing_template_decl is set (verified with an assert to
> > > > > that effect).  In build_over_call for example we exit early when
> > > > > processing_template_decl is set, and return a templated CALL_EXPR
> > > > > that doesn't include default arguments at all.  A consequence of
> > > > > this is that we don't reject ahead of time a call that would use
> > > > > an ill-formed dependent default argument, e.g.
> > > > > 
> > > > > template
> > > > > void g(B = T{0});
> > > > > 
> > > > > template
> > > > > void f() {
> > > > >   g();
> > > > > }
> > > > > 
> > > > > since the default argument instantiation would be the responsibility
> > > > > of convert_default_arg.
> > > > > 
> > > > > Thinking hypothetically here, if we do in the future want to include
> > > > > default
> > > > > arguments in the templated form of a CALL_EXPR,
> > > > 
> > > > We definitely do not want to; the templated form should be as close as
> > > > possible to the source.
> > > 
> > > Ah, sounds good.
> > > 
> > > > 
> > > > We might want to perform non-dependent conversions to get any errors 
> > > > (such
> > > > as
> > > > this one) before throwing away the result.  Which would be parallel to
> > > > what we
> > > > currently do in calling get_nsdmi, and would want the same behavior.
> > > 

Re: [PATCH] c++: get_nsdmi in template context [PR108116]

2022-12-23 Thread Jason Merrill via Gcc-patches

On 12/23/22 10:48, Patrick Palka wrote:

On Thu, 22 Dec 2022, Patrick Palka wrote:


On Thu, 22 Dec 2022, Jason Merrill wrote:


On 12/22/22 16:41, Patrick Palka wrote:

On Thu, 22 Dec 2022, Jason Merrill wrote:


On 12/22/22 11:31, Patrick Palka wrote:

On Wed, 21 Dec 2022, Jason Merrill wrote:


On 12/21/22 09:52, Patrick Palka wrote:

Here during ahead of time checking of C{}, we indirectly call
get_nsdmi
for C::m from finish_compound_literal, which in turn calls
break_out_target_exprs for C::m's (non-templated) initializer,
during
which we end up building a call to A::~A and checking
expr_noexcept_p
for it (from build_vec_delete_1).  But this is all done with
processing_template_decl set, so the built A::~A call is templated
(whose form r12-6897-gdec8d0e5fa00ceb2 recently changed) which
expr_noexcept_p doesn't expect and we crash.

In r10-6183-g20afdcd3698275 we fixed a similar issue by guarding a
expr_noexcept_p call with !processing_template_decl, which works
here
too.  But it seems to me since the initializer we obtain in
get_nsdmi is
always non-templated, it should be calling break_out_target_exprs
with
processing_template_decl cleared since otherwise the function might
end
up mixing templated and non-templated trees.

I'm not sure about this though, perhaps this is not the best fix
here.
Alternatively, when processing_template_decl we could make get_nsdmi
avoid calling break_out_target_exprs at all or something.
Additionally,
perhaps break_out_target_exprs should be a no-op more generally when
processing_template_decl since we shouldn't see any TARGET_EXPRs
inside
a template?


Hmm.

Any time we would call break_out_target_exprs we're dealing with
non-dependent
expressions; if we're in a template, we're building up an initializer
or a
call that we'll soon throw away, just for the purpose of checking or
type
computation.

Furthermore, as you say, the argument is always a non-template tree,
whether
in get_nsdmi or convert_default_arg.  So having
processing_template_decl
cleared would be correct.

I don't think we can get away with not calling break_out_target_exprs
at
all
in a template; if nothing else, we would lose immediate invocation
expansion.
However, we could probably skip the bot_manip tree walk, which should
avoid
the problem.

Either way we end up returning non-template trees, as we do now, and
callers
have to deal with transient CONSTRUCTORs containing such (as we do in
massage_init_elt).


Ah I see, makes sense.



Does convert_default_arg not run into the same problem, e.g. when
calling

 void g(B = {0});


In practice it seems not, because we don't call convert_default_arg
when processing_template_decl is set (verified with an assert to
that effect).  In build_over_call for example we exit early when
processing_template_decl is set, and return a templated CALL_EXPR
that doesn't include default arguments at all.  A consequence of
this is that we don't reject ahead of time a call that would use
an ill-formed dependent default argument, e.g.

 template
 void g(B = T{0});

 template
 void f() {
   g();
 }

since the default argument instantiation would be the responsibility
of convert_default_arg.

Thinking hypothetically here, if we do in the future want to include
default
arguments in the templated form of a CALL_EXPR,


We definitely do not want to; the templated form should be as close as
possible to the source.


Ah, sounds good.



We might want to perform non-dependent conversions to get any errors (such
as
this one) before throwing away the result.  Which would be parallel to
what we
currently do in calling get_nsdmi, and would want the same behavior.


*nod*




[snip]



shall we go with the original approach to clear
processing_template_decl directly from get_nsdmi?


OK, but then we should also checking_assert !processing_template_decl in
b_o_t_e.


Unfortunately we'd trigger that assert from maybe_constant_value, which
potentially calls b_o_t_e with processing_template_decl set.


maybe_constant_value could also clear processing_template_decl; entries in
cv_cache are non-templated.


Aha!  I'll try that.


How does this look?  Bootstrapped and regtested on x86_64-pc-linux-gnu.


OK.


-- >8 --

Subject: [PATCH] c++: get_nsdmi in template context [PR108116]

Here during ahead of time checking of C{}, we indirectly call get_nsdmi
for C::m from finish_compound_literal, which in turn calls
break_out_target_exprs for C::m's (non-templated) initializer, during
which we build a call to A::~A and check expr_noexcept_p for it (from
build_vec_delete_1).  But this is all done with processing_template_decl
set, so the built A::~A call is templated (whose form was recently
changed by r12-6897-gdec8d0e5fa00ceb2) which expr_noexcept_p doesn't
expect, and we crash.

This patch fixes this by clearing processing_template_decl before
the call to break_out_target_exprs from get_nsdmi.  And since it more
generally seems we shouldn't be seeing (or producing) non-templat

*PING* [PATCH] Fortran: incorrect array bounds when bound intrinsic used in decl [PR108131]

2022-12-23 Thread Harald Anlauf via Gcc-patches

On 17.12.22 at 22:21, Harald Anlauf via Gcc-patches wrote:

Dear all,

the previous fix for pr103505 introduced a regression that could lead
to wrong array bounds when LBOUND/UBOUND were used in the array spec
of a declaration.  The reason was that we tried to simplify too early
the array element spec, which appears to have interfered with the
subtle semantics of the bound intrinsics.

The solution is to undo the fix for pr103505.  It turns out that other
code changes, which were put in place to fix related ICEs, handle that
one, too, so the revert only changes the emitted error diagnostics.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

As this is a 10/11/12/13 regression, I would like to backport
as seems fit.

Thanks,
Harald






Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Richard Biener via Gcc-patches



> On 23.12.2022 at 15:48, Segher Boessenkool wrote:
> 
> Hi!
> 
>> On Fri, Dec 23, 2022 at 08:36:36PM +0800, Jiufu Guo wrote:
>> It seems there are some limitations there, e.g. 1. "subreg:DF on DI register"
>> may not work well on pseudo,
> 
> It is perfectly normal:
>  A hard register may be accessed in various modes throughout one
>  function, but each pseudo register is given a natural mode
>  and is accessed only in that mode.  When it is necessary to describe
>  an access to a pseudo register using a nonnatural mode, a @code{subreg}
>  expression is used.
> 
> and:
>  @code{subreg} expressions are used to refer to a register in a machine
>  mode other than its natural one, or to refer to one register of
>  a multi-part @code{reg} that actually refers to several registers.
> 
>  Each pseudo register has a natural mode.  If it is necessary to
>  operate on it in a different mode, the register must be
>  enclosed in a @code{subreg}.
> 
> and we even have:
>  @item hard registers
>  It is seldom necessary to wrap hard registers in @code{subreg}s; such
>  registers would normally reduce to a single @code{reg} rtx.  This use of
>  @code{subreg}s is discouraged and may not be supported in the future.
> 
>> and 2. to convert high-part:DI to SF,
>> a "shift/rotate" is needed, and then we need to "emit shift insn"
>> in cse. I may need to update this patch.
> 
> Hrm.  The machine insns to do this are just mtvsrd;xscvspdpn, but for
> converting the lowpart it is mtvsrws;xscvspdpn (this needs p9 or
> later).  We should arrive at those patterns, and we should try to not
> go via the more expensive formulations with shifts, which don't describe
> the hardware well, and which overestimate the cost of it.
> 
> None of this belongs in generic code at all imo.  At expand time it
> should be expanded to something that works and can be optimised well,
> so not anything with :BLK (which has to be put in memory, something with
> unbounded size cannot be put in registers), not anything specifically
> tailored to any cpu, something nice and regular.  Using a subreg (of a
> pseudo!) is the standard way of writing a bitcast.
> 
> So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
> integer bitcast to an IEEE SP number, and our machine description should
> make it work nicely.

There’s also a byte offset in subreg, so (subreg:sf (reg:di) 4) is a highpart
bitcast.  Note that whether targets actually support subreg operations needs
to be queried, and I’m not sure how subreg with offset validation should work
there.

Richard 

> 
> 
> Segher


[x86 PATCH] Use movss/movsd to implement V4SI/V2DI VEC_PERM.

2022-12-23 Thread Roger Sayle

This patch tweaks the x86 backend to use the movss and movsd instructions
to perform some vector permutations on integer vectors (V4SI and V2DI) in
the same way they are used for floating point vectors (V4SF and V2DF).

As a motivating example, consider:

typedef unsigned int v4si __attribute__((vector_size(16)));
typedef float v4sf __attribute__((vector_size(16)));
v4si foo(v4si x,v4si y) { return (v4si){y[0],x[1],x[2],x[3]}; }
v4sf bar(v4sf x,v4sf y) { return (v4sf){y[0],x[1],x[2],x[3]}; }

which is currently compiled with -O2 to:

foo:    movdqa  %xmm0, %xmm2
        shufps  $80, %xmm0, %xmm1
        movdqa  %xmm1, %xmm0
        shufps  $232, %xmm2, %xmm0
        ret

bar:    movss   %xmm1, %xmm0
        ret

with this patch both functions compile to the same form.
Likewise for the V2DI case:

typedef unsigned long v2di __attribute__((vector_size(16)));
typedef double v2df __attribute__((vector_size(16)));

v2di foo(v2di x,v2di y) { return (v2di){y[0],x[1]}; }
v2df bar(v2df x,v2df y) { return (v2df){y[0],x[1]}; }

which currently generates:

foo:    shufpd  $2, %xmm0, %xmm1
        movdqa  %xmm1, %xmm0
        ret

bar:    movsd   %xmm1, %xmm0
        ret
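
In RTL terms both cases are a vec_merge with mask 1: element 0 comes from y
and the remaining elements from x.  A scalar model of that selection, as a
sketch only (the function names are made up for illustration, and the mask
convention, bit i set selects element i of the first operand, is my reading
of the RTL documentation):

```c
#include <assert.h>

/* Scalar model of (vec_merge A B MASK) on N-element vectors:
   bit I of MASK set selects A[I], clear selects B[I].  */
void
vec_merge_model (unsigned int *out, const unsigned int *a,
                 const unsigned int *b, unsigned int mask, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = ((mask >> i) & 1) ? a[i] : b[i];
}

/* Self-check: mask 1 reproduces {y[0], x[1], x[2], x[3]}.  */
void
check_movss_merge (void)
{
  unsigned int x[4] = { 10, 11, 12, 13 };
  unsigned int y[4] = { 20, 21, 22, 23 };
  unsigned int r[4];

  vec_merge_model (r, y, x, 1u, 4);
  assert (r[0] == 20 && r[1] == 11 && r[2] == 12 && r[3] == 13);
}
```

Nothing in this selection depends on whether the elements are integers or
floats, which is why the same move instruction can serve both.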

There are two possible approaches to adding integer vector forms of the
sse_movss and sse2_movsd instructions.  One is to use a mode iterator
(VI4F_128 or VI8F_128) on the existing define_insn patterns, but this
requires renaming the patterns to sse_movss_ which then requires
changes to i386-builtins.def and throughout the backend to reflect the
new naming of gen_sse_movss_v4sf.  The alternate approach (taken here)
is to simply clone and specialize the existing patterns.  Uros, if you'd
prefer the first approach, I'm happy to make/test/commit those changes.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?

2022-12-23  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.cc (expand_vec_perm_movs): Also allow
V4SImode with TARGET_SSE and V2DImode with TARGET_SSE2.
* config/i386/sse.md (sse_movss_v4si): New define_insn, a V4SI
specialization of sse_movss.
(sse2_movsd_v2di): Likewise, a V2DI specialization of sse2_movsd.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse-movss-4.c: New test case.
* gcc.target/i386/sse2-movsd-3.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a45640f..ad7745a 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -18903,8 +18903,10 @@ expand_vec_perm_movs (struct expand_vec_perm_d *d)
 return false;
 
   if (!(TARGET_SSE && vmode == V4SFmode)
+  && !(TARGET_SSE && vmode == V4SImode)
   && !(TARGET_MMX_WITH_SSE && vmode == V2SFmode)
-  && !(TARGET_SSE2 && vmode == V2DFmode))
+  && !(TARGET_SSE2 && vmode == V2DFmode)
+  && !(TARGET_SSE2 && vmode == V2DImode))
 return false;
 
   /* Only the first element is changed.  */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index de632b2..f5860f2c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10513,6 +10513,21 @@
(set_attr "prefix" "orig,maybe_evex")
(set_attr "mode" "SF")])
 
+(define_insn "sse_movss_v4si"
+  [(set (match_operand:V4SI 0 "register_operand"   "=x,v")
+   (vec_merge:V4SI
+ (match_operand:V4SI 2 "register_operand" " x,v")
+ (match_operand:V4SI 1 "register_operand" " 0,v")
+ (const_int 1)))]
+  "TARGET_SSE"
+  "@
+   movss\t{%2, %0|%0, %2}
+   vmovss\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix" "orig,maybe_evex")
+   (set_attr "mode" "SF")])
+
 (define_insn "avx2_vec_dup"
   [(set (match_operand:VF1_128_256 0 "register_operand" "=v")
(vec_duplicate:VF1_128_256
@@ -13523,6 +13538,21 @@
   (const_string "orig")))
(set_attr "mode" "DF,DF,V1DF,V1DF,V1DF,V2DF,V1DF,V1DF,V1DF")])
 
+(define_insn "sse2_movsd_v2di"
+  [(set (match_operand:V2DI 0 "register_operand"   "=x,v")
+   (vec_merge:V2DI
+ (match_operand:V2DI 2 "register_operand" " x,v")
+ (match_operand:V2DI 1 "register_operand" " 0,v")
+ (const_int 1)))]
+  "TARGET_SSE2"
+  "@
+   movsd\t{%2, %0|%0, %2}
+   vmovsd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix" "orig,maybe_evex")
+   (set_attr "mode" "DF")])
+
 (define_insn "vec_dupv2df"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
(vec_duplicate:V2DF
diff --git a/gcc/testsuite/gcc.target/i386/sse-movss-4.c b/gcc/testsuite/gcc.target/i386/sse-movss-4.c
new file mode 100644
index 000..ec3019c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse-movss-4.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse" } */
+
+typedef unsigned int v4si __attribut

Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Segher Boessenkool
On Fri, Dec 23, 2022 at 05:20:09PM +0100, Richard Biener wrote:
> > On 23.12.2022 at 15:48, Segher Boessenkool wrote:
> > None of this belongs in generic code at all imo.  At expand time it
> > should be expanded to something that works and can be optimised well,
> > so not anything with :BLK (which has to be put in memory, something with
> > unbounded size cannot be put in registers), not anything specifically
> > tailored to any cpu, something nice and regular.  Using a subreg (of a
> > pseudo!) is the standard way of writing a bitcast.
> > 
> > So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
> > integer bitcast to an IEEE SP number, and our machine description should
> > make it work nicely.
> 
> There’s also a byte offset in subreg, so (subreg:sf (reg:di) 4) is a
> highpart bitcast.

There are at least six very different kinds of subreg:

0) Lvalue subregs.  Most archs have no use for it, and it can be
   expressed much more clearly and cleanly always.
1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
   this will go away.
2) Subregs of hard registers.  Do not use, there are much better ways to
   write subregs of a non-zero byte offset, and for zero offset this is
   non-canonical RTL.
3) Bitcast subregs.  In principle they go from one mode to another mode
   of the same size (but read on).
4) Paradoxical subregs.  A concept completely separate from the rest,
   different rules for everything, it has to be special cased almost
   everywhere, it would be better if it was a separate rtx_code imo.
5) Finally, normal subregs, taking a contiguous span of bits from some
   value.

Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
written as just one subreg, as you say.  And a 4) of a 5) is just
invalid afaics (and let's not talk about 0)..2) anymore :-) )

> Note whether targets actually support subreg operations needs to be queried
> and I’m not sure how subreg with offset validation should work there.

But 3) is always valid, no?  On pseudos.


Segher


[PATCH] libstdc++, configure: Fix GLIBCXX_ZONEINFO_DIR configuration macro.

2022-12-23 Thread Iain Sandoe via Gcc-patches
 This is a patch for comment on the approach - tested on x86_64-darwin21
 thoughts?
 Iain
 
 --- 8< ---

Testing on Darwin revealed that the GLIBCXX_ZONEINFO_DIR macro was not doing
quite the right thing (we ended up with ${withval} in the config.h file).

This patch proposes revising the behaviour of the configure flag thus:

--with-libstdcxx-zoneinfo-dir=
 unspecified : Set _GLIBCXX_ZONEINFO_DIR to a default suitable for $host
 yes         : Set _GLIBCXX_ZONEINFO_DIR to a default suitable for $host
 no          : Do not set _GLIBCXX_ZONEINFO_DIR
 /some/path  : Set _GLIBCXX_ZONEINFO_DIR = "/some/path"
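
The four cases can be modelled as a small decision function.  This is a
sketch in C rather than the actual m4/shell; resolve_zoneinfo_dir and its
parameters are hypothetical names, and a NULL result stands for "leave
_GLIBCXX_ZONEINFO_DIR undefined":

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* WITH_VALUE is the argument given to --with-libstdcxx-zoneinfo-dir, or
   NULL if the flag was not given.  HOST_DEFAULT is the per-host default
   directory.  Returns the directory to define, or NULL for none.  */
const char *
resolve_zoneinfo_dir (const char *with_value, const char *host_default)
{
  if (with_value == NULL || strcmp (with_value, "yes") == 0)
    return host_default;
  if (strcmp (with_value, "no") == 0)
    return NULL;
  return with_value;
}

/* Self-check covering the four rows of the table.  */
void
check_zoneinfo_cases (void)
{
  const char *dflt = "/usr/share/zoneinfo";

  assert (resolve_zoneinfo_dir (NULL, dflt) == dflt);
  assert (resolve_zoneinfo_dir ("yes", dflt) == dflt);
  assert (resolve_zoneinfo_dir ("no", dflt) == NULL);
  assert (strcmp (resolve_zoneinfo_dir ("/some/path", dflt),
                  "/some/path") == 0);
}
```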

Signed-off-by: Iain Sandoe 

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ZONEINFO_DIR): Revise configure flag
handling.
* configure: Regenerate.
* src/c++20/tzdb.cc: Add a comment that an unset _GLIBCXX_ZONEINFO_DIR
implies that the configuration specified that no directory should be
used.
---
 libstdc++-v3/acinclude.m4  | 21 ++---
 libstdc++-v3/configure | 28 +++-
 libstdc++-v3/src/c++20/tzdb.cc |  1 +
 3 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index f73946a4918..3653822aed4 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -5153,18 +5153,25 @@ AC_DEFUN([GLIBCXX_ZONEINFO_DIR], [
   AC_ARG_WITH([libstdcxx-zoneinfo-dir],
 AC_HELP_STRING([--with-libstdcxx-zoneinfo-dir],
   [the directory to search for tzdata files]),
-[zoneinfo_dir="${withval}"
- AC_DEFINE(_GLIBCXX_ZONEINFO_DIR, "${withval}",
-   [Define if a non-default location should be used for tzdata files.])
-],
-[
+[],[with_libstdcxx_zoneinfo_dir=yes])
+
+  # Pick a default when no specific path is set.
+  if test x${with_libstdcxx_zoneinfo_dir} = xyes; then
 case "$host" in
   # *-*-aix*) zoneinfo_dir="/usr/share/lib/zoneinfo" ;;
+  *-*-darwin2*) zoneinfo_dir="/usr/share/lib/zoneinfo.default" ;;
   *) zoneinfo_dir="/usr/share/zoneinfo" ;;
 esac
-])
-
+  elif test x${with_libstdcxx_zoneinfo_dir} = xno; then
+zoneinfo_dir=none
+  else
+zoneinfo_dir=${with_libstdcxx_zoneinfo_dir}
+  fi
   AC_MSG_NOTICE([zoneinfo data directory: ${zoneinfo_dir}])
+  if test x${zoneinfo_dir} != xnone; then
+AC_DEFINE_UNQUOTED(_GLIBCXX_ZONEINFO_DIR, "${zoneinfo_dir}",
+   [Define if a non-default location should be used for tzdata files.])
+  fi
 ])
 
 # Macros from the top-level gcc directory.

diff --git a/libstdc++-v3/src/c++20/tzdb.cc b/libstdc++-v3/src/c++20/tzdb.cc
index 5f5c4199f65..c4311d0902a 100644
--- a/libstdc++-v3/src/c++20/tzdb.cc
+++ b/libstdc++-v3/src/c++20/tzdb.cc
@@ -52,6 +52,7 @@
 # endif
 #endif
 
+// This is a bit odd; the configure-time setting was 'no zoneinfo directory'
 #ifndef _GLIBCXX_ZONEINFO_DIR
 # define _GLIBCXX_ZONEINFO_DIR "/usr/share/zoneinfo"
 #endif
-- 
2.37.1 (Apple Git-137.1)
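
For readers following the flag semantics proposed above, here is a minimal sketch of the same decision table as a plain C++ function. It is only a model: resolve_zoneinfo_dir and its parameters are hypothetical names, the host check is a simplified stand-in for the shell pattern *-*-darwin2*, and nothing here is part of libstdc++ or its configure machinery.

```cpp
#include <optional>
#include <string>

// Hypothetical model of the proposed --with-libstdcxx-zoneinfo-dir handling.
// `with_value` is std::nullopt when the flag was not given at all.
// The result is std::nullopt when _GLIBCXX_ZONEINFO_DIR would stay undefined.
std::optional<std::string>
resolve_zoneinfo_dir(const std::optional<std::string>& with_value,
                     const std::string& host)
{
  // An unspecified flag behaves like "yes".
  const std::string value = with_value.value_or("yes");
  if (value == "no")
    return std::nullopt;
  if (value != "yes")
    return value;                       // explicit /some/path
  // Simplified stand-in for the case pattern *-*-darwin2*.
  if (host.find("-darwin2") != std::string::npos)
    return std::string("/usr/share/lib/zoneinfo.default");
  return std::string("/usr/share/zoneinfo");
}
```

Returning an empty optional for "no" mirrors the point raised in the patch: with that setting the macro is simply never defined, so whatever tzdb.cc does under #ifndef decides the behaviour.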



Re: [x86 PATCH] Use movss/movsd to implement V4SI/V2DI VEC_PERM.

2022-12-23 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 23, 2022 at 5:46 PM Roger Sayle  wrote:
>
>
> This patch tweaks the x86 backend to use the movss and movsd instructions
> to perform some vector permutations on integer vectors (V4SI and V2DI) in
> the same way they are used for floating point vectors (V4SF and V2DF).
>
> As a motivating example, consider:
>
> typedef unsigned int v4si __attribute__((vector_size(16)));
> typedef float v4sf __attribute__((vector_size(16)));
> v4si foo(v4si x,v4si y) { return (v4si){y[0],x[1],x[2],x[3]}; }
> v4sf bar(v4sf x,v4sf y) { return (v4sf){y[0],x[1],x[2],x[3]}; }
>
> which is currently compiled with -O2 to:
>
> foo:movdqa  %xmm0, %xmm2
> shufps  $80, %xmm0, %xmm1
> movdqa  %xmm1, %xmm0
> shufps  $232, %xmm2, %xmm0
> ret
>
> bar:movss   %xmm1, %xmm0
> ret
>
> with this patch both functions compile to the same form.
> Likewise for the V2DI case:
>
> typedef unsigned long v2di __attribute__((vector_size(16)));
> typedef double v2df __attribute__((vector_size(16)));
>
> v2di foo(v2di x,v2di y) { return (v2di){y[0],x[1]}; }
> v2df bar(v2df x,v2df y) { return (v2df){y[0],x[1]}; }
>
> which currently generates:
>
> foo:shufpd  $2, %xmm0, %xmm1
> movdqa  %xmm1, %xmm0
> ret
>
> bar:movsd   %xmm1, %xmm0
> ret
>
> There are two possible approaches to adding integer vector forms of the
> sse_movss and sse2_movsd instructions.  One is to use a mode iterator
> (VI4F_128 or VI8F_128) on the existing define_insn patterns, but this
> requires renaming the patterns to sse_movss_<mode> which then requires
> changes to i386-builtins.def and throughout the backend to reflect the
> new naming of gen_sse_movss_v4sf.  The alternate approach (taken here)
> is to simply clone and specialize the existing patterns.  Uros, if you'd
> prefer the first approach, I'm happy to make/test/commit those changes.

I would really prefer the variant with VI4F_128/VI8F_128; these two
iterators were introduced specifically for this case (see e.g.
sse_shufps_<mode> and sse2_shufpd_<mode>).  The internal name of the
pattern is fairly irrelevant, and a trivial search and replace
operation can replace the grand total of 6 occurrences.

Also, changing sse2_movsd to use the VI8F_128 mode iterator would enable
more alternatives besides movsd, so we give the combine pass some more
opportunities with memory operands.

So, the patch with those two iterators is pre-approved.

Uros.

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
> 2022-12-23  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (expand_vec_perm_movs): Also allow
> V4SImode with TARGET_SSE and V2DImode with TARGET_SSE2.
> * config/i386/sse.md (sse_movss_v4si): New define_insn, a V4SI
> specialization of sse_movss.
> (sse2_movsd_v2di): Likewise, a V2DI specialization of sse2_movsd.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/sse-movss-4.c: New test case.
> * gcc.target/i386/sse2-movsd-3.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>
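
As a source-level illustration of the permutation discussed in this thread, the sketch below spells out which elements the quoted foo functions select. It assumes a compiler with the GNU vector extensions (GCC or Clang); merge_low_v4si and merge_low_v2di are names made up for this example, and the code says nothing about which instructions a given compiler version emits.

```cpp
// GNU vector extension types, modelled on the quoted test case.
typedef unsigned int v4si __attribute__((vector_size(16)));
typedef unsigned long long v2di __attribute__((vector_size(16)));

// Element 0 comes from y, elements 1..3 from x: the selection that movss
// performs for V4SF and that the patch wants to reuse for V4SI.
v4si merge_low_v4si(v4si x, v4si y)
{
  v4si r = { y[0], x[1], x[2], x[3] };
  return r;
}

// The two-element analogue: element 0 from y, element 1 from x (movsd).
v2di merge_low_v2di(v2di x, v2di y)
{
  v2di r = { y[0], x[1] };
  return r;
}
```

Compiling such functions with -O2 -S before and after the change is one way to see whether the integer variants end up with the same single-instruction form as the floating-point ones.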


Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-23 Thread Jose E. Marchesi via Gcc-patches


> On Fri, 23 Dec 2022, Qing Zhao wrote:
>> >> I am a little confused, you mean pre-RA scheduler does not look at the
>> >> data flow information at all when scheduling insns across calls currently?
>> > 
>> > I think it does not inspect liveness info, and may extend lifetime of a
>> > pseudo across a call, transforming
>> > 
>> >  call foo
>> >  reg = 1
>> >  ...
>> >  use reg
>> > 
>> > to
>> > 
>> >  reg = 1
>> >  call foo
>> >  ...
>> >  use reg
>> > 
>> > but this is undesirable, because now register allocation cannot select a
>> > call-clobbered register for 'reg'.
>> Okay, thanks for the explanation.
>> 
>> Then, why not just check the liveness info instead of inhibiting all 
>> scheduling across calls?
>
> Because there's almost nothing to gain from pre-RA scheduling across calls in
> the first place. Remember that the call transfers control flow elsewhere and
> therefore the scheduler has no idea about the pipeline state after the call
> and after the return, so modeling-wise it's a gamble.
>
> For instructions that lie on a critical path such scheduling can be useful
> when it substantially reduces the difference between the priority of the call
> and nearby instructions of the critical path. But we don't track which
> instructions are on critical path(s) and which are not.
>
> (scheduling across calls in sched2 is somewhat dubious as well, but
> it doesn't risk register pressure issues, and on VLIW CPUs it at least
> can result in better VLIW packing)

Does sched2 actually schedule across calls?  All the comments in the
source code stress the fact that the second scheduler pass (after
register allocation) works in regions that correspond to basic blocks:
"(after reload, each region is of one block)".


Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-23 Thread Alexander Monakov via Gcc-patches


On Fri, 23 Dec 2022, Jose E. Marchesi wrote:

> > (scheduling across calls in sched2 is somewhat dubious as well, but
> > it doesn't risk register pressure issues, and on VLIW CPUs it at least
> > can result in better VLIW packing)
> 
> Does sched2 actually schedule across calls?  All the comments in the
> source code stress the fact that the second scheduler pass (after
> register allocation) works in regions that correspond to basic blocks:
> "(after reload, each region is of one block)".

A call instruction does not end a basic block.

(also, with -fsched2-use-superblocks sched2 works on regions like sched1)

Alexander


Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-23 Thread Jose E. Marchesi via Gcc-patches


> On Fri, 23 Dec 2022, Jose E. Marchesi wrote:
>
>> > (scheduling across calls in sched2 is somewhat dubious as well, but
>> > it doesn't risk register pressure issues, and on VLIW CPUs it at least
>> > can result in better VLIW packing)
>> 
>> Does sched2 actually schedule across calls?  All the comments in the
>> source code stress the fact that the second scheduler pass (after
>> register allocation) works in regions that correspond to basic blocks:
>> "(after reload, each region is of one block)".
>
> A call instruction does not end a basic block.

Ok, so my original assumption in the patch explaining why I disabled
sched1 but not sched2 was not correct.  Good to know.

> (also, with -fsched2-use-superblocks sched2 works on regions like sched1)
>
> Alexander


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Richard Biener via Gcc-patches



> On 23.12.2022 at 17:55, Segher Boessenkool wrote:
> 
> On Fri, Dec 23, 2022 at 05:20:09PM +0100, Richard Biener wrote:
>>> On 23.12.2022 at 15:48, Segher Boessenkool wrote:
>>> None of this belongs in generic code at all imo.  At expand time it
>>> should be expanded to something that works and can be optimised well,
>>> so not anything with :BLK (which has to be put in memory, something with
>>> unbounded size cannot be put in registers), not anything specifically
>>> tailored to any cpu, something nice and regular.  Using a subreg (of a
>>> pseudo!) is the standard way of writing a bitcast.
>>> 
>>> So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
>>> integer bitcast to an IEEE SP number, and our machine description should
>>> make it work nicely.
>> 
>> There's also a byte offset in subreg, so (subreg:SF (reg:DI) 4) is a
>> highpart bitcast.
> 
> There are at least six very different kinds of subreg:
> 
> 0) Lvalue subregs.  Most archs have no use for it, and it can be
>   expressed much more clearly and cleanly always.
> 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
>   this will go away.
> 2) Subregs of hard registers.  Do not use, there are much better ways to
>   write subregs of a non-zero byte offset, and for zero offset this is
>   non-canonical RTL.
> 3) Bitcast subregs.  In principle they go from one mode to another mode
>   of the same size (but read on).
> 4) Paradoxical subregs.  A concept completely separate from the rest,
>   different rules for everything, it has to be special cased almost
>   everywhere, it would be better if it was a separate rtx_code imo.
> 5) Finally, normal subregs, taking a contiguous span of bits from some
>   value.
> 
> Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
> written as just one subreg, as you say.  And a 4) of a 5) is just
> invalid afaics (and let's not talk about 0)..2) anymore :-) )
> 
>> Note whether targets actually support subreg operations needs to be queried 
>> and I’m not sure how subreg with offset validation should work there.
> 
> But 3) is always valid, no?  On pseudos.

Yes, but it will eventually result in a spill/reload which is undesirable when 
we created this from CSE from a load.  So I think for CSE we do want to know 
whether a spill will definitely not occur.

Richard 
> 
> Segher
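
For readers less familiar with RTL, the bitcast that (subreg:SF (reg:SI) 0) expresses has a direct source-level analogue. The sketch below is only that analogue, not RTL: bits_to_float and high_half_to_float are hypothetical helper names, an IEEE 754 single-precision float is assumed, and the high half is taken with a shift so the example does not depend on which byte offset selects it on a given target.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret the 32 bits of an integer as a single-precision value.
// std::memcpy is the well-defined way to do this before C++20's std::bit_cast.
float bits_to_float(std::uint32_t bits)
{
  static_assert(sizeof(float) == sizeof(std::uint32_t), "need a 32-bit float");
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// "Highpart bitcast": take the upper 32 bits of a 64-bit value, then
// reinterpret them.  The shift keeps this independent of byte order.
float high_half_to_float(std::uint64_t value)
{
  return bits_to_float(static_cast<std::uint32_t>(value >> 32));
}
```

Whether the value can stay in registers for such a reinterpretation is the target-dependent question debated above; the source-level form does not answer it.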


Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-23 Thread Qing Zhao via Gcc-patches
Then sched2 can still move insns across calls?
So does sched2 have the same issue of incorrectly moving an insn across a
call which has unknown control flow?

Qing

> On Dec 23, 2022, at 12:31 PM, Alexander Monakov  wrote:
> 
> 
> On Fri, 23 Dec 2022, Jose E. Marchesi wrote:
> 
>>> (scheduling across calls in sched2 is somewhat dubious as well, but
>>> it doesn't risk register pressure issues, and on VLIW CPUs it at least
>>> can result in better VLIW packing)
>> 
>> Does sched2 actually schedule across calls?  All the comments in the
>> source code stress the fact that the second scheduler pass (after
>> register allocation) works in regions that correspond to basic blocks:
>> "(after reload, each region is of one block)".
> 
> A call instruction does not end a basic block.
> 
> (also, with -fsched2-use-superblocks sched2 works on regions like sched1)
> 
> Alexander



Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-23 Thread Alexander Monakov via Gcc-patches



On Fri, 23 Dec 2022, Qing Zhao wrote:

> Then sched2 can still move insns across calls?
> So does sched2 have the same issue of incorrectly moving an insn across a
> call which has unknown control flow?

I think problems are unlikely because register allocator assigns pseudos that
cross setjmp to memory.

I think you hit the problem with sched1 because most testing is done on x86 and
sched1 is not enabled there, otherwise the problem would have been noticed much
earlier.

Alexander


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Segher Boessenkool
On Fri, Dec 23, 2022 at 08:13:48PM +0100, Richard Biener wrote:
> > On 23.12.2022 at 17:55, Segher Boessenkool wrote:
> > There are at least six very different kinds of subreg:
> > 
> > 0) Lvalue subregs.  Most archs have no use for it, and it can be
> >   expressed much more clearly and cleanly always.
> > 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
> >   this will go away.
> > 2) Subregs of hard registers.  Do not use, there are much better ways to
> >   write subregs of a non-zero byte offset, and for zero offset this is
> >   non-canonical RTL.
> > 3) Bitcast subregs.  In principle they go from one mode to another mode
> >   of the same size (but read on).
> > 4) Paradoxical subregs.  A concept completely separate from the rest,
> >   different rules for everything, it has to be special cased almost
> >   everywhere, it would be better if it was a separate rtx_code imo.
> > 5) Finally, normal subregs, taking a contiguous span of bits from some
> >   value.
> > 
> > Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
> > written as just one subreg, as you say.  And a 4) of a 5) is just
> > invalid afaics (and let's not talk about 0)..2) anymore :-) )
> > 
> >> Note whether targets actually support subreg operations needs to be 
> >> queried and I’m not sure how subreg with offset validation should work 
> >> there.
> > 
> > But 3) is always valid, no?  On pseudos.
> 
> Yes, but it will eventually result in a spill/reload which is undesirable 
> when we created this from CSE from a load.  So I think for CSE we do want to 
> know whether a spill will definitely not occur.

Does it cause reloads though?  On any sane backend?  If no movsf pattern
allows integer registers, can things work at all?

Anyway, the normal way to test if some RTL is valid is to just generate
it (using validate_change) and then do apply_change_group, which then
cancels the changes if they do not work.  CSE already does some of this.

(I am doubtful doing any of this in CSE is a good idea fwiw).


Segher
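
The "make the change, check it, cancel if it does not work" idea mentioned above can be sketched independently of GCC. The following is a toy model only: ChangeGroup, queue and apply are hypothetical names, plain ints stand in for RTL, and the validity check is an arbitrary caller-supplied predicate rather than instruction recognition. It does not reproduce the signatures of validate_change or apply_change_group.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Toy model of a group of tentative changes that is either kept as a whole
// or rolled back as a whole.
class ChangeGroup {
public:
  // Perform the change right away, remembering the old value for rollback.
  void queue(int* loc, int new_value)
  {
    undo_.emplace_back(loc, *loc);
    *loc = new_value;
  }

  // Keep all queued changes if `valid` accepts the result; otherwise restore
  // every location, newest first.  Returns whether the changes were kept.
  bool apply(const std::function<bool()>& valid)
  {
    const bool ok = valid();
    if (!ok)
      for (auto it = undo_.rbegin(); it != undo_.rend(); ++it)
        *it->first = it->second;
    undo_.clear();
    return ok;
  }

private:
  std::vector<std::pair<int*, int>> undo_;
};
```

The design point is that the caller never has to predict validity up front: it tries the rewrite and lets the check decide, which is the approach suggested above in place of a separate query.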


Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-23 Thread Qing Zhao via Gcc-patches



> On Dec 23, 2022, at 2:36 PM, Alexander Monakov  wrote:
> 
> 
> 
> On Fri, 23 Dec 2022, Qing Zhao wrote:
> 
>> Then sched2 can still move insns across calls?
>> So does sched2 have the same issue of incorrectly moving an insn across a
>> call which has unknown control flow?
> 
> I think problems are unlikely because register allocator assigns pseudos that
> cross setjmp to memory.
> 
> I think you hit the problem with sched1 because most testing is done on x86
> and sched1 is not enabled there, otherwise the problem would have been
> noticed much earlier.


Yes, the problem with this bug is in sched1 on aarch64.  On x86 the same issue
will be exposed when sched1 is explicitly enabled with -fschedule-insns.

BTW, why is sched1 not enabled on x86 by default?

Another question:  As discussed in the original bug, PR57067:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57067
the root cause of this issue is that the abnormal control flow edges (from
setjmp/longjmp) cannot be represented correctly at the RTL stage.  Shall we fix
this root cause instead?

Qing


> Alexander



Re: [PATCH v2] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-12-23 Thread Segher Boessenkool
Hi!

On Wed, Oct 12, 2022 at 04:12:21PM +0800, Kewen.Lin wrote:
> PR106680 shows that -m32 -mpowerpc64 is different from
> -mpowerpc64 -m32; this is determined by the way we
> handle option powerpc64 in rs6000_handle_option.
> 
> Segher pointed out this difference should be taken as
> a bug and we should ensure that option powerpc64 is
> independent of -m32/-m64.  So this patch removes the
> handling in rs6000_handle_option and adds the necessary
> support in rs6000_option_override_internal instead.

Sorry for the late review.

> +  /* Don't expect powerpc64 enabled on those OSes with OS_MISSING_POWERPC64,
> + since they don't support saving the high part of 64-bit registers on
> + context switch.  If the user explicitly specifies it, we won't interfere
> + with the user's specification.  */

It depends on the OS, and what you call "context switch".  For example
on Linux the context switches done by the kernel are fine, only things
done by setjmp/longjmp and getcontext/setcontext are not.  So just be a
bit more vague here?  "Since they do not save and restore the high half
of the GPRs correctly in all cases", something like that?

Okay for trunk like that.  Thanks!


Segher


Re: [PATCH] Fortran: incorrect array bounds when bound intrinsic used in decl [PR108131]

2022-12-23 Thread Jerry D via Gcc-patches

On 12/17/22 1:21 PM, Harald Anlauf via Fortran wrote:

> Dear all,
>
> the previous fix for pr103505 introduced a regression that could lead
> to wrong array bounds when LBOUND/UBOUND were used in the array spec
> of a declaration.  The reason was that we tried to simplify too early
> the array element spec, which appears to have interfered with the
> subtle semantics of the bound intrinsics.
>
> The solution is to undo the fix for pr103505.  It turns out that
> there are other code changes in place that were put in place to
> fix related ICEs, and which handle that one, too, and only lead
> to a change of the emitted error diagnostics.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Yes, OK for mainline.

My thought is that this is the kind of bug that can go unseen with
incorrect array bounds so is a good candidate to backport.  At least 12,
10 and 11 if you have time and it is applicable.

> As this is a 10/11/12/13 regression, I would like to backport
> as seems fit.
>
> Thanks,
> Harald





Re: nvptx: '-mframe-malloc-threshold', '-Wframe-malloc-threshold' (was: Handling of large stack objects in GPU code generation -- maybe transform into heap allocation?)

2022-12-23 Thread Jerry D via Gcc-patches

On 12/23/22 6:08 AM, Thomas Schwinge wrote:

Hi!

On 2022-11-11T15:35:44+0100, Richard Biener via Fortran wrote:

On Fri, Nov 11, 2022 at 3:13 PM Thomas Schwinge  wrote:

For example, for Fortran code like:

 write (*,*) "Hello world"

..., 'gfortran' creates:

 struct __st_parameter_dt dt_parm.0;

 try
   {
 dt_parm.0.common.filename = &"source-gcc/libgomp/testsuite/libgomp.oacc-fortran/print-1_.f90"[1]{lb: 1 sz: 1};
 dt_parm.0.common.line = 29;
 dt_parm.0.common.flags = 128;
 dt_parm.0.common.unit = 6;
 _gfortran_st_write (&dt_parm.0);
 _gfortran_transfer_character_write (&dt_parm.0, &"Hello world"[1]{lb: 1 sz: 1}, 11);
 _gfortran_st_write_done (&dt_parm.0);
   }
 finally
   {
 dt_parm.0 = {CLOBBER(eol)};
   }

The issue: the stack object 'dt_parm.0' is a half-KiB in size (yes,
really! -- there's a lot of state in Fortran I/O apparently).  That's a
problem for GPU execution -- here: OpenACC/nvptx -- where typically you
have small stacks.  (For example, GCC/OpenACC/nvptx: 1 KiB per thread;
GCC/OpenMP/nvptx is an exception, because of its use of '-msoft-stack'
"Use custom stacks instead of local memory for automatic storage".)

Now, the Nvidia Driver tries to accommodate such largish stack usage,
and dynamically increases the per-thread stack as necessary (thereby
potentially reducing parallelism) -- if it manages to understand the call
graph.  In case of libgfortran I/O, it evidently doesn't.  Not being able
to disprove existence of recursion is the common problem, as I've read.
At run time, via 'CU_JIT_INFO_LOG_BUFFER' you then get, for example:

 warning : Stack size for entry function 'MAIN__$_omp_fn$0' cannot be statically determined

That's still not an actual problem: if the GPU kernel's stack usage still
fits into 1 KiB.  Very often it does, but if, as happens in libgfortran
I/O handling, there is another such 'dt_parm' put onto the stack, the
stack then overflows; device-side SIGSEGV.

(There is, by the way, some similar analysis by Tom de Vries in
 "[nvptx, openacc, openmp, testsuite]
Recursive tests may fail due to thread stack limit".)

Of course, you shouldn't really be doing I/O in GPU kernels, but people
do like their occasional "'printf' debugging", so we ought to make that
work (... without pessimizing any "normal" code).

I assume that generally reducing the size of 'dt_parm' etc. is out of
scope.


There are so many wiggles and turns and corner cases and other
nightmares in I/O that I would advise against trying to reduce the dt_parm.
It could probably be done.

For debugging GPU code, would it not be better to have a way to signal back
to a main thread to do the print from there, like some sort of callback
in the user's code under test?

Putting this another way: recommend that users use a different debugging
method than embedding print statements, rather than doing a ton of work
to enable something that is not really a legitimate use case.


FWIW,

Jerry


Re: [PATCH] libstdc++, configure: Fix GLIBCXX_ZONEINFO_DIR configuration macro.

2022-12-23 Thread Jonathan Wakely via Gcc-patches
On Fri, 23 Dec 2022, 17:06 Iain Sandoe via Libstdc++, 
wrote:

>  This is a patch for comment on the approach - tested on x86_64-darwin21
>  thoughts?
>  Iain
>
>  --- 8< ---
>
> Testing on Darwin revealed that the GLIBCXX_ZONEINFO_DIR was not doing
> quite the right thing (we ended up with ${withval} in the config.h file).
>
> This patch proposes revising the behaviour of the configure flag thus:
>
> --with-libstdcxx-zoneinfo-dir=
>  unspecified : Set _GLIBCXX_ZONEINFO_DIR to a default suitable for $host
>  yes : Set _GLIBCXX_ZONEINFO_DIR to a default suitable for $host
>  no  : Do not set _GLIBCXX_ZONEINFO_DIR
>

What's the use case for "no"? Enforcing a UTC-only tzdb that doesn't even
try to load the tzdata? If that's desirable, we could #ifdef huge parts of
src/c++20/tzdb.cc to make the library smaller. That might make sense for a
toolchain for embedded targets where it's known there's no need for time
zone conversions.



>  /some/path  : set _GLIBCXX_ZONEINFO_DIR = "/some/path"
>
> Signed-off-by: Iain Sandoe 
>
> libstdc++-v3/ChangeLog:
>
> * acinclude.m4 (GLIBCXX_ZONEINFO_DIR): Revise configure flag
> handling.
> * configure: Regenerate.
> * src/c++20/tzdb.cc: Add a comment that an unset _GLIBCXX_ZONEINFO_DIR
> implies that the configuration specified that no directory should be
> used.
> ---
>  libstdc++-v3/acinclude.m4  | 21 ++---
>  libstdc++-v3/configure | 28 +++-
>  libstdc++-v3/src/c++20/tzdb.cc |  1 +
>  3 files changed, 34 insertions(+), 16 deletions(-)
>
> diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> index f73946a4918..3653822aed4 100644
> --- a/libstdc++-v3/acinclude.m4
> +++ b/libstdc++-v3/acinclude.m4
> @@ -5153,18 +5153,25 @@ AC_DEFUN([GLIBCXX_ZONEINFO_DIR], [
>AC_ARG_WITH([libstdcxx-zoneinfo-dir],
>  AC_HELP_STRING([--with-libstdcxx-zoneinfo-dir],
>[the directory to search for tzdata files]),
> -[zoneinfo_dir="${withval}"
> - AC_DEFINE(_GLIBCXX_ZONEINFO_DIR, "${withval}",
> -   [Define if a non-default location should be used for tzdata files.])
> -],
> -[
> +[],[with_libstdcxx_zoneinfo_dir=yes])
> +
> +  # Pick a default when no specific path is set.
> +  if test x${with_libstdcxx_zoneinfo_dir} = xyes; then
>  case "$host" in
># *-*-aix*) zoneinfo_dir="/usr/share/lib/zoneinfo" ;;
> +  *-*-darwin2*) zoneinfo_dir="/usr/share/lib/zoneinfo.default" ;;
>*) zoneinfo_dir="/usr/share/zoneinfo" ;;
>  esac
> -])
> -
> +  elif test x${with_libstdcxx_zoneinfo_dir} = xno; then
> +zoneinfo_dir=none
> +  else
> +zoneinfo_dir=${with_libstdcxx_zoneinfo_dir}
> +  fi
>AC_MSG_NOTICE([zoneinfo data directory: ${zoneinfo_dir}])
> +  if test x${zoneinfo_dir} != xnone; then
> +AC_DEFINE_UNQUOTED(_GLIBCXX_ZONEINFO_DIR, "${zoneinfo_dir}",
> +   [Define if a non-default location should be used for tzdata files.])
> +  fi
>  ])
>
>  # Macros from the top-level gcc directory.
>
> diff --git a/libstdc++-v3/src/c++20/tzdb.cc b/libstdc++-v3/src/c++20/tzdb.cc
> index 5f5c4199f65..c4311d0902a 100644
> --- a/libstdc++-v3/src/c++20/tzdb.cc
> +++ b/libstdc++-v3/src/c++20/tzdb.cc
> @@ -52,6 +52,7 @@
>  # endif
>  #endif
>
> +// This is a bit odd; the configure-time setting was 'no zoneinfo directory'
>  #ifndef _GLIBCXX_ZONEINFO_DIR
>  # define _GLIBCXX_ZONEINFO_DIR "/usr/share/zoneinfo"
>  #endif
> --
> 2.37.1 (Apple Git-137.1)
>
>


Re: Adding a new thread model to GCC

2022-12-23 Thread Jonathan Yong via Gcc-patches

On 12/22/22 12:28, i.nix...@autistici.org wrote:

On 2022-12-22 12:21, Jonathan Yong wrote:

hello,


On 12/16/22 19:20, Eric Botcazou wrote:

The libgcc parts look reasonable to me, but I can't approve them.
Maybe Jonathan Yong can approve those parts as mingw-w64 target
maintainer, or maybe a libgcc approver can do so.


OK.


The libstdc++ parts are OK for trunk. IIUC they could go in
separately, they just wouldn't be very much use without the libgcc
changes.


Sure thing.



Ping, need help to commit it?


yes, it would be great if we can merge the patch into gcc-13!

I've tested it on gcc-12-branch and gcc-master for i686/x86_64 windows, 
with msvcrt and ucrt runtime - works as it should!


Eric ^^^



best!


Done, pushed to master branch. Thanks Eric.




Re: Ping^2: [PATCH] d: Update __FreeBSD_version values [PR107469]

2022-12-23 Thread Gerald Pfeifer
Hi Ian (and Andreas),

On Wed, 14 Dec 2022, Lorenzo Salvadore wrote:
> Ping https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605685.html
> 
> I would like to remind that Gerald Pfeifer already volunteered to commit 
> this patch when it is approved. However the patch has not been approved 
> yet.

I am tempted to commit this under our obvious rule (and this has been part 
of the FreeBSD ports for weeks now). 

It still would be preferable to get your review (and approval ideally ;-), 
though. Would you mind having a look?

(Andreas, any take as GCC's FreeBSD maintainer?)

Thanks,
Gerald

>> --- Original Message ---
>> On Friday, November 11th, 2022 at 12:07 AM, Lorenzo Salvadore 
>> develo...@lorenzosalvadore.it wrote:
>> 
>>> Update __FreeBSD_version values for the latest FreeBSD supported
>>> versions. In particular, add __FreeBSD_version for FreeBSD 14, which is
>>> necessary to compile libphobos successfully on FreeBSD 14.
>>> 
>>> The patch has already been applied successfully in the official FreeBSD
>>> ports tree for the ports lang/gcc11 and lang/gcc11-devel. Please see the
>>> following commits:
>>> 
>>> https://cgit.freebsd.org/ports/commit/?id=f61fb49b2e76fd4f7a5b7a11510b5109206c19f2
>>> https://cgit.freebsd.org/ports/commit/?id=57936dba89ea208e5dbc1bd2d7fda3d29a1838b3
>>> 
>>> libphobos/ChangeLog:
>>> 
>>> 2022-11-10 Lorenzo Salvadore develo...@lorenzosalvadore.it
>>> 
>>> PR d/107469.
>>> * libdruntime/core/sys/freebsd/config.d: Update __FreeBSD_version.
>>> 
>>> ---
>>> libphobos/libdruntime/core/sys/freebsd/config.d | 5 +++--
>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/libphobos/libdruntime/core/sys/freebsd/config.d 
>>> b/libphobos/libdruntime/core/sys/freebsd/config.d
>>> index 5e3129e2422..9d502e52e32 100644
>>> --- a/libphobos/libdruntime/core/sys/freebsd/config.d
>>> +++ b/libphobos/libdruntime/core/sys/freebsd/config.d
>>> @@ -14,8 +14,9 @@ public import core.sys.posix.config;
>>> // NOTE: When adding newer versions of FreeBSD, verify all current versioned
>>> // bindings are still compatible with the release.
>>> 
>>> - version (FreeBSD_13) enum __FreeBSD_version = 1300000;
>>> -else version (FreeBSD_12) enum __FreeBSD_version = 1202000;
>>> + version (FreeBSD_14) enum __FreeBSD_version = 1400000;
>>> +else version (FreeBSD_13) enum __FreeBSD_version = 1301000;
>>> +else version (FreeBSD_12) enum __FreeBSD_version = 1203000;
>>> else version (FreeBSD_11) enum __FreeBSD_version = 1104000;
>>> else version (FreeBSD_10) enum __FreeBSD_version = 1004000;
>>> else version (FreeBSD_9) enum __FreeBSD_version = 903000;
>>> --
>>> 2.38.0
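
The values in the quoted patch follow the usual __FreeBSD_version layout of major * 100000 + minor * 1000 plus a small counter. A minimal sketch of decoding such a value is shown below; FreeBSDRelease and decode_freebsd_version are hypothetical names used only for this illustration, and the low three digits are ignored.

```cpp
// Release number recovered from a __FreeBSD_version-style value.
struct FreeBSDRelease {
  int major_rel;
  int minor_rel;
};

// Assumes the layout major * 100000 + minor * 1000 + counter,
// e.g. 1301000 for 13.1.
constexpr FreeBSDRelease decode_freebsd_version(int v)
{
  return FreeBSDRelease{ v / 100000, (v / 1000) % 100 };
}
```

Read this way, the entries in config.d correspond to releases 14.0, 13.1, 12.3, 11.4, 10.4 and 9.3.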


[r13-4873 Regression] FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times \tmovd\t 3 on Linux/x86_64

2022-12-23 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

0b2c1369d035e92847cca81fd9f7b4e9ab9da710 is the first bad commit
commit 0b2c1369d035e92847cca81fd9f7b4e9ab9da710
Author: Roger Sayle 
Date:   Fri Dec 23 09:56:30 2022 +0000

PR target/107548: Handle vec_select in STV on x86.

caused

FAIL: gcc.target/i386/pr107548-1.c scan-assembler-times \tmovd\t 3

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-4873/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr107548-1.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)


[PATCH] RISC-V: Fix ICE of visiting non-existing block in CFG.

2022-12-23 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch fixes an issue of visiting a non-existent block in the CFG.
Since the block indices of the CFG in GCC are not always contiguous, we can
potentially visit a gap block which does not exist in the current CFG.

This patch avoids visiting non-existent blocks in the CFG.

I noticed this issue in my internal regression run of the current testsuite
when I changed the x86 server machine.  This patch fixes it:
17:27:15  job(build_and_test_rv32): Increased FAIL List:
17:27:15  job(build_and_test_rv32): FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-46.c -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: Segmentation fault)

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc 
(pass_vsetvl::compute_global_backward_infos): Change to visit CFG.
(pass_vsetvl::prune_expressions): Ditto.

---
 gcc/config/riscv/riscv-vsetvl.cc | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index a55b5a1c394..0d66765e09c 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1962,12 +1962,10 @@ pass_vsetvl::compute_global_backward_infos (void)
   if (dump_file)
 {
   fprintf (dump_file, "\n\nDirty blocks list: ");
-  for (size_t i = 0; i < m_vector_manager->vector_block_infos.length ();
-  i++)
-   {
- if (m_vector_manager->vector_block_infos[i].reaching_out.dirty_p ())
-   fprintf (dump_file, "%ld ", i);
-   }
+  for (const bb_info *bb : crtl->ssa->bbs ())
+   if (m_vector_manager->vector_block_infos[bb->index ()]
+ .reaching_out.dirty_p ())
+ fprintf (dump_file, "%d ", bb->index ());
   fprintf (dump_file, "\n\n");
 }
 }
@@ -1976,15 +1974,16 @@ pass_vsetvl::compute_global_backward_infos (void)
 void
 pass_vsetvl::prune_expressions (void)
 {
-  for (size_t i = 0; i < m_vector_manager->vector_block_infos.length (); i++)
+  for (const bb_info *bb : crtl->ssa->bbs ())
 {
-  if (m_vector_manager->vector_block_infos[i].local_dem.valid_or_dirty_p ())
+  if (m_vector_manager->vector_block_infos[bb->index ()]
+   .local_dem.valid_or_dirty_p ())
m_vector_manager->create_expr (
- m_vector_manager->vector_block_infos[i].local_dem);
-  if (m_vector_manager->vector_block_infos[i]
+ m_vector_manager->vector_block_infos[bb->index ()].local_dem);
+  if (m_vector_manager->vector_block_infos[bb->index ()]
.reaching_out.valid_or_dirty_p ())
m_vector_manager->create_expr (
- m_vector_manager->vector_block_infos[i].reaching_out);
+ m_vector_manager->vector_block_infos[bb->index ()].reaching_out);
 }
 
   if (dump_file)
-- 
2.36.3



Re: Adding a new thread model to GCC

2022-12-23 Thread NightStrike via Gcc-patches
On Fri, Dec 23, 2022 at 7:00 PM Jonathan Yong via Gcc-patches
 wrote:
>
> On 12/22/22 12:28, i.nix...@autistici.org wrote:
> > On 2022-12-22 12:21, Jonathan Yong wrote:
> >
> > hello,
> >
> >> On 12/16/22 19:20, Eric Botcazou wrote:
> > >>>> The libgcc parts look reasonable to me, but I can't approve them.
> > >>>> Maybe Jonathan Yong can approve those parts as mingw-w64 target
> > >>>> maintainer, or maybe a libgcc approver can do so.
> > >>>
> > >>> OK.
> > >>>
> > >>>> The libstdc++ parts are OK for trunk. IIUC they could go in
> > >>>> separately, they just wouldn't be very much use without the libgcc
> > >>>> changes.
> >>>
> >>> Sure thing.
> >>>
> >>
> >> Ping, need help to commit it?
> >
> > > yes, it would be great if we can merge the patch into gcc-13!
> >
> > I've tested it on gcc-12-branch and gcc-master for i686/x86_64 windows,
> > with msvcrt and ucrt runtime - works as it should!
> >
> > Eric ^^^
> >
> >
> >
> > best!
>
> Done, pushed to master branch. Thanks Eric.


I think this might have broken fortran.  I'm assuming because the
backtrace includes gthr.h, and I just did a git pull:

In file included from /tmp/rtmingw/mingw/include/windows.h:71,
 from ../libgcc/gthr-default.h:606,
 from ../../../libgfortran/../libgcc/gthr.h:148,
 from ../../../libgfortran/io/io.h:33,
 from ../../../libgfortran/runtime/error.c:27:
../../../libgfortran/io/io.h:298:24: error: expected identifier before numeric constant
  298 | { CC_LIST, CC_FORTRAN, CC_NONE,
  |^~~
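
The thread has not diagnosed this error at this point. One plausible reading, stated here as an assumption: `windows.h` pulls in `wingdi.h`, which defines `CC_NONE` as a macro expanding to `0`, so the enumerator at column 24 turns into a numeric constant. A minimal sketch of that collision and one possible workaround follows. The `#define` stands in for the Windows header, and `cc_kind` is a hypothetical enum name, not the one used in io.h.

```cpp
// Stand-in for the macro assumed to arrive via windows.h.
#define CC_NONE 0

// Without this #undef, the enumerator below would expand to the literal 0
// and be rejected with "expected identifier before numeric constant".
#undef CC_NONE

// Hypothetical enum mirroring the enumerators visible in the error output.
enum cc_kind { CC_LIST, CC_FORTRAN, CC_NONE };

// Returns the enumerator's value to show it is an ordinary identifier again.
int cc_none_value() { return CC_NONE; }
```

Other approaches, such as keeping `windows.h` out of the headers that libgfortran includes, would avoid the clash as well. Which one fits GCC best is not settled by this message.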


Re: Adding a new thread model to GCC

2022-12-23 Thread i.nixman--- via Gcc-patches

On 2022-12-23 23:59, Jonathan Yong wrote:


> Done, pushed to master branch. Thanks Eric.



thank you Jonathan!


Re: Adding a new thread model to GCC

2022-12-23 Thread i.nixman--- via Gcc-patches

On 2022-12-24 05:58, NightStrike wrote:


> I think this might have broken fortran.  I'm assuming because the
> backtrace includes gthr.h, and I just did a git pull:
>
> In file included from /tmp/rtmingw/mingw/include/windows.h:71,
>  from ../libgcc/gthr-default.h:606,
>  from ../../../libgfortran/../libgcc/gthr.h:148,
>  from ../../../libgfortran/io/io.h:33,
>  from ../../../libgfortran/runtime/error.c:27:
> ../../../libgfortran/io/io.h:298:24: error: expected identifier before numeric constant
>   298 | { CC_LIST, CC_FORTRAN, CC_NONE,
>   |^~~



hmm...

I don't remember if I specified `fortran` in `--enable-languages` in my
test builds...

will try to build again now...