from:"Maxim Kuvyrkov"

Re: [PATCH v2] ARM: Block predication on atomics [PR111235]

2023-10-03 Thread Maxim Kuvyrkov

> On Oct 1, 2023, at 00:36, Ramana Radhakrishnan  
> wrote:
> 
> + linaro-toolchain as I don't understand the CI issues on patchwork.
> 
> 
...
> Ok if no regressions but as you might get nagged by the post commit CI ...

I don't see any pre-commit failures for this patch, but regardless of what 
results are for pre-commit CI, there's always a chance to identify problems in 
post-commit CI -- simply because we test wa-a-ay more configurations in 
post-commit CI than in pre-commit CI.

> 
> While it is not policy yet to look at these bots but given the
> enthusiasm at Cauldron for patchwork and pre-commit CI and because all
> my armhf boxes are boxed up, I decided to do something a bit novel !
> 
> I tried reviewing this via patchwork
> 
> https://patchwork.sourceware.org/project/gcc/patch/pawpr08mb8982a6aa40749b74cad14c5783...@pawpr08mb8982.eurprd08.prod.outlook.com/
> 
> and notice that
> 
> https://ci.linaro.org/job/tcwg_gcc_build--master-arm-precommit/2393/artifact/artifacts/artifacts.precommit/notify/mail-body.txt
> says nothing could be built.

Um, no.  This says ...
===
Results changed to
# reset_artifacts:
-10
# true:
0
# build_abe gcc:
1

From
# reset_artifacts:
-10
# true:
0
# build_abe gcc:
1
===
... i.e., build succeeded both before and after patch.  We'll change the 
boilerplate intro for successful builds from ...
"Dear contributor, our automatic CI has detected problems related to your 
patch(es)."
... to ...
"Dear contributor, you are awesome, no CI failures related to your patch(es)".

One things that is strange -- testsuite builds were not triggered, we have only 
2 reports from build tests, but are missing another 2 reports from testsuite 
tests.

> 
> Possibly worth double checking the status for it being a false
> negative as to why the build failed.

Pre-commit CI is happy with the patch, albeit testsuite checks didn't run for 
some reason.  Regardless, we'll quickly catch and report any fallout in the 
post-commit CI once the patch is merged.

> 
> It was green on patchwork but remembering that Green is not Green for
> CI in patchwork I clicked on the afore mentioned ci.linaro.org link
> and see that it's actually broken.

Unfortunately, I seem to have confused developers about green and red at my 
Cauldron presentation.  "Green/Red" in patchwork mean the usual PASS/FAIL.  
It's only in post-commit CI in jenkins interface green and red mean something 
different.

--
Maxim Kuvyrkov
https://www.linaro.org

[PATCH] Format gotools.sum closer to what DejaGnu does

2023-11-02 Thread Maxim Kuvyrkov

... to restore compatability with validate_failures.py .
The testsuite script validate_failures.py expects
"Running  ..." to extract  values,
and gotools.sum provided "Running ".

Note that libgo.sum, which also uses Makefile logic to generate
DejaGnu-like output, already has "..." suffix.

gotools/ChangeLog:

* Makefile.am: Update "Running  ..." output
* Makefile.in: Regenerate.
---
 gotools/Makefile.am | 4 ++--
 gotools/Makefile.in | 5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gotools/Makefile.am b/gotools/Makefile.am
index 7b5302990f8..d2376b9c25b 100644
--- a/gotools/Makefile.am
+++ b/gotools/Makefile.am
@@ -332,8 +332,8 @@ check: check-head check-go-tool check-runtime 
check-cgo-test check-carchive-test
@cp gotools.sum gotools.log
@for file in cmd_go-testlog runtime-testlog cgo-testlog 
carchive-testlog cmd_vet-testlog embed-testlog; do \
  testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \
- echo "Running $${testname}" >> gotools.sum; \
- echo "Running $${testname}" >> gotools.log; \
+ echo "Running $${testname} ..." >> gotools.sum; \
+ echo "Running $${testname} ..." >> gotools.log; \
  sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> gotools.log; \
  grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' -e 
's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \
done
diff --git a/gotools/Makefile.in b/gotools/Makefile.in
index 2783b91ef4b..9cc238e748d 100644
--- a/gotools/Makefile.in
+++ b/gotools/Makefile.in
@@ -317,6 +317,7 @@ pdfdir = @pdfdir@
 prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
+runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
@@ -1003,8 +1004,8 @@ mostlyclean-local:
 @NATIVE_TRUE@  @cp gotools.sum gotools.log
 @NATIVE_TRUE@  @for file in cmd_go-testlog runtime-testlog cgo-testlog 
carchive-testlog cmd_vet-testlog embed-testlog; do \
 @NATIVE_TRUE@testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; 
\
-@NATIVE_TRUE@echo "Running $${testname}" >> gotools.sum; \
-@NATIVE_TRUE@echo "Running $${testname}" >> gotools.log; \
+@NATIVE_TRUE@echo "Running $${testname} ..." >> gotools.sum; \
+@NATIVE_TRUE@echo "Running $${testname} ..." >> gotools.log; \
 @NATIVE_TRUE@sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> 
gotools.log; \
 @NATIVE_TRUE@grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' 
-e 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \
 @NATIVE_TRUE@  done
-- 
2.34.1

Re: [PATCH 01/12] [contrib] validate_failures.py: Avoid testsuite aliasing

2023-11-02 Thread Maxim Kuvyrkov

Patch proposed at 
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635000.html
--
Maxim Kuvyrkov
https://www.linaro.org

> On Sep 27, 2023, at 18:47, Maxim Kuvyrkov  wrote:
> 
> Hi Bernhard,
> 
> Thanks, I meant to fix this, but forgot.
> 
> The underlying problem here is that we want to detect which sub-testsuites 
> had failures.  Current regex doesn't match go's case because there is no 
> "..." at the end: "Running foo" vs "Running foo ..." .
> 
> My preferred way of fixing this is to make go's testsuite print out "..." .  
> We have a similar patch for glibc [1].
> 
> [1] https://sourceware.org/pipermail/libc-alpha/2023-June/148702.html
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
>> On Sep 26, 2023, at 19:46, Bernhard Reutner-Fischer  
>> wrote:
>> 
>> Hi Maxim!
>> 
>> On Mon, 5 Jun 2023 18:06:25 +0400
>> Maxim Kuvyrkov via Gcc-patches  wrote:
>> 
>>>> On Jun 3, 2023, at 19:17, Jeff Law  wrote:
>>>> 
>>>> On 6/2/23 09:20, Maxim Kuvyrkov via Gcc-patches wrote:  
>>>>> This patch adds tracking of current testsuite "tool" and "exp"
>>>>> to the processing of .sum files.  This avoids aliasing between
>>>>> tests from different testsuites with same name+description.
>>>>> E.g., this is necessary for testsuite/c-c++-common, which is ran
>>>>> for both gcc and g++ "tools".
>>>>> This patch changes manifest format from ...
>>>>> 
>>>>> FAIL: gcc_test
>>>>> FAIL: g++_test
>>>>> 
>>>>> ... to ...
>>>>> 
>>>>> === gcc tests ===
>>>>> Running gcc/foo.exp ...
>>>>> FAIL: gcc_test
>>>>> === gcc Summary ==
>>>>> === g++ tests ===
>>>>> Running g++/bar.exp ...
>>>>> FAIL: g++_test
>>>>> === g++ Summary ==
>>>>> .
>>>>> The new format uses same formatting as DejaGnu's .sum files
>>>>> to specify which "tool" and "exp" the test belongs to.  
>>>> I think the series is fine.  You're not likely to hear from Diego or Doug 
>>>> I suspect, I don't think either are involved in GNU stuff anymore.
>>>> 
>>> 
>>> Thanks, Jeff.  I'll wait for a couple of days and will merge if there are 
>>> no new comments.
>> 
>> Maxim, may i ask you to have a look at the following problem, please?
>> 
>> ISTM that your exp code does not work as expected for go, maybe you
>> forgot to test the changes with go enabled?
>> 
>> Ever since your changes in summer i see the following:
>> 
>> gcc-14.mine$ 
>> /scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py 
>> --clean_build ../gcc-14.orig/
>> Getting actual results from build directory .
>> ./gcc/testsuite/go/go.sum
>> ./gcc/testsuite/gcc/gcc.sum
>> ./gcc/testsuite/objc/objc.sum
>> ./gcc/testsuite/jit/jit.sum
>> ./gcc/testsuite/gdc/gdc.sum
>> ./gcc/testsuite/gnat/gnat.sum
>> ./gcc/testsuite/ada/acats/acats.sum
>> ./gcc/testsuite/g++/g++.sum
>> ./gcc/testsuite/obj-c++/obj-c++.sum
>> ./gcc/testsuite/rust/rust.sum
>> ./gcc/testsuite/gfortran/gfortran.sum
>> ./x86_64-pc-linux-gnu/libgomp/testsuite/libgomp.sum
>> ./x86_64-pc-linux-gnu/libphobos/testsuite/libphobos.sum
>> ./x86_64-pc-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum
>> ./x86_64-pc-linux-gnu/libffi/testsuite/libffi.sum
>> ./x86_64-pc-linux-gnu/libitm/testsuite/libitm.sum
>> ./x86_64-pc-linux-gnu/libgo/libgo.sum
>> ./x86_64-pc-linux-gnu/libatomic/testsuite/libatomic.sum
>> ./gotools/gotools.sum
>> .sum file seems to be broken: tool="gotools", exp="None", 
>> summary_line="FAIL: TestScript"
>> Traceback (most recent call last):
>> File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 732, in 
>>   retval = Main(sys.argv)
>> File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 721, in Main
>>   retval = CompareBuilds()
>> File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 622, in CompareBuilds
>>   actual = GetResults(sum_files)
>> File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 466, in GetResults
>>

Re: [PATCH] Format gotools.sum closer to what DejaGnu does

2023-11-02 Thread Maxim Kuvyrkov

> On Nov 2, 2023, at 21:02, rep.dot@gmail.com wrote:
> 
> Hi Maxim!
> 
> Many thanks for the patch! Quick question below..
> 
> On 2 November 2023 13:48:55 CET, Maxim Kuvyrkov  
> wrote:
>> ... to restore compatability with validate_failures.py .
>> The testsuite script validate_failures.py expects
>> "Running  ..." to extract  values,
>> and gotools.sum provided "Running ".
>> 
>> Note that libgo.sum, which also uses Makefile logic to generate
>> DejaGnu-like output, already has "..." suffix.
>> 
>> gotools/ChangeLog:
>> 
>> * Makefile.am: Update "Running  ..." output
>> * Makefile.in: Regenerate.
>> ---
>> gotools/Makefile.am | 4 ++--
>> gotools/Makefile.in | 5 +++--
>> 2 files changed, 5 insertions(+), 4 deletions(-)
>> 
>> diff --git a/gotools/Makefile.am b/gotools/Makefile.am
>> index 7b5302990f8..d2376b9c25b 100644
>> --- a/gotools/Makefile.am
>> +++ b/gotools/Makefile.am
>> @@ -332,8 +332,8 @@ check: check-head check-go-tool check-runtime 
>> check-cgo-test check-carchive-test
>> @cp gotools.sum gotools.log
>> @for file in cmd_go-testlog runtime-testlog cgo-testlog carchive-testlog 
>> cmd_vet-testlog embed-testlog; do \
>>   testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \
>> -   echo "Running $${testname}" >> gotools.sum; \
>> -   echo "Running $${testname}" >> gotools.log; \
>> +   echo "Running $${testname} ..." >> gotools.sum; \
>> +   echo "Running $${testname} ..." >> gotools.log; \
>>   sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> gotools.log; \
>>   grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' -e 
>> 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \
>> done
>> diff --git a/gotools/Makefile.in b/gotools/Makefile.in
>> index 2783b91ef4b..9cc238e748d 100644
>> --- a/gotools/Makefile.in
>> +++ b/gotools/Makefile.in
>> @@ -317,6 +317,7 @@ pdfdir = @pdfdir@
>> prefix = @prefix@
>> program_transform_name = @program_transform_name@
>> psdir = @psdir@
>> +runstatedir = @runstatedir@
> 
> Are you sure you used the correct version of automake?

I used automake 1.15.1 (from Ubuntu 20.04 automake-1.15 package), and I 
double-checked after getting the runstatedir update.

I would appreciate someone checking on their side to make sure I don't have 
something weird going on in my setup.

--
Maxim Kuvyrkov
https://www.linaro.org

[PATCH v2] Format gotools.sum closer to what DejaGnu does

2023-11-03 Thread Maxim Kuvyrkov

The only difference compared to v1 is using vanilla automake 1.15.1
to regenerate Makefile.in.

I'll merge this as obvious if no-one objects in a day.

===
... to restore compatability with validate_failures.py .
The testsuite script validate_failures.py expects
"Running  ..." to extract  values,
and gotools.sum provided "Running ".

Note that libgo.sum, which also uses Makefile logic to generate
DejaGnu-like output, already has "..." suffix.

gotools/ChangeLog:

* Makefile.am: Update "Running  ..." output
    * Makefile.in: Regenerate.

Signed-off-by: Maxim Kuvyrkov 
---
 gotools/Makefile.am | 4 ++--
 gotools/Makefile.in | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gotools/Makefile.am b/gotools/Makefile.am
index 7b5302990f8..d2376b9c25b 100644
--- a/gotools/Makefile.am
+++ b/gotools/Makefile.am
@@ -332,8 +332,8 @@ check: check-head check-go-tool check-runtime 
check-cgo-test check-carchive-test
@cp gotools.sum gotools.log
@for file in cmd_go-testlog runtime-testlog cgo-testlog 
carchive-testlog cmd_vet-testlog embed-testlog; do \
  testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \
- echo "Running $${testname}" >> gotools.sum; \
- echo "Running $${testname}" >> gotools.log; \
+ echo "Running $${testname} ..." >> gotools.sum; \
+ echo "Running $${testname} ..." >> gotools.log; \
  sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> gotools.log; \
  grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' -e 
's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \
done
diff --git a/gotools/Makefile.in b/gotools/Makefile.in
index 2783b91ef4b..36c2ec2abd3 100644
--- a/gotools/Makefile.in
+++ b/gotools/Makefile.in
@@ -1003,8 +1003,8 @@ mostlyclean-local:
 @NATIVE_TRUE@  @cp gotools.sum gotools.log
 @NATIVE_TRUE@  @for file in cmd_go-testlog runtime-testlog cgo-testlog 
carchive-testlog cmd_vet-testlog embed-testlog; do \
 @NATIVE_TRUE@testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; 
\
-@NATIVE_TRUE@echo "Running $${testname}" >> gotools.sum; \
-@NATIVE_TRUE@echo "Running $${testname}" >> gotools.log; \
+@NATIVE_TRUE@echo "Running $${testname} ..." >> gotools.sum; \
+@NATIVE_TRUE@echo "Running $${testname} ..." >> gotools.log; \
 @NATIVE_TRUE@sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> 
gotools.log; \
 @NATIVE_TRUE@grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' 
-e 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \
 @NATIVE_TRUE@  done
-- 
2.34.1

Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]

2020-02-03 Thread Maxim Kuvyrkov

Hi Richard,

You patch below increases code-size on aarch64-linux-gnu with -Os on SPEC2k6 
400.perlbench and 453.povray -- by 1% and 2% respectively.

400.perlbench,perlbench_base.default,   101,939261,951221
453.povray,povray_base.default, 102,707807,721399

Would you please check whether these can be avoided?  [Let me know if you need 
help reproducing this problem.]

Thank you,

--
Maxim Kuvyrkov
https://www.linaro.org

> On Jan 27, 2020, at 7:41 PM, Richard Sandiford  
> wrote:
> 
> In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have:
> 
> Failed to match this instruction:
> (set (reg:SI 95)
>(ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0)
>(const_int 3 [0x3])
>(const_int 0 [0])) 0)
>(const_int 19 [0x13])))
> 
> If we perform the natural simplification to:
> 
> (set (reg:SI 95)
>(ashift:SI (sign_extract:SI (reg:SI 97)
>(const_int 3 [0x3])
>(const_int 0 [0])) 0)
>(const_int 19 [0x13])))
> 
> then the pattern matches.  And it turns out that we do have a
> simplification like that already, but it would only kick in for
> extractions from a reg, not a subreg.  E.g.:
> 
> (set (reg:SI 95)
>(ashift:SI (subreg:SI (sign_extract:DI (reg:DI X)
>(const_int 3 [0x3])
>(const_int 0 [0])) 0)
>(const_int 19 [0x13])))
> 
> would simplify to:
> 
> (set (reg:SI 95)
>(ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0)
>(const_int 3 [0x3])
>(const_int 0 [0])) 0)
>(const_int 19 [0x13])))
> 
> IMO the subreg case is even more obviously a simplification
> than the bare reg case, since the net effect is to remove
> either one or two subregs, rather than simply change the
> position of a subreg/truncation.
> 
> However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c
> for -m32 on x86_64-linux-gnu, because we could then simplify
> a :HI zero_extract to a :QI one.  The associated *testqi_ext_3
> pattern did already seem to want to handle QImode extractions:
> 
>  "ix86_match_ccmode (insn, CCNOmode)
>   && ((TARGET_64BIT && GET_MODE (operands[2]) == DImode)
>   || GET_MODE (operands[2]) == SImode
>   || GET_MODE (operands[2]) == HImode
>   || GET_MODE (operands[2]) == QImode)
> 
> but I'm not sure how often the QI case would trigger in practice,
> since the zero_extract mode was restricted to HI and above.  I checked
> the other x86 patterns and couldn't see any other instances of this.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu,
> OK to install?
> 
> Richard
> 
> 
> 2020-01-27  Richard Sandiford  
> 
> gcc/
>   PR rtl-optimization/87763
>   * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
>   simplification to handle subregs as well as bare regs.
>   * config/i386/i386.md (*testqi_ext_3): Match QI extracts too.
> ---
> gcc/config/i386/i386.md | 2 +-
> gcc/simplify-rtx.c  | 4 +++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 6e9c9bd2fb6..a125ab350bb 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -8927,7 +8927,7 @@ (define_insn "*testqi_ext_2"
> (define_insn_and_split "*testqi_ext_3"
>   [(set (match_operand 0 "flags_reg_operand")
> (match_operator 1 "compare_operator"
> -   [(zero_extract:SWI248
> +   [(zero_extract:SWI
>(match_operand 2 "nonimmediate_operand" "rm")
>(match_operand 3 "const_int_operand" "n")
>(match_operand 4 "const_int_operand" "n"))
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index eff1d07a253..db4f9339c15 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -736,7 +736,9 @@ simplify_truncation (machine_mode mode, rtx op,
>  (*_extract:M1 (truncate:M1 (reg:M2)) (len) (pos')) if possible without
>  changing len.  */
>   if ((GET_CODE (op) == ZERO_EXTRACT || GET_CODE (op) == SIGN_EXTRACT)
> -  && REG_P (XEXP (op, 0))
> +  && (REG_P (XEXP (op, 0))
> +   || (SUBREG_P (XEXP (op, 0))
> +   && REG_P (SUBREG_REG (XEXP (op, 0)
>   && GET_MODE (XEXP (op, 0)) == GET_MODE (op)
>   && CONST_INT_P (XEXP (op, 1))
>   && CONST_INT_P (XEXP (op, 2)))
> -- 
> 2.17.1
>

Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-03-01 Thread Maxim Kuvyrkov

> On Feb 29, 2024, at 21:59, Richard Earnshaw (lists) 
>  wrote:
> 
> On 29/02/2024 17:55, Andrew Pinski (QUIC) wrote:
>>> -Original Message-
>>> From: Maxim Kuvyrkov 
>>> Sent: Thursday, February 29, 2024 9:46 AM
>>> To: Andrew Pinski (QUIC) 
>>> Cc: Evgeny Karpov ; Andrew Pinski
>>> ; Richard Sandiford ; gcc-
>>> patc...@gcc.gnu.org; 10wa...@gmail.com; m...@harmstone.com; Zac
>>> Walker ; Ron Riddle
>>> ; Radek Barton 
>>> Subject: Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW
>>> environments for AArch64
>>> 
>>> WARNING: This email originated from outside of Qualcomm. Please be wary
>>> of any links or attachments, and do not enable macros.
>>> 
>>>> On Feb 29, 2024, at 21:35, Andrew Pinski (QUIC)
>>>  wrote:
>>>> 
>>>> 
>>>> 
>>>>> -----Original Message-
>>>>> From: Evgeny Karpov 
>>>>> Sent: Thursday, February 29, 2024 8:46 AM
>>>>> To: Andrew Pinski 
>>>>> Cc: Richard Sandiford ; gcc-
>>>>> patc...@gcc.gnu.org; 10wa...@gmail.com; Maxim Kuvyrkov
>>>>> ; m...@harmstone.com; Zac Walker
>>>>> ; Ron Riddle ;
>>>>> Radek Barton ; Andrew Pinski (QUIC)
>>>>> 
>>>>> Subject: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments
>>>>> for AArch64
>>>>> 
>>>>> Wednesday, February 28, 2024 2:00 AM
>>>>> Andrew Pinski wrote:
>>>>> 
>>>>>> What does this mean with respect to C++ exceptions? Or you using
>>>>>> SJLJ exceptions support or the dwarf unwinding ones without SEH
>>> support?
>>>>>> I am not sure if SJLJ exceptions is well tested any more in GCC either.
>>>>>> 
>>>>>> Also I have a question if you ran the full GCC/G++ testsuites and
>>>>>> what were the results?
>>>>>> If you did run it, did you use a cross compiler or the native
>>>>>> compiler? Did you do a bootstrap (GCC uses C++ but no exceptions
>>> though)?
>>>>> 
>>>>> As mentioned in the cover letter and the thread, the current
>>>>> contribution covers only the C scope.
>>>>> Exception handling is fully disabled for now.
>>>>> There is an experimental build with C++ and SEH, however, it is not
>>>>> included in the plan for the current contribution.
>>>>> 
>>>>> https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-
>>> build
>>>>> 
>>>>>> If you run using a cross compiler, did you use ssh or some other
>>>>>> route to run the applications?
>>>>>> 
>>>>>> Thanks,
>>>>>> Andrew Pinski
>>>>> 
>>>>> GitHub Actions are used to cross-compile toolchains, packages and
>>>>> tests, and execute tests on Windows Arm64.
>>>> 
>>>> This does not answer my question because what you are running is just
>>> simple testcases and not the FULL GCC testsuite.
>>>> So again have you ran the GCC testsuite and do you have a dejagnu board to
>>> be able to execute the binaries?
>>>> I think without the GCC testsuite ran to find all of the known failures, 
>>>> you are
>>> going to be running into many issues.
>>>> The GCC testsuite includes many tests for ABI corner cases and many
>>> features that you will most likely not think about testing using your simple
>>> testcases.
>>>> In fact I suspect there will be some of the aarch64 testcases which will 
>>>> need
>>> to be modified for the windows ABI which you have not done yet.
>>> 
>>> Hi Andrew,
>>> 
>>> We (Linaro) have a prototype CI loop setup for testing aarch64-w64-
>>> mingw32, and we have results for gcc-c and libatomic -- see [1].
>>> 
>>> The results are far from clean, but that's expected.  This patch series 
>>> aims at
>>> enabling C hello-world only, and subsequent patch series will improve the
>>> state of the port.
>>> 
>>> [1] https://ci.linaro.org/job/tcwg_gnu_mingw_check_gcc--master-woa64-
>>> build/6/artifact/artifacts/sumfiles/
>> 
>> Looking at these results, this port is not in any shape or form to be 
>> upstreamed right now. Even simple -g will cause failures.
>> Note we don't need a clean testsuite run but the patch series is not even 
>> allowing enabling hello world due to the -g not being able to used.
>> 
> 
> It seemed to me as though the patch was posted for comments, not for 
> immediate inclusion.  I agree this isn't ready for committing yet, but 
> neither should the submitters wait until it's perfect before posting it.
> 
> I think it's gcc-15 material, so now is about the right time to be thinking 
> about it.

Hi Andrew,

I agree with Richard.  This patch series is large as is, and it has clear goals:
1. Enable aarch64-w64-mingw32 to compile C-language hello-world.
2. Not regress any other targets.

As far as I know, it achieves both, but both Microsoft and Linaro will do 
additional testing on x86_64-w64-mingw32 to confirm.

There are more features and fixes for aarch64-w64-mingw32 waiting in the 
development repos on github, but I don't see any point in cleaning them up and 
preparing for submission before this already-quiet-large patchset if reviewed.

Thank you,

--
Maxim Kuvyrkov
https://www.linaro.org

[PATCH] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests

2024-03-13 Thread Maxim Kuvyrkov

This patch has been tested on
- aarch64-linux-gnu
- arm-linux-gnueabihf (VFP, NEON disabled by default),
- arm-none-eabi (Soft-FP)
with the following [expected] differences in the test results:

  - FAIL now PASS [FAIL => PASS]:
  Executed from: gcc:gcc.dg/vect/vect.exp
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors)
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
(test for excess errors)

  - UNSUPPORTED disappears[UNSUP=> ]:
  Executed from: g++:g++.dg/vect/vect.exp
g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=gnu++98

  - UNSUPPORTED appears   [ =>UNSUP]:
  Executed from: g++:g++.dg/vect/vect.exp
g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=c++98

  - UNRESOLVED disappears [UNRES=> ]:
  Executed from: gcc:gcc.dg/vect/vect.exp
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
compilation failed to produce executable
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to 
produce executable

This patch was motivated by gcc.dg/vect/pr113576.c, which currently
fails to compile for ARM targets without NEON.

=== CUT ===

Testsuites driven by vect.exp rely on check_vect_support_and_set_flags
to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add
-mfpu=neon for arm-linux-gnueabi).  Unfortunately, these flags are
overwritten by dg-options directive, which can cause tests to fail.

Behavior of dg-options is documented in vect.exp files, but not
all developers look at the .exp file when adding a new testcase.
This caused a few dg-options directives to be used instead of
the more appropriate dg-additional-options.

This patch changes target-independent dg-options into
dg-additional-options.  This patch does not touch target-specific
dg-options and target-specific tests to avoid disturbing the gentle
balance of target-specific vectorization.

This patch also removes a couple of unneeded "dg-do run" directives
to avoid failures on compile-only targets.  Default action is, again,
set by check_vect_support_and_set_flags.

Lastly, I avoided renaming tests that use -O options to O-*
filename format because this support is not consistent between
gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites.
It seems dg-additional-options is cleaner.

This patch does the following,
1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*;
2. do not change { dg-options FOO { target { target-*-pattern } } };
3. do not remove { dg-do run { target { target-*-pattern } } };
4. change { dg-options FOO } to { dg-additional-options FOO };
5. remove { dg-do run } in several tests, where it is clearly not needed.

gcc/testsuite/ChangeLog:

PR testsuite/114307
* g++.dg/vect/pr84556.cc: Fixup.
* gcc.dg/vect/complex/complex-operations-run.c Fixup.
* gcc.dg/vect/gimplefe-40.c Fixup.
* gcc.dg/vect/gimplefe-41.c Fixup.
* gcc.dg/vect/pr101145inf.c Fixup.
* gcc.dg/vect/pr101145inf_1.c Fixup.
* gcc.dg/vect/pr108316.c Fixup.
* gcc.dg/vect/pr109011-1.c Fixup.
* gcc.dg/vect/pr109011-2.c Fixup.
* gcc.dg/vect/pr109011-3.c Fixup.
* gcc.dg/vect/pr109011-4.c Fixup.
* gcc.dg/vect/pr109011-5.c Fixup.
* gcc.dg/vect/pr111846.c Fixup.
* gcc.dg/vect/pr111860-2.c Fixup.
* gcc.dg/vect/pr111860-3.c Fixup.
* gcc.dg/vect/pr113002.c Fixup.
* gcc.dg/vect/pr113576.c Fixup.
* gcc.dg/vect/pr84711.c Fixup.
* gcc.dg/vect/pr85597.c Fixup.
* gcc.dg/vect/pr88497-1.c Fixup.
* gcc.dg/vect/pr88497-2.c Fixup.
* gcc.dg/vect/pr88497-3.c Fixup.
* gcc.dg/vect/pr88497-4.c Fixup.
* gcc.dg/vect/pr88497-5.c Fixup.
* gcc.dg/vect/pr88497-7.c Fixup.
* gcc.dg/vect/pr92347.c Fixup.
* gcc.dg/vect/pr93069.c Fixup.
* gcc.dg/vect/pr97241.c Fixup.
* gcc.dg/vect/pr99102.c Fixup.
* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c Fixup.
* gcc.dg/vect/vect-early-break_65.c Fixup.
* gcc.dg/vect/vect-fold-1.c Fixup.
* gcc.dg/vect/vect-ifcvt-19.c Fixup.
* gcc.dg/vect/vect-ifcvt-20.c Fixup.
* gcc.dg/vect/vect-reduc-epilogue-gaps.c Fixup.
* gcc.dg/vect/vect-singleton_1.c Fixup.
* gfortran.dg/vect/fast-math-mgrid-resid.f Fixup.
* gfortran.dg/vect/pr77848.f Fixup.
* gfortran.dg/vect/pr90913.f90 Fixup.
---
 gcc/testsuite/g++.dg/vect/pr84556.cc   | 2 +-
 gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c | 1 -
 gcc/testsuite/gcc.dg/vect/gimplefe-40.c| 2 +-
 gcc/testsuite/gcc.dg/vect/gimplefe-41.c| 2 +-
 gcc/testsuite/gcc.dg/vect/pr101145inf.c| 2 +-
 gcc/testsuite/gcc.dg/vect/pr101145inf_1.c  | 2 +-
 gcc/testsuite/gcc.dg/vect/pr

Re: [PATCH] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests

2024-03-13 Thread Maxim Kuvyrkov

> On Mar 13, 2024, at 15:25, Richard Earnshaw  
> wrote:
> 
> 
> 
> On 13/03/2024 10:58, Maxim Kuvyrkov wrote:
>> This patch has been tested on
>> - aarch64-linux-gnu
>> - arm-linux-gnueabihf (VFP, NEON disabled by default),
>> - arm-none-eabi (Soft-FP)
>> with the following [expected] differences in the test results:
>>   - FAIL now PASS [FAIL => PASS]:
>>   Executed from: gcc:gcc.dg/vect/vect.exp
>> gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors)
>> gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
>> (test for excess errors)
>>   - UNSUPPORTED disappears[UNSUP=> ]:
>>   Executed from: g++:g++.dg/vect/vect.exp
>> g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=gnu++98
>>   - UNSUPPORTED appears   [ =>UNSUP]:
>>   Executed from: g++:g++.dg/vect/vect.exp
>> g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=c++98
>>   - UNRESOLVED disappears [UNRES=> ]:
>>   Executed from: gcc:gcc.dg/vect/vect.exp
>> gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
>> compilation failed to produce executable
>> gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to 
>> produce executable
>> This patch was motivated by gcc.dg/vect/pr113576.c, which currently
>> fails to compile for ARM targets without NEON.
>> === CUT ===
>> Testsuites driven by vect.exp rely on check_vect_support_and_set_flags
>> to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add
>> -mfpu=neon for arm-linux-gnueabi).  Unfortunately, these flags are
>> overwritten by dg-options directive, which can cause tests to fail.
>> Behavior of dg-options is documented in vect.exp files, but not
>> all developers look at the .exp file when adding a new testcase.
>> This caused a few dg-options directives to be used instead of
>> the more appropriate dg-additional-options.
>> This patch changes target-independent dg-options into
>> dg-additional-options.  This patch does not touch target-specific
>> dg-options and target-specific tests to avoid disturbing the gentle
>> balance of target-specific vectorization.
>> This patch also removes a couple of unneeded "dg-do run" directives
>> to avoid failures on compile-only targets.  Default action is, again,
>> set by check_vect_support_and_set_flags.
>> Lastly, I avoided renaming tests that use -O options to O-*
>> filename format because this support is not consistent between
>> gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites.
>> It seems dg-additional-options is cleaner.
>> This patch does the following,
>> 1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*;
>> 2. do not change { dg-options FOO { target { target-*-pattern } } };
>> 3. do not remove { dg-do run { target { target-*-pattern } } };
>> 4. change { dg-options FOO } to { dg-additional-options FOO };
>> 5. remove { dg-do run } in several tests, where it is clearly not needed.
>> gcc/testsuite/ChangeLog:
>> PR testsuite/114307
>> * g++.dg/vect/pr84556.cc: Fixup.
>> * gcc.dg/vect/complex/complex-operations-run.c Fixup.
>> * gcc.dg/vect/gimplefe-40.c Fixup.
>> * gcc.dg/vect/gimplefe-41.c Fixup.
>> * gcc.dg/vect/pr101145inf.c Fixup.
>> * gcc.dg/vect/pr101145inf_1.c Fixup.
>> * gcc.dg/vect/pr108316.c Fixup.
>> * gcc.dg/vect/pr109011-1.c Fixup.
>> * gcc.dg/vect/pr109011-2.c Fixup.
>> * gcc.dg/vect/pr109011-3.c Fixup.
>> * gcc.dg/vect/pr109011-4.c Fixup.
>> * gcc.dg/vect/pr109011-5.c Fixup.
>> * gcc.dg/vect/pr111846.c Fixup.
>> * gcc.dg/vect/pr111860-2.c Fixup.
>> * gcc.dg/vect/pr111860-3.c Fixup.
>> * gcc.dg/vect/pr113002.c Fixup.
>> * gcc.dg/vect/pr113576.c Fixup.
>> * gcc.dg/vect/pr84711.c Fixup.
>> * gcc.dg/vect/pr85597.c Fixup.
>> * gcc.dg/vect/pr88497-1.c Fixup.
>> * gcc.dg/vect/pr88497-2.c Fixup.
>> * gcc.dg/vect/pr88497-3.c Fixup.
>> * gcc.dg/vect/pr88497-4.c Fixup.
>> * gcc.dg/vect/pr88497-5.c Fixup.
>> * gcc.dg/vect/pr88497-7.c Fixup.
>> * gcc.dg/vect/pr92347.c Fixup.
>> * gcc.dg/vect/pr93069.c Fixup.
>> * gcc.dg/vect/pr97241.c Fixup.
>> * gcc.dg/vect/pr99102.c Fixup.
>> * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c Fixup.
>> * gcc.dg/vect/vect-early-break_65.c Fixup.
>> * gcc.dg/vect/vect-fold-1.c Fixup.
>> * gcc.dg/vect/vect-ifcvt-19.c Fixup.
>> * gcc.dg/vect/vect-ifcvt-20.c Fixup.
>> * gcc.dg/vect/vect-reduc-epilogue-gaps.c Fixup.
>> * g

[PATCH v2] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests

2024-03-13 Thread Maxim Kuvyrkov

Changes in v2:
- Better changelog entry.
- NFC.


This patch has been tested on
- aarch64-linux-gnu
- arm-linux-gnueabihf (VFP, NEON disabled by default),
- arm-none-eabi (Soft-FP)
with the following [expected] differences in the test results:

  - FAIL now PASS [FAIL => PASS]:
  Executed from: gcc:gcc.dg/vect/vect.exp
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors)
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
(test for excess errors)

  - UNSUPPORTED disappears[UNSUP=> ]:
  Executed from: g++:g++.dg/vect/vect.exp
g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=gnu++98

  - UNSUPPORTED appears   [ =>UNSUP]:
  Executed from: g++:g++.dg/vect/vect.exp
g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=c++98

  - UNRESOLVED disappears [UNRES=> ]:
  Executed from: gcc:gcc.dg/vect/vect.exp
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
compilation failed to produce executable
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to 
produce executable

This patch was motivated by gcc.dg/vect/pr113576.c, which currently
fails to compile for ARM targets without NEON.

=== CUT ===

Testsuites driven by vect.exp rely on check_vect_support_and_set_flags
to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add
-mfpu=neon for arm-linux-gnueabi).  Unfortunately, these flags are
overwritten by dg-options directive, which can cause tests to fail.

Behavior of dg-options is documented in vect.exp files, but not
all developers look at the .exp file when adding a new testcase.
This caused a few dg-options directives to be used instead of
the more appropriate dg-additional-options.

This patch changes target-independent dg-options into
dg-additional-options.  This patch does not touch target-specific
dg-options and target-specific tests to avoid disturbing the gentle
balance of target-specific vectorization.

This patch also removes a couple of unneeded "dg-do run" directives
to avoid failures on compile-only targets.  Default action is, again,
set by check_vect_support_and_set_flags.

Lastly, I avoided renaming tests that use -O options to O-*
filename format because this support is not consistent between
gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites.
It seems dg-additional-options is cleaner.

This patch does the following,
1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*;
2. do not change { dg-options FOO { target { target-*-pattern } } };
3. do not remove { dg-do run { target { target-*-pattern } } };
4. change { dg-options FOO } to { dg-additional-options FOO };
5. remove { dg-do run } in several tests, where it is clearly not needed.

gcc/testsuite/ChangeLog:

PR testsuite/114307
* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: Remove dg-run.
* gcc.dg/vect/complex/complex-operations-run.c: Likewise.
* gcc.dg/vect/pr113576.c: Remove dg-run.  Use dg-additional-options for
test-specific flags.
* gcc.dg/vect/gimplefe-40.c: Use dg-additional-options for
test-specific flags.
* gcc.dg/vect/gimplefe-41.c: Likewise.
* gcc.dg/vect/pr101145inf.c: Likewise.
* gcc.dg/vect/pr101145inf_1.c: Likewise.
* gcc.dg/vect/pr108316.c: Likewise.
* gcc.dg/vect/pr109011-1.c: Likewise.
* gcc.dg/vect/pr109011-2.c: Likewise.
* gcc.dg/vect/pr109011-3.c: Likewise.
* gcc.dg/vect/pr109011-4.c: Likewise.
* gcc.dg/vect/pr109011-5.c: Likewise.
* gcc.dg/vect/pr111846.c: Likewise.
* gcc.dg/vect/pr111860-2.c: Likewise.
* gcc.dg/vect/pr111860-3.c: Likewise.
* gcc.dg/vect/pr113002.c: Likewise.
* gcc.dg/vect/pr84711.c: Likewise.
* gcc.dg/vect/pr85597.c: Likewise.
* gcc.dg/vect/pr88497-1.c: Likewise.
* gcc.dg/vect/pr88497-2.c: Likewise.
* gcc.dg/vect/pr88497-3.c: Likewise.
* gcc.dg/vect/pr88497-4.c: Likewise.
* gcc.dg/vect/pr88497-5.c: Likewise.
* gcc.dg/vect/pr88497-7.c: Likewise.
* gcc.dg/vect/pr92347.c: Likewise.
* gcc.dg/vect/pr93069.c: Likewise.
* gcc.dg/vect/pr97241.c: Likewise.
* gcc.dg/vect/pr99102.c: Likewise.
* gcc.dg/vect/vect-early-break_65.c: Likewise.
* gcc.dg/vect/vect-fold-1.c: Likewise.
* gcc.dg/vect/vect-ifcvt-19.c: Likewise.
* gcc.dg/vect/vect-ifcvt-20.c: Likewise.
* gcc.dg/vect/vect-reduc-epilogue-gaps.c: Likewise.
* gcc.dg/vect/vect-singleton_1.c: Likewise.
* g++.dg/vect/pr84556.cc: Likewise.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Likewise.
* gfortran.dg/vect/pr77848.f: Likewise.
* gfortran.dg/vect/pr90913.f90: Likewise.
---
 gcc/testsuite/g++.dg/vect/pr84556.cc   | 2 +-
 gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c

Re: [PATCH v1 00/13] Add aarch64-w64-mingw32 target

2024-02-21 Thread Maxim Kuvyrkov

Hi Evgeny,

Great job!

For reference, here is a test build of this patch series using Linaro Toolchain 
CI: 
https://ci.linaro.org/view/tcwg-build/job/tcwg_gnu_mingw_build--master-woa64-build/9/artifact/artifacts/
 

--
Maxim Kuvyrkov
https://www.linaro.org

> On Feb 21, 2024, at 21:47, Evgeny Karpov  wrote:
> 
> Hello,
> 
> We would like to take your attention to the review of changes for the
> new GCC target, aarch64-w64-mingw32. The new target will be
> supported, tested, added to CI, and maintained by Linaro. This marks
> the first of three planned patch series contributing to the GCC C
> compiler's support for Windows Arm64.
> 
> 1. Minimal aarch64-w64-mingw32 C implementation to cross-compile
> hello-world with libgcc for Windows Arm64 using MinGW.
> 2. Extension of the aarch64-w64-mingw32 C implementation to
> cross-compile OpenSSL, OpenBLAS, FFmpeg, and libjpeg-turbo. All
> packages successfully pass tests.
> 3. Addition of call stack support for debugging, resolution of
> optimization issues in the C compiler, and DLL export/import for the
> aarch64-w64-mingw32 target.
> 
> This patch series introduces the 1st point, which involves building
> hello-world for the aarch64-w64-mingw32 target. The patch depends on
> the binutils changes for the aarch64-w64-mingw32 target that have
> already been merged.
> 
> The binutils should include recent relocation fixes.
> f87eaf8ff3995a5888c6dc4996a20c770e6bcd36
> aarch64: Add new relocations and limit COFF AArch64 relocation offsets
> 
> The series is structured in a way to trivially show that it should not
> affect any other targets.
> 
> In this patch, several changes have been made to support the
> aarch64-w64-mingw32 target for GCC. The modifications include the
> definition of the MS ABI for aarch64, adjustments to FIXED_REGISTERS
> and STATIC_CHAIN_REGNUM for different ABIs, and specific definitions
> for COFF format on AArch64. Additionally, the patch reuses MinGW
>  types and definitions from i386, relocating them to a new
> mingw folder for shared usage between both targets.
> 
> MinGW-specific options have been introduced for AArch64, along with
> override options for aarch64-w64-mingw32. Builtin stack probing for
> override options for aarch64-w64-mingw32. Builtin stack probing for
> AArch64 has been enabled as an alternative for chkstk. Symbol name
> encoding and section information handling for aarch64-w64-mingw32 have
> been incorporated, and the MinGW environment has been added, which
> will also be utilized for defining the Cygwin environment in the
> future.
> 
> The patch includes renaming "x86 Windows Options" to "Cygwin and MinGW
> Options," which now encompasses AArch64 as well. AArch64-specific
> Cygwin and MinGW Options have been introduced for the unique
> requirements of the AArch64 architecture.
> 
> Function type declaration and named sections support have been added.
> The necessary objects for Cygwin and MinGW have been built for the
> aarch64-w64-mingw32 target, and relevant files such as msformat-c.cc
> and winnt-d.cc have been moved to the mingw folder for reuse in
> AArch64.
> 
> Furthermore, the aarch64-w64-mingw32 target has been included in both
> libatomic and libgcc, ensuring support for the AArch64 architecture
> within these libraries. These changes collectively enhance the
> capabilities of GCC for the specified target.
> 
> Coauthors: Zac Walker ,
> Mark Harmstone   and
> Ron Riddle 
> 
> Refactored, prepared, and validated by 
> Radek Barton  and 
> Evgeny Karpov 
> 
> Special thanks to the Linaro GNU toolchain team for internal review
> and assistance in preparing the patch series!
> 
> Regards,
> Evgeny
> 
> 
> Zac Walker (13):
>  Introduce aarch64-w64-mingw32 target
>  aarch64: The aarch64-w64-mingw32 target implements the MS ABI
>  aarch64: Mark x18 register as a fixed register for MS ABI
>  aarch64: Add aarch64-w64-mingw32 COFF
>  Reuse MinGW from i386 for AArch64
>  Rename section and encoding functions from i386 which will be used in
>aarch64
>  Exclude i386 functionality from aarch64 build
>  aarch64: Add Cygwin and MinGW environments for AArch64
>  aarch64: Add SEH to machine_function
>  Rename "x86 Windows Options" to "Cygwin and MinGW Options"
>  aarch64: Build and add objects for Cygwin and MinGW for AArch64
>  aarch64: Add aarch64-w64-mingw32 target to libatomic
>  Add aarch64-w64-mingw32 target to libgcc
> 
> fixincludes/mkfixinc.sh   |   3 +-
> gcc/config.gcc|  47 +++--
> gcc/config/aarch64/aarch64-coff.h |  92 +
> gcc/config/aarch64/aarch64-opts.h |   7 +
> gcc/config/aarch6

Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-02-29 Thread Maxim Kuvyrkov

> On Feb 29, 2024, at 21:35, Andrew Pinski (QUIC)  
> wrote:
> 
> 
> 
>> -Original Message-
>> From: Evgeny Karpov 
>> Sent: Thursday, February 29, 2024 8:46 AM
>> To: Andrew Pinski 
>> Cc: Richard Sandiford ; gcc-
>> patc...@gcc.gnu.org; 10wa...@gmail.com; Maxim Kuvyrkov
>> ; m...@harmstone.com; Zac Walker
>> ; Ron Riddle ; Radek
>> Barton ; Andrew Pinski (QUIC)
>> 
>> Subject: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments
>> for AArch64
>> 
>> Wednesday, February 28, 2024 2:00 AM
>> Andrew Pinski wrote:
>> 
>>> What does this mean with respect to C++ exceptions? Or you using SJLJ
>>> exceptions support or the dwarf unwinding ones without SEH support?
>>> I am not sure if SJLJ exceptions is well tested any more in GCC either.
>>> 
>>> Also I have a question if you ran the full GCC/G++ testsuites and what
>>> were the results?
>>> If you did run it, did you use a cross compiler or the native
>>> compiler? Did you do a bootstrap (GCC uses C++ but no exceptions though)?
>> 
>> As mentioned in the cover letter and the thread, the current contribution
>> covers only the C scope.
>> Exception handling is fully disabled for now.
>> There is an experimental build with C++ and SEH, however, it is not included 
>> in
>> the plan for the current contribution.
>> 
>> https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build
>> 
>>> If you run using a cross compiler, did you use ssh or some other route
>>> to run the applications?
>>> 
>>> Thanks,
>>> Andrew Pinski
>> 
>> GitHub Actions are used to cross-compile toolchains, packages and tests, and
>> execute tests on Windows Arm64.
> 
> This does not answer my question because what you are running is just simple 
> testcases and not the FULL GCC testsuite.
> So again have you ran the GCC testsuite and do you have a dejagnu board to be 
> able to execute the binaries?
> I think without the GCC testsuite ran to find all of the known failures, you 
> are going to be running into many issues.
> The GCC testsuite includes many tests for ABI corner cases and many features 
> that you will most likely not think about testing using your simple testcases.
> In fact I suspect there will be some of the aarch64 testcases which will need 
> to be modified for the windows ABI which you have not done yet.

Hi Andrew,

We (Linaro) have a prototype CI loop setup for testing aarch64-w64-mingw32, and 
we have results for gcc-c and libatomic -- see [1].

The results are far from clean, but that's expected.  This patch series aims at 
enabling C hello-world only, and subsequent patch series will improve the state 
of the port.

[1] 
https://ci.linaro.org/job/tcwg_gnu_mingw_check_gcc--master-woa64-build/6/artifact/artifacts/sumfiles/

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org

Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-15 Thread Maxim Kuvyrkov

Hi Vladimir,
Hi Jeff,

Richard and Alexander have reviewed this patch and [I assume] have no
further comments.  OK to merge?

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov 
wrote:

> This patch avoids sched-deps.cc:find_inc() creating exponential number
> of dependencies, which become memory and compilation time hogs.
> Consider example (simplified from PR96388) ...
> ===
> sp=sp-4 // sp_insnA
> mem_insnA1[sp+A1]
> ...
> mem_insnAN[sp+AN]
> sp=sp-4 // sp_insnB
> mem_insnB1[sp+B1]
> ...
> mem_insnBM[sp+BM]
> ===
>
> [For simplicity, let's assume find_inc(backwards==true)].
> In this example find_modifiable_mems() will arrange for mem_insnA*
> to be able to pass sp_insnA, and, while doing this, will create
> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
> is a consumer of sp_insnA.  After this sp_insnB will have N new
> backward dependencies.
> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
> dependencies.
>
> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
> 30GB and compilation time of 30 minutes, with sched2 accounting for
> 95% of both metrics.  After this patch the RAM usage is down to 1GB
> and compilation time is down to 3-4 minutes, with sched2 no longer
> standing out on -ftime-report or memory usage.
>
> gcc/ChangeLog:
>
> PR rtl-optimization/96388
> PR rtl-optimization/111554
> * sched-deps.cc (find_inc): Avoid exponential behavior.
> ---
>  gcc/sched-deps.cc | 48 +++
>  1 file changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index c23218890f3..005fc0f567e 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii,
> rtx_insn *insn, bool before_mem)
>  /* Once a suitable mem reference has been found and the corresponding data
> in MII has been filled in, this function is called to find a suitable
> add or inc insn involving the register we found in the memory
> -   reference.  */
> +   reference.
> +   If successful, this function will create additional dependencies
> between
> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if
> backwards)
> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if
> !backwards).
> +*/
>
>  static bool
>  find_inc (struct mem_inc_info *mii, bool backwards)
>  {
>sd_iterator_def sd_it;
>dep_t dep;
> +  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK :
> SD_LIST_FORW;
> +  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
>
> -  sd_it = sd_iterator_start (mii->mem_insn,
> -backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
> +  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
>while (sd_iterator_cond (&sd_it, &dep))
>  {
>dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
>rtx_insn *pro = DEP_PRO (dep);
>rtx_insn *con = DEP_CON (dep);
> -  rtx_insn *inc_cand = backwards ? pro : con;
> +  rtx_insn *inc_cand;
> +  int n_inc_deps;
> +
>if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
> goto next;
> +
> +  if (backwards)
> +   {
> + inc_cand = pro;
> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
> +   }
> +  else
> +   {
> + inc_cand = con;
> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
> +   }
> +
> +  /* In the FOR_EACH_DEP loop below we will create additional
> n_inc_deps
> +for mem_insn.  This by itself is not a problem, since each
> mem_insn
> +will have only a few inc_insns associated with it.  However, if
> +we consider that a single inc_insn may have a lot of mem_insns,
> AND,
> +on top of that, a few other inc_insns associated with it --
> +those _other inc_insns_ will get (n_mem_deps * number of MEM
> insns)
> +dependencies created for them.  This may cause an exponential
> +growth of memory usage and scheduling time.
> +See PR96388 for details.
> +We [heuristically] use n_inc_deps as a proxy for the number of MEM
> +insns, and drop opportunities for breaking modifiable_mem
> dependencies
> +when dependency lists grow beyond reasonable size.  */
> +  if (n_mem_deps * n_inc_deps
> + >= param_max_pending_list_length * param_max_pending_list_length)
> +   goto next;
> +
>if (parse_add_or_inc (mii, inc_cand, backwards))
>

Re: [PATCH v3 3/8] Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc

2024-01-15 Thread Maxim Kuvyrkov

Dear RTL maintainers,

Gently ping.  This patch adds a couple of new functions to lists.cc, which
then are used to simplify logic in the scheduler.  OK to merge?

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov 
wrote:

> This patch simplifies logic behind deps_join(), which will be
> important for the upcoming improvements of sched-deps.cc logging.
>
> The only functional change is that when deps_join() is called with
> empty state for the 2nd argument, it will not reverse INSN_ and
> EXPR_LISTs in the 1st argument.  Before this patch the lists were
> reversed due to use of concat_*_LIST().  Now, with copy_*_LIST()
> used for this case, the lists will remain in the original order.
>
> gcc/ChangeLog:
>
> * lists.cc (copy_EXPR_LIST, concat_EXPR_LIST): New functions.
> * rtl.h (copy_EXPR_LIST, concat_EXPR_LIST): Declare.
> * sched-rgn.cc (concat_insn_list, concat_expr_list): New helpers.
> (concat_insn_mem_list): Simplify.
> (deps_join): Update
> ---
>  gcc/lists.cc | 30 +++-
>  gcc/rtl.h|  4 +++-
>  gcc/sched-rgn.cc | 51 +++-
>  3 files changed, 61 insertions(+), 24 deletions(-)
>
> diff --git a/gcc/lists.cc b/gcc/lists.cc
> index 2cdf37ad533..83e7bf32176 100644
> --- a/gcc/lists.cc
> +++ b/gcc/lists.cc
> @@ -160,6 +160,24 @@ free_INSN_LIST_list (rtx_insn_list **listp)
>free_list ((rtx *)listp, &unused_insn_list);
>  }
>
> +/* Make a copy of the EXPR_LIST list LINK and return it.  */
> +rtx_expr_list *
> +copy_EXPR_LIST (rtx_expr_list *link)
> +{
> +  rtx_expr_list *new_queue;
> +  rtx_expr_list **pqueue = &new_queue;
> +
> +  for (; link; link = link->next ())
> +{
> +  rtx x = link->element ();
> +  rtx_expr_list *newlink = alloc_EXPR_LIST (REG_NOTE_KIND (link), x,
> NULL);
> +  *pqueue = newlink;
> +  pqueue = (rtx_expr_list **)&XEXP (newlink, 1);
> +}
> +  *pqueue = NULL;
> +  return new_queue;
> +}
> +
>  /* Make a copy of the INSN_LIST list LINK and return it.  */
>  rtx_insn_list *
>  copy_INSN_LIST (rtx_insn_list *link)
> @@ -178,12 +196,22 @@ copy_INSN_LIST (rtx_insn_list *link)
>return new_queue;
>  }
>
> +/* Duplicate the EXPR_LIST elements of COPY and prepend them to OLD.  */
> +rtx_expr_list *
> +concat_EXPR_LIST (rtx_expr_list *copy, rtx_expr_list *old)
> +{
> +  rtx_expr_list *new_rtx = old;
> +  for (; copy; copy = copy->next ())
> +new_rtx = alloc_EXPR_LIST (REG_NOTE_KIND (copy), copy->element (),
> new_rtx);
> +  return new_rtx;
> +}
> +
>  /* Duplicate the INSN_LIST elements of COPY and prepend them to OLD.  */
>  rtx_insn_list *
>  concat_INSN_LIST (rtx_insn_list *copy, rtx_insn_list *old)
>  {
>rtx_insn_list *new_rtx = old;
> -  for (; copy ; copy = copy->next ())
> +  for (; copy; copy = copy->next ())
>  {
>new_rtx = alloc_INSN_LIST (copy->insn (), new_rtx);
>PUT_REG_NOTE_KIND (new_rtx, REG_NOTE_KIND (copy));
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index e4b6cc0dbb5..7e952d7cbeb 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -3764,10 +3764,12 @@ extern void free_EXPR_LIST_list (rtx_expr_list **);
>  extern void free_INSN_LIST_list (rtx_insn_list **);
>  extern void free_EXPR_LIST_node (rtx);
>  extern void free_INSN_LIST_node (rtx);
> +extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx);
>  extern rtx_insn_list *alloc_INSN_LIST (rtx, rtx);
> +extern rtx_expr_list *copy_EXPR_LIST (rtx_expr_list *);
>  extern rtx_insn_list *copy_INSN_LIST (rtx_insn_list *);
> +extern rtx_expr_list *concat_EXPR_LIST (rtx_expr_list *, rtx_expr_list *);
>  extern rtx_insn_list *concat_INSN_LIST (rtx_insn_list *, rtx_insn_list *);
> -extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx);
>  extern void remove_free_INSN_LIST_elem (rtx_insn *, rtx_insn_list **);
>  extern rtx remove_list_elem (rtx, rtx *);
>  extern rtx_insn *remove_free_INSN_LIST_node (rtx_insn_list **);
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index e5964f54ead..da3ec0458ff 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -2585,25 +2585,32 @@ add_branch_dependences (rtx_insn *head, rtx_insn
> *tail)
>
>  static class deps_desc *bb_deps;
>
> +/* Return a new insn_list with all the elements from the two input
> lists.  */
> +static rtx_insn_list *
> +concat_insn_list (rtx_insn_list *copy, rtx_insn_list *old)
> +{
> +  if (!old)
> +return copy_INSN_LIST (copy);
> +  return concat_INSN_LIST (copy, old);
> +}
> +
> +/* Return a new expr_list with all the elements from the two input
> lists.  */
> +static rtx_expr_list *
> +concat_expr_list (rtx_expr_list *cop

Re: [PATCH v3 4/8] Improve and fix sched-deps.cc: dump_dep() and dump_lists().

2024-01-15 Thread Maxim Kuvyrkov

Dear scheduler maintainers,

Gentle ping.  This patch is borderline trivial and affects only the lucky
few who debug sched-deps.cc code.  OK to merge?

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov 
wrote:

> Better propagate flags from dump_lists() into dump_dep() and
> add a missing "]" in dump_lists().
>
> gcc/ChangeLog:
>
> * sched-deps.cc (DUMP_DEP_PRO): Improve comment.
> (dump_dep_flags): Remove.
> (DUMP_LISTS_SIZE, DUMP_LISTS_DEPS, DUMP_LISTS_ALL): Continue
> numbering from DUMP_DEP_* flags.
> (dump_lists): Update and fix.
> ---
>  gcc/sched-deps.cc | 21 +++--
>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 005fc0f567e..4d357079a7a 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -132,7 +132,8 @@ static void dump_ds (FILE *, ds_t);
>  /* Define flags for dump_dep ().  */
>
>  /* Dump producer of the dependence.  */
> -#define DUMP_DEP_PRO (2)
> +#define DUMP_DEP_PRO (2) /* Reserve "1" for handling of DUMP_DEP_ALL and
> +   DUMP_LISTS_ALL.  */
>
>  /* Dump consumer of the dependence.  */
>  #define DUMP_DEP_CON (4)
> @@ -206,9 +207,6 @@ dump_dep (FILE *dump, dep_t dep, int flags)
>fprintf (dump, ">");
>  }
>
> -/* Default flags for dump_dep ().  */
> -static int dump_dep_flags = (DUMP_DEP_PRO | DUMP_DEP_CON);
> -
>  /* Dump all fields of DEP to STDERR.  */
>  void
>  sd_debug_dep (dep_t dep)
> @@ -1454,19 +1452,20 @@ sd_delete_dep (sd_iterator_def sd_it)
>  }
>
>  /* Dump size of the lists.  */
> -#define DUMP_LISTS_SIZE (2)
> +#define DUMP_LISTS_SIZE (32) /* (DUMP_DEP_STATUS << 1)  */
>
>  /* Dump dependencies of the lists.  */
> -#define DUMP_LISTS_DEPS (4)
> +#define DUMP_LISTS_DEPS (64)
>
>  /* Dump all information about the lists.  */
>  #define DUMP_LISTS_ALL (DUMP_LISTS_SIZE | DUMP_LISTS_DEPS)
>
>  /* Dump deps_lists of INSN specified by TYPES to DUMP.
> -   FLAGS is a bit mask specifying what information about the lists needs
> -   to be printed.
> +   FLAGS is a bit mask specifying what information about the lists and
> +   the individual deps needs to be printed, this is a combination of
> +   DUMP_DEP_* and DUMP_LISTS_* flags.
> If FLAGS has the very first bit set, then dump all information about
> -   the lists and propagate this bit into the callee dump functions.  */
> +   the lists and deps propagate this bit into the callee dump functions.
> */
>  static void
>  dump_lists (FILE *dump, rtx insn, sd_list_types_def types, int flags)
>  {
> @@ -1488,10 +1487,12 @@ dump_lists (FILE *dump, rtx insn,
> sd_list_types_def types, int flags)
>  {
>FOR_EACH_DEP (insn, types, sd_it, dep)
> {
> - dump_dep (dump, dep, dump_dep_flags | all);
> + dump_dep (dump, dep, flags | all);
>   fprintf (dump, " ");
> }
>  }
> +
> +  fprintf (dump, "]");
>  }
>
>  /* Dump all information about deps_lists of INSN specified by TYPES
> --
> 2.34.1
>
>

-- 
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH v3 5/8] Add a bit more logging scheduler's dependency analysis

2024-01-15 Thread Maxim Kuvyrkov

Dear scheduler maintainers,

Gentle ping.  This is a trivial patch, which makes debugging sched-deps.cc
slightly more enjoyable.

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov 
wrote:

> gcc/ChangeLog:
>
> * sched-deps.cc (sd_add_dep, find_inc): Add logging about
> dependency creation.
> ---
>  gcc/sched-deps.cc | 30 ++
>  1 file changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 4d357079a7a..2a87158ba4b 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -1342,6 +1342,13 @@ sd_add_dep (dep_t dep, bool resolved_p)
>   in the bitmap caches of dependency information.  */
>if (true_dependency_cache != NULL)
>  set_dependency_caches (dep);
> +
> +  if (sched_verbose >= 9)
> +{
> +  fprintf (sched_dump, "created dependency ");
> +  dump_dep (sched_dump, dep, 1);
> +  fprintf (sched_dump, "\n");
> +}
>  }
>
>  /* Add or update backward dependence between INSN and ELEM
> @@ -4879,18 +4886,33 @@ find_inc (struct mem_inc_info *mii, bool backwards)
>  we create.  */
>   gcc_assert (mii->inc_insn == inc_cand);
>
> + int n_deps_created = 0;
>   if (backwards)
> {
>   FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
> -   add_dependence_1 (mii->mem_insn, DEP_PRO (dep),
> - REG_DEP_TRUE);
> +   {
> + add_dependence_1 (mii->mem_insn, DEP_PRO (dep),
> +   REG_DEP_TRUE);
> + ++n_deps_created;
> +   }
> }
>   else
> {
>   FOR_EACH_DEP (mii->inc_insn, SD_LIST_FORW, sd_it, dep)
> -   add_dependence_1 (DEP_CON (dep), mii->mem_insn,
> - REG_DEP_ANTI);
> +   {
> + add_dependence_1 (DEP_CON (dep), mii->mem_insn,
> +   REG_DEP_ANTI);
> + ++n_deps_created;
> +   }
> }
> + if (sched_verbose >= 6)
> +   fprintf (sched_dump,
> +"created %d deps for mem_insn %d due to "
> +"inc_insn %d %s deps\n",
> +        n_deps_created, INSN_UID (mii->mem_insn),
> +INSN_UID (mii->inc_insn),
> +backwards ? "backward" : "forward");
> +
>   return true;
> }
>  next:
> --
> 2.34.1
>
>

-- 
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH v3 6/8] sched_deps.cc: Simplify initialization of dependency contexts

2024-01-15 Thread Maxim Kuvyrkov

Dear scheduler maintainers,

Gentle ping.  This is a trivial cleanup.

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov 
wrote:

> gcc/ChangeLog:
>
> * sched-deps.cc (init_deps, init_deps_reg_last): Simplify.
> (free_deps): Remove useless code.
> ---
>  gcc/sched-deps.cc | 13 -
>  1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 2a87158ba4b..e0d3c97d935 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -3927,10 +3927,9 @@ init_deps (class deps_desc *deps, bool
> lazy_reg_last)
>int max_reg = (reload_completed ? FIRST_PSEUDO_REGISTER : max_reg_num
> ());
>
>deps->max_reg = max_reg;
> -  if (lazy_reg_last)
> -deps->reg_last = NULL;
> -  else
> -deps->reg_last = XCNEWVEC (struct deps_reg, max_reg);
> +  deps->reg_last = NULL;
> +  if (!lazy_reg_last)
> +init_deps_reg_last (deps);
>INIT_REG_SET (&deps->reg_last_in_use);
>
>deps->pending_read_insns = 0;
> @@ -3961,9 +3960,7 @@ init_deps (class deps_desc *deps, bool lazy_reg_last)
>  void
>  init_deps_reg_last (class deps_desc *deps)
>  {
> -  gcc_assert (deps && deps->max_reg > 0);
> -  gcc_assert (deps->reg_last == NULL);
> -
> +  gcc_assert (deps && deps->max_reg > 0 && deps->reg_last == NULL);
>deps->reg_last = XCNEWVEC (struct deps_reg, deps->max_reg);
>  }
>
> @@ -4013,8 +4010,6 @@ free_deps (class deps_desc *deps)
>   it at all.  */
>free (deps->reg_last);
>deps->reg_last = NULL;
> -
> -  deps = NULL;
>  }
>
>  /* Remove INSN from dependence contexts DEPS.  */
> --
> 2.34.1
>
>

-- 
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH v3 7/8] Improve logging of register data in scheduler dependency analysis

2024-01-15 Thread Maxim Kuvyrkov

Dear scheduler maintainers,

Gentle ping.  This patch improves debugging output, it does not touch
scheduling logic.

On Wed, 22 Nov 2023 at 15:15, Maxim Kuvyrkov 
wrote:

> Scheduler dependency analysis uses two main data structures:
> 1. reg_pending_* data contains effects of INSN on the register file,
>which is then incorporated into ...
> 2. deps_desc object, which contains commulative information about all
>instructions processed from deps_desc object's initialization.
>
> This patch adds debug dumping of (1).
>
> gcc/ChangeLog:
>
> * sched-deps.cc (print-rtl.h): Include for str_pattern_slim().
> (dump_reg_pending_data): New function.
> (sched_analyze_insn): Use it.
> ---
>  gcc/sched-deps.cc | 90 ++-
>  1 file changed, 89 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index e0d3c97d935..f9290c82fd2 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "sched-int.h"
>  #include "cselib.h"
>  #include "function-abi.h"
> +#include "print-rtl.h"
>
>  #ifdef INSN_SCHEDULING
>
> @@ -432,10 +433,24 @@ dep_spec_p (dep_t dep)
>return false;
>  }
>
> +/* These regsets describe how a single instruction affects registers.
> +   Their "life-time" is restricted to a single call of
> sched_analyze_insn().
> +   They are populated by sched_analyze_1() and sched_analyze_2(), and
> +   then sched_analyze_insn() transfers data from these into
> deps->reg_last[i].
> +   Near the end sched_analyze_insn() clears these regsets for the next
> +   insn.  */
>  static regset reg_pending_sets;
>  static regset reg_pending_clobbers;
>  static regset reg_pending_uses;
>  static regset reg_pending_control_uses;
> +
> +/* Similar to reg_pending_* regsets, this variable specifies whether
> +   the current insn analyzed by sched_analyze_insn() is a scheduling
> +   barrier that should "split" dependencies inside a block.  Internally
> +   sched-deps.cc does this by pretending that the barrier insn uses and
> +   sets all registers.
> +   Near the end sched_analyze_insn() transfers barrier info from this
> variable
> +   into deps->last_reg_pending_barrier.  */
>  static enum reg_pending_barrier_mode reg_pending_barrier;
>
>  /* Hard registers implicitly clobbered or used (or may be implicitly
> @@ -2880,7 +2895,77 @@ get_implicit_reg_pending_clobbers (HARD_REG_SET
> *temp, rtx_insn *insn)
>*temp &= ~ira_no_alloc_regs;
>  }
>
> -/* Analyze an INSN with pattern X to find all dependencies.  */
> +/* Dump state of reg_pending_* data for debug purposes.
> +   Dump only non-empty data to reduce log clobber.  */
> +static void
> +dump_reg_pending_data (FILE *file, rtx_insn *insn)
> +{
> +  fprintf (file, "\n");
> +  fprintf (file, ";; sched_analysis after insn %d: %s\n",
> +  INSN_UID (insn), str_pattern_slim (PATTERN (insn)));
> +
> +  if (!REG_SET_EMPTY_P (reg_pending_sets)
> +  || !REG_SET_EMPTY_P (reg_pending_clobbers)
> +  || !REG_SET_EMPTY_P (reg_pending_uses)
> +  || !REG_SET_EMPTY_P (reg_pending_control_uses))
> +{
> +  fprintf (file, ";; insn reg");
> +  if (!REG_SET_EMPTY_P (reg_pending_sets))
> +   {
> + fprintf (file, " sets(");
> + dump_regset (reg_pending_sets, file);
> + fprintf (file, ")");
> +   }
> +  if (!REG_SET_EMPTY_P (reg_pending_clobbers))
> +   {
> + fprintf (file, " clobbers(");
> + dump_regset (reg_pending_clobbers, file);
> + fprintf (file, ")");
> +   }
> +  if (!REG_SET_EMPTY_P (reg_pending_uses))
> +   {
> + fprintf (file, " uses(");
> + dump_regset (reg_pending_uses, file);
> + fprintf (file, ")");
> +   }
> +  if (!REG_SET_EMPTY_P (reg_pending_control_uses))
> +   {
> + fprintf (file, " control(");
> + dump_regset (reg_pending_control_uses, file);
> + fprintf (file, ")");
> +   }
> +  fprintf (file, "\n");
> +}
> +
> +  if (reg_pending_barrier)
> +fprintf (file, ";; insn reg barrier: %d\n", reg_pending_barrier);
> +
> +  if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers)
> +  || !hard_reg_set_empty_p (implicit_reg_pending_uses))
> +{
> +  fprintf (file, ";; insn reg");
> +  if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers))
> +   {
>

Re: [PATCH v3 8/8] Improve logging of scheduler dependency analysis context

2024-01-15 Thread Maxim Kuvyrkov

Dear scheduler maintainers,

Gentle ping.  This patch improves debugging output, it does not touch
scheduling logic.

On Wed, 22 Nov 2023 at 15:15, Maxim Kuvyrkov 
wrote:

> Scheduler dependency analysis uses two main data structures:
> 1. reg_pending_* data contains effects of INSN on the register file,
>which is then incorporated into ...
> 2. deps_desc object, which contains commulative information about all
>instructions processed from deps_desc object's initialization.
>
> This patch adds debug dumping of (2).
>
> Dependency analysis contexts (aka deps_desc objects) are huge, but
> each instruction affects only a small amount of data in these objects.
> Therefore, it is most useful to dump differential information
> compared to the dependency state after previous instruction.
>
> gcc/ChangeLog:
>
> * sched-deps.cc (reset_deps, dump_rtx_insn_list)
> (rtx_insn_list_same_p): New helper functions.
> (dump_deps_desc_diff): New function to dump dependency information.
> (sched_analysis_prev_deps): New static variable.
> (sched_analyze_insn): Dump dependency information.
> (init_deps_global, finish_deps_global): Handle
> sched_analysis_prev_deps.
> * sched-int.h (struct deps_reg): Update comments.
> * sched-rgn.cc (concat_insn_list, concat_expr_list): Update
> comments.
> ---
>  gcc/sched-deps.cc | 197 ++
>  gcc/sched-int.h   |   9 ++-
>  gcc/sched-rgn.cc  |   5 ++
>  3 files changed, 210 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index f9290c82fd2..edca9927e23 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -1677,6 +1677,15 @@ delete_all_dependences (rtx_insn *insn)
>  sd_delete_dep (sd_it);
>  }
>
> +/* Re-initialize existing dependency context DEPS to be a copy of FROM.
> */
> +static void
> +reset_deps (class deps_desc *deps, class deps_desc *from)
> +{
> +  free_deps (deps);
> +  init_deps (deps, false);
> +  deps_join (deps, from);
> +}
> +
>  /* All insns in a scheduling group except the first should only have
> dependencies on the previous insn in the group.  So we find the
> first instruction in the scheduling group by walking the dependence
> @@ -2960,6 +2969,177 @@ dump_reg_pending_data (FILE *file, rtx_insn *insn)
>  }
>  }
>
> +/* Dump rtx_insn_list LIST.
> +   Consider moving to lists.cc if there are users outside of
> sched-deps.cc.  */
> +static void
> +dump_rtx_insn_list (FILE *file, rtx_insn_list *list)
> +{
> +  for (; list; list = list->next ())
> +fprintf (file, " %d", INSN_UID (list->insn ()));
> +}
> +
> +/* Return TRUE if lists A and B have same elements in the same order.  */
> +static bool
> +rtx_insn_list_same_p (rtx_insn_list *a, rtx_insn_list *b)
> +{
> +  for (; a && b; a = a->next (), b = b->next ())
> +if (a->insn () != b->insn ())
> +  return false;
> +
> +  if (a || b)
> +return false;
> +
> +  return true;
> +}
> +
> +/* Dump parts of DEPS that are different from PREV.
> +   Dumping all information from dependency context produces huge
> +   hard-to-analize logs; differential dumping is way more managable.  */
> +static void
> +dump_deps_desc_diff (FILE *file, class deps_desc *deps, class deps_desc
> *prev)
> +{
> +  /* Each "paragraph" is a single line of output.  */
> +
> +  /* Note on param_max_pending_list_length:
> + During normal dependency analysis various lists should not exceed
> this
> + limit.  Searching for "!!!" in scheduler logs can point to potential
> bugs
> + or poorly-handled corner-cases.  */
> +
> +  if (!rtx_insn_list_same_p (deps->pending_read_insns,
> +prev->pending_read_insns))
> +{
> +  fprintf (file, ";; deps pending mem reads length(%d):",
> +  deps->pending_read_list_length);
> +  if ((deps->pending_read_list_length +
> deps->pending_write_list_length)
> + >= param_max_pending_list_length)
> +   fprintf (file, "%d insns!!!", deps->pending_read_list_length);
> +  else
> +   dump_rtx_insn_list (file, deps->pending_read_insns);
> +  fprintf (file, "\n");
> +}
> +
> +  if (!rtx_insn_list_same_p (deps->pending_write_insns,
> +prev->pending_write_insns))
> +{
> +  fprintf (file, ";; deps pending mem writes length(%d):",
> +  deps->pending_write_list_length);
> +  if ((deps->pending_read_list_length +
> deps->pendin

Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-16 Thread Maxim Kuvyrkov

> On Jan 17, 2024, at 10:51, Richard Biener  wrote:
> 
> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law  wrote:
>> 
>> 
>> 
>> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
>>> Hi Vladimir,
>>> Hi Jeff,
>>> 
>>> Richard and Alexander have reviewed this patch and [I assume] have no
>>> further comments.  OK to merge?
>> I think the question is whether or not we're too late.  I know that
>> Richard S has held off on his late-combine pass and I'm holding off on
>> the ext-dce work due to the fact that we're well past stage1 close.
>> 
>> I think the release managers ought to have the final say on this.
> 
> I'm fine with this now, it doesn't change code generation.

Thanks, Richard.

I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit cleanups 
and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-17 Thread Maxim Kuvyrkov

> On Jan 17, 2024, at 19:02, Richard Biener  wrote:
> 
> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
>  wrote:
>> 
>>> On Jan 17, 2024, at 10:51, Richard Biener  
>>> wrote:
>>> 
>>> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law  wrote:
>>>> 
>>>> 
>>>> 
>>>> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
>>>>> Hi Vladimir,
>>>>> Hi Jeff,
>>>>> 
>>>>> Richard and Alexander have reviewed this patch and [I assume] have no
>>>>> further comments.  OK to merge?
>>>> I think the question is whether or not we're too late.  I know that
>>>> Richard S has held off on his late-combine pass and I'm holding off on
>>>> the ext-dce work due to the fact that we're well past stage1 close.
>>>> 
>>>> I think the release managers ought to have the final say on this.
>>> 
>>> I'm fine with this now, it doesn't change code generation.
>> 
>> Thanks, Richard.
>> 
>> I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit 
>> cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when 
>> stage1 opens.
> 
> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
> 
> gcc/fortran/f95-lang.o differs
> 
> does n_mem_deps or n_inc_deps include debug insns?

Thanks, investigating.

--
Maxim Kuvyrkov
https://www.linaro.org

[PATCH v1] Fix compare-debug bootstrap failure

2024-01-17 Thread Maxim Kuvyrkov

... caused by scheduler fix for PR96388 and PR111554.

This patch adjusts decision sched-deps.cc:find_inc() to use
length of dependency lists sans any DEBUG_INSN instructions.

gcc/ChangeLog:

* haifa-sched.cc (dep_list_size): Make global.
* sched-deps.cc (find_inc): Use instead of sd_lists_size().
* sched-int.h (dep_list_size): Declare.
---
 gcc/haifa-sched.cc | 8 ++--
 gcc/sched-deps.cc  | 6 +++---
 gcc/sched-int.h| 2 ++
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 49ee589aed7..1bc610f9a5f 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -1560,8 +1560,7 @@ contributes_to_priority_p (dep_t dep)
 }
 
 /* Compute the number of nondebug deps in list LIST for INSN.  */
-
-static int
+int
 dep_list_size (rtx_insn *insn, sd_list_types_def list)
 {
   sd_iterator_def sd_it;
@@ -1571,6 +1570,11 @@ dep_list_size (rtx_insn *insn, sd_list_types_def list)
   if (!MAY_HAVE_DEBUG_INSNS)
 return sd_lists_size (insn, list);
 
+  /* TODO: We should split normal and debug insns into separate SD_LIST_*
+ sub-lists, and then we'll be able to use something like
+ sd_lists_size(insn, list & SD_LIST_NON_DEBUG)
+ instead of walking dependencies below.  */
+
   FOR_EACH_DEP (insn, list, sd_it, dep)
 {
   if (DEBUG_INSN_P (DEP_CON (dep)))
diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index 0615007c560..5034e664e5e 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -4791,7 +4791,7 @@ find_inc (struct mem_inc_info *mii, bool backwards)
   sd_iterator_def sd_it;
   dep_t dep;
   sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
-  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
+  int n_mem_deps = dep_list_size (mii->mem_insn, mem_deps);
 
   sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
   while (sd_iterator_cond (&sd_it, &dep))
@@ -4808,12 +4808,12 @@ find_inc (struct mem_inc_info *mii, bool backwards)
   if (backwards)
{
  inc_cand = pro;
- n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
+ n_inc_deps = dep_list_size (inc_cand, SD_LIST_BACK);
}
   else
{
  inc_cand = con;
- n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
+ n_inc_deps = dep_list_size (inc_cand, SD_LIST_FORW);
}
 
   /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index ab784fe0d17..4df092013e9 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -1677,6 +1677,8 @@ extern void sd_copy_back_deps (rtx_insn *, rtx_insn *, 
bool);
 extern void sd_delete_dep (sd_iterator_def);
 extern void sd_debug_lists (rtx, sd_list_types_def);
 
+extern int dep_list_size (rtx_insn *, sd_list_types_def);
+
 /* Macros and declarations for scheduling fusion.  */
 #define FUSION_MAX_PRIORITY (INT_MAX)
 extern bool sched_fusion;
-- 
2.34.1

Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-17 Thread Maxim Kuvyrkov

> On Jan 17, 2024, at 19:05, Maxim Kuvyrkov  wrote:
> 
>> On Jan 17, 2024, at 19:02, Richard Biener  wrote:
>> 
>> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
>>  wrote:
>>> 
>>>> On Jan 17, 2024, at 10:51, Richard Biener  
>>>> wrote:
>>>> 
>>>> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law  wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
>>>>>> Hi Vladimir,
>>>>>> Hi Jeff,
>>>>>> 
>>>>>> Richard and Alexander have reviewed this patch and [I assume] have no
>>>>>> further comments.  OK to merge?
>>>>> I think the question is whether or not we're too late.  I know that
>>>>> Richard S has held off on his late-combine pass and I'm holding off on
>>>>> the ext-dce work due to the fact that we're well past stage1 close.
>>>>> 
>>>>> I think the release managers ought to have the final say on this.
>>>> 
>>>> I'm fine with this now, it doesn't change code generation.
>>> 
>>> Thanks, Richard.
>>> 
>>> I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit 
>>> cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when 
>>> stage1 opens.
>> 
>> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
>> 
>> gcc/fortran/f95-lang.o differs
>> 
>> does n_mem_deps or n_inc_deps include debug insns?
> 
> Thanks, investigating.

Hi Richard,

Yes, both n_mem_deps or n_inc_deps include debug insns, I posted a patch for 
this in https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643267.html .  
Testing it now.

If you prefer, I can revert the fix for PR96388 and PR111554.

Kind regards,

--
Maxim Kuvyrkov
https://www.linaro.org

[PATCH] Make gcc.target/arm/bics_3.c testcase a bit more generic [PR113542]

2024-01-25 Thread Maxim Kuvyrkov

After fwprop improvement in r14-8319-g86de9b66480, codegen in
bics_3.c test changed from "bics" to "bic" instruction, with
the overall instruction stream remaining at the same quality.

This patch makes the scan-assembler directive accept both
"bics" and "bic".

BEFORE r14-8319-g86de9b66480:
bicsr0, r0, r1 @ 9  [c=4 l=4]  *andsi_notsi_si_compare0_scratch
mov r0, #1  @ 23[c=4 l=4]  *thumb2_movsi_vfp/1
it  eq
moveq   r0, #0  @ 26[c=8 l=4]  *p *thumb2_movsi_vfp/2
bx  lr  @ 29[c=8 l=4]  *thumb2_return

AFTER r14-8319-g86de9b66480:
bic r0, r0, r1  @ 8 [c=4 l=4]  andsi_notsi_si
subsr0, r0, #0  @ 22[c=4 l=4]  cmpsi2_addneg/0
it  ne
movne   r0, #1  @ 23[c=8 l=4]  *p *thumb2_movsi_vfp/2
bx  lr  @ 26[c=8 l=4]  *thumb2_return

gcc/testsuite/ChangeLog:

PR target/113542
* gcc.target/arm/bics_3.c: Update scan-assembler directive.
---
 gcc/testsuite/gcc.target/arm/bics_3.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/bics_3.c 
b/gcc/testsuite/gcc.target/arm/bics_3.c
index e056b264e15..c5bed3c92d2 100644
--- a/gcc/testsuite/gcc.target/arm/bics_3.c
+++ b/gcc/testsuite/gcc.target/arm/bics_3.c
@@ -35,6 +35,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 2 
} } */
-/* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, 
.sl #2" 1 } } */
+/* { dg-final { scan-assembler-times "bics?\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 
2 } } */
+/* { dg-final { scan-assembler-times "bics?\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, 
.sl #2" 1 } } */
 
-- 
2.34.1

Re: [PATCH] tree-optimization/113576 - non-empty latch and may_be_zero vectorization

2024-01-25 Thread Maxim Kuvyrkov

> On Jan 24, 2024, at 18:40, Richard Biener  wrote:
> 
> We can't support niters with may_be_zero when we end up with a
> non-empty latch due to early exit peeling.  At least not in
> the simplistic way the vectorizer handles this now.  Disallow
> it again for exits that are not the last one.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> PR tree-optimization/113576
> * tree-vect-loop.cc (vec_init_loop_exit_info): Only allow
> exits with may_be_zero niters when its the last one.
> 
> * gcc.dg/vect/pr113576.c: New testcase.
> ---
> gcc/testsuite/gcc.dg/vect/pr113576.c | 157 +++
> gcc/tree-vect-loop.cc|   9 +-
> 2 files changed, 164 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/vect/pr113576.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/pr113576.c 
> b/gcc/testsuite/gcc.dg/vect/pr113576.c
> new file mode 100644
> index 000..da5ddb09e33
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr113576.c
> @@ -0,0 +1,157 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3" } */
> +/* { dg-additional-options "-march=skylake-avx512" } */

Please adjust the testcase; this fails on non-x86_64 targets, see [1] and [2].

[1] 
https://patchwork.sourceware.org/project/gcc/patch/20240124144159.51c503858...@sourceware.org/
[2] 
https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-precommit/4765/artifact/artifacts/artifacts.precommit/00-sumfiles/gcc.log.1.xz

Thanks!

--
Maxim Kuvyrkov
https://www.linaro.org

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-09 Thread Maxim Kuvyrkov

Hi Kewen,

Below are my comments.  I don't want to override Alexander's review, and if the 
patch looks good to him, it's fine to ignore my concerns.

My main concern is that this adds a new entity -- forceful skipping of 
DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change in 
behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is already 
quite a bit of logic in the scheduler to skip them _as part of normal 
operation_.

Would you please consider 2 ideas below.

#1:
After a brief look, I'm guessing this part is causing the problem:
haifa-sched.cc <http://haifa-sched.cc/>:schedule_block():
=== [1]
  /* Loop until all the insns in BB are scheduled.  */
  while ((*current_sched_info->schedule_more_p) ())
{
  perform_replacements_new_cycle ();
  do
{
  start_clock_var = clock_var;

  clock_var++;

  advance_one_cycle ();
===

and then in the nested loop we have
=== [2]
  /* We don't want md sched reorder to even see debug isns, so put
 them out right away.  */
  if (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0))
  && (*current_sched_info->schedule_more_p) ())
{
  while (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0)))
{
  rtx_insn *insn = ready_remove_first (&ready);
  gcc_assert (DEBUG_INSN_P (insn));
  (*current_sched_info->begin_schedule_ready) (insn);
  scheduled_insns.safe_push (insn);
  last_scheduled_insn = insn;
  advance = schedule_insn (insn);
  gcc_assert (advance == 0);
  if (ready.n_ready > 0)
ready_sort (&ready);
}
}
===
.  At the [1] point we already have sorted ready list, and I don't see any 
blockers to doing [2] before calling advance_one_cycle().

#2
Another approach, which might be even easier, is to save the state of DFA 
before the initial advance_one_cycle(), and then restore it if no real insns 
have been scheduled.

Kind regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On Nov 8, 2023, at 06:49, Kewen.Lin  wrote:
> 
> Hi,
> 
> Gentle ping this:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html
> 
> BR,
> Kewen
> 
> on 2023/10/25 10:45, Kewen.Lin wrote:
>> Hi,
>> 
>> This is almost a repost for v2 which was posted at[1] in March
>> excepting for:
>>  1) rebased from r14-4810 which is relatively up-to-date,
>> some conflicts on "int to bool" return type change have
>> been resolved;
>>  2) adjust commit log a bit;
>>  3) fix misspelled "articial" with "artificial" somewhere;
>> 
>> --
>> *v2 comments*:
>> 
>> By addressing Alexander's comments, against v1 this
>> patch v2 mainly:
>> 
>>  - Rename no_real_insns_p to no_real_nondebug_insns_p;
>>  - Introduce enum rgn_bb_deps_free_action for three
>>kinds of actions to free deps;
>>  - Change function free_deps_for_bb_no_real_insns_p to
>>resolve_forw_deps which only focuses on forward deps;
>>  - Extend the handlings to cover dbg-cnt sched_block,
>>add one test case for it;
>>  - Move free_trg_info call in schedule_region to an
>>appropriate place.
>> 
>> One thing I'm not sure about is the change in function
>> sched_rgn_local_finish, currently the invocation to
>> sched_rgn_local_free is guarded with !sel_sched_p (),
>> so I just follow it, but the initialization of those
>> structures (in sched_rgn_local_init) isn't guarded
>> with !sel_sched_p (), it looks odd.
>> 
>> --
>> 
>> As PR108273 shows, when there is one block which only has
>> NOTE_P and LABEL_P insns at non-debug mode while has some
>> extra DEBUG_INSN_P insns at debug mode, after scheduling
>> it, the DFA states would be different between debug mode
>> and non-debug mode.  Since at non-debug mode, the block
>> meets no_real_insns_p, it gets skipped; while at debug
>> mode, it gets scheduled, even it only has NOTE_P, LABEL_P
>> and DEBUG_INSN_P, the call of function advance_one_cycle
>> will change the DFA state.  PR108519 also shows this issue
>> can be exposed by some scheduler changes.
>> 
>> This patch is to change function no_real_insns_p to
>> function no_real_nondebug_insns_p by taking debug insn into
>> account, which make us not try to schedule for the block
>> having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
>> resulting in consistent DFA states between non-debug and
>> debug mod

[PATCH 0/1] Avoid exponential behavior in scheduler and better logging

2023-11-20 Thread Maxim Kuvyrkov

Hi,

This patch series fixes exponential behavior in scheduler's 
find_modifiable_mems().  This fixes PRs [1] and [2], which are compilation time 
and memory hogs.

The first patch in the series is the actual fix (bootstrapped and regtested on 
aarch64-linux-gnu), and follow up patches will be improvements to scheduler's 
logging infrastructure, that enabled me to debug this problem.  As-is, the 
scheduler has good logging of the actual scheduling process, but calculation of 
instruction dependencies has almost no logging.

Please don't delay review of the PRs fix to wait for logging patches to be 
published, as it'll take me another week to clean them up.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111554

Maxim Kuvyrkov (1):
  sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

 gcc/sched-deps.cc | 45 +
 1 file changed, 41 insertions(+), 4 deletions(-)

-- 
2.34.1

[PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2023-11-20 Thread Maxim Kuvyrkov

This patch avoids sched-deps.cc:find_inc() creating exponential number
of dependencies, which become memory and compilation time hogs.
Consider example (simplified from PR96388) ...
===
sp=sp-4 // sp_insnA
mem_insnA1[sp+A1]
...
mem_insnAN[sp+AN]
sp=sp-4 // sp_insnB
mem_insnB1[sp+B1]
...
mem_insnBM[sp+BM]
===
... in this example find_modifiable_mems() will arrange for mem_insnA*
to be able to pass sp_insnA, and, while doing this, will create
dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
is a consumer of sp_insnA.  After this sp_insnB will have N new
backward dependencies.
Then find_modifiable_mems() gets to mem_insnB*s and starts to create
N new dependencies for _every_ mem_insnB*.  This gets us N*M new
dependencies.

In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
30GB and compilation time of 30 minutes, with sched2 accounting for
95% of both metrics.  After this patch the RAM usage is down to 1GB
and compilation time is down to 3-4 minutes, with sched2 no longer
standing out on -ftime-report or memory usage.

gcc/ChangeLog:

PR rtl-optimization/96388
PR rtl-optimization/111554
* sched-deps.cc (find_inc): Avoid exponential behavior.
---
 gcc/sched-deps.cc | 45 +
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index c23218890f3..397bb9fd462 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn 
*insn, bool before_mem)
 /* Once a suitable mem reference has been found and the corresponding data
in MII has been filled in, this function is called to find a suitable
add or inc insn involving the register we found in the memory
-   reference.  */
+   reference.
+   If successful, this function will create additional dependencies between
+   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
+   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
+*/
 
 static bool
 find_inc (struct mem_inc_info *mii, bool backwards)
 {
   sd_iterator_def sd_it;
   dep_t dep;
+  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
+  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
 
-  sd_it = sd_iterator_start (mii->mem_insn,
-backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
+  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
   while (sd_iterator_cond (&sd_it, &dep))
 {
   dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
   rtx_insn *pro = DEP_PRO (dep);
   rtx_insn *con = DEP_CON (dep);
-  rtx_insn *inc_cand = backwards ? pro : con;
+  rtx_insn *inc_cand;
+  int n_inc_deps;
+
+  if (backwards)
+   {
+ inc_cand = pro;
+ n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
+   }
+  else
+   {
+ inc_cand = con;
+ n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
+   }
+
+  /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
+for mem_insn.  This by itself is not a problem, since each mem_insn
+will have only a few inc_insns associated with it.  However, if
+we consider that a single inc_insn may have a lot of mem_insns, AND,
+on top of that, a few other inc_insns associated with it --
+those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
+dependencies created for them.  This may cause an exponential
+growth of memory usage and scheduling time.
+See PR96388 for details.
+We [heuristically] use n_inc_deps as a proxy for the number of MEM
+insns, and drop opportunities for breaking modifiable_mem dependencies
+when dependency lists grow beyond reasonable size.  */
+  if (n_mem_deps * n_inc_deps
+ >= param_max_pending_list_length * param_max_pending_list_length)
+   goto next;
+
   if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
goto next;
+
   if (parse_add_or_inc (mii, inc_cand, backwards))
{
  struct dep_replacement *desc;
@@ -4838,6 +4873,8 @@ find_inc (struct mem_inc_info *mii, bool backwards)
  desc->insn = mii->mem_insn;
  move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
 INSN_SPEC_BACK_DEPS (con));
+
+ gcc_assert (mii->inc_insn == inc_cand);
  if (backwards)
{
  FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
-- 
2.34.1

Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2023-11-20 Thread Maxim Kuvyrkov

> On Nov 20, 2023, at 17:09, Richard Biener  wrote:
> 
> On Mon, Nov 20, 2023 at 1:08 PM Maxim Kuvyrkov
>  wrote:
>> 
>> This patch avoids sched-deps.cc:find_inc() creating exponential number
>> of dependencies, which become memory and compilation time hogs.
>> Consider example (simplified from PR96388) ...
>> ===
>> sp=sp-4 // sp_insnA
>> mem_insnA1[sp+A1]
>> ...
>> mem_insnAN[sp+AN]
>> sp=sp-4 // sp_insnB
>> mem_insnB1[sp+B1]
>> ...
>> mem_insnBM[sp+BM]
>> ===
>> ... in this example find_modifiable_mems() will arrange for mem_insnA*
>> to be able to pass sp_insnA, and, while doing this, will create
>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
>> is a consumer of sp_insnA.  After this sp_insnB will have N new
>> backward dependencies.
>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
>> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
>> dependencies.
>> 
>> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
>> 30GB and compilation time of 30 minutes, with sched2 accounting for
>> 95% of both metrics.  After this patch the RAM usage is down to 1GB
>> and compilation time is down to 3-4 minutes, with sched2 no longer
>> standing out on -ftime-report or memory usage.
>> 
>> gcc/ChangeLog:
>> 
>>PR rtl-optimization/96388
>>PR rtl-optimization/111554
>>* sched-deps.cc (find_inc): Avoid exponential behavior.
>> ---
>> gcc/sched-deps.cc | 45 +
>> 1 file changed, 41 insertions(+), 4 deletions(-)
>> 
>> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
>> index c23218890f3..397bb9fd462 100644
>> --- a/gcc/sched-deps.cc
>> +++ b/gcc/sched-deps.cc
>> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn 
>> *insn, bool before_mem)
>> /* Once a suitable mem reference has been found and the corresponding data
>>in MII has been filled in, this function is called to find a suitable
>>add or inc insn involving the register we found in the memory
>> -   reference.  */
>> +   reference.
>> +   If successful, this function will create additional dependencies between
>> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if 
>> backwards)
>> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if 
>> !backwards).
>> +*/
>> 
>> static bool
>> find_inc (struct mem_inc_info *mii, bool backwards)
>> {
>>   sd_iterator_def sd_it;
>>   dep_t dep;
>> +  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
>> +  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
>> 
>> -  sd_it = sd_iterator_start (mii->mem_insn,
>> -backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
>> +  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
>>   while (sd_iterator_cond (&sd_it, &dep))
>> {
>>   dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
>>   rtx_insn *pro = DEP_PRO (dep);
>>   rtx_insn *con = DEP_CON (dep);
>> -  rtx_insn *inc_cand = backwards ? pro : con;
>> +  rtx_insn *inc_cand;
>> +  int n_inc_deps;
>> +
>> +  if (backwards)
>> +   {
>> + inc_cand = pro;
>> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
>> +   }
>> +  else
>> +   {
>> + inc_cand = con;
>> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
>> +   }
>> +
>> +  /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
>> +for mem_insn.  This by itself is not a problem, since each mem_insn
>> +will have only a few inc_insns associated with it.  However, if
>> +we consider that a single inc_insn may have a lot of mem_insns, AND,
>> +on top of that, a few other inc_insns associated with it --
>> +those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
>> +dependencies created for them.  This may cause an exponential
>> +growth of memory usage and scheduling time.
>> +See PR96388 for details.
>> +We [heuristically] use n_inc_deps as a proxy for the number of MEM
>> +insns, and drop opportunities for breaking modifiable_mem 
>> dependencies
>> +when dependency lists grow beyond reasonable size.  */
>> +  if (n_mem_deps * n_inc_deps
>> +     >= pa

Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2023-11-20 Thread Maxim Kuvyrkov

> On Nov 20, 2023, at 17:52, Alexander Monakov  wrote:
> 
> 
> On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote:
> 
>> This patch avoids sched-deps.cc:find_inc() creating exponential number
>> of dependencies, which become memory and compilation time hogs.
>> Consider example (simplified from PR96388) ...
>> ===
>> sp=sp-4 // sp_insnA
>> mem_insnA1[sp+A1]
>> ...
>> mem_insnAN[sp+AN]
>> sp=sp-4 // sp_insnB
>> mem_insnB1[sp+B1]
>> ...
>> mem_insnBM[sp+BM]
>> ===
>> ... in this example find_modifiable_mems() will arrange for mem_insnA*
>> to be able to pass sp_insnA, and, while doing this, will create
>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
>> is a consumer of sp_insnA.  After this sp_insnB will have N new
>> backward dependencies.
>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
>> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
>> dependencies.

[For avoidance of doubt, below discussion is about the general implementation 
of find_modifiable_mems() and not about the patch.]

> 
> It's a bit hard to read this without knowing which value of 'backwards'
> is assumed.
> 
> Say 'backwards' is true and we are inspecting producer sp_insnB of mem_insnB1.
> This is a true dependency. We know we can break it by adjusting B1 by -4, but
> we need to be careful not to move such modified mem_insnB1 above sp_insnA, so
> we need to iterate over *incoming true dependencies* of sp_insnB and add them.
> 
> But the code seems to be iterating over *all incoming dependencies*, so it
> will e.g. take anti-dependency mem_insnA1 -> sp_insnB and create a true
> dependency mem_insnA1 -> mem_insnB1'. This seems utterly inefficient, if my
> understanding is correct.

Yeap, your understanding is correct.  However, this is what 
find_modifiable_mems() has to do to avoid complicated analysis of second-level 
dependencies.

I think, the optimization that find_modifiable_mems() does can be implemented 
as part of main dependency analysis, and I'm going to discuss that with Bernd.  
I might be missing something, but it seems that instruction transformations 
that find_modifiable_mems() is doing are simpler than what ia64 speculation is 
doing.  So, I think, we should be able to leverage speculation infrastructure, 
rather than doing this optimization "on the side" of normal scheduling.

--
Maxim Kuvyrkov
https://www.linaro.org

[PATCH v2] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2023-11-20 Thread Maxim Kuvyrkov

This patch avoids sched-deps.cc:find_inc() creating exponential number
of dependencies, which become memory and compilation time hogs.
Consider example (simplified from PR96388) ...
===
sp=sp-4 // sp_insnA
mem_insnA1[sp+A1]
...
mem_insnAN[sp+AN]
sp=sp-4 // sp_insnB
mem_insnB1[sp+B1]
...
mem_insnBM[sp+BM]
===
... in this example find_modifiable_mems() will arrange for mem_insnA*
to be able to pass sp_insnA, and, while doing this, will create
dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
is a consumer of sp_insnA.  After this sp_insnB will have N new
backward dependencies.
Then find_modifiable_mems() gets to mem_insnB*s and starts to create
N new dependencies for _every_ mem_insnB*.  This gets us N*M new
dependencies.

In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
30GB and compilation time of 30 minutes, with sched2 accounting for
95% of both metrics.  After this patch the RAM usage is down to 1GB
and compilation time is down to 3-4 minutes, with sched2 no longer
standing out on -ftime-report or memory usage.

gcc/ChangeLog:

PR rtl-optimization/96388
PR rtl-optimization/111554
* sched-deps.cc (find_inc): Avoid exponential behavior.
---
 gcc/sched-deps.cc | 45 +
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index c23218890f3..a286ff319e2 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn 
*insn, bool before_mem)
 /* Once a suitable mem reference has been found and the corresponding data
in MII has been filled in, this function is called to find a suitable
add or inc insn involving the register we found in the memory
-   reference.  */
+   reference.
+   If successful, this function will create additional dependencies between
+   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
+   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
+*/
 
 static bool
 find_inc (struct mem_inc_info *mii, bool backwards)
 {
   sd_iterator_def sd_it;
   dep_t dep;
+  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
+  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
 
-  sd_it = sd_iterator_start (mii->mem_insn,
-backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
+  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
   while (sd_iterator_cond (&sd_it, &dep))
 {
   dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
   rtx_insn *pro = DEP_PRO (dep);
   rtx_insn *con = DEP_CON (dep);
-  rtx_insn *inc_cand = backwards ? pro : con;
+  rtx_insn *inc_cand;
+  int n_inc_deps;
+
   if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
goto next;
+
+  if (backwards)
+   {
+ inc_cand = pro;
+ n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
+   }
+  else
+   {
+ inc_cand = con;
+ n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
+   }
+
+  /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
+for mem_insn.  This by itself is not a problem, since each mem_insn
+will have only a few inc_insns associated with it.  However, if
+we consider that a single inc_insn may have a lot of mem_insns, AND,
+on top of that, a few other inc_insns associated with it --
+those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
+dependencies created for them.  This may cause an exponential
+growth of memory usage and scheduling time.
+See PR96388 for details.
+We [heuristically] use n_inc_deps as a proxy for the number of MEM
+insns, and drop opportunities for breaking modifiable_mem dependencies
+when dependency lists grow beyond reasonable size.  */
+  if (n_mem_deps * n_inc_deps
+ >= param_max_pending_list_length * param_max_pending_list_length)
+   goto next;
+
   if (parse_add_or_inc (mii, inc_cand, backwards))
{
  struct dep_replacement *desc;
@@ -4838,6 +4873,8 @@ find_inc (struct mem_inc_info *mii, bool backwards)
  desc->insn = mii->mem_insn;
  move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
 INSN_SPEC_BACK_DEPS (con));
+
+ gcc_assert (mii->inc_insn == inc_cand);
  if (backwards)
{
  FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
-- 
2.34.1

Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2023-11-20 Thread Maxim Kuvyrkov

> On Nov 20, 2023, at 17:45, Richard Biener  wrote:
> 
> On Mon, Nov 20, 2023 at 2:42 PM Maxim Kuvyrkov
>  wrote:
>> 
>>> On Nov 20, 2023, at 17:09, Richard Biener  
>>> wrote:
>>> 
>>> On Mon, Nov 20, 2023 at 1:08 PM Maxim Kuvyrkov
>>>  wrote:
>>>> 
>>>> This patch avoids sched-deps.cc:find_inc() creating exponential number
>>>> of dependencies, which become memory and compilation time hogs.
>>>> Consider example (simplified from PR96388) ...
>>>> ===
>>>> sp=sp-4 // sp_insnA
>>>> mem_insnA1[sp+A1]
>>>> ...
>>>> mem_insnAN[sp+AN]
>>>> sp=sp-4 // sp_insnB
>>>> mem_insnB1[sp+B1]
>>>> ...
>>>> mem_insnBM[sp+BM]
>>>> ===
>>>> ... in this example find_modifiable_mems() will arrange for mem_insnA*
>>>> to be able to pass sp_insnA, and, while doing this, will create
>>>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
>>>> is a consumer of sp_insnA.  After this sp_insnB will have N new
>>>> backward dependencies.
>>>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
>>>> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
>>>> dependencies.
>>>> 
>>>> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
>>>> 30GB and compilation time of 30 minutes, with sched2 accounting for
>>>> 95% of both metrics.  After this patch the RAM usage is down to 1GB
>>>> and compilation time is down to 3-4 minutes, with sched2 no longer
>>>> standing out on -ftime-report or memory usage.
>>>> 
>>>> gcc/ChangeLog:
>>>> 
>>>>   PR rtl-optimization/96388
>>>>   PR rtl-optimization/111554
>>>>   * sched-deps.cc (find_inc): Avoid exponential behavior.
>>>> ---
>>>> gcc/sched-deps.cc | 45 +
>>>> 1 file changed, 41 insertions(+), 4 deletions(-)
>>>> 
>>>> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
>>>> index c23218890f3..397bb9fd462 100644
>>>> --- a/gcc/sched-deps.cc
>>>> +++ b/gcc/sched-deps.cc
>>>> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, 
>>>> rtx_insn *insn, bool before_mem)
>>>> /* Once a suitable mem reference has been found and the corresponding data
>>>>   in MII has been filled in, this function is called to find a suitable
>>>>   add or inc insn involving the register we found in the memory
>>>> -   reference.  */
>>>> +   reference.
>>>> +   If successful, this function will create additional dependencies 
>>>> between
>>>> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if 
>>>> backwards)
>>>> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if 
>>>> !backwards).
>>>> +*/
>>>> 
>>>> static bool
>>>> find_inc (struct mem_inc_info *mii, bool backwards)
>>>> {
>>>>  sd_iterator_def sd_it;
>>>>  dep_t dep;
>>>> +  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : 
>>>> SD_LIST_FORW;
>>>> +  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
>>>> 
>>>> -  sd_it = sd_iterator_start (mii->mem_insn,
>>>> -backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
>>>> +  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
>>>>  while (sd_iterator_cond (&sd_it, &dep))
>>>>{
>>>>  dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
>>>>  rtx_insn *pro = DEP_PRO (dep);
>>>>  rtx_insn *con = DEP_CON (dep);
>>>> -  rtx_insn *inc_cand = backwards ? pro : con;
>>>> +  rtx_insn *inc_cand;
>>>> +  int n_inc_deps;
>>>> +
>>>> +  if (backwards)
>>>> +   {
>>>> + inc_cand = pro;
>>>> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
>>>> +   }
>>>> +  else
>>>> +   {
>>>> + inc_cand = con;
>>>> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
>>>> +   }
>>>> +
>>>> +  /* In the FOR_EACH_DEP loop below we will create additional 
>>>> n_inc_deps
&

Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2023-11-21 Thread Maxim Kuvyrkov

> On Nov 20, 2023, at 20:30, Alexander Monakov  wrote:
> 
> 
> On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote:
> 
>>> On Nov 20, 2023, at 17:52, Alexander Monakov  wrote:
>>> 
>>> 
>>> On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote:
>>> 
>>>> This patch avoids sched-deps.cc:find_inc() creating exponential number
>>>> of dependencies, which become memory and compilation time hogs.
>>>> Consider example (simplified from PR96388) ...
>>>> ===
>>>> sp=sp-4 // sp_insnA
>>>> mem_insnA1[sp+A1]
>>>> ...
>>>> mem_insnAN[sp+AN]
>>>> sp=sp-4 // sp_insnB
>>>> mem_insnB1[sp+B1]
>>>> ...
>>>> mem_insnBM[sp+BM]
>>>> ===
>>>> ... in this example find_modifiable_mems() will arrange for mem_insnA*
>>>> to be able to pass sp_insnA, and, while doing this, will create
>>>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
>>>> is a consumer of sp_insnA.  After this sp_insnB will have N new
>>>> backward dependencies.
>>>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
>>>> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
>>>> dependencies.
>> 
>> [For avoidance of doubt, below discussion is about the general implementation
>> of find_modifiable_mems() and not about the patch.]
> 
> I was saying the commit message is hard to read (unless it's just me).
> 
>>> It's a bit hard to read this without knowing which value of 'backwards'
>>> is assumed.

Oh, sorry, I misunderstood your comment.

In the above example I want to describe outcome that current code generates, 
without going into details about exactly how it does it.  I'm not sure how to 
make it more readable, and would appreciate suggestions.


>>> 
>>> Say 'backwards' is true and we are inspecting producer sp_insnB of 
>>> mem_insnB1.
>>> This is a true dependency. We know we can break it by adjusting B1 by -4, 
>>> but
>>> we need to be careful not to move such modified mem_insnB1 above sp_insnA, 
>>> so
>>> we need to iterate over *incoming true dependencies* of sp_insnB and add 
>>> them.
>>> 
>>> But the code seems to be iterating over *all incoming dependencies*, so it
>>> will e.g. take anti-dependency mem_insnA1 -> sp_insnB and create a true
>>> dependency mem_insnA1 -> mem_insnB1'. This seems utterly inefficient, if my
>>> understanding is correct.
>> 
>> Yeap, your understanding is correct.  However, this is what
>> find_modifiable_mems() has to do to avoid complicated analysis of 
>> second-level
>> dependencies.
> 
> What is the reason it cannot simply skip anti-dependencies in the
> 'if (backwards)' loop, and true dependencies in the 'else' loop?

I /think/, this should be possible.  However, rather than improving current 
implementation my preference is to rework it by integrating with the main 
dependency analysis.  This should provide both faster and more precise 
dependency analysis, which would generate breakable addr/mem dependencies.

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org

[PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2023-11-22 Thread Maxim Kuvyrkov

This patch avoids sched-deps.cc:find_inc() creating exponential number
of dependencies, which become memory and compilation time hogs.
Consider example (simplified from PR96388) ...
===
sp=sp-4 // sp_insnA
mem_insnA1[sp+A1]
...
mem_insnAN[sp+AN]
sp=sp-4 // sp_insnB
mem_insnB1[sp+B1]
...
mem_insnBM[sp+BM]
===

[For simplicity, let's assume find_inc(backwards==true)].
In this example find_modifiable_mems() will arrange for mem_insnA*
to be able to pass sp_insnA, and, while doing this, will create
dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
is a consumer of sp_insnA.  After this sp_insnB will have N new
backward dependencies.
Then find_modifiable_mems() gets to mem_insnB*s and starts to create
N new dependencies for _every_ mem_insnB*.  This gets us N*M new
dependencies.

In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
30GB and compilation time of 30 minutes, with sched2 accounting for
95% of both metrics.  After this patch the RAM usage is down to 1GB
and compilation time is down to 3-4 minutes, with sched2 no longer
standing out on -ftime-report or memory usage.

gcc/ChangeLog:

PR rtl-optimization/96388
PR rtl-optimization/111554
* sched-deps.cc (find_inc): Avoid exponential behavior.
---
 gcc/sched-deps.cc | 48 +++
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index c23218890f3..005fc0f567e 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn 
*insn, bool before_mem)
 /* Once a suitable mem reference has been found and the corresponding data
in MII has been filled in, this function is called to find a suitable
add or inc insn involving the register we found in the memory
-   reference.  */
+   reference.
+   If successful, this function will create additional dependencies between
+   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
+   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
+*/
 
 static bool
 find_inc (struct mem_inc_info *mii, bool backwards)
 {
   sd_iterator_def sd_it;
   dep_t dep;
+  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
+  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
 
-  sd_it = sd_iterator_start (mii->mem_insn,
-backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
+  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
   while (sd_iterator_cond (&sd_it, &dep))
 {
   dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
   rtx_insn *pro = DEP_PRO (dep);
   rtx_insn *con = DEP_CON (dep);
-  rtx_insn *inc_cand = backwards ? pro : con;
+  rtx_insn *inc_cand;
+  int n_inc_deps;
+
   if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
goto next;
+
+  if (backwards)
+   {
+ inc_cand = pro;
+ n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
+   }
+  else
+   {
+ inc_cand = con;
+ n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
+   }
+
+  /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
+for mem_insn.  This by itself is not a problem, since each mem_insn
+will have only a few inc_insns associated with it.  However, if
+we consider that a single inc_insn may have a lot of mem_insns, AND,
+on top of that, a few other inc_insns associated with it --
+those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
+dependencies created for them.  This may cause an exponential
+growth of memory usage and scheduling time.
+See PR96388 for details.
+We [heuristically] use n_inc_deps as a proxy for the number of MEM
+insns, and drop opportunities for breaking modifiable_mem dependencies
+when dependency lists grow beyond reasonable size.  */
+  if (n_mem_deps * n_inc_deps
+ >= param_max_pending_list_length * param_max_pending_list_length)
+   goto next;
+
   if (parse_add_or_inc (mii, inc_cand, backwards))
{
  struct dep_replacement *desc;
@@ -4838,6 +4873,11 @@ find_inc (struct mem_inc_info *mii, bool backwards)
  desc->insn = mii->mem_insn;
  move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
 INSN_SPEC_BACK_DEPS (con));
+
+ /* Make sure that n_inc_deps above is consistent with dependencies
+we create.  */
+ gcc_assert (mii->inc_insn == inc_cand);
+
  if (backwards)
{
  FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
-- 
2.34.1

[PATCH v3 0/8] Avoid exponential behavior in scheduler and better logging

2023-11-22 Thread Maxim Kuvyrkov

Compared to v1 and v2 this is a complete patch series.

The debugging/dumping improvements gently touch IRA, RTL lists, and sel-sched 
bits to avoid re-inventing or copy-pasting the wheel.

Bootstrapping and regtesting these patches on aarch64-linux-gnu.  OK to merge?

Thanks!

===

This patch series fixes exponential behavior in scheduler's 
find_modifiable_mems(). This fixes PRs [1] and [2], which are compilation time 
and memory hogs.

The first patch in the series is the actual fix (bootstrapped and regtested on 
aarch64-linux-gnu), and follow up patches are improvements to scheduler's 
logging infrastructure, that enabled me to debug this problem.  As-is, the 
scheduler has good logging of the actual scheduling process, but calculation of 
instruction dependencies has almost no logging.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111554

Maxim Kuvyrkov (8):
  sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  Unify implementations of print_hard_reg_set()
  Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc
  Improve and fix sched-deps.cc: dump_dep() and dump_lists().
  Add a bit more logging scheduler's dependency analysis
  sched_deps.cc: Simplify initialization of dependency contexts
  Improve logging of register data in scheduler dependency analysis
  Improve logging of scheduler dependency analysis context

 gcc/hard-reg-set.h|   3 +
 gcc/ira-color.cc  |  17 +-
 gcc/ira-conflicts.cc  |  39 +
 gcc/lists.cc  |  30 +++-
 gcc/rtl.h |   4 +-
 gcc/sched-deps.cc | 399 +++---
 gcc/sched-int.h   |   9 +-
 gcc/sched-rgn.cc  |  56 +++---
 gcc/sel-sched-dump.cc |  21 +--
 gcc/sel-sched-dump.h  |   2 +-
 10 files changed, 467 insertions(+), 113 deletions(-)

-- 
2.34.1

[PATCH v3 5/8] Add a bit more logging scheduler's dependency analysis

2023-11-22 Thread Maxim Kuvyrkov

gcc/ChangeLog:

* sched-deps.cc (sd_add_dep, find_inc): Add logging about
dependency creation.
---
 gcc/sched-deps.cc | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index 4d357079a7a..2a87158ba4b 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -1342,6 +1342,13 @@ sd_add_dep (dep_t dep, bool resolved_p)
  in the bitmap caches of dependency information.  */
   if (true_dependency_cache != NULL)
 set_dependency_caches (dep);
+
+  if (sched_verbose >= 9)
+{
+  fprintf (sched_dump, "created dependency ");
+  dump_dep (sched_dump, dep, 1);
+  fprintf (sched_dump, "\n");
+}
 }
 
 /* Add or update backward dependence between INSN and ELEM
@@ -4879,18 +4886,33 @@ find_inc (struct mem_inc_info *mii, bool backwards)
 we create.  */
  gcc_assert (mii->inc_insn == inc_cand);
 
+ int n_deps_created = 0;
  if (backwards)
{
  FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
-   add_dependence_1 (mii->mem_insn, DEP_PRO (dep),
- REG_DEP_TRUE);
+   {
+ add_dependence_1 (mii->mem_insn, DEP_PRO (dep),
+   REG_DEP_TRUE);
+ ++n_deps_created;
+   }
}
  else
{
  FOR_EACH_DEP (mii->inc_insn, SD_LIST_FORW, sd_it, dep)
-   add_dependence_1 (DEP_CON (dep), mii->mem_insn,
- REG_DEP_ANTI);
+   {
+ add_dependence_1 (DEP_CON (dep), mii->mem_insn,
+   REG_DEP_ANTI);
+ ++n_deps_created;
+   }
}
+ if (sched_verbose >= 6)
+   fprintf (sched_dump,
+"created %d deps for mem_insn %d due to "
+"inc_insn %d %s deps\n",
+n_deps_created, INSN_UID (mii->mem_insn),
+INSN_UID (mii->inc_insn),
+backwards ? "backward" : "forward");
+
  return true;
}
 next:
-- 
2.34.1

[PATCH v3 6/8] sched_deps.cc: Simplify initialization of dependency contexts

2023-11-22 Thread Maxim Kuvyrkov

gcc/ChangeLog:

* sched-deps.cc (init_deps, init_deps_reg_last): Simplify.
(free_deps): Remove useless code.
---
 gcc/sched-deps.cc | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index 2a87158ba4b..e0d3c97d935 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -3927,10 +3927,9 @@ init_deps (class deps_desc *deps, bool lazy_reg_last)
   int max_reg = (reload_completed ? FIRST_PSEUDO_REGISTER : max_reg_num ());
 
   deps->max_reg = max_reg;
-  if (lazy_reg_last)
-deps->reg_last = NULL;
-  else
-deps->reg_last = XCNEWVEC (struct deps_reg, max_reg);
+  deps->reg_last = NULL;
+  if (!lazy_reg_last)
+init_deps_reg_last (deps);
   INIT_REG_SET (&deps->reg_last_in_use);
 
   deps->pending_read_insns = 0;
@@ -3961,9 +3960,7 @@ init_deps (class deps_desc *deps, bool lazy_reg_last)
 void
 init_deps_reg_last (class deps_desc *deps)
 {
-  gcc_assert (deps && deps->max_reg > 0);
-  gcc_assert (deps->reg_last == NULL);
-
+  gcc_assert (deps && deps->max_reg > 0 && deps->reg_last == NULL);
   deps->reg_last = XCNEWVEC (struct deps_reg, deps->max_reg);
 }
 
@@ -4013,8 +4010,6 @@ free_deps (class deps_desc *deps)
  it at all.  */
   free (deps->reg_last);
   deps->reg_last = NULL;
-
-  deps = NULL;
 }
 
 /* Remove INSN from dependence contexts DEPS.  */
-- 
2.34.1

[PATCH v3 2/8] Unify implementations of print_hard_reg_set()

2023-11-22 Thread Maxim Kuvyrkov

We currently have 3 implementations of print_hard_reg_set()
(all with the same name!) in ira-color.cc, ira-conflicts.cc, and
sel-sched-dump.cc.  This patch generalizes implementation in
ira-color.cc, and uses it in all other places.  The declaration
is added to hard-reg-set.h.

The motivation for this patch is the [upcoming] need for
print_hard_reg_set() in sched-deps.cc.

gcc/ChangeLog:

* hard-reg-set.h (print_hard_reg_set): Declare.
* ira-color.cc (print_hard_reg_set): Generalize a bit.
(debug_hard_reg_set, print_hard_regs_subforest,)
(setup_allocno_available_regs_num): Update.
* ira-conflicts.cc (print_hard_reg_set): Remove.
(print_allocno_conflicts): Use global print_hard_reg_set().
* sel-sched-dump.cc (print_hard_reg_set): Remove.
(dump_hard_reg_set): Use global print_hard_reg_set().
* sel-sched-dump.h (dump_hard_reg_set): Mark as DEBUG_FUNCTION.
---
 gcc/hard-reg-set.h|  3 +++
 gcc/ira-color.cc  | 17 ++---
 gcc/ira-conflicts.cc  | 39 ---
 gcc/sel-sched-dump.cc | 21 -
 gcc/sel-sched-dump.h  |  2 +-
 5 files changed, 22 insertions(+), 60 deletions(-)

diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index b0bb9bce074..81bca6df0e5 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -524,4 +524,7 @@ call_used_or_fixed_reg_p (unsigned int regno)
 }
 #endif
 
+/* ira-color.cc */
+extern void print_hard_reg_set (FILE *, HARD_REG_SET, const char *, bool);
+
 #endif /* ! GCC_HARD_REG_SET_H */
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index f2e8ea34152..43564b44933 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -482,11 +482,14 @@ first_common_ancestor_node (allocno_hard_regs_node_t 
first,
 }
 
 /* Print hard reg set SET to F.  */
-static void
-print_hard_reg_set (FILE *f, HARD_REG_SET set, bool new_line_p)
+void
+print_hard_reg_set (FILE *f, HARD_REG_SET set,
+   const char *title, bool new_line_p)
 {
   int i, start, end;
 
+  if (title)
+fprintf (f, "%s", title);
   for (start = end = -1, i = 0; i < FIRST_PSEUDO_REGISTER; i++)
 {
   bool reg_included = TEST_HARD_REG_BIT (set, i);
@@ -516,7 +519,7 @@ print_hard_reg_set (FILE *f, HARD_REG_SET set, bool 
new_line_p)
 DEBUG_FUNCTION void
 debug_hard_reg_set (HARD_REG_SET set)
 {
-  print_hard_reg_set (stderr, set, true);
+  print_hard_reg_set (stderr, set, NULL, true);
 }
 
 /* Print allocno hard register subforest given by ROOTS and its LEVEL
@@ -534,7 +537,7 @@ print_hard_regs_subforest (FILE *f, 
allocno_hard_regs_node_t roots,
   for (i = 0; i < level * 2; i++)
fprintf (f, " ");
   fprintf (f, "%d:(", node->preorder_num);
-  print_hard_reg_set (f, node->hard_regs->set, false);
+  print_hard_reg_set (f, node->hard_regs->set, NULL, false);
   fprintf (f, ")@%" PRId64"\n", node->hard_regs->cost);
   print_hard_regs_subforest (f, node->first, level + 1);
 }
@@ -2982,12 +2985,12 @@ setup_allocno_available_regs_num (ira_allocno_t a)
  "  Allocno a%dr%d of %s(%d) has %d avail. regs ",
  ALLOCNO_NUM (a), ALLOCNO_REGNO (a),
  reg_class_names[aclass], ira_class_hard_regs_num[aclass], n);
-  print_hard_reg_set (ira_dump_file, data->profitable_hard_regs, false);
+  print_hard_reg_set (ira_dump_file, data->profitable_hard_regs, NULL, false);
   fprintf (ira_dump_file, ", %snode: ",
   data->profitable_hard_regs == data->hard_regs_node->hard_regs->set
   ? "" : "^");
   print_hard_reg_set (ira_dump_file,
- data->hard_regs_node->hard_regs->set, false);
+ data->hard_regs_node->hard_regs->set, NULL, false);
   for (i = 0; i < nwords; i++)
 {
   ira_object_t obj = ALLOCNO_OBJECT (a, i);
@@ -3000,7 +3003,7 @@ setup_allocno_available_regs_num (ira_allocno_t a)
}
   fprintf (ira_dump_file, " (confl regs = ");
   print_hard_reg_set (ira_dump_file, OBJECT_TOTAL_CONFLICT_HARD_REGS (obj),
- false);
+ NULL, false);
   fprintf (ira_dump_file, ")");
 }
   fprintf (ira_dump_file, "\n");
diff --git a/gcc/ira-conflicts.cc b/gcc/ira-conflicts.cc
index a4d93c8d734..69806b1a15b 100644
--- a/gcc/ira-conflicts.cc
+++ b/gcc/ira-conflicts.cc
@@ -670,37 +670,6 @@ build_conflicts (void)
 
 
 
-/* Print hard reg set SET with TITLE to FILE.  */
-static void
-print_hard_reg_set (FILE *file, const char *title, HARD_REG_SET set)
-{
-  int i, start, end;
-
-  fputs (title, file);
-  for (start = end = -1, i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-{
-  bool reg_included = TEST_HARD_REG_BIT (set, i);
-
-  if (reg_included)
-   {
- if (start == -1)
-   start = i;
- end = i;
-   }
-  if (start >= 0 && (!reg_included || i == FIRST_PSEUDO_REGISTER - 1))
-   {
- if (start == end)
-   fprintf (file, " %d", start);
- else if (start == end + 1)
-

[PATCH v3 8/8] Improve logging of scheduler dependency analysis context

2023-11-22 Thread Maxim Kuvyrkov

Scheduler dependency analysis uses two main data structures:
1. reg_pending_* data contains effects of INSN on the register file,
   which is then incorporated into ...
2. deps_desc object, which contains commulative information about all
   instructions processed from deps_desc object's initialization.

This patch adds debug dumping of (2).

Dependency analysis contexts (aka deps_desc objects) are huge, but
each instruction affects only a small amount of data in these objects.
Therefore, it is most useful to dump differential information
compared to the dependency state after previous instruction.

gcc/ChangeLog:

* sched-deps.cc (reset_deps, dump_rtx_insn_list)
(rtx_insn_list_same_p): New helper functions.
(dump_deps_desc_diff): New function to dump dependency information.
(sched_analysis_prev_deps): New static variable.
(sched_analyze_insn): Dump dependency information.
(init_deps_global, finish_deps_global): Handle sched_analysis_prev_deps.
* sched-int.h (struct deps_reg): Update comments.
* sched-rgn.cc (concat_insn_list, concat_expr_list): Update comments.
---
 gcc/sched-deps.cc | 197 ++
 gcc/sched-int.h   |   9 ++-
 gcc/sched-rgn.cc  |   5 ++
 3 files changed, 210 insertions(+), 1 deletion(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index f9290c82fd2..edca9927e23 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -1677,6 +1677,15 @@ delete_all_dependences (rtx_insn *insn)
 sd_delete_dep (sd_it);
 }
 
+/* Re-initialize existing dependency context DEPS to be a copy of FROM.  */
+static void
+reset_deps (class deps_desc *deps, class deps_desc *from)
+{
+  free_deps (deps);
+  init_deps (deps, false);
+  deps_join (deps, from);
+}
+
 /* All insns in a scheduling group except the first should only have
dependencies on the previous insn in the group.  So we find the
first instruction in the scheduling group by walking the dependence
@@ -2960,6 +2969,177 @@ dump_reg_pending_data (FILE *file, rtx_insn *insn)
 }
 }
 
+/* Dump rtx_insn_list LIST.
+   Consider moving to lists.cc if there are users outside of sched-deps.cc.  */
+static void
+dump_rtx_insn_list (FILE *file, rtx_insn_list *list)
+{
+  for (; list; list = list->next ())
+fprintf (file, " %d", INSN_UID (list->insn ()));
+}
+
+/* Return TRUE if lists A and B have same elements in the same order.  */
+static bool
+rtx_insn_list_same_p (rtx_insn_list *a, rtx_insn_list *b)
+{
+  for (; a && b; a = a->next (), b = b->next ())
+if (a->insn () != b->insn ())
+  return false;
+
+  if (a || b)
+return false;
+
+  return true;
+}
+
+/* Dump parts of DEPS that are different from PREV.
+   Dumping all information from dependency context produces huge
+   hard-to-analize logs; differential dumping is way more managable.  */
+static void
+dump_deps_desc_diff (FILE *file, class deps_desc *deps, class deps_desc *prev)
+{
+  /* Each "paragraph" is a single line of output.  */
+
+  /* Note on param_max_pending_list_length:
+ During normal dependency analysis various lists should not exceed this
+ limit.  Searching for "!!!" in scheduler logs can point to potential bugs
+ or poorly-handled corner-cases.  */
+
+  if (!rtx_insn_list_same_p (deps->pending_read_insns,
+prev->pending_read_insns))
+{
+  fprintf (file, ";; deps pending mem reads length(%d):",
+  deps->pending_read_list_length);
+  if ((deps->pending_read_list_length + deps->pending_write_list_length)
+ >= param_max_pending_list_length)
+   fprintf (file, "%d insns!!!", deps->pending_read_list_length);
+  else
+   dump_rtx_insn_list (file, deps->pending_read_insns);
+  fprintf (file, "\n");
+}
+
+  if (!rtx_insn_list_same_p (deps->pending_write_insns,
+prev->pending_write_insns))
+{
+  fprintf (file, ";; deps pending mem writes length(%d):",
+  deps->pending_write_list_length);
+  if ((deps->pending_read_list_length + deps->pending_write_list_length)
+ >= param_max_pending_list_length)
+   fprintf (file, "%d insns!!!", deps->pending_write_list_length);
+  else
+   dump_rtx_insn_list (file, deps->pending_write_insns);
+  fprintf (file, "\n");
+}
+
+  if (!rtx_insn_list_same_p (deps->pending_jump_insns,
+prev->pending_jump_insns))
+{
+  fprintf (file, ";; deps pending jump length(%d):",
+  deps->pending_flush_length);
+  if (deps->pending_flush_length >= param_max_pending_list_length)
+   fprintf (file, "%d insns!!!", deps->pending_flush_length);
+  else
+   dump_rtx_insn_list (file, deps->pending_jump_insns);
+  fprintf (file, "\n");
+}
+
+  fprintf (file, ";; last");
+  if (!rtx_insn_list_same_p (deps->last_pending_memory_flush,
+prev->last_pending_memory_flush))
+{
+  fprintf (

[PATCH v3 3/8] Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc

2023-11-22 Thread Maxim Kuvyrkov

This patch simplifies logic behind deps_join(), which will be
important for the upcoming improvements of sched-deps.cc logging.

The only functional change is that when deps_join() is called with
empty state for the 2nd argument, it will not reverse INSN_ and
EXPR_LISTs in the 1st argument.  Before this patch the lists were
reversed due to use of concat_*_LIST().  Now, with copy_*_LIST()
used for this case, the lists will remain in the original order.

gcc/ChangeLog:

* lists.cc (copy_EXPR_LIST, concat_EXPR_LIST): New functions.
* rtl.h (copy_EXPR_LIST, concat_EXPR_LIST): Declare.
* sched-rgn.cc (concat_insn_list, concat_expr_list): New helpers.
(concat_insn_mem_list): Simplify.
(deps_join): Update
---
 gcc/lists.cc | 30 +++-
 gcc/rtl.h|  4 +++-
 gcc/sched-rgn.cc | 51 +++-
 3 files changed, 61 insertions(+), 24 deletions(-)

diff --git a/gcc/lists.cc b/gcc/lists.cc
index 2cdf37ad533..83e7bf32176 100644
--- a/gcc/lists.cc
+++ b/gcc/lists.cc
@@ -160,6 +160,24 @@ free_INSN_LIST_list (rtx_insn_list **listp)
   free_list ((rtx *)listp, &unused_insn_list);
 }
 
+/* Make a copy of the EXPR_LIST list LINK and return it.  */
+rtx_expr_list *
+copy_EXPR_LIST (rtx_expr_list *link)
+{
+  rtx_expr_list *new_queue;
+  rtx_expr_list **pqueue = &new_queue;
+
+  for (; link; link = link->next ())
+{
+  rtx x = link->element ();
+  rtx_expr_list *newlink = alloc_EXPR_LIST (REG_NOTE_KIND (link), x, NULL);
+  *pqueue = newlink;
+  pqueue = (rtx_expr_list **)&XEXP (newlink, 1);
+}
+  *pqueue = NULL;
+  return new_queue;
+}
+
 /* Make a copy of the INSN_LIST list LINK and return it.  */
 rtx_insn_list *
 copy_INSN_LIST (rtx_insn_list *link)
@@ -178,12 +196,22 @@ copy_INSN_LIST (rtx_insn_list *link)
   return new_queue;
 }
 
+/* Duplicate the EXPR_LIST elements of COPY and prepend them to OLD.  */
+rtx_expr_list *
+concat_EXPR_LIST (rtx_expr_list *copy, rtx_expr_list *old)
+{
+  rtx_expr_list *new_rtx = old;
+  for (; copy; copy = copy->next ())
+new_rtx = alloc_EXPR_LIST (REG_NOTE_KIND (copy), copy->element (), 
new_rtx);
+  return new_rtx;
+}
+
 /* Duplicate the INSN_LIST elements of COPY and prepend them to OLD.  */
 rtx_insn_list *
 concat_INSN_LIST (rtx_insn_list *copy, rtx_insn_list *old)
 {
   rtx_insn_list *new_rtx = old;
-  for (; copy ; copy = copy->next ())
+  for (; copy; copy = copy->next ())
 {
   new_rtx = alloc_INSN_LIST (copy->insn (), new_rtx);
   PUT_REG_NOTE_KIND (new_rtx, REG_NOTE_KIND (copy));
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e4b6cc0dbb5..7e952d7cbeb 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3764,10 +3764,12 @@ extern void free_EXPR_LIST_list (rtx_expr_list **);
 extern void free_INSN_LIST_list (rtx_insn_list **);
 extern void free_EXPR_LIST_node (rtx);
 extern void free_INSN_LIST_node (rtx);
+extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx);
 extern rtx_insn_list *alloc_INSN_LIST (rtx, rtx);
+extern rtx_expr_list *copy_EXPR_LIST (rtx_expr_list *);
 extern rtx_insn_list *copy_INSN_LIST (rtx_insn_list *);
+extern rtx_expr_list *concat_EXPR_LIST (rtx_expr_list *, rtx_expr_list *);
 extern rtx_insn_list *concat_INSN_LIST (rtx_insn_list *, rtx_insn_list *);
-extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx);
 extern void remove_free_INSN_LIST_elem (rtx_insn *, rtx_insn_list **);
 extern rtx remove_list_elem (rtx, rtx *);
 extern rtx_insn *remove_free_INSN_LIST_node (rtx_insn_list **);
diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index e5964f54ead..da3ec0458ff 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -2585,25 +2585,32 @@ add_branch_dependences (rtx_insn *head, rtx_insn *tail)
 
 static class deps_desc *bb_deps;
 
+/* Return a new insn_list with all the elements from the two input lists.  */
+static rtx_insn_list *
+concat_insn_list (rtx_insn_list *copy, rtx_insn_list *old)
+{
+  if (!old)
+return copy_INSN_LIST (copy);
+  return concat_INSN_LIST (copy, old);
+}
+
+/* Return a new expr_list with all the elements from the two input lists.  */
+static rtx_expr_list *
+concat_expr_list (rtx_expr_list *copy, rtx_expr_list *old)
+{
+  if (!old)
+return copy_EXPR_LIST (copy);
+  return concat_EXPR_LIST (copy, old);
+}
+
 static void
 concat_insn_mem_list (rtx_insn_list *copy_insns,
  rtx_expr_list *copy_mems,
  rtx_insn_list **old_insns_p,
  rtx_expr_list **old_mems_p)
 {
-  rtx_insn_list *new_insns = *old_insns_p;
-  rtx_expr_list *new_mems = *old_mems_p;
-
-  while (copy_insns)
-{
-  new_insns = alloc_INSN_LIST (copy_insns->insn (), new_insns);
-  new_mems = alloc_EXPR_LIST (VOIDmode, copy_mems->element (), new_mems);
-  copy_insns = copy_insns->next ();
-  copy_mems = copy_mems->next ();
-}
-
-  *old_insns_p = new_insns;
-  *old_mems_p = new_mems;
+  *old_insns_p = concat_insn_list (copy_insns, *old_insns_p);
+  *old_mems_p = concat

[PATCH v3 4/8] Improve and fix sched-deps.cc: dump_dep() and dump_lists().

2023-11-22 Thread Maxim Kuvyrkov

Better propagate flags from dump_lists() into dump_dep() and
add a missing "]" in dump_lists().

gcc/ChangeLog:

* sched-deps.cc (DUMP_DEP_PRO): Improve comment.
(dump_dep_flags): Remove.
(DUMP_LISTS_SIZE, DUMP_LISTS_DEPS, DUMP_LISTS_ALL): Continue
numbering from DUMP_DEP_* flags.
(dump_lists): Update and fix.
---
 gcc/sched-deps.cc | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index 005fc0f567e..4d357079a7a 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -132,7 +132,8 @@ static void dump_ds (FILE *, ds_t);
 /* Define flags for dump_dep ().  */
 
 /* Dump producer of the dependence.  */
-#define DUMP_DEP_PRO (2)
+#define DUMP_DEP_PRO (2) /* Reserve "1" for handling of DUMP_DEP_ALL and
+   DUMP_LISTS_ALL.  */
 
 /* Dump consumer of the dependence.  */
 #define DUMP_DEP_CON (4)
@@ -206,9 +207,6 @@ dump_dep (FILE *dump, dep_t dep, int flags)
   fprintf (dump, ">");
 }
 
-/* Default flags for dump_dep ().  */
-static int dump_dep_flags = (DUMP_DEP_PRO | DUMP_DEP_CON);
-
 /* Dump all fields of DEP to STDERR.  */
 void
 sd_debug_dep (dep_t dep)
@@ -1454,19 +1452,20 @@ sd_delete_dep (sd_iterator_def sd_it)
 }
 
 /* Dump size of the lists.  */
-#define DUMP_LISTS_SIZE (2)
+#define DUMP_LISTS_SIZE (32) /* (DUMP_DEP_STATUS << 1)  */
 
 /* Dump dependencies of the lists.  */
-#define DUMP_LISTS_DEPS (4)
+#define DUMP_LISTS_DEPS (64)
 
 /* Dump all information about the lists.  */
 #define DUMP_LISTS_ALL (DUMP_LISTS_SIZE | DUMP_LISTS_DEPS)
 
 /* Dump deps_lists of INSN specified by TYPES to DUMP.
-   FLAGS is a bit mask specifying what information about the lists needs
-   to be printed.
+   FLAGS is a bit mask specifying what information about the lists and
+   the individual deps needs to be printed, this is a combination of
+   DUMP_DEP_* and DUMP_LISTS_* flags.
If FLAGS has the very first bit set, then dump all information about
-   the lists and propagate this bit into the callee dump functions.  */
+   the lists and deps propagate this bit into the callee dump functions.  */
 static void
 dump_lists (FILE *dump, rtx insn, sd_list_types_def types, int flags)
 {
@@ -1488,10 +1487,12 @@ dump_lists (FILE *dump, rtx insn, sd_list_types_def 
types, int flags)
 {
   FOR_EACH_DEP (insn, types, sd_it, dep)
{
- dump_dep (dump, dep, dump_dep_flags | all);
+ dump_dep (dump, dep, flags | all);
  fprintf (dump, " ");
}
 }
+
+  fprintf (dump, "]");
 }
 
 /* Dump all information about deps_lists of INSN specified by TYPES
-- 
2.34.1

[PATCH v3 7/8] Improve logging of register data in scheduler dependency analysis

2023-11-22 Thread Maxim Kuvyrkov

Scheduler dependency analysis uses two main data structures:
1. reg_pending_* data contains effects of INSN on the register file,
   which is then incorporated into ...
2. deps_desc object, which contains commulative information about all
   instructions processed from deps_desc object's initialization.

This patch adds debug dumping of (1).

gcc/ChangeLog:

* sched-deps.cc (print-rtl.h): Include for str_pattern_slim().
(dump_reg_pending_data): New function.
(sched_analyze_insn): Use it.
---
 gcc/sched-deps.cc | 90 ++-
 1 file changed, 89 insertions(+), 1 deletion(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index e0d3c97d935..f9290c82fd2 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "sched-int.h"
 #include "cselib.h"
 #include "function-abi.h"
+#include "print-rtl.h"
 
 #ifdef INSN_SCHEDULING
 
@@ -432,10 +433,24 @@ dep_spec_p (dep_t dep)
   return false;
 }
 
+/* These regsets describe how a single instruction affects registers.
+   Their "life-time" is restricted to a single call of sched_analyze_insn().
+   They are populated by sched_analyze_1() and sched_analyze_2(), and
+   then sched_analyze_insn() transfers data from these into deps->reg_last[i].
+   Near the end sched_analyze_insn() clears these regsets for the next
+   insn.  */
 static regset reg_pending_sets;
 static regset reg_pending_clobbers;
 static regset reg_pending_uses;
 static regset reg_pending_control_uses;
+
+/* Similar to reg_pending_* regsets, this variable specifies whether
+   the current insn analyzed by sched_analyze_insn() is a scheduling
+   barrier that should "split" dependencies inside a block.  Internally
+   sched-deps.cc does this by pretending that the barrier insn uses and
+   sets all registers.
+   Near the end sched_analyze_insn() transfers barrier info from this variable
+   into deps->last_reg_pending_barrier.  */
 static enum reg_pending_barrier_mode reg_pending_barrier;
 
 /* Hard registers implicitly clobbered or used (or may be implicitly
@@ -2880,7 +2895,77 @@ get_implicit_reg_pending_clobbers (HARD_REG_SET *temp, 
rtx_insn *insn)
   *temp &= ~ira_no_alloc_regs;
 }
 
-/* Analyze an INSN with pattern X to find all dependencies.  */
+/* Dump state of reg_pending_* data for debug purposes.
+   Dump only non-empty data to reduce log clobber.  */
+static void
+dump_reg_pending_data (FILE *file, rtx_insn *insn)
+{
+  fprintf (file, "\n");
+  fprintf (file, ";; sched_analysis after insn %d: %s\n",
+  INSN_UID (insn), str_pattern_slim (PATTERN (insn)));
+
+  if (!REG_SET_EMPTY_P (reg_pending_sets)
+  || !REG_SET_EMPTY_P (reg_pending_clobbers)
+  || !REG_SET_EMPTY_P (reg_pending_uses)
+  || !REG_SET_EMPTY_P (reg_pending_control_uses))
+{
+  fprintf (file, ";; insn reg");
+  if (!REG_SET_EMPTY_P (reg_pending_sets))
+   {
+ fprintf (file, " sets(");
+ dump_regset (reg_pending_sets, file);
+ fprintf (file, ")");
+   }
+  if (!REG_SET_EMPTY_P (reg_pending_clobbers))
+   {
+ fprintf (file, " clobbers(");
+ dump_regset (reg_pending_clobbers, file);
+ fprintf (file, ")");
+   }
+  if (!REG_SET_EMPTY_P (reg_pending_uses))
+   {
+ fprintf (file, " uses(");
+ dump_regset (reg_pending_uses, file);
+ fprintf (file, ")");
+   }
+  if (!REG_SET_EMPTY_P (reg_pending_control_uses))
+   {
+ fprintf (file, " control(");
+ dump_regset (reg_pending_control_uses, file);
+ fprintf (file, ")");
+   }
+  fprintf (file, "\n");
+}
+
+  if (reg_pending_barrier)
+fprintf (file, ";; insn reg barrier: %d\n", reg_pending_barrier);
+
+  if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers)
+  || !hard_reg_set_empty_p (implicit_reg_pending_uses))
+{
+  fprintf (file, ";; insn reg");
+  if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers))
+   {
+ print_hard_reg_set (file, implicit_reg_pending_clobbers,
+ " implicit clobbers(", false);
+ fprintf (file, ")");
+   }
+  if (!hard_reg_set_empty_p (implicit_reg_pending_uses))
+   {
+ print_hard_reg_set (file, implicit_reg_pending_uses,
+ " implicit uses(", false);
+ fprintf (file, ")");
+   }
+  fprintf (file, "\n");
+}
+}
+
+/* Analyze an INSN with pattern X to find all dependencies.
+   This analysis uses two main data structures:
+   1. reg_pending_* data contains effects of INSN on the register file,
+  which is then incorporated into ...
+   2. deps_desc object, which contains commulative information about all
+  instructions processed from deps_desc object's initialization.  */
 static void
 sched_analyze_insn (class deps_desc *deps, rtx x, rtx_insn *insn)
 {
@@ -3328,6 +3413,9 @@ sched_analyze_insn (class d

Re: [PATCH 01/12] [contrib] validate_failures.py: Avoid testsuite aliasing

2023-09-27 Thread Maxim Kuvyrkov

Hi Bernhard,

Thanks, I meant to fix this, but forgot.

The underlying problem here is that we want to detect which sub-testsuites had 
failures.  Current regex doesn't match go's case because there is no "..." at 
the end: "Running foo" vs "Running foo ..." .

My preferred way of fixing this is to make go's testsuite print out "..." .  We 
have a similar patch for glibc [1].

[1] https://sourceware.org/pipermail/libc-alpha/2023-June/148702.html

--
Maxim Kuvyrkov
https://www.linaro.org

> On Sep 26, 2023, at 19:46, Bernhard Reutner-Fischer  
> wrote:
> 
> Hi Maxim!
> 
> On Mon, 5 Jun 2023 18:06:25 +0400
> Maxim Kuvyrkov via Gcc-patches  wrote:
> 
>>> On Jun 3, 2023, at 19:17, Jeff Law  wrote:
>>> 
>>> On 6/2/23 09:20, Maxim Kuvyrkov via Gcc-patches wrote:  
>>>> This patch adds tracking of current testsuite "tool" and "exp"
>>>> to the processing of .sum files.  This avoids aliasing between
>>>> tests from different testsuites with same name+description.
>>>> E.g., this is necessary for testsuite/c-c++-common, which is ran
>>>> for both gcc and g++ "tools".
>>>> This patch changes manifest format from ...
>>>> 
>>>> FAIL: gcc_test
>>>> FAIL: g++_test
>>>> 
>>>> ... to ...
>>>> 
>>>> === gcc tests ===
>>>> Running gcc/foo.exp ...
>>>> FAIL: gcc_test
>>>> === gcc Summary ==
>>>> === g++ tests ===
>>>> Running g++/bar.exp ...
>>>> FAIL: g++_test
>>>> === g++ Summary ==
>>>> .
>>>> The new format uses same formatting as DejaGnu's .sum files
>>>> to specify which "tool" and "exp" the test belongs to.  
>>> I think the series is fine.  You're not likely to hear from Diego or Doug I 
>>> suspect, I don't think either are involved in GNU stuff anymore.
>>> 
>> 
>> Thanks, Jeff.  I'll wait for a couple of days and will merge if there are no 
>> new comments.
> 
> Maxim, may i ask you to have a look at the following problem, please?
> 
> ISTM that your exp code does not work as expected for go, maybe you
> forgot to test the changes with go enabled?
> 
> Ever since your changes in summer i see the following:
> 
> gcc-14.mine$ 
> /scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py 
> --clean_build ../gcc-14.orig/
> Getting actual results from build directory .
> ./gcc/testsuite/go/go.sum
> ./gcc/testsuite/gcc/gcc.sum
> ./gcc/testsuite/objc/objc.sum
> ./gcc/testsuite/jit/jit.sum
> ./gcc/testsuite/gdc/gdc.sum
> ./gcc/testsuite/gnat/gnat.sum
> ./gcc/testsuite/ada/acats/acats.sum
> ./gcc/testsuite/g++/g++.sum
> ./gcc/testsuite/obj-c++/obj-c++.sum
> ./gcc/testsuite/rust/rust.sum
> ./gcc/testsuite/gfortran/gfortran.sum
> ./x86_64-pc-linux-gnu/libgomp/testsuite/libgomp.sum
> ./x86_64-pc-linux-gnu/libphobos/testsuite/libphobos.sum
> ./x86_64-pc-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum
> ./x86_64-pc-linux-gnu/libffi/testsuite/libffi.sum
> ./x86_64-pc-linux-gnu/libitm/testsuite/libitm.sum
> ./x86_64-pc-linux-gnu/libgo/libgo.sum
> ./x86_64-pc-linux-gnu/libatomic/testsuite/libatomic.sum
> ./gotools/gotools.sum
> .sum file seems to be broken: tool="gotools", exp="None", summary_line="FAIL: 
> TestScript"
> Traceback (most recent call last):
>  File 
> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", 
> line 732, in 
>retval = Main(sys.argv)
>  File 
> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", 
> line 721, in Main
>retval = CompareBuilds()
>  File 
> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", 
> line 622, in CompareBuilds
>actual = GetResults(sum_files)
>  File 
> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", 
> line 466, in GetResults
>build_results.update(ParseSummary(sum_fname))
>  File 
> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", 
> line 405, in ParseSummary
>result = result_set.MakeTestResult(line, ordinal)
>  File 
> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", 
> line 239, in MakeTestResult
>return TestResult(summary_line, ordinal,
>  File 
> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", 
> line 151, in __init__
>raise
> RuntimeError: No active exception to reraise
> 
> 
> The problem seems to be that gotools.sum does not mention any ".exp"
> files.
> 
> $ grep "Running " gotools/gotools.sum 
> Running cmd/go
> Running runtime
> Running cgo
> Running carchive
> Running cmd/vet
> Running embed
> $ grep -c "\.exp" gotools/gotools.sum 
> 0
> 
> The .sum files looks like this:
> ---8<---
> Test Run By foo on Tue Sep 26 14:46:48 CEST 2023
> Native configuration is x86_64-foo-linux-gnu
> 
>=== gotools tests ===
> 
> Running cmd/go
> UNTESTED: TestAccidentalGitCheckout
> PASS: TestAlwaysLinkSysoFiles
> ...
> UNTESTED: TestParallelTest
> FAIL: TestScript
> ...
> ---8<---
> 
> May i ask you to have a look, please?
> 
> TIA,

Re: [PATCH, libgcc] Fix licenses on several files

2013-08-26 Thread Maxim Kuvyrkov

On 29/07/2013, at 10:03 AM, Maxim Kuvyrkov wrote:

> While verifying license compliance for GCC and its libraries I noticed that 
> several libgcc files that end up in the final library are licensed under 
> GPL-3.0+ instead of GPL-3.0-with-GCC-exception.
> 
> This is, obviously, was not the intention of developers who just copied wrong 
> boilerplate text, and this patch fixes the oversights.

I wrote up a blog post about GCC licenses and ideas on their audit at 
http://www.kugelworks.com/blog/gcc-license-audit/ .  The blog post mentions two 
improvements in handling licenses in GCC and I will follow up on them in 
separate threads.

--
Maxim Kuvyrkov
www.kugelworks.com

Re: [PATCH] Enable non-complex math builtins from C99 for Bionic

2013-09-03 Thread Maxim Kuvyrkov

RGET_LIBC_HAS_FUNCTION no_c99_libc_has_function
> diff --git a/gcc/config/moxie/uclinux.h b/gcc/config/moxie/uclinux.h
> index 498037e..85c65f2 100644
> --- a/gcc/config/moxie/uclinux.h
> +++ b/gcc/config/moxie/uclinux.h
> @@ -37,3 +37,6 @@ see the files COPYING3 and COPYING.RUNTIME
> respectively.  If not, see
>  --wrap=mmap --wrap=munmap --wrap=alloca\
>  %{fmudflapth: --wrap=pthread_create\
> }} %{fmudflap|fmudflapth: --wrap=main}"
> +
> +#undef TARGET_LIBC_HAS_FUNCTION
> +#define TARGET_LIBC_HAS_FUNCTION no_c99_libc_has_function
> 
> 

Thanks,

--
Maxim Kuvyrkov
www.kugelworks.com

Re: [bootstrap] Fix build for several targets (including pr58242)

2013-09-04 Thread Maxim Kuvyrkov

On 3/09/2013, at 5:53 PM, Jakub Jelinek wrote:

> On Tue, Sep 03, 2013 at 09:25:31AM +0400, Alexander Ivchenko wrote:
>> Several builds are broken after r201838.
> 
> What targets actually support bionic?  If it is just arm, i?86/x86_64
> and perhaps aarch64 and nothing else, I'd like to question the way where
> you enforce all the
>  # Add Android userspace support to Linux targets.
>  case $target in
>*linux*)
>  tm_p_file="${tm_p_file} linux-protos.h"
>  tmake_file="${tmake_file} t-linux-android"
>  tm_file="$tm_file linux-android.h"
>  extra_options="$extra_options linux-android.opt"
>  extra_objs="$extra_objs linux-android.o"
>  ;;
>  esac
> stuff onto all Linux targets, without actually testing all of them.
> Wouldn't it be much better not to support Android in any way
> in config/linux.h (say at most #define TARGET_HAS_BIONIC 0 there,
> #ifdef HAVE_GNU_INDIRECT_FUNCTION
> #define TARGET_LIBC_HAS_FUNCTION hook_bool*true
> #endif
> etc.), and append the android stuff only to targets which you
> support, not before the target specific snippets in config.gcc,
> but after those (so linux-android.h, etc. come last, not first)?

Jakub,

Nominal handling of Bionic by all *linux* targets is my fault, and it seemed 
like a good thing at the time.  With several iterations of improvements laid on 
top of this initial decision it now becoming a mess.  I'm thinking along 
similar lines as you to restrict Bionic/Android handling only to targets that 
actually support it.

As Alexander said in his email, this rework should be done as a separate patch.

Thanks,

--
Maxim Kuvyrkov
www.kugelworks.com

Re: [PATCH] Enable non-complex math builtins from C99 for Bionic

2013-09-04 Thread Maxim Kuvyrkov

On 4/09/2013, at 7:43 PM, Alexander Ivchenko wrote:

> Hi Maxim,
> 
> 2013/9/4 Maxim Kuvyrkov :
>> On 23/08/2013, at 1:04 AM, Alexander Ivchenko wrote:
>> 
>>> Ugh.. thanks, you are right. That points to another problem that I
>>> didn't see before:
>>> 
>>> 3) *linux* targets that do not append to tm_p_file (s390x-*-linux* and
>>> s390x-ibm-tpf* - your patch addresses that problem correctly) OR
>>> tmake_file (bfin*-linux-uclibc* or crisv32-*-linux* | cris-*-linux*)
>> 
>> Could you be more verbose, please?  What some of the *linux* target do not 
>> append to tm_p_file?
> 
> *linux* targets that do not append to tm_p_file are 390x-*-linux* and
> s390x-ibm-tpf* and Andrew is already addressed this problem.

OK.

> 
> 
>> It seems that there are at least two separate problems:
>> 
>> 1. OPTION_BIONIC is not defined in linux-android.c .  I think the right fix 
>> here is to copy definitions of OPTION_BIONIC (and, optionally, 
>> OPTION_UCLIBC) from gcc/config/linux.h to rs6000/linux.h, rs6000/linux64.h 
>> and alpha/linux.h -- in other words, define OPTION_BIONIC and OPTION_UCLIBC 
>> whenever OPTION_GLIBC is defined.
> 
> Why can't we just use the checks like "if (linux_libc == LIBC_GLIBC)".
> Seems that we have all infrastructure in place (linux_libc,
> LIBC_GLIBC, LIBC_BIONIC, LIBC_UCLIBC are defined for all *linux*
> targets). I don't see the reason to make new defines.

Same reason why the existing defines are in place.  OPTION_X is not always the 
same as (linux_libc == LIBC_X).

> 
>> 2. The second problem is to do with definitions of TARGET_LIBC_HAS_FUNCTION 
>> for bfin, c6x, lm32, m68k and moxie.  What is the failure scenario here?
> 
> "For them we have ar: linux-android.o: No such file or directory" So I
> added t-linux-android rules to tmake_file.
> 
> (and for bfin additionally we have:
> "../../gcc/gcc/config/bfin/bfin.c:5812:29: error:
> ‘linux_android_libc_has_function’ was not declared in this scope".
> because
> the tm_p.h is not included in bfin.c)

OK.

> 
>> 
>>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>>> index 7e1d529..89cf30a 100644
>>> --- a/gcc/config.gcc
>>> +++ b/gcc/config.gcc
>>> @@ -1018,7 +1018,7 @@ bfin*-uclinux*)
>>> ;;
>>> bfin*-linux-uclibc*)
>>> tm_file="${tm_file} dbxelf.h elfos.h bfin/elf.h gnu-user.h linux.h
>>> glibc-stdint.h bfin/linux.h ./linux-sysroot-suffix.h"
>>> - tmake_file="bfin/t-bfin-linux t-slibgcc"
>>> + tmake_file="bfin/t-bfin-linux t-slibgcc t-linux-android"
>>> use_collect2=no
>>> ;;
>>> bfin*-rtems*)
>> 
>> Why?  Bfin has nothing to do with android.
> 
> Since ‘linux_android_libc_has_function’ handles all three libc options
> for all linux targets, that's natural to use this function. May be it
> should be called "linux_libc_has_function" and be located in linux.c
> (we don't have this file..).

OK.

Ideally we would have linux.c with linux_libc_has_function that knows nothing 
about Bionic and linux-android.c with linux_android_libc_has_function.

The patch is OK with definitions of OPTION_GLIBC, OPTION_UCLIBC and 
OPTION_BIONIC copied verbatim from gcc/config/linux.h to rs6000 and alpha 
versions.

It is then on my plate to refactor handling of Bionic libc and remove it from 
targets that don't support it.

Thanks,

--
Maxim Kuvyrkov
www.kugelworks.com

Cleanup Linux libc selection and Android support

2013-09-18 Thread Maxim Kuvyrkov

Following recent breakage caused by adding nominal Android support to all 
*linux* targets [*] this patch series cleans up libc selection for Linux 
targets (-mglibc/-muclibc/-mbionic), splits libc selection logic from Android 
support, and removes Android handling from targets that don't support it.

[*] http://thread.gmane.org/gmane.comp.gcc.patches/277430/focus=292362

Special thanks goes to Alexander who tested and reviewed initial versions of 
these patches and fixed several problems.

The patch series was tested on various Linux and uClinux targets including arm, 
bfin, c6x, m68k, mips, powerpc, x86, x86_64.

Patches will be posted in their separate threads, and below is a summary.  
Individually the patches are all borderline trivial.

Reviews and approvals are welcome!

- 0001-Rename-files-for-libc-selection-on-Linux-targets
Mechanical rename of files in preparation for splitting Android handling from 
libc selection.

- 0002-Rename-functions-relating-to-libc-support-on-Linux-t
Mechanical rename of functions.

- 0003-Robustify-check-for-IFUNC-support
Trivial fix.

- 0004-Cleanup-definitions-of-libc-related-target-hooks
Consolidate definitions of libc target hooks in linux.h

- 0005-Cleanup-libc-selection-and-Android-support
Split Android handling from libc selection and remove Android handling from 
targets that don't support it.

Thanks,

--
Maxim Kuvyrkov
www.kugelworks.com

[PATCH 1/5] Rename files for libc selection on Linux targets

2013-09-18 Thread Maxim Kuvyrkov

Mechanical rename of files in preparation for splitting Android handling from 
libc selection.

The patch series was tested on various Linux and uClinux targets including arm, 
bfin, c6x, m68k, mips, powerpc, x86, x86_64.

OK to apply?

--
Maxim Kuvyrkov
www.kugelworks.com



0001-Rename-files-for-libc-selection-on-Linux-targets.patch
Description: Binary data

[PATCH 2/5] Rename functions relating to libc support on Linux targets

2013-09-18 Thread Maxim Kuvyrkov

Mechanical rename of functions.

The patch series was tested on various Linux and uClinux targets including arm, 
bfin, c6x, m68k, mips, powerpc, x86, x86_64.

OK to apply?

--
Maxim Kuvyrkov
www.kugelworks.com



0002-Rename-functions-relating-to-libc-support-on-Linux-t.patch
Description: Binary data

[PATCH 3/5] Robustify check for IFUNC support

2013-09-18 Thread Maxim Kuvyrkov

This patch changes check in linux_has_ifunc_p() from TARGET_ANDROID to 
OPTION_BIONIC.  These two predicates are, currently, the same, but it is better 
to check for a specific C library rather than operational environment the 
library is commonly associated with.

The patch series was tested on various Linux and uClinux targets including arm, 
bfin, c6x, m68k, mips, powerpc, x86, x86_64.

OK to apply?

Thanks,

--
Maxim Kuvyrkov
www.kugelworks.com



0003-Robustify-check-for-IFUNC-support.patch
Description: Binary data

[PATCH 4/5] Cleanup definitions of libc-related target hooks

2013-09-18 Thread Maxim Kuvyrkov

This patch consolidates definitions of TARGET_LIBC_HAS_FUNCTION and 
TARGET_HAS_IFUNC_P hooks in linux.h from linux-android.h and uclinux.h files of 
various architectures.

One caveat to consolidating definitions in linux.h is that both *-linux-* and 
*-uclinux-* targets include config/linux.h, but *-uclinux-* targets then skip 
linux.opt, linux-protos.h and linux.c, which declare and define functions 
implementing target hooks.

Since we don't really want to add linux.opt and company to *-uclinux-* targets 
we check for "#if *-uclinux-*" in linux.h and define TARGET_LIBC_HAS_FUNCTION 
and TARGET_HAS_IFUNC_P separately for *-linux-* and for *-uclinux-*.

The patch series was tested on various Linux and uClinux targets including arm, 
bfin, c6x, m68k, mips, powerpc, x86, x86_64.

OK to apply?

Thanks,

--
Maxim Kuvyrkov
www.kugelworks.com




0004-Cleanup-definitions-of-libc-related-target-hooks.patch
Description: Binary data

[PATCH 5/5] Cleanup libc selection and Android support

2013-09-18 Thread Maxim Kuvyrkov

This patch splits Android handling from libc selection and removes Android 
handling from targets that don't support it.

The patch series was tested on various Linux and uClinux targets including arm, 
bfin, c6x, m68k, mips, powerpc, x86, x86_64.

OK to apply?

Thanks,

--
Maxim Kuvyrkov
www.kugelworks.com



0005-Cleanup-libc-selection-and-Android-support.patch
Description: Binary data

[PATCH] Add testcase for PR61033

2016-02-20 Thread Maxim Kuvyrkov

This patch adds a testcase for PR61033 [1].

The bug is about compiler going into infinite loop while solving data-flow in 
vt_find_locations on arm-* targets.  The issue in fixed in GCC 5 and later by 
Richard B.'s r211624 [2].

The bug affects GCC 4.8 and 4.9, but is unlikely to be fixed there.

This patch adds a regression test to make sure we catch the problem should it 
re-appear again.

Tested on arm-linux-gnueabihf and x86_64-linux-gnu.

OK to apply?


[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61033

[2] http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=211625
2014-06-13  Richard Biener  

1bf2ea41.diff
Description: Binary data

[PATCH,testsuite] Print markers to stderr to avoid races with sanitizer output

2016-03-01 Thread Maxim Kuvyrkov

This patch improves sanitizer testsuite to avoid sporadic failures, especially 
when [cross-]testing on a remote machine.

There are several unstable sanitizer tests in GCC testsuite (e.g., 
g++.dg/tsan/aligned_vs_unaligned_race.C), and the instability comes from the 
test doing this:

 printf("Pass\n");
 /* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */
 /* { dg-output "Pass.*" } */

.  Thread sanitizer prints its data to stderr, and then printf prints to 
stdout.  The testcase expects stderr to be flushed before stdout, and "Pass" to 
be at the end of the output.  AFAIK, that works fine in native/local 
environment, but fails in remote testing from time to time.  The failure case 
happens when stdout output is printed before stderr output.  This is, likely, 
because there is no strict ordering of stdout and stderr when remote testing.

This patch fixes the affected tests in GCC testsuite, and I'll be also posting 
a similar patch for LLVM testsuite.

OK for trunk?

--
Maxim Kuvyrkov
www.linaro.org




8cffd698.diff
Description: Binary data

Re: [PATCH,testsuite] Print markers to stderr to avoid races with sanitizer output

2016-03-01 Thread Maxim Kuvyrkov

> On Mar 1, 2016, at 12:31 PM, Maxim Kuvyrkov  wrote:
> 
> This patch improves sanitizer testsuite to avoid sporadic failures, 
> especially when [cross-]testing on a remote machine.
> 
> There are several unstable sanitizer tests in GCC testsuite (e.g., 
> g++.dg/tsan/aligned_vs_unaligned_race.C), and the instability comes from the 
> test doing this:
> 
> printf("Pass\n");
> /* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */
> /* { dg-output "Pass.*" } */
> 
> .  Thread sanitizer prints its data to stderr, and then printf prints to 
> stdout.  The testcase expects stderr to be flushed before stdout, and "Pass" 
> to be at the end of the output.  AFAIK, that works fine in native/local 
> environment, but fails in remote testing from time to time.  The failure case 
> happens when stdout output is printed before stderr output.  This is, likely, 
> because there is no strict ordering of stdout and stderr when remote testing.
> 
> This patch fixes the affected tests in GCC testsuite, and I'll be also 
> posting a similar patch for LLVM testsuite.

FYI, this was posted to LLVM (and approved) here: 
http://reviews.llvm.org/D17757 .

> 
> OK for trunk?


--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH] Fix detection of setrlimit in libstdc++ testsuite

2016-03-02 Thread Maxim Kuvyrkov

PING ^ 2.  The patch is sitting without comments for 4+ months.

--
Maxim Kuvyrkov
www.linaro.org



> On Dec 10, 2015, at 4:47 PM, Maxim Kuvyrkov  wrote:
> 
>> On Nov 11, 2015, at 7:56 PM, Maxim Kuvyrkov  
>> wrote:
>> 
>> Hi,
>> 
>> This patch fixes an obscure cross-testing problem that crashed (OOMed) our 
>> boards at Linaro.  Several tests in libstdc++ (e.g., [1]) limit themselves 
>> to some reasonable amount of RAM and then try to allocate 32 gigs.  
>> Unfortunately, the configure test that checks presence of setrlimit is 
>> rather strange: if target is native, then try compile file with call to 
>> setrlimit -- if compilation succeeds, then use setrlimit, otherwise, ignore 
>> setrlimit.  The strange part is that the compilation check is done only for 
>> native targets, as if cross-toolchains can't generate working executables.  
>> [This is rather odd, and I might be missing some underlaying caveat.]
>> 
>> Therefore, when testing a cross toolchain, the test [1] still tries to 
>> allocate 32GB of RAM with no setrlimit restrictions.  On most targets that 
>> people use for cross-testing this is not an issue because either
>> - the target is 32-bit, so there is no 32GB user-space to speak of, or
>> - the target board has small amount of RAM and no swap, so allocation 
>> immediately fails, or
>> - the target board has plenty of RAM, so allocating 32GB is not an issue.
>> 
>> However, if one is testing on a 64-bit board with 16GB or RAM and 16GB of 
>> swap, then one gets into an obscure near-OOM swapping condition.  This is 
>> exactly the case with cross-testing aarch64-linux-gnu toolchains on APM 
>> Mustang.
>> 
>> The attached patch removes "native" restriction from configure test for 
>> setrlimit.  This enables setrlimit restrictions on the testsuite, and the 
>> test [1] expectedly fails to allocate 32GB due to setrlimit restriction.
>> 
>> I have tested it on x86_64-linux-gnu and i686-linux-gnu native toolchains, 
>> and aarch64-linux-gnu and arm-linux-gnueabi[hf] cross-toolchains with no 
>> regressions [*].
>> 
>> OK to commit?
>> 
>> I didn't go as far as enabling setenv/locale tests when cross-testing 
>> libstdc++ because I remember of issues with generating locales in 
>> cross-built glibc.  In any case, locale tests are unlikely to OOM the test 
>> board the way that absence of setrlimit does.
>> 
>> [1] 27_io/ios_base/storage/2.cc
>> 
>> [*] Cross-testing using user-mode QEMU made 27_io/fpos/14775.cc execution 
>> test to FAIL.  This test uses setrlimit set max file size, and is 
>> misbehaving only under QEMU.  I believe this a QEMU issue with not handling 
>> setrlimit correctly.
>> 
> 
> Ping.
> 
> --
> Maxim Kuvyrkov
> www.linaro.org


0001-Use-setrlimit-for-testing-libstdc-in-cross-toolchain.patch
Description: Binary data

Re: Cleanup Linux libc selection and Android support

2013-12-07 Thread Maxim Kuvyrkov

On 6/11/2013, at 1:44 pm, Maxim Kuvyrkov  wrote:

> On 19/09/2013, at 8:26 am, Maxim Kuvyrkov  wrote:
> 
>> Following recent breakage caused by adding nominal Android support to all 
>> *linux* targets [*] this patch series cleans up libc selection for Linux 
>> targets (-mglibc/-muclibc/-mbionic), splits libc selection logic from 
>> Android support, and removes Android handling from targets that don't 
>> support it.
> 
> Ping.  Anyone wants to review these cleanup patches to config.gcc, config/t-* 
> and config/*.h files, or should I just start committing them quietly :-P ?

This patch series was approved by Jeff Law, and I've committed them after 
retesting on current mainline.

Thank you,

--
Maxim Kuvyrkov
www.kugelworks.com

[PATCH] Fix mips64-linux and s390x-linux builds

2013-12-09 Thread Maxim Kuvyrkov

On 9/12/2013, at 8:21 am, Maxim Kuvyrkov  wrote:

> On 9/12/2013, at 3:24 am, Jan-Benedict Glaw  wrote:
> 
>> Hi Maxim!
>> 
>> One of your recent libc<->android clean-up patches broke the
>> mips64-linux target as a side-effect, see eg.
>> http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=53806:
>> 
>> g++ -c  -DIN_GCC_FRONTEND -DIN_GCC_FRONTEND -g -O2 -DIN_GCC  
>> -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions -fno-rtti 
>> -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
>> -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long 
>> -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H 
>> -I. -Ic-family -I/home/jbglaw/repos/gcc/gcc 
>> -I/home/jbglaw/repos/gcc/gcc/c-family 
>> -I/home/jbglaw/repos/gcc/gcc/../include 
>> -I/home/jbglaw/repos/gcc/gcc/../libcpp/include  
>> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber 
>> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
>> -I/home/jbglaw/repos/gcc/gcc/../libbacktrace-o c-family/c-cppbuiltin.o 
>> -MT c-family/c-cppbuiltin.o -MMD -MP -MF c-family/.deps/c-cppbuiltin.TPo 
>> /home/jbglaw/repos/gcc/gcc/c-family/c-cppbuiltin.c
>> /home/jbglaw/repos/gcc/gcc/c-family/c-cppbuiltin.c: In function ‘void 
>> c_cpp_builtins(cpp_reader*)’:
>> /home/jbglaw/repos/gcc/gcc/c-family/c-cppbuiltin.c:1014:370: error: 
>> ‘ANDROID_TARGET_OS_CPP_BUILTINS’ was not declared in this scope
>> make[1]: *** [c-family/c-cppbuiltin.o] Error 1
> 
> I'm looking into this.

My recent patches to cleanup support for Android/Bionic for *-linux-* targets 
broke mips64-linux and s390x-linux builds.  Unfortunately, these targets fell 
out from the test coverage of these cleanups.

The problems are in missing declarations, and are trivial to fix by adding 
necessary files in config.gcc.

The fix for mips64-linux (and, also, mips*-mti-linux*) is to add 
linux-android.h and linux-android.opt to the target header/option files -- thus 
enabling android handling for all mips*-linux* toolchains.

The fix for s390x-linux is to add linux-protos.h to target headers.  No android 
handling is added to s390x-linux.

Richard, Andreas, are you OK with changes to your respective ports?

Thank you,

--
Maxim Kuvyrkov
www.kugelworks.com

0001-config.gcc-mips-mti-linux-mips64-linux.patch
Description: Binary data

Re: [PATCH] Fix mips64-linux and s390x-linux builds

2013-12-10 Thread Maxim Kuvyrkov

On 10/12/2013, at 1:07 am, Maxim Kuvyrkov  wrote:

> My recent patches to cleanup support for Android/Bionic for *-linux-* targets 
> broke mips64-linux and s390x-linux builds.  Unfortunately, these targets fell 
> out from the test coverage of these cleanups.
> 
> The problems are in missing declarations, and are trivial to fix by adding 
> necessary files in config.gcc.
> 
> The fix for mips64-linux (and, also, mips*-mti-linux*) is to add 
> linux-android.h and linux-android.opt to the target header/option files -- 
> thus enabling android handling for all mips*-linux* toolchains.
> 
> The fix for s390x-linux is to add linux-protos.h to target headers.  No 
> android handling is added to s390x-linux.
> 
> Richard, Andreas, are you OK with changes to your respective ports?

Given that the one-line fix for s390x-linux is trivial (just add file with 
function prototypes to the list of headers), and that Richard approved 
mips*-linux part of the patch -- I'm going to commit the patch tomorrow unless 
I hear something to the contrary.  I don't want to keep mips and s390x builds 
broken for too much longer.

Thank you,

--
Maxim Kuvyrkov
www.kugelworks.com

Re: [testsuite] Fix potential race conditions in gfortran tests

2015-10-22 Thread Maxim Kuvyrkov

Hi,

And this is a similar patch for gfortran.dg/streamio_N.f90 tests.

OK to commit?

--
Maxim Kuvyrkov
www.linaro.org



> On Oct 18, 2015, at 4:17 PM, Mikael Morin  wrote:
> 
> Le 16/10/2015 10:08, Christophe Lyon a écrit :
>> Hi,
>> 
>> We have noticed a few random failures in gfortran tests in our validations.
>> 
>> Maxim investigated some of them and noticed a possible race condition
>> in the streamio tests, for which he'll post a patch.
>> 
>> I looked for other similar cases (checking which files are unlinked
>> several times during 'make check'), and noticed that a few other cases
>> might read/write/delete the same file concurrently.
>> 
>> The proposed fix is to use different names for the different testcases.
>> 
>> I ran make check -jX on various ARM/AArch64 configuration with no regression.
>> 
>> OK?
>> 
> Looks good to me.  Thanks.
> 
> Mikael



0001-Fix-race-on-temp-file-in-gfortran-streamio_-.f90-tes.ChangeLog
Description: Binary data


0001-Fix-race-on-temp-file-in-gfortran-streamio_-.f90-tes.patch
Description: Binary data

[PATCH] Fix detection of setrlimit in libstdc++ testsuite

2015-11-11 Thread Maxim Kuvyrkov

Hi,

This patch fixes an obscure cross-testing problem that crashed (OOMed) our 
boards at Linaro.  Several tests in libstdc++ (e.g., [1]) limit themselves to 
some reasonable amount of RAM and then try to allocate 32 gigs.  Unfortunately, 
the configure test that checks presence of setrlimit is rather strange: if 
target is native, then try compile file with call to setrlimit -- if 
compilation succeeds, then use setrlimit, otherwise, ignore setrlimit.  The 
strange part is that the compilation check is done only for native targets, as 
if cross-toolchains can't generate working executables.  [This is rather odd, 
and I might be missing some underlaying caveat.]

Therefore, when testing a cross toolchain, the test [1] still tries to allocate 
32GB of RAM with no setrlimit restrictions.  On most targets that people use 
for cross-testing this is not an issue because either
- the target is 32-bit, so there is no 32GB user-space to speak of, or
- the target board has small amount of RAM and no swap, so allocation 
immediately fails, or
- the target board has plenty of RAM, so allocating 32GB is not an issue.

However, if one is testing on a 64-bit board with 16GB or RAM and 16GB of swap, 
then one gets into an obscure near-OOM swapping condition.  This is exactly the 
case with cross-testing aarch64-linux-gnu toolchains on APM Mustang.

The attached patch removes "native" restriction from configure test for 
setrlimit.  This enables setrlimit restrictions on the testsuite, and the test 
[1] expectedly fails to allocate 32GB due to setrlimit restriction.

I have tested it on x86_64-linux-gnu and i686-linux-gnu native toolchains, and 
aarch64-linux-gnu and arm-linux-gnueabi[hf] cross-toolchains with no 
regressions [*].

OK to commit?

I didn't go as far as enabling setenv/locale tests when cross-testing libstdc++ 
because I remember of issues with generating locales in cross-built glibc.  In 
any case, locale tests are unlikely to OOM the test board the way that absence 
of setrlimit does.

[1] 27_io/ios_base/storage/2.cc

[*] Cross-testing using user-mode QEMU made 27_io/fpos/14775.cc execution test 
to FAIL.  This test uses setrlimit set max file size, and is misbehaving only 
under QEMU.  I believe this a QEMU issue with not handling setrlimit correctly.

--
Maxim Kuvyrkov
www.linaro.org




0001-Use-setrlimit-for-testing-libstdc-in-cross-toolchain.patch
Description: Binary data

Re: [PATCH] Fix detection of setrlimit in libstdc++ testsuite

2015-12-10 Thread Maxim Kuvyrkov

> On Nov 11, 2015, at 7:56 PM, Maxim Kuvyrkov  wrote:
> 
> Hi,
> 
> This patch fixes an obscure cross-testing problem that crashed (OOMed) our 
> boards at Linaro.  Several tests in libstdc++ (e.g., [1]) limit themselves to 
> some reasonable amount of RAM and then try to allocate 32 gigs.  
> Unfortunately, the configure test that checks presence of setrlimit is rather 
> strange: if target is native, then try compile file with call to setrlimit -- 
> if compilation succeeds, then use setrlimit, otherwise, ignore setrlimit.  
> The strange part is that the compilation check is done only for native 
> targets, as if cross-toolchains can't generate working executables.  [This is 
> rather odd, and I might be missing some underlaying caveat.]
> 
> Therefore, when testing a cross toolchain, the test [1] still tries to 
> allocate 32GB of RAM with no setrlimit restrictions.  On most targets that 
> people use for cross-testing this is not an issue because either
> - the target is 32-bit, so there is no 32GB user-space to speak of, or
> - the target board has small amount of RAM and no swap, so allocation 
> immediately fails, or
> - the target board has plenty of RAM, so allocating 32GB is not an issue.
> 
> However, if one is testing on a 64-bit board with 16GB or RAM and 16GB of 
> swap, then one gets into an obscure near-OOM swapping condition.  This is 
> exactly the case with cross-testing aarch64-linux-gnu toolchains on APM 
> Mustang.
> 
> The attached patch removes "native" restriction from configure test for 
> setrlimit.  This enables setrlimit restrictions on the testsuite, and the 
> test [1] expectedly fails to allocate 32GB due to setrlimit restriction.
> 
> I have tested it on x86_64-linux-gnu and i686-linux-gnu native toolchains, 
> and aarch64-linux-gnu and arm-linux-gnueabi[hf] cross-toolchains with no 
> regressions [*].
> 
> OK to commit?
> 
> I didn't go as far as enabling setenv/locale tests when cross-testing 
> libstdc++ because I remember of issues with generating locales in cross-built 
> glibc.  In any case, locale tests are unlikely to OOM the test board the way 
> that absence of setrlimit does.
> 
> [1] 27_io/ios_base/storage/2.cc
> 
> [*] Cross-testing using user-mode QEMU made 27_io/fpos/14775.cc execution 
> test to FAIL.  This test uses setrlimit set max file size, and is misbehaving 
> only under QEMU.  I believe this a QEMU issue with not handling setrlimit 
> correctly.
> 

Ping.

--
Maxim Kuvyrkov
www.linaro.org

Re: One more patch for PR91333

2020-02-17 Thread Maxim Kuvyrkov

Hi Vladimir,

This patch increases code size at -Os on arm-linux-gnueabihf by 1% (with no 
code-size reductions) on several SPEC CPU2006 benchmarks:

400.perlbench,perlbench_base.default ,580842,583842
429.mcf,mcf_base.default ,7867,7955
403.gcc,gcc_base.default ,1726449,1736149
433.milc,milc_base.default   ,66328,66816
456.hmmer,hmmer_base.default ,148394,149434
482.sphinx3,sphinx_livepretend_base.default  ,99183,99863

Could you look into whether these regressions can be avoided?

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org

> On Feb 2, 2020, at 7:46 PM, Vladimir Makarov  wrote:
> 
>   The previous patch for
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91333
> 
> resulted in aarch64 testsuite failures.
> 
> The following patch solves some of the failures and modified the PR test as 
> the generated code changed.
> 
> The patch was successfully bootstrapped on x86-64 and benchmarked on SPEC2000.
> 
> 
>

Re: One more patch for PR91333

2020-02-18 Thread Maxim Kuvyrkov

> On Feb 18, 2020, at 6:30 PM, Vladimir Makarov  wrote:
> 
> On 2/17/20 10:08 AM, Maxim Kuvyrkov wrote:
>> Hi Vladimir,
>> 
>> This patch increases code size at -Os on arm-linux-gnueabihf by 1% (with no 
>> code-size reductions) on several SPEC CPU2006 benchmarks:
>> 
>> 400.perlbench,perlbench_base.default ,580842,583842
>> 429.mcf,mcf_base.default ,7867,7955
>> 403.gcc,gcc_base.default ,1726449,1736149
>> 433.milc,milc_base.default   ,66328,66816
>> 456.hmmer,hmmer_base.default ,148394,149434
>> 482.sphinx3,sphinx_livepretend_base.default  ,99183,99863
>> 
>> Could you look into whether these regressions can be avoided?
>> 
> Sure, Maxim.  I'll look into it.  If I can not solve the problem, I probably 
> revert the patch.

Thanks, Vladimir.  Forgot to mention that this is for Thumb2 
(--target=arm-linux-gnueabihf --with-mode=thumb).

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

Re: [PATCH 4/6] Port prefetch configuration from aarch32 to aarch64

2017-06-09 Thread Maxim Kuvyrkov

> On Jun 8, 2017, at 6:13 PM, Richard Earnshaw (lists) 
>  wrote:
> 
> On 08/06/17 14:47, James Greenhalgh wrote:
>> On Mon, Jan 30, 2017 at 08:35:00AM -0800, Andrew Pinski wrote:
>>> On Mon, Jan 30, 2017 at 3:48 AM, Maxim Kuvyrkov
>>>  wrote:
>>>> This patch port prefetch configuration from aarch32 backend to aarch64.  
>>>> There is no code-generation change from this patch.
>>>> 
>>>> This patch also happens to address Kyrill's comment on Andrew's 
>>>> prefetching patch at 
>>>> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02133.html .
>>>> 
>>>> This patch also fixes a minor bug in aarch64_override_options_internal(), 
>>>> which used "selected_cpu->tune" instead of "aarch64_tune_params".
>>> 
>>> I am not a fan of the macro at all.
>> 
>> I'm with Andrew for this. The precedent in the AArch64 port is for
>> explicitly spelling this out, as we do with the branch costs, approx_modes,
>> vector costs etc.
>> 
>> I'd rather we went that route than the macro you're using. I don't have
>> any objections to the rest of your patch.
>> 
>> Thanks,
>> James
>> 
> 
> I disagree with having to write all these things out, but I do agree
> that we should be self-consistent within the port.

I'm re-writing the patch with approach that James suggested, and instead of 
AARCH64_PREFETCH_NOT_BENEFICIAL there will be "&generic_prefetch_tune", so not 
much copy-paste of parameters.

> 
> The purpose of introducing the macros in the ARM port was to avoid the
> common problem of merging adding new parameters with adding new ports.
> C initializers for these tables assign values sequentially  with
> zero-padding at the end, so failing to merge such changes carefully
> leads to real bugs in the compiler (even if just performance bugs).
> 
> I notice that the last entry in the current tune params table is an int,
> rather than something that is type-checked (like the penultimate entry -
> an enum).  Having the final entry be type checked at least ensures that
> the right number of elements exist, even if the order is not strictly
> checked.

If you prefer, I can move "&generic_prefetch_tune" to the bottom of "struct 
tune_params", thus enabling type checking on struct pointer.

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH 4/6] Port prefetch configuration from aarch32 to aarch64

2017-06-09 Thread Maxim Kuvyrkov

> On Jun 9, 2017, at 1:04 PM, James Greenhalgh  wrote:
>>> 
...
>>> I disagree with having to write all these things out, but I do agree
>>> that we should be self-consistent within the port.
>> 
>> I'm re-writing the patch with approach that James suggested, and instead of
>> AARCH64_PREFETCH_NOT_BENEFICIAL there will be "&generic_prefetch_tune", so
>> not much copy-paste of parameters.
>> 
>>> 
>>> The purpose of introducing the macros in the ARM port was to avoid the
>>> common problem of merging adding new parameters with adding new ports.
>>> C initializers for these tables assign values sequentially  with
>>> zero-padding at the end, so failing to merge such changes carefully
>>> leads to real bugs in the compiler (even if just performance bugs).
>>> 
>>> I notice that the last entry in the current tune params table is an int,
>>> rather than something that is type-checked (like the penultimate entry -
>>> an enum).  Having the final entry be type checked at least ensures that
>>> the right number of elements exist, even if the order is not strictly
>>> checked.
>> 
>> If you prefer, I can move "&generic_prefetch_tune" to the bottom of "struct
>> tune_params", thus enabling type checking on struct pointer.
> 
> For the zero-cost/small-benefit tradefoff it gives, I'd say "why not".
> 
> With that change in place and the obvious rebase needed for patch 6/6 in
> place, both this (4/6) and patch 6/6 are OK.

Thanks, James.

I'm going to commit this once bootstrap / regtest finish on aarch64-linux-gnu.

--
Maxim Kuvyrkov
www.linaro.org




0001-Port-prefetch-configuration-from-aarch32-to-aarch64-.patch
Description: Binary data

Re: [PATCH 5/6][AArch64] Enable -fprefetch-loop-arrays at -O3 for cores that benefit from prefetching.

2017-06-09 Thread Maxim Kuvyrkov

> On Jun 8, 2017, at 7:31 PM, James Greenhalgh  wrote:
> 
> On Fri, Feb 03, 2017 at 02:58:23PM +0300, Maxim Kuvyrkov wrote:
>>> On Jan 30, 2017, at 5:50 PM, Maxim Kuvyrkov  
>>> wrote:
>>> 
>>>> On Jan 30, 2017, at 3:23 PM, Kyrill Tkachov  
>>>> wrote:
>>>> 
>>>> Hi Maxim,
>>>> 
>>>> On 30/01/17 12:06, Maxim Kuvyrkov wrote:
>>>>> This patch enables prefetching at -O3 for aarch64 cores that set 
>>>>> "simultaneous prefetches" parameter above 0.  There are currently no such 
>>>>> settings, so this patch doesn't change default code generation.
>>>>> 
>>>>> I'm now working on improvements to -fprefetch-loop-arrays pass to make it 
>>>>> suitable for -O2. I'll post this work in the next month.
>>>>> 
>>>>> Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.
>>>>> 
>>>> 
>>>> Are you aiming to get this in for GCC 8?
>>>> I have one small comment on this patch:
>>>> 
>>>> +  /* Enable sw prefetching at -O3 for CPUS that have prefetch, and we
>>>> + have deemed it beneficial (signified by setting
>>>> + prefetch.num_slots to 1 or more).  */
>>>> +  if (flag_prefetch_loop_arrays < 0
>>>> +  && HAVE_prefetch
>>>> 
>>>> HAVE_prefetch will always be true on aarch64.
>>>> I imagine midend code that had logic like this would need this check, but 
>>>> aarch64-specific code shouldn't need it.
>>> 
>>> Agree, I'll remove HAVE_prefetch.
>>> 
>>> This pattern was copied from other backends, and HAVE_prefetch is most 
>>> likely a historical artifact.
>> 
>> Andrew raised a good point in the review of his patch that it is a bad idea
>> to use one of prefetching parameters (simultaneous_prefetches) as indicator
>> for whether to enable prefetching pass by default.  Indeed there are cases
>> when we want to set simultaneous_prefetch according to HW documentation (or
>> experimental results), but not enable prefetching pass by default.
>> 
>> This update to the patch addresses it.  The patch adds a new explicit field
>> to prefetch tuning structure "default_opt_level" that sets optimization level
>> from which prefetching should be enabled by default.  The current value is to
>> enable prefetching at -O3; additionally, this parameter will come handy for
>> enabling prefetching at -O2 [when it is ready].
> 
> I really don't like the scheme of changing the optimisation threshold when
> profiling data is used.
> 
> I've seen too many reports and presentations by the uninitiated who believe
> that the use of profiling data has made the difference, when in reality
> it is just GCC changing behaviour on which passes run. It is very
> misleading!

OK, agree.  That line came from i386 backend.  I'll run benchmarks for enabling 
prefetching at -O2 at a later date, and, possibly, we will have a performance 
argument for reducing prefetch threshold when profile data is available.

> 
> With that line removed, and any rebasing needed over changes to the macro,
> I'm happy with this patch.

Attached is updated patch.  I'll commit it after bootstrap / regtest.

Thanks,

--
Maxim Kuvyrkov
www.linaro.org




0002-Enable-fprefetch-loop-arrays-at-given-optimization-l.patch
Description: Binary data

Re: [PATCH 6/6][AArch64] Update prefetch tuning parameters for falkor and qdf24xx tunings.

2017-06-09 Thread Maxim Kuvyrkov

> On Jun 8, 2017, at 7:32 PM, James Greenhalgh  wrote:
> 
> On Mon, Jan 30, 2017 at 03:08:04PM +0300, Maxim Kuvyrkov wrote:
>> This patch enables software prefetching at -O3 for Qualcomm's qdf24xx cores.
>> 
>> Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.
> 
> This patch is OK in whatever form it takes after rebasing for the macro
> in 4/6.

Here is updated patch for the record.

--
Maxim Kuvyrkov
www.linaro.org




0003-Update-prefetch-tuning-parameters-for-qdf24xx.patch
Description: Binary data

Re: [PATCH 0/6] Improve -fprefetch-loop-arrays in general and for AArch64 in particular

2017-06-09 Thread Maxim Kuvyrkov

> On May 28, 2017, at 8:01 AM, Andrew Pinski  wrote:
> 
> On Tue, Feb 28, 2017 at 1:53 AM, Maxim Kuvyrkov
>> 
...
>> Hi Kyrill,
>> 
>> My hope was to push them in time for GCC 7, but it seems to late now.  I'll 
>> return to these patches at the beginning of Stage 1.
> 
> Ping on this patch set as I really want to get in the prefetching side
> for ThunderX 1 and 2.  Or should I resubmit my patch set?

Hi Andrew,

James G. approved my prefetching patches, and I will be merging them today or 
on Monday.

I've rebase your ThunderX patch on top of my patchset.  I leave it to you to 
re-test and get it reviewed.

--
Maxim Kuvyrkov
www.linaro.org



0004-tFrom-Maxim-Kuvyrkov-maxim.kuvyrkov-linaro.org.patch
Description: Binary data

Re: [C++ PATCH] Fix -fsanitize={null,alignment} of references (PR c++/79572)

2017-11-24 Thread Maxim Kuvyrkov

On Thu, Mar 23, 2017 at 11:37 PM, Jakub Jelinek  wrote:
> Hi!
>
> Since late C++ folding has been committed, we don't sanitize some reference
> bindings to NULL.  Earlier we had always NOP_EXPR to REFERENCE_TYPE say from
> INTEGER_CST or whatever else, but cp_fold can now turn that right into
> INTEGER_CST with REFERENCE_TYPE.  The following patch sanitizes even those.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-03-23  Jakub Jelinek  
>
> PR c++/79572
> * c-ubsan.h (ubsan_maybe_instrument_reference): Change argument to
> tree *.
> * c-ubsan.c (ubsan_maybe_instrument_reference): Likewise.  Handle
> not just NOP_EXPR to REFERENCE_TYPE, but also INTEGER_CST with
> REFERENCE_TYPE.
>
> * cp-gimplify.c (cp_genericize_r): Sanitize INTEGER_CSTs with
> REFERENCE_TYPE.  Adjust ubsan_maybe_instrument_reference caller
> for NOP_EXPR to REFERENCE_TYPE.
>
> * g++.dg/ubsan/null-8.C: New test.
>
...
> --- gcc/testsuite/g++.dg/ubsan/null-8.C.jj  2017-03-23 09:42:31.664696676 
> +0100
> +++ gcc/testsuite/g++.dg/ubsan/null-8.C 2017-03-23 09:43:31.501908802 +0100
> @@ -0,0 +1,19 @@
> +// PR c++/79572
> +// { dg-do run }
> +// { dg-options "-fsanitize=null -std=c++14" }
> +// { dg-output "reference binding to null pointer of type 'const int'" }
> +
> +void
> +foo (const int &iref)
> +{
> +  if (&iref)
> +__builtin_printf ("iref %d\n", iref);
> +  else
> +__builtin_printf ("iref is NULL\n");

Hi Jakub,

Using __builtin_printf causes this test to fail sporadically when
cross-testing.  Stdout and stderr output can get mixed in
cross-testing, so dejagnu might see
==
g++.dg/ubsan/null-8.C:18:7: runtime error: reference binding to null
pointer of type iref is NULL
'const int'
==
instead of
==
g++.dg/ubsan/null-8.C:18:7: runtime error: reference binding to null
pointer of type 'const int'
iref is NULL
==

Is it essential for this testcase to use __builtin_printf or simple
"fprintf (stderr, ...)" would do just fine?

> +}
> +
> +int
> +main ()
> +{
> +  foo (*((int*) __null));
> +}

Regards,

-- 
Maxim Kuvyrkov

Re: C++ PATCH to primary_template_instantiation_p

2017-11-28 Thread Maxim Kuvyrkov


> On Nov 28, 2017, at 12:29 AM, Jason Merrill  wrote:
> 
> All the uses of primary_template_instantiation_p actually want to
> query whether the entity in question is a specialization of the
> template, not whether it's an instantiation or explicit
> specialization.
> 
> Tested x86_64-pc-linux-gnu, applying to trunk.
> 

Hi Jason,

I get the following failure with the new test on x86_64-linux-gnu and 
aarch64-linux-gnu:

> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C
> @@ -0,0 +1,27 @@
> +// PR c++/46831
> +// { dg-do compile { target c++11 } }
> +// { dg-options "" }
> +
> +struct B { };
> +struct D : B { };
> +struct A {
> +  template operator D&(); // { dg-message "template 
> conversion" }
> +  operator long();
> +};
> +
> +template <> A::operator D&();

"Template conversion" warning is triggered on this line, rather than above.

> +
> +void f(long);
> +void f(B&);
> +
> +struct A2 {
> +  template operator B&();
> +};
> +
> +void f2(const B&);
> +
> +int main() {
> +  f(A());
> +  f2(A2());
> +  f2(A());   // { dg-error "" }
> +}
> 

Would you please take a look?

===
spawn -ignore SIGHUP 
/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/_build/builds/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/gcc.git~master-stage2/gcc/testsuite/g++5/../../xg++
 
-B/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/_build/builds/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/gcc.git~master-stage2/gcc/testsuite/g++5/../../
 
/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C
 -fno-diagnostics-show-caret -fdiagnostics-color=never -nostdinc++ 
-I/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/_build/builds/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/gcc.git~master-stage2/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu
 
-I/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/_build/builds/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/gcc.git~master-stage2/x86_64-unknown-linux-gnu/libstdc++-v3/include
 
-I/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/libstdc++-v3/libsupc++
 
-I/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/libstdc++-v3/include/backward
 
-I/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/libstdc++-v3/testsuite/util
 -fmessage-length=0 -std=gnu++11 -S -o fntmpdefarg2a.s
/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C:
 In function 'int main()':
/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C:26:6:
 error: invalid user-defined conversion from 'A' to 'const B&' [-fpermissive]
/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C:12:13:
 note: candidate is: 'A::operator D&() [with T = void]' 
/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C:12:13:
 note:   conversion from return type 'D&' of template conversion function 
specialization to 'const B&' is not an exact match
/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C:21:6:
 note:   initializing argument 1 of 'void f2(const B&)'
compiler exited with status 1
FAIL: g++.dg/cpp0x/fntmpdefarg2a.C  -std=gnu++11  (test for warnings, line 8)
PASS: g++.dg/cpp0x/fntmpdefarg2a.C  -std=gnu++11  (test for errors, line 26)
PASS: g++.dg/cpp0x/fntmpdefarg2a.C  -std=gnu++11 (test for excess errors)
===

Regards,

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH] PR84068: Fix sort order of SCHED_PRESSURE_MODEL

2018-01-31 Thread Maxim Kuvyrkov

> On Jan 31, 2018, at 4:33 PM, Wilco Dijkstra  wrote:
> 
> Richard Sandiford wrote:
> 
>> This was the original intent, but was changed in r213708.  TBH I'm not
>> sure what the second hunk in that revision fixed, since model_index is
>> supposed to return an index greater than all valid indices when passed
>> an instruction outside the current block.  Maxim, do you remember?
> 
> See https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00932.html, it says:
> 
> "The second one is to account for the fact that model_index() of two 
> instructions
> is meaningful only when both instructions are in the current basic block."
> 
> Unless something has changed, I'm assuming that's still true today. Maybe
> the underlying idea was to allow interleaving of instructions outside the
> current block,

Yes.

> but that isn't feasible if you want a well-defined sort ordering.

Agree.

Wilco, your patch looks good.  Thanks for fixing this!

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH] PR84068: Fix sort order of SCHED_PRESSURE_MODEL

2018-02-07 Thread Maxim Kuvyrkov

> On Feb 2, 2018, at 7:40 PM, Wilco Dijkstra  wrote:
> 
> Right, so here is version 2 which ends up much simpler:
> 
> The comparison function for SCHED_PRESSURE_MODEL is incorrect.  If either
> instruction is not in target_bb, the ordering is not well defined.  
> Since all instructions outside the target_bb get the highest model_index,
> all we need to do is sort on model_index.  If the model_index is the same
> we defer to RFS_DEP_COUNT and/or RFS_TIE.
> 
> Bootstrap OK, OK for commit?

Looks good to me.

--
Maxim Kuvyrkov
www.linaro.org

> 
> ChangeLog:
> 2018-02-02  Wilco Dijkstra  
> 
>   PR rlt-optimization/84068
>   * haifa-sched.c (rank_for_schedule): Fix SCHED_PRESSURE_MODEL sorting.
> 
>   PR rlt-optimization/84068
>   * gcc.dg/pr84068.c: New test.
> --
> 
> diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
> index 
> ebdec46bf04f1ba07e8b70607602a3bc9e7ec7de..4a899b56173dd7d67db084ed86d24ad865ccadcd
>  100644
> --- a/gcc/haifa-sched.c
> +++ b/gcc/haifa-sched.c
> @@ -2783,12 +2783,11 @@ rank_for_schedule (const void *x, const void *y)
> }
> 
>   /* Prefer instructions that occur earlier in the model schedule.  */
> -  if (sched_pressure == SCHED_PRESSURE_MODEL
> -  && INSN_BB (tmp) == target_bb && INSN_BB (tmp2) == target_bb)
> +  if (sched_pressure == SCHED_PRESSURE_MODEL)
> {
>   diff = model_index (tmp) - model_index (tmp2);
> -  gcc_assert (diff != 0);
> -  return rfs_result (RFS_PRESSURE_INDEX, diff, tmp, tmp2);
> +  if (diff != 0)
> + return rfs_result (RFS_PRESSURE_INDEX, diff, tmp, tmp2);
> }
> 
>   /* Prefer the insn which has more later insns that depend on it.
> diff --git a/gcc/testsuite/gcc.dg/pr84068.c b/gcc/testsuite/gcc.dg/pr84068.c
> new file mode 100644
> index 
> ..13110d84455f20edfc50f09efe4074721bd6a7d0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr84068.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-sched-critical-path-heuristic 
> -fno-sched-rank-heuristic --param=max-sched-extend-regions-iters=5 --param 
> sched-pressure-algorithm=2" } */
> +
> +#ifdef __SIZEOF_INT128__
> +typedef __int128 largeint;
> +#else
> +typedef long long largeint;
> +#endif
> +
> +largeint a;
> +int b;
> +
> +largeint
> +foo (char d, short e, int f)
> +{
> +  b = __builtin_sub_overflow_p (b, 1, (unsigned long)0);
> +  return a + f;
> +}

Re: [PATCH][arm] PR target/82518: Return false in ARRAY_MODE_SUPPORTED_P for BYTES_BIG_ENDIAN

2018-04-25 Thread Maxim Kuvyrkov

> On Mar 20, 2018, at 8:11 PM, Kyrill Tkachov  
> wrote:
> 
> Hi all,
> 
> This PR shows that we get the load/store_lanes logic wrong for arm big-endian.
> It is tricky to get right. Aarch64 does it by adding the appropriate 
> lane-swapping
> operations during expansion.
> 
> I'd like to do the same on arm eventually, but we'd need to port and validate 
> the VTBL-generating
> code and add it to all the right places and I'm not comfortable enough doing 
> it for GCC 8, but I am keen
> in getting the wrong-code fixed.
> As I say in the PR, vectorisation on armeb is already severely restricted (we 
> disable many patterns on BYTES_BIG_ENDIAN)
> and the load/store_lanes patterns really were not working properly at all, so 
> disabling them is not
> a radical approach.
> 
> The way to do that is to return false in ARRAY_MODE_SUPPORTED_P for 
> BYTES_BIG_ENDIAN.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> Also tested on armeb-none-eabi.
> 
> Committing to trunk.
...
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -6609,7 +6609,8 @@ proc check_effective_target_vect_load_lanes { } {
>   verbose "check_effective_target_vect_load_lanes: using cached result" 2
>  } else {
>   set et_vect_load_lanes 0
> - if { ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok])
> + # We don't support load_lanes correctly on big-endian arm.
> + if { ([istarget arm-*-*] && [check_effective_target_arm_neon_ok])
>|| [istarget aarch64*-*-*] } {
>   set et_vect_load_lanes 1
>   }
> 

Hi Kyrill,

This part makes armv8l-linux-gnueabihf target fail a few of slp-perm-* tests.  
Using [check_effective_target_arm_little_endian] should fix this.

Would you please fix this on master and gcc-7-branch?

Thanks!

--
Maxim Kuvyrkov
www.linaro.org

Re: [Libgomp, Fortran] Fix canadian cross build

2018-05-23 Thread Maxim Kuvyrkov

> On Jun 23, 2017, at 4:44 PM, Yvan Roux  wrote:
> 
> Hello,
> 
> Fortran parts of libgomp (omp_lib.mod, openacc.mod, etc...) are
> missing in a canadian cross build, at least when target gfortran
> compiler comes from PATH and not from GFORTRAN_FOR_TARGET.
> 
> Back in 2010, executability test of GFORTRAN was added to fix libgomp
> build on cygwin, but when the executable doesn't contain the path,
> "test -x" fails and part of the library are not built.
> 
> This patch fixes the issue by using M4 macro AC_PATH_PROG (which
> returns the absolute name) instead of AC_CHECK_PROG in the function
> defined in config/acx.m4: NCN_STRICT_CHECK_TARGET_TOOLS.  I renamed it
> into NCN_STRICT_PATH_TARGET_TOOLS to keep the semantic used in M4.
> 
> Tested by building cross and candian cross toolchain (host:
> i686-w64-mingw32) for arm-linux-gnueabihf with issue and with a
> complete libgomp.
> 
> ok for trunk ?

Hi Yvan,

The patch looks OK, but it is a pain to review.  Would you please split it into 
2 patches: one for the mechanical renames, and one for logical changes to 
acx.m4?  This should allow Paolo and DJ to approve your patch.

Thanks!

> 
> Thanks
> Yvan
> 
> config/ChangeLog
> 2017-06-23  Yvan Roux  
> 
>* acx.m4 (NCN_STRICT_CHECK_TARGET_TOOLS): Renamed to ...
>(NCN_STRICT_PATH_TARGET_TOOLS): ... this.  It reflects the replacement
>of AC_CHECK_PROG by AC_PATH_PROG to get the absolute name of the
>program.
>(ACX_CHECK_INSTALLED_TARGET_TOOL): Use renamed function.
> 
> ChangeLog
> 2017-06-23  Yvan Roux  
> 
>* configure.ac: Use NCN_STRICT_PATH_TARGET_TOOLS instead of
>NCN_STRICT_CHECK_TARGET_TOOLS.
>* configure: Regenerate.
> 

--
Maxim Kuvyrkov
www.linaro.org

Re: Cleanup Linux libc selection and Android support

2013-11-05 Thread Maxim Kuvyrkov

On 19/09/2013, at 8:26 am, Maxim Kuvyrkov  wrote:

> Following recent breakage caused by adding nominal Android support to all 
> *linux* targets [*] this patch series cleans up libc selection for Linux 
> targets (-mglibc/-muclibc/-mbionic), splits libc selection logic from Android 
> support, and removes Android handling from targets that don't support it.

Ping.  Anyone wants to review these cleanup patches to config.gcc, config/t-* 
and config/*.h files, or should I just start committing them quietly :-P ?

Thanks,

--
Maxim Kuvyrkov
www.kugelworks.com



> 
> [*] http://thread.gmane.org/gmane.comp.gcc.patches/277430/focus=292362
> 
> Special thanks goes to Alexander who tested and reviewed initial versions of 
> these patches and fixed several problems.
> 
> The patch series was tested on various Linux and uClinux targets including 
> arm, bfin, c6x, m68k, mips, powerpc, x86, x86_64.
> 
> Patches will be posted in their separate threads, and below is a summary.  
> Individually the patches are all borderline trivial.
> 
> Reviews and approvals are welcome!
> 
> - 0001-Rename-files-for-libc-selection-on-Linux-targets
> Mechanical rename of files in preparation for splitting Android handling from 
> libc selection.
> 
> - 0002-Rename-functions-relating-to-libc-support-on-Linux-t
> Mechanical rename of functions.
> 
> - 0003-Robustify-check-for-IFUNC-support
> Trivial fix.
> 
> - 0004-Cleanup-definitions-of-libc-related-target-hooks
> Consolidate definitions of libc target hooks in linux.h
> 
> - 0005-Cleanup-libc-selection-and-Android-support
> Split Android handling from libc selection and remove Android handling from 
> targets that don't support it.
> 
> Thanks,
> 
> --
> Maxim Kuvyrkov
> www.kugelworks.com
> 
> 
>

[PATCH] Fix race in libstdc++ testsuite

2014-10-16 Thread Maxim Kuvyrkov

Hi,

This patch fixes a race in libstdc++ testsuite.  When we split libstdc++'s 
testsuite into several parallel instances, it may occur that testing for 
features (check_v3_target_fileio in this case) is done in two sub-processes at 
the same time.  The race occurs on the cin_unget-1.txt data file, with one 
thread overwriting the file that was created by another thread.

The fix is to use unique name cin_unget-1-[pid].txt for the data file.

Tested on x86_64-linux-gnu.

OK to apply?

--
Maxim Kuvyrkov
www.linaro.org




0001-Fix-race-in-libstdc-testsuite.ChangeLog
Description: Binary data


0001-Fix-race-in-libstdc-testsuite.patch
Description: Binary data

[PATCH] Account for prologue spills in reg_pressure scheduling

2014-10-19 Thread Maxim Kuvyrkov

Hi,

This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED 
and SCHED_PRESSURE_MODEL) to better estimate number of available registers.

At the moment the scheduler does not account for spills in the prologues and 
restores in the epilogue, which occur from use of call-used registers.  The 
current state is, essentially, optimized for case when there is a hot loop 
inside the function, and the loop executes significantly more often than the 
prologue/epilogue.  However, on the opposite end, we have a case when the 
function is just a single non-cyclic basic block, which executes just as often 
as prologue / epilogue, so spills in the prologue hurt performance as much as 
spills in the basic block itself.  In such a case the scheduler should 
throttle-down on the number of available registers and try to not go beyond 
call-clobbered registers.

The patch uses basic block frequencies to balance the cost of using call-used 
registers for intermediate cases between the two above extremes.

The motivation for this patch was a floating-point testcase on 
arm-linux-gnueabihf (ARM is one of the few targets that use register pressure 
scheduling by default).

A "thanks" goes to Richard good discussion of the problem and suggestions on 
the approach to fix it.

The patch was bootstrapped on x86_64-linux-gnu (which doesn't really exercises 
the patch), and cross-tested on arm-linux-gnueabihf and aarch64-linux-gnu.

OK to apply?

--
Maxim Kuvyrkov
www.linaro.org




0001-sched_class_reg_num.ChangeLog
Description: Binary data


0001-sched_class_reg_num.patch
Description: Binary data

[PATCH] Improve scheduler dumps of ready list

2014-10-20 Thread Maxim Kuvyrkov

Hi,

Following previous improvement to scheduler dumps that provided insight into 
which heuristics in rank_for_schedule make most decisions, this patch adds 
print outs that show the deciding reason for an instruction in the ready list 
to be at its particular place.

This patch allowed me to troubleshoot several scheduling problems in the 
register pressure scheduling.

Tested on x86_64-linux-gnu, arm-linux-gnueabihf and aarch64-linux-gnu.

OK to apply?

Thank you, 

--
Maxim Kuvyrkov
www.linaro.org




0002-last_rfs_win.ChangeLog
Description: Binary data


0002-last_rfs_win.patch
Description: Binary data

Re: [PATCH] Account for prologue spills in reg_pressure scheduling

2014-10-20 Thread Maxim Kuvyrkov

On Oct 21, 2014, at 8:11 AM, Sebastian Pop  wrote:

> Maxim Kuvyrkov wrote:
>> Hi,
>> 
>> This patch improves register pressure scheduling (both 
>> SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number 
>> of available registers.
>> 
>> At the moment the scheduler does not account for spills in the prologues and 
>> restores in the epilogue, which occur from use of call-used registers.  The 
>> current state is, essentially, optimized for case when there is a hot loop 
>> inside the function, and the loop executes significantly more often than the 
>> prologue/epilogue. However, on the opposite end, we have a case when the 
>> function is just a single non-cyclic basic block, which executes just as 
>> often as prologue / epilogue, so spills in the prologue hurt performance as 
>> much as spills in the basic block itself.  In such a case the scheduler 
>> should throttle-down on the number of available registers and try to not go 
>> beyond call-clobbered registers.
>> 
>> The patch uses basic block frequencies to balance the cost of using 
>> call-used registers for intermediate cases between the two above extremes.
>> 
>> The motivation for this patch was a floating-point testcase on 
>> arm-linux-gnueabihf (ARM is one of the few targets that use register 
>> pressure scheduling by default).
>> 
> 
> Does aarch64 enable reg pressure sched by default, or what is the flag to 
> enable it?
> I'm planing to look at the perf impact of the patch.

Thanks, benchmarking results are welcome!  AArch64 doesn't use reg_pressure 
scheduling by default.  Use "-fsched-pressure 
--param=sched-pressure-algorithm=2" to enable same thing as on ARM.  I would 
imagine C++ and Fortran floating-point code to be most affected.

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH] Account for prologue spills in reg_pressure scheduling

2014-10-20 Thread Maxim Kuvyrkov

[Adding ARM maintainers to CC]

On Oct 21, 2014, at 9:44 AM, Sebastian Pop  wrote:

> Hi Maxim,
> 
> Maxim Kuvyrkov wrote:
>> Thanks, benchmarking results are welcome!  AArch64 doesn't use reg_pressure
>> scheduling by default.  Use "-fsched-pressure
>> --param=sched-pressure-algorithm=2" to enable same thing as on ARM.  I would
>> imagine C++ and Fortran floating-point code to be most affected.
> 
> On aarch64 I only see perf improvements with your patch: no perf degradations 
> on
> all the tests that I have run.
> 
> base0: r216447, -O3
> base1: r216447, -O3 -fsched-pressure --param=sched-pressure-algorithm=2
> patch: r216447 + your patch, -O3 -fsched-pressure 
> --param=sched-pressure-algorithm=2
> 
> patch vs. base1 is only an improvement.
> 
> base1 vs. base0 has a few good improvements, and some small degradations: your
> patch improves the perf for one of the degradations to the point it is better
> now with -fsched-pressure --param=sched-pressure-algorithm=2 than at -O3.
> 
> Could we turn on "-fsched-pressure --param=sched-pressure-algorithm=2" by
> default for aarch64?

These are great results, yay!  Sebastian, what benchmarks did you run?

We need to see improvements on spec2k / spec2k6 to enable register-pressure 
scheduling on AArch64 by default.  The current understanding is that AArch64 
has enough registers to not benefit from pressure-aware scheduling.  On the 
other hand, one could argue that cores with more complex pipelines (e.g., A57) 
might not benefit from pipeline-oriented scheduling either, and, therefore, 
scheduling for register pressure can provide a better win.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH] Account for prologue spills in reg_pressure scheduling

2014-10-20 Thread Maxim Kuvyrkov

On Oct 21, 2014, at 10:39 AM, Ramana Radhakrishnan  
wrote:

> On Mon, Oct 20, 2014 at 10:17 PM, Richard Sandiford
>  wrote:
>> Maxim Kuvyrkov  writes:
>>> [Adding ARM maintainers to CC]
>>> 
>>> On Oct 21, 2014, at 9:44 AM, Sebastian Pop  wrote:
>>> 
>>>> Hi Maxim,
>>>> 
>>>> Maxim Kuvyrkov wrote:
>>>>> Thanks, benchmarking results are welcome!  AArch64 doesn't use 
>>>>> reg_pressure
>>>>> scheduling by default.  Use "-fsched-pressure
>>>>> --param=sched-pressure-algorithm=2" to enable same thing as on ARM.  I 
>>>>> would
>>>>> imagine C++ and Fortran floating-point code to be most affected.
>>>> 
>>>> On aarch64 I only see perf improvements with your patch: no perf
>>>> degradations on
>>>> all the tests that I have run.
>>>> 
>>>> base0: r216447, -O3
>>>> base1: r216447, -O3 -fsched-pressure --param=sched-pressure-algorithm=2
>>>> patch: r216447 + your patch, -O3 -fsched-pressure
>>>> --param=sched-pressure-algorithm=2
>>>> 
>>>> patch vs. base1 is only an improvement.
>>>> 
>>>> base1 vs. base0 has a few good improvements, and some small degradations: 
>>>> your
>>>> patch improves the perf for one of the degradations to the point it is 
>>>> better
>>>> now with -fsched-pressure --param=sched-pressure-algorithm=2 than at -O3.
>>>> 
>>>> Could we turn on "-fsched-pressure --param=sched-pressure-algorithm=2" by
>>>> default for aarch64?
> 
> We already have sched-pressure --param=sched-pressure-algorithm=1 on
> by default in the AArch64 backend from September.
> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01663.html went in a few
> days back.

Oh, good, I didn't notice AArch64 using sched-pressure because of weighted 
algorithm.  This patch affects both "weighted" and "model" algorithm, so it 
doesn't necessarily makes "model" algorithm better than "weighted".  That said, 
my personal preference is for both AArch32 and AArch64 to use "model" 
algorithm, but, as you said, we need benchmarking data to make that decision.

...
>  We can run this and let
> you know the results, though SPECFP2k6 takes quite a while - are all
> your patches to sched-pressure now done ?

Wait on benchmarking for 1-2 days.  The sched-pressure patches are done now, 
but there are more patches for the 2nd scheduler pass in the queue (from 
linaro-dev/sched-model-prefetch branch).  Hopefully those will be posted here 
today.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org

[PATCH 3/8] Remove cached_first_cycle_multipass_dfa_lookahead and cached_issue_rate

2014-10-20 Thread Maxim Kuvyrkov

Hi,

This patch cleans up (removes) cached_first_cycle_multipass_dfa_lookahead and 
cached_issue_rate.

These seem to be an artifact from the scheduler refactoring 10+ years ago.  
They assume that dfa_lookahead and issue_rate can change mid-way through 
scheduling, which is never the case.  All backends currently treat 
dfa_lookahead and issue_rate as constants for the duration of scheduling passes.

Bootstrapped on x86_64-linux-gnu.  Regression testing is in progress.  OK to 
commit if no regressions?

[ChangeLog is part of the git patch]

--
Maxim Kuvyrkov
www.linaro.org




0003-Remove-cached_first_cycle_multipass_dfa_lookahead-an.patch
Description: Binary data

[PATCH 4/8] Disable max_issue when scheduling for register pressure

2014-10-20 Thread Maxim Kuvyrkov

Hi,

This patch disables max_issue-based lookahead multipass scheduling when 
register-pressure heuristics are active.  The two approaches tend to undo each 
others decisions and don't play well together.

Currently this patch is a no-op, since no target uses both max_issue and 
register-pressure scheduling.  This will change soon with upcoming patches that 
will enable max_issue during 2nd scheduling pass for ARM and AArch64.

Bootstrapped on x86_64-linux-gnu.  Regression testing is in progress.  OK to 
commit if no regressions?

[ChangeLog is part of the git patch]

Thank you,

--
Maxim Kuvyrkov
www.linaro.org




0004-Disable-max_issue-when-scheduling-for-register-press.patch
Description: Binary data

[PATCH 5/8] Enable max_issue for AArch32 and AArch64

2014-10-20 Thread Maxim Kuvyrkov

Hi Ramana,
Hi Marcus,

This patch enables max_issue multipass lookahead scheduling for 2nd scheduler 
pass (or, more pedantically, whenever register-pressure scheduling is not in 
use).

Multipass lookahead scheduling is being enabled for cores that can issue 2 or 
more instructions per cycle, and it allows scheduler to better exploit 
multi-issue pipelines.  This patch also provides foundation for [upcoming] 
auto-prefetcher model in the scheduler, which is handled via max_issue.

This change requires benchmarking, which I can't easily do at the moment.  I 
would appreciate any benchmarking results that you can share.

Bootstrap on aarch64-linux-gnu is in progress.  OK to apply, provided no 
performance or correctness regressions?

Thank you,

--
Maxim Kuvyrkov
www.linaro.org




0005-Enable-max_issue-for-AArch32-and-AArch64.patch
Description: Binary data

[PATCH 6/8] Handle SCRATCH in decompose_address

2014-10-20 Thread Maxim Kuvyrkov

Hi,

This patch is a simple fix to allow decompose_address to handle SCRATCH'es 
during 2nd scheduler pass. This patch is a prerequisite for a scheduler 
improvement that relies on decompose_address to parse insns.

Bootstrapped and regtested on x86_64-linux-gnu and regtested on 
arm-linux-gnueabihf and aarch64-linux-gnu.

OK to apply?

Thank you,

[ChangeLog is part of the git patch]

--
Maxim Kuvyrkov
www.linaro.org




0006-Handle-SCRATCH-in-decompose_address.patch
Description: Binary data

[PATCH 7/8] Model cache auto-prefetcher in scheduler

2014-10-20 Thread Maxim Kuvyrkov

Hi,

This patch adds auto-prefetcher modeling to GCC scheduler.  The auto-prefetcher 
model is currently enabled only for ARM Cortex-A15, since this is the only CPU 
that I know of to have the hardware auto-prefetcher unit.

The documentation on the auto-prefetcher is very sparse, and all I have are my 
empirical studies and a short note in Cortex-A15 manual (search for "L2 cache 
auto-prefether").  This patch, therefore, implements a very abstract model that 
makes scheduler prefer "mem_op (base+8); mem_op (base+12)" over "mem_op 
(base+12); mem_op (base+8)".  In other words, memory operations are tried to be 
issued in order of increasing memory offsets.

The auto-prefetcher model implementation is based on max_issue mutlipass 
lookahead scheduling, and its "guard" hook.  The guard hook examines contents 
of the ready list and the queue, and, if it finds instructions with lower 
memory offsets, marks instructions with higher memory offset as unavailable for 
immediate scheduling.

This patch has been in works since beginning of the year, and many of my 
previous scheduler cleanup patches were to prepare the infrastructure for this 
feature. 

Ramana, this change requires benchmarking, which I can't easily do at the 
moment.  I would appreciate any benchmarking results that you can share.  In 
particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH needs to be 
tuned/confirmed for Cortex-A15.

At the moment the parameter is set to "2", which means that the autopref model 
will look through ready list and 1-stall queue in search of relevant 
instructions.  Values of -1 (disable autopref), 0 (use autopref only in 
rank_for_schedule), 1 (look through ready list), 2 (look through ready list and 
1-stall queue), and 3 (look through ready list and 2-stall queue) should be 
considered and benchmarked.

Bootstrapped on x86_64-linux-gnu and regtested on arm-linux-gnueaihf and 
aarch64-linux-gnu.  OK to apply, provided no performance or correctness 
regressions?

[ChangeLog is part of the git patch]

Thank you,

--
Maxim Kuvyrkov
www.linaro.org




0007-Model-cache-auto-prefetcher-in-scheduler.patch
Description: Binary data

[PATCH 8/8] Use rank_for_schedule to as tie-breaker in model_order_p

2014-10-20 Thread Maxim Kuvyrkov

Hi,

This patch improves model_order_p to use non-reg-pressure version of 
rank_for_schedule when it needs to break the tie.  At the moment it is 
comparing INSN_PRIORITY by itself, and it seems prudent to outsource that to 
rank_for_schedule.

Bootstrapped on x86_64-linux-gnu and regtested on arm-linux-gnueabihf and 
aarch64-linux-gnu.

OK to apply?

Thank you,

--
Maxim Kuvyrkov
www.linaro.org




0008-Use-rank_for_schedule-to-as-tie-breaker-in-model_ord.patch
Description: Binary data

Re: [PATCH 8/8] Use rank_for_schedule to as tie-breaker in model_order_p

2014-10-21 Thread Maxim Kuvyrkov

On Oct 21, 2014, at 9:11 PM, Richard Sandiford  
wrote:

> Maxim Kuvyrkov  writes:
>> This patch improves model_order_p to use non-reg-pressure version of
>> rank_for_schedule when it needs to break the tie.  At the moment it is
>> comparing INSN_PRIORITY by itself, and it seems prudent to outsource
>> that to rank_for_schedule.
> 
> Do you have an example of where this helps?  A possible danger is that
> rank_for_schedule might (indirectly) depend on state that isn't maintained
> or updated in the same way during the model schedule phase.

I don't have an example where this patch helps, and I consider this patch a 
general cleanup.  From what I can see, all scheduler data structures are 
maintained during reg_sched_model scheduling.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH 8/8] Use rank_for_schedule to as tie-breaker in model_order_p

2014-10-21 Thread Maxim Kuvyrkov

On Oct 22, 2014, at 9:24 AM, Richard Sandiford  
wrote:

> Maxim Kuvyrkov  writes:
>> On Oct 21, 2014, at 9:11 PM, Richard Sandiford
>>  wrote:
>> 
>>> Maxim Kuvyrkov  writes:
>>>> This patch improves model_order_p to use non-reg-pressure version of
>>>> rank_for_schedule when it needs to break the tie.  At the moment it is
>>>> comparing INSN_PRIORITY by itself, and it seems prudent to outsource
>>>> that to rank_for_schedule.
>>> 
>>> Do you have an example of where this helps?  A possible danger is that
>>> rank_for_schedule might (indirectly) depend on state that isn't maintained
>>> or updated in the same way during the model schedule phase.
>> 
>> I don't have an example where this patch helps, and I consider this
>> patch a general cleanup.  From what I can see, all scheduler data
>> structures are maintained during reg_sched_model scheduling.
> 
> I'm not really sure it's a cleanup though.  The code isn't aesthetically
> cleaner because of the way it has to switch the global sched_pressure
> variable for the duration of the call.  And it doesn't really seem
> conceptually cleaner.  The model ordering was deliberately simple
> because the model schedule doesn't take pipeline characteristics like
> issue delays or unit conflicts into account; all it's doing is trying to
> minimise the register pressure.  It isn't obvious to me that all the
> extra stuff that rank_for_schedule does would make sense in that context
> (i.e. that it would tend to reduce pressure compared to the current test,
> rather than increase it).  It also looks like using rank_for_schedule would
> increase compile time.
> 
> So personally I'd rather keep things the way they are unless we have
> a motivating example.

I will not be pushing for this particular change, I'm fine with dropping the 
patch if current code looks more aesthetically pleasing.

Thanks,

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH] Account for prologue spills in reg_pressure scheduling

2014-10-21 Thread Maxim Kuvyrkov

On Oct 22, 2014, at 4:24 AM, Vladimir Makarov  wrote:

> On 10/20/2014 02:57 AM, Maxim Kuvyrkov wrote:
>> Hi,
>> 
>> This patch improves register pressure scheduling (both 
>> SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number 
>> of available registers.
>> 
>> At the moment the scheduler does not account for spills in the prologues and 
>> restores in the epilogue, which occur from use of call-used registers.  The 
>> current state is, essentially, optimized for case when there is a hot loop 
>> inside the function, and the loop executes significantly more often than the 
>> prologue/epilogue.  However, on the opposite end, we have a case when the 
>> function is just a single non-cyclic basic block, which executes just as 
>> often as prologue / epilogue, so spills in the prologue hurt performance as 
>> much as spills in the basic block itself.  In such a case the scheduler 
>> should throttle-down on the number of available registers and try to not go 
>> beyond call-clobbered registers.
>> 
>> The patch uses basic block frequencies to balance the cost of using 
>> call-used registers for intermediate cases between the two above extremes.
>> 
>> The motivation for this patch was a floating-point testcase on 
>> arm-linux-gnueabihf (ARM is one of the few targets that use register 
>> pressure scheduling by default).
>> 
>> A "thanks" goes to Richard good discussion of the problem and suggestions on 
>> the approach to fix it.
>> 
>> The patch was bootstrapped on x86_64-linux-gnu (which doesn't really 
>> exercises the patch), and cross-tested on arm-linux-gnueabihf and 
>> aarch64-linux-gnu.
>> 
>> OK to apply?
>> 
> It is a pretty interesting idea for heuristic, Maxim.
> 
> But I don't understand the following loop:
> 
> +  for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
> + if (call_used_regs[i])
> +   for (c = 0; c < ira_pressure_classes_num; ++c)
> + {
> +   enum reg_class cl = ira_pressure_classes[c];
> +   if (ira_class_hard_regs[cl][i])
> + ++call_used_regs_num[cl];
> 
> 
> ira_class_hard_regs[cl] is array containing hard registers belonging to
> class CL.  So if GENERAL_REGS consists of hard regs 0..3, 12..15,  the
> array will contain 8 elements 0..3, 12..15.  The array size is defined
> by ira_class_hard_regs_num[cl].  So the index is order number of hard
> reg in the class (starting from 0) but not hard register number itself. 
> Also the pressure classes never intersect so you can stop the inner loop
> when you find class to which hard reg belongs to.

Thanks for spotting this.  Indeed, this is a bug, but it still happened to 
correctly calculate numbers of call-used register for ARM (where I debugged the 
implementation).

> 
> I believe you should rewrite the code and get performance results again
> to get an approval.

Sebastian, could you run the geekbench again to make sure you see same 
performance numbers?

>   You also missed the changelog.
> 

The changelog was in the separate file.  Also attached here with the fixed 
patch.  Bootstrapped on x86_64-linux-gnu.  Bootstrap and regtest on 
arm-linux-gnueabihf is in progress.

--
Maxim Kuvyrkov
www.linaro.org




0001-sched_class_reg_num.ChangeLog
Description: Binary data


0001-sched_class_reg_num.patch
Description: Binary data

Re: [PATCH 5/8] Enable max_issue for AArch32 and AArch64

2014-10-22 Thread Maxim Kuvyrkov

On Oct 23, 2014, at 11:42 AM, Sebastian Pop  wrote:

> Sebastian Pop wrote:
>> Maxim Kuvyrkov wrote:
>>> This change requires benchmarking, which I can't easily do at the moment.  I
>>> would appreciate any benchmarking results that you can share.
>> 
>> I will run my testsuite on aarch64.
>> Do you need the perf for all the previous patches 1 to 5 together, or just 
>> for patch 5?
> 
> I ran your patch 5 on top of r216539 at -O3 and there were not many 
> differences
> in performance, and of very little amplitude.  Some small speedups, and some
> small slowdowns.  Let me know if you want a reduced testcase for the most
> significative slowdown.

Hi Sebastian,

This is, pretty much, what I expected.  This patch enables infrastructure for 
patch 7, and, by itself, was not supposed to give measurable speed ups.  Patch 
7 should improve performance for Cortex-A15 (and any other cores that have 
similar auto-prefetcher hardware).

Thank you for the benchmarking!

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH 6/8] Handle SCRATCH in decompose_address

2014-10-22 Thread Maxim Kuvyrkov

On Oct 23, 2014, at 9:02 AM, Jeff Law  wrote:

> On 10/20/14 21:35, Maxim Kuvyrkov wrote:
>> Hi,
>> 
>> This patch is a simple fix to allow decompose_address to handle
>> SCRATCH'es during 2nd scheduler pass. This patch is a prerequisite
>> for a scheduler improvement that relies on decompose_address to parse
>> insns.
>> 
>> Bootstrapped and regtested on x86_64-linux-gnu and regtested on
>> arm-linux-gnueabihf and aarch64-linux-gnu.
> I'd like to see some further discussion here.
> 
> get_base_term is supposed to look at its argument as a base address. I'm 
> curious under what circumstances you want to have a SCRATCH as a base address?
> 
> I didn't see anything in patch #8 which obviously dependended on this, but 
> maybe it's in there, but more subtle than expected.
> 
> If you can justify why it's useful to handle scratch in here, then the patch 
> will be fine.

Without this patch decompose_address() ICEs during second scheduler pass on 
prologue instructions that usually have "(clobber (mem:BLK (scratch))".  The 
only reason for this patch is to prevent that fault and enable use of 
decompose_address during 2nd scheduler pass.

Does this answer your question, or are you looking for a more in-depth reason?

Thank you,

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH] Account for prologue spills in reg_pressure scheduling

2014-10-22 Thread Maxim Kuvyrkov

On Oct 23, 2014, at 3:45 AM, Vladimir Makarov  wrote:

...
> Richard proposed 2-loops solution instead of 3.  So I'd definitely prefer his.
> 
> The patch is ok with Richard's 2-loops proposal.  Thanks, Maxim.  As I wrote 
> the heuristic is an interesting one.

This is the patch I'm going to commit after some more testing.  Thank you both 
reviews!

--
Maxim Kuvyrkov
www.linaro.org



0001-sched_class_reg_num.ChangeLog
Description: Binary data


0001-sched_class_reg_num.patch
Description: Binary data

Re: [PATCH 3/8] Remove cached_first_cycle_multipass_dfa_lookahead and cached_issue_rate

2014-10-22 Thread Maxim Kuvyrkov

On Oct 22, 2014, at 4:53 AM, Vladimir Makarov  wrote:

> On 10/20/2014 11:16 PM, Maxim Kuvyrkov wrote:
>> Hi,
>> 
>> This patch cleans up (removes) cached_first_cycle_multipass_dfa_lookahead 
>> and cached_issue_rate.
>> 
>> These seem to be an artifact from the scheduler refactoring 10+ years ago.  
>> They assume that dfa_lookahead and issue_rate can change mid-way through 
>> scheduling, which is never the case.  All backends currently treat 
>> dfa_lookahead and issue_rate as constants for the duration of scheduling 
>> passes.
>> 
>> Bootstrapped on x86_64-linux-gnu.  Regression testing is in progress.  OK to 
>> commit if no regressions?
>> 
> Yes.  the patch for issue rate itself is ok but you should have modified
> doc/tm.texi too for dfa lookahead hook saying that it should be a
> constant (issue rate already has such clause).
> 
> On the other hand I'd not assume that dfa look ahead is a constant.  In
> future we could make it non-constant to differentiate non-hot and hot
> functions to speed up the scheduler as dfa look ahead scheduling is
> pretty expensive.
> 
> So issue rate change is ok but I'd not rush to change dfa look ahead
> related code.

It is trivial to prove that currently 
cached_first_cycle_multipass_dfa_lookahead always has the same value as 
dfa_lookahead.  And, even should 
targetm.sched.first_cycle_multipass_dfa_lookahead start returning different 
values, max_issue will happily continue to use the value that the hook returned 
in sched_init().

Also, your suggestion to use different dfa_lookahead values for hot/cold 
functions is not affected by this patch.  The values of dfa_lookahead variable 
have scope of the scheduling pass, which is invoked separately for every 
function.

I guess, you could, potentially, start differentiating values of dfa_lookahead 
based on whether a basic_block is hot or cold, but handling this scenario would 
require significant changes throughout the scheduler.

To summarize, dfa_lookahead is currently a pass-time invariant, that is free to 
change between invocations of the scheduler pass.  
Cached_first_cycle_multipass_dfa_lookahead always holds the same value as 
dfa_lookahead, and, as such, is extraneous.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org

[PATCH] New debug parameter sched-pressure-factor

2014-10-22 Thread Maxim Kuvyrkov

Hi,

For benchmarking and tuning of register-pressure scheduling I've made the 
following simple patch to adjust effective number of registers that scheduler 
has at its disposal.  Do you think this would be valuable for others to 
investigate performance problems or tune register-pressure scheduling?

The parameter scales number of register by the given number of percent (the 
default is 100, which is the current status quo).  At value of 50 the scheduler 
attempt to use only half of available registers, while value of 200 will cause 
it to assume double the number of available registers (and, likely, cause a lot 
of spills).

Thank you,

--
Maxim Kuvyrkov
www.linaro.org




0008-Sched_pressure_factor.patch
Description: Binary data

Re: [PATCH] Account for prologue spills in reg_pressure scheduling

2014-10-23 Thread Maxim Kuvyrkov

On Oct 23, 2014, at 8:20 PM, Richard Sandiford  
wrote:

> Maxim Kuvyrkov  writes:
>> @@ -6701,6 +6753,24 @@ alloc_global_sched_pressure_data (void)
>>saved_reg_live = BITMAP_ALLOC (NULL);
>>region_ref_regs = BITMAP_ALLOC (NULL);
>>  }
>> +
>> +  /* Calculate number of CALL_USED_REGS in register classes that
>> + we calculate register pressure for.  */
>> +  for (int c = 0; c < ira_pressure_classes_num; ++c)
>> +{
>> +  enum reg_class cl = ira_pressure_classes[c];
>> +
>> +  call_used_regs_num[cl] = 0;
>> +
>> +  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
>> +if (call_used_regs[ira_class_hard_regs[cl][i]])
>> +  {
>> +/* Register I belongs to pressure class CL.  Pressure
>> +   classes do not intersect, so don't look further.  */
>> +++call_used_regs_num[cl];
>> +break;
>> +  }
>> +}
> 
> I don't think we want the break here.  The effect would be to count
> at most one call-used register per pressure class.

You make me feel dumb :-)

Thank you,

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH, sched] Cleanup and improve multipass_dfa_lookahead_guard

2014-06-03 Thread Maxim Kuvyrkov

On Jun 2, 2014, at 7:54 PM, Andreas Schwab  wrote:

> Regtested on ia64-suse-linux and installed as obvious.
> 
> Andreas.
> 
>   * config/ia64/ia64.c
>   (ia64_first_cycle_multipass_dfa_lookahead_guard): Check
>   pending_data_specs first.
> 
> diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
> index 118e5bf..4c5390b 100644
> --- a/gcc/config/ia64/ia64.c
> +++ b/gcc/config/ia64/ia64.c
> @@ -7536,7 +7536,7 @@ ia64_first_cycle_multipass_dfa_lookahead_guard (rtx 
> insn, int ready_index)
> 
>   /* Size of ALAT is 32.  As far as we perform conservative
>  data speculation, we keep ALAT half-empty.  */
> -  if ((TODO_SPEC (insn) & BEGIN_DATA) && pending_data_specs >= 16)
> +  if (pending_data_specs >= 16 && (TODO_SPEC (insn) & BEGIN_DATA))
> return ready_index == 0 ? -1 : 1;
> 
>   if (ready_index == 0)

Thanks for fixing this.  To make it a bit more robust I would suggest using 
same condition as in ia64_variable_issue():
===
  if (sched_deps_info->generate_spec_deps && !sel_sched_p ())
/* Modulo scheduling does not extend h_i_d when emitting
   new instructions.  Don't use h_i_d, if we don't have to.  */
{
  if (DONE_SPEC (insn) & BEGIN_DATA)
pending_data_specs++;
  if (CHECK_SPEC (insn) & BEGIN_DATA)
pending_data_specs--;
}
===

The underlaying problem is that TODO_SPEC(insn) should not be accessed for 
sel-sched.  Because pending_data_specs turns out to always be null during 
sel-sched, "pending_data_specs >= 16" can be used as a shorthand for
 "sched_deps_info->generate_spec_deps && !sel_sched_p () && pending_data_specs 
>= 16"

--
Maxim Kuvyrkov
www.linaro.org

Re: [AArch64][PR65139] use clobber with match_scratch for aarch64_lshr_sisd_or_int_3

2015-04-18 Thread Maxim Kuvyrkov

> On Apr 18, 2015, at 8:21 PM, Richard Earnshaw  
> wrote:
> 
> On 18/04/15 16:13, Jakub Jelinek wrote:
>> On Sat, Apr 18, 2015 at 03:07:16PM +0100, Richard Earnshaw wrote:
>>> You need to ensure that your scratch register cannot overlap op1, since
>>> the scratch is written before op1 is read.
>> 
>> -   (clobber (match_scratch:QI 3 "=X,w,X"))]
>> +   (clobber (match_scratch:QI 3 "=X,&w,X"))]
>> 
>> incremental diff should ensure that, right?
>> 
>>  Jakub
>> 
> 
> 
> Sorry, where in the patch is that hunk?
> 
> I see just:
> 
> +   (clobber (match_scratch:QI 3 "=X,w,X"))]

Jakub's suggestion is an incremental patch on top of Kugan's.

> 
> And why would early clobbering the scratch be notably better than the
> original?
> 

It will still be better.  With this patch we want to allow RA freedom to 
optimally handle both of the following cases:

1. operand[1] dies after the instruction.  In this case we want operand[0] and 
operand[1] to be assigned to the same reg, and operand[3] to be assigned to a 
different register to provide a temporary.  In this case we don't care whether 
operand[3] is early-clobber or not.  This case is not optimally handled with 
current insn patterns.

2. operand[1] lives on after the instruction.  In this case we want operand[0] 
and operand[3] to be assigned to the same reg, and not clobber operand[1].  By 
marking operand[3] early-clobber we ensure that operand[1] is in a different 
register from what operand[0] and operand[3] were assigned to.  This case 
should be handled equally well before and after the patch.

My understanding is that Kugan's patch with Jakub's fix on top satisfy both of 
these cases.

--
Maxim Kuvyrkov
www.linaro.org

1 2 3 4 >

1 - 100 of 397 matches

Mail list logo