Re: [PATCH v2] ARM: Block predication on atomics [PR111235]
> On Oct 1, 2023, at 00:36, Ramana Radhakrishnan > wrote: > > + linaro-toolchain as I don't understand the CI issues on patchwork. > > ... > Ok if no regressions but as you might get nagged by the post commit CI ... I don't see any pre-commit failures for this patch, but regardless of what results are for pre-commit CI, there's always a chance to identify problems in post-commit CI -- simply because we test wa-a-ay more configurations in post-commit CI than in pre-commit CI. > > While it is not policy yet to look at these bots but given the > enthusiasm at Cauldron for patchwork and pre-commit CI and because all > my armhf boxes are boxed up, I decided to do something a bit novel ! > > I tried reviewing this via patchwork > > https://patchwork.sourceware.org/project/gcc/patch/pawpr08mb8982a6aa40749b74cad14c5783...@pawpr08mb8982.eurprd08.prod.outlook.com/ > > and notice that > > https://ci.linaro.org/job/tcwg_gcc_build--master-arm-precommit/2393/artifact/artifacts/artifacts.precommit/notify/mail-body.txt > says nothing could be built. Um, no. This says ... === Results changed to # reset_artifacts: -10 # true: 0 # build_abe gcc: 1 From # reset_artifacts: -10 # true: 0 # build_abe gcc: 1 === ... i.e., build succeeded both before and after patch. We'll change the boilerplate intro for successful builds from ... "Dear contributor, our automatic CI has detected problems related to your patch(es)." ... to ... "Dear contributor, you are awesome, no CI failures related to your patch(es)". One things that is strange -- testsuite builds were not triggered, we have only 2 reports from build tests, but are missing another 2 reports from testsuite tests. > > Possibly worth double checking the status for it being a false > negative as to why the build failed. Pre-commit CI is happy with the patch, albeit testsuite checks didn't run for some reason. Regardless, we'll quickly catch and report any fallout in the post-commit CI once the patch is merged. > > It was green on patchwork but remembering that Green is not Green for > CI in patchwork I clicked on the afore mentioned ci.linaro.org link > and see that it's actually broken. Unfortunately, I seem to have confused developers about green and red at my Cauldron presentation. "Green/Red" in patchwork mean the usual PASS/FAIL. It's only in post-commit CI in jenkins interface green and red mean something different. -- Maxim Kuvyrkov https://www.linaro.org
[PATCH] Format gotools.sum closer to what DejaGnu does
... to restore compatability with validate_failures.py . The testsuite script validate_failures.py expects "Running ..." to extract values, and gotools.sum provided "Running ". Note that libgo.sum, which also uses Makefile logic to generate DejaGnu-like output, already has "..." suffix. gotools/ChangeLog: * Makefile.am: Update "Running ..." output * Makefile.in: Regenerate. --- gotools/Makefile.am | 4 ++-- gotools/Makefile.in | 5 +++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/gotools/Makefile.am b/gotools/Makefile.am index 7b5302990f8..d2376b9c25b 100644 --- a/gotools/Makefile.am +++ b/gotools/Makefile.am @@ -332,8 +332,8 @@ check: check-head check-go-tool check-runtime check-cgo-test check-carchive-test @cp gotools.sum gotools.log @for file in cmd_go-testlog runtime-testlog cgo-testlog carchive-testlog cmd_vet-testlog embed-testlog; do \ testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \ - echo "Running $${testname}" >> gotools.sum; \ - echo "Running $${testname}" >> gotools.log; \ + echo "Running $${testname} ..." >> gotools.sum; \ + echo "Running $${testname} ..." >> gotools.log; \ sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> gotools.log; \ grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' -e 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \ done diff --git a/gotools/Makefile.in b/gotools/Makefile.in index 2783b91ef4b..9cc238e748d 100644 --- a/gotools/Makefile.in +++ b/gotools/Makefile.in @@ -317,6 +317,7 @@ pdfdir = @pdfdir@ prefix = @prefix@ program_transform_name = @program_transform_name@ psdir = @psdir@ +runstatedir = @runstatedir@ sbindir = @sbindir@ sharedstatedir = @sharedstatedir@ srcdir = @srcdir@ @@ -1003,8 +1004,8 @@ mostlyclean-local: @NATIVE_TRUE@ @cp gotools.sum gotools.log @NATIVE_TRUE@ @for file in cmd_go-testlog runtime-testlog cgo-testlog carchive-testlog cmd_vet-testlog embed-testlog; do \ @NATIVE_TRUE@testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \ -@NATIVE_TRUE@echo "Running $${testname}" >> gotools.sum; \ -@NATIVE_TRUE@echo "Running $${testname}" >> gotools.log; \ +@NATIVE_TRUE@echo "Running $${testname} ..." >> gotools.sum; \ +@NATIVE_TRUE@echo "Running $${testname} ..." >> gotools.log; \ @NATIVE_TRUE@sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> gotools.log; \ @NATIVE_TRUE@grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' -e 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \ @NATIVE_TRUE@ done -- 2.34.1
Re: [PATCH 01/12] [contrib] validate_failures.py: Avoid testsuite aliasing
Patch proposed at https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635000.html -- Maxim Kuvyrkov https://www.linaro.org > On Sep 27, 2023, at 18:47, Maxim Kuvyrkov wrote: > > Hi Bernhard, > > Thanks, I meant to fix this, but forgot. > > The underlying problem here is that we want to detect which sub-testsuites > had failures. Current regex doesn't match go's case because there is no > "..." at the end: "Running foo" vs "Running foo ..." . > > My preferred way of fixing this is to make go's testsuite print out "..." . > We have a similar patch for glibc [1]. > > [1] https://sourceware.org/pipermail/libc-alpha/2023-June/148702.html > > -- > Maxim Kuvyrkov > https://www.linaro.org > >> On Sep 26, 2023, at 19:46, Bernhard Reutner-Fischer >> wrote: >> >> Hi Maxim! >> >> On Mon, 5 Jun 2023 18:06:25 +0400 >> Maxim Kuvyrkov via Gcc-patches wrote: >> >>>> On Jun 3, 2023, at 19:17, Jeff Law wrote: >>>> >>>> On 6/2/23 09:20, Maxim Kuvyrkov via Gcc-patches wrote: >>>>> This patch adds tracking of current testsuite "tool" and "exp" >>>>> to the processing of .sum files. This avoids aliasing between >>>>> tests from different testsuites with same name+description. >>>>> E.g., this is necessary for testsuite/c-c++-common, which is ran >>>>> for both gcc and g++ "tools". >>>>> This patch changes manifest format from ... >>>>> >>>>> FAIL: gcc_test >>>>> FAIL: g++_test >>>>> >>>>> ... to ... >>>>> >>>>> === gcc tests === >>>>> Running gcc/foo.exp ... >>>>> FAIL: gcc_test >>>>> === gcc Summary == >>>>> === g++ tests === >>>>> Running g++/bar.exp ... >>>>> FAIL: g++_test >>>>> === g++ Summary == >>>>> . >>>>> The new format uses same formatting as DejaGnu's .sum files >>>>> to specify which "tool" and "exp" the test belongs to. >>>> I think the series is fine. You're not likely to hear from Diego or Doug >>>> I suspect, I don't think either are involved in GNU stuff anymore. >>>> >>> >>> Thanks, Jeff. I'll wait for a couple of days and will merge if there are >>> no new comments. >> >> Maxim, may i ask you to have a look at the following problem, please? >> >> ISTM that your exp code does not work as expected for go, maybe you >> forgot to test the changes with go enabled? >> >> Ever since your changes in summer i see the following: >> >> gcc-14.mine$ >> /scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py >> --clean_build ../gcc-14.orig/ >> Getting actual results from build directory . >> ./gcc/testsuite/go/go.sum >> ./gcc/testsuite/gcc/gcc.sum >> ./gcc/testsuite/objc/objc.sum >> ./gcc/testsuite/jit/jit.sum >> ./gcc/testsuite/gdc/gdc.sum >> ./gcc/testsuite/gnat/gnat.sum >> ./gcc/testsuite/ada/acats/acats.sum >> ./gcc/testsuite/g++/g++.sum >> ./gcc/testsuite/obj-c++/obj-c++.sum >> ./gcc/testsuite/rust/rust.sum >> ./gcc/testsuite/gfortran/gfortran.sum >> ./x86_64-pc-linux-gnu/libgomp/testsuite/libgomp.sum >> ./x86_64-pc-linux-gnu/libphobos/testsuite/libphobos.sum >> ./x86_64-pc-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum >> ./x86_64-pc-linux-gnu/libffi/testsuite/libffi.sum >> ./x86_64-pc-linux-gnu/libitm/testsuite/libitm.sum >> ./x86_64-pc-linux-gnu/libgo/libgo.sum >> ./x86_64-pc-linux-gnu/libatomic/testsuite/libatomic.sum >> ./gotools/gotools.sum >> .sum file seems to be broken: tool="gotools", exp="None", >> summary_line="FAIL: TestScript" >> Traceback (most recent call last): >> File >> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", >> line 732, in >> retval = Main(sys.argv) >> File >> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", >> line 721, in Main >> retval = CompareBuilds() >> File >> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", >> line 622, in CompareBuilds >> actual = GetResults(sum_files) >> File >> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", >> line 466, in GetResults >>
Re: [PATCH] Format gotools.sum closer to what DejaGnu does
> On Nov 2, 2023, at 21:02, rep.dot@gmail.com wrote: > > Hi Maxim! > > Many thanks for the patch! Quick question below.. > > On 2 November 2023 13:48:55 CET, Maxim Kuvyrkov > wrote: >> ... to restore compatability with validate_failures.py . >> The testsuite script validate_failures.py expects >> "Running ..." to extract values, >> and gotools.sum provided "Running ". >> >> Note that libgo.sum, which also uses Makefile logic to generate >> DejaGnu-like output, already has "..." suffix. >> >> gotools/ChangeLog: >> >> * Makefile.am: Update "Running ..." output >> * Makefile.in: Regenerate. >> --- >> gotools/Makefile.am | 4 ++-- >> gotools/Makefile.in | 5 +++-- >> 2 files changed, 5 insertions(+), 4 deletions(-) >> >> diff --git a/gotools/Makefile.am b/gotools/Makefile.am >> index 7b5302990f8..d2376b9c25b 100644 >> --- a/gotools/Makefile.am >> +++ b/gotools/Makefile.am >> @@ -332,8 +332,8 @@ check: check-head check-go-tool check-runtime >> check-cgo-test check-carchive-test >> @cp gotools.sum gotools.log >> @for file in cmd_go-testlog runtime-testlog cgo-testlog carchive-testlog >> cmd_vet-testlog embed-testlog; do \ >> testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \ >> - echo "Running $${testname}" >> gotools.sum; \ >> - echo "Running $${testname}" >> gotools.log; \ >> + echo "Running $${testname} ..." >> gotools.sum; \ >> + echo "Running $${testname} ..." >> gotools.log; \ >> sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> gotools.log; \ >> grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' -e >> 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \ >> done >> diff --git a/gotools/Makefile.in b/gotools/Makefile.in >> index 2783b91ef4b..9cc238e748d 100644 >> --- a/gotools/Makefile.in >> +++ b/gotools/Makefile.in >> @@ -317,6 +317,7 @@ pdfdir = @pdfdir@ >> prefix = @prefix@ >> program_transform_name = @program_transform_name@ >> psdir = @psdir@ >> +runstatedir = @runstatedir@ > > Are you sure you used the correct version of automake? I used automake 1.15.1 (from Ubuntu 20.04 automake-1.15 package), and I double-checked after getting the runstatedir update. I would appreciate someone checking on their side to make sure I don't have something weird going on in my setup. -- Maxim Kuvyrkov https://www.linaro.org
[PATCH v2] Format gotools.sum closer to what DejaGnu does
The only difference compared to v1 is using vanilla automake 1.15.1 to regenerate Makefile.in. I'll merge this as obvious if no-one objects in a day. === ... to restore compatability with validate_failures.py . The testsuite script validate_failures.py expects "Running ..." to extract values, and gotools.sum provided "Running ". Note that libgo.sum, which also uses Makefile logic to generate DejaGnu-like output, already has "..." suffix. gotools/ChangeLog: * Makefile.am: Update "Running ..." output * Makefile.in: Regenerate. Signed-off-by: Maxim Kuvyrkov --- gotools/Makefile.am | 4 ++-- gotools/Makefile.in | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/gotools/Makefile.am b/gotools/Makefile.am index 7b5302990f8..d2376b9c25b 100644 --- a/gotools/Makefile.am +++ b/gotools/Makefile.am @@ -332,8 +332,8 @@ check: check-head check-go-tool check-runtime check-cgo-test check-carchive-test @cp gotools.sum gotools.log @for file in cmd_go-testlog runtime-testlog cgo-testlog carchive-testlog cmd_vet-testlog embed-testlog; do \ testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \ - echo "Running $${testname}" >> gotools.sum; \ - echo "Running $${testname}" >> gotools.log; \ + echo "Running $${testname} ..." >> gotools.sum; \ + echo "Running $${testname} ..." >> gotools.log; \ sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> gotools.log; \ grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' -e 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \ done diff --git a/gotools/Makefile.in b/gotools/Makefile.in index 2783b91ef4b..36c2ec2abd3 100644 --- a/gotools/Makefile.in +++ b/gotools/Makefile.in @@ -1003,8 +1003,8 @@ mostlyclean-local: @NATIVE_TRUE@ @cp gotools.sum gotools.log @NATIVE_TRUE@ @for file in cmd_go-testlog runtime-testlog cgo-testlog carchive-testlog cmd_vet-testlog embed-testlog; do \ @NATIVE_TRUE@testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \ -@NATIVE_TRUE@echo "Running $${testname}" >> gotools.sum; \ -@NATIVE_TRUE@echo "Running $${testname}" >> gotools.log; \ +@NATIVE_TRUE@echo "Running $${testname} ..." >> gotools.sum; \ +@NATIVE_TRUE@echo "Running $${testname} ..." >> gotools.log; \ @NATIVE_TRUE@sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> gotools.log; \ @NATIVE_TRUE@grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' -e 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \ @NATIVE_TRUE@ done -- 2.34.1
Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]
Hi Richard, You patch below increases code-size on aarch64-linux-gnu with -Os on SPEC2k6 400.perlbench and 453.povray -- by 1% and 2% respectively. 400.perlbench,perlbench_base.default, 101,939261,951221 453.povray,povray_base.default, 102,707807,721399 Would you please check whether these can be avoided? [Let me know if you need help reproducing this problem.] Thank you, -- Maxim Kuvyrkov https://www.linaro.org > On Jan 27, 2020, at 7:41 PM, Richard Sandiford > wrote: > > In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have: > > Failed to match this instruction: > (set (reg:SI 95) >(ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0) >(const_int 3 [0x3]) >(const_int 0 [0])) 0) >(const_int 19 [0x13]))) > > If we perform the natural simplification to: > > (set (reg:SI 95) >(ashift:SI (sign_extract:SI (reg:SI 97) >(const_int 3 [0x3]) >(const_int 0 [0])) 0) >(const_int 19 [0x13]))) > > then the pattern matches. And it turns out that we do have a > simplification like that already, but it would only kick in for > extractions from a reg, not a subreg. E.g.: > > (set (reg:SI 95) >(ashift:SI (subreg:SI (sign_extract:DI (reg:DI X) >(const_int 3 [0x3]) >(const_int 0 [0])) 0) >(const_int 19 [0x13]))) > > would simplify to: > > (set (reg:SI 95) >(ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0) >(const_int 3 [0x3]) >(const_int 0 [0])) 0) >(const_int 19 [0x13]))) > > IMO the subreg case is even more obviously a simplification > than the bare reg case, since the net effect is to remove > either one or two subregs, rather than simply change the > position of a subreg/truncation. > > However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c > for -m32 on x86_64-linux-gnu, because we could then simplify > a :HI zero_extract to a :QI one. The associated *testqi_ext_3 > pattern did already seem to want to handle QImode extractions: > > "ix86_match_ccmode (insn, CCNOmode) > && ((TARGET_64BIT && GET_MODE (operands[2]) == DImode) > || GET_MODE (operands[2]) == SImode > || GET_MODE (operands[2]) == HImode > || GET_MODE (operands[2]) == QImode) > > but I'm not sure how often the QI case would trigger in practice, > since the zero_extract mode was restricted to HI and above. I checked > the other x86 patterns and couldn't see any other instances of this. > > Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu, > OK to install? > > Richard > > > 2020-01-27 Richard Sandiford > > gcc/ > PR rtl-optimization/87763 > * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract > simplification to handle subregs as well as bare regs. > * config/i386/i386.md (*testqi_ext_3): Match QI extracts too. > --- > gcc/config/i386/i386.md | 2 +- > gcc/simplify-rtx.c | 4 +++- > 2 files changed, 4 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index 6e9c9bd2fb6..a125ab350bb 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -8927,7 +8927,7 @@ (define_insn "*testqi_ext_2" > (define_insn_and_split "*testqi_ext_3" > [(set (match_operand 0 "flags_reg_operand") > (match_operator 1 "compare_operator" > - [(zero_extract:SWI248 > + [(zero_extract:SWI >(match_operand 2 "nonimmediate_operand" "rm") >(match_operand 3 "const_int_operand" "n") >(match_operand 4 "const_int_operand" "n")) > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > index eff1d07a253..db4f9339c15 100644 > --- a/gcc/simplify-rtx.c > +++ b/gcc/simplify-rtx.c > @@ -736,7 +736,9 @@ simplify_truncation (machine_mode mode, rtx op, > (*_extract:M1 (truncate:M1 (reg:M2)) (len) (pos')) if possible without > changing len. */ > if ((GET_CODE (op) == ZERO_EXTRACT || GET_CODE (op) == SIGN_EXTRACT) > - && REG_P (XEXP (op, 0)) > + && (REG_P (XEXP (op, 0)) > + || (SUBREG_P (XEXP (op, 0)) > + && REG_P (SUBREG_REG (XEXP (op, 0) > && GET_MODE (XEXP (op, 0)) == GET_MODE (op) > && CONST_INT_P (XEXP (op, 1)) > && CONST_INT_P (XEXP (op, 2))) > -- > 2.17.1 >
Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64
> On Feb 29, 2024, at 21:59, Richard Earnshaw (lists) > wrote: > > On 29/02/2024 17:55, Andrew Pinski (QUIC) wrote: >>> -Original Message- >>> From: Maxim Kuvyrkov >>> Sent: Thursday, February 29, 2024 9:46 AM >>> To: Andrew Pinski (QUIC) >>> Cc: Evgeny Karpov ; Andrew Pinski >>> ; Richard Sandiford ; gcc- >>> patc...@gcc.gnu.org; 10wa...@gmail.com; m...@harmstone.com; Zac >>> Walker ; Ron Riddle >>> ; Radek Barton >>> Subject: Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW >>> environments for AArch64 >>> >>> WARNING: This email originated from outside of Qualcomm. Please be wary >>> of any links or attachments, and do not enable macros. >>> >>>> On Feb 29, 2024, at 21:35, Andrew Pinski (QUIC) >>> wrote: >>>> >>>> >>>> >>>>> -----Original Message- >>>>> From: Evgeny Karpov >>>>> Sent: Thursday, February 29, 2024 8:46 AM >>>>> To: Andrew Pinski >>>>> Cc: Richard Sandiford ; gcc- >>>>> patc...@gcc.gnu.org; 10wa...@gmail.com; Maxim Kuvyrkov >>>>> ; m...@harmstone.com; Zac Walker >>>>> ; Ron Riddle ; >>>>> Radek Barton ; Andrew Pinski (QUIC) >>>>> >>>>> Subject: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments >>>>> for AArch64 >>>>> >>>>> Wednesday, February 28, 2024 2:00 AM >>>>> Andrew Pinski wrote: >>>>> >>>>>> What does this mean with respect to C++ exceptions? Or you using >>>>>> SJLJ exceptions support or the dwarf unwinding ones without SEH >>> support? >>>>>> I am not sure if SJLJ exceptions is well tested any more in GCC either. >>>>>> >>>>>> Also I have a question if you ran the full GCC/G++ testsuites and >>>>>> what were the results? >>>>>> If you did run it, did you use a cross compiler or the native >>>>>> compiler? Did you do a bootstrap (GCC uses C++ but no exceptions >>> though)? >>>>> >>>>> As mentioned in the cover letter and the thread, the current >>>>> contribution covers only the C scope. >>>>> Exception handling is fully disabled for now. >>>>> There is an experimental build with C++ and SEH, however, it is not >>>>> included in the plan for the current contribution. >>>>> >>>>> https://github.com/Windows-on-ARM-Experiments/mingw-woarm64- >>> build >>>>> >>>>>> If you run using a cross compiler, did you use ssh or some other >>>>>> route to run the applications? >>>>>> >>>>>> Thanks, >>>>>> Andrew Pinski >>>>> >>>>> GitHub Actions are used to cross-compile toolchains, packages and >>>>> tests, and execute tests on Windows Arm64. >>>> >>>> This does not answer my question because what you are running is just >>> simple testcases and not the FULL GCC testsuite. >>>> So again have you ran the GCC testsuite and do you have a dejagnu board to >>> be able to execute the binaries? >>>> I think without the GCC testsuite ran to find all of the known failures, >>>> you are >>> going to be running into many issues. >>>> The GCC testsuite includes many tests for ABI corner cases and many >>> features that you will most likely not think about testing using your simple >>> testcases. >>>> In fact I suspect there will be some of the aarch64 testcases which will >>>> need >>> to be modified for the windows ABI which you have not done yet. >>> >>> Hi Andrew, >>> >>> We (Linaro) have a prototype CI loop setup for testing aarch64-w64- >>> mingw32, and we have results for gcc-c and libatomic -- see [1]. >>> >>> The results are far from clean, but that's expected. This patch series >>> aims at >>> enabling C hello-world only, and subsequent patch series will improve the >>> state of the port. >>> >>> [1] https://ci.linaro.org/job/tcwg_gnu_mingw_check_gcc--master-woa64- >>> build/6/artifact/artifacts/sumfiles/ >> >> Looking at these results, this port is not in any shape or form to be >> upstreamed right now. Even simple -g will cause failures. >> Note we don't need a clean testsuite run but the patch series is not even >> allowing enabling hello world due to the -g not being able to used. >> > > It seemed to me as though the patch was posted for comments, not for > immediate inclusion. I agree this isn't ready for committing yet, but > neither should the submitters wait until it's perfect before posting it. > > I think it's gcc-15 material, so now is about the right time to be thinking > about it. Hi Andrew, I agree with Richard. This patch series is large as is, and it has clear goals: 1. Enable aarch64-w64-mingw32 to compile C-language hello-world. 2. Not regress any other targets. As far as I know, it achieves both, but both Microsoft and Linaro will do additional testing on x86_64-w64-mingw32 to confirm. There are more features and fixes for aarch64-w64-mingw32 waiting in the development repos on github, but I don't see any point in cleaning them up and preparing for submission before this already-quiet-large patchset if reviewed. Thank you, -- Maxim Kuvyrkov https://www.linaro.org
[PATCH] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests
This patch has been tested on - aarch64-linux-gnu - arm-linux-gnueabihf (VFP, NEON disabled by default), - arm-none-eabi (Soft-FP) with the following [expected] differences in the test results: - FAIL now PASS [FAIL => PASS]: Executed from: gcc:gcc.dg/vect/vect.exp gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors) gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects (test for excess errors) - UNSUPPORTED disappears[UNSUP=> ]: Executed from: g++:g++.dg/vect/vect.exp g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc -std=gnu++98 - UNSUPPORTED appears [ =>UNSUP]: Executed from: g++:g++.dg/vect/vect.exp g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc -std=c++98 - UNRESOLVED disappears [UNRES=> ]: Executed from: gcc:gcc.dg/vect/vect.exp gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects compilation failed to produce executable gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to produce executable This patch was motivated by gcc.dg/vect/pr113576.c, which currently fails to compile for ARM targets without NEON. === CUT === Testsuites driven by vect.exp rely on check_vect_support_and_set_flags to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add -mfpu=neon for arm-linux-gnueabi). Unfortunately, these flags are overwritten by dg-options directive, which can cause tests to fail. Behavior of dg-options is documented in vect.exp files, but not all developers look at the .exp file when adding a new testcase. This caused a few dg-options directives to be used instead of the more appropriate dg-additional-options. This patch changes target-independent dg-options into dg-additional-options. This patch does not touch target-specific dg-options and target-specific tests to avoid disturbing the gentle balance of target-specific vectorization. This patch also removes a couple of unneeded "dg-do run" directives to avoid failures on compile-only targets. Default action is, again, set by check_vect_support_and_set_flags. Lastly, I avoided renaming tests that use -O options to O-* filename format because this support is not consistent between gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites. It seems dg-additional-options is cleaner. This patch does the following, 1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*; 2. do not change { dg-options FOO { target { target-*-pattern } } }; 3. do not remove { dg-do run { target { target-*-pattern } } }; 4. change { dg-options FOO } to { dg-additional-options FOO }; 5. remove { dg-do run } in several tests, where it is clearly not needed. gcc/testsuite/ChangeLog: PR testsuite/114307 * g++.dg/vect/pr84556.cc: Fixup. * gcc.dg/vect/complex/complex-operations-run.c Fixup. * gcc.dg/vect/gimplefe-40.c Fixup. * gcc.dg/vect/gimplefe-41.c Fixup. * gcc.dg/vect/pr101145inf.c Fixup. * gcc.dg/vect/pr101145inf_1.c Fixup. * gcc.dg/vect/pr108316.c Fixup. * gcc.dg/vect/pr109011-1.c Fixup. * gcc.dg/vect/pr109011-2.c Fixup. * gcc.dg/vect/pr109011-3.c Fixup. * gcc.dg/vect/pr109011-4.c Fixup. * gcc.dg/vect/pr109011-5.c Fixup. * gcc.dg/vect/pr111846.c Fixup. * gcc.dg/vect/pr111860-2.c Fixup. * gcc.dg/vect/pr111860-3.c Fixup. * gcc.dg/vect/pr113002.c Fixup. * gcc.dg/vect/pr113576.c Fixup. * gcc.dg/vect/pr84711.c Fixup. * gcc.dg/vect/pr85597.c Fixup. * gcc.dg/vect/pr88497-1.c Fixup. * gcc.dg/vect/pr88497-2.c Fixup. * gcc.dg/vect/pr88497-3.c Fixup. * gcc.dg/vect/pr88497-4.c Fixup. * gcc.dg/vect/pr88497-5.c Fixup. * gcc.dg/vect/pr88497-7.c Fixup. * gcc.dg/vect/pr92347.c Fixup. * gcc.dg/vect/pr93069.c Fixup. * gcc.dg/vect/pr97241.c Fixup. * gcc.dg/vect/pr99102.c Fixup. * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c Fixup. * gcc.dg/vect/vect-early-break_65.c Fixup. * gcc.dg/vect/vect-fold-1.c Fixup. * gcc.dg/vect/vect-ifcvt-19.c Fixup. * gcc.dg/vect/vect-ifcvt-20.c Fixup. * gcc.dg/vect/vect-reduc-epilogue-gaps.c Fixup. * gcc.dg/vect/vect-singleton_1.c Fixup. * gfortran.dg/vect/fast-math-mgrid-resid.f Fixup. * gfortran.dg/vect/pr77848.f Fixup. * gfortran.dg/vect/pr90913.f90 Fixup. --- gcc/testsuite/g++.dg/vect/pr84556.cc | 2 +- gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c | 1 - gcc/testsuite/gcc.dg/vect/gimplefe-40.c| 2 +- gcc/testsuite/gcc.dg/vect/gimplefe-41.c| 2 +- gcc/testsuite/gcc.dg/vect/pr101145inf.c| 2 +- gcc/testsuite/gcc.dg/vect/pr101145inf_1.c | 2 +- gcc/testsuite/gcc.dg/vect/pr
Re: [PATCH] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests
> On Mar 13, 2024, at 15:25, Richard Earnshaw > wrote: > > > > On 13/03/2024 10:58, Maxim Kuvyrkov wrote: >> This patch has been tested on >> - aarch64-linux-gnu >> - arm-linux-gnueabihf (VFP, NEON disabled by default), >> - arm-none-eabi (Soft-FP) >> with the following [expected] differences in the test results: >> - FAIL now PASS [FAIL => PASS]: >> Executed from: gcc:gcc.dg/vect/vect.exp >> gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors) >> gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects >> (test for excess errors) >> - UNSUPPORTED disappears[UNSUP=> ]: >> Executed from: g++:g++.dg/vect/vect.exp >> g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc -std=gnu++98 >> - UNSUPPORTED appears [ =>UNSUP]: >> Executed from: g++:g++.dg/vect/vect.exp >> g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc -std=c++98 >> - UNRESOLVED disappears [UNRES=> ]: >> Executed from: gcc:gcc.dg/vect/vect.exp >> gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects >> compilation failed to produce executable >> gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to >> produce executable >> This patch was motivated by gcc.dg/vect/pr113576.c, which currently >> fails to compile for ARM targets without NEON. >> === CUT === >> Testsuites driven by vect.exp rely on check_vect_support_and_set_flags >> to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add >> -mfpu=neon for arm-linux-gnueabi). Unfortunately, these flags are >> overwritten by dg-options directive, which can cause tests to fail. >> Behavior of dg-options is documented in vect.exp files, but not >> all developers look at the .exp file when adding a new testcase. >> This caused a few dg-options directives to be used instead of >> the more appropriate dg-additional-options. >> This patch changes target-independent dg-options into >> dg-additional-options. This patch does not touch target-specific >> dg-options and target-specific tests to avoid disturbing the gentle >> balance of target-specific vectorization. >> This patch also removes a couple of unneeded "dg-do run" directives >> to avoid failures on compile-only targets. Default action is, again, >> set by check_vect_support_and_set_flags. >> Lastly, I avoided renaming tests that use -O options to O-* >> filename format because this support is not consistent between >> gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites. >> It seems dg-additional-options is cleaner. >> This patch does the following, >> 1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*; >> 2. do not change { dg-options FOO { target { target-*-pattern } } }; >> 3. do not remove { dg-do run { target { target-*-pattern } } }; >> 4. change { dg-options FOO } to { dg-additional-options FOO }; >> 5. remove { dg-do run } in several tests, where it is clearly not needed. >> gcc/testsuite/ChangeLog: >> PR testsuite/114307 >> * g++.dg/vect/pr84556.cc: Fixup. >> * gcc.dg/vect/complex/complex-operations-run.c Fixup. >> * gcc.dg/vect/gimplefe-40.c Fixup. >> * gcc.dg/vect/gimplefe-41.c Fixup. >> * gcc.dg/vect/pr101145inf.c Fixup. >> * gcc.dg/vect/pr101145inf_1.c Fixup. >> * gcc.dg/vect/pr108316.c Fixup. >> * gcc.dg/vect/pr109011-1.c Fixup. >> * gcc.dg/vect/pr109011-2.c Fixup. >> * gcc.dg/vect/pr109011-3.c Fixup. >> * gcc.dg/vect/pr109011-4.c Fixup. >> * gcc.dg/vect/pr109011-5.c Fixup. >> * gcc.dg/vect/pr111846.c Fixup. >> * gcc.dg/vect/pr111860-2.c Fixup. >> * gcc.dg/vect/pr111860-3.c Fixup. >> * gcc.dg/vect/pr113002.c Fixup. >> * gcc.dg/vect/pr113576.c Fixup. >> * gcc.dg/vect/pr84711.c Fixup. >> * gcc.dg/vect/pr85597.c Fixup. >> * gcc.dg/vect/pr88497-1.c Fixup. >> * gcc.dg/vect/pr88497-2.c Fixup. >> * gcc.dg/vect/pr88497-3.c Fixup. >> * gcc.dg/vect/pr88497-4.c Fixup. >> * gcc.dg/vect/pr88497-5.c Fixup. >> * gcc.dg/vect/pr88497-7.c Fixup. >> * gcc.dg/vect/pr92347.c Fixup. >> * gcc.dg/vect/pr93069.c Fixup. >> * gcc.dg/vect/pr97241.c Fixup. >> * gcc.dg/vect/pr99102.c Fixup. >> * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c Fixup. >> * gcc.dg/vect/vect-early-break_65.c Fixup. >> * gcc.dg/vect/vect-fold-1.c Fixup. >> * gcc.dg/vect/vect-ifcvt-19.c Fixup. >> * gcc.dg/vect/vect-ifcvt-20.c Fixup. >> * gcc.dg/vect/vect-reduc-epilogue-gaps.c Fixup. >> * g
[PATCH v2] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests
Changes in v2: - Better changelog entry. - NFC. This patch has been tested on - aarch64-linux-gnu - arm-linux-gnueabihf (VFP, NEON disabled by default), - arm-none-eabi (Soft-FP) with the following [expected] differences in the test results: - FAIL now PASS [FAIL => PASS]: Executed from: gcc:gcc.dg/vect/vect.exp gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors) gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects (test for excess errors) - UNSUPPORTED disappears[UNSUP=> ]: Executed from: g++:g++.dg/vect/vect.exp g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc -std=gnu++98 - UNSUPPORTED appears [ =>UNSUP]: Executed from: g++:g++.dg/vect/vect.exp g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc -std=c++98 - UNRESOLVED disappears [UNRES=> ]: Executed from: gcc:gcc.dg/vect/vect.exp gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects compilation failed to produce executable gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to produce executable This patch was motivated by gcc.dg/vect/pr113576.c, which currently fails to compile for ARM targets without NEON. === CUT === Testsuites driven by vect.exp rely on check_vect_support_and_set_flags to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add -mfpu=neon for arm-linux-gnueabi). Unfortunately, these flags are overwritten by dg-options directive, which can cause tests to fail. Behavior of dg-options is documented in vect.exp files, but not all developers look at the .exp file when adding a new testcase. This caused a few dg-options directives to be used instead of the more appropriate dg-additional-options. This patch changes target-independent dg-options into dg-additional-options. This patch does not touch target-specific dg-options and target-specific tests to avoid disturbing the gentle balance of target-specific vectorization. This patch also removes a couple of unneeded "dg-do run" directives to avoid failures on compile-only targets. Default action is, again, set by check_vect_support_and_set_flags. Lastly, I avoided renaming tests that use -O options to O-* filename format because this support is not consistent between gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites. It seems dg-additional-options is cleaner. This patch does the following, 1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*; 2. do not change { dg-options FOO { target { target-*-pattern } } }; 3. do not remove { dg-do run { target { target-*-pattern } } }; 4. change { dg-options FOO } to { dg-additional-options FOO }; 5. remove { dg-do run } in several tests, where it is clearly not needed. gcc/testsuite/ChangeLog: PR testsuite/114307 * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: Remove dg-run. * gcc.dg/vect/complex/complex-operations-run.c: Likewise. * gcc.dg/vect/pr113576.c: Remove dg-run. Use dg-additional-options for test-specific flags. * gcc.dg/vect/gimplefe-40.c: Use dg-additional-options for test-specific flags. * gcc.dg/vect/gimplefe-41.c: Likewise. * gcc.dg/vect/pr101145inf.c: Likewise. * gcc.dg/vect/pr101145inf_1.c: Likewise. * gcc.dg/vect/pr108316.c: Likewise. * gcc.dg/vect/pr109011-1.c: Likewise. * gcc.dg/vect/pr109011-2.c: Likewise. * gcc.dg/vect/pr109011-3.c: Likewise. * gcc.dg/vect/pr109011-4.c: Likewise. * gcc.dg/vect/pr109011-5.c: Likewise. * gcc.dg/vect/pr111846.c: Likewise. * gcc.dg/vect/pr111860-2.c: Likewise. * gcc.dg/vect/pr111860-3.c: Likewise. * gcc.dg/vect/pr113002.c: Likewise. * gcc.dg/vect/pr84711.c: Likewise. * gcc.dg/vect/pr85597.c: Likewise. * gcc.dg/vect/pr88497-1.c: Likewise. * gcc.dg/vect/pr88497-2.c: Likewise. * gcc.dg/vect/pr88497-3.c: Likewise. * gcc.dg/vect/pr88497-4.c: Likewise. * gcc.dg/vect/pr88497-5.c: Likewise. * gcc.dg/vect/pr88497-7.c: Likewise. * gcc.dg/vect/pr92347.c: Likewise. * gcc.dg/vect/pr93069.c: Likewise. * gcc.dg/vect/pr97241.c: Likewise. * gcc.dg/vect/pr99102.c: Likewise. * gcc.dg/vect/vect-early-break_65.c: Likewise. * gcc.dg/vect/vect-fold-1.c: Likewise. * gcc.dg/vect/vect-ifcvt-19.c: Likewise. * gcc.dg/vect/vect-ifcvt-20.c: Likewise. * gcc.dg/vect/vect-reduc-epilogue-gaps.c: Likewise. * gcc.dg/vect/vect-singleton_1.c: Likewise. * g++.dg/vect/pr84556.cc: Likewise. * gfortran.dg/vect/fast-math-mgrid-resid.f: Likewise. * gfortran.dg/vect/pr77848.f: Likewise. * gfortran.dg/vect/pr90913.f90: Likewise. --- gcc/testsuite/g++.dg/vect/pr84556.cc | 2 +- gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c
Re: [PATCH v1 00/13] Add aarch64-w64-mingw32 target
Hi Evgeny, Great job! For reference, here is a test build of this patch series using Linaro Toolchain CI: https://ci.linaro.org/view/tcwg-build/job/tcwg_gnu_mingw_build--master-woa64-build/9/artifact/artifacts/ -- Maxim Kuvyrkov https://www.linaro.org > On Feb 21, 2024, at 21:47, Evgeny Karpov wrote: > > Hello, > > We would like to take your attention to the review of changes for the > new GCC target, aarch64-w64-mingw32. The new target will be > supported, tested, added to CI, and maintained by Linaro. This marks > the first of three planned patch series contributing to the GCC C > compiler's support for Windows Arm64. > > 1. Minimal aarch64-w64-mingw32 C implementation to cross-compile > hello-world with libgcc for Windows Arm64 using MinGW. > 2. Extension of the aarch64-w64-mingw32 C implementation to > cross-compile OpenSSL, OpenBLAS, FFmpeg, and libjpeg-turbo. All > packages successfully pass tests. > 3. Addition of call stack support for debugging, resolution of > optimization issues in the C compiler, and DLL export/import for the > aarch64-w64-mingw32 target. > > This patch series introduces the 1st point, which involves building > hello-world for the aarch64-w64-mingw32 target. The patch depends on > the binutils changes for the aarch64-w64-mingw32 target that have > already been merged. > > The binutils should include recent relocation fixes. > f87eaf8ff3995a5888c6dc4996a20c770e6bcd36 > aarch64: Add new relocations and limit COFF AArch64 relocation offsets > > The series is structured in a way to trivially show that it should not > affect any other targets. > > In this patch, several changes have been made to support the > aarch64-w64-mingw32 target for GCC. The modifications include the > definition of the MS ABI for aarch64, adjustments to FIXED_REGISTERS > and STATIC_CHAIN_REGNUM for different ABIs, and specific definitions > for COFF format on AArch64. Additionally, the patch reuses MinGW > types and definitions from i386, relocating them to a new > mingw folder for shared usage between both targets. > > MinGW-specific options have been introduced for AArch64, along with > override options for aarch64-w64-mingw32. Builtin stack probing for > override options for aarch64-w64-mingw32. Builtin stack probing for > AArch64 has been enabled as an alternative for chkstk. Symbol name > encoding and section information handling for aarch64-w64-mingw32 have > been incorporated, and the MinGW environment has been added, which > will also be utilized for defining the Cygwin environment in the > future. > > The patch includes renaming "x86 Windows Options" to "Cygwin and MinGW > Options," which now encompasses AArch64 as well. AArch64-specific > Cygwin and MinGW Options have been introduced for the unique > requirements of the AArch64 architecture. > > Function type declaration and named sections support have been added. > The necessary objects for Cygwin and MinGW have been built for the > aarch64-w64-mingw32 target, and relevant files such as msformat-c.cc > and winnt-d.cc have been moved to the mingw folder for reuse in > AArch64. > > Furthermore, the aarch64-w64-mingw32 target has been included in both > libatomic and libgcc, ensuring support for the AArch64 architecture > within these libraries. These changes collectively enhance the > capabilities of GCC for the specified target. > > Coauthors: Zac Walker , > Mark Harmstone and > Ron Riddle > > Refactored, prepared, and validated by > Radek Barton and > Evgeny Karpov > > Special thanks to the Linaro GNU toolchain team for internal review > and assistance in preparing the patch series! > > Regards, > Evgeny > > > Zac Walker (13): > Introduce aarch64-w64-mingw32 target > aarch64: The aarch64-w64-mingw32 target implements the MS ABI > aarch64: Mark x18 register as a fixed register for MS ABI > aarch64: Add aarch64-w64-mingw32 COFF > Reuse MinGW from i386 for AArch64 > Rename section and encoding functions from i386 which will be used in >aarch64 > Exclude i386 functionality from aarch64 build > aarch64: Add Cygwin and MinGW environments for AArch64 > aarch64: Add SEH to machine_function > Rename "x86 Windows Options" to "Cygwin and MinGW Options" > aarch64: Build and add objects for Cygwin and MinGW for AArch64 > aarch64: Add aarch64-w64-mingw32 target to libatomic > Add aarch64-w64-mingw32 target to libgcc > > fixincludes/mkfixinc.sh | 3 +- > gcc/config.gcc| 47 +++-- > gcc/config/aarch64/aarch64-coff.h | 92 + > gcc/config/aarch64/aarch64-opts.h | 7 + > gcc/config/aarch6
Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64
> On Feb 29, 2024, at 21:35, Andrew Pinski (QUIC) > wrote: > > > >> -Original Message- >> From: Evgeny Karpov >> Sent: Thursday, February 29, 2024 8:46 AM >> To: Andrew Pinski >> Cc: Richard Sandiford ; gcc- >> patc...@gcc.gnu.org; 10wa...@gmail.com; Maxim Kuvyrkov >> ; m...@harmstone.com; Zac Walker >> ; Ron Riddle ; Radek >> Barton ; Andrew Pinski (QUIC) >> >> Subject: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments >> for AArch64 >> >> Wednesday, February 28, 2024 2:00 AM >> Andrew Pinski wrote: >> >>> What does this mean with respect to C++ exceptions? Or you using SJLJ >>> exceptions support or the dwarf unwinding ones without SEH support? >>> I am not sure if SJLJ exceptions is well tested any more in GCC either. >>> >>> Also I have a question if you ran the full GCC/G++ testsuites and what >>> were the results? >>> If you did run it, did you use a cross compiler or the native >>> compiler? Did you do a bootstrap (GCC uses C++ but no exceptions though)? >> >> As mentioned in the cover letter and the thread, the current contribution >> covers only the C scope. >> Exception handling is fully disabled for now. >> There is an experimental build with C++ and SEH, however, it is not included >> in >> the plan for the current contribution. >> >> https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build >> >>> If you run using a cross compiler, did you use ssh or some other route >>> to run the applications? >>> >>> Thanks, >>> Andrew Pinski >> >> GitHub Actions are used to cross-compile toolchains, packages and tests, and >> execute tests on Windows Arm64. > > This does not answer my question because what you are running is just simple > testcases and not the FULL GCC testsuite. > So again have you ran the GCC testsuite and do you have a dejagnu board to be > able to execute the binaries? > I think without the GCC testsuite ran to find all of the known failures, you > are going to be running into many issues. > The GCC testsuite includes many tests for ABI corner cases and many features > that you will most likely not think about testing using your simple testcases. > In fact I suspect there will be some of the aarch64 testcases which will need > to be modified for the windows ABI which you have not done yet. Hi Andrew, We (Linaro) have a prototype CI loop setup for testing aarch64-w64-mingw32, and we have results for gcc-c and libatomic -- see [1]. The results are far from clean, but that's expected. This patch series aims at enabling C hello-world only, and subsequent patch series will improve the state of the port. [1] https://ci.linaro.org/job/tcwg_gnu_mingw_check_gcc--master-woa64-build/6/artifact/artifacts/sumfiles/ Thanks, -- Maxim Kuvyrkov https://www.linaro.org
Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
Hi Vladimir, Hi Jeff, Richard and Alexander have reviewed this patch and [I assume] have no further comments. OK to merge? On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov wrote: > This patch avoids sched-deps.cc:find_inc() creating exponential number > of dependencies, which become memory and compilation time hogs. > Consider example (simplified from PR96388) ... > === > sp=sp-4 // sp_insnA > mem_insnA1[sp+A1] > ... > mem_insnAN[sp+AN] > sp=sp-4 // sp_insnB > mem_insnB1[sp+B1] > ... > mem_insnBM[sp+BM] > === > > [For simplicity, let's assume find_inc(backwards==true)]. > In this example find_modifiable_mems() will arrange for mem_insnA* > to be able to pass sp_insnA, and, while doing this, will create > dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB > is a consumer of sp_insnA. After this sp_insnB will have N new > backward dependencies. > Then find_modifiable_mems() gets to mem_insnB*s and starts to create > N new dependencies for _every_ mem_insnB*. This gets us N*M new > dependencies. > > In PR96833's testcase N and M are 10k-15k, which causes RAM usage of > 30GB and compilation time of 30 minutes, with sched2 accounting for > 95% of both metrics. After this patch the RAM usage is down to 1GB > and compilation time is down to 3-4 minutes, with sched2 no longer > standing out on -ftime-report or memory usage. > > gcc/ChangeLog: > > PR rtl-optimization/96388 > PR rtl-optimization/111554 > * sched-deps.cc (find_inc): Avoid exponential behavior. > --- > gcc/sched-deps.cc | 48 +++ > 1 file changed, 44 insertions(+), 4 deletions(-) > > diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc > index c23218890f3..005fc0f567e 100644 > --- a/gcc/sched-deps.cc > +++ b/gcc/sched-deps.cc > @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, > rtx_insn *insn, bool before_mem) > /* Once a suitable mem reference has been found and the corresponding data > in MII has been filled in, this function is called to find a suitable > add or inc insn involving the register we found in the memory > - reference. */ > + reference. > + If successful, this function will create additional dependencies > between > + - mii->inc_insn's producers and mii->mem_insn as a consumer (if > backwards) > + - mii->inc_insn's consumers and mii->mem_insn as a producer (if > !backwards). > +*/ > > static bool > find_inc (struct mem_inc_info *mii, bool backwards) > { >sd_iterator_def sd_it; >dep_t dep; > + sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : > SD_LIST_FORW; > + int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps); > > - sd_it = sd_iterator_start (mii->mem_insn, > -backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW); > + sd_it = sd_iterator_start (mii->mem_insn, mem_deps); >while (sd_iterator_cond (&sd_it, &dep)) > { >dep_node_t node = DEP_LINK_NODE (*sd_it.linkp); >rtx_insn *pro = DEP_PRO (dep); >rtx_insn *con = DEP_CON (dep); > - rtx_insn *inc_cand = backwards ? pro : con; > + rtx_insn *inc_cand; > + int n_inc_deps; > + >if (DEP_NONREG (dep) || DEP_MULTIPLE (dep)) > goto next; > + > + if (backwards) > + { > + inc_cand = pro; > + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK); > + } > + else > + { > + inc_cand = con; > + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW); > + } > + > + /* In the FOR_EACH_DEP loop below we will create additional > n_inc_deps > +for mem_insn. This by itself is not a problem, since each > mem_insn > +will have only a few inc_insns associated with it. However, if > +we consider that a single inc_insn may have a lot of mem_insns, > AND, > +on top of that, a few other inc_insns associated with it -- > +those _other inc_insns_ will get (n_mem_deps * number of MEM > insns) > +dependencies created for them. This may cause an exponential > +growth of memory usage and scheduling time. > +See PR96388 for details. > +We [heuristically] use n_inc_deps as a proxy for the number of MEM > +insns, and drop opportunities for breaking modifiable_mem > dependencies > +when dependency lists grow beyond reasonable size. */ > + if (n_mem_deps * n_inc_deps > + >= param_max_pending_list_length * param_max_pending_list_length) > + goto next; > + >if (parse_add_or_inc (mii, inc_cand, backwards)) >
Re: [PATCH v3 3/8] Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc
Dear RTL maintainers, Gently ping. This patch adds a couple of new functions to lists.cc, which then are used to simplify logic in the scheduler. OK to merge? On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov wrote: > This patch simplifies logic behind deps_join(), which will be > important for the upcoming improvements of sched-deps.cc logging. > > The only functional change is that when deps_join() is called with > empty state for the 2nd argument, it will not reverse INSN_ and > EXPR_LISTs in the 1st argument. Before this patch the lists were > reversed due to use of concat_*_LIST(). Now, with copy_*_LIST() > used for this case, the lists will remain in the original order. > > gcc/ChangeLog: > > * lists.cc (copy_EXPR_LIST, concat_EXPR_LIST): New functions. > * rtl.h (copy_EXPR_LIST, concat_EXPR_LIST): Declare. > * sched-rgn.cc (concat_insn_list, concat_expr_list): New helpers. > (concat_insn_mem_list): Simplify. > (deps_join): Update > --- > gcc/lists.cc | 30 +++- > gcc/rtl.h| 4 +++- > gcc/sched-rgn.cc | 51 +++- > 3 files changed, 61 insertions(+), 24 deletions(-) > > diff --git a/gcc/lists.cc b/gcc/lists.cc > index 2cdf37ad533..83e7bf32176 100644 > --- a/gcc/lists.cc > +++ b/gcc/lists.cc > @@ -160,6 +160,24 @@ free_INSN_LIST_list (rtx_insn_list **listp) >free_list ((rtx *)listp, &unused_insn_list); > } > > +/* Make a copy of the EXPR_LIST list LINK and return it. */ > +rtx_expr_list * > +copy_EXPR_LIST (rtx_expr_list *link) > +{ > + rtx_expr_list *new_queue; > + rtx_expr_list **pqueue = &new_queue; > + > + for (; link; link = link->next ()) > +{ > + rtx x = link->element (); > + rtx_expr_list *newlink = alloc_EXPR_LIST (REG_NOTE_KIND (link), x, > NULL); > + *pqueue = newlink; > + pqueue = (rtx_expr_list **)&XEXP (newlink, 1); > +} > + *pqueue = NULL; > + return new_queue; > +} > + > /* Make a copy of the INSN_LIST list LINK and return it. */ > rtx_insn_list * > copy_INSN_LIST (rtx_insn_list *link) > @@ -178,12 +196,22 @@ copy_INSN_LIST (rtx_insn_list *link) >return new_queue; > } > > +/* Duplicate the EXPR_LIST elements of COPY and prepend them to OLD. */ > +rtx_expr_list * > +concat_EXPR_LIST (rtx_expr_list *copy, rtx_expr_list *old) > +{ > + rtx_expr_list *new_rtx = old; > + for (; copy; copy = copy->next ()) > +new_rtx = alloc_EXPR_LIST (REG_NOTE_KIND (copy), copy->element (), > new_rtx); > + return new_rtx; > +} > + > /* Duplicate the INSN_LIST elements of COPY and prepend them to OLD. */ > rtx_insn_list * > concat_INSN_LIST (rtx_insn_list *copy, rtx_insn_list *old) > { >rtx_insn_list *new_rtx = old; > - for (; copy ; copy = copy->next ()) > + for (; copy; copy = copy->next ()) > { >new_rtx = alloc_INSN_LIST (copy->insn (), new_rtx); >PUT_REG_NOTE_KIND (new_rtx, REG_NOTE_KIND (copy)); > diff --git a/gcc/rtl.h b/gcc/rtl.h > index e4b6cc0dbb5..7e952d7cbeb 100644 > --- a/gcc/rtl.h > +++ b/gcc/rtl.h > @@ -3764,10 +3764,12 @@ extern void free_EXPR_LIST_list (rtx_expr_list **); > extern void free_INSN_LIST_list (rtx_insn_list **); > extern void free_EXPR_LIST_node (rtx); > extern void free_INSN_LIST_node (rtx); > +extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx); > extern rtx_insn_list *alloc_INSN_LIST (rtx, rtx); > +extern rtx_expr_list *copy_EXPR_LIST (rtx_expr_list *); > extern rtx_insn_list *copy_INSN_LIST (rtx_insn_list *); > +extern rtx_expr_list *concat_EXPR_LIST (rtx_expr_list *, rtx_expr_list *); > extern rtx_insn_list *concat_INSN_LIST (rtx_insn_list *, rtx_insn_list *); > -extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx); > extern void remove_free_INSN_LIST_elem (rtx_insn *, rtx_insn_list **); > extern rtx remove_list_elem (rtx, rtx *); > extern rtx_insn *remove_free_INSN_LIST_node (rtx_insn_list **); > diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc > index e5964f54ead..da3ec0458ff 100644 > --- a/gcc/sched-rgn.cc > +++ b/gcc/sched-rgn.cc > @@ -2585,25 +2585,32 @@ add_branch_dependences (rtx_insn *head, rtx_insn > *tail) > > static class deps_desc *bb_deps; > > +/* Return a new insn_list with all the elements from the two input > lists. */ > +static rtx_insn_list * > +concat_insn_list (rtx_insn_list *copy, rtx_insn_list *old) > +{ > + if (!old) > +return copy_INSN_LIST (copy); > + return concat_INSN_LIST (copy, old); > +} > + > +/* Return a new expr_list with all the elements from the two input > lists. */ > +static rtx_expr_list * > +concat_expr_list (rtx_expr_list *cop
Re: [PATCH v3 4/8] Improve and fix sched-deps.cc: dump_dep() and dump_lists().
Dear scheduler maintainers, Gentle ping. This patch is borderline trivial and affects only the lucky few who debug sched-deps.cc code. OK to merge? On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov wrote: > Better propagate flags from dump_lists() into dump_dep() and > add a missing "]" in dump_lists(). > > gcc/ChangeLog: > > * sched-deps.cc (DUMP_DEP_PRO): Improve comment. > (dump_dep_flags): Remove. > (DUMP_LISTS_SIZE, DUMP_LISTS_DEPS, DUMP_LISTS_ALL): Continue > numbering from DUMP_DEP_* flags. > (dump_lists): Update and fix. > --- > gcc/sched-deps.cc | 21 +++-- > 1 file changed, 11 insertions(+), 10 deletions(-) > > diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc > index 005fc0f567e..4d357079a7a 100644 > --- a/gcc/sched-deps.cc > +++ b/gcc/sched-deps.cc > @@ -132,7 +132,8 @@ static void dump_ds (FILE *, ds_t); > /* Define flags for dump_dep (). */ > > /* Dump producer of the dependence. */ > -#define DUMP_DEP_PRO (2) > +#define DUMP_DEP_PRO (2) /* Reserve "1" for handling of DUMP_DEP_ALL and > + DUMP_LISTS_ALL. */ > > /* Dump consumer of the dependence. */ > #define DUMP_DEP_CON (4) > @@ -206,9 +207,6 @@ dump_dep (FILE *dump, dep_t dep, int flags) >fprintf (dump, ">"); > } > > -/* Default flags for dump_dep (). */ > -static int dump_dep_flags = (DUMP_DEP_PRO | DUMP_DEP_CON); > - > /* Dump all fields of DEP to STDERR. */ > void > sd_debug_dep (dep_t dep) > @@ -1454,19 +1452,20 @@ sd_delete_dep (sd_iterator_def sd_it) > } > > /* Dump size of the lists. */ > -#define DUMP_LISTS_SIZE (2) > +#define DUMP_LISTS_SIZE (32) /* (DUMP_DEP_STATUS << 1) */ > > /* Dump dependencies of the lists. */ > -#define DUMP_LISTS_DEPS (4) > +#define DUMP_LISTS_DEPS (64) > > /* Dump all information about the lists. */ > #define DUMP_LISTS_ALL (DUMP_LISTS_SIZE | DUMP_LISTS_DEPS) > > /* Dump deps_lists of INSN specified by TYPES to DUMP. > - FLAGS is a bit mask specifying what information about the lists needs > - to be printed. > + FLAGS is a bit mask specifying what information about the lists and > + the individual deps needs to be printed, this is a combination of > + DUMP_DEP_* and DUMP_LISTS_* flags. > If FLAGS has the very first bit set, then dump all information about > - the lists and propagate this bit into the callee dump functions. */ > + the lists and deps propagate this bit into the callee dump functions. > */ > static void > dump_lists (FILE *dump, rtx insn, sd_list_types_def types, int flags) > { > @@ -1488,10 +1487,12 @@ dump_lists (FILE *dump, rtx insn, > sd_list_types_def types, int flags) > { >FOR_EACH_DEP (insn, types, sd_it, dep) > { > - dump_dep (dump, dep, dump_dep_flags | all); > + dump_dep (dump, dep, flags | all); > fprintf (dump, " "); > } > } > + > + fprintf (dump, "]"); > } > > /* Dump all information about deps_lists of INSN specified by TYPES > -- > 2.34.1 > > -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH v3 5/8] Add a bit more logging scheduler's dependency analysis
Dear scheduler maintainers, Gentle ping. This is a trivial patch, which makes debugging sched-deps.cc slightly more enjoyable. On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov wrote: > gcc/ChangeLog: > > * sched-deps.cc (sd_add_dep, find_inc): Add logging about > dependency creation. > --- > gcc/sched-deps.cc | 30 ++ > 1 file changed, 26 insertions(+), 4 deletions(-) > > diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc > index 4d357079a7a..2a87158ba4b 100644 > --- a/gcc/sched-deps.cc > +++ b/gcc/sched-deps.cc > @@ -1342,6 +1342,13 @@ sd_add_dep (dep_t dep, bool resolved_p) > in the bitmap caches of dependency information. */ >if (true_dependency_cache != NULL) > set_dependency_caches (dep); > + > + if (sched_verbose >= 9) > +{ > + fprintf (sched_dump, "created dependency "); > + dump_dep (sched_dump, dep, 1); > + fprintf (sched_dump, "\n"); > +} > } > > /* Add or update backward dependence between INSN and ELEM > @@ -4879,18 +4886,33 @@ find_inc (struct mem_inc_info *mii, bool backwards) > we create. */ > gcc_assert (mii->inc_insn == inc_cand); > > + int n_deps_created = 0; > if (backwards) > { > FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep) > - add_dependence_1 (mii->mem_insn, DEP_PRO (dep), > - REG_DEP_TRUE); > + { > + add_dependence_1 (mii->mem_insn, DEP_PRO (dep), > + REG_DEP_TRUE); > + ++n_deps_created; > + } > } > else > { > FOR_EACH_DEP (mii->inc_insn, SD_LIST_FORW, sd_it, dep) > - add_dependence_1 (DEP_CON (dep), mii->mem_insn, > - REG_DEP_ANTI); > + { > + add_dependence_1 (DEP_CON (dep), mii->mem_insn, > + REG_DEP_ANTI); > + ++n_deps_created; > + } > } > + if (sched_verbose >= 6) > + fprintf (sched_dump, > +"created %d deps for mem_insn %d due to " > +"inc_insn %d %s deps\n", > + n_deps_created, INSN_UID (mii->mem_insn), > +INSN_UID (mii->inc_insn), > +backwards ? "backward" : "forward"); > + > return true; > } > next: > -- > 2.34.1 > > -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH v3 6/8] sched_deps.cc: Simplify initialization of dependency contexts
Dear scheduler maintainers, Gentle ping. This is a trivial cleanup. On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov wrote: > gcc/ChangeLog: > > * sched-deps.cc (init_deps, init_deps_reg_last): Simplify. > (free_deps): Remove useless code. > --- > gcc/sched-deps.cc | 13 - > 1 file changed, 4 insertions(+), 9 deletions(-) > > diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc > index 2a87158ba4b..e0d3c97d935 100644 > --- a/gcc/sched-deps.cc > +++ b/gcc/sched-deps.cc > @@ -3927,10 +3927,9 @@ init_deps (class deps_desc *deps, bool > lazy_reg_last) >int max_reg = (reload_completed ? FIRST_PSEUDO_REGISTER : max_reg_num > ()); > >deps->max_reg = max_reg; > - if (lazy_reg_last) > -deps->reg_last = NULL; > - else > -deps->reg_last = XCNEWVEC (struct deps_reg, max_reg); > + deps->reg_last = NULL; > + if (!lazy_reg_last) > +init_deps_reg_last (deps); >INIT_REG_SET (&deps->reg_last_in_use); > >deps->pending_read_insns = 0; > @@ -3961,9 +3960,7 @@ init_deps (class deps_desc *deps, bool lazy_reg_last) > void > init_deps_reg_last (class deps_desc *deps) > { > - gcc_assert (deps && deps->max_reg > 0); > - gcc_assert (deps->reg_last == NULL); > - > + gcc_assert (deps && deps->max_reg > 0 && deps->reg_last == NULL); >deps->reg_last = XCNEWVEC (struct deps_reg, deps->max_reg); > } > > @@ -4013,8 +4010,6 @@ free_deps (class deps_desc *deps) > it at all. */ >free (deps->reg_last); >deps->reg_last = NULL; > - > - deps = NULL; > } > > /* Remove INSN from dependence contexts DEPS. */ > -- > 2.34.1 > > -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH v3 7/8] Improve logging of register data in scheduler dependency analysis
Dear scheduler maintainers, Gentle ping. This patch improves debugging output, it does not touch scheduling logic. On Wed, 22 Nov 2023 at 15:15, Maxim Kuvyrkov wrote: > Scheduler dependency analysis uses two main data structures: > 1. reg_pending_* data contains effects of INSN on the register file, >which is then incorporated into ... > 2. deps_desc object, which contains commulative information about all >instructions processed from deps_desc object's initialization. > > This patch adds debug dumping of (1). > > gcc/ChangeLog: > > * sched-deps.cc (print-rtl.h): Include for str_pattern_slim(). > (dump_reg_pending_data): New function. > (sched_analyze_insn): Use it. > --- > gcc/sched-deps.cc | 90 ++- > 1 file changed, 89 insertions(+), 1 deletion(-) > > diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc > index e0d3c97d935..f9290c82fd2 100644 > --- a/gcc/sched-deps.cc > +++ b/gcc/sched-deps.cc > @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. If not see > #include "sched-int.h" > #include "cselib.h" > #include "function-abi.h" > +#include "print-rtl.h" > > #ifdef INSN_SCHEDULING > > @@ -432,10 +433,24 @@ dep_spec_p (dep_t dep) >return false; > } > > +/* These regsets describe how a single instruction affects registers. > + Their "life-time" is restricted to a single call of > sched_analyze_insn(). > + They are populated by sched_analyze_1() and sched_analyze_2(), and > + then sched_analyze_insn() transfers data from these into > deps->reg_last[i]. > + Near the end sched_analyze_insn() clears these regsets for the next > + insn. */ > static regset reg_pending_sets; > static regset reg_pending_clobbers; > static regset reg_pending_uses; > static regset reg_pending_control_uses; > + > +/* Similar to reg_pending_* regsets, this variable specifies whether > + the current insn analyzed by sched_analyze_insn() is a scheduling > + barrier that should "split" dependencies inside a block. Internally > + sched-deps.cc does this by pretending that the barrier insn uses and > + sets all registers. > + Near the end sched_analyze_insn() transfers barrier info from this > variable > + into deps->last_reg_pending_barrier. */ > static enum reg_pending_barrier_mode reg_pending_barrier; > > /* Hard registers implicitly clobbered or used (or may be implicitly > @@ -2880,7 +2895,77 @@ get_implicit_reg_pending_clobbers (HARD_REG_SET > *temp, rtx_insn *insn) >*temp &= ~ira_no_alloc_regs; > } > > -/* Analyze an INSN with pattern X to find all dependencies. */ > +/* Dump state of reg_pending_* data for debug purposes. > + Dump only non-empty data to reduce log clobber. */ > +static void > +dump_reg_pending_data (FILE *file, rtx_insn *insn) > +{ > + fprintf (file, "\n"); > + fprintf (file, ";; sched_analysis after insn %d: %s\n", > + INSN_UID (insn), str_pattern_slim (PATTERN (insn))); > + > + if (!REG_SET_EMPTY_P (reg_pending_sets) > + || !REG_SET_EMPTY_P (reg_pending_clobbers) > + || !REG_SET_EMPTY_P (reg_pending_uses) > + || !REG_SET_EMPTY_P (reg_pending_control_uses)) > +{ > + fprintf (file, ";; insn reg"); > + if (!REG_SET_EMPTY_P (reg_pending_sets)) > + { > + fprintf (file, " sets("); > + dump_regset (reg_pending_sets, file); > + fprintf (file, ")"); > + } > + if (!REG_SET_EMPTY_P (reg_pending_clobbers)) > + { > + fprintf (file, " clobbers("); > + dump_regset (reg_pending_clobbers, file); > + fprintf (file, ")"); > + } > + if (!REG_SET_EMPTY_P (reg_pending_uses)) > + { > + fprintf (file, " uses("); > + dump_regset (reg_pending_uses, file); > + fprintf (file, ")"); > + } > + if (!REG_SET_EMPTY_P (reg_pending_control_uses)) > + { > + fprintf (file, " control("); > + dump_regset (reg_pending_control_uses, file); > + fprintf (file, ")"); > + } > + fprintf (file, "\n"); > +} > + > + if (reg_pending_barrier) > +fprintf (file, ";; insn reg barrier: %d\n", reg_pending_barrier); > + > + if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers) > + || !hard_reg_set_empty_p (implicit_reg_pending_uses)) > +{ > + fprintf (file, ";; insn reg"); > + if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers)) > + { >
Re: [PATCH v3 8/8] Improve logging of scheduler dependency analysis context
Dear scheduler maintainers, Gentle ping. This patch improves debugging output, it does not touch scheduling logic. On Wed, 22 Nov 2023 at 15:15, Maxim Kuvyrkov wrote: > Scheduler dependency analysis uses two main data structures: > 1. reg_pending_* data contains effects of INSN on the register file, >which is then incorporated into ... > 2. deps_desc object, which contains commulative information about all >instructions processed from deps_desc object's initialization. > > This patch adds debug dumping of (2). > > Dependency analysis contexts (aka deps_desc objects) are huge, but > each instruction affects only a small amount of data in these objects. > Therefore, it is most useful to dump differential information > compared to the dependency state after previous instruction. > > gcc/ChangeLog: > > * sched-deps.cc (reset_deps, dump_rtx_insn_list) > (rtx_insn_list_same_p): New helper functions. > (dump_deps_desc_diff): New function to dump dependency information. > (sched_analysis_prev_deps): New static variable. > (sched_analyze_insn): Dump dependency information. > (init_deps_global, finish_deps_global): Handle > sched_analysis_prev_deps. > * sched-int.h (struct deps_reg): Update comments. > * sched-rgn.cc (concat_insn_list, concat_expr_list): Update > comments. > --- > gcc/sched-deps.cc | 197 ++ > gcc/sched-int.h | 9 ++- > gcc/sched-rgn.cc | 5 ++ > 3 files changed, 210 insertions(+), 1 deletion(-) > > diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc > index f9290c82fd2..edca9927e23 100644 > --- a/gcc/sched-deps.cc > +++ b/gcc/sched-deps.cc > @@ -1677,6 +1677,15 @@ delete_all_dependences (rtx_insn *insn) > sd_delete_dep (sd_it); > } > > +/* Re-initialize existing dependency context DEPS to be a copy of FROM. > */ > +static void > +reset_deps (class deps_desc *deps, class deps_desc *from) > +{ > + free_deps (deps); > + init_deps (deps, false); > + deps_join (deps, from); > +} > + > /* All insns in a scheduling group except the first should only have > dependencies on the previous insn in the group. So we find the > first instruction in the scheduling group by walking the dependence > @@ -2960,6 +2969,177 @@ dump_reg_pending_data (FILE *file, rtx_insn *insn) > } > } > > +/* Dump rtx_insn_list LIST. > + Consider moving to lists.cc if there are users outside of > sched-deps.cc. */ > +static void > +dump_rtx_insn_list (FILE *file, rtx_insn_list *list) > +{ > + for (; list; list = list->next ()) > +fprintf (file, " %d", INSN_UID (list->insn ())); > +} > + > +/* Return TRUE if lists A and B have same elements in the same order. */ > +static bool > +rtx_insn_list_same_p (rtx_insn_list *a, rtx_insn_list *b) > +{ > + for (; a && b; a = a->next (), b = b->next ()) > +if (a->insn () != b->insn ()) > + return false; > + > + if (a || b) > +return false; > + > + return true; > +} > + > +/* Dump parts of DEPS that are different from PREV. > + Dumping all information from dependency context produces huge > + hard-to-analize logs; differential dumping is way more managable. */ > +static void > +dump_deps_desc_diff (FILE *file, class deps_desc *deps, class deps_desc > *prev) > +{ > + /* Each "paragraph" is a single line of output. */ > + > + /* Note on param_max_pending_list_length: > + During normal dependency analysis various lists should not exceed > this > + limit. Searching for "!!!" in scheduler logs can point to potential > bugs > + or poorly-handled corner-cases. */ > + > + if (!rtx_insn_list_same_p (deps->pending_read_insns, > +prev->pending_read_insns)) > +{ > + fprintf (file, ";; deps pending mem reads length(%d):", > + deps->pending_read_list_length); > + if ((deps->pending_read_list_length + > deps->pending_write_list_length) > + >= param_max_pending_list_length) > + fprintf (file, "%d insns!!!", deps->pending_read_list_length); > + else > + dump_rtx_insn_list (file, deps->pending_read_insns); > + fprintf (file, "\n"); > +} > + > + if (!rtx_insn_list_same_p (deps->pending_write_insns, > +prev->pending_write_insns)) > +{ > + fprintf (file, ";; deps pending mem writes length(%d):", > + deps->pending_write_list_length); > + if ((deps->pending_read_list_length + > deps->pendin
Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
> On Jan 17, 2024, at 10:51, Richard Biener wrote: > > On Tue, Jan 16, 2024 at 3:52 PM Jeff Law wrote: >> >> >> >> On 1/15/24 05:56, Maxim Kuvyrkov wrote: >>> Hi Vladimir, >>> Hi Jeff, >>> >>> Richard and Alexander have reviewed this patch and [I assume] have no >>> further comments. OK to merge? >> I think the question is whether or not we're too late. I know that >> Richard S has held off on his late-combine pass and I'm holding off on >> the ext-dce work due to the fact that we're well past stage1 close. >> >> I think the release managers ought to have the final say on this. > > I'm fine with this now, it doesn't change code generation. Thanks, Richard. I'll merge the fix for PR96388 and PR111554 -- patch 1/8. I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens. Regards, -- Maxim Kuvyrkov https://www.linaro.org
Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
> On Jan 17, 2024, at 19:02, Richard Biener wrote: > > On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov > wrote: >> >>> On Jan 17, 2024, at 10:51, Richard Biener >>> wrote: >>> >>> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law wrote: >>>> >>>> >>>> >>>> On 1/15/24 05:56, Maxim Kuvyrkov wrote: >>>>> Hi Vladimir, >>>>> Hi Jeff, >>>>> >>>>> Richard and Alexander have reviewed this patch and [I assume] have no >>>>> further comments. OK to merge? >>>> I think the question is whether or not we're too late. I know that >>>> Richard S has held off on his late-combine pass and I'm holding off on >>>> the ext-dce work due to the fact that we're well past stage1 close. >>>> >>>> I think the release managers ought to have the final say on this. >>> >>> I'm fine with this now, it doesn't change code generation. >> >> Thanks, Richard. >> >> I'll merge the fix for PR96388 and PR111554 -- patch 1/8. I'll commit >> cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when >> stage1 opens. > > This seems to have caused a compare-debug bootstrap issue on x86_64-linux, > > gcc/fortran/f95-lang.o differs > > does n_mem_deps or n_inc_deps include debug insns? Thanks, investigating. -- Maxim Kuvyrkov https://www.linaro.org
[PATCH v1] Fix compare-debug bootstrap failure
... caused by scheduler fix for PR96388 and PR111554. This patch adjusts decision sched-deps.cc:find_inc() to use length of dependency lists sans any DEBUG_INSN instructions. gcc/ChangeLog: * haifa-sched.cc (dep_list_size): Make global. * sched-deps.cc (find_inc): Use instead of sd_lists_size(). * sched-int.h (dep_list_size): Declare. --- gcc/haifa-sched.cc | 8 ++-- gcc/sched-deps.cc | 6 +++--- gcc/sched-int.h| 2 ++ 3 files changed, 11 insertions(+), 5 deletions(-) diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc index 49ee589aed7..1bc610f9a5f 100644 --- a/gcc/haifa-sched.cc +++ b/gcc/haifa-sched.cc @@ -1560,8 +1560,7 @@ contributes_to_priority_p (dep_t dep) } /* Compute the number of nondebug deps in list LIST for INSN. */ - -static int +int dep_list_size (rtx_insn *insn, sd_list_types_def list) { sd_iterator_def sd_it; @@ -1571,6 +1570,11 @@ dep_list_size (rtx_insn *insn, sd_list_types_def list) if (!MAY_HAVE_DEBUG_INSNS) return sd_lists_size (insn, list); + /* TODO: We should split normal and debug insns into separate SD_LIST_* + sub-lists, and then we'll be able to use something like + sd_lists_size(insn, list & SD_LIST_NON_DEBUG) + instead of walking dependencies below. */ + FOR_EACH_DEP (insn, list, sd_it, dep) { if (DEBUG_INSN_P (DEP_CON (dep))) diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc index 0615007c560..5034e664e5e 100644 --- a/gcc/sched-deps.cc +++ b/gcc/sched-deps.cc @@ -4791,7 +4791,7 @@ find_inc (struct mem_inc_info *mii, bool backwards) sd_iterator_def sd_it; dep_t dep; sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW; - int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps); + int n_mem_deps = dep_list_size (mii->mem_insn, mem_deps); sd_it = sd_iterator_start (mii->mem_insn, mem_deps); while (sd_iterator_cond (&sd_it, &dep)) @@ -4808,12 +4808,12 @@ find_inc (struct mem_inc_info *mii, bool backwards) if (backwards) { inc_cand = pro; - n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK); + n_inc_deps = dep_list_size (inc_cand, SD_LIST_BACK); } else { inc_cand = con; - n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW); + n_inc_deps = dep_list_size (inc_cand, SD_LIST_FORW); } /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps diff --git a/gcc/sched-int.h b/gcc/sched-int.h index ab784fe0d17..4df092013e9 100644 --- a/gcc/sched-int.h +++ b/gcc/sched-int.h @@ -1677,6 +1677,8 @@ extern void sd_copy_back_deps (rtx_insn *, rtx_insn *, bool); extern void sd_delete_dep (sd_iterator_def); extern void sd_debug_lists (rtx, sd_list_types_def); +extern int dep_list_size (rtx_insn *, sd_list_types_def); + /* Macros and declarations for scheduling fusion. */ #define FUSION_MAX_PRIORITY (INT_MAX) extern bool sched_fusion; -- 2.34.1
Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
> On Jan 17, 2024, at 19:05, Maxim Kuvyrkov wrote: > >> On Jan 17, 2024, at 19:02, Richard Biener wrote: >> >> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov >> wrote: >>> >>>> On Jan 17, 2024, at 10:51, Richard Biener >>>> wrote: >>>> >>>> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law wrote: >>>>> >>>>> >>>>> >>>>> On 1/15/24 05:56, Maxim Kuvyrkov wrote: >>>>>> Hi Vladimir, >>>>>> Hi Jeff, >>>>>> >>>>>> Richard and Alexander have reviewed this patch and [I assume] have no >>>>>> further comments. OK to merge? >>>>> I think the question is whether or not we're too late. I know that >>>>> Richard S has held off on his late-combine pass and I'm holding off on >>>>> the ext-dce work due to the fact that we're well past stage1 close. >>>>> >>>>> I think the release managers ought to have the final say on this. >>>> >>>> I'm fine with this now, it doesn't change code generation. >>> >>> Thanks, Richard. >>> >>> I'll merge the fix for PR96388 and PR111554 -- patch 1/8. I'll commit >>> cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when >>> stage1 opens. >> >> This seems to have caused a compare-debug bootstrap issue on x86_64-linux, >> >> gcc/fortran/f95-lang.o differs >> >> does n_mem_deps or n_inc_deps include debug insns? > > Thanks, investigating. Hi Richard, Yes, both n_mem_deps or n_inc_deps include debug insns, I posted a patch for this in https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643267.html . Testing it now. If you prefer, I can revert the fix for PR96388 and PR111554. Kind regards, -- Maxim Kuvyrkov https://www.linaro.org
[PATCH] Make gcc.target/arm/bics_3.c testcase a bit more generic [PR113542]
After fwprop improvement in r14-8319-g86de9b66480, codegen in bics_3.c test changed from "bics" to "bic" instruction, with the overall instruction stream remaining at the same quality. This patch makes the scan-assembler directive accept both "bics" and "bic". BEFORE r14-8319-g86de9b66480: bicsr0, r0, r1 @ 9 [c=4 l=4] *andsi_notsi_si_compare0_scratch mov r0, #1 @ 23[c=4 l=4] *thumb2_movsi_vfp/1 it eq moveq r0, #0 @ 26[c=8 l=4] *p *thumb2_movsi_vfp/2 bx lr @ 29[c=8 l=4] *thumb2_return AFTER r14-8319-g86de9b66480: bic r0, r0, r1 @ 8 [c=4 l=4] andsi_notsi_si subsr0, r0, #0 @ 22[c=4 l=4] cmpsi2_addneg/0 it ne movne r0, #1 @ 23[c=8 l=4] *p *thumb2_movsi_vfp/2 bx lr @ 26[c=8 l=4] *thumb2_return gcc/testsuite/ChangeLog: PR target/113542 * gcc.target/arm/bics_3.c: Update scan-assembler directive. --- gcc/testsuite/gcc.target/arm/bics_3.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/arm/bics_3.c b/gcc/testsuite/gcc.target/arm/bics_3.c index e056b264e15..c5bed3c92d2 100644 --- a/gcc/testsuite/gcc.target/arm/bics_3.c +++ b/gcc/testsuite/gcc.target/arm/bics_3.c @@ -35,6 +35,6 @@ main (void) return 0; } -/* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 2 } } */ -/* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, .sl #2" 1 } } */ +/* { dg-final { scan-assembler-times "bics?\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 2 } } */ +/* { dg-final { scan-assembler-times "bics?\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, .sl #2" 1 } } */ -- 2.34.1
Re: [PATCH] tree-optimization/113576 - non-empty latch and may_be_zero vectorization
> On Jan 24, 2024, at 18:40, Richard Biener wrote: > > We can't support niters with may_be_zero when we end up with a > non-empty latch due to early exit peeling. At least not in > the simplistic way the vectorizer handles this now. Disallow > it again for exits that are not the last one. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > PR tree-optimization/113576 > * tree-vect-loop.cc (vec_init_loop_exit_info): Only allow > exits with may_be_zero niters when its the last one. > > * gcc.dg/vect/pr113576.c: New testcase. > --- > gcc/testsuite/gcc.dg/vect/pr113576.c | 157 +++ > gcc/tree-vect-loop.cc| 9 +- > 2 files changed, 164 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/pr113576.c > > diff --git a/gcc/testsuite/gcc.dg/vect/pr113576.c > b/gcc/testsuite/gcc.dg/vect/pr113576.c > new file mode 100644 > index 000..da5ddb09e33 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/pr113576.c > @@ -0,0 +1,157 @@ > +/* { dg-do run } */ > +/* { dg-options "-O3" } */ > +/* { dg-additional-options "-march=skylake-avx512" } */ Please adjust the testcase; this fails on non-x86_64 targets, see [1] and [2]. [1] https://patchwork.sourceware.org/project/gcc/patch/20240124144159.51c503858...@sourceware.org/ [2] https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-precommit/4765/artifact/artifacts/artifacts.precommit/00-sumfiles/gcc.log.1.xz Thanks! -- Maxim Kuvyrkov https://www.linaro.org
Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]
Hi Kewen, Below are my comments. I don't want to override Alexander's review, and if the patch looks good to him, it's fine to ignore my concerns. My main concern is that this adds a new entity -- forceful skipping of DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change in behavior. Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is already quite a bit of logic in the scheduler to skip them _as part of normal operation_. Would you please consider 2 ideas below. #1: After a brief look, I'm guessing this part is causing the problem: haifa-sched.cc <http://haifa-sched.cc/>:schedule_block(): === [1] /* Loop until all the insns in BB are scheduled. */ while ((*current_sched_info->schedule_more_p) ()) { perform_replacements_new_cycle (); do { start_clock_var = clock_var; clock_var++; advance_one_cycle (); === and then in the nested loop we have === [2] /* We don't want md sched reorder to even see debug isns, so put them out right away. */ if (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0)) && (*current_sched_info->schedule_more_p) ()) { while (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0))) { rtx_insn *insn = ready_remove_first (&ready); gcc_assert (DEBUG_INSN_P (insn)); (*current_sched_info->begin_schedule_ready) (insn); scheduled_insns.safe_push (insn); last_scheduled_insn = insn; advance = schedule_insn (insn); gcc_assert (advance == 0); if (ready.n_ready > 0) ready_sort (&ready); } } === . At the [1] point we already have sorted ready list, and I don't see any blockers to doing [2] before calling advance_one_cycle(). #2 Another approach, which might be even easier, is to save the state of DFA before the initial advance_one_cycle(), and then restore it if no real insns have been scheduled. Kind regards, -- Maxim Kuvyrkov https://www.linaro.org > On Nov 8, 2023, at 06:49, Kewen.Lin wrote: > > Hi, > > Gentle ping this: > > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html > > BR, > Kewen > > on 2023/10/25 10:45, Kewen.Lin wrote: >> Hi, >> >> This is almost a repost for v2 which was posted at[1] in March >> excepting for: >> 1) rebased from r14-4810 which is relatively up-to-date, >> some conflicts on "int to bool" return type change have >> been resolved; >> 2) adjust commit log a bit; >> 3) fix misspelled "articial" with "artificial" somewhere; >> >> -- >> *v2 comments*: >> >> By addressing Alexander's comments, against v1 this >> patch v2 mainly: >> >> - Rename no_real_insns_p to no_real_nondebug_insns_p; >> - Introduce enum rgn_bb_deps_free_action for three >>kinds of actions to free deps; >> - Change function free_deps_for_bb_no_real_insns_p to >>resolve_forw_deps which only focuses on forward deps; >> - Extend the handlings to cover dbg-cnt sched_block, >>add one test case for it; >> - Move free_trg_info call in schedule_region to an >>appropriate place. >> >> One thing I'm not sure about is the change in function >> sched_rgn_local_finish, currently the invocation to >> sched_rgn_local_free is guarded with !sel_sched_p (), >> so I just follow it, but the initialization of those >> structures (in sched_rgn_local_init) isn't guarded >> with !sel_sched_p (), it looks odd. >> >> -- >> >> As PR108273 shows, when there is one block which only has >> NOTE_P and LABEL_P insns at non-debug mode while has some >> extra DEBUG_INSN_P insns at debug mode, after scheduling >> it, the DFA states would be different between debug mode >> and non-debug mode. Since at non-debug mode, the block >> meets no_real_insns_p, it gets skipped; while at debug >> mode, it gets scheduled, even it only has NOTE_P, LABEL_P >> and DEBUG_INSN_P, the call of function advance_one_cycle >> will change the DFA state. PR108519 also shows this issue >> can be exposed by some scheduler changes. >> >> This patch is to change function no_real_insns_p to >> function no_real_nondebug_insns_p by taking debug insn into >> account, which make us not try to schedule for the block >> having only NOTE_P, LABEL_P and DEBUG_INSN_P insns, >> resulting in consistent DFA states between non-debug and >> debug mod
[PATCH 0/1] Avoid exponential behavior in scheduler and better logging
Hi, This patch series fixes exponential behavior in scheduler's find_modifiable_mems(). This fixes PRs [1] and [2], which are compilation time and memory hogs. The first patch in the series is the actual fix (bootstrapped and regtested on aarch64-linux-gnu), and follow up patches will be improvements to scheduler's logging infrastructure, that enabled me to debug this problem. As-is, the scheduler has good logging of the actual scheduling process, but calculation of instruction dependencies has almost no logging. Please don't delay review of the PRs fix to wait for logging patches to be published, as it'll take me another week to clean them up. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388 [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111554 Maxim Kuvyrkov (1): sched-deps.cc (find_modifiable_mems): Avoid exponential behavior gcc/sched-deps.cc | 45 + 1 file changed, 41 insertions(+), 4 deletions(-) -- 2.34.1
[PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
This patch avoids sched-deps.cc:find_inc() creating exponential number of dependencies, which become memory and compilation time hogs. Consider example (simplified from PR96388) ... === sp=sp-4 // sp_insnA mem_insnA1[sp+A1] ... mem_insnAN[sp+AN] sp=sp-4 // sp_insnB mem_insnB1[sp+B1] ... mem_insnBM[sp+BM] === ... in this example find_modifiable_mems() will arrange for mem_insnA* to be able to pass sp_insnA, and, while doing this, will create dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB is a consumer of sp_insnA. After this sp_insnB will have N new backward dependencies. Then find_modifiable_mems() gets to mem_insnB*s and starts to create N new dependencies for _every_ mem_insnB*. This gets us N*M new dependencies. In PR96833's testcase N and M are 10k-15k, which causes RAM usage of 30GB and compilation time of 30 minutes, with sched2 accounting for 95% of both metrics. After this patch the RAM usage is down to 1GB and compilation time is down to 3-4 minutes, with sched2 no longer standing out on -ftime-report or memory usage. gcc/ChangeLog: PR rtl-optimization/96388 PR rtl-optimization/111554 * sched-deps.cc (find_inc): Avoid exponential behavior. --- gcc/sched-deps.cc | 45 + 1 file changed, 41 insertions(+), 4 deletions(-) diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc index c23218890f3..397bb9fd462 100644 --- a/gcc/sched-deps.cc +++ b/gcc/sched-deps.cc @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem) /* Once a suitable mem reference has been found and the corresponding data in MII has been filled in, this function is called to find a suitable add or inc insn involving the register we found in the memory - reference. */ + reference. + If successful, this function will create additional dependencies between + - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards) + - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards). +*/ static bool find_inc (struct mem_inc_info *mii, bool backwards) { sd_iterator_def sd_it; dep_t dep; + sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW; + int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps); - sd_it = sd_iterator_start (mii->mem_insn, -backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW); + sd_it = sd_iterator_start (mii->mem_insn, mem_deps); while (sd_iterator_cond (&sd_it, &dep)) { dep_node_t node = DEP_LINK_NODE (*sd_it.linkp); rtx_insn *pro = DEP_PRO (dep); rtx_insn *con = DEP_CON (dep); - rtx_insn *inc_cand = backwards ? pro : con; + rtx_insn *inc_cand; + int n_inc_deps; + + if (backwards) + { + inc_cand = pro; + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK); + } + else + { + inc_cand = con; + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW); + } + + /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps +for mem_insn. This by itself is not a problem, since each mem_insn +will have only a few inc_insns associated with it. However, if +we consider that a single inc_insn may have a lot of mem_insns, AND, +on top of that, a few other inc_insns associated with it -- +those _other inc_insns_ will get (n_mem_deps * number of MEM insns) +dependencies created for them. This may cause an exponential +growth of memory usage and scheduling time. +See PR96388 for details. +We [heuristically] use n_inc_deps as a proxy for the number of MEM +insns, and drop opportunities for breaking modifiable_mem dependencies +when dependency lists grow beyond reasonable size. */ + if (n_mem_deps * n_inc_deps + >= param_max_pending_list_length * param_max_pending_list_length) + goto next; + if (DEP_NONREG (dep) || DEP_MULTIPLE (dep)) goto next; + if (parse_add_or_inc (mii, inc_cand, backwards)) { struct dep_replacement *desc; @@ -4838,6 +4873,8 @@ find_inc (struct mem_inc_info *mii, bool backwards) desc->insn = mii->mem_insn; move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con), INSN_SPEC_BACK_DEPS (con)); + + gcc_assert (mii->inc_insn == inc_cand); if (backwards) { FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep) -- 2.34.1
Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
> On Nov 20, 2023, at 17:09, Richard Biener wrote: > > On Mon, Nov 20, 2023 at 1:08 PM Maxim Kuvyrkov > wrote: >> >> This patch avoids sched-deps.cc:find_inc() creating exponential number >> of dependencies, which become memory and compilation time hogs. >> Consider example (simplified from PR96388) ... >> === >> sp=sp-4 // sp_insnA >> mem_insnA1[sp+A1] >> ... >> mem_insnAN[sp+AN] >> sp=sp-4 // sp_insnB >> mem_insnB1[sp+B1] >> ... >> mem_insnBM[sp+BM] >> === >> ... in this example find_modifiable_mems() will arrange for mem_insnA* >> to be able to pass sp_insnA, and, while doing this, will create >> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB >> is a consumer of sp_insnA. After this sp_insnB will have N new >> backward dependencies. >> Then find_modifiable_mems() gets to mem_insnB*s and starts to create >> N new dependencies for _every_ mem_insnB*. This gets us N*M new >> dependencies. >> >> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of >> 30GB and compilation time of 30 minutes, with sched2 accounting for >> 95% of both metrics. After this patch the RAM usage is down to 1GB >> and compilation time is down to 3-4 minutes, with sched2 no longer >> standing out on -ftime-report or memory usage. >> >> gcc/ChangeLog: >> >>PR rtl-optimization/96388 >>PR rtl-optimization/111554 >>* sched-deps.cc (find_inc): Avoid exponential behavior. >> --- >> gcc/sched-deps.cc | 45 + >> 1 file changed, 41 insertions(+), 4 deletions(-) >> >> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc >> index c23218890f3..397bb9fd462 100644 >> --- a/gcc/sched-deps.cc >> +++ b/gcc/sched-deps.cc >> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn >> *insn, bool before_mem) >> /* Once a suitable mem reference has been found and the corresponding data >>in MII has been filled in, this function is called to find a suitable >>add or inc insn involving the register we found in the memory >> - reference. */ >> + reference. >> + If successful, this function will create additional dependencies between >> + - mii->inc_insn's producers and mii->mem_insn as a consumer (if >> backwards) >> + - mii->inc_insn's consumers and mii->mem_insn as a producer (if >> !backwards). >> +*/ >> >> static bool >> find_inc (struct mem_inc_info *mii, bool backwards) >> { >> sd_iterator_def sd_it; >> dep_t dep; >> + sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW; >> + int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps); >> >> - sd_it = sd_iterator_start (mii->mem_insn, >> -backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW); >> + sd_it = sd_iterator_start (mii->mem_insn, mem_deps); >> while (sd_iterator_cond (&sd_it, &dep)) >> { >> dep_node_t node = DEP_LINK_NODE (*sd_it.linkp); >> rtx_insn *pro = DEP_PRO (dep); >> rtx_insn *con = DEP_CON (dep); >> - rtx_insn *inc_cand = backwards ? pro : con; >> + rtx_insn *inc_cand; >> + int n_inc_deps; >> + >> + if (backwards) >> + { >> + inc_cand = pro; >> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK); >> + } >> + else >> + { >> + inc_cand = con; >> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW); >> + } >> + >> + /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps >> +for mem_insn. This by itself is not a problem, since each mem_insn >> +will have only a few inc_insns associated with it. However, if >> +we consider that a single inc_insn may have a lot of mem_insns, AND, >> +on top of that, a few other inc_insns associated with it -- >> +those _other inc_insns_ will get (n_mem_deps * number of MEM insns) >> +dependencies created for them. This may cause an exponential >> +growth of memory usage and scheduling time. >> +See PR96388 for details. >> +We [heuristically] use n_inc_deps as a proxy for the number of MEM >> +insns, and drop opportunities for breaking modifiable_mem >> dependencies >> +when dependency lists grow beyond reasonable size. */ >> + if (n_mem_deps * n_inc_deps >> + >= pa
Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
> On Nov 20, 2023, at 17:52, Alexander Monakov wrote: > > > On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote: > >> This patch avoids sched-deps.cc:find_inc() creating exponential number >> of dependencies, which become memory and compilation time hogs. >> Consider example (simplified from PR96388) ... >> === >> sp=sp-4 // sp_insnA >> mem_insnA1[sp+A1] >> ... >> mem_insnAN[sp+AN] >> sp=sp-4 // sp_insnB >> mem_insnB1[sp+B1] >> ... >> mem_insnBM[sp+BM] >> === >> ... in this example find_modifiable_mems() will arrange for mem_insnA* >> to be able to pass sp_insnA, and, while doing this, will create >> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB >> is a consumer of sp_insnA. After this sp_insnB will have N new >> backward dependencies. >> Then find_modifiable_mems() gets to mem_insnB*s and starts to create >> N new dependencies for _every_ mem_insnB*. This gets us N*M new >> dependencies. [For avoidance of doubt, below discussion is about the general implementation of find_modifiable_mems() and not about the patch.] > > It's a bit hard to read this without knowing which value of 'backwards' > is assumed. > > Say 'backwards' is true and we are inspecting producer sp_insnB of mem_insnB1. > This is a true dependency. We know we can break it by adjusting B1 by -4, but > we need to be careful not to move such modified mem_insnB1 above sp_insnA, so > we need to iterate over *incoming true dependencies* of sp_insnB and add them. > > But the code seems to be iterating over *all incoming dependencies*, so it > will e.g. take anti-dependency mem_insnA1 -> sp_insnB and create a true > dependency mem_insnA1 -> mem_insnB1'. This seems utterly inefficient, if my > understanding is correct. Yeap, your understanding is correct. However, this is what find_modifiable_mems() has to do to avoid complicated analysis of second-level dependencies. I think, the optimization that find_modifiable_mems() does can be implemented as part of main dependency analysis, and I'm going to discuss that with Bernd. I might be missing something, but it seems that instruction transformations that find_modifiable_mems() is doing are simpler than what ia64 speculation is doing. So, I think, we should be able to leverage speculation infrastructure, rather than doing this optimization "on the side" of normal scheduling. -- Maxim Kuvyrkov https://www.linaro.org
[PATCH v2] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
This patch avoids sched-deps.cc:find_inc() creating exponential number of dependencies, which become memory and compilation time hogs. Consider example (simplified from PR96388) ... === sp=sp-4 // sp_insnA mem_insnA1[sp+A1] ... mem_insnAN[sp+AN] sp=sp-4 // sp_insnB mem_insnB1[sp+B1] ... mem_insnBM[sp+BM] === ... in this example find_modifiable_mems() will arrange for mem_insnA* to be able to pass sp_insnA, and, while doing this, will create dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB is a consumer of sp_insnA. After this sp_insnB will have N new backward dependencies. Then find_modifiable_mems() gets to mem_insnB*s and starts to create N new dependencies for _every_ mem_insnB*. This gets us N*M new dependencies. In PR96833's testcase N and M are 10k-15k, which causes RAM usage of 30GB and compilation time of 30 minutes, with sched2 accounting for 95% of both metrics. After this patch the RAM usage is down to 1GB and compilation time is down to 3-4 minutes, with sched2 no longer standing out on -ftime-report or memory usage. gcc/ChangeLog: PR rtl-optimization/96388 PR rtl-optimization/111554 * sched-deps.cc (find_inc): Avoid exponential behavior. --- gcc/sched-deps.cc | 45 + 1 file changed, 41 insertions(+), 4 deletions(-) diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc index c23218890f3..a286ff319e2 100644 --- a/gcc/sched-deps.cc +++ b/gcc/sched-deps.cc @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem) /* Once a suitable mem reference has been found and the corresponding data in MII has been filled in, this function is called to find a suitable add or inc insn involving the register we found in the memory - reference. */ + reference. + If successful, this function will create additional dependencies between + - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards) + - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards). +*/ static bool find_inc (struct mem_inc_info *mii, bool backwards) { sd_iterator_def sd_it; dep_t dep; + sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW; + int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps); - sd_it = sd_iterator_start (mii->mem_insn, -backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW); + sd_it = sd_iterator_start (mii->mem_insn, mem_deps); while (sd_iterator_cond (&sd_it, &dep)) { dep_node_t node = DEP_LINK_NODE (*sd_it.linkp); rtx_insn *pro = DEP_PRO (dep); rtx_insn *con = DEP_CON (dep); - rtx_insn *inc_cand = backwards ? pro : con; + rtx_insn *inc_cand; + int n_inc_deps; + if (DEP_NONREG (dep) || DEP_MULTIPLE (dep)) goto next; + + if (backwards) + { + inc_cand = pro; + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK); + } + else + { + inc_cand = con; + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW); + } + + /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps +for mem_insn. This by itself is not a problem, since each mem_insn +will have only a few inc_insns associated with it. However, if +we consider that a single inc_insn may have a lot of mem_insns, AND, +on top of that, a few other inc_insns associated with it -- +those _other inc_insns_ will get (n_mem_deps * number of MEM insns) +dependencies created for them. This may cause an exponential +growth of memory usage and scheduling time. +See PR96388 for details. +We [heuristically] use n_inc_deps as a proxy for the number of MEM +insns, and drop opportunities for breaking modifiable_mem dependencies +when dependency lists grow beyond reasonable size. */ + if (n_mem_deps * n_inc_deps + >= param_max_pending_list_length * param_max_pending_list_length) + goto next; + if (parse_add_or_inc (mii, inc_cand, backwards)) { struct dep_replacement *desc; @@ -4838,6 +4873,8 @@ find_inc (struct mem_inc_info *mii, bool backwards) desc->insn = mii->mem_insn; move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con), INSN_SPEC_BACK_DEPS (con)); + + gcc_assert (mii->inc_insn == inc_cand); if (backwards) { FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep) -- 2.34.1
Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
> On Nov 20, 2023, at 17:45, Richard Biener wrote: > > On Mon, Nov 20, 2023 at 2:42 PM Maxim Kuvyrkov > wrote: >> >>> On Nov 20, 2023, at 17:09, Richard Biener >>> wrote: >>> >>> On Mon, Nov 20, 2023 at 1:08 PM Maxim Kuvyrkov >>> wrote: >>>> >>>> This patch avoids sched-deps.cc:find_inc() creating exponential number >>>> of dependencies, which become memory and compilation time hogs. >>>> Consider example (simplified from PR96388) ... >>>> === >>>> sp=sp-4 // sp_insnA >>>> mem_insnA1[sp+A1] >>>> ... >>>> mem_insnAN[sp+AN] >>>> sp=sp-4 // sp_insnB >>>> mem_insnB1[sp+B1] >>>> ... >>>> mem_insnBM[sp+BM] >>>> === >>>> ... in this example find_modifiable_mems() will arrange for mem_insnA* >>>> to be able to pass sp_insnA, and, while doing this, will create >>>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB >>>> is a consumer of sp_insnA. After this sp_insnB will have N new >>>> backward dependencies. >>>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create >>>> N new dependencies for _every_ mem_insnB*. This gets us N*M new >>>> dependencies. >>>> >>>> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of >>>> 30GB and compilation time of 30 minutes, with sched2 accounting for >>>> 95% of both metrics. After this patch the RAM usage is down to 1GB >>>> and compilation time is down to 3-4 minutes, with sched2 no longer >>>> standing out on -ftime-report or memory usage. >>>> >>>> gcc/ChangeLog: >>>> >>>> PR rtl-optimization/96388 >>>> PR rtl-optimization/111554 >>>> * sched-deps.cc (find_inc): Avoid exponential behavior. >>>> --- >>>> gcc/sched-deps.cc | 45 + >>>> 1 file changed, 41 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc >>>> index c23218890f3..397bb9fd462 100644 >>>> --- a/gcc/sched-deps.cc >>>> +++ b/gcc/sched-deps.cc >>>> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, >>>> rtx_insn *insn, bool before_mem) >>>> /* Once a suitable mem reference has been found and the corresponding data >>>> in MII has been filled in, this function is called to find a suitable >>>> add or inc insn involving the register we found in the memory >>>> - reference. */ >>>> + reference. >>>> + If successful, this function will create additional dependencies >>>> between >>>> + - mii->inc_insn's producers and mii->mem_insn as a consumer (if >>>> backwards) >>>> + - mii->inc_insn's consumers and mii->mem_insn as a producer (if >>>> !backwards). >>>> +*/ >>>> >>>> static bool >>>> find_inc (struct mem_inc_info *mii, bool backwards) >>>> { >>>> sd_iterator_def sd_it; >>>> dep_t dep; >>>> + sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : >>>> SD_LIST_FORW; >>>> + int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps); >>>> >>>> - sd_it = sd_iterator_start (mii->mem_insn, >>>> -backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW); >>>> + sd_it = sd_iterator_start (mii->mem_insn, mem_deps); >>>> while (sd_iterator_cond (&sd_it, &dep)) >>>>{ >>>> dep_node_t node = DEP_LINK_NODE (*sd_it.linkp); >>>> rtx_insn *pro = DEP_PRO (dep); >>>> rtx_insn *con = DEP_CON (dep); >>>> - rtx_insn *inc_cand = backwards ? pro : con; >>>> + rtx_insn *inc_cand; >>>> + int n_inc_deps; >>>> + >>>> + if (backwards) >>>> + { >>>> + inc_cand = pro; >>>> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK); >>>> + } >>>> + else >>>> + { >>>> + inc_cand = con; >>>> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW); >>>> + } >>>> + >>>> + /* In the FOR_EACH_DEP loop below we will create additional >>>> n_inc_deps &
Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
> On Nov 20, 2023, at 20:30, Alexander Monakov wrote: > > > On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote: > >>> On Nov 20, 2023, at 17:52, Alexander Monakov wrote: >>> >>> >>> On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote: >>> >>>> This patch avoids sched-deps.cc:find_inc() creating exponential number >>>> of dependencies, which become memory and compilation time hogs. >>>> Consider example (simplified from PR96388) ... >>>> === >>>> sp=sp-4 // sp_insnA >>>> mem_insnA1[sp+A1] >>>> ... >>>> mem_insnAN[sp+AN] >>>> sp=sp-4 // sp_insnB >>>> mem_insnB1[sp+B1] >>>> ... >>>> mem_insnBM[sp+BM] >>>> === >>>> ... in this example find_modifiable_mems() will arrange for mem_insnA* >>>> to be able to pass sp_insnA, and, while doing this, will create >>>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB >>>> is a consumer of sp_insnA. After this sp_insnB will have N new >>>> backward dependencies. >>>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create >>>> N new dependencies for _every_ mem_insnB*. This gets us N*M new >>>> dependencies. >> >> [For avoidance of doubt, below discussion is about the general implementation >> of find_modifiable_mems() and not about the patch.] > > I was saying the commit message is hard to read (unless it's just me). > >>> It's a bit hard to read this without knowing which value of 'backwards' >>> is assumed. Oh, sorry, I misunderstood your comment. In the above example I want to describe outcome that current code generates, without going into details about exactly how it does it. I'm not sure how to make it more readable, and would appreciate suggestions. >>> >>> Say 'backwards' is true and we are inspecting producer sp_insnB of >>> mem_insnB1. >>> This is a true dependency. We know we can break it by adjusting B1 by -4, >>> but >>> we need to be careful not to move such modified mem_insnB1 above sp_insnA, >>> so >>> we need to iterate over *incoming true dependencies* of sp_insnB and add >>> them. >>> >>> But the code seems to be iterating over *all incoming dependencies*, so it >>> will e.g. take anti-dependency mem_insnA1 -> sp_insnB and create a true >>> dependency mem_insnA1 -> mem_insnB1'. This seems utterly inefficient, if my >>> understanding is correct. >> >> Yeap, your understanding is correct. However, this is what >> find_modifiable_mems() has to do to avoid complicated analysis of >> second-level >> dependencies. > > What is the reason it cannot simply skip anti-dependencies in the > 'if (backwards)' loop, and true dependencies in the 'else' loop? I /think/, this should be possible. However, rather than improving current implementation my preference is to rework it by integrating with the main dependency analysis. This should provide both faster and more precise dependency analysis, which would generate breakable addr/mem dependencies. Thanks, -- Maxim Kuvyrkov https://www.linaro.org
[PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
This patch avoids sched-deps.cc:find_inc() creating exponential number of dependencies, which become memory and compilation time hogs. Consider example (simplified from PR96388) ... === sp=sp-4 // sp_insnA mem_insnA1[sp+A1] ... mem_insnAN[sp+AN] sp=sp-4 // sp_insnB mem_insnB1[sp+B1] ... mem_insnBM[sp+BM] === [For simplicity, let's assume find_inc(backwards==true)]. In this example find_modifiable_mems() will arrange for mem_insnA* to be able to pass sp_insnA, and, while doing this, will create dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB is a consumer of sp_insnA. After this sp_insnB will have N new backward dependencies. Then find_modifiable_mems() gets to mem_insnB*s and starts to create N new dependencies for _every_ mem_insnB*. This gets us N*M new dependencies. In PR96833's testcase N and M are 10k-15k, which causes RAM usage of 30GB and compilation time of 30 minutes, with sched2 accounting for 95% of both metrics. After this patch the RAM usage is down to 1GB and compilation time is down to 3-4 minutes, with sched2 no longer standing out on -ftime-report or memory usage. gcc/ChangeLog: PR rtl-optimization/96388 PR rtl-optimization/111554 * sched-deps.cc (find_inc): Avoid exponential behavior. --- gcc/sched-deps.cc | 48 +++ 1 file changed, 44 insertions(+), 4 deletions(-) diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc index c23218890f3..005fc0f567e 100644 --- a/gcc/sched-deps.cc +++ b/gcc/sched-deps.cc @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem) /* Once a suitable mem reference has been found and the corresponding data in MII has been filled in, this function is called to find a suitable add or inc insn involving the register we found in the memory - reference. */ + reference. + If successful, this function will create additional dependencies between + - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards) + - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards). +*/ static bool find_inc (struct mem_inc_info *mii, bool backwards) { sd_iterator_def sd_it; dep_t dep; + sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW; + int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps); - sd_it = sd_iterator_start (mii->mem_insn, -backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW); + sd_it = sd_iterator_start (mii->mem_insn, mem_deps); while (sd_iterator_cond (&sd_it, &dep)) { dep_node_t node = DEP_LINK_NODE (*sd_it.linkp); rtx_insn *pro = DEP_PRO (dep); rtx_insn *con = DEP_CON (dep); - rtx_insn *inc_cand = backwards ? pro : con; + rtx_insn *inc_cand; + int n_inc_deps; + if (DEP_NONREG (dep) || DEP_MULTIPLE (dep)) goto next; + + if (backwards) + { + inc_cand = pro; + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK); + } + else + { + inc_cand = con; + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW); + } + + /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps +for mem_insn. This by itself is not a problem, since each mem_insn +will have only a few inc_insns associated with it. However, if +we consider that a single inc_insn may have a lot of mem_insns, AND, +on top of that, a few other inc_insns associated with it -- +those _other inc_insns_ will get (n_mem_deps * number of MEM insns) +dependencies created for them. This may cause an exponential +growth of memory usage and scheduling time. +See PR96388 for details. +We [heuristically] use n_inc_deps as a proxy for the number of MEM +insns, and drop opportunities for breaking modifiable_mem dependencies +when dependency lists grow beyond reasonable size. */ + if (n_mem_deps * n_inc_deps + >= param_max_pending_list_length * param_max_pending_list_length) + goto next; + if (parse_add_or_inc (mii, inc_cand, backwards)) { struct dep_replacement *desc; @@ -4838,6 +4873,11 @@ find_inc (struct mem_inc_info *mii, bool backwards) desc->insn = mii->mem_insn; move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con), INSN_SPEC_BACK_DEPS (con)); + + /* Make sure that n_inc_deps above is consistent with dependencies +we create. */ + gcc_assert (mii->inc_insn == inc_cand); + if (backwards) { FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep) -- 2.34.1
[PATCH v3 0/8] Avoid exponential behavior in scheduler and better logging
Compared to v1 and v2 this is a complete patch series. The debugging/dumping improvements gently touch IRA, RTL lists, and sel-sched bits to avoid re-inventing or copy-pasting the wheel. Bootstrapping and regtesting these patches on aarch64-linux-gnu. OK to merge? Thanks! === This patch series fixes exponential behavior in scheduler's find_modifiable_mems(). This fixes PRs [1] and [2], which are compilation time and memory hogs. The first patch in the series is the actual fix (bootstrapped and regtested on aarch64-linux-gnu), and follow up patches are improvements to scheduler's logging infrastructure, that enabled me to debug this problem. As-is, the scheduler has good logging of the actual scheduling process, but calculation of instruction dependencies has almost no logging. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388 [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111554 Maxim Kuvyrkov (8): sched-deps.cc (find_modifiable_mems): Avoid exponential behavior Unify implementations of print_hard_reg_set() Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc Improve and fix sched-deps.cc: dump_dep() and dump_lists(). Add a bit more logging scheduler's dependency analysis sched_deps.cc: Simplify initialization of dependency contexts Improve logging of register data in scheduler dependency analysis Improve logging of scheduler dependency analysis context gcc/hard-reg-set.h| 3 + gcc/ira-color.cc | 17 +- gcc/ira-conflicts.cc | 39 + gcc/lists.cc | 30 +++- gcc/rtl.h | 4 +- gcc/sched-deps.cc | 399 +++--- gcc/sched-int.h | 9 +- gcc/sched-rgn.cc | 56 +++--- gcc/sel-sched-dump.cc | 21 +-- gcc/sel-sched-dump.h | 2 +- 10 files changed, 467 insertions(+), 113 deletions(-) -- 2.34.1
[PATCH v3 5/8] Add a bit more logging scheduler's dependency analysis
gcc/ChangeLog: * sched-deps.cc (sd_add_dep, find_inc): Add logging about dependency creation. --- gcc/sched-deps.cc | 30 ++ 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc index 4d357079a7a..2a87158ba4b 100644 --- a/gcc/sched-deps.cc +++ b/gcc/sched-deps.cc @@ -1342,6 +1342,13 @@ sd_add_dep (dep_t dep, bool resolved_p) in the bitmap caches of dependency information. */ if (true_dependency_cache != NULL) set_dependency_caches (dep); + + if (sched_verbose >= 9) +{ + fprintf (sched_dump, "created dependency "); + dump_dep (sched_dump, dep, 1); + fprintf (sched_dump, "\n"); +} } /* Add or update backward dependence between INSN and ELEM @@ -4879,18 +4886,33 @@ find_inc (struct mem_inc_info *mii, bool backwards) we create. */ gcc_assert (mii->inc_insn == inc_cand); + int n_deps_created = 0; if (backwards) { FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep) - add_dependence_1 (mii->mem_insn, DEP_PRO (dep), - REG_DEP_TRUE); + { + add_dependence_1 (mii->mem_insn, DEP_PRO (dep), + REG_DEP_TRUE); + ++n_deps_created; + } } else { FOR_EACH_DEP (mii->inc_insn, SD_LIST_FORW, sd_it, dep) - add_dependence_1 (DEP_CON (dep), mii->mem_insn, - REG_DEP_ANTI); + { + add_dependence_1 (DEP_CON (dep), mii->mem_insn, + REG_DEP_ANTI); + ++n_deps_created; + } } + if (sched_verbose >= 6) + fprintf (sched_dump, +"created %d deps for mem_insn %d due to " +"inc_insn %d %s deps\n", +n_deps_created, INSN_UID (mii->mem_insn), +INSN_UID (mii->inc_insn), +backwards ? "backward" : "forward"); + return true; } next: -- 2.34.1
[PATCH v3 6/8] sched_deps.cc: Simplify initialization of dependency contexts
gcc/ChangeLog: * sched-deps.cc (init_deps, init_deps_reg_last): Simplify. (free_deps): Remove useless code. --- gcc/sched-deps.cc | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc index 2a87158ba4b..e0d3c97d935 100644 --- a/gcc/sched-deps.cc +++ b/gcc/sched-deps.cc @@ -3927,10 +3927,9 @@ init_deps (class deps_desc *deps, bool lazy_reg_last) int max_reg = (reload_completed ? FIRST_PSEUDO_REGISTER : max_reg_num ()); deps->max_reg = max_reg; - if (lazy_reg_last) -deps->reg_last = NULL; - else -deps->reg_last = XCNEWVEC (struct deps_reg, max_reg); + deps->reg_last = NULL; + if (!lazy_reg_last) +init_deps_reg_last (deps); INIT_REG_SET (&deps->reg_last_in_use); deps->pending_read_insns = 0; @@ -3961,9 +3960,7 @@ init_deps (class deps_desc *deps, bool lazy_reg_last) void init_deps_reg_last (class deps_desc *deps) { - gcc_assert (deps && deps->max_reg > 0); - gcc_assert (deps->reg_last == NULL); - + gcc_assert (deps && deps->max_reg > 0 && deps->reg_last == NULL); deps->reg_last = XCNEWVEC (struct deps_reg, deps->max_reg); } @@ -4013,8 +4010,6 @@ free_deps (class deps_desc *deps) it at all. */ free (deps->reg_last); deps->reg_last = NULL; - - deps = NULL; } /* Remove INSN from dependence contexts DEPS. */ -- 2.34.1
[PATCH v3 2/8] Unify implementations of print_hard_reg_set()
We currently have 3 implementations of print_hard_reg_set() (all with the same name!) in ira-color.cc, ira-conflicts.cc, and sel-sched-dump.cc. This patch generalizes implementation in ira-color.cc, and uses it in all other places. The declaration is added to hard-reg-set.h. The motivation for this patch is the [upcoming] need for print_hard_reg_set() in sched-deps.cc. gcc/ChangeLog: * hard-reg-set.h (print_hard_reg_set): Declare. * ira-color.cc (print_hard_reg_set): Generalize a bit. (debug_hard_reg_set, print_hard_regs_subforest,) (setup_allocno_available_regs_num): Update. * ira-conflicts.cc (print_hard_reg_set): Remove. (print_allocno_conflicts): Use global print_hard_reg_set(). * sel-sched-dump.cc (print_hard_reg_set): Remove. (dump_hard_reg_set): Use global print_hard_reg_set(). * sel-sched-dump.h (dump_hard_reg_set): Mark as DEBUG_FUNCTION. --- gcc/hard-reg-set.h| 3 +++ gcc/ira-color.cc | 17 ++--- gcc/ira-conflicts.cc | 39 --- gcc/sel-sched-dump.cc | 21 - gcc/sel-sched-dump.h | 2 +- 5 files changed, 22 insertions(+), 60 deletions(-) diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h index b0bb9bce074..81bca6df0e5 100644 --- a/gcc/hard-reg-set.h +++ b/gcc/hard-reg-set.h @@ -524,4 +524,7 @@ call_used_or_fixed_reg_p (unsigned int regno) } #endif +/* ira-color.cc */ +extern void print_hard_reg_set (FILE *, HARD_REG_SET, const char *, bool); + #endif /* ! GCC_HARD_REG_SET_H */ diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc index f2e8ea34152..43564b44933 100644 --- a/gcc/ira-color.cc +++ b/gcc/ira-color.cc @@ -482,11 +482,14 @@ first_common_ancestor_node (allocno_hard_regs_node_t first, } /* Print hard reg set SET to F. */ -static void -print_hard_reg_set (FILE *f, HARD_REG_SET set, bool new_line_p) +void +print_hard_reg_set (FILE *f, HARD_REG_SET set, + const char *title, bool new_line_p) { int i, start, end; + if (title) +fprintf (f, "%s", title); for (start = end = -1, i = 0; i < FIRST_PSEUDO_REGISTER; i++) { bool reg_included = TEST_HARD_REG_BIT (set, i); @@ -516,7 +519,7 @@ print_hard_reg_set (FILE *f, HARD_REG_SET set, bool new_line_p) DEBUG_FUNCTION void debug_hard_reg_set (HARD_REG_SET set) { - print_hard_reg_set (stderr, set, true); + print_hard_reg_set (stderr, set, NULL, true); } /* Print allocno hard register subforest given by ROOTS and its LEVEL @@ -534,7 +537,7 @@ print_hard_regs_subforest (FILE *f, allocno_hard_regs_node_t roots, for (i = 0; i < level * 2; i++) fprintf (f, " "); fprintf (f, "%d:(", node->preorder_num); - print_hard_reg_set (f, node->hard_regs->set, false); + print_hard_reg_set (f, node->hard_regs->set, NULL, false); fprintf (f, ")@%" PRId64"\n", node->hard_regs->cost); print_hard_regs_subforest (f, node->first, level + 1); } @@ -2982,12 +2985,12 @@ setup_allocno_available_regs_num (ira_allocno_t a) " Allocno a%dr%d of %s(%d) has %d avail. regs ", ALLOCNO_NUM (a), ALLOCNO_REGNO (a), reg_class_names[aclass], ira_class_hard_regs_num[aclass], n); - print_hard_reg_set (ira_dump_file, data->profitable_hard_regs, false); + print_hard_reg_set (ira_dump_file, data->profitable_hard_regs, NULL, false); fprintf (ira_dump_file, ", %snode: ", data->profitable_hard_regs == data->hard_regs_node->hard_regs->set ? "" : "^"); print_hard_reg_set (ira_dump_file, - data->hard_regs_node->hard_regs->set, false); + data->hard_regs_node->hard_regs->set, NULL, false); for (i = 0; i < nwords; i++) { ira_object_t obj = ALLOCNO_OBJECT (a, i); @@ -3000,7 +3003,7 @@ setup_allocno_available_regs_num (ira_allocno_t a) } fprintf (ira_dump_file, " (confl regs = "); print_hard_reg_set (ira_dump_file, OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), - false); + NULL, false); fprintf (ira_dump_file, ")"); } fprintf (ira_dump_file, "\n"); diff --git a/gcc/ira-conflicts.cc b/gcc/ira-conflicts.cc index a4d93c8d734..69806b1a15b 100644 --- a/gcc/ira-conflicts.cc +++ b/gcc/ira-conflicts.cc @@ -670,37 +670,6 @@ build_conflicts (void) -/* Print hard reg set SET with TITLE to FILE. */ -static void -print_hard_reg_set (FILE *file, const char *title, HARD_REG_SET set) -{ - int i, start, end; - - fputs (title, file); - for (start = end = -1, i = 0; i < FIRST_PSEUDO_REGISTER; i++) -{ - bool reg_included = TEST_HARD_REG_BIT (set, i); - - if (reg_included) - { - if (start == -1) - start = i; - end = i; - } - if (start >= 0 && (!reg_included || i == FIRST_PSEUDO_REGISTER - 1)) - { - if (start == end) - fprintf (file, " %d", start); - else if (start == end + 1) -
[PATCH v3 8/8] Improve logging of scheduler dependency analysis context
Scheduler dependency analysis uses two main data structures: 1. reg_pending_* data contains effects of INSN on the register file, which is then incorporated into ... 2. deps_desc object, which contains commulative information about all instructions processed from deps_desc object's initialization. This patch adds debug dumping of (2). Dependency analysis contexts (aka deps_desc objects) are huge, but each instruction affects only a small amount of data in these objects. Therefore, it is most useful to dump differential information compared to the dependency state after previous instruction. gcc/ChangeLog: * sched-deps.cc (reset_deps, dump_rtx_insn_list) (rtx_insn_list_same_p): New helper functions. (dump_deps_desc_diff): New function to dump dependency information. (sched_analysis_prev_deps): New static variable. (sched_analyze_insn): Dump dependency information. (init_deps_global, finish_deps_global): Handle sched_analysis_prev_deps. * sched-int.h (struct deps_reg): Update comments. * sched-rgn.cc (concat_insn_list, concat_expr_list): Update comments. --- gcc/sched-deps.cc | 197 ++ gcc/sched-int.h | 9 ++- gcc/sched-rgn.cc | 5 ++ 3 files changed, 210 insertions(+), 1 deletion(-) diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc index f9290c82fd2..edca9927e23 100644 --- a/gcc/sched-deps.cc +++ b/gcc/sched-deps.cc @@ -1677,6 +1677,15 @@ delete_all_dependences (rtx_insn *insn) sd_delete_dep (sd_it); } +/* Re-initialize existing dependency context DEPS to be a copy of FROM. */ +static void +reset_deps (class deps_desc *deps, class deps_desc *from) +{ + free_deps (deps); + init_deps (deps, false); + deps_join (deps, from); +} + /* All insns in a scheduling group except the first should only have dependencies on the previous insn in the group. So we find the first instruction in the scheduling group by walking the dependence @@ -2960,6 +2969,177 @@ dump_reg_pending_data (FILE *file, rtx_insn *insn) } } +/* Dump rtx_insn_list LIST. + Consider moving to lists.cc if there are users outside of sched-deps.cc. */ +static void +dump_rtx_insn_list (FILE *file, rtx_insn_list *list) +{ + for (; list; list = list->next ()) +fprintf (file, " %d", INSN_UID (list->insn ())); +} + +/* Return TRUE if lists A and B have same elements in the same order. */ +static bool +rtx_insn_list_same_p (rtx_insn_list *a, rtx_insn_list *b) +{ + for (; a && b; a = a->next (), b = b->next ()) +if (a->insn () != b->insn ()) + return false; + + if (a || b) +return false; + + return true; +} + +/* Dump parts of DEPS that are different from PREV. + Dumping all information from dependency context produces huge + hard-to-analize logs; differential dumping is way more managable. */ +static void +dump_deps_desc_diff (FILE *file, class deps_desc *deps, class deps_desc *prev) +{ + /* Each "paragraph" is a single line of output. */ + + /* Note on param_max_pending_list_length: + During normal dependency analysis various lists should not exceed this + limit. Searching for "!!!" in scheduler logs can point to potential bugs + or poorly-handled corner-cases. */ + + if (!rtx_insn_list_same_p (deps->pending_read_insns, +prev->pending_read_insns)) +{ + fprintf (file, ";; deps pending mem reads length(%d):", + deps->pending_read_list_length); + if ((deps->pending_read_list_length + deps->pending_write_list_length) + >= param_max_pending_list_length) + fprintf (file, "%d insns!!!", deps->pending_read_list_length); + else + dump_rtx_insn_list (file, deps->pending_read_insns); + fprintf (file, "\n"); +} + + if (!rtx_insn_list_same_p (deps->pending_write_insns, +prev->pending_write_insns)) +{ + fprintf (file, ";; deps pending mem writes length(%d):", + deps->pending_write_list_length); + if ((deps->pending_read_list_length + deps->pending_write_list_length) + >= param_max_pending_list_length) + fprintf (file, "%d insns!!!", deps->pending_write_list_length); + else + dump_rtx_insn_list (file, deps->pending_write_insns); + fprintf (file, "\n"); +} + + if (!rtx_insn_list_same_p (deps->pending_jump_insns, +prev->pending_jump_insns)) +{ + fprintf (file, ";; deps pending jump length(%d):", + deps->pending_flush_length); + if (deps->pending_flush_length >= param_max_pending_list_length) + fprintf (file, "%d insns!!!", deps->pending_flush_length); + else + dump_rtx_insn_list (file, deps->pending_jump_insns); + fprintf (file, "\n"); +} + + fprintf (file, ";; last"); + if (!rtx_insn_list_same_p (deps->last_pending_memory_flush, +prev->last_pending_memory_flush)) +{ + fprintf (
[PATCH v3 3/8] Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc
This patch simplifies logic behind deps_join(), which will be important for the upcoming improvements of sched-deps.cc logging. The only functional change is that when deps_join() is called with empty state for the 2nd argument, it will not reverse INSN_ and EXPR_LISTs in the 1st argument. Before this patch the lists were reversed due to use of concat_*_LIST(). Now, with copy_*_LIST() used for this case, the lists will remain in the original order. gcc/ChangeLog: * lists.cc (copy_EXPR_LIST, concat_EXPR_LIST): New functions. * rtl.h (copy_EXPR_LIST, concat_EXPR_LIST): Declare. * sched-rgn.cc (concat_insn_list, concat_expr_list): New helpers. (concat_insn_mem_list): Simplify. (deps_join): Update --- gcc/lists.cc | 30 +++- gcc/rtl.h| 4 +++- gcc/sched-rgn.cc | 51 +++- 3 files changed, 61 insertions(+), 24 deletions(-) diff --git a/gcc/lists.cc b/gcc/lists.cc index 2cdf37ad533..83e7bf32176 100644 --- a/gcc/lists.cc +++ b/gcc/lists.cc @@ -160,6 +160,24 @@ free_INSN_LIST_list (rtx_insn_list **listp) free_list ((rtx *)listp, &unused_insn_list); } +/* Make a copy of the EXPR_LIST list LINK and return it. */ +rtx_expr_list * +copy_EXPR_LIST (rtx_expr_list *link) +{ + rtx_expr_list *new_queue; + rtx_expr_list **pqueue = &new_queue; + + for (; link; link = link->next ()) +{ + rtx x = link->element (); + rtx_expr_list *newlink = alloc_EXPR_LIST (REG_NOTE_KIND (link), x, NULL); + *pqueue = newlink; + pqueue = (rtx_expr_list **)&XEXP (newlink, 1); +} + *pqueue = NULL; + return new_queue; +} + /* Make a copy of the INSN_LIST list LINK and return it. */ rtx_insn_list * copy_INSN_LIST (rtx_insn_list *link) @@ -178,12 +196,22 @@ copy_INSN_LIST (rtx_insn_list *link) return new_queue; } +/* Duplicate the EXPR_LIST elements of COPY and prepend them to OLD. */ +rtx_expr_list * +concat_EXPR_LIST (rtx_expr_list *copy, rtx_expr_list *old) +{ + rtx_expr_list *new_rtx = old; + for (; copy; copy = copy->next ()) +new_rtx = alloc_EXPR_LIST (REG_NOTE_KIND (copy), copy->element (), new_rtx); + return new_rtx; +} + /* Duplicate the INSN_LIST elements of COPY and prepend them to OLD. */ rtx_insn_list * concat_INSN_LIST (rtx_insn_list *copy, rtx_insn_list *old) { rtx_insn_list *new_rtx = old; - for (; copy ; copy = copy->next ()) + for (; copy; copy = copy->next ()) { new_rtx = alloc_INSN_LIST (copy->insn (), new_rtx); PUT_REG_NOTE_KIND (new_rtx, REG_NOTE_KIND (copy)); diff --git a/gcc/rtl.h b/gcc/rtl.h index e4b6cc0dbb5..7e952d7cbeb 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -3764,10 +3764,12 @@ extern void free_EXPR_LIST_list (rtx_expr_list **); extern void free_INSN_LIST_list (rtx_insn_list **); extern void free_EXPR_LIST_node (rtx); extern void free_INSN_LIST_node (rtx); +extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx); extern rtx_insn_list *alloc_INSN_LIST (rtx, rtx); +extern rtx_expr_list *copy_EXPR_LIST (rtx_expr_list *); extern rtx_insn_list *copy_INSN_LIST (rtx_insn_list *); +extern rtx_expr_list *concat_EXPR_LIST (rtx_expr_list *, rtx_expr_list *); extern rtx_insn_list *concat_INSN_LIST (rtx_insn_list *, rtx_insn_list *); -extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx); extern void remove_free_INSN_LIST_elem (rtx_insn *, rtx_insn_list **); extern rtx remove_list_elem (rtx, rtx *); extern rtx_insn *remove_free_INSN_LIST_node (rtx_insn_list **); diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc index e5964f54ead..da3ec0458ff 100644 --- a/gcc/sched-rgn.cc +++ b/gcc/sched-rgn.cc @@ -2585,25 +2585,32 @@ add_branch_dependences (rtx_insn *head, rtx_insn *tail) static class deps_desc *bb_deps; +/* Return a new insn_list with all the elements from the two input lists. */ +static rtx_insn_list * +concat_insn_list (rtx_insn_list *copy, rtx_insn_list *old) +{ + if (!old) +return copy_INSN_LIST (copy); + return concat_INSN_LIST (copy, old); +} + +/* Return a new expr_list with all the elements from the two input lists. */ +static rtx_expr_list * +concat_expr_list (rtx_expr_list *copy, rtx_expr_list *old) +{ + if (!old) +return copy_EXPR_LIST (copy); + return concat_EXPR_LIST (copy, old); +} + static void concat_insn_mem_list (rtx_insn_list *copy_insns, rtx_expr_list *copy_mems, rtx_insn_list **old_insns_p, rtx_expr_list **old_mems_p) { - rtx_insn_list *new_insns = *old_insns_p; - rtx_expr_list *new_mems = *old_mems_p; - - while (copy_insns) -{ - new_insns = alloc_INSN_LIST (copy_insns->insn (), new_insns); - new_mems = alloc_EXPR_LIST (VOIDmode, copy_mems->element (), new_mems); - copy_insns = copy_insns->next (); - copy_mems = copy_mems->next (); -} - - *old_insns_p = new_insns; - *old_mems_p = new_mems; + *old_insns_p = concat_insn_list (copy_insns, *old_insns_p); + *old_mems_p = concat
[PATCH v3 4/8] Improve and fix sched-deps.cc: dump_dep() and dump_lists().
Better propagate flags from dump_lists() into dump_dep() and add a missing "]" in dump_lists(). gcc/ChangeLog: * sched-deps.cc (DUMP_DEP_PRO): Improve comment. (dump_dep_flags): Remove. (DUMP_LISTS_SIZE, DUMP_LISTS_DEPS, DUMP_LISTS_ALL): Continue numbering from DUMP_DEP_* flags. (dump_lists): Update and fix. --- gcc/sched-deps.cc | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc index 005fc0f567e..4d357079a7a 100644 --- a/gcc/sched-deps.cc +++ b/gcc/sched-deps.cc @@ -132,7 +132,8 @@ static void dump_ds (FILE *, ds_t); /* Define flags for dump_dep (). */ /* Dump producer of the dependence. */ -#define DUMP_DEP_PRO (2) +#define DUMP_DEP_PRO (2) /* Reserve "1" for handling of DUMP_DEP_ALL and + DUMP_LISTS_ALL. */ /* Dump consumer of the dependence. */ #define DUMP_DEP_CON (4) @@ -206,9 +207,6 @@ dump_dep (FILE *dump, dep_t dep, int flags) fprintf (dump, ">"); } -/* Default flags for dump_dep (). */ -static int dump_dep_flags = (DUMP_DEP_PRO | DUMP_DEP_CON); - /* Dump all fields of DEP to STDERR. */ void sd_debug_dep (dep_t dep) @@ -1454,19 +1452,20 @@ sd_delete_dep (sd_iterator_def sd_it) } /* Dump size of the lists. */ -#define DUMP_LISTS_SIZE (2) +#define DUMP_LISTS_SIZE (32) /* (DUMP_DEP_STATUS << 1) */ /* Dump dependencies of the lists. */ -#define DUMP_LISTS_DEPS (4) +#define DUMP_LISTS_DEPS (64) /* Dump all information about the lists. */ #define DUMP_LISTS_ALL (DUMP_LISTS_SIZE | DUMP_LISTS_DEPS) /* Dump deps_lists of INSN specified by TYPES to DUMP. - FLAGS is a bit mask specifying what information about the lists needs - to be printed. + FLAGS is a bit mask specifying what information about the lists and + the individual deps needs to be printed, this is a combination of + DUMP_DEP_* and DUMP_LISTS_* flags. If FLAGS has the very first bit set, then dump all information about - the lists and propagate this bit into the callee dump functions. */ + the lists and deps propagate this bit into the callee dump functions. */ static void dump_lists (FILE *dump, rtx insn, sd_list_types_def types, int flags) { @@ -1488,10 +1487,12 @@ dump_lists (FILE *dump, rtx insn, sd_list_types_def types, int flags) { FOR_EACH_DEP (insn, types, sd_it, dep) { - dump_dep (dump, dep, dump_dep_flags | all); + dump_dep (dump, dep, flags | all); fprintf (dump, " "); } } + + fprintf (dump, "]"); } /* Dump all information about deps_lists of INSN specified by TYPES -- 2.34.1
[PATCH v3 7/8] Improve logging of register data in scheduler dependency analysis
Scheduler dependency analysis uses two main data structures: 1. reg_pending_* data contains effects of INSN on the register file, which is then incorporated into ... 2. deps_desc object, which contains commulative information about all instructions processed from deps_desc object's initialization. This patch adds debug dumping of (1). gcc/ChangeLog: * sched-deps.cc (print-rtl.h): Include for str_pattern_slim(). (dump_reg_pending_data): New function. (sched_analyze_insn): Use it. --- gcc/sched-deps.cc | 90 ++- 1 file changed, 89 insertions(+), 1 deletion(-) diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc index e0d3c97d935..f9290c82fd2 100644 --- a/gcc/sched-deps.cc +++ b/gcc/sched-deps.cc @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. If not see #include "sched-int.h" #include "cselib.h" #include "function-abi.h" +#include "print-rtl.h" #ifdef INSN_SCHEDULING @@ -432,10 +433,24 @@ dep_spec_p (dep_t dep) return false; } +/* These regsets describe how a single instruction affects registers. + Their "life-time" is restricted to a single call of sched_analyze_insn(). + They are populated by sched_analyze_1() and sched_analyze_2(), and + then sched_analyze_insn() transfers data from these into deps->reg_last[i]. + Near the end sched_analyze_insn() clears these regsets for the next + insn. */ static regset reg_pending_sets; static regset reg_pending_clobbers; static regset reg_pending_uses; static regset reg_pending_control_uses; + +/* Similar to reg_pending_* regsets, this variable specifies whether + the current insn analyzed by sched_analyze_insn() is a scheduling + barrier that should "split" dependencies inside a block. Internally + sched-deps.cc does this by pretending that the barrier insn uses and + sets all registers. + Near the end sched_analyze_insn() transfers barrier info from this variable + into deps->last_reg_pending_barrier. */ static enum reg_pending_barrier_mode reg_pending_barrier; /* Hard registers implicitly clobbered or used (or may be implicitly @@ -2880,7 +2895,77 @@ get_implicit_reg_pending_clobbers (HARD_REG_SET *temp, rtx_insn *insn) *temp &= ~ira_no_alloc_regs; } -/* Analyze an INSN with pattern X to find all dependencies. */ +/* Dump state of reg_pending_* data for debug purposes. + Dump only non-empty data to reduce log clobber. */ +static void +dump_reg_pending_data (FILE *file, rtx_insn *insn) +{ + fprintf (file, "\n"); + fprintf (file, ";; sched_analysis after insn %d: %s\n", + INSN_UID (insn), str_pattern_slim (PATTERN (insn))); + + if (!REG_SET_EMPTY_P (reg_pending_sets) + || !REG_SET_EMPTY_P (reg_pending_clobbers) + || !REG_SET_EMPTY_P (reg_pending_uses) + || !REG_SET_EMPTY_P (reg_pending_control_uses)) +{ + fprintf (file, ";; insn reg"); + if (!REG_SET_EMPTY_P (reg_pending_sets)) + { + fprintf (file, " sets("); + dump_regset (reg_pending_sets, file); + fprintf (file, ")"); + } + if (!REG_SET_EMPTY_P (reg_pending_clobbers)) + { + fprintf (file, " clobbers("); + dump_regset (reg_pending_clobbers, file); + fprintf (file, ")"); + } + if (!REG_SET_EMPTY_P (reg_pending_uses)) + { + fprintf (file, " uses("); + dump_regset (reg_pending_uses, file); + fprintf (file, ")"); + } + if (!REG_SET_EMPTY_P (reg_pending_control_uses)) + { + fprintf (file, " control("); + dump_regset (reg_pending_control_uses, file); + fprintf (file, ")"); + } + fprintf (file, "\n"); +} + + if (reg_pending_barrier) +fprintf (file, ";; insn reg barrier: %d\n", reg_pending_barrier); + + if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers) + || !hard_reg_set_empty_p (implicit_reg_pending_uses)) +{ + fprintf (file, ";; insn reg"); + if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers)) + { + print_hard_reg_set (file, implicit_reg_pending_clobbers, + " implicit clobbers(", false); + fprintf (file, ")"); + } + if (!hard_reg_set_empty_p (implicit_reg_pending_uses)) + { + print_hard_reg_set (file, implicit_reg_pending_uses, + " implicit uses(", false); + fprintf (file, ")"); + } + fprintf (file, "\n"); +} +} + +/* Analyze an INSN with pattern X to find all dependencies. + This analysis uses two main data structures: + 1. reg_pending_* data contains effects of INSN on the register file, + which is then incorporated into ... + 2. deps_desc object, which contains commulative information about all + instructions processed from deps_desc object's initialization. */ static void sched_analyze_insn (class deps_desc *deps, rtx x, rtx_insn *insn) { @@ -3328,6 +3413,9 @@ sched_analyze_insn (class d
Re: [PATCH 01/12] [contrib] validate_failures.py: Avoid testsuite aliasing
Hi Bernhard, Thanks, I meant to fix this, but forgot. The underlying problem here is that we want to detect which sub-testsuites had failures. Current regex doesn't match go's case because there is no "..." at the end: "Running foo" vs "Running foo ..." . My preferred way of fixing this is to make go's testsuite print out "..." . We have a similar patch for glibc [1]. [1] https://sourceware.org/pipermail/libc-alpha/2023-June/148702.html -- Maxim Kuvyrkov https://www.linaro.org > On Sep 26, 2023, at 19:46, Bernhard Reutner-Fischer > wrote: > > Hi Maxim! > > On Mon, 5 Jun 2023 18:06:25 +0400 > Maxim Kuvyrkov via Gcc-patches wrote: > >>> On Jun 3, 2023, at 19:17, Jeff Law wrote: >>> >>> On 6/2/23 09:20, Maxim Kuvyrkov via Gcc-patches wrote: >>>> This patch adds tracking of current testsuite "tool" and "exp" >>>> to the processing of .sum files. This avoids aliasing between >>>> tests from different testsuites with same name+description. >>>> E.g., this is necessary for testsuite/c-c++-common, which is ran >>>> for both gcc and g++ "tools". >>>> This patch changes manifest format from ... >>>> >>>> FAIL: gcc_test >>>> FAIL: g++_test >>>> >>>> ... to ... >>>> >>>> === gcc tests === >>>> Running gcc/foo.exp ... >>>> FAIL: gcc_test >>>> === gcc Summary == >>>> === g++ tests === >>>> Running g++/bar.exp ... >>>> FAIL: g++_test >>>> === g++ Summary == >>>> . >>>> The new format uses same formatting as DejaGnu's .sum files >>>> to specify which "tool" and "exp" the test belongs to. >>> I think the series is fine. You're not likely to hear from Diego or Doug I >>> suspect, I don't think either are involved in GNU stuff anymore. >>> >> >> Thanks, Jeff. I'll wait for a couple of days and will merge if there are no >> new comments. > > Maxim, may i ask you to have a look at the following problem, please? > > ISTM that your exp code does not work as expected for go, maybe you > forgot to test the changes with go enabled? > > Ever since your changes in summer i see the following: > > gcc-14.mine$ > /scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py > --clean_build ../gcc-14.orig/ > Getting actual results from build directory . > ./gcc/testsuite/go/go.sum > ./gcc/testsuite/gcc/gcc.sum > ./gcc/testsuite/objc/objc.sum > ./gcc/testsuite/jit/jit.sum > ./gcc/testsuite/gdc/gdc.sum > ./gcc/testsuite/gnat/gnat.sum > ./gcc/testsuite/ada/acats/acats.sum > ./gcc/testsuite/g++/g++.sum > ./gcc/testsuite/obj-c++/obj-c++.sum > ./gcc/testsuite/rust/rust.sum > ./gcc/testsuite/gfortran/gfortran.sum > ./x86_64-pc-linux-gnu/libgomp/testsuite/libgomp.sum > ./x86_64-pc-linux-gnu/libphobos/testsuite/libphobos.sum > ./x86_64-pc-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum > ./x86_64-pc-linux-gnu/libffi/testsuite/libffi.sum > ./x86_64-pc-linux-gnu/libitm/testsuite/libitm.sum > ./x86_64-pc-linux-gnu/libgo/libgo.sum > ./x86_64-pc-linux-gnu/libatomic/testsuite/libatomic.sum > ./gotools/gotools.sum > .sum file seems to be broken: tool="gotools", exp="None", summary_line="FAIL: > TestScript" > Traceback (most recent call last): > File > "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", > line 732, in >retval = Main(sys.argv) > File > "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", > line 721, in Main >retval = CompareBuilds() > File > "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", > line 622, in CompareBuilds >actual = GetResults(sum_files) > File > "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", > line 466, in GetResults >build_results.update(ParseSummary(sum_fname)) > File > "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", > line 405, in ParseSummary >result = result_set.MakeTestResult(line, ordinal) > File > "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", > line 239, in MakeTestResult >return TestResult(summary_line, ordinal, > File > "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py", > line 151, in __init__ >raise > RuntimeError: No active exception to reraise > > > The problem seems to be that gotools.sum does not mention any ".exp" > files. > > $ grep "Running " gotools/gotools.sum > Running cmd/go > Running runtime > Running cgo > Running carchive > Running cmd/vet > Running embed > $ grep -c "\.exp" gotools/gotools.sum > 0 > > The .sum files looks like this: > ---8<--- > Test Run By foo on Tue Sep 26 14:46:48 CEST 2023 > Native configuration is x86_64-foo-linux-gnu > >=== gotools tests === > > Running cmd/go > UNTESTED: TestAccidentalGitCheckout > PASS: TestAlwaysLinkSysoFiles > ... > UNTESTED: TestParallelTest > FAIL: TestScript > ... > ---8<--- > > May i ask you to have a look, please? > > TIA,
Re: [PATCH, libgcc] Fix licenses on several files
On 29/07/2013, at 10:03 AM, Maxim Kuvyrkov wrote: > While verifying license compliance for GCC and its libraries I noticed that > several libgcc files that end up in the final library are licensed under > GPL-3.0+ instead of GPL-3.0-with-GCC-exception. > > This is, obviously, was not the intention of developers who just copied wrong > boilerplate text, and this patch fixes the oversights. I wrote up a blog post about GCC licenses and ideas on their audit at http://www.kugelworks.com/blog/gcc-license-audit/ . The blog post mentions two improvements in handling licenses in GCC and I will follow up on them in separate threads. -- Maxim Kuvyrkov www.kugelworks.com
Re: [PATCH] Enable non-complex math builtins from C99 for Bionic
RGET_LIBC_HAS_FUNCTION no_c99_libc_has_function > diff --git a/gcc/config/moxie/uclinux.h b/gcc/config/moxie/uclinux.h > index 498037e..85c65f2 100644 > --- a/gcc/config/moxie/uclinux.h > +++ b/gcc/config/moxie/uclinux.h > @@ -37,3 +37,6 @@ see the files COPYING3 and COPYING.RUNTIME > respectively. If not, see > --wrap=mmap --wrap=munmap --wrap=alloca\ > %{fmudflapth: --wrap=pthread_create\ > }} %{fmudflap|fmudflapth: --wrap=main}" > + > +#undef TARGET_LIBC_HAS_FUNCTION > +#define TARGET_LIBC_HAS_FUNCTION no_c99_libc_has_function > > Thanks, -- Maxim Kuvyrkov www.kugelworks.com
Re: [bootstrap] Fix build for several targets (including pr58242)
On 3/09/2013, at 5:53 PM, Jakub Jelinek wrote: > On Tue, Sep 03, 2013 at 09:25:31AM +0400, Alexander Ivchenko wrote: >> Several builds are broken after r201838. > > What targets actually support bionic? If it is just arm, i?86/x86_64 > and perhaps aarch64 and nothing else, I'd like to question the way where > you enforce all the > # Add Android userspace support to Linux targets. > case $target in >*linux*) > tm_p_file="${tm_p_file} linux-protos.h" > tmake_file="${tmake_file} t-linux-android" > tm_file="$tm_file linux-android.h" > extra_options="$extra_options linux-android.opt" > extra_objs="$extra_objs linux-android.o" > ;; > esac > stuff onto all Linux targets, without actually testing all of them. > Wouldn't it be much better not to support Android in any way > in config/linux.h (say at most #define TARGET_HAS_BIONIC 0 there, > #ifdef HAVE_GNU_INDIRECT_FUNCTION > #define TARGET_LIBC_HAS_FUNCTION hook_bool*true > #endif > etc.), and append the android stuff only to targets which you > support, not before the target specific snippets in config.gcc, > but after those (so linux-android.h, etc. come last, not first)? Jakub, Nominal handling of Bionic by all *linux* targets is my fault, and it seemed like a good thing at the time. With several iterations of improvements laid on top of this initial decision it now becoming a mess. I'm thinking along similar lines as you to restrict Bionic/Android handling only to targets that actually support it. As Alexander said in his email, this rework should be done as a separate patch. Thanks, -- Maxim Kuvyrkov www.kugelworks.com
Re: [PATCH] Enable non-complex math builtins from C99 for Bionic
On 4/09/2013, at 7:43 PM, Alexander Ivchenko wrote: > Hi Maxim, > > 2013/9/4 Maxim Kuvyrkov : >> On 23/08/2013, at 1:04 AM, Alexander Ivchenko wrote: >> >>> Ugh.. thanks, you are right. That points to another problem that I >>> didn't see before: >>> >>> 3) *linux* targets that do not append to tm_p_file (s390x-*-linux* and >>> s390x-ibm-tpf* - your patch addresses that problem correctly) OR >>> tmake_file (bfin*-linux-uclibc* or crisv32-*-linux* | cris-*-linux*) >> >> Could you be more verbose, please? What some of the *linux* target do not >> append to tm_p_file? > > *linux* targets that do not append to tm_p_file are 390x-*-linux* and > s390x-ibm-tpf* and Andrew is already addressed this problem. OK. > > >> It seems that there are at least two separate problems: >> >> 1. OPTION_BIONIC is not defined in linux-android.c . I think the right fix >> here is to copy definitions of OPTION_BIONIC (and, optionally, >> OPTION_UCLIBC) from gcc/config/linux.h to rs6000/linux.h, rs6000/linux64.h >> and alpha/linux.h -- in other words, define OPTION_BIONIC and OPTION_UCLIBC >> whenever OPTION_GLIBC is defined. > > Why can't we just use the checks like "if (linux_libc == LIBC_GLIBC)". > Seems that we have all infrastructure in place (linux_libc, > LIBC_GLIBC, LIBC_BIONIC, LIBC_UCLIBC are defined for all *linux* > targets). I don't see the reason to make new defines. Same reason why the existing defines are in place. OPTION_X is not always the same as (linux_libc == LIBC_X). > >> 2. The second problem is to do with definitions of TARGET_LIBC_HAS_FUNCTION >> for bfin, c6x, lm32, m68k and moxie. What is the failure scenario here? > > "For them we have ar: linux-android.o: No such file or directory" So I > added t-linux-android rules to tmake_file. > > (and for bfin additionally we have: > "../../gcc/gcc/config/bfin/bfin.c:5812:29: error: > ‘linux_android_libc_has_function’ was not declared in this scope". > because > the tm_p.h is not included in bfin.c) OK. > >> >>> diff --git a/gcc/config.gcc b/gcc/config.gcc >>> index 7e1d529..89cf30a 100644 >>> --- a/gcc/config.gcc >>> +++ b/gcc/config.gcc >>> @@ -1018,7 +1018,7 @@ bfin*-uclinux*) >>> ;; >>> bfin*-linux-uclibc*) >>> tm_file="${tm_file} dbxelf.h elfos.h bfin/elf.h gnu-user.h linux.h >>> glibc-stdint.h bfin/linux.h ./linux-sysroot-suffix.h" >>> - tmake_file="bfin/t-bfin-linux t-slibgcc" >>> + tmake_file="bfin/t-bfin-linux t-slibgcc t-linux-android" >>> use_collect2=no >>> ;; >>> bfin*-rtems*) >> >> Why? Bfin has nothing to do with android. > > Since ‘linux_android_libc_has_function’ handles all three libc options > for all linux targets, that's natural to use this function. May be it > should be called "linux_libc_has_function" and be located in linux.c > (we don't have this file..). OK. Ideally we would have linux.c with linux_libc_has_function that knows nothing about Bionic and linux-android.c with linux_android_libc_has_function. The patch is OK with definitions of OPTION_GLIBC, OPTION_UCLIBC and OPTION_BIONIC copied verbatim from gcc/config/linux.h to rs6000 and alpha versions. It is then on my plate to refactor handling of Bionic libc and remove it from targets that don't support it. Thanks, -- Maxim Kuvyrkov www.kugelworks.com
Cleanup Linux libc selection and Android support
Following recent breakage caused by adding nominal Android support to all *linux* targets [*] this patch series cleans up libc selection for Linux targets (-mglibc/-muclibc/-mbionic), splits libc selection logic from Android support, and removes Android handling from targets that don't support it. [*] http://thread.gmane.org/gmane.comp.gcc.patches/277430/focus=292362 Special thanks goes to Alexander who tested and reviewed initial versions of these patches and fixed several problems. The patch series was tested on various Linux and uClinux targets including arm, bfin, c6x, m68k, mips, powerpc, x86, x86_64. Patches will be posted in their separate threads, and below is a summary. Individually the patches are all borderline trivial. Reviews and approvals are welcome! - 0001-Rename-files-for-libc-selection-on-Linux-targets Mechanical rename of files in preparation for splitting Android handling from libc selection. - 0002-Rename-functions-relating-to-libc-support-on-Linux-t Mechanical rename of functions. - 0003-Robustify-check-for-IFUNC-support Trivial fix. - 0004-Cleanup-definitions-of-libc-related-target-hooks Consolidate definitions of libc target hooks in linux.h - 0005-Cleanup-libc-selection-and-Android-support Split Android handling from libc selection and remove Android handling from targets that don't support it. Thanks, -- Maxim Kuvyrkov www.kugelworks.com
[PATCH 1/5] Rename files for libc selection on Linux targets
Mechanical rename of files in preparation for splitting Android handling from libc selection. The patch series was tested on various Linux and uClinux targets including arm, bfin, c6x, m68k, mips, powerpc, x86, x86_64. OK to apply? -- Maxim Kuvyrkov www.kugelworks.com 0001-Rename-files-for-libc-selection-on-Linux-targets.patch Description: Binary data
[PATCH 2/5] Rename functions relating to libc support on Linux targets
Mechanical rename of functions. The patch series was tested on various Linux and uClinux targets including arm, bfin, c6x, m68k, mips, powerpc, x86, x86_64. OK to apply? -- Maxim Kuvyrkov www.kugelworks.com 0002-Rename-functions-relating-to-libc-support-on-Linux-t.patch Description: Binary data
[PATCH 3/5] Robustify check for IFUNC support
This patch changes check in linux_has_ifunc_p() from TARGET_ANDROID to OPTION_BIONIC. These two predicates are, currently, the same, but it is better to check for a specific C library rather than operational environment the library is commonly associated with. The patch series was tested on various Linux and uClinux targets including arm, bfin, c6x, m68k, mips, powerpc, x86, x86_64. OK to apply? Thanks, -- Maxim Kuvyrkov www.kugelworks.com 0003-Robustify-check-for-IFUNC-support.patch Description: Binary data
[PATCH 4/5] Cleanup definitions of libc-related target hooks
This patch consolidates definitions of TARGET_LIBC_HAS_FUNCTION and TARGET_HAS_IFUNC_P hooks in linux.h from linux-android.h and uclinux.h files of various architectures. One caveat to consolidating definitions in linux.h is that both *-linux-* and *-uclinux-* targets include config/linux.h, but *-uclinux-* targets then skip linux.opt, linux-protos.h and linux.c, which declare and define functions implementing target hooks. Since we don't really want to add linux.opt and company to *-uclinux-* targets we check for "#if *-uclinux-*" in linux.h and define TARGET_LIBC_HAS_FUNCTION and TARGET_HAS_IFUNC_P separately for *-linux-* and for *-uclinux-*. The patch series was tested on various Linux and uClinux targets including arm, bfin, c6x, m68k, mips, powerpc, x86, x86_64. OK to apply? Thanks, -- Maxim Kuvyrkov www.kugelworks.com 0004-Cleanup-definitions-of-libc-related-target-hooks.patch Description: Binary data
[PATCH 5/5] Cleanup libc selection and Android support
This patch splits Android handling from libc selection and removes Android handling from targets that don't support it. The patch series was tested on various Linux and uClinux targets including arm, bfin, c6x, m68k, mips, powerpc, x86, x86_64. OK to apply? Thanks, -- Maxim Kuvyrkov www.kugelworks.com 0005-Cleanup-libc-selection-and-Android-support.patch Description: Binary data
[PATCH] Add testcase for PR61033
This patch adds a testcase for PR61033 [1]. The bug is about compiler going into infinite loop while solving data-flow in vt_find_locations on arm-* targets. The issue in fixed in GCC 5 and later by Richard B.'s r211624 [2]. The bug affects GCC 4.8 and 4.9, but is unlikely to be fixed there. This patch adds a regression test to make sure we catch the problem should it re-appear again. Tested on arm-linux-gnueabihf and x86_64-linux-gnu. OK to apply? [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61033 [2] http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=211625 2014-06-13 Richard Biener 1bf2ea41.diff Description: Binary data
[PATCH,testsuite] Print markers to stderr to avoid races with sanitizer output
This patch improves sanitizer testsuite to avoid sporadic failures, especially when [cross-]testing on a remote machine. There are several unstable sanitizer tests in GCC testsuite (e.g., g++.dg/tsan/aligned_vs_unaligned_race.C), and the instability comes from the test doing this: printf("Pass\n"); /* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */ /* { dg-output "Pass.*" } */ . Thread sanitizer prints its data to stderr, and then printf prints to stdout. The testcase expects stderr to be flushed before stdout, and "Pass" to be at the end of the output. AFAIK, that works fine in native/local environment, but fails in remote testing from time to time. The failure case happens when stdout output is printed before stderr output. This is, likely, because there is no strict ordering of stdout and stderr when remote testing. This patch fixes the affected tests in GCC testsuite, and I'll be also posting a similar patch for LLVM testsuite. OK for trunk? -- Maxim Kuvyrkov www.linaro.org 8cffd698.diff Description: Binary data
Re: [PATCH,testsuite] Print markers to stderr to avoid races with sanitizer output
> On Mar 1, 2016, at 12:31 PM, Maxim Kuvyrkov wrote: > > This patch improves sanitizer testsuite to avoid sporadic failures, > especially when [cross-]testing on a remote machine. > > There are several unstable sanitizer tests in GCC testsuite (e.g., > g++.dg/tsan/aligned_vs_unaligned_race.C), and the instability comes from the > test doing this: > > printf("Pass\n"); > /* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */ > /* { dg-output "Pass.*" } */ > > . Thread sanitizer prints its data to stderr, and then printf prints to > stdout. The testcase expects stderr to be flushed before stdout, and "Pass" > to be at the end of the output. AFAIK, that works fine in native/local > environment, but fails in remote testing from time to time. The failure case > happens when stdout output is printed before stderr output. This is, likely, > because there is no strict ordering of stdout and stderr when remote testing. > > This patch fixes the affected tests in GCC testsuite, and I'll be also > posting a similar patch for LLVM testsuite. FYI, this was posted to LLVM (and approved) here: http://reviews.llvm.org/D17757 . > > OK for trunk? -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] Fix detection of setrlimit in libstdc++ testsuite
PING ^ 2. The patch is sitting without comments for 4+ months. -- Maxim Kuvyrkov www.linaro.org > On Dec 10, 2015, at 4:47 PM, Maxim Kuvyrkov wrote: > >> On Nov 11, 2015, at 7:56 PM, Maxim Kuvyrkov >> wrote: >> >> Hi, >> >> This patch fixes an obscure cross-testing problem that crashed (OOMed) our >> boards at Linaro. Several tests in libstdc++ (e.g., [1]) limit themselves >> to some reasonable amount of RAM and then try to allocate 32 gigs. >> Unfortunately, the configure test that checks presence of setrlimit is >> rather strange: if target is native, then try compile file with call to >> setrlimit -- if compilation succeeds, then use setrlimit, otherwise, ignore >> setrlimit. The strange part is that the compilation check is done only for >> native targets, as if cross-toolchains can't generate working executables. >> [This is rather odd, and I might be missing some underlaying caveat.] >> >> Therefore, when testing a cross toolchain, the test [1] still tries to >> allocate 32GB of RAM with no setrlimit restrictions. On most targets that >> people use for cross-testing this is not an issue because either >> - the target is 32-bit, so there is no 32GB user-space to speak of, or >> - the target board has small amount of RAM and no swap, so allocation >> immediately fails, or >> - the target board has plenty of RAM, so allocating 32GB is not an issue. >> >> However, if one is testing on a 64-bit board with 16GB or RAM and 16GB of >> swap, then one gets into an obscure near-OOM swapping condition. This is >> exactly the case with cross-testing aarch64-linux-gnu toolchains on APM >> Mustang. >> >> The attached patch removes "native" restriction from configure test for >> setrlimit. This enables setrlimit restrictions on the testsuite, and the >> test [1] expectedly fails to allocate 32GB due to setrlimit restriction. >> >> I have tested it on x86_64-linux-gnu and i686-linux-gnu native toolchains, >> and aarch64-linux-gnu and arm-linux-gnueabi[hf] cross-toolchains with no >> regressions [*]. >> >> OK to commit? >> >> I didn't go as far as enabling setenv/locale tests when cross-testing >> libstdc++ because I remember of issues with generating locales in >> cross-built glibc. In any case, locale tests are unlikely to OOM the test >> board the way that absence of setrlimit does. >> >> [1] 27_io/ios_base/storage/2.cc >> >> [*] Cross-testing using user-mode QEMU made 27_io/fpos/14775.cc execution >> test to FAIL. This test uses setrlimit set max file size, and is >> misbehaving only under QEMU. I believe this a QEMU issue with not handling >> setrlimit correctly. >> > > Ping. > > -- > Maxim Kuvyrkov > www.linaro.org 0001-Use-setrlimit-for-testing-libstdc-in-cross-toolchain.patch Description: Binary data
Re: Cleanup Linux libc selection and Android support
On 6/11/2013, at 1:44 pm, Maxim Kuvyrkov wrote: > On 19/09/2013, at 8:26 am, Maxim Kuvyrkov wrote: > >> Following recent breakage caused by adding nominal Android support to all >> *linux* targets [*] this patch series cleans up libc selection for Linux >> targets (-mglibc/-muclibc/-mbionic), splits libc selection logic from >> Android support, and removes Android handling from targets that don't >> support it. > > Ping. Anyone wants to review these cleanup patches to config.gcc, config/t-* > and config/*.h files, or should I just start committing them quietly :-P ? This patch series was approved by Jeff Law, and I've committed them after retesting on current mainline. Thank you, -- Maxim Kuvyrkov www.kugelworks.com
[PATCH] Fix mips64-linux and s390x-linux builds
On 9/12/2013, at 8:21 am, Maxim Kuvyrkov wrote: > On 9/12/2013, at 3:24 am, Jan-Benedict Glaw wrote: > >> Hi Maxim! >> >> One of your recent libc<->android clean-up patches broke the >> mips64-linux target as a side-effect, see eg. >> http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=53806: >> >> g++ -c -DIN_GCC_FRONTEND -DIN_GCC_FRONTEND -g -O2 -DIN_GCC >> -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti >> -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings >> -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long >> -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H >> -I. -Ic-family -I/home/jbglaw/repos/gcc/gcc >> -I/home/jbglaw/repos/gcc/gcc/c-family >> -I/home/jbglaw/repos/gcc/gcc/../include >> -I/home/jbglaw/repos/gcc/gcc/../libcpp/include >> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber >> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber >> -I/home/jbglaw/repos/gcc/gcc/../libbacktrace-o c-family/c-cppbuiltin.o >> -MT c-family/c-cppbuiltin.o -MMD -MP -MF c-family/.deps/c-cppbuiltin.TPo >> /home/jbglaw/repos/gcc/gcc/c-family/c-cppbuiltin.c >> /home/jbglaw/repos/gcc/gcc/c-family/c-cppbuiltin.c: In function ‘void >> c_cpp_builtins(cpp_reader*)’: >> /home/jbglaw/repos/gcc/gcc/c-family/c-cppbuiltin.c:1014:370: error: >> ‘ANDROID_TARGET_OS_CPP_BUILTINS’ was not declared in this scope >> make[1]: *** [c-family/c-cppbuiltin.o] Error 1 > > I'm looking into this. My recent patches to cleanup support for Android/Bionic for *-linux-* targets broke mips64-linux and s390x-linux builds. Unfortunately, these targets fell out from the test coverage of these cleanups. The problems are in missing declarations, and are trivial to fix by adding necessary files in config.gcc. The fix for mips64-linux (and, also, mips*-mti-linux*) is to add linux-android.h and linux-android.opt to the target header/option files -- thus enabling android handling for all mips*-linux* toolchains. The fix for s390x-linux is to add linux-protos.h to target headers. No android handling is added to s390x-linux. Richard, Andreas, are you OK with changes to your respective ports? Thank you, -- Maxim Kuvyrkov www.kugelworks.com 0001-config.gcc-mips-mti-linux-mips64-linux.patch Description: Binary data
Re: [PATCH] Fix mips64-linux and s390x-linux builds
On 10/12/2013, at 1:07 am, Maxim Kuvyrkov wrote: > My recent patches to cleanup support for Android/Bionic for *-linux-* targets > broke mips64-linux and s390x-linux builds. Unfortunately, these targets fell > out from the test coverage of these cleanups. > > The problems are in missing declarations, and are trivial to fix by adding > necessary files in config.gcc. > > The fix for mips64-linux (and, also, mips*-mti-linux*) is to add > linux-android.h and linux-android.opt to the target header/option files -- > thus enabling android handling for all mips*-linux* toolchains. > > The fix for s390x-linux is to add linux-protos.h to target headers. No > android handling is added to s390x-linux. > > Richard, Andreas, are you OK with changes to your respective ports? Given that the one-line fix for s390x-linux is trivial (just add file with function prototypes to the list of headers), and that Richard approved mips*-linux part of the patch -- I'm going to commit the patch tomorrow unless I hear something to the contrary. I don't want to keep mips and s390x builds broken for too much longer. Thank you, -- Maxim Kuvyrkov www.kugelworks.com
Re: [testsuite] Fix potential race conditions in gfortran tests
Hi, And this is a similar patch for gfortran.dg/streamio_N.f90 tests. OK to commit? -- Maxim Kuvyrkov www.linaro.org > On Oct 18, 2015, at 4:17 PM, Mikael Morin wrote: > > Le 16/10/2015 10:08, Christophe Lyon a écrit : >> Hi, >> >> We have noticed a few random failures in gfortran tests in our validations. >> >> Maxim investigated some of them and noticed a possible race condition >> in the streamio tests, for which he'll post a patch. >> >> I looked for other similar cases (checking which files are unlinked >> several times during 'make check'), and noticed that a few other cases >> might read/write/delete the same file concurrently. >> >> The proposed fix is to use different names for the different testcases. >> >> I ran make check -jX on various ARM/AArch64 configuration with no regression. >> >> OK? >> > Looks good to me. Thanks. > > Mikael 0001-Fix-race-on-temp-file-in-gfortran-streamio_-.f90-tes.ChangeLog Description: Binary data 0001-Fix-race-on-temp-file-in-gfortran-streamio_-.f90-tes.patch Description: Binary data
[PATCH] Fix detection of setrlimit in libstdc++ testsuite
Hi, This patch fixes an obscure cross-testing problem that crashed (OOMed) our boards at Linaro. Several tests in libstdc++ (e.g., [1]) limit themselves to some reasonable amount of RAM and then try to allocate 32 gigs. Unfortunately, the configure test that checks presence of setrlimit is rather strange: if target is native, then try compile file with call to setrlimit -- if compilation succeeds, then use setrlimit, otherwise, ignore setrlimit. The strange part is that the compilation check is done only for native targets, as if cross-toolchains can't generate working executables. [This is rather odd, and I might be missing some underlaying caveat.] Therefore, when testing a cross toolchain, the test [1] still tries to allocate 32GB of RAM with no setrlimit restrictions. On most targets that people use for cross-testing this is not an issue because either - the target is 32-bit, so there is no 32GB user-space to speak of, or - the target board has small amount of RAM and no swap, so allocation immediately fails, or - the target board has plenty of RAM, so allocating 32GB is not an issue. However, if one is testing on a 64-bit board with 16GB or RAM and 16GB of swap, then one gets into an obscure near-OOM swapping condition. This is exactly the case with cross-testing aarch64-linux-gnu toolchains on APM Mustang. The attached patch removes "native" restriction from configure test for setrlimit. This enables setrlimit restrictions on the testsuite, and the test [1] expectedly fails to allocate 32GB due to setrlimit restriction. I have tested it on x86_64-linux-gnu and i686-linux-gnu native toolchains, and aarch64-linux-gnu and arm-linux-gnueabi[hf] cross-toolchains with no regressions [*]. OK to commit? I didn't go as far as enabling setenv/locale tests when cross-testing libstdc++ because I remember of issues with generating locales in cross-built glibc. In any case, locale tests are unlikely to OOM the test board the way that absence of setrlimit does. [1] 27_io/ios_base/storage/2.cc [*] Cross-testing using user-mode QEMU made 27_io/fpos/14775.cc execution test to FAIL. This test uses setrlimit set max file size, and is misbehaving only under QEMU. I believe this a QEMU issue with not handling setrlimit correctly. -- Maxim Kuvyrkov www.linaro.org 0001-Use-setrlimit-for-testing-libstdc-in-cross-toolchain.patch Description: Binary data
Re: [PATCH] Fix detection of setrlimit in libstdc++ testsuite
> On Nov 11, 2015, at 7:56 PM, Maxim Kuvyrkov wrote: > > Hi, > > This patch fixes an obscure cross-testing problem that crashed (OOMed) our > boards at Linaro. Several tests in libstdc++ (e.g., [1]) limit themselves to > some reasonable amount of RAM and then try to allocate 32 gigs. > Unfortunately, the configure test that checks presence of setrlimit is rather > strange: if target is native, then try compile file with call to setrlimit -- > if compilation succeeds, then use setrlimit, otherwise, ignore setrlimit. > The strange part is that the compilation check is done only for native > targets, as if cross-toolchains can't generate working executables. [This is > rather odd, and I might be missing some underlaying caveat.] > > Therefore, when testing a cross toolchain, the test [1] still tries to > allocate 32GB of RAM with no setrlimit restrictions. On most targets that > people use for cross-testing this is not an issue because either > - the target is 32-bit, so there is no 32GB user-space to speak of, or > - the target board has small amount of RAM and no swap, so allocation > immediately fails, or > - the target board has plenty of RAM, so allocating 32GB is not an issue. > > However, if one is testing on a 64-bit board with 16GB or RAM and 16GB of > swap, then one gets into an obscure near-OOM swapping condition. This is > exactly the case with cross-testing aarch64-linux-gnu toolchains on APM > Mustang. > > The attached patch removes "native" restriction from configure test for > setrlimit. This enables setrlimit restrictions on the testsuite, and the > test [1] expectedly fails to allocate 32GB due to setrlimit restriction. > > I have tested it on x86_64-linux-gnu and i686-linux-gnu native toolchains, > and aarch64-linux-gnu and arm-linux-gnueabi[hf] cross-toolchains with no > regressions [*]. > > OK to commit? > > I didn't go as far as enabling setenv/locale tests when cross-testing > libstdc++ because I remember of issues with generating locales in cross-built > glibc. In any case, locale tests are unlikely to OOM the test board the way > that absence of setrlimit does. > > [1] 27_io/ios_base/storage/2.cc > > [*] Cross-testing using user-mode QEMU made 27_io/fpos/14775.cc execution > test to FAIL. This test uses setrlimit set max file size, and is misbehaving > only under QEMU. I believe this a QEMU issue with not handling setrlimit > correctly. > Ping. -- Maxim Kuvyrkov www.linaro.org
Re: One more patch for PR91333
Hi Vladimir, This patch increases code size at -Os on arm-linux-gnueabihf by 1% (with no code-size reductions) on several SPEC CPU2006 benchmarks: 400.perlbench,perlbench_base.default ,580842,583842 429.mcf,mcf_base.default ,7867,7955 403.gcc,gcc_base.default ,1726449,1736149 433.milc,milc_base.default ,66328,66816 456.hmmer,hmmer_base.default ,148394,149434 482.sphinx3,sphinx_livepretend_base.default ,99183,99863 Could you look into whether these regressions can be avoided? Thanks, -- Maxim Kuvyrkov https://www.linaro.org > On Feb 2, 2020, at 7:46 PM, Vladimir Makarov wrote: > > The previous patch for > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91333 > > resulted in aarch64 testsuite failures. > > The following patch solves some of the failures and modified the PR test as > the generated code changed. > > The patch was successfully bootstrapped on x86-64 and benchmarked on SPEC2000. > > >
Re: One more patch for PR91333
> On Feb 18, 2020, at 6:30 PM, Vladimir Makarov wrote: > > On 2/17/20 10:08 AM, Maxim Kuvyrkov wrote: >> Hi Vladimir, >> >> This patch increases code size at -Os on arm-linux-gnueabihf by 1% (with no >> code-size reductions) on several SPEC CPU2006 benchmarks: >> >> 400.perlbench,perlbench_base.default ,580842,583842 >> 429.mcf,mcf_base.default ,7867,7955 >> 403.gcc,gcc_base.default ,1726449,1736149 >> 433.milc,milc_base.default ,66328,66816 >> 456.hmmer,hmmer_base.default ,148394,149434 >> 482.sphinx3,sphinx_livepretend_base.default ,99183,99863 >> >> Could you look into whether these regressions can be avoided? >> > Sure, Maxim. I'll look into it. If I can not solve the problem, I probably > revert the patch. Thanks, Vladimir. Forgot to mention that this is for Thumb2 (--target=arm-linux-gnueabihf --with-mode=thumb). Regards, -- Maxim Kuvyrkov https://www.linaro.org
Re: [PATCH 4/6] Port prefetch configuration from aarch32 to aarch64
> On Jun 8, 2017, at 6:13 PM, Richard Earnshaw (lists) > wrote: > > On 08/06/17 14:47, James Greenhalgh wrote: >> On Mon, Jan 30, 2017 at 08:35:00AM -0800, Andrew Pinski wrote: >>> On Mon, Jan 30, 2017 at 3:48 AM, Maxim Kuvyrkov >>> wrote: >>>> This patch port prefetch configuration from aarch32 backend to aarch64. >>>> There is no code-generation change from this patch. >>>> >>>> This patch also happens to address Kyrill's comment on Andrew's >>>> prefetching patch at >>>> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02133.html . >>>> >>>> This patch also fixes a minor bug in aarch64_override_options_internal(), >>>> which used "selected_cpu->tune" instead of "aarch64_tune_params". >>> >>> I am not a fan of the macro at all. >> >> I'm with Andrew for this. The precedent in the AArch64 port is for >> explicitly spelling this out, as we do with the branch costs, approx_modes, >> vector costs etc. >> >> I'd rather we went that route than the macro you're using. I don't have >> any objections to the rest of your patch. >> >> Thanks, >> James >> > > I disagree with having to write all these things out, but I do agree > that we should be self-consistent within the port. I'm re-writing the patch with approach that James suggested, and instead of AARCH64_PREFETCH_NOT_BENEFICIAL there will be "&generic_prefetch_tune", so not much copy-paste of parameters. > > The purpose of introducing the macros in the ARM port was to avoid the > common problem of merging adding new parameters with adding new ports. > C initializers for these tables assign values sequentially with > zero-padding at the end, so failing to merge such changes carefully > leads to real bugs in the compiler (even if just performance bugs). > > I notice that the last entry in the current tune params table is an int, > rather than something that is type-checked (like the penultimate entry - > an enum). Having the final entry be type checked at least ensures that > the right number of elements exist, even if the order is not strictly > checked. If you prefer, I can move "&generic_prefetch_tune" to the bottom of "struct tune_params", thus enabling type checking on struct pointer. -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH 4/6] Port prefetch configuration from aarch32 to aarch64
> On Jun 9, 2017, at 1:04 PM, James Greenhalgh wrote: >>> ... >>> I disagree with having to write all these things out, but I do agree >>> that we should be self-consistent within the port. >> >> I'm re-writing the patch with approach that James suggested, and instead of >> AARCH64_PREFETCH_NOT_BENEFICIAL there will be "&generic_prefetch_tune", so >> not much copy-paste of parameters. >> >>> >>> The purpose of introducing the macros in the ARM port was to avoid the >>> common problem of merging adding new parameters with adding new ports. >>> C initializers for these tables assign values sequentially with >>> zero-padding at the end, so failing to merge such changes carefully >>> leads to real bugs in the compiler (even if just performance bugs). >>> >>> I notice that the last entry in the current tune params table is an int, >>> rather than something that is type-checked (like the penultimate entry - >>> an enum). Having the final entry be type checked at least ensures that >>> the right number of elements exist, even if the order is not strictly >>> checked. >> >> If you prefer, I can move "&generic_prefetch_tune" to the bottom of "struct >> tune_params", thus enabling type checking on struct pointer. > > For the zero-cost/small-benefit tradefoff it gives, I'd say "why not". > > With that change in place and the obvious rebase needed for patch 6/6 in > place, both this (4/6) and patch 6/6 are OK. Thanks, James. I'm going to commit this once bootstrap / regtest finish on aarch64-linux-gnu. -- Maxim Kuvyrkov www.linaro.org 0001-Port-prefetch-configuration-from-aarch32-to-aarch64-.patch Description: Binary data
Re: [PATCH 5/6][AArch64] Enable -fprefetch-loop-arrays at -O3 for cores that benefit from prefetching.
> On Jun 8, 2017, at 7:31 PM, James Greenhalgh wrote: > > On Fri, Feb 03, 2017 at 02:58:23PM +0300, Maxim Kuvyrkov wrote: >>> On Jan 30, 2017, at 5:50 PM, Maxim Kuvyrkov >>> wrote: >>> >>>> On Jan 30, 2017, at 3:23 PM, Kyrill Tkachov >>>> wrote: >>>> >>>> Hi Maxim, >>>> >>>> On 30/01/17 12:06, Maxim Kuvyrkov wrote: >>>>> This patch enables prefetching at -O3 for aarch64 cores that set >>>>> "simultaneous prefetches" parameter above 0. There are currently no such >>>>> settings, so this patch doesn't change default code generation. >>>>> >>>>> I'm now working on improvements to -fprefetch-loop-arrays pass to make it >>>>> suitable for -O2. I'll post this work in the next month. >>>>> >>>>> Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu. >>>>> >>>> >>>> Are you aiming to get this in for GCC 8? >>>> I have one small comment on this patch: >>>> >>>> + /* Enable sw prefetching at -O3 for CPUS that have prefetch, and we >>>> + have deemed it beneficial (signified by setting >>>> + prefetch.num_slots to 1 or more). */ >>>> + if (flag_prefetch_loop_arrays < 0 >>>> + && HAVE_prefetch >>>> >>>> HAVE_prefetch will always be true on aarch64. >>>> I imagine midend code that had logic like this would need this check, but >>>> aarch64-specific code shouldn't need it. >>> >>> Agree, I'll remove HAVE_prefetch. >>> >>> This pattern was copied from other backends, and HAVE_prefetch is most >>> likely a historical artifact. >> >> Andrew raised a good point in the review of his patch that it is a bad idea >> to use one of prefetching parameters (simultaneous_prefetches) as indicator >> for whether to enable prefetching pass by default. Indeed there are cases >> when we want to set simultaneous_prefetch according to HW documentation (or >> experimental results), but not enable prefetching pass by default. >> >> This update to the patch addresses it. The patch adds a new explicit field >> to prefetch tuning structure "default_opt_level" that sets optimization level >> from which prefetching should be enabled by default. The current value is to >> enable prefetching at -O3; additionally, this parameter will come handy for >> enabling prefetching at -O2 [when it is ready]. > > I really don't like the scheme of changing the optimisation threshold when > profiling data is used. > > I've seen too many reports and presentations by the uninitiated who believe > that the use of profiling data has made the difference, when in reality > it is just GCC changing behaviour on which passes run. It is very > misleading! OK, agree. That line came from i386 backend. I'll run benchmarks for enabling prefetching at -O2 at a later date, and, possibly, we will have a performance argument for reducing prefetch threshold when profile data is available. > > With that line removed, and any rebasing needed over changes to the macro, > I'm happy with this patch. Attached is updated patch. I'll commit it after bootstrap / regtest. Thanks, -- Maxim Kuvyrkov www.linaro.org 0002-Enable-fprefetch-loop-arrays-at-given-optimization-l.patch Description: Binary data
Re: [PATCH 6/6][AArch64] Update prefetch tuning parameters for falkor and qdf24xx tunings.
> On Jun 8, 2017, at 7:32 PM, James Greenhalgh wrote: > > On Mon, Jan 30, 2017 at 03:08:04PM +0300, Maxim Kuvyrkov wrote: >> This patch enables software prefetching at -O3 for Qualcomm's qdf24xx cores. >> >> Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu. > > This patch is OK in whatever form it takes after rebasing for the macro > in 4/6. Here is updated patch for the record. -- Maxim Kuvyrkov www.linaro.org 0003-Update-prefetch-tuning-parameters-for-qdf24xx.patch Description: Binary data
Re: [PATCH 0/6] Improve -fprefetch-loop-arrays in general and for AArch64 in particular
> On May 28, 2017, at 8:01 AM, Andrew Pinski wrote: > > On Tue, Feb 28, 2017 at 1:53 AM, Maxim Kuvyrkov >> ... >> Hi Kyrill, >> >> My hope was to push them in time for GCC 7, but it seems to late now. I'll >> return to these patches at the beginning of Stage 1. > > Ping on this patch set as I really want to get in the prefetching side > for ThunderX 1 and 2. Or should I resubmit my patch set? Hi Andrew, James G. approved my prefetching patches, and I will be merging them today or on Monday. I've rebase your ThunderX patch on top of my patchset. I leave it to you to re-test and get it reviewed. -- Maxim Kuvyrkov www.linaro.org 0004-tFrom-Maxim-Kuvyrkov-maxim.kuvyrkov-linaro.org.patch Description: Binary data
Re: [C++ PATCH] Fix -fsanitize={null,alignment} of references (PR c++/79572)
On Thu, Mar 23, 2017 at 11:37 PM, Jakub Jelinek wrote: > Hi! > > Since late C++ folding has been committed, we don't sanitize some reference > bindings to NULL. Earlier we had always NOP_EXPR to REFERENCE_TYPE say from > INTEGER_CST or whatever else, but cp_fold can now turn that right into > INTEGER_CST with REFERENCE_TYPE. The following patch sanitizes even those. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2017-03-23 Jakub Jelinek > > PR c++/79572 > * c-ubsan.h (ubsan_maybe_instrument_reference): Change argument to > tree *. > * c-ubsan.c (ubsan_maybe_instrument_reference): Likewise. Handle > not just NOP_EXPR to REFERENCE_TYPE, but also INTEGER_CST with > REFERENCE_TYPE. > > * cp-gimplify.c (cp_genericize_r): Sanitize INTEGER_CSTs with > REFERENCE_TYPE. Adjust ubsan_maybe_instrument_reference caller > for NOP_EXPR to REFERENCE_TYPE. > > * g++.dg/ubsan/null-8.C: New test. > ... > --- gcc/testsuite/g++.dg/ubsan/null-8.C.jj 2017-03-23 09:42:31.664696676 > +0100 > +++ gcc/testsuite/g++.dg/ubsan/null-8.C 2017-03-23 09:43:31.501908802 +0100 > @@ -0,0 +1,19 @@ > +// PR c++/79572 > +// { dg-do run } > +// { dg-options "-fsanitize=null -std=c++14" } > +// { dg-output "reference binding to null pointer of type 'const int'" } > + > +void > +foo (const int &iref) > +{ > + if (&iref) > +__builtin_printf ("iref %d\n", iref); > + else > +__builtin_printf ("iref is NULL\n"); Hi Jakub, Using __builtin_printf causes this test to fail sporadically when cross-testing. Stdout and stderr output can get mixed in cross-testing, so dejagnu might see == g++.dg/ubsan/null-8.C:18:7: runtime error: reference binding to null pointer of type iref is NULL 'const int' == instead of == g++.dg/ubsan/null-8.C:18:7: runtime error: reference binding to null pointer of type 'const int' iref is NULL == Is it essential for this testcase to use __builtin_printf or simple "fprintf (stderr, ...)" would do just fine? > +} > + > +int > +main () > +{ > + foo (*((int*) __null)); > +} Regards, -- Maxim Kuvyrkov
Re: C++ PATCH to primary_template_instantiation_p
> On Nov 28, 2017, at 12:29 AM, Jason Merrill wrote: > > All the uses of primary_template_instantiation_p actually want to > query whether the entity in question is a specialization of the > template, not whether it's an instantiation or explicit > specialization. > > Tested x86_64-pc-linux-gnu, applying to trunk. > Hi Jason, I get the following failure with the new test on x86_64-linux-gnu and aarch64-linux-gnu: > --- /dev/null > +++ b/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C > @@ -0,0 +1,27 @@ > +// PR c++/46831 > +// { dg-do compile { target c++11 } } > +// { dg-options "" } > + > +struct B { }; > +struct D : B { }; > +struct A { > + template operator D&(); // { dg-message "template > conversion" } > + operator long(); > +}; > + > +template <> A::operator D&(); "Template conversion" warning is triggered on this line, rather than above. > + > +void f(long); > +void f(B&); > + > +struct A2 { > + template operator B&(); > +}; > + > +void f2(const B&); > + > +int main() { > + f(A()); > + f2(A2()); > + f2(A()); // { dg-error "" } > +} > Would you please take a look? === spawn -ignore SIGHUP /home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/_build/builds/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/gcc.git~master-stage2/gcc/testsuite/g++5/../../xg++ -B/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/_build/builds/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/gcc.git~master-stage2/gcc/testsuite/g++5/../../ /home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C -fno-diagnostics-show-caret -fdiagnostics-color=never -nostdinc++ -I/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/_build/builds/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/gcc.git~master-stage2/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu -I/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/_build/builds/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/gcc.git~master-stage2/x86_64-unknown-linux-gnu/libstdc++-v3/include -I/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/libstdc++-v3/libsupc++ -I/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/libstdc++-v3/include/backward -I/home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/libstdc++-v3/testsuite/util -fmessage-length=0 -std=gnu++11 -S -o fntmpdefarg2a.s /home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C: In function 'int main()': /home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C:26:6: error: invalid user-defined conversion from 'A' to 'const B&' [-fpermissive] /home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C:12:13: note: candidate is: 'A::operator D&() [with T = void]' /home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C:12:13: note: conversion from return type 'D&' of template conversion function specialization to 'const B&' is not an exact match /home/tcwg-buildslave/workspace/tcwg-buildfarm/tcwg-x86_64-build/snapshots/gcc.git~master/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg2a.C:21:6: note: initializing argument 1 of 'void f2(const B&)' compiler exited with status 1 FAIL: g++.dg/cpp0x/fntmpdefarg2a.C -std=gnu++11 (test for warnings, line 8) PASS: g++.dg/cpp0x/fntmpdefarg2a.C -std=gnu++11 (test for errors, line 26) PASS: g++.dg/cpp0x/fntmpdefarg2a.C -std=gnu++11 (test for excess errors) === Regards, -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] PR84068: Fix sort order of SCHED_PRESSURE_MODEL
> On Jan 31, 2018, at 4:33 PM, Wilco Dijkstra wrote: > > Richard Sandiford wrote: > >> This was the original intent, but was changed in r213708. TBH I'm not >> sure what the second hunk in that revision fixed, since model_index is >> supposed to return an index greater than all valid indices when passed >> an instruction outside the current block. Maxim, do you remember? > > See https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00932.html, it says: > > "The second one is to account for the fact that model_index() of two > instructions > is meaningful only when both instructions are in the current basic block." > > Unless something has changed, I'm assuming that's still true today. Maybe > the underlying idea was to allow interleaving of instructions outside the > current block, Yes. > but that isn't feasible if you want a well-defined sort ordering. Agree. Wilco, your patch looks good. Thanks for fixing this! -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] PR84068: Fix sort order of SCHED_PRESSURE_MODEL
> On Feb 2, 2018, at 7:40 PM, Wilco Dijkstra wrote: > > Right, so here is version 2 which ends up much simpler: > > The comparison function for SCHED_PRESSURE_MODEL is incorrect. If either > instruction is not in target_bb, the ordering is not well defined. > Since all instructions outside the target_bb get the highest model_index, > all we need to do is sort on model_index. If the model_index is the same > we defer to RFS_DEP_COUNT and/or RFS_TIE. > > Bootstrap OK, OK for commit? Looks good to me. -- Maxim Kuvyrkov www.linaro.org > > ChangeLog: > 2018-02-02 Wilco Dijkstra > > PR rlt-optimization/84068 > * haifa-sched.c (rank_for_schedule): Fix SCHED_PRESSURE_MODEL sorting. > > PR rlt-optimization/84068 > * gcc.dg/pr84068.c: New test. > -- > > diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c > index > ebdec46bf04f1ba07e8b70607602a3bc9e7ec7de..4a899b56173dd7d67db084ed86d24ad865ccadcd > 100644 > --- a/gcc/haifa-sched.c > +++ b/gcc/haifa-sched.c > @@ -2783,12 +2783,11 @@ rank_for_schedule (const void *x, const void *y) > } > > /* Prefer instructions that occur earlier in the model schedule. */ > - if (sched_pressure == SCHED_PRESSURE_MODEL > - && INSN_BB (tmp) == target_bb && INSN_BB (tmp2) == target_bb) > + if (sched_pressure == SCHED_PRESSURE_MODEL) > { > diff = model_index (tmp) - model_index (tmp2); > - gcc_assert (diff != 0); > - return rfs_result (RFS_PRESSURE_INDEX, diff, tmp, tmp2); > + if (diff != 0) > + return rfs_result (RFS_PRESSURE_INDEX, diff, tmp, tmp2); > } > > /* Prefer the insn which has more later insns that depend on it. > diff --git a/gcc/testsuite/gcc.dg/pr84068.c b/gcc/testsuite/gcc.dg/pr84068.c > new file mode 100644 > index > ..13110d84455f20edfc50f09efe4074721bd6a7d0 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr84068.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-sched-critical-path-heuristic > -fno-sched-rank-heuristic --param=max-sched-extend-regions-iters=5 --param > sched-pressure-algorithm=2" } */ > + > +#ifdef __SIZEOF_INT128__ > +typedef __int128 largeint; > +#else > +typedef long long largeint; > +#endif > + > +largeint a; > +int b; > + > +largeint > +foo (char d, short e, int f) > +{ > + b = __builtin_sub_overflow_p (b, 1, (unsigned long)0); > + return a + f; > +}
Re: [PATCH][arm] PR target/82518: Return false in ARRAY_MODE_SUPPORTED_P for BYTES_BIG_ENDIAN
> On Mar 20, 2018, at 8:11 PM, Kyrill Tkachov > wrote: > > Hi all, > > This PR shows that we get the load/store_lanes logic wrong for arm big-endian. > It is tricky to get right. Aarch64 does it by adding the appropriate > lane-swapping > operations during expansion. > > I'd like to do the same on arm eventually, but we'd need to port and validate > the VTBL-generating > code and add it to all the right places and I'm not comfortable enough doing > it for GCC 8, but I am keen > in getting the wrong-code fixed. > As I say in the PR, vectorisation on armeb is already severely restricted (we > disable many patterns on BYTES_BIG_ENDIAN) > and the load/store_lanes patterns really were not working properly at all, so > disabling them is not > a radical approach. > > The way to do that is to return false in ARRAY_MODE_SUPPORTED_P for > BYTES_BIG_ENDIAN. > > Bootstrapped and tested on arm-none-linux-gnueabihf. > Also tested on armeb-none-eabi. > > Committing to trunk. ... > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -6609,7 +6609,8 @@ proc check_effective_target_vect_load_lanes { } { > verbose "check_effective_target_vect_load_lanes: using cached result" 2 > } else { > set et_vect_load_lanes 0 > - if { ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]) > + # We don't support load_lanes correctly on big-endian arm. > + if { ([istarget arm-*-*] && [check_effective_target_arm_neon_ok]) >|| [istarget aarch64*-*-*] } { > set et_vect_load_lanes 1 > } > Hi Kyrill, This part makes armv8l-linux-gnueabihf target fail a few of slp-perm-* tests. Using [check_effective_target_arm_little_endian] should fix this. Would you please fix this on master and gcc-7-branch? Thanks! -- Maxim Kuvyrkov www.linaro.org
Re: [Libgomp, Fortran] Fix canadian cross build
> On Jun 23, 2017, at 4:44 PM, Yvan Roux wrote: > > Hello, > > Fortran parts of libgomp (omp_lib.mod, openacc.mod, etc...) are > missing in a canadian cross build, at least when target gfortran > compiler comes from PATH and not from GFORTRAN_FOR_TARGET. > > Back in 2010, executability test of GFORTRAN was added to fix libgomp > build on cygwin, but when the executable doesn't contain the path, > "test -x" fails and part of the library are not built. > > This patch fixes the issue by using M4 macro AC_PATH_PROG (which > returns the absolute name) instead of AC_CHECK_PROG in the function > defined in config/acx.m4: NCN_STRICT_CHECK_TARGET_TOOLS. I renamed it > into NCN_STRICT_PATH_TARGET_TOOLS to keep the semantic used in M4. > > Tested by building cross and candian cross toolchain (host: > i686-w64-mingw32) for arm-linux-gnueabihf with issue and with a > complete libgomp. > > ok for trunk ? Hi Yvan, The patch looks OK, but it is a pain to review. Would you please split it into 2 patches: one for the mechanical renames, and one for logical changes to acx.m4? This should allow Paolo and DJ to approve your patch. Thanks! > > Thanks > Yvan > > config/ChangeLog > 2017-06-23 Yvan Roux > >* acx.m4 (NCN_STRICT_CHECK_TARGET_TOOLS): Renamed to ... >(NCN_STRICT_PATH_TARGET_TOOLS): ... this. It reflects the replacement >of AC_CHECK_PROG by AC_PATH_PROG to get the absolute name of the >program. >(ACX_CHECK_INSTALLED_TARGET_TOOL): Use renamed function. > > ChangeLog > 2017-06-23 Yvan Roux > >* configure.ac: Use NCN_STRICT_PATH_TARGET_TOOLS instead of >NCN_STRICT_CHECK_TARGET_TOOLS. >* configure: Regenerate. > -- Maxim Kuvyrkov www.linaro.org
Re: Cleanup Linux libc selection and Android support
On 19/09/2013, at 8:26 am, Maxim Kuvyrkov wrote: > Following recent breakage caused by adding nominal Android support to all > *linux* targets [*] this patch series cleans up libc selection for Linux > targets (-mglibc/-muclibc/-mbionic), splits libc selection logic from Android > support, and removes Android handling from targets that don't support it. Ping. Anyone wants to review these cleanup patches to config.gcc, config/t-* and config/*.h files, or should I just start committing them quietly :-P ? Thanks, -- Maxim Kuvyrkov www.kugelworks.com > > [*] http://thread.gmane.org/gmane.comp.gcc.patches/277430/focus=292362 > > Special thanks goes to Alexander who tested and reviewed initial versions of > these patches and fixed several problems. > > The patch series was tested on various Linux and uClinux targets including > arm, bfin, c6x, m68k, mips, powerpc, x86, x86_64. > > Patches will be posted in their separate threads, and below is a summary. > Individually the patches are all borderline trivial. > > Reviews and approvals are welcome! > > - 0001-Rename-files-for-libc-selection-on-Linux-targets > Mechanical rename of files in preparation for splitting Android handling from > libc selection. > > - 0002-Rename-functions-relating-to-libc-support-on-Linux-t > Mechanical rename of functions. > > - 0003-Robustify-check-for-IFUNC-support > Trivial fix. > > - 0004-Cleanup-definitions-of-libc-related-target-hooks > Consolidate definitions of libc target hooks in linux.h > > - 0005-Cleanup-libc-selection-and-Android-support > Split Android handling from libc selection and remove Android handling from > targets that don't support it. > > Thanks, > > -- > Maxim Kuvyrkov > www.kugelworks.com > > >
[PATCH] Fix race in libstdc++ testsuite
Hi, This patch fixes a race in libstdc++ testsuite. When we split libstdc++'s testsuite into several parallel instances, it may occur that testing for features (check_v3_target_fileio in this case) is done in two sub-processes at the same time. The race occurs on the cin_unget-1.txt data file, with one thread overwriting the file that was created by another thread. The fix is to use unique name cin_unget-1-[pid].txt for the data file. Tested on x86_64-linux-gnu. OK to apply? -- Maxim Kuvyrkov www.linaro.org 0001-Fix-race-in-libstdc-testsuite.ChangeLog Description: Binary data 0001-Fix-race-in-libstdc-testsuite.patch Description: Binary data
[PATCH] Account for prologue spills in reg_pressure scheduling
Hi, This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number of available registers. At the moment the scheduler does not account for spills in the prologues and restores in the epilogue, which occur from use of call-used registers. The current state is, essentially, optimized for case when there is a hot loop inside the function, and the loop executes significantly more often than the prologue/epilogue. However, on the opposite end, we have a case when the function is just a single non-cyclic basic block, which executes just as often as prologue / epilogue, so spills in the prologue hurt performance as much as spills in the basic block itself. In such a case the scheduler should throttle-down on the number of available registers and try to not go beyond call-clobbered registers. The patch uses basic block frequencies to balance the cost of using call-used registers for intermediate cases between the two above extremes. The motivation for this patch was a floating-point testcase on arm-linux-gnueabihf (ARM is one of the few targets that use register pressure scheduling by default). A "thanks" goes to Richard good discussion of the problem and suggestions on the approach to fix it. The patch was bootstrapped on x86_64-linux-gnu (which doesn't really exercises the patch), and cross-tested on arm-linux-gnueabihf and aarch64-linux-gnu. OK to apply? -- Maxim Kuvyrkov www.linaro.org 0001-sched_class_reg_num.ChangeLog Description: Binary data 0001-sched_class_reg_num.patch Description: Binary data
[PATCH] Improve scheduler dumps of ready list
Hi, Following previous improvement to scheduler dumps that provided insight into which heuristics in rank_for_schedule make most decisions, this patch adds print outs that show the deciding reason for an instruction in the ready list to be at its particular place. This patch allowed me to troubleshoot several scheduling problems in the register pressure scheduling. Tested on x86_64-linux-gnu, arm-linux-gnueabihf and aarch64-linux-gnu. OK to apply? Thank you, -- Maxim Kuvyrkov www.linaro.org 0002-last_rfs_win.ChangeLog Description: Binary data 0002-last_rfs_win.patch Description: Binary data
Re: [PATCH] Account for prologue spills in reg_pressure scheduling
On Oct 21, 2014, at 8:11 AM, Sebastian Pop wrote: > Maxim Kuvyrkov wrote: >> Hi, >> >> This patch improves register pressure scheduling (both >> SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number >> of available registers. >> >> At the moment the scheduler does not account for spills in the prologues and >> restores in the epilogue, which occur from use of call-used registers. The >> current state is, essentially, optimized for case when there is a hot loop >> inside the function, and the loop executes significantly more often than the >> prologue/epilogue. However, on the opposite end, we have a case when the >> function is just a single non-cyclic basic block, which executes just as >> often as prologue / epilogue, so spills in the prologue hurt performance as >> much as spills in the basic block itself. In such a case the scheduler >> should throttle-down on the number of available registers and try to not go >> beyond call-clobbered registers. >> >> The patch uses basic block frequencies to balance the cost of using >> call-used registers for intermediate cases between the two above extremes. >> >> The motivation for this patch was a floating-point testcase on >> arm-linux-gnueabihf (ARM is one of the few targets that use register >> pressure scheduling by default). >> > > Does aarch64 enable reg pressure sched by default, or what is the flag to > enable it? > I'm planing to look at the perf impact of the patch. Thanks, benchmarking results are welcome! AArch64 doesn't use reg_pressure scheduling by default. Use "-fsched-pressure --param=sched-pressure-algorithm=2" to enable same thing as on ARM. I would imagine C++ and Fortran floating-point code to be most affected. -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] Account for prologue spills in reg_pressure scheduling
[Adding ARM maintainers to CC] On Oct 21, 2014, at 9:44 AM, Sebastian Pop wrote: > Hi Maxim, > > Maxim Kuvyrkov wrote: >> Thanks, benchmarking results are welcome! AArch64 doesn't use reg_pressure >> scheduling by default. Use "-fsched-pressure >> --param=sched-pressure-algorithm=2" to enable same thing as on ARM. I would >> imagine C++ and Fortran floating-point code to be most affected. > > On aarch64 I only see perf improvements with your patch: no perf degradations > on > all the tests that I have run. > > base0: r216447, -O3 > base1: r216447, -O3 -fsched-pressure --param=sched-pressure-algorithm=2 > patch: r216447 + your patch, -O3 -fsched-pressure > --param=sched-pressure-algorithm=2 > > patch vs. base1 is only an improvement. > > base1 vs. base0 has a few good improvements, and some small degradations: your > patch improves the perf for one of the degradations to the point it is better > now with -fsched-pressure --param=sched-pressure-algorithm=2 than at -O3. > > Could we turn on "-fsched-pressure --param=sched-pressure-algorithm=2" by > default for aarch64? These are great results, yay! Sebastian, what benchmarks did you run? We need to see improvements on spec2k / spec2k6 to enable register-pressure scheduling on AArch64 by default. The current understanding is that AArch64 has enough registers to not benefit from pressure-aware scheduling. On the other hand, one could argue that cores with more complex pipelines (e.g., A57) might not benefit from pipeline-oriented scheduling either, and, therefore, scheduling for register pressure can provide a better win. Thank you, -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] Account for prologue spills in reg_pressure scheduling
On Oct 21, 2014, at 10:39 AM, Ramana Radhakrishnan wrote: > On Mon, Oct 20, 2014 at 10:17 PM, Richard Sandiford > wrote: >> Maxim Kuvyrkov writes: >>> [Adding ARM maintainers to CC] >>> >>> On Oct 21, 2014, at 9:44 AM, Sebastian Pop wrote: >>> >>>> Hi Maxim, >>>> >>>> Maxim Kuvyrkov wrote: >>>>> Thanks, benchmarking results are welcome! AArch64 doesn't use >>>>> reg_pressure >>>>> scheduling by default. Use "-fsched-pressure >>>>> --param=sched-pressure-algorithm=2" to enable same thing as on ARM. I >>>>> would >>>>> imagine C++ and Fortran floating-point code to be most affected. >>>> >>>> On aarch64 I only see perf improvements with your patch: no perf >>>> degradations on >>>> all the tests that I have run. >>>> >>>> base0: r216447, -O3 >>>> base1: r216447, -O3 -fsched-pressure --param=sched-pressure-algorithm=2 >>>> patch: r216447 + your patch, -O3 -fsched-pressure >>>> --param=sched-pressure-algorithm=2 >>>> >>>> patch vs. base1 is only an improvement. >>>> >>>> base1 vs. base0 has a few good improvements, and some small degradations: >>>> your >>>> patch improves the perf for one of the degradations to the point it is >>>> better >>>> now with -fsched-pressure --param=sched-pressure-algorithm=2 than at -O3. >>>> >>>> Could we turn on "-fsched-pressure --param=sched-pressure-algorithm=2" by >>>> default for aarch64? > > We already have sched-pressure --param=sched-pressure-algorithm=1 on > by default in the AArch64 backend from September. > https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01663.html went in a few > days back. Oh, good, I didn't notice AArch64 using sched-pressure because of weighted algorithm. This patch affects both "weighted" and "model" algorithm, so it doesn't necessarily makes "model" algorithm better than "weighted". That said, my personal preference is for both AArch32 and AArch64 to use "model" algorithm, but, as you said, we need benchmarking data to make that decision. ... > We can run this and let > you know the results, though SPECFP2k6 takes quite a while - are all > your patches to sched-pressure now done ? Wait on benchmarking for 1-2 days. The sched-pressure patches are done now, but there are more patches for the 2nd scheduler pass in the queue (from linaro-dev/sched-model-prefetch branch). Hopefully those will be posted here today. Thank you, -- Maxim Kuvyrkov www.linaro.org
[PATCH 3/8] Remove cached_first_cycle_multipass_dfa_lookahead and cached_issue_rate
Hi, This patch cleans up (removes) cached_first_cycle_multipass_dfa_lookahead and cached_issue_rate. These seem to be an artifact from the scheduler refactoring 10+ years ago. They assume that dfa_lookahead and issue_rate can change mid-way through scheduling, which is never the case. All backends currently treat dfa_lookahead and issue_rate as constants for the duration of scheduling passes. Bootstrapped on x86_64-linux-gnu. Regression testing is in progress. OK to commit if no regressions? [ChangeLog is part of the git patch] -- Maxim Kuvyrkov www.linaro.org 0003-Remove-cached_first_cycle_multipass_dfa_lookahead-an.patch Description: Binary data
[PATCH 4/8] Disable max_issue when scheduling for register pressure
Hi, This patch disables max_issue-based lookahead multipass scheduling when register-pressure heuristics are active. The two approaches tend to undo each others decisions and don't play well together. Currently this patch is a no-op, since no target uses both max_issue and register-pressure scheduling. This will change soon with upcoming patches that will enable max_issue during 2nd scheduling pass for ARM and AArch64. Bootstrapped on x86_64-linux-gnu. Regression testing is in progress. OK to commit if no regressions? [ChangeLog is part of the git patch] Thank you, -- Maxim Kuvyrkov www.linaro.org 0004-Disable-max_issue-when-scheduling-for-register-press.patch Description: Binary data
[PATCH 5/8] Enable max_issue for AArch32 and AArch64
Hi Ramana, Hi Marcus, This patch enables max_issue multipass lookahead scheduling for 2nd scheduler pass (or, more pedantically, whenever register-pressure scheduling is not in use). Multipass lookahead scheduling is being enabled for cores that can issue 2 or more instructions per cycle, and it allows scheduler to better exploit multi-issue pipelines. This patch also provides foundation for [upcoming] auto-prefetcher model in the scheduler, which is handled via max_issue. This change requires benchmarking, which I can't easily do at the moment. I would appreciate any benchmarking results that you can share. Bootstrap on aarch64-linux-gnu is in progress. OK to apply, provided no performance or correctness regressions? Thank you, -- Maxim Kuvyrkov www.linaro.org 0005-Enable-max_issue-for-AArch32-and-AArch64.patch Description: Binary data
[PATCH 6/8] Handle SCRATCH in decompose_address
Hi, This patch is a simple fix to allow decompose_address to handle SCRATCH'es during 2nd scheduler pass. This patch is a prerequisite for a scheduler improvement that relies on decompose_address to parse insns. Bootstrapped and regtested on x86_64-linux-gnu and regtested on arm-linux-gnueabihf and aarch64-linux-gnu. OK to apply? Thank you, [ChangeLog is part of the git patch] -- Maxim Kuvyrkov www.linaro.org 0006-Handle-SCRATCH-in-decompose_address.patch Description: Binary data
[PATCH 7/8] Model cache auto-prefetcher in scheduler
Hi, This patch adds auto-prefetcher modeling to GCC scheduler. The auto-prefetcher model is currently enabled only for ARM Cortex-A15, since this is the only CPU that I know of to have the hardware auto-prefetcher unit. The documentation on the auto-prefetcher is very sparse, and all I have are my empirical studies and a short note in Cortex-A15 manual (search for "L2 cache auto-prefether"). This patch, therefore, implements a very abstract model that makes scheduler prefer "mem_op (base+8); mem_op (base+12)" over "mem_op (base+12); mem_op (base+8)". In other words, memory operations are tried to be issued in order of increasing memory offsets. The auto-prefetcher model implementation is based on max_issue mutlipass lookahead scheduling, and its "guard" hook. The guard hook examines contents of the ready list and the queue, and, if it finds instructions with lower memory offsets, marks instructions with higher memory offset as unavailable for immediate scheduling. This patch has been in works since beginning of the year, and many of my previous scheduler cleanup patches were to prepare the infrastructure for this feature. Ramana, this change requires benchmarking, which I can't easily do at the moment. I would appreciate any benchmarking results that you can share. In particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH needs to be tuned/confirmed for Cortex-A15. At the moment the parameter is set to "2", which means that the autopref model will look through ready list and 1-stall queue in search of relevant instructions. Values of -1 (disable autopref), 0 (use autopref only in rank_for_schedule), 1 (look through ready list), 2 (look through ready list and 1-stall queue), and 3 (look through ready list and 2-stall queue) should be considered and benchmarked. Bootstrapped on x86_64-linux-gnu and regtested on arm-linux-gnueaihf and aarch64-linux-gnu. OK to apply, provided no performance or correctness regressions? [ChangeLog is part of the git patch] Thank you, -- Maxim Kuvyrkov www.linaro.org 0007-Model-cache-auto-prefetcher-in-scheduler.patch Description: Binary data
[PATCH 8/8] Use rank_for_schedule to as tie-breaker in model_order_p
Hi, This patch improves model_order_p to use non-reg-pressure version of rank_for_schedule when it needs to break the tie. At the moment it is comparing INSN_PRIORITY by itself, and it seems prudent to outsource that to rank_for_schedule. Bootstrapped on x86_64-linux-gnu and regtested on arm-linux-gnueabihf and aarch64-linux-gnu. OK to apply? Thank you, -- Maxim Kuvyrkov www.linaro.org 0008-Use-rank_for_schedule-to-as-tie-breaker-in-model_ord.patch Description: Binary data
Re: [PATCH 8/8] Use rank_for_schedule to as tie-breaker in model_order_p
On Oct 21, 2014, at 9:11 PM, Richard Sandiford wrote: > Maxim Kuvyrkov writes: >> This patch improves model_order_p to use non-reg-pressure version of >> rank_for_schedule when it needs to break the tie. At the moment it is >> comparing INSN_PRIORITY by itself, and it seems prudent to outsource >> that to rank_for_schedule. > > Do you have an example of where this helps? A possible danger is that > rank_for_schedule might (indirectly) depend on state that isn't maintained > or updated in the same way during the model schedule phase. I don't have an example where this patch helps, and I consider this patch a general cleanup. From what I can see, all scheduler data structures are maintained during reg_sched_model scheduling. Thank you, -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH 8/8] Use rank_for_schedule to as tie-breaker in model_order_p
On Oct 22, 2014, at 9:24 AM, Richard Sandiford wrote: > Maxim Kuvyrkov writes: >> On Oct 21, 2014, at 9:11 PM, Richard Sandiford >> wrote: >> >>> Maxim Kuvyrkov writes: >>>> This patch improves model_order_p to use non-reg-pressure version of >>>> rank_for_schedule when it needs to break the tie. At the moment it is >>>> comparing INSN_PRIORITY by itself, and it seems prudent to outsource >>>> that to rank_for_schedule. >>> >>> Do you have an example of where this helps? A possible danger is that >>> rank_for_schedule might (indirectly) depend on state that isn't maintained >>> or updated in the same way during the model schedule phase. >> >> I don't have an example where this patch helps, and I consider this >> patch a general cleanup. From what I can see, all scheduler data >> structures are maintained during reg_sched_model scheduling. > > I'm not really sure it's a cleanup though. The code isn't aesthetically > cleaner because of the way it has to switch the global sched_pressure > variable for the duration of the call. And it doesn't really seem > conceptually cleaner. The model ordering was deliberately simple > because the model schedule doesn't take pipeline characteristics like > issue delays or unit conflicts into account; all it's doing is trying to > minimise the register pressure. It isn't obvious to me that all the > extra stuff that rank_for_schedule does would make sense in that context > (i.e. that it would tend to reduce pressure compared to the current test, > rather than increase it). It also looks like using rank_for_schedule would > increase compile time. > > So personally I'd rather keep things the way they are unless we have > a motivating example. I will not be pushing for this particular change, I'm fine with dropping the patch if current code looks more aesthetically pleasing. Thanks, -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] Account for prologue spills in reg_pressure scheduling
On Oct 22, 2014, at 4:24 AM, Vladimir Makarov wrote: > On 10/20/2014 02:57 AM, Maxim Kuvyrkov wrote: >> Hi, >> >> This patch improves register pressure scheduling (both >> SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number >> of available registers. >> >> At the moment the scheduler does not account for spills in the prologues and >> restores in the epilogue, which occur from use of call-used registers. The >> current state is, essentially, optimized for case when there is a hot loop >> inside the function, and the loop executes significantly more often than the >> prologue/epilogue. However, on the opposite end, we have a case when the >> function is just a single non-cyclic basic block, which executes just as >> often as prologue / epilogue, so spills in the prologue hurt performance as >> much as spills in the basic block itself. In such a case the scheduler >> should throttle-down on the number of available registers and try to not go >> beyond call-clobbered registers. >> >> The patch uses basic block frequencies to balance the cost of using >> call-used registers for intermediate cases between the two above extremes. >> >> The motivation for this patch was a floating-point testcase on >> arm-linux-gnueabihf (ARM is one of the few targets that use register >> pressure scheduling by default). >> >> A "thanks" goes to Richard good discussion of the problem and suggestions on >> the approach to fix it. >> >> The patch was bootstrapped on x86_64-linux-gnu (which doesn't really >> exercises the patch), and cross-tested on arm-linux-gnueabihf and >> aarch64-linux-gnu. >> >> OK to apply? >> > It is a pretty interesting idea for heuristic, Maxim. > > But I don't understand the following loop: > > + for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i) > + if (call_used_regs[i]) > + for (c = 0; c < ira_pressure_classes_num; ++c) > + { > + enum reg_class cl = ira_pressure_classes[c]; > + if (ira_class_hard_regs[cl][i]) > + ++call_used_regs_num[cl]; > > > ira_class_hard_regs[cl] is array containing hard registers belonging to > class CL. So if GENERAL_REGS consists of hard regs 0..3, 12..15, the > array will contain 8 elements 0..3, 12..15. The array size is defined > by ira_class_hard_regs_num[cl]. So the index is order number of hard > reg in the class (starting from 0) but not hard register number itself. > Also the pressure classes never intersect so you can stop the inner loop > when you find class to which hard reg belongs to. Thanks for spotting this. Indeed, this is a bug, but it still happened to correctly calculate numbers of call-used register for ARM (where I debugged the implementation). > > I believe you should rewrite the code and get performance results again > to get an approval. Sebastian, could you run the geekbench again to make sure you see same performance numbers? > You also missed the changelog. > The changelog was in the separate file. Also attached here with the fixed patch. Bootstrapped on x86_64-linux-gnu. Bootstrap and regtest on arm-linux-gnueabihf is in progress. -- Maxim Kuvyrkov www.linaro.org 0001-sched_class_reg_num.ChangeLog Description: Binary data 0001-sched_class_reg_num.patch Description: Binary data
Re: [PATCH 5/8] Enable max_issue for AArch32 and AArch64
On Oct 23, 2014, at 11:42 AM, Sebastian Pop wrote: > Sebastian Pop wrote: >> Maxim Kuvyrkov wrote: >>> This change requires benchmarking, which I can't easily do at the moment. I >>> would appreciate any benchmarking results that you can share. >> >> I will run my testsuite on aarch64. >> Do you need the perf for all the previous patches 1 to 5 together, or just >> for patch 5? > > I ran your patch 5 on top of r216539 at -O3 and there were not many > differences > in performance, and of very little amplitude. Some small speedups, and some > small slowdowns. Let me know if you want a reduced testcase for the most > significative slowdown. Hi Sebastian, This is, pretty much, what I expected. This patch enables infrastructure for patch 7, and, by itself, was not supposed to give measurable speed ups. Patch 7 should improve performance for Cortex-A15 (and any other cores that have similar auto-prefetcher hardware). Thank you for the benchmarking! -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH 6/8] Handle SCRATCH in decompose_address
On Oct 23, 2014, at 9:02 AM, Jeff Law wrote: > On 10/20/14 21:35, Maxim Kuvyrkov wrote: >> Hi, >> >> This patch is a simple fix to allow decompose_address to handle >> SCRATCH'es during 2nd scheduler pass. This patch is a prerequisite >> for a scheduler improvement that relies on decompose_address to parse >> insns. >> >> Bootstrapped and regtested on x86_64-linux-gnu and regtested on >> arm-linux-gnueabihf and aarch64-linux-gnu. > I'd like to see some further discussion here. > > get_base_term is supposed to look at its argument as a base address. I'm > curious under what circumstances you want to have a SCRATCH as a base address? > > I didn't see anything in patch #8 which obviously dependended on this, but > maybe it's in there, but more subtle than expected. > > If you can justify why it's useful to handle scratch in here, then the patch > will be fine. Without this patch decompose_address() ICEs during second scheduler pass on prologue instructions that usually have "(clobber (mem:BLK (scratch))". The only reason for this patch is to prevent that fault and enable use of decompose_address during 2nd scheduler pass. Does this answer your question, or are you looking for a more in-depth reason? Thank you, -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] Account for prologue spills in reg_pressure scheduling
On Oct 23, 2014, at 3:45 AM, Vladimir Makarov wrote: ... > Richard proposed 2-loops solution instead of 3. So I'd definitely prefer his. > > The patch is ok with Richard's 2-loops proposal. Thanks, Maxim. As I wrote > the heuristic is an interesting one. This is the patch I'm going to commit after some more testing. Thank you both reviews! -- Maxim Kuvyrkov www.linaro.org 0001-sched_class_reg_num.ChangeLog Description: Binary data 0001-sched_class_reg_num.patch Description: Binary data
Re: [PATCH 3/8] Remove cached_first_cycle_multipass_dfa_lookahead and cached_issue_rate
On Oct 22, 2014, at 4:53 AM, Vladimir Makarov wrote: > On 10/20/2014 11:16 PM, Maxim Kuvyrkov wrote: >> Hi, >> >> This patch cleans up (removes) cached_first_cycle_multipass_dfa_lookahead >> and cached_issue_rate. >> >> These seem to be an artifact from the scheduler refactoring 10+ years ago. >> They assume that dfa_lookahead and issue_rate can change mid-way through >> scheduling, which is never the case. All backends currently treat >> dfa_lookahead and issue_rate as constants for the duration of scheduling >> passes. >> >> Bootstrapped on x86_64-linux-gnu. Regression testing is in progress. OK to >> commit if no regressions? >> > Yes. the patch for issue rate itself is ok but you should have modified > doc/tm.texi too for dfa lookahead hook saying that it should be a > constant (issue rate already has such clause). > > On the other hand I'd not assume that dfa look ahead is a constant. In > future we could make it non-constant to differentiate non-hot and hot > functions to speed up the scheduler as dfa look ahead scheduling is > pretty expensive. > > So issue rate change is ok but I'd not rush to change dfa look ahead > related code. It is trivial to prove that currently cached_first_cycle_multipass_dfa_lookahead always has the same value as dfa_lookahead. And, even should targetm.sched.first_cycle_multipass_dfa_lookahead start returning different values, max_issue will happily continue to use the value that the hook returned in sched_init(). Also, your suggestion to use different dfa_lookahead values for hot/cold functions is not affected by this patch. The values of dfa_lookahead variable have scope of the scheduling pass, which is invoked separately for every function. I guess, you could, potentially, start differentiating values of dfa_lookahead based on whether a basic_block is hot or cold, but handling this scenario would require significant changes throughout the scheduler. To summarize, dfa_lookahead is currently a pass-time invariant, that is free to change between invocations of the scheduler pass. Cached_first_cycle_multipass_dfa_lookahead always holds the same value as dfa_lookahead, and, as such, is extraneous. Thank you, -- Maxim Kuvyrkov www.linaro.org
[PATCH] New debug parameter sched-pressure-factor
Hi, For benchmarking and tuning of register-pressure scheduling I've made the following simple patch to adjust effective number of registers that scheduler has at its disposal. Do you think this would be valuable for others to investigate performance problems or tune register-pressure scheduling? The parameter scales number of register by the given number of percent (the default is 100, which is the current status quo). At value of 50 the scheduler attempt to use only half of available registers, while value of 200 will cause it to assume double the number of available registers (and, likely, cause a lot of spills). Thank you, -- Maxim Kuvyrkov www.linaro.org 0008-Sched_pressure_factor.patch Description: Binary data
Re: [PATCH] Account for prologue spills in reg_pressure scheduling
On Oct 23, 2014, at 8:20 PM, Richard Sandiford wrote: > Maxim Kuvyrkov writes: >> @@ -6701,6 +6753,24 @@ alloc_global_sched_pressure_data (void) >>saved_reg_live = BITMAP_ALLOC (NULL); >>region_ref_regs = BITMAP_ALLOC (NULL); >> } >> + >> + /* Calculate number of CALL_USED_REGS in register classes that >> + we calculate register pressure for. */ >> + for (int c = 0; c < ira_pressure_classes_num; ++c) >> +{ >> + enum reg_class cl = ira_pressure_classes[c]; >> + >> + call_used_regs_num[cl] = 0; >> + >> + for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i) >> +if (call_used_regs[ira_class_hard_regs[cl][i]]) >> + { >> +/* Register I belongs to pressure class CL. Pressure >> + classes do not intersect, so don't look further. */ >> +++call_used_regs_num[cl]; >> +break; >> + } >> +} > > I don't think we want the break here. The effect would be to count > at most one call-used register per pressure class. You make me feel dumb :-) Thank you, -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH, sched] Cleanup and improve multipass_dfa_lookahead_guard
On Jun 2, 2014, at 7:54 PM, Andreas Schwab wrote: > Regtested on ia64-suse-linux and installed as obvious. > > Andreas. > > * config/ia64/ia64.c > (ia64_first_cycle_multipass_dfa_lookahead_guard): Check > pending_data_specs first. > > diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c > index 118e5bf..4c5390b 100644 > --- a/gcc/config/ia64/ia64.c > +++ b/gcc/config/ia64/ia64.c > @@ -7536,7 +7536,7 @@ ia64_first_cycle_multipass_dfa_lookahead_guard (rtx > insn, int ready_index) > > /* Size of ALAT is 32. As far as we perform conservative > data speculation, we keep ALAT half-empty. */ > - if ((TODO_SPEC (insn) & BEGIN_DATA) && pending_data_specs >= 16) > + if (pending_data_specs >= 16 && (TODO_SPEC (insn) & BEGIN_DATA)) > return ready_index == 0 ? -1 : 1; > > if (ready_index == 0) Thanks for fixing this. To make it a bit more robust I would suggest using same condition as in ia64_variable_issue(): === if (sched_deps_info->generate_spec_deps && !sel_sched_p ()) /* Modulo scheduling does not extend h_i_d when emitting new instructions. Don't use h_i_d, if we don't have to. */ { if (DONE_SPEC (insn) & BEGIN_DATA) pending_data_specs++; if (CHECK_SPEC (insn) & BEGIN_DATA) pending_data_specs--; } === The underlaying problem is that TODO_SPEC(insn) should not be accessed for sel-sched. Because pending_data_specs turns out to always be null during sel-sched, "pending_data_specs >= 16" can be used as a shorthand for "sched_deps_info->generate_spec_deps && !sel_sched_p () && pending_data_specs >= 16" -- Maxim Kuvyrkov www.linaro.org
Re: [AArch64][PR65139] use clobber with match_scratch for aarch64_lshr_sisd_or_int_3
> On Apr 18, 2015, at 8:21 PM, Richard Earnshaw > wrote: > > On 18/04/15 16:13, Jakub Jelinek wrote: >> On Sat, Apr 18, 2015 at 03:07:16PM +0100, Richard Earnshaw wrote: >>> You need to ensure that your scratch register cannot overlap op1, since >>> the scratch is written before op1 is read. >> >> - (clobber (match_scratch:QI 3 "=X,w,X"))] >> + (clobber (match_scratch:QI 3 "=X,&w,X"))] >> >> incremental diff should ensure that, right? >> >> Jakub >> > > > Sorry, where in the patch is that hunk? > > I see just: > > + (clobber (match_scratch:QI 3 "=X,w,X"))] Jakub's suggestion is an incremental patch on top of Kugan's. > > And why would early clobbering the scratch be notably better than the > original? > It will still be better. With this patch we want to allow RA freedom to optimally handle both of the following cases: 1. operand[1] dies after the instruction. In this case we want operand[0] and operand[1] to be assigned to the same reg, and operand[3] to be assigned to a different register to provide a temporary. In this case we don't care whether operand[3] is early-clobber or not. This case is not optimally handled with current insn patterns. 2. operand[1] lives on after the instruction. In this case we want operand[0] and operand[3] to be assigned to the same reg, and not clobber operand[1]. By marking operand[3] early-clobber we ensure that operand[1] is in a different register from what operand[0] and operand[3] were assigned to. This case should be handled equally well before and after the patch. My understanding is that Kugan's patch with Jakub's fix on top satisfy both of these cases. -- Maxim Kuvyrkov www.linaro.org