Re: Referencing a register in different modes

2024-09-12 Thread Stefan Schulze Frielinghaus via Gcc
On Fri, Aug 09, 2024 at 09:49:03AM +0200, Stefan Schulze Frielinghaus wrote:
> On Thu, Aug 08, 2024 at 01:56:48PM -0600, Jeff Law wrote:
> > > I haven't tested it extensively but it triggers at least for the current
> > > case.  I would have loved to also print the insn but couldn't figure out
> > > how to ICE and stringify an insn.  I will have a look at this tomorrow.
> > > Did you have any place in mind where to put/call something like this?
> > I didn't have anywhere specific in mind.   As I suspected the
> > verify_rtl_sharing isn't a great fit from an implementation standpoint, but
> > it seems right conceptually.
> 
> Right and since we are walking over all insns there anyway we could also
> check pseudos on the go.  I didn't want to rename verify_rtl_sharing()
> since this is publicly visible but renamed verify_insn_sharing() into
> verify_insn_sharing_and_pseudo_references() to reflect the additional
> work---although the name is a bit of a mouthful.
> 
> > If you want to throw a patch over the wall for testing, happy to put it into
> > my tester and see what comes out the other side.  I wouldn't be at all
> > surprised if it tripped on other targets.
> 
> Having more testing would be great.  I've attached a new patch.

Out of curiosity, did you have any chance to run the patch in your
tester?  If there is no fallout, i.e., no target is immediately
affected, I would be happy to post the patch.

Cheers,
Stefan

> From 0199088d2877c9c840ce984f61365816879818bc Mon Sep 17 00:00:00 2001
> From: Stefan Schulze Frielinghaus 
> Date: Fri, 9 Aug 2024 09:45:57 +0200
> Subject: [PATCH] rtl: Verify pseudo register references
> 
> Ensure that each pseudo register referenced in an insn equals its
> definition.  In particular this means that we error out if a pseudo
> register is referenced in a different mode.
> ---
>  gcc/emit-rtl.cc | 37 +
>  1 file changed, 33 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index cb04aa1a8c6..00af73ca219 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -3122,16 +3122,45 @@ reset_all_used_flags (void)
>        }
>  }
>  
> -/* Verify sharing in INSN.  */
> +/* Verify that each pseudo register referenced in INSN equals its definition.
> +   For example, error out if modes do not coincide. */
>  
>  static void
> -verify_insn_sharing (rtx insn)
> +verify_rtl_pseudo_references (rtx insn)
> +{
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST)
> +    {
> +      const_rtx reg = *iter;
> +      if (!reg || !REG_P (reg))
> +        continue;
> +      int regno = REGNO (reg);
> +      /* Hard registers may be referenced in different modes.  */
> +      if (HARD_REGISTER_NUM_P (regno))
> +        continue;
> +      const_rtx orig_reg = regno_reg_rtx[regno];
> +      if (!rtx_equal_p (reg, orig_reg))
> +        {
> +          error ("pseudo register");
> +          debug_rtx (orig_reg);
> +          error ("does not coincide with its reference in insn");
> +          debug_rtx (insn);
> +          internal_error ("internal consistency failure");
> +        }
> +    }
> +}
> +
> +/* Verify sharing and pseudo references in INSN.  */
> +
> +static void
> +verify_insn_sharing_and_pseudo_references (rtx insn)
>  {
>    gcc_assert (INSN_P (insn));
>    verify_rtx_sharing (PATTERN (insn), insn);
>    verify_rtx_sharing (REG_NOTES (insn), insn);
>    if (CALL_P (insn))
>      verify_rtx_sharing (CALL_INSN_FUNCTION_USAGE (insn), insn);
> +  verify_rtl_pseudo_references (insn);
>  }
>  
>  /* Go through all the RTL insn bodies and check that there is no unexpected
> @@ -3151,13 +3180,13 @@ verify_rtl_sharing (void)
>        {
>          rtx pat = PATTERN (p);
>          if (GET_CODE (pat) != SEQUENCE)
> -          verify_insn_sharing (p);
> +          verify_insn_sharing_and_pseudo_references (p);
>          else
>            for (int i = 0; i < XVECLEN (pat, 0); i++)
>              {
>                rtx insn = XVECEXP (pat, 0, i);
>                if (INSN_P (insn))
> -                verify_insn_sharing (insn);
> +                verify_insn_sharing_and_pseudo_references (insn);
>              }
>        }
>  
> -- 
> 2.45.2
> 
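
The idea behind the check in the patch can be modelled outside of GCC.  The
sketch below is a toy, not GCC code: Mode, RegRef, FIRST_PSEUDO and
first_mismatched_pseudo are hypothetical stand-ins for machine modes, REG
rtxes, the hard/pseudo register boundary and the walk over PATTERN (insn).
It compares only register number and mode, whereas the patch relies on
rtx_equal_p against regno_reg_rtx.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy stand-ins; none of these are GCC types.
enum class Mode { SI, DI };
struct RegRef { int regno; Mode mode; };

// Hypothetical boundary: numbers below it model hard registers.
constexpr int FIRST_PSEUDO = 64;

// Return the number of the first pseudo referenced in a mode other than the
// one recorded for it in DEFS, or -1 if every reference agrees.
int
first_mismatched_pseudo (const std::unordered_map<int, Mode> &defs,
                         const std::vector<RegRef> &refs)
{
  for (const RegRef &ref : refs)
    {
      // Hard registers may be referenced in different modes.
      if (ref.regno < FIRST_PSEUDO)
        continue;
      auto def = defs.find (ref.regno);
      if (def != defs.end () && def->second != ref.mode)
        return ref.regno;
    }
  return -1;
}

// Small self-check: pseudo 100 is defined in DImode.
void
check_pseudo_model ()
{
  std::unordered_map<int, Mode> defs = { { 100, Mode::DI } };
  std::vector<RegRef> good = { RegRef{ 100, Mode::DI }, RegRef{ 4, Mode::SI } };
  std::vector<RegRef> bad = { RegRef{ 100, Mode::SI } };
  assert (first_mismatched_pseudo (defs, good) == -1);
  assert (first_mismatched_pseudo (defs, bad) == 100);
}
```

In the toy, register 4 is skipped because it falls below the hypothetical
hard-register boundary, mirroring the HARD_REGISTER_NUM_P early exit in the
patch.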



Re: Late combine & mode switch

2024-09-12 Thread Richard Sandiford via Gcc
Sorry for the slow response.

Xi Ruoyao  writes:
> Hi Richard,
>
> When I hack the LoongArch backend I notice something like
>
> slli.d $r4, $r4, 2
> add.w $r4, $r4, $r5
>
> Or
>
> (set (reg:DI 4) (ashift:DI (reg:DI 4) (const_int 2)))
> (set (reg:DI 4)
>      (sign_extend:DI (add:SI (reg:SI 4) (reg:SI 5))))
>
> can appear after split.  On LoongArch it can be done via an alsl.w
> instruction, so I attempted to combine them in late combine with:
>
> (define_insn
>   [(set (match_operand:DI 0 "register_operand" "=r")
>         (sign_extend:DI
>           (add:SI
>             (subreg:SI
>               (ashift:DI (match_operand:DI 1 "register_operand" "r")
>                          (match_operand:SI 2 "const_immalsl_operand" ""))
>               0)
>             (match_operand:SI 3 "register_operand" "r"))))]
>   "TARGET_64BIT"
>   "alsl.w\t%0,%1,%3,%2")
>
> But this does not work and I get "RTL substitution failed" with
> -fdump-rtl-late_combine2-details.
>
> I want to open an RFE in Bugzilla.  But before that I'm wondering: maybe
> I'm just too stupid to figure out the correct way for this?

At the moment, insn_propagation::apply_to_rvalue_1 chickens out of
a change in hard register mode that would involve an explicit subreg.
In this particular case, I'd hope that the subreg:SI would be pushed
into the ashift to give an ashift:SI of a reg:SI, but it seems that
that isn't happening for some reason.  It probably has something to do
with WORD_REGISTER_OPERATIONS (which LoongArch defines, but aarch64 doesn't).

That answer only applies to some codes though.  Shifts right would still
hit the problem you mention.  I think it'd be ok to try to relax the
condition if a port needs it -- the current code is (supposed to be)
deliberately conservative.

Thanks,
Richard
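
Why pushing the subreg into the shift is fine for left shifts but not for
right shifts can be checked with plain integer arithmetic.  The following is
a minimal sketch, assuming uint64_t and uint32_t as stand-ins for DImode and
SImode and a little-endian lowpart; all function names are hypothetical and
nothing here is GCC code.

```cpp
#include <cassert>
#include <cstdint>

// Low 32 bits of a 64-bit value; models (subreg:SI (reg:DI x) 0).
std::uint32_t
lowpart (std::uint64_t x)
{
  return static_cast<std::uint32_t> (x);
}

// Models (subreg:SI (ashift:DI x n) 0).
std::uint32_t
shift_then_truncate (std::uint64_t x, unsigned n)
{
  return lowpart (x << n);
}

// Models (ashift:SI (subreg:SI x 0) n); N must be below 32.
std::uint32_t
truncate_then_shift (std::uint64_t x, unsigned n)
{
  return lowpart (x) << n;
}

// The same two orders for a logical right shift.
std::uint32_t
rshift_then_truncate (std::uint64_t x, unsigned n)
{
  return lowpart (x >> n);
}

std::uint32_t
truncate_then_rshift (std::uint64_t x, unsigned n)
{
  return lowpart (x) >> n;
}

void
check_shift_model ()
{
  // Bit 32 is set, so the upper half matters for right shifts.
  const std::uint64_t x = 0x0000000100000004ULL;
  // Left shift: the low 32 bits of the result depend only on the low
  // 32 bits of the input, so both orders agree.
  assert (shift_then_truncate (x, 2) == truncate_then_shift (x, 2));
  // Right shift: bits from the upper half move into the low half, so
  // truncating first loses them.
  assert (rshift_then_truncate (x, 2) != truncate_then_rshift (x, 2));
}
```

For shift counts below 32 the left-shift identity holds for every input,
while the right-shift case differs whenever the shifted-in upper bits are
nonzero.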



Re: Proposed new pass to optimise mode register assignments

2024-09-12 Thread Richard Sandiford via Gcc
Jeff Law  writes:
> On 9/7/24 1:09 AM, Richard Biener wrote:
>> 
>> 
>>> On 06.09.2024 at 17:38, Andrew Carlotti wrote:
>>>
>>> Hi,
>>>
>>> I'm working on optimising assignments to the AArch64 Floating-point Mode
>>> Register (FPMR), as part of our FP8 enablement work.  Claudio has already
>>> implemented FPMR as a hard register, with the intention that FP8 intrinsic
>>> functions will compile to a combination of an fpmr register set, followed
>>> by an FP8 operation that takes fpmr as an input operand.
>>>
>>> It would clearly be inefficient to retain an explicit FPMR assignment prior
>>> to each FP8 instruction (especially in the common case where every
>>> assignment uses the same FPMR value).  I think the best way to optimise
>>> this would be to implement a new pass that can optimise assignments to
>>> individual hard registers.
>>>
>>> There are a number of existing passes that do similar optimisations, but
>>> which I believe are unsuitable for this scenario for various reasons.  For
>>> example:
>>>
>>> - cse1 can already optimise FPMR assignments within an extended basic block,
>>>   but can't handle broader optimisations.
>>> - pre (in gcse.c) doesn't work with assigning constant values, which would
>>>   miss many potential usages.  It also has limits on how far code can be
>>>   moved, based around ideas of register pressure that don't apply to the
>>>   context of a single hard register that shouldn't be used by the register
>>>   allocator for anything else.  Additionally, it doesn't run at -Os.
>>> - hoist (also using gcse.c) only handles constant values, and only runs when
>>>   optimising for size.  It also has the rest of the issues that pre does.
>>> - mode_sw only handles a small finite set of modes.  The mode requirements
>>>   are determined solely by the instructions that require the specific mode,
>>>   so mode switches don't depend on the output of previous instructions.
>>>
>>>
>>> My intention would be for the new pass to reuse ideas, and hopefully some of
>>> the existing code, from the mode-switching and gcse passes.  In particular,
>>> gcse.c (or its dependencies) has code that could identify when values
>>> assigned to the FPMR are known to be the same (although we may not need the
>>> full CSE capabilities of gcse.c), and mode-switching.cc knows how to
>>> globally optimise mode assignments (and unlike gcse.c, doesn't use cautious
>>> heuristics to avoid excessively increasing register pressure).
>>>
>>> Initially the new pass would only apply to the AArch64 FPMR register, but in
>>> future it could also be used for other hard registers with similar
>>> properties.
>>>
>>> Does anyone have any comments on this approach, before I start writing any
>>> code?
>> 
>> Can you explain in more detail why the mode-switching pass
>> infrastructure isn’t a good fit?  ISTR it already is customizable via
>> target hooks.
>
> Agreed.  Mode switching seems to be the right pass to look at.
>
> It probably is worth pointing out that mode switching is LCM based and 
> as such never speculates.  Given the potential cost of a mode switch, 
> failure to speculate may be a notable limitation (though the same would 
> apply to the ideas Andrew floated above).
>
> This has recently come up in the RISC-V space due to needing VXRM
> assignments so that we can utilize the vaaddu add-with-averaging
> instructions.  Placement of VXRM mode switches looks optimal from an
> LCM standpoint, but speculation can measurably improve performance.  It
> was something like 2% on the BPI for x264.  The k1/m1 chip in the BPI is
> almost certainly flushing its pipelines on the VXRM assignment.

Ah yeah, good point.  I expect speculation would be best for FPMR as well.
I imagine most use cases will be well-structured in practice, but for
those that aren't...

> I've got a hack here that I'll submit upstream at some point.  Just not 
> at the top of my list yet -- especially now that our uarch has been 
> fixed to not flush its pipelines at VXRM assignments ;-)

Is that handled by mode-switching, or is it a separate thing?

Richard
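
To make the quoted motivation concrete, here is a minimal sketch of the
simplest case only: dropping an assignment that stores the value the mode
register already holds within straight-line code.  The proposal says cse1
already covers this within an extended basic block; the sketch does not
attempt the global placement discussed in the thread.  Op and
drop_redundant_sets are hypothetical names, and the model is a toy rather
than GCC code.

```cpp
#include <cassert>
#include <vector>

// Toy instruction: either sets the mode register to VALUE or uses it.
struct Op { bool is_set; int value; };

// Drop sets that store the value the register is already known to hold.
std::vector<Op>
drop_redundant_sets (const std::vector<Op> &block)
{
  std::vector<Op> out;
  bool known = false;   // Nothing is known at block entry.
  int current = 0;
  for (const Op &op : block)
    {
      if (op.is_set)
        {
          if (known && current == op.value)
            continue;   // Same value again: the set is redundant.
          known = true;
          current = op.value;
        }
      out.push_back (op);
    }
  return out;
}

void
check_redundant_set_model ()
{
  // set 1, use, set 1, use, set 2, use
  std::vector<Op> block = { Op{ true, 1 }, Op{ false, 0 }, Op{ true, 1 },
                            Op{ false, 0 }, Op{ true, 2 }, Op{ false, 0 } };
  std::vector<Op> out = drop_redundant_sets (block);
  // Only the second "set 1" disappears.
  assert (out.size () == 5);
}
```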


Re: Proposed new pass to optimise mode register assignments

2024-09-12 Thread Jeff Law via Gcc




On 9/12/24 8:22 AM, Richard Sandiford wrote:
>> This has recently come up in the RISC-V space due to needing VXRM
>> assignments so that we can utilize the vaaddu add-with-averaging
>> instructions.  Placement of VXRM mode switches looks optimal from an
>> LCM standpoint, but speculation can measurably improve performance.  It
>> was something like 2% on the BPI for x264.  The k1/m1 chip in the BPI is
>> almost certainly flushing its pipelines on the VXRM assignment.
>
> Ah yeah, good point.  I expect speculation would be best for FPMR as well.
> I imagine most use cases will be well-structured in practice, but for
> those that aren't...

It's certainly worth investigating.

>> I've got a hack here that I'll submit upstream at some point.  Just not
>> at the top of my list yet -- especially now that our uarch has been
>> fixed to not flush its pipelines at VXRM assignments ;-)
>
> Is that handled by mode-switching, or is it a separate thing?

I abused one of the existing mode switching hooks.  Essentially I scan
the function once to look for all the possible modes of vxrm.  If there
is precisely one mode needed, then my hack pretends that mode is needed
on the first insn of the function.  Then we let the standard mode
switching algorithm run.


For the cases that matter (and there are very very few with vxrm), that 
gets us the desired speculation.   While I could certainly construct a 
testcase where the speculation was unprofitable, I doubt it ever happens 
in practice.


jeff
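
The scan described above can be sketched independently of GCC's
mode-switching hooks.  This is a toy model under stated assumptions: Insn,
NO_MODE and entry_mode_for_speculation are hypothetical names, modes are
plain integers, and the real hack lives in a target hook that is not shown
in this thread.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical marker for an insn that does not care about the mode.
constexpr int NO_MODE = -1;

// Toy insn: records only which mode, if any, it needs.
struct Insn { int needed_mode; };

// Scan the function once.  If exactly one mode is needed anywhere, return
// it so that it can be reported as needed on the first insn; otherwise
// return NO_MODE and leave placement to the usual algorithm.
int
entry_mode_for_speculation (const std::vector<Insn> &insns)
{
  std::set<int> modes;
  for (const Insn &insn : insns)
    if (insn.needed_mode != NO_MODE)
      modes.insert (insn.needed_mode);
  return modes.size () == 1 ? *modes.begin () : NO_MODE;
}

void
check_entry_mode_model ()
{
  std::vector<Insn> one_mode = { Insn{ NO_MODE }, Insn{ 2 }, Insn{ 2 } };
  std::vector<Insn> two_modes = { Insn{ 0 }, Insn{ NO_MODE }, Insn{ 2 } };
  assert (entry_mode_for_speculation (one_mode) == 2);
  assert (entry_mode_for_speculation (two_modes) == NO_MODE);
}
```

Reporting the single mode at function entry is what yields the speculation:
the switch is emitted once up front even on paths that never reach an insn
needing it.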


[CAULDRON] Topics for the Toolchain and Linux kernel BoF

2024-09-12 Thread Jose E. Marchesi via Gcc


Hello people!

This year we will be having a kernel BoF at Cauldron.  It is scheduled
for Saturday from 15:30 to 16:30.  There will be several kernel
maintainers and hackers in attendance, and the goal of the BoF is to
discuss and collect feedback about several toolchain-related issues that
are of current interest for the kernel.  The output of the discussions
and the feedback collected will then be used as a basis for further
discussions at the Linux Plumbers conference that will be held the next
week in Vienna.  The idea is to get kernel and toolchain hackers
together and advance on these topics.

Find below some of the topics we will be discussing.  Many of them are
relevant to GCC, and we ask you to consider attending if you are coming
to the Cauldron.  The list of topics is of course not closed, and you
are very welcome to bring your own, especially if your work would benefit
from feedback from the kernel hackers.

- LTO and inline asm symbols

  A lot of assembler statements reference C symbols, which need to be
  externally_visible and global for GCC LTO, otherwise they can end up in the
  wrong asm file and cause missing symbols.

  Goal of the discussion:

  Provide an assessment of the reported problem, and discuss the two
  alternatives already proposed in the bugzillas below: one ad-hoc solution
  based on parsing symbol references in inline asm strings, another is to
  allow top-level extended asm that can get input arguments.

  References:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107779
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41045

- "noreturn" and jump tables run-time hints

  The kernel side has expressed the desire to have the C compiler emit
  run-time hints marking functions that are not supposed to return, and
  also to provide annotations on jump tables.  This is for the benefit of
  objtool on arm64, see references below.

  Goal of the discussion:

  Collect and assess the requirements of these features, discuss their
  pertinence and the way they could best be implemented.  The outcome of the
  discussion will then be used to continue the discussion with the clang/llvm
  and kernel hackers at LPC.

  References:

  https://lore.kernel.org/linux-arm-kernel/yylmhuxtuanza...@hirez.programming.kicks-ass.net/

- Struct layout randomization (-frandomize-struct-layout) and debug info

  The GCC plugin hooks into the compiler in such a way that the emitted debug
  info doesn't match the resulting randomized structs.  It works in clang
  because it generates DWARF later in the compilation process.

  References:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84052
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116457

  Goal of the discussion:

  Determine how to best fix this in the plugin, or by using a different
  approach.  The outcome of the discussion will then be used to continue the
  discussion with the clang/llvm and kernel hackers at LPC.

- Userland stack unwinding from within the Linux kernel

  There are reasons for wanting to unwind both userland and kernel
  stacks from within the kernel.  Currently the kernel can unwind kernel
  stacks based on ORC (which is reverse-engineered from kernel compiled
  objects by objtool) and userland stacks provided stack frame pointers
  are present.  SFrame is a format similar to ORC, but general enough to
  be used in userspace, and there is an on-going effort to introduce an
  SFrame-based unwinder in the kernel.  This will require some glibc
  support as well.

  References:

  First prototype (V1) from Josh Poimboeuf:
  https://lkml.kernel.org/lkml/cover.1699487758.git.jpoim...@kernel.org/

- Linking BTF in kernel builds

  At the moment the BTF used on the kernel side is not directly
  generated by compilers, but it is instead translated from DWARF by
  pahole:

  vmlinux     DWARF         BTF                 C
  module1.ko  --->  pahole  --->  vmlinux.btf  -->  vmlinux.h
  module2.ko

  On the BPF program side, however, the BTF is generated directly by the
  compiler.  It has been suggested to get DWARF out of the picture, by
  having the compiler generate BTF also in the kernel build, and
  having the BTF linked and deduplicated by the link editor.

  Note that GCC already supports emitting BTF for targets other than
  BPF.  LLVM currently restricts BTF to the BPF target, but it could be
  easily adapted to do the same according to the LLVM BPF backend
  maintainers.

  Goal of the discussion:

  Provide an assessment of the proposed approach considering its
  advantages and possible disadvantages, and kick off the technical
  discussion on the BTF extensions needed, which will then be
  continued at the LPC Toolchains Track.

- Potential inter-language LTO issues in kernel

  Rust code is (as of now provisionally) being added to the Linux kernel.  On
  the other hand the kernel build can be configured to enable LTO when linking
  vmlinux.  Doing LTO involving both C and Rust compiled code allegedly works
  in clan

Sourceware Open Office Friday and Cauldron Monday

2024-09-12 Thread Mark Wielaard
We'll have our regular Sourceware Open Office this Friday Sep 13,
16:00 UTC, using #overseers on irc.libera.chat.

To get the right time in your local timezone:
$ date -d "Fri Sep 13 16:00 UTC 2024"

Then at the Cauldron https://gcc.gnu.org/wiki/cauldron2024 in Prague
on Monday at 11:00 local time we'll also have an in person session.

  BoF: Sourceware infrastructure tips & tricks

  Speaker(s): Sourceware Project Leadership Committee,
  Elena Zannoni, Mark J. Wielaard, Ian Kelling

  Sourceware has provided the infrastructure for the core toolchain
  and developer tools for more than 25 years. Over the last couple of
  years it has transformed from a purely volunteer organization into a
  professional one with an eight-person Project Leadership
  Committee, monthly open office hours, multiple hardware services
  partners, expanded services, the Software Freedom Conservancy as
  fiscal sponsor and a more diverse funding model that allows us to
  enter into contracts with paid contractors or staff when
  appropriate.

  The Sourceware services are loosely coupled, but developers become
  most productive when they combine them. So let's exchange tips and
  tricks on how using bugzilla and cgit, b4 and public-inbox, git-pw
  and patchwork, the snapshot builders and manual generation, wikis,
  buildbot and try-bots, ci-bots, full-builds and the bunsen
  testresults database can make you most productive.

  Let's also discuss the recent "Cybersecurity" regulations, how
  Sourceware prepared and what policies projects could adopt to
  improve their secure software development framework.

  https://sourceware.org/mission.html#services
  https://sourceware.org/sourceware-25-roadmap.html
  https://sourceware.org/sourceware-security-vision.html

Overview of suggested discussion topics:

- Sourceware Project Leadership Committee (PLC) Members
  - History and Roadmaps
  - Software Freedom Conservancy (SFC)
  - Communication and contacts
    - overseers, project admins
    - mailinglists, bugzilla,
      technical roadmaps, quarterly updates, open office hours
  - (Hardware) sponsors
    - Funds and how to spend it (see security-vision below)

- Mailing lists Mailman
  - dmarc/dkim, (no) From-rewriting
    (and SRS and SPF and VERP - for maximal email hygiene)
  - Moderators and spam
  - (python2) what should an upgrade look like?

- public-inbox
  - imap, nntp, rss, git clone inbox.sourceware.org 
  - b4 config and usage

- Git
  - cgit everywhere?
  - project configs
  - adacore hooks
  - signed commits (also see cybersecurity)

- Bugzilla
  - Account creation
  - Two is better than one?
  - Upgrading and local patches

- Wiki
  - MoinMoin

- builder.sourceware.org
  - hardware overview
    i386, x86_64, ppc64le, s390x, ppc64, armhf, arm64, riscv
  - kinds of builders
    - pre-commit, try builders
    - ci-builders finding regressions
    - full-builders, storing test results (see bunsen below)
    - autoregen (binutils-gdb, gcc)
  - config grew a lot last year
    help needed simplifying

- bunsen
  - indexes repository of raw testsuite log files (sqlite + git)
  - understands dejagnu, glibc, automake, autoconf styles
  - use toolkit locally or centralized on sourceware
  - live web interface at https://builder.sourceware.org/testruns/

- patchwork.sourceware.org
  - git pw setup
  - patchwork plus CI/CD
    - Let's use those buildbot workers too

- snapshots.sourceware.org
  new isolated machine/vm/container
  takes over cron jobs
  triggered by buildbot
  can also trigger manuals, code coverage, api docs, etc.
  valgrind, libabigail, gnupoke, glibc, gdb, elfutils, dwarfstd, binutils

- Cybersecurity
  - US Improving the Nation's Cybersecurity
    Executive Order 14028
  - EU Cyber Resilience Act (EU CRA)
  - Secure Software Development Framework (SSDF, NIST SP 800-218)
  - What can Sourceware do?
    - signed-commit census report
    - Sourceware Security Vision
      https://sourceware.org/sourceware-security-vision.html
  - Practical steps for projects and individual contributors

If time permits...

- Experiment: sourcehut
  https://sr.ht/~sourceware/
  A more webby git workflow alternative
  git send-email without the email

- non-Sourceware, but useful, services
  - BBB server (SFC)
  - mattermost (OSUOSL)
  - irc (libera.chat, oftc)
  - gerrit server
  - Software Heritage and archive.org mirrors


gcc-12-20240912 is now available

2024-09-12 Thread GCC Administrator via Gcc
Snapshot gcc-12-20240912 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/12-20240912/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 12 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-12 revision 682cc3f90d0ba42123ef6bb838f28039524e3ea8

You'll find:

 gcc-12-20240912.tar.xz   Complete GCC

  SHA256=93d2b7dcf19a936e521b5751064566927150be1fd62fa53d5a8b4f64b394682a
  SHA1=07a1f3aba2d3cbe8f13edb804adda4d4bc3331a8

Diffs from 12-20240905 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-12
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.