On Fri, Feb 3, 2023 at 10:16 PM Michael Meissner via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> These patches were originally posted on November 10th.  Segher has asked that 
> I
> repost them.  These patches are somewhat changed since the original posting to
> address some of the comments.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605581.html
>
> In the first patch (adding -mcpu=future), I have taken out the code of making
> -mtune=future act as -mtune=power10.  Instead I went through all of the places
> that look at the tuning (mostly in power10.md and rs6000.cc), and added future
> as an option.  Obviously at a later time, we will provide a separate tuning
> file for future (or whatever the new name will be if the instructions are 
> added
> officially).  But for now, it will suffice.
>
> In patch #3, I fixed the opcode for clearing a dense math register that Peter
> had noticed.  I was using the name based on the existing clear instruction,
> instead of the new instruction.
>
> In patch #6, I fixed the code, relying on the changes for setting the 
> precision
> field to 16 bits.  Since that patch will not be able to go into GCC 13 at
> present, we might skip that support for now.  The important thing for existing
> users of the MMA code is the support for accumulators being in the separate
> dense math registers rather than overlapping does need to go in, and we can
> probably delay the 1,024 bit register support, or implement in a different
> fashion.
>
> In the insn names, I tried to switch to using _vsx instead of _fpr for the
> existing MMA support instructions.  I also tried to clear up the comments to
> specify ISA 3.1 instead of power10 when talking about the existing MMA
> support.
>
> The following is from the original posting (slightly modified):
>
> This patch is very preliminary support for a potential new feature to the
> PowerPC that extends the current power10 MMA architecture.  This feature may 
> or
> may not be present in any specific future PowerPC processor.
>
> In the current MMA subsystem for Power10, there are 8 512-bit accumulator
> registers.  These accumulators are each tied to sets of 4 FPR registers.  When
> you issue a prime instruction, it makes sure the accumulator is a copy of the 
> 4
> FPR registers the accumulator is tied to.  When you issue a deprime
> instruction, it makes sure that the accumulator data content is logically
> copied to the matching FPR register.
>
> In the potential dense math system, the accumulators are moved to separate
> registers called dense math registers (DM registers or DMR).  The DMRs are 
> then
> extended to 1,024 bits and new instructions will be added to deal with all
> 1,024 bits of the DMRs.
>
> If you take existing MMA code, it will work as long as you don't do anything
> with accumulators, and you follow the rules in the ISA 3.1 documentation for
> using the MMA subsystem.
>
> These patches add support for the 512-bit accumulators within the dense math
> system, and for allocation of the 1,024-bit DMRs.  At this time, no additional
> built-in functions will be done to support any dense math features other than
> doing data movement between the DMRs and the VSX registers.  Before we can 
> look
> at adding any new dense math support other than data movement, we need the GCC
> compiler to be able to allocate and use these DMRs.
>
> There are 8 patches in this patch set:
>
> 1) The first patch just adds -mcpu=future as an option to add new support.
> This is similar to the -mcpu=future that we did before power10 was announced.
>
> 2) The second patch enables GCC to use the load and store vector pair
> instructions to optimize memory copy operations in the compiler.  For power10,
> we needed to just stay with normal vector load/stores for memory copy
> operations.
>
> 3) The third patch enables 512-bit accumulators store in DMRs.  This patch
> enables the register allocation, but it does not move the existing MMA to use
> these registers.
>
> 4) The fourth patch switches the MMA subsystem to use 512-bit accumulators
> within DMRs if you use -mcpu=future.
>
> 5) The fifth patch switches the names of the MMA instructions to use the dense
> math equivalent name if -mcpu=future.
>
> 6) The sixth patch enables using the full 1,024-bit DMRs.  Right now, all you
> can do with DMRs is move a VSX register to a DMR register, and to move a DMR
> register to a VSX register.  [As I mentioned above, at the moment, this patch
> is problematical as is]
>
> 7) The seventh patch is not DMR related.  It adds support for variants of the
> load/store vector with length instruction that may be added in future PowerPC
> processors.  These variants eliminate having to shift the byte length left by
> 56 bits.
>
> 8) The eighth patch is also not DMR related.  It adds support for a saturating
> subtract operation that may be added to future PowerPC processors.
>
> In terms of changes, we now use the wD constraint for accumulators.  If you
> compile with -mcpu=power10, the wD constraint will match the equivalent VSX
> register (0..31) that overlaps with the accumulator.  If you compile with
> -mcpu=future, the wD constraint will match the DMR register and not the FPR
> register.
>
> This patch also modifies the print_operand %A output modifier to print out DMR
> register numbers if -mcpu=future, and continue to print out the FPR register
> number divided by 4 for -mcpu=power10.
>
> In general, if you only use the built-in functions, things work between the 
> two
> systems.  If you use extended asm, you will likely need to modify the code.
> Going forward, hopefully if you modify your code to use the wD constraint and
> %A output modifier, you can write code that switches more easily between the
> two systems.
>
> Again, these are preliminary patches for a potential future machine.  Things
> will likely change in terms of implementation and usage over time.

May I ask to consider delaying this to stage1 exactly because of this
last reason?

Richard.

>
> --
> Michael Meissner, IBM
> PO Box 98, Ayer, Massachusetts, USA, 01432
> email: meiss...@linux.ibm.com

Reply via email to