changing subject, for reference / background:

* Paul Mackerras is working on an experimental Microwatt branch to add VSX
 https://github.com/paulusmack/microwatt/blob/vecvsx/decode1.vhdl
 he was experimenting to see what was needed to get Fedora booting.  the
internal design is a Finite State Machine, taking multiple clocks per
instruction (due to internal 64 bit pathways).

* neither A2I nor A2O has VSX, and adding it to these high-performance
gate-level designs is estimated at around 2 years of work
https://github.com/openpower-cores/a2o
https://github.com/openpower-cores/a2i

* LibreSOC: we are just not going to add VSX.  the development cost is far
too high, the performance nowhere near that of Vectors, the software
complexity far too high, and L1 cache usage is compromised.
http://git.libre-soc.org

all of these designs - all four - have internal 64 bit pathways.  OpenPOWER
instruction decoding is complex even without SIMD (4,000 gates) and adding
VSX multiplies that by three or four.  that's enough gates to do a decent
embedded RISC core in any other RISC ISA.

IBM had years in which to incrementally extend SIMD operations.  Jeffrey in
another post kindly outlines the progression.

now, at POWER10 with 18 billion transistors, the barrier to entry is so
high that if someone doesn't put their foot down and say "no" to SIMD there
isn't going to *be* any new OpenPOWER hardware other than that from IBM
[not, at least, capable of running standard ppc64le distros that is]

we seriously considered doing an entirely new ppc64le-eabi-1.5/1.9 Debian
Distro port at one point, going *backwards* to the time when SIMD was not
mandatory but doing LE rather than BE, but the risk of it being viewed in
the same way as "Raspbian" is too great.


On Tuesday, March 2, 2021, Riccardo Mottola <riccardo.mott...@libero.it>
wrote:

>
> Emulation at kernel level is painfully slow,

seriously, i kid you not, it is infinitely better than trying to implement
VSX in hardware.  we would spend so long implementing it that it would
delay LibreSOC *beyond* the point where money from NLnet was available,
jeopardising the entire project in the process.

given a choice between "painfully slow right now but fixable in software
later" and "completely destroying any chance of completing and delivering
even any hardware at all" it's hardly a choice :)

the Cray-style Vectorisation being added will smoke SIMD in the long run,
once the ABIs and compilers are sorted.
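to illustrate the difference (a minimal sketch in plain C, not LibreSOC's actual Vector instructions): fixed-width SIMD hard-wires the element count into the loop body and needs a scalar tail loop to mop up, whereas a Cray-style loop asks the hardware, on every pass, how many elements it may process.  `setvl` and `MAXVL` below are hypothetical stand-ins for a hardware vector-length instruction and limit, and each inner loop merely models what would be a single instruction.

```c
#include <stddef.h>

/* hypothetical maximum vector length: a stand-in for whatever the
   hardware would report, not a real LibreSOC constant */
#define MAXVL 8

/* fixed-width SIMD style: the body is hard-wired to 4 lanes, so a
   separate scalar loop has to handle the leftover elements */
void add_simd_style(const int *a, const int *b, int *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* models one 4-lane SIMD instruction */
        for (size_t lane = 0; lane < 4; lane++)
            out[i + lane] = a[i + lane] + b[i + lane];
    }
    for (; i < n; i++)            /* scalar tail cleanup */
        out[i] = a[i] + b[i];
}

/* hypothetical setvl: the hardware grants min(requested, MAXVL) */
static size_t setvl(size_t requested)
{
    return requested < MAXVL ? requested : MAXVL;
}

/* Cray-style: request the remaining count on each pass; there is no
   tail loop and no vector width baked into the loop structure */
void add_vector_style(const int *a, const int *b, int *out, size_t n)
{
    size_t i = 0;
    while (i < n) {
        size_t vl = setvl(n - i);
        /* models one vector instruction operating on vl elements */
        for (size_t e = 0; e < vl; e++)
            out[i + e] = a[i + e] + b[i + e];
        i += vl;
    }
}
```

both functions give the same result for any n.  the point of the second form is that the same code would run unchanged on hardware with a smaller or larger MAXVL, which is what the compilers and ABIs have to be taught to exploit.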


> yes enabling runtime libraries could be done, requires extensive work in
> upstream code.

this is a better situation than an entire new distro port.  we may have to
have one anyway: all timescale estimates which start from defining a new
triplet and going from there are around 3-5 years.

if a new EABI has to be defined and spec'd as well it's even longer.

> An easier version is the path that TenFourFox and other follow: just
> provide two binaries, which is what I intend to do with ArcticFox.
> However if Debian wants to come up with the pain of two (or more?) FF
> packages

deep breath, this may be a sane medium term solution.  long term, SIMD
needs to be separated out behind dynamically loadable libraries (and HWCAPS
in glibc, i.e. Debian's libc6) rather than assumed to be 100% guaranteed
available.

LibreSOC in particular needs to appear to go *backwards* in terms of
performance before it can go forwards, once the Cray-style Vectorisation
hits gcc properly.

then other hardware can also do variants of the same libraries (including
POWER9/10).
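a minimal sketch of the "detect at runtime, don't assume" approach, in C for ppc64le Linux: the library queries the kernel's AT_HWCAP bits via glibc's getauxval() and selects a VSX path only if the VSX bit is set, otherwise falling back to plain scalar code.  `dot_vsx` and the `HAVE_VSX_IMPL` build macro are hypothetical (think of a separately compiled -mvsx object), and the PPC_FEATURE_HAS_VSX value is assumed to match the Linux powerpc <asm/cputable.h> header; check it against your own headers.  glibc's HWCAPS mechanism does the equivalent selection at the level of whole shared libraries rather than per function.

```c
#include <stddef.h>
#if defined(__linux__) && defined(__powerpc64__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (glibc >= 2.16) */
#endif

#ifndef PPC_FEATURE_HAS_VSX
#define PPC_FEATURE_HAS_VSX 0x00000080UL  /* assumed <asm/cputable.h> value */
#endif

/* 1 only if the running kernel reports VSX: never assumed from the ISA level */
int cpu_has_vsx(void)
{
#if defined(__linux__) && defined(__powerpc64__)
    return (getauxval(AT_HWCAP) & PPC_FEATURE_HAS_VSX) != 0;
#else
    return 0;   /* not ppc64 Linux: always take the portable path */
#endif
}

/* portable baseline: scalar code that runs on any Power ISA core */
static double dot_scalar(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

typedef double (*dot_fn)(const double *, const double *, size_t);

#ifdef HAVE_VSX_IMPL
/* hypothetical: built in a separate file with -mvsx */
double dot_vsx(const double *x, const double *y, size_t n);
#endif

/* pick an implementation based on what the CPU actually reports */
static dot_fn resolve_dot(void)
{
#ifdef HAVE_VSX_IMPL
    if (cpu_has_vsx())
        return dot_vsx;
#endif
    return dot_scalar;
}

/* public entry point: resolves once on first call, then reuses the pointer
   (not thread-safe as written) */
double dot(const double *x, const double *y, size_t n)
{
    static dot_fn impl;
    if (!impl)
        impl = resolve_dot();
    return impl(x, y, n);
}
```

callers only ever use dot(); on a core without VSX (A2I/O, Microwatt, LibreSOC) the scalar path is taken instead of hitting an illegal-instruction trap.  a real library would resolve the pointer with pthread_once or a GNU ifunc resolver rather than the unsynchronised static shown here.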


On Tuesday, March 2, 2021, Jeffrey Walton <noloa...@gmail.com> wrote:

> Based on my experience with Botan and Crypto++... VSX is available
> with POWER7 and -mvsx compiler option. VSX is part of POWER8 core and
> does not need a compiler option.

as demonstrated by A2O/I in particular there is unfortunately a problem
with referring to IBM proprietary processor names as the canonical
definition of available features: A2I/O are Power v2.06/7 compliant but
*still do not have VSX*.

this is because the feature is optional right up until you get to the
AIX Compliancy Subset.  see the first few pages of v3.0C or v3.1, copies
easily available at http://ftp.libre-soc.org

note that it really does say "SIMD is optional" for Linux/UNIX subset,
Floating Point Embedded subset and Fixed Point subset.

many people misinterpret / misread that document: i did myself, for
several months.

this conflation is caused by the fact that only IBM processors, which
happen to go by the proprietary names POWER7-10, are commonly available.
NXP QorIQ, which is not so well known, is v2.06B compliant, is used in the
PowerPC Notebook, and is going EOL.


> VSX is a lot like Intel tick/tock features. VSX allows 64-bit vector
> loads and stores, but it does not provide operations on 64-bit
> vectors. You have to use POWER8 to get the 64-bit add (vaddudm),
> subtract (vsubudm), etc.

this illustrates very nicely the progression over time (many years) as the
team inside IBM ramped up the capabilities.
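the same progression can be expressed as feature bits rather than processor names, which also covers the A2I/O case where an ISA level is present but VSX is not.  a minimal sketch, assuming the Linux powerpc AT_HWCAP / AT_HWCAP2 bit values from <asm/cputable.h> (reproduced below; verify them against your headers).  note that ARCH_2_07 on its own implies no SIMD at all.

```c
/* assumed Linux powerpc hwcap bit values (see <asm/cputable.h>) */
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000UL
#endif
#ifndef PPC_FEATURE_HAS_VSX
#define PPC_FEATURE_HAS_VSX     0x00000080UL
#endif
#ifndef PPC_FEATURE2_ARCH_2_07
#define PPC_FEATURE2_ARCH_2_07  0x80000000UL
#endif

enum simd_level {
    SIMD_NONE,      /* scalar only, e.g. a 2.07 core without VSX */
    SIMD_ALTIVEC,   /* VMX/AltiVec but no VSX */
    SIMD_VSX,       /* VSX (2.06): 64-bit element loads/stores, no vaddudm */
    SIMD_VSX_207    /* VSX plus 2.07 doubleword integer ops (vaddudm, vsubudm) */
};

/* classify from what the kernel reports, never from a processor name */
enum simd_level classify_simd(unsigned long hwcap, unsigned long hwcap2)
{
    if (!(hwcap & PPC_FEATURE_HAS_VSX))
        return (hwcap & PPC_FEATURE_HAS_ALTIVEC) ? SIMD_ALTIVEC : SIMD_NONE;
    if (hwcap2 & PPC_FEATURE2_ARCH_2_07)
        return SIMD_VSX_207;
    return SIMD_VSX;
}
```

on ppc64le Linux the two arguments would come from getauxval(AT_HWCAP) and getauxval(AT_HWCAP2); taking them as parameters keeps the function pure and lets it be exercised on any host.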

we can see, very unfortunately, that they too were seduced by what SIMD
promises it can do.  as they progressed far beyond what other OpenPOWER
Foundation members were able to handle (NXP QorIQ for example: NXP has made
it clear they have no intention of implementing v3.0), IBM became the only
implementor.

this is a very delicate situation with not very good options.  ideas
appreciated.

l.


-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
