Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)
On Sun, Feb 28, 2021 at 11:52:12PM +, Luke Kenneth Casson Leighton wrote:
> On Monday, March 1, 2021, Riccardo Mottola wrote:
>
> > A quick solution would be to have this configure as a convenience, but have
> > a way to pass an --enable-altivec and --disable-altivec (or with/without?)
> > parameter to configure.
>
> EABI v2.0 rather unfortunately, despite it being optional in the OpenPOWER
> Compliancy Suite, made SIMD mandatory.
>
> EABI v1.5 does not require SIMD.
>
> the problem is that the assumption "#ifdef POWER9" is bleeding through to
> many code repositories.
>
> Tulio Magno Quites Machado Filho is currently working on glibc6 patches
> which reverse these erroneous assumptions, replacing them with "#ifdef VSX"
> thus allowing people to compile code that does not rely on SIMD.

Beware that VSX is not Altivec. Altivec was called VMX by IBM and
VSX is a superset of Altivec (IIRC).

G4 and G5 do not have VSX.

> unfortunately it is somewhat a lost cause because of the mistake made in
> EABI v2. modifying EABI v2 to make SIMD optional is no longer possible
> because it would break backwards compatibility, the only option being to
> create a new triplet, then an entire new distro port, and that is a 3 to 5
> year process.
>
> an alternative solution is to have a kernel-level emulator of SIMD
> instructions.
> https://bugs.libre-soc.org/show_bug.cgi?id=602

This is going to be hopelessly slow.

The point of SIMD is to process quickly vast amounts of data, the
overhead of every single emulated instruction is counted in hundreds
of cycles.

IMHO, the only solution is to:
a) only use SIMD in library code
b) compile 2 or 3 versions of libraries: no SIMD, VMX and/or VSX
c) put each library in a different directory
d) at run time, select the path to load the libraries from CPU
   capabilities

There is a precedent for this in the x86 world, where there were i386 and
i686 directories to support the PPro. It is still the case on the machines
where I have to install 32 bit libraries:

$ locate libx264.so
/usr/lib/i386-linux-gnu/i686/sse2/libx264.so.155
/usr/lib/i386-linux-gnu/libx264.so.155
/usr/lib/x86_64-linux-gnu/libx264.so.155

There are two 32 bit versions of libx264, one for old processors and one
for processors with sse2.

Regards,
Gabriel

> fascinatingly there is precedent for this in the form of sstep.c which
> triggers from illegal instruction trap and emulates some parts of the
> OpenPOWER ISA.
>
> l.
>
> --
> ---
> crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
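Step d) in the list above (pick the library directory from the CPU capabilities at run time) could look roughly like the following. This is only a sketch under stated assumptions: the directory names "altivec" and "vsx" and the functions simd_libdir() and simd_libpath() are hypothetical, not an existing distro layout or loader API. The two bit values are the ones the Linux kernel uses for PPC_FEATURE_HAS_ALTIVEC and PPC_FEATURE_HAS_VSX in AT_HWCAP on PowerPC; they are repeated locally so that the sketch also compiles on other hosts, where no SIMD is assumed.

```c
#include <assert.h>
#include <stdio.h>
#if defined(__powerpc__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (Linux/glibc) */
#endif

/* AT_HWCAP bits on PowerPC Linux, copied from the kernel's cputable.h. */
#define HWCAP_PPC_ALTIVEC 0x10000000UL  /* PPC_FEATURE_HAS_ALTIVEC */
#define HWCAP_PPC_VSX     0x00000080UL  /* PPC_FEATURE_HAS_VSX */

/* Map capability bits to a (hypothetical) library subdirectory.
 * VSX is checked first because it is the superset of VMX/AltiVec. */
const char *simd_libdir(unsigned long hwcap)
{
    if (hwcap & HWCAP_PPC_VSX)
        return "vsx";
    if (hwcap & HWCAP_PPC_ALTIVEC)
        return "altivec";
    return "";          /* baseline build without SIMD */
}

/* Write "<base>" or "<base>/<subdir>" for the running CPU into buf.
 * Returns what snprintf() returns. */
int simd_libpath(char *buf, size_t len, const char *base)
{
    assert(buf != NULL && base != NULL);
#if defined(__powerpc__)
    unsigned long hwcap = getauxval(AT_HWCAP);
#else
    unsigned long hwcap = 0;    /* the bits above only mean this on PowerPC */
#endif
    const char *sub = simd_libdir(hwcap);
    return snprintf(buf, len, "%s%s%s", base, *sub ? "/" : "", sub);
}
```

An application would then try the returned directory first and fall back to the base directory, which is essentially what the i686/sse2 layout shown above relies on the dynamic loader to do.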
Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)
On Mon, Mar 1, 2021 at 8:39 AM Gabriel Paubert wrote:

> Beware that VSX is not Altivec. Altivec was called VMX by IBM and
> VSX is a superset of Altivec (IIRC).
>
> G4 and G5 do not have VSX.

apologies i tend to lump these together.

> This is going to be hopelessly slow.

great!  i have absolutely no problem with that, at all.  the idea is
to give people access to something where due to the ongoing cascading
mistaken assumptions "nobody has any hardware except IBM POWER9 and
EABI 2.0 says VSX therefore #ifdef POWER9 --> enable VSX".

it's a stopgap measure that at least allows... _something_.  breathing
space whilst the OpenPOWER Foundation puts together a plan.

> The point of SIMD is to process quickly vast amounts of data,

that was its seductive intent.  the reality is very different,
poisoning L1 I-Cache through massive bloating of program size, and in
some cases actually causing such heavy internal bus contention between
instruction and data reads that all processing grinds to a halt.

https://www.sigarch.org/simd-instructions-considered-harmful/

> the overhead of every single emulated
> instruction is counted in hundreds of cycles.

> IMHO, the only solution is to:
> a) only use SIMD in library code
> b) compile 2 or 3 versions of libraries: no SIMD, VMX and/or VSX

this requires going backwards to EABI 1.5.  EABI 2.0 as currently
defined *makes SIMD mandatory*.

given that debian PPC64 is BE EABI 1.5 but PPC64LE is LE EABI 2.0 i
don't see how that's workable.

unless you create a new triplet PPC64-LE-using-EABI-1.5

also: multilib and it is being ripped out from distros.

> c) put each library in a different directory
> d) at run time, select the path to load the libraries from CPU
>    capabilities

this is multiarch i believe.  it requires, as i recall, a
syscall-level understanding of the two ABIs.  with ppc64 being BE and
ppc64le being LE this would require word-order swapping at the syscall
level.

l.
Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)
On Mon, Mar 01, 2021 at 12:22:22PM +, Luke Kenneth Casson Leighton wrote:
> On Mon, Mar 1, 2021 at 8:39 AM Gabriel Paubert wrote:
>
> > Beware that VSX is not Altivec. Altivec was called VMX by IBM and
> > VSX is a superset of Altivec (IIRC).
> >
> > G4 and G5 do not have VSX.
>
> apologies i tend to lump these together.
>
> > This is going to be hopelessly slow.
>
> great!  i have absolutely no problem with that, at all.  the idea is
> to give people access to something where due to the ongoing cascading
> mistaken assumptions "nobody has any hardware except IBM POWER9 and
> EABI 2.0 says VSX therefore #ifdef POWER9 --> enable VSX".
>
> it's a stopgap measure that at least allows... _something_.  breathing
> space whilst the OpenPOWER Foundation puts together a plan.
>
> > The point of SIMD is to process quickly vast amounts of data,
>
> that was its seductive intent.  the reality is very different,
> poisoning L1 I-Cache through massive bloating of program size, and in
> some cases actually causing such heavy internal bus contention between
> instruction and data reads that all processing grinds to a halt.
>
> https://www.sigarch.org/simd-instructions-considered-harmful/

This publication claims (and probably rightly) that vector instructions
are preferable to SIMD, but does not say at all that falling back to
purely scalar is better.

Also, PPC SIMD has seen fewer variations than x86, which started with
MMX (64bit), then SSE (128 bit registers, single precision only), SSE2
(finally able to get rid of the awful x87 stacked registers) and so many
extensions that I agree that it is impossible to track.

At least for PPC until now, it has been 128 bit registers, always.

> > the overhead of every single emulated
> > instruction is counted in hundreds of cycles.
>
> > IMHO, the only solution is to:
> > a) only use SIMD in library code
> > b) compile 2 or 3 versions of libraries: no SIMD, VMX and/or VSX
>
> this requires going backwards to EABI 1.5.  EABI 2.0 as currently
> defined *makes SIMD mandatory*.
>
> given that debian PPC64 is BE EABI 1.5 but PPC64LE is LE EABI 2.0 i
> don't see how that's workable.

Hmmm, G5 is BE only. No way to run LE, G4 and older are 32 bit BE (they
could run LE also, but it's not easy).

> unless you create a new triplet PPC64-LE-using-EABI-1.5

I don't think so, stay with BE.

> also: multilib and it is being ripped out from distros.
>
> > c) put each library in a different directory
> > d) at run time, select the path to load the libraries from CPU
> >    capabilities
>
> this is multiarch i believe.  it requires, as i recall, a
> syscall-level understanding of the two ABIs.  with ppc64 being BE and
> ppc64le being LE this would require word-order swapping at the syscall
> level.

You have to be BE anyway (kernel and userspace) to support the oldest
64 bit processors. The switch to LE occurred during Power7, but I
believe that real official distro support only happened with Power8.

Locating libraries at program startup is done by ld.so, not by the
kernel.

Gabriel
enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)
On Monday, March 1, 2021, Gabriel Paubert wrote:
> On Mon, Mar 01, 2021 at 12:22:22PM +, Luke Kenneth Casson Leighton wrote:
>> On Mon, Mar 1, 2021 at 8:39 AM Gabriel Paubert wrote:
>>
>> > Beware that VSX is not Altivec. Altivec was called VMX by IBM and
>> > VSX is a superset of Altivec (IIRC).
>> >
>> > G4 and G5 do not have VSX.
>>
>> apologies i tend to lump these together.
>>
>> > This is going to be hopelessly slow.
>>
>> great!  i have absolutely no problem with that, at all.  the idea is
>> to give people access to something where due to the ongoing cascading
>> mistaken assumptions "nobody has any hardware except IBM POWER9 and
>> EABI 2.0 says VSX therefore #ifdef POWER9 --> enable VSX".
>>
>> it's a stopgap measure that at least allows... _something_.  breathing
>> space whilst the OpenPOWER Foundation puts together a plan.
>>
>> > The point of SIMD is to process quickly vast amounts of data,
>>
>> that was its seductive intent.  the reality is very different,
>> poisoning L1 I-Cache through massive bloating of program size, and in
>> some cases actually causing such heavy internal bus contention between
>> instruction and data reads that all processing grinds to a halt.
>>
>> https://www.sigarch.org/simd-instructions-considered-harmful/
>
> This publication claims (and probably rightly) that vector instructions
> are preferable to SIMD, but does not say at all that falling back to
> purely scalar is better.

i appreciate this is a side-track: LibreSOC is introducing a concept of
Cray-style "hardware for-loops" around the scalar ISA.  with gcc
autovectorisation the seemingly-scalar c code becomes as fast as the
hardware has available parallel ALUs.

hence the performance penalty is not as great.

POWER9 on the other hand, if you've seen the proposed glibc6 patch to add
VSX to e.g. strncpy, it's alarming.  whilst the above article is
hypothetical, the real-world patch is a staggering 250 hand-coded assembly
instructions (the equivalent RVV is 13), dramatically reducing L1 cache
effectiveness and likely interfering with the use of memory bounds
checkers that align memory at the end of pages.

> Also, PPC SIMD has seen fewer variations than x86, which started with
> MMX (64bit), then SSE (128 bit registers, single precision only), SSE2
> (finally able to get rid of the awful x87 stacked registers) and so many
> extensions that I agree that it is impossible to track.

indeed.  all that is gone with Cray-style Vectors.

> Hmmm, G5 is BE only. No way to run LE, G4 and older are 32 bit BE (they
> could run LE also, but it's not easy).

understood.  ok so EABI 2.0 is out of the running, and EABI 1.9 is the
64-bit upgrade of 1.5, which is what debian-ppc64 (be) is based on.

1.5 and 1.9 never had SIMD / VMX / VSX so there shouldn't be a problem
(for G5).

which, coming back to the original question, i'm not seeing a reason why
disabling altivec should not compile.

unless, of course, there have been assumptions "#ifdef PPC64 equals POWER9
therefore VSX" which are unfortunately creeping in ever since EABI 2.0
came about?

l.
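To make the "#ifdef POWER9 therefore VSX" problem discussed in this thread concrete, here is a small sketch of guarding on the vector feature itself and not on the CPU model. It assumes the predefined macros `__ALTIVEC__`, `__VSX__` and `__POWER8_VECTOR__` that GCC and Clang set for -maltivec, -mvsx and -mpower8-vector (or a matching -mcpu); the enum and function names are made up for illustration.

```c
#include <assert.h>

/* SIMD level this translation unit was compiled for. */
enum simd_level { SIMD_NONE, SIMD_VMX, SIMD_VSX, SIMD_P8_VECTOR };

/* Test the feature macros, most capable first.  A build with AltiVec
 * disabled (or for a non-PowerPC target) falls through to SIMD_NONE and
 * still compiles, whatever the CPU model or word size is. */
enum simd_level compiled_simd_level(void)
{
#if defined(__POWER8_VECTOR__)
    return SIMD_P8_VECTOR;
#elif defined(__VSX__)
    return SIMD_VSX;
#elif defined(__ALTIVEC__)
    return SIMD_VMX;
#else
    return SIMD_NONE;
#endif
}

/* Printable name, e.g. for a configure-time or start-up log message. */
const char *simd_level_name(enum simd_level level)
{
    static const char *const names[] = {
        "none", "vmx", "vsx", "power8-vector"
    };
    assert((unsigned)level <= SIMD_P8_VECTOR);
    return names[level];
}
```

With guards of this shape, a hypothetical --disable-altivec configure switch only has to make sure the vector options are not passed to the compiler; the scalar fallback is then selected automatically.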
Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)
On Mon, Mar 1, 2021 at 3:39 AM Gabriel Paubert wrote:
>
> On Sun, Feb 28, 2021 at 11:52:12PM +, Luke Kenneth Casson Leighton wrote:
> > On Monday, March 1, 2021, Riccardo Mottola wrote:
> > ...
> > Tulio Magno Quites Machado Filho is currently working on glibc6 patches
> > which reverse these erroneous assumptions, replacing them with "#ifdef VSX"
> > thus allowing people to compile code that does not rely on SIMD.
>
> Beware that VSX is not Altivec. Altivec was called VMX by IBM and
> VSX is a superset of Altivec (IIRC).

Based on my experience with Botan and Crypto++...

VSX is available with POWER7 and the -mvsx compiler option. VSX is part
of the POWER8 core and does not need a compiler option. VSX is a lot like
Intel tick/tock features.

VSX allows 64-bit vector loads and stores, but it does not provide
operations on 64-bit vectors. You have to use POWER8 to get the 64-bit
add (addudm), subtract (subudm), etc.

So a POWER7+VSX 64-bit add might look like:

    typedef __vector unsigned int        uint32x4_p;
    typedef __vector unsigned long long  uint64x2_p;

    // Load 64-bit vector from uint64_t[2]
    uint64x2_p a = vec_ld(...);
    uint64x2_p b = vec_ld(...);

    // But still perform the 32-bit add
    uint64x2_p c = (uint64x2_p)VecAdd64((uint32x4_p)a, (uint32x4_p)b);

And:

    uint32x4_p VecAdd64(const uint32x4_p vec1, const uint32x4_p vec2)
    {
        // The carry mask selects carries for elements 1 and 3 and sets
        // remaining elements to 0. The result is then shifted so the
        // carried values are added to elements 0 and 2.
    #if defined(MYLIB_BIG_ENDIAN)
        const uint32x4_p zero = {0, 0, 0, 0};
        const uint32x4_p mask = {0, 1, 0, 1};
    #else
        const uint32x4_p zero = {0, 0, 0, 0};
        const uint32x4_p mask = {1, 0, 1, 0};
    #endif

        uint32x4_p cy = vec_addc(vec1, vec2);
        uint32x4_p res = vec_add(vec1, vec2);
        cy = vec_and(mask, cy);
        cy = vec_sld(cy, zero, 4);
        return vec_add(res, cy);
    }

A POWER8 add looks as expected:

    uint64x2_p VecAdd64(const uint64x2_p vec1, const uint64x2_p vec2)
    {
        return vec_add(vec1, vec2);
    }

Even with the crippled 64-bit add using 32-bit elements, some algorithms,
like Bernstein's ChaCha, run about 2.5x faster than on the scalar unit.

Jeff
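The carry trick in the POWER7 version of VecAdd64 can be checked without PowerPC hardware using a plain C model. The sketch below is not the library code from the message above: the u32x4 struct and the pack64()/unpack64() helpers are invented for illustration, and only the big-endian element order (mask {0, 1, 0, 1}) is modelled. Each loop mirrors one of the vector steps (vec_add, vec_addc, vec_and, vec_sld, vec_add).

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit lanes in big-endian element order: lanes 0/1 hold the
 * high/low word of the first 64-bit value, lanes 2/3 of the second. */
typedef struct { uint32_t lane[4]; } u32x4;

u32x4 pack64(uint64_t x, uint64_t y)
{
    u32x4 v = {{ (uint32_t)(x >> 32), (uint32_t)x,
                 (uint32_t)(y >> 32), (uint32_t)y }};
    return v;
}

uint64_t unpack64(u32x4 v, int i)
{
    assert(i == 0 || i == 1);
    return ((uint64_t)v.lane[2 * i] << 32) | v.lane[2 * i + 1];
}

/* Two 64-bit adds built from 32-bit lane operations only. */
u32x4 add64_via_32bit_lanes(u32x4 a, u32x4 b)
{
    static const uint32_t mask[4] = {0, 1, 0, 1};  /* keep low-word carries */
    u32x4 res, cy, out;

    for (int i = 0; i < 4; i++) {
        res.lane[i] = a.lane[i] + b.lane[i];               /* vec_add, wraps */
        cy.lane[i] = (res.lane[i] < a.lane[i]) & mask[i];  /* vec_addc + vec_and */
    }
    for (int i = 0; i < 4; i++) {
        /* vec_sld(cy, zero, 4): every lane takes its right neighbour,
         * the last lane is filled with zero. */
        uint32_t shifted = (i < 3) ? cy.lane[i + 1] : 0;
        out.lane[i] = res.lane[i] + shifted;               /* final vec_add */
    }
    return out;
}
```

For instance, adding 1 to 0x00000000FFFFFFFF overflows the low word; the carry lands in the high word, so unpack64() on the result yields 0x0000000100000000, the same value a native 64-bit add produces.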