On Tue, Jul 13, 2021 at 2:20 PM Sébastien Villemot <sebast...@debian.org> wrote:
>
> On Tuesday, 13 July 2021 at 20:06 +0200, Mathieu Malaterre wrote:
> > On Tue, Jul 13, 2021 at 7:21 PM Sébastien Villemot <sebast...@debian.org>
> > wrote:
> > > On Tuesday, 13 July 2021 at 18:56 +0200, Mathieu Malaterre wrote:
> > > >
> > > > On Tue, Jul 13, 2021 at 2:04 PM Sébastien Villemot
> > > > <sebast...@debian.org> wrote:
> > > > >
> > > > > The wiki page that synthesizes architecture specificities indicates
> > > > > that Altivec is included in the baseline for the ppc64 port:
> > > > > https://wiki.debian.org/ArchitectureSpecificsMemo#ppc64
> > > > >
> > > > > However my understanding is that this port supports any powerpc64 CPU,
> > > > > including some that don’t have Altivec (e.g. POWER4 or POWER5). This is
> > > > > also what the main wiki page for PPC64 says:
> > > > > https://wiki.debian.org/PPC64
> > > > >
> > > > > Can someone please clarify the situation?
> > > > >
> > > > > (I’m asking because I’m the maintainer of the openblas package, and
> > > > > knowing whether Altivec is available or not, and more generally what is
> > > > > in the baseline, is essential for proper packaging).
> > > >
> > > > I do not believe that you can do much as a packager. You cannot assume
> > > > anything on the target arch. You need to do the same thing as ffmpeg
> > > > is doing for avx2/sse4 on amd64, you need to do runtime detection. So
> > > > unless upstream is doing something very clever you cannot compile blas
> > > > using any of the fancy altivec instructions :(
> > > >
> > > > The man page for ld.so mentions something about optimized libraries
> > > > (search for "/usr/lib/sse2/"), but this is currently not in use in
> > > > Debian (AFAIK).
> > >
> > > Actually OpenBLAS has its own runtime detection mechanism, which is
> > > used to select the best linear algebra kernel for the current CPU
> > > (those kernels are mainly written in assembly, and take advantage of
> > > available ISA extensions). This mechanism is used on several archs,
> > > including ppc64el (so at runtime, OpenBLAS chooses between a POWER8 and
> > > a POWER9 kernel; there is even a POWER10 kernel already available).
> > >
> > > However, I cannot enable this mechanism on ppc64 and powerpc, because
> > > the runtime detection only works for POWER6 and above, and my
> > > understanding is that for these two ports the baseline is lower. Hence
> > > on these two archs, only one kernel is included in the package binaries
> > > (currently POWER4 for ppc64 and PPCG4 for powerpc). For optimal
> > > performance, users should recompile OpenBLAS locally (as indicated in
> > > the package description and in README.Debian).
> >
> > There are plenty of people on this mailing list that could test/verify
> > that. Is there a quick way to check that your openblas package is
> > compiled correctly for ppc32 and ppc64 (like a verbose mode)? Did you
> > do any experiment on perotto.debian.net?
>
> perotto.debian.net is POWER8, so it’s clearly well above the baseline.
> The package runs fine there, but that does not tell anything about
> baseline violation.
>
> Verifying that the package compiled fine and passed its testsuite on
> build daemons does not give any information about baseline violation
> either, because buildds are probably above the baseline as well. FYI,
> the most recent build logs are there:
> https://buildd.debian.org/status/package.php?p=openblas&suite=experimental
> (there is a problem with powerpc in experimental; but the version in
> sid compiled).
>
> If nobody has the relevant knowledge, then the only option is to test
> the package on the oldest possible hardware. The easiest way to test it
> is to recompile it locally (since this will exercise the testsuite).

I can provide SSH access to a PowerMac G5 with Altivec. That should test
the delineation between Altivec and PWR{5-10}.

If OpenBLAS needs to do 64-bit math, then I have routines cribbed away
that perform 64-bit addition and subtraction using 32x4 vectors (four
32-bit lanes). The routines have to handle carry/borrow themselves. My
experience with Crypto++ and algos like ChaCha20 demonstrates it is
profitable.

If you are interested, send over your SSH public key (authorized_keys
entry).

Jeff
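
For anyone unfamiliar with the carry/borrow trick mentioned above, here is a rough scalar sketch in portable C. It is not the Crypto++ code and not anything from OpenBLAS. The type u32x4 and the functions pack64x2, unpack64, add64x2 and sub64x2 are names made up for this illustration, and the (high, low) lane order is an assumption chosen to look like a big-endian vector register. Real Altivec code would do the lane-wise steps with vector instructions rather than loops.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for a 128-bit vector of four 32-bit lanes (hypothetical type). */
typedef struct { uint32_t lane[4]; } u32x4;

/* Pack two 64-bit values as (hi, lo) lane pairs: lanes 0-1 hold x, lanes 2-3 hold y. */
u32x4 pack64x2(uint64_t x, uint64_t y)
{
    u32x4 v = {{ (uint32_t)(x >> 32), (uint32_t)x,
                 (uint32_t)(y >> 32), (uint32_t)y }};
    return v;
}

/* Read back 64-bit element 0 or 1 from its lane pair. */
uint64_t unpack64(u32x4 v, int idx)
{
    assert(idx == 0 || idx == 1);
    return ((uint64_t)v.lane[2 * idx] << 32) | v.lane[2 * idx + 1];
}

/* Two 64-bit additions built from 32-bit lane additions. */
u32x4 add64x2(u32x4 a, u32x4 b)
{
    u32x4 sum, carry, r;
    for (int i = 0; i < 4; i++) {
        sum.lane[i] = a.lane[i] + b.lane[i];     /* wraps modulo 2^32 */
        carry.lane[i] = sum.lane[i] < a.lane[i]; /* 1 if the lane wrapped */
    }
    /* Propagate the carry of each low lane into the matching high lane. */
    r.lane[0] = sum.lane[0] + carry.lane[1];
    r.lane[1] = sum.lane[1];
    r.lane[2] = sum.lane[2] + carry.lane[3];
    r.lane[3] = sum.lane[3];
    return r;
}

/* Two 64-bit subtractions built from 32-bit lane subtractions. */
u32x4 sub64x2(u32x4 a, u32x4 b)
{
    u32x4 diff, borrow, r;
    for (int i = 0; i < 4; i++) {
        diff.lane[i] = a.lane[i] - b.lane[i];    /* wraps modulo 2^32 */
        borrow.lane[i] = a.lane[i] < b.lane[i];  /* 1 if the lane borrowed */
    }
    /* Propagate the borrow of each low lane into the matching high lane. */
    r.lane[0] = diff.lane[0] - borrow.lane[1];
    r.lane[1] = diff.lane[1];
    r.lane[2] = diff.lane[2] - borrow.lane[3];
    r.lane[3] = diff.lane[3];
    return r;
}
```

To try it, pack two pairs of values with pack64x2, call add64x2 or sub64x2, and compare unpack64 of the result against ordinary uint64_t arithmetic. The point of the sketch is only that the 32-bit lane operations know nothing about their neighbours, so the carry or borrow has to be computed separately and moved across lanes by hand.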