Hi Gentoo devs,

The classical numerical linear algebra libraries BLAS[1] and LAPACK[2]
play an important role in scientific computing, as many software
packages such as NumPy, SciPy, Julia, Octave and R are built upon them.

There is a standard implementation of BLAS and LAPACK, known as netlib
or simply the "reference implementation". This implementation has been
provided by Gentoo's main repo. However, it has a major problem:
performance. On the other hand, a number of well-optimized BLAS/LAPACK
implementations exist, including OpenBLAS (free), BLIS (free) and
MKL (non-free), but none of them has been properly integrated
into the Gentoo distribution.

I'm writing to propose a solution to this problem. If no Gentoo
developer objects to this proposal, I'll move forward and start
submitting PRs to the Gentoo main repo.

Historical Obstacle
-------------------

Different BLAS/LAPACK implementations are expected to be compatible
with each other at both the API and ABI level, so they can be used as
drop-in replacements for one another. This sounds nice, but the
difference in SONAME has hampered the Gentoo integration of the
well-optimized ones.

Assume a Gentoo user has compiled a pile of packages on top of the
reference BLAS and LAPACK, so that these reverse dependencies are linked
against libblas.so.3 and liblapack.so.3. When the user discovers that
OpenBLAS provides much better performance, they have to recompile
the whole reverse dependency tree in order to take advantage of
OpenBLAS, because the SONAME of OpenBLAS is libopenblas.so.0. When the
user wants to try MKL (libmkl_rt.so), they have to recompile the whole
reverse dependency tree again.

This is not friendly to our earth.
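
The obstacle can be illustrated with a deliberately simplified model of
library lookup. This is only a sketch: the real dynamic loader does far
more, and the program names and the `resolve` helper are hypothetical.
The one property it keeps is that a program asks for exactly the SONAME
recorded at link time and never gets a differently named library.

```python
def resolve(needed, library_dir):
    """Map each NEEDED entry of a program to the library that satisfies it.

    Simplified model: the loader looks for a file named exactly like the
    recorded SONAME; it never substitutes a differently named library.
    """
    return {soname: library_dir.get(soname) for soname in needed}


# Library directory modelled as: file name -> implementation behind it.
library_dir = {
    "libblas.so.3": "reference BLAS",
    "liblapack.so.3": "reference LAPACK",
}

# NEEDED entries of a hypothetical program built against the reference.
old_program = ["libblas.so.3", "liblapack.so.3"]

# Installing OpenBLAS under its own SONAME does not affect that program.
library_dir["libopenblas.so.0"] = "OpenBLAS"
print(resolve(old_program, library_dir))

# Only a program relinked (i.e. recompiled) against OpenBLAS picks it up.
rebuilt_program = ["libopenblas.so.0"]
print(resolve(rebuilt_program, library_dir))
```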

Goal
----

  * When a program is linked against libblas.so or liblapack.so
    provided by any BLAS/LAPACK provider, the eselect-based solution
    will allow the user to switch the underlying library without
    recompiling anything.

  * When a program is linked against a specific implementation, e.g.
    libmkl_rt.so, the solution doesn't break anything.

Solution
--------

Similar to Debian's update-alternatives mechanism, Gentoo's eselect
is good at dealing with drop-in replacements as well. My preliminary
investigation suggests that eselect is enough for enabling BLAS/LAPACK
runtime switching. Hence, the proposed solution is eselect-based:

  * Every BLAS/LAPACK implementation should provide a generic library
    and eselect candidate libraries at the same time. Taking netlib,
    BLIS and OpenBLAS as examples:

    reference:

      usr/lib64/blas/reference/libblas.so.3 (SONAME=libblas.so.3)
        -- default BLAS provider
        -- candidate of the eselect "blas" unit
        -- will be symlinked to usr/lib64/libblas.so.3 by eselect

      usr/lib64/lapack/reference/liblapack.so.3 (SONAME=liblapack.so.3)
        -- default LAPACK provider
        -- candidate of the eselect "lapack" unit
        -- will be symlinked to usr/lib64/liblapack.so.3 by eselect

    blis (doesn't provide LAPACK):
      
      usr/lib64/libblis.so.2  (SONAME=libblis.so.2)
        -- general purpose

      usr/lib64/blas/blis/libblas.so.3 (SONAME=libblas.so.3)
        -- candidate of the eselect "blas" unit
        -- will be symlinked to usr/lib64/libblas.so.3 by eselect
        -- compiled from the same set of object files as libblis.so.2

    openblas:
          
      usr/lib64/libopenblas.so.0 (SONAME=libopenblas.so.0)
        -- general purpose

      usr/lib64/blas/openblas/libblas.so.3 (SONAME=libblas.so.3)
        -- candidate of the eselect "blas" unit
        -- will be symlinked to usr/lib64/libblas.so.3 by eselect
        -- compiled from the same set of object files as libopenblas.so.0

      usr/lib64/lapack/openblas/liblapack.so.3 (SONAME=liblapack.so.3)
        -- candidate of the eselect "lapack" unit
        -- will be symlinked to usr/lib64/liblapack.so.3 by eselect
        -- compiled from the same set of object files as libopenblas.so.0

This solution is similar to Debian's[3]. It achieves our goal, but it
requires us to patch upstream build systems (as Debian does).
A preliminary demonstration of this solution is available, see below.
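
The switching step in the layout above amounts to re-pointing one
symlink per unit. Real eselect modules are written in bash, so the
Python below is only a model of the idea, run inside a temporary
directory with placeholder files instead of real libraries. The helper
names (`install_candidate`, `select_provider`, `active_provider`) are
made up for this illustration.

```python
import os
import tempfile


def install_candidate(root, unit, provider, soname):
    """Create a stand-in candidate at <root>/<unit>/<provider>/<soname>."""
    directory = os.path.join(root, unit, provider)
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, soname)
    with open(path, "w") as handle:
        handle.write(provider)  # placeholder content, not a real ELF file
    return path


def select_provider(root, unit, provider, soname):
    """Point <root>/<soname> at the chosen candidate and return the target."""
    target = os.path.join(unit, provider, soname)  # relative link target
    if not os.path.exists(os.path.join(root, target)):
        raise FileNotFoundError(f"no {unit} candidate for {provider!r}")
    link = os.path.join(root, soname)
    staging = link + ".tmp"
    if os.path.lexists(staging):
        os.remove(staging)
    os.symlink(target, staging)
    os.replace(staging, link)  # rename over the old link in one step
    return os.readlink(link)


def active_provider(root, soname):
    """Read the placeholder through the symlink to see who is selected."""
    with open(os.path.join(root, soname)) as handle:
        return handle.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lib64:
        for provider in ("reference", "blis", "openblas"):
            install_candidate(lib64, "blas", provider, "libblas.so.3")
        for provider in ("reference", "openblas"):
            install_candidate(lib64, "lapack", provider, "liblapack.so.3")

        select_provider(lib64, "blas", "reference", "libblas.so.3")
        print("blas ->", active_provider(lib64, "libblas.so.3"))
        select_provider(lib64, "blas", "openblas", "libblas.so.3")
        print("blas ->", active_provider(lib64, "libblas.so.3"))
```

Programs linked against libblas.so.3 keep opening the same path; only
the link target changes, which is why no recompilation is involved.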

Is this solution reliable?
--------------------------

* A similar solution has been used by Debian for many years.
* Many projects call BLAS/LAPACK libraries through FFI, including Julia.
  (See Julia's standard library: LinearAlgebra)

Proposed Changes
----------------

1. Deprecate sci-libs/{blas,cblas,lapack,lapacke}-reference in the Gentoo
   main repo. They use exactly the same source tarball, so packaging
   these components in such a fine-grained manner is not very helpful.
   A single sci-libs/lapack package is enough.

2. Merge the "cblas" eselect unit into the "blas" unit. It is potentially
   harmful when "blas" and "cblas" point to different implementations.
   This means "app-eselect/eselect-cblas" should be deprecated.

3. Update virtual/{blas,cblas,lapack,lapacke}. BLAS/LAPACK providers
   will be registered in their dependency information.

Note that ebuilds for BLAS/LAPACK reverse dependencies are expected to
keep working with these changes, without modification. For example, my
local numpy-1.16.1 compilation succeeded without any change.

Preliminary Demonstration
-------------------------

The preliminary implementation is available in my personal overlay[4].
A simple sanity test script `check-cpp.sh` is provided to illustrate
the effectiveness of the proposed solution.

The script `check-cpp.sh` compiles two C++ programs: one calls general
matrix-matrix multiplication from BLAS, while the other calls general
singular value decomposition from LAPACK. Once they are compiled, the
script switches between different BLAS/LAPACK implementations and runs
the C++ programs without recompilation.
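
A similar check can be approximated without a compiler. The sketch
below is not `check-cpp.sh`; it is a separate illustration that loads
whatever library currently answers to the generic BLAS name and times
a single-precision matrix multiplication through the Fortran symbol
`sgemm_`. It assumes a gfortran-style calling convention (arguments by
reference, hidden string lengths appended at the end); the helper names
are hypothetical, and the script simply reports when no BLAS is found.

```python
import ctypes
import ctypes.util
import time


def load_generic_blas():
    """Load BLAS by its generic name; return None if none is installed."""
    name = ctypes.util.find_library("blas")  # typically "libblas.so.3"
    if name is None:
        return None
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


def sgemm(lib, n):
    """Compute C = A * B for n x n matrices of ones and twos via sgemm_."""
    FloatArray = ctypes.c_float * (n * n)
    a = FloatArray(*([1.0] * (n * n)))
    b = FloatArray(*([2.0] * (n * n)))
    c = FloatArray()
    trans = ctypes.c_char(b"N")  # "N": use the matrices untransposed
    dim = ctypes.c_int(n)
    alpha = ctypes.c_float(1.0)
    beta = ctypes.c_float(0.0)
    lib.sgemm_.restype = None
    lib.sgemm_(
        ctypes.byref(trans), ctypes.byref(trans),
        ctypes.byref(dim), ctypes.byref(dim), ctypes.byref(dim),
        ctypes.byref(alpha), a, ctypes.byref(dim),
        b, ctypes.byref(dim),
        ctypes.byref(beta), c, ctypes.byref(dim),
        # Assumed hidden lengths of the two character arguments.
        ctypes.c_size_t(1), ctypes.c_size_t(1),
    )
    return list(c)


if __name__ == "__main__":
    blas = load_generic_blas()
    if blas is None:
        print("no generic BLAS library found")
    else:
        start = time.perf_counter()
        result = sgemm(blas, 256)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        print(f"sgemm 256x256: {elapsed_ms:.1f} ms, C[0] = {result[0]}")
```

Running the script before and after switching the "blas" unit should
exercise different implementations, since the script itself never names
a specific provider.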

The preliminary result is available here[5]. (CPU=Power9, ARCH=ppc64le)
From the experimental results, we find that

  For (512x512) single precision matrix multiplication:
   * reference BLAS takes ~360 ms
   * BLIS takes ~70 ms
   * OpenBLAS takes ~10 ms

  For (512x512) single precision singular value decomposition:
   * reference LAPACK takes ~1900 ms
   * BLIS (+reference LAPACK) takes ~1500 ms
   * OpenBLAS takes ~1100 ms

The difference in computation speed illustrates the effectiveness of
the proposed solution. In theory, any other package can take
advantage of this solution without any recompilation, as long as
it is linked against a library with the generic SONAME
(libblas.so.3 or liblapack.so.3).
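
Expressed as ratios, the approximate timings above correspond to
roughly a 36x speedup for matrix multiplication and about 1.7x for SVD
when moving from the reference implementation to OpenBLAS. The short
script below merely redoes that arithmetic on the reported numbers; the
throughput figure assumes the usual 2*n^3 operation count for an n x n
matrix multiplication, and the function names are illustrative.

```python
# Timings in milliseconds, copied from the approximate results above.
GEMM_MS = {"reference": 360.0, "BLIS": 70.0, "OpenBLAS": 10.0}
SVD_MS = {
    "reference": 1900.0,
    "BLIS + reference LAPACK": 1500.0,
    "OpenBLAS": 1100.0,
}


def speedups(timings, baseline="reference"):
    """Return how many times faster each provider is than the baseline."""
    base = timings[baseline]
    return {name: base / ms for name, ms in timings.items()}


def gemm_gflops(n, ms):
    """Approximate GFLOP/s of an n x n GEMM, counting 2 * n**3 operations."""
    return 2 * n ** 3 / (ms / 1000.0) / 1e9


if __name__ == "__main__":
    for name, factor in speedups(GEMM_MS).items():
        rate = gemm_gflops(512, GEMM_MS[name])
        print(f"GEMM {name}: {factor:.1f}x, {rate:.2f} GFLOP/s")
    for name, factor in speedups(SVD_MS).items():
        print(f"SVD  {name}: {factor:.1f}x")
```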

Acknowledgement
---------------
This is an ongoing GSoC 2019 project:
https://summerofcode.withgoogle.com/projects/?sp-page=2#6268942782300160

Mentor: Benda Xu

[1] BLAS = Basic Linear Algebra Subprograms. It's a set of API + ABI.
[2] LAPACK = Linear Algebra PACKage. It's a set of API + ABI.
[3] https://wiki.debian.org/DebianScience/LinearAlgebraLibraries
[4] https://github.com/cdluminate/my-overlay
[5] https://gist.github.com/cdluminate/0cfeab19b89a8b5ac4ea2c5f942d8f64
