> On 3 Jan 2022, at 5:18 pm, Maxim Abalenkov <[email protected]> wrote:
>
> Dear all,
>
> Thank you for all of your replies and suggestions! I have written my own
> matrix multiplication script in order to test NumPy’s performance. Please
> find it attached. I’m using the MKL variant of NumPy. Strangely enough,
> `port variants py39-numpy` still returns:
>
> port variants py39-numpy
> py39-numpy has the variants:
> atlas: Use MacPorts ATLAS Libraries
> * conflicts with mkl openblas
> gcc10: Build using the MacPorts gcc 10 compiler
> * conflicts with gcc11 gcc8 gcc9 gccdevel gfortran gfortran
> gcc11: Build using the MacPorts gcc 11 compiler
> * conflicts with gcc10 gcc8 gcc9 gccdevel gfortran gfortran
> gcc8: Build using the MacPorts gcc 8 compiler
> * conflicts with gcc10 gcc11 gcc9 gccdevel gfortran gfortran
> gcc9: Build using the MacPorts gcc 9 compiler
> * conflicts with gcc10 gcc11 gcc8 gccdevel gfortran gfortran
> gccdevel: Build using the MacPorts gcc devel compiler
> * conflicts with gcc10 gcc11 gcc8 gcc9 gfortran gfortran
> [+]gfortran: Build using the MacPorts gcc 11 Fortran compiler
> * conflicts with gcc10 gcc11 gcc8 gcc9 gccdevel
> mkl: Use MacPorts MKL Libraries
> * conflicts with atlas openblas
> [+]openblas: Use MacPorts OpenBLAS Libraries
> * conflicts with atlas mkl
> universal: Build for multiple architectures
>
> Either I don’t understand the expected behaviour or my `port variants`
> command returns something else. I would expect it to show [+]gfortran and
> [+]mkl, not [+]openblas.
No. The [+] sign marks which variants are enabled by default, not which ones
you happen to have installed. For that, the `port installed` command you use
below is the one that shows it correctly.
> On the other hand, command `port installed py39-numpy` shows:
>
> port installed py39-numpy
> The following ports are currently installed:
> py39-numpy @1.21.5_1+gfortran+mkl
> py39-numpy @1.22.0_0+gfortran+mkl (active)
>
> Finally, I wasn’t able to get 8 execution threads with `export
> MKL_NUM_THREADS=8`. NumPy was still using 4, but `htop` reported 350–380%
> CPU load for the `/usr/bin/env python3 ./dgemm_numpy.py` process. I think
> this is good news!
>
> The `otool` command executed under
> `/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core`
> shows that the MKL backend is being used.
>
> otool -L _multiarray_umath.cpython-39-darwin.so
> _multiarray_umath.cpython-39-darwin.so:
>
> /opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/libmkl_rt.2.dylib
> (compatibility version 0.0.0, current version 0.0.0)
> /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version
> 1311.0.0)
>
> I think I still need to experiment with OpenBLAS and compare the performance
> numbers. Thank you for your help!
>
> —
> Best wishes,
> Maxim
>
#!/usr/bin/env python3
import numpy as np
import time
print(np.__version__)
np.show_config()  # prints the build configuration itself and returns None, so no outer print()
m = 20000
k = 20000
n = 20000
t0 = time.time()
alpha = np.random.rand()
beta = np.random.rand()
A = np.random.rand(m, k)
B = np.random.rand(k, n)
C = np.random.rand(m, n)
t1 = time.time()
t = t1-t0
print('Generation time: {0:f}'.format(t))
print(' alpha: {0:f}, beta: {1:f}'.format(alpha, beta))
t0 = time.time()
C = alpha*np.matmul(A, B) + beta*C  # the dgemm update: C := alpha*A*B + beta*C
t1 = time.time()
t = t1-t0
print('Multiplication time: {0:f}'.format(t))
## @eof dgemm_numpy.py
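One note on the MKL_NUM_THREADS observation above: MKL reads the variable when the runtime library initializes, so it has to be set before NumPy is imported (exporting it in the shell before launching Python achieves the same thing). A minimal in-process sketch, assuming the MKL variant of NumPy is the active one:

```python
import os

# MKL reads MKL_NUM_THREADS at initialization, so it must be set
# before NumPy (and hence libmkl_rt) is loaded.
os.environ["MKL_NUM_THREADS"] = "8"

import numpy as np

A = np.random.rand(256, 256)
B = np.random.rand(256, 256)
C = A @ B  # with the MKL backend, this runs on the requested number of threads
print(C.shape)  # → (256, 256)
```

If the variable is set only after `import numpy`, MKL may already have picked its own default thread count, which would explain seeing 4 threads regardless of the export.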
>
>
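For the MKL-vs-OpenBLAS comparison mentioned above, converting wall time to GFLOP/s makes runs comparable across backends and matrix sizes. A minimal sketch (smaller matrices than in the script above, so it finishes quickly; 2*m*n*k is the standard flop count for a dense matrix multiply):

```python
import time
import numpy as np

m, k, n = 512, 512, 512              # modest sizes so the sketch runs quickly
A = np.random.rand(m, k)
B = np.random.rand(k, n)

t0 = time.perf_counter()             # perf_counter has better resolution than time.time
C = A @ B
elapsed = time.perf_counter() - t0

# A dense m x k times k x n multiply costs about 2*m*n*k floating-point operations.
gflops = 2.0 * m * n * k / elapsed / 1e9
print(f"Multiplication time: {elapsed:.4f} s ({gflops:.1f} GFLOP/s)")
```

Running the same sketch against the +mkl and +openblas builds of py39-numpy would give directly comparable numbers.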
>>> On 29 Dec 2021, at 13:33, Joshua Root <[email protected]> wrote:
>>>
>>> Maxim Abalenkov wrote:
>>>
>>>
>>> Dear all,
>>>
>>> I’m looking for guidance please. I would like to make sure that I use all
>>> eight of my CPU cores when I run NumPy under Python 3.9.9 on macOS 12.1.
>>> When I run my NumPy code, I see in `htop` that only one `python`
>>> process is running and the core utilisation is 20–25%. I remember that in
>>> the past a stock MacPorts NumPy installation would use Apple’s Accelerate
>>> framework, including its multithreaded BLAS and LAPACK (
>>> https://developer.apple.com/documentation/accelerate
>>> ). As I understand it, this is no longer the case.
>>>
>>> I run Python code using a virtual environment located under
>>>
>>> /opt/venv/zipfstime/lib/python3.9/site-packages/numpy/core
>>>
>>> When I change into that directory and issue
>>>
>>> otool -L _multiarray_umath.cpython-39-darwin.so
>>>
>>> _multiarray_umath.cpython-39-darwin.so:
>>> @loader_path/../.dylibs/libopenblas.0.dylib (compatibility version
>>> 0.0.0, current version 0.0.0)
>>> /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version
>>> 1281.100.1)
>>>
>>> In other words, NumPy relies on OpenBLAS. The command `port variants openblas`
>>> returns:
>>>
>>> OpenBLAS has the variants:
>>> g95: Build using the g95 Fortran compiler
>>> * conflicts with gcc10 gcc11 gcc8 gcc9 gccdevel
>>> gcc10: Build using the MacPorts gcc 10 compiler
>>> * conflicts with g95 g95 gcc11 gcc8 gcc9 gccdevel
>>> [+]gcc11: Build using the MacPorts gcc 11 compiler
>>> * conflicts with g95 g95 gcc10 gcc8 gcc9 gccdevel
>>> gcc8: Build using the MacPorts gcc 8 compiler
>>> * conflicts with g95 g95 gcc10 gcc11 gcc9 gccdevel
>>> gcc9: Build using the MacPorts gcc 9 compiler
>>> * conflicts with g95 g95 gcc10 gcc11 gcc8 gccdevel
>>> gccdevel: Build using the MacPorts gcc devel compiler
>>> * conflicts with g95 g95 gcc10 gcc11 gcc8 gcc9
>>> [+]lapack: Add Lapack/CLapack support to the library
>>> native: Force compilation on machine to get fully optimized library
>>> universal: Build for multiple architectures
>>>
>>> I tried installing the “native” variant of the OpenBLAS port with `sudo port
>>> install openblas +native` and setting the environment variable
>>> `OMP_NUM_THREADS=8`, but I didn’t see any improvement when running my
>>> Python code. I would welcome your help and guidance on this subject.
>>>
>> I'm using py39-numpy with default variants:
>>
>> % port installed py39-numpy openblas
>> The following ports are currently installed:
>> OpenBLAS @0.3.19_0+gcc11+lapack (active)
>> py39-numpy @1.21.5_1+gfortran+openblas (active)
>>
>> I see Python using around 600% CPU on my 6-core machine when running this
>> basic benchmark script:
>> <https://gist.github.com/markus-beuckelmann/8bc25531b11158431a5b09a45abd6276>
>>
>> If you try that and see how many cores it uses, that will at least tell you
>> if there is something different about your code. If it doesn't use all the
>> cores for you, there are some other environment variables that OpenBLAS
>> looks at that you could check:
>> <https://github.com/xianyi/OpenBLAS#setting-the-number-of-threads-using-environment-variables>
>>
>> - Josh
>>
>
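As a follow-up to the environment variables Josh links to: OpenBLAS consults OPENBLAS_NUM_THREADS, GOTO_NUM_THREADS and OMP_NUM_THREADS, in that priority order, when the library loads. As with the MKL variable, they must be set before NumPy is imported. A minimal sketch:

```python
import os

# OpenBLAS checks these in priority order at load time; the first one set wins.
os.environ["OPENBLAS_NUM_THREADS"] = "8"   # OpenBLAS-specific, checked first
# os.environ["GOTO_NUM_THREADS"] = "8"     # legacy GotoBLAS2 name, checked second
# os.environ["OMP_NUM_THREADS"] = "8"      # generic fallback, checked last

import numpy as np

C = np.random.rand(128, 128) @ np.random.rand(128, 128)
print(C.shape)  # → (128, 128)
```

If `OMP_NUM_THREADS=8` had no visible effect, it may have been overridden by one of the higher-priority variables, or set after the library was already initialized.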