Hi All,
Thank you for the recommendations. The first option provided by Matthew works 
in petsc4py but required some additional computations to re-organize 
information. I developed an approach that was easier to build within my current 
system using the option provided by Hong along with some other posts I found on 
the user forum. See below for an example of how I integrated Numba and petcs4py 
so the filling of matrix elements on each processor is done using the optimized 
machine code produced by Numba (you can eliminate the cleaning step if the 
number of non-zero elements for each row is well defined). Scaling and overall 
performance is now satisfactory. 

I do have another question. In order to make this work I create the Petsc 
matrix 'A' twice, first to get local ownership and second to define the local 
elements for a particular processor:

Algorithm------------
(1) Create a Petsc matrix 'A' and set size and type(2) Get the local row start 
and end for matrix 'A'(3) Define the local non-zero coefficients for the rows 
owned by processor using a Numba JIT-compiled loop and store result in a csr 
matrix defined using Scipy. 
(4) Redefine Petsc matix 'A' using the the local csr elements define in step 
3.(5) Begin and end assembly(6) Define RHS and initial solution vectors(7) 
Solve system 
 (see code example below for details)
Steps (1) and (4) appear redundant and potentially sub-optimal from a 
performance perspective (perhaps due to my limited experience with petscy4py). 
Is there a better approach in terms of elegance and performance? Also, if this 
is the optimal syntax could someone elaborate on what exactly is occurring 
behind the seen in terms of memory allocation?

Thank you all again for your guidance and insights.

Modified Version of Dalcin's 2D Poisson Example with Numba
----------------------------------------------------------------------------------
import sys
from typing import Tuple
import numpy.typing as npt
import numpy as np
from numba import njit
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc
import scipy.sparse as sps


def main() -> None:
    xnodes = 8000
    nnz_max = 10
    
    ynodes = xnodes
    N = xnodes*ynodes
    dx = 1.0/(xnodes + 1)
    
    A = PETSc.Mat()
    A.create(comm=PETSc.COMM_WORLD)
    A.setSizes((xnodes*ynodes, xnodes*ynodes))
    A.setType(PETSc.Mat.Type.AIJ)
    
    rstart, rend = A.getOwnershipRange()
     Ascipy = build_csr_matrix(
        rstart, rend, nnz_max, dx, xnodes, ynodes)
    csr=(
        Ascipy.indptr[rstart:rend+1] - Ascipy.indptr[rstart],
        Ascipy.indices[Ascipy.indptr[rstart]:Ascipy.indptr[rend]],
        Ascipy.data[Ascipy.indptr[rstart]:Ascipy.indptr[rend]]
        )
    A = PETSc.Mat().createAIJ(size=(N,N), csr=csr)    
    A.assemblyBegin()
    A.assemblyEnd()
    
    ksp = PETSc.KSP()
    ksp.create(comm=A.getComm())
    ksp.setType(PETSc.KSP.Type.CG)
    ksp.getPC().setType(PETSc.PC.Type.GAMG)
    
    ksp.setOperators(A)
    ksp.setFromOptions()
    
    x, b = A.createVecs()
    b.set(1.0)
    
    ksp.solve(b, x)
    
    
def build_csr_matrix(
        rstart:int, 
        rend:int, 
        nnz_max:int, 
        dx:float, 
        xnodes:int, 
        ynodes:int
):
    Anz, Arow, Acol = build_nonzero_arrays(
        rstart, rend, nnz_max, dx, xnodes, ynodes
        )
    N = xnodes*ynodes
    Ls = sps.csr_matrix((Anz, (Arow,Acol)), shape=(N,N), dtype=np.float64)
    return Ls
  
  
def build_nonzero_arrays(
        rstart:int, 
        rend:int, 
        nnz_max:int, 
        dx:float, 
        xnodes:int, 
        ynodes:int
) -> Tuple[
    npt.NDArray[np.float64], 
    npt.NDArray[np.float64], 
    npt.NDArray[np.float64]
    ]:
    nrows_local = (rend - rstart) + 1
    Anz_ini = np.zeros((nnz_max*nrows_local), dtype=np.float64)
    Arow_ini = np.zeros((nnz_max*nrows_local), dtype=np.int32)
    Acol_ini = np.zeros((nnz_max*nrows_local), dtype=np.int32)
    icount_nz = define_nonzero_values(
        rstart, rend, dx, xnodes, ynodes, Anz_ini, Arow_ini, Acol_ini
        )
    (
        Anz, Arow, Acol
    ) = clean_nonzero_arrays(icount_nz, Anz_ini, Arow_ini, Acol_ini)
    return Anz, Arow, Acol


@njit
def define_nonzero_values(
        rstart:int, 
        rend:int,
        dx:float,
        xnodes:int,
        ynodes:int,
        Anz:npt.NDArray[np.float64],
        Arow:npt.NDArray[np.int64],
        Acol:npt.NDArray[np.int64]
) -> int:
    """ Fill matrix A
    """
    icount_nz = 0
    for row in range(rstart, rend):
        i, j = index_to_grid(row, xnodes)
        #A[row, row] = 4.0/dx**2
        Anz[icount_nz] = 4.0/dx**2
        Arow[icount_nz] = row
        Acol[icount_nz] = row
        icount_nz += 1
        if i > 0:
            column = row - xnodes
            #A[row, column] = -1.0/dx**2
            Anz[icount_nz] = -1.0/dx**2
            Arow[icount_nz] = row
            Acol[icount_nz] = column
            icount_nz += 1
        if i < xnodes - 1:
            column = row + xnodes
            #A[row, column] = -1.0/dx**2
            Anz[icount_nz] = -1.0/dx**2
            Arow[icount_nz] = row
            Acol[icount_nz] = column
            icount_nz += 1
        if j > 0:
            column = row - 1
            #A[row, column] = -1.0/dx**2
            Anz[icount_nz] = -1.0/dx**2
            Arow[icount_nz] = row
            Acol[icount_nz] = column
            icount_nz += 1
        if j < xnodes - 1:
            column = row + 1
            #A[row, column] = -1.0/dx**2
            Anz[icount_nz] = -1.0/dx**2
            Arow[icount_nz] = row
            Acol[icount_nz] = column
            icount_nz += 1
    return icount_nz


@njit
def clean_nonzero_arrays(
        icount_nz:int, 
        Anz_ini:npt.NDArray[np.float64], 
        Arow_ini:npt.NDArray[np.float64], 
        Acol_ini:npt.NDArray[np.float64]
) -> Tuple[
    npt.NDArray[np.float64], 
    npt.NDArray[np.float64], 
    npt.NDArray[np.float64]
    ]:
    Anz = np.zeros((icount_nz), dtype=np.float64)
    Arow = np.zeros((icount_nz), dtype=np.int32)
    Acol = np.zeros((icount_nz), dtype=np.int32)
    for i in range(icount_nz):
        Anz[i] = Anz_ini[i]
        Arow[i] = Arow_ini[i]
        Acol[i] = Acol_ini[i]
    return Anz, Arow, Acol


@njit
def index_to_grid(r:int, n:int) -> Tuple[int,int]:
    """Convert a row number into a grid point."""
    return (r//n, r%n)


if __name__ == "__main__":
    main()

    On Friday, August 18, 2023 at 11:24:36 AM CDT, Zhang, Hong 
<[email protected]> wrote:  
 
 You can use this to build a PETSc matrix with the index arrays ai,aj and the 
value array aa:
    PETSc.Mat().createAIJ(size=(nrows,ncols), csr=(ai,aj,aa))

Hong (Mr.)

> On Aug 17, 2023, at 7:37 AM, Erik Kneller via petsc-users 
> <[email protected]> wrote:
> 
> Hi All,
> 
> I need to fill non-zero values of a Petsc matrix via petsc4py for the domain 
> defined by A.getOwnershipRange() using three Numpy arrays: (1) array 
> containing row indices of non-zero value, (2) array containing column indices 
> of non-zero values and (3) array containing the non-zero matrix values. How 
> can one perform this type of filling operation in petsc4py? The method 
> A.setValues does not appear to allow this since it only works on an 
> individual matrix element or a block of matrix elements.
> 
> I am using Numpy arrays since they can be computed in loops optimized using 
> Numba on each processor. I also cannot pass the Petsc matrix to a Numba 
> compiled function since type information cannot be inferred. I absolutely 
> need to avoid looping in standard Python to define Petsc matrix elements due 
> to performance issues. I also need to use a standard petscy4py method and 
> avoid writing new C or Fortran wrappers to minimize language complexity.
> 
> Example Algorithm Building on Lisandro Dalcin's 2D Poisson Example:
> ----------------------------------------------------------------------------------------------
> comm = PETSc.COMM_WORLD
> rank = comm.getRank()
> 
> dx = 1.0/(xnodes + 1) # xnodes is the number of nodes in the x and 
> y-directions of the grid
> nnz_max = 5 # max number of non-zero values per row
> 
> A = PETSc.Mat()
> A.create(comm=PETSc.COMM_WORLD)
> A.setSizes((xnodes*ynodes, xnodes*ynodes))
> A.setType(PETSc.Mat.Type.AIJ)
> A.setPreallocationNNZ(nnz_max)
> 
> rstart, rend = A.getOwnershipRange()
> 
> # Here Anz, Arow and Acol are vectors with size equal to the number of 
> non-zero values 
> Anz, Arow, Acol = build_nonzero_numpy_arrays_using_numba(rstart, rend, 
> nnz_max, dx, xnodes, ynodes)
> 
> A.setValues(Arow, Acol, Anz) # <--- This does not work.
> 
> A.assemblyBegin()
> A.assemblyEnd()
> ksp = PETSc.KSP()
> ksp.create(comm=A.getComm())
> ksp.setType(PETSc.KSP.Type.CG)
> ksp.getPC().setType(PETSc.PC.Type.GAMG)
> ksp.setOperators(A)
> ksp.setFromOptions()
> x, b = A.createVecs()
> b.set(1.0)
> 
> ksp.solve(b, x)
> 
> Regards,
> Erik Kneller, Ph.D.
> 

  

Reply via email to