Based on your suggestion, I tried using vode for the real-valued problem in scipy and I get roughly the same speed as before, which could have three reasons:

1. (Z)VODE is slower than plain RK (however, I must admit that I'm not
   quite sure what (Z)VODE does precisely)
2. Sparse matrix operations in scipy are slow. Some of them are even
   written in pure Python.
3. The RHS function in scipy must *return* a vector and therefore
   allocates new memory on every call (see the short sketch below).
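
To illustrate point 3 (just a minimal sketch; l, pump, u and F are named as in the scripts further down, with l and pump being the scipy sparse matrices in the first function and the PETSc Mats in the second):

    def f_scipy(t, y):
        # scipy.integrate.ode requires the callback to *return* its result,
        # so a new ndarray is allocated on every call
        return l @ y + 0.5 * (5 < t < 10) * (pump @ y)

    def f_petsc(ts, t, u, F):
        # petsc4py hands the callback a preallocated Vec F to fill in place
        # (only the constant part here, just to show the in-place pattern)
        l.mult(u, F)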

Parallelizing the code is of course a goal of mine, but I believe this will only become relevant for larger systems, which I want to investigate in the near future.

Regarding the RHS Jacobian: I see why defining an RHSFunction vs. an RHSJacobian should be computationally equivalent, but I found it much easier to optimize the RHSFunction in this case, and I'm not quite sure why the documentation so strictly recommends providing only a Jacobian and not an RHS function if the two are equivalent.

Lastly, I'm aware that another performance boost awaits once the debugging functionality is turned off, but for this simple test I just wanted to see if there is *any* improvement in performance, and I was very much surprised by the factor of 7 with debugging still turned on.

Thank you all for the interesting input and have a nice day!
Niclas

On 15.08.23 00:37, Zhang, Hong wrote:
PETSc is not necessarily faster than scipy for your problem when executed in serial, but you get benefits when running in parallel. The PETSc code you wrote uses float64 while your scipy code uses complex128, so the comparison may not be fair.

In addition, using the RHS Jacobian does not necessarily make your PETSc code slower. In your case, the bottleneck is the matrix operations. For best performance, you should avoid adding two sparse matrices (especially with different sparsity patterns), which is very costly. So one MatMult + one MatMultAdd is the best option. MatAXPY with the same nonzero pattern would be a bit slower, but still faster than MatAXPY with a subset nonzero pattern, which you used in the Jacobian function.
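
A rough sketch of what I mean, reusing the names from your script and exploiting the fact that your f(t) only ever takes the value 0.5 during the pulse (so pump can be scaled once up front):

    scaled_pump = pump.copy()   # scale a copy of pump once, outside the RHS
    scaled_pump.scale(0.5)

    def rhsfunc(ts, t, u, F):
        l.mult(u, F)                      # MatMult:    F = l*u
        if 5 < t < 10:
            scaled_pump.multAdd(u, F, F)  # MatMultAdd: F = 0.5*pump*u + F

    ts.setRHSFunction(rhsfunc)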

I echo Barry’s suggestion that debugging should be turned off before you do any performance study.

Hong (Mr.)

On Aug 10, 2023, at 4:40 AM, Niclas Götting <[email protected]> wrote:

Thank you both for the very quick answer!

So far, I compiled PETSc with debugging turned on, but I think it should still be faster than standard scipy in both cases. Actually, Stefano's answer has already got me very far; now I only define the RHS of the ODE and no Jacobian (I wonder why the documentation suggests otherwise, though). I had the following four tries at implementing the RHS:

 1. def rhsfunc1(ts, t, u, F):
        scale = 0.5 * (5 < t < 10)
        (l + scale * pump).mult(u, F)
 2. def rhsfunc2(ts, t, u, F):
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        (scale * pump).multAdd(u, F, F)
 3. def rhsfunc3(ts, t, u, F):
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        if scale != 0:
            pump.scale(scale)
            pump.multAdd(u, F, F)
            pump.scale(1/scale)
 4. def rhsfunc4(ts, t, u, F):
        tmp_pump.zeroEntries() # tmp_pump is pump.duplicate()
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        tmp_pump.axpy(scale, pump, structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
        tmp_pump.multAdd(u, F, F)

They all yield the same results, but with 50it/s, 800it/s, 2300it/s and 1900it/s, respectively, which is a huge performance boost (almost 7 times as fast as scipy, with PETSc debugging still turned on). As the scale function will most likely be a Gaussian in the future, I think that option 3 will become numerically unstable and I'll have to go with option 4, which is already faster than I expected. If you think it is possible to speed up the RHS calculation even more, I'd be happy to hear your suggestions; the -log_view output is attached to this message.

One last point: If I didn't misunderstand the documentation at https://petsc.org/release/manual/ts/#special-cases, should this maybe be changed?

Best regards
Niclas

On 09.08.23 17:51, Stefano Zampini wrote:
TSRK is an explicit solver. Unless you are changing the ts type from the command line, the explicit Jacobian should not be needed. On top of Barry's suggestion, I would suggest writing the explicit RHS instead of assembling a throwaway matrix every time that function needs to be sampled.
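
Something along these lines, for instance (an untested sketch reusing the names l and pump from your script; w is a work vector created once, and the exact handling of f(t) is up to you):

    w = l.createVecRight()   # work vector, allocated once outside the callback

    def rhsfunc(ts, t, u, F):
        # F = (l + f(t)*pump)*u without ever assembling l + f(t)*pump
        l.mult(u, F)              # F = l*u
        f_t = 0.5 * (5 < t < 10)
        if f_t != 0.0:
            pump.mult(u, w)       # w = pump*u
            F.axpy(f_t, w)        # F += f(t)*w

    ts.setRHSFunction(rhsfunc)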

On Wed, Aug 9, 2023, 17:09 Niclas Götting <[email protected]> wrote:

    Hi all,

    I'm currently trying to convert a quantum simulation from scipy to
    PETSc. The problem itself is extremely simple and of the form
    \dot{u}(t) = (A_const + f(t)*B_const)*u(t), where f(t) in this simple
    test case is a square function. The matrices A_const and B_const are
    extremely sparse and therefore I thought the problem would be well
    suited for PETSc. Currently, I solve the ODE with the following
    procedure in scipy (I can provide the necessary data files if needed,
    but they are just some trace-preserving, very sparse matrices):

    import numpy as np
    import scipy.sparse
    import scipy.integrate

    from tqdm import tqdm


    l = np.load("../liouvillian.npy")
    pump = np.load("../pump_operator.npy")
    state = np.load("../initial_state.npy")

    l = scipy.sparse.csr_array(l)
    pump = scipy.sparse.csr_array(pump)

    def f(t, y, *args):
         return (l + 0.5 * (5 < t < 10) * pump) @ y
         #return l @ y # Uncomment for f(t) = 0

    dt = 0.1
    NUM_STEPS = 200
    res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)
    solver = scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)
    times = []
    for i in tqdm(range(NUM_STEPS)):
         res[i, :] = solver.integrate(solver.t + dt)
         times.append(solver.t)

    Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports
    about 330it/s on my machine. When converting the code to PETSc, I came
    to the following result (according to the chapter
    https://petsc.org/main/manual/ts/#special-cases)

    import sys
    import petsc4py
    petsc4py.init(args=sys.argv)
    import numpy as np
    import scipy.sparse

    from tqdm import tqdm
    from petsc4py import PETSc

    comm = PETSc.COMM_WORLD


    def mat_to_real(arr):
         return np.block([[arr.real, -arr.imag], [arr.imag, arr.real]]).astype(np.float64)

    def mat_to_petsc_aij(arr):
         arr_sc_sp = scipy.sparse.csr_array(arr)
         mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm)
         rstart, rend = mat.getOwnershipRange()
         print(rstart, rend)
         print(arr.shape[0])
         print(mat.sizes)
         I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart]
         J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]
         V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]

         print(I.shape, J.shape, V.shape)
         mat.setValuesCSR(I, J, V)
         mat.assemble()
         return mat


    l = np.load("../liouvillian.npy")
    l = mat_to_real(l)
    pump = np.load("../pump_operator.npy")
    pump = mat_to_real(pump)
    state = np.load("../initial_state.npy")
    state = np.hstack([state.real, state.imag]).astype(np.float64)

    l = mat_to_petsc_aij(l)
    pump = mat_to_petsc_aij(pump)


    jac = l.duplicate()
    for i in range(8192):
         jac.setValue(i, i, 0)
    jac.assemble()
    jac += l

    vec = l.createVecRight()
    vec.setValues(np.arange(state.shape[0], dtype=np.int32), state)
    vec.assemble()


    dt = 0.1

    ts = PETSc.TS().create(comm=comm)
    ts.setFromOptions()
    ts.setProblemType(ts.ProblemType.LINEAR)
    ts.setEquationType(ts.EquationType.ODE_EXPLICIT)
    ts.setType(ts.Type.RK)
    ts.setRKType(ts.RKType.RK3BS)
    ts.setTime(0)
    print("KSP:", ts.getKSP().getType())
    print("KSP PC:",ts.getKSP().getPC().getType())
    print("SNES :", ts.getSNES().getType())

    def jacobian(ts, t, u, Amat, Pmat):
         Amat.zeroEntries()
         Amat.aypx(1, l, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)
         Amat.axpy(0.5 * (5 < t < 10), pump, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)

    ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)
    #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) # Uncomment for f(t) = 0
    ts.setRHSJacobian(jacobian, jac)

    NUM_STEPS = 200
    res = np.empty((NUM_STEPS, 8192), dtype=np.float64)
    times = []
    rstart, rend = vec.getOwnershipRange()
    for i in tqdm(range(NUM_STEPS)):
         time = ts.getTime()
         ts.setMaxTime(time + dt)
         ts.solve(vec)
         res[i, rstart:rend] = vec.getArray()[:]
         times.append(time)

    I decomposed the complex ODE into a larger real ODE so that I can more
    easily switch to GPU computation later on. The solutions of both
    scripts are practically identical, but PETSc runs about 3 times slower
    at 120it/s on my machine. I don't use MPI for PETSc yet.

    I strongly suspect that the problem lies within the Jacobian
    definition, as PETSc is about 3 times *faster* than scipy with
    f(t) = 0 and therefore a constant Jacobian.

    Thank you in advance.

    All the best,
    Niclas


<log.log>
