The branch should now be good to go (https://gitlab.com/petsc/petsc/-/merge_requests/6841). Sorry, I made a mistake earlier, hence the error in PetscObjectQuery().

I’m not sure the code will be covered by the pipeline, but I have tested it on a Raviart–Thomas discretization with PCFIELDSPLIT. You’ll see in the attached logs that:
1) the numerics match;
2) in the SBAIJ case, PCFIELDSPLIT extracts the (non-symmetric) A_{01} block from the global (symmetric) A, and we get the A_{10} block cheaply with MatCreateHermitianTranspose() instead of calling MatCreateSubMatrix() a second time.

If you have time to test the branch, please let me know whether it succeeds or fails on your test cases.

Also, I do not agree with what Hong said. Sometimes, assembling a coefficient can be more expensive than communicating it. With SBAIJ, only the upper triangular part of the matrix has to be assembled, so there are cases where SBAIJ is more efficient than AIJ even though it requires more communication. It is not a black-and-white picture.

Thanks,
Pierre
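Why A_{10} comes for free can be illustrated outside of PETSc. Since A is symmetric, A_{10} = A_{01}^T (A_{01}^H for complex scalars), so a product with A_{10} only needs a transpose product with the already extracted A_{01}. Below is a minimal sketch in plain C with real scalars. The CSR type and the functions csr_mult() and csr_mult_transpose() are hypothetical. They are not PETSc data structures, and they only mirror the idea behind MatCreateHermitianTranspose(), not its actual implementation.

```c
/* Hypothetical minimal CSR matrix (illustration only, not PETSc's Mat). */
typedef struct {
  int nrows, ncols;
  const int *rowptr;  /* length nrows + 1 */
  const int *colind;  /* length rowptr[nrows] */
  const double *val;  /* length rowptr[nrows] */
} CSR;

/* y = B x for an explicitly stored B (e.g. an A10 extracted a second time). */
void csr_mult(const CSR *B, const double *x, double *y)
{
  for (int i = 0; i < B->nrows; ++i) {
    double sum = 0.0;
    for (int k = B->rowptr[i]; k < B->rowptr[i + 1]; ++k)
      sum += B->val[k] * x[B->colind[k]];
    y[i] = sum;
  }
}

/* y = B^T x without forming B^T: each stored b_ij adds b_ij * x[i] to y[j].
   With real scalars B^T equals the Hermitian transpose B^H; complex scalars
   would additionally need the complex conjugate of b_ij. */
void csr_mult_transpose(const CSR *B, const double *x, double *y)
{
  for (int j = 0; j < B->ncols; ++j) y[j] = 0.0;
  for (int i = 0; i < B->nrows; ++i)
    for (int k = B->rowptr[i]; k < B->rowptr[i + 1]; ++k)
      y[B->colind[k]] += B->val[k] * x[i];
}
```

For instance, with A01 = [1 0; -2 3; 0 4] and x = (1, 2, 3), csr_mult_transpose(&A01, x, y) gives y = (-3, 18), the same result as csr_mult() applied to an explicitly stored A10 = [1 -2 0; 0 3 4]. No second copy of the coupling block is stored. This is consistent with the logs, where A10 is reported as "type: hermitiantranspose" with no nonzero count in the SBAIJ run, but as an mpiaij matrix with 85680 nonzeros in the AIJ run.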
0 KSP Residual norm 3.873169750889e+00
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 25
1 KSP Residual norm 1.182487410355e-01
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 28
2 KSP Residual norm 1.102241338775e-02
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 26
3 KSP Residual norm 2.301967727513e-03
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 27
4 KSP Residual norm 1.597010741936e-04
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 28
5 KSP Residual norm 5.540316293664e-05
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 29
6 KSP Residual norm 6.398182217972e-06
KSP Object: 4 MPI processes
type: fgmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: fieldsplit
FieldSplit with Schur preconditioner, factorization FULL
Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's diagonal's inverse
Split info:
Split number 0 Defined by IS
Split number 1 Defined by IS
KSP solver for A00 block
KSP Object: (fieldsplit_0_) 4 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (fieldsplit_0_) 4 MPI processes
type: bjacobi
number of blocks = 4
Local solver information for first block is in the following KSP and PC objects on rank 0:
Use -fieldsplit_0_ksp_view ::ascii_info_detail to display information for all blocks
KSP Object: (fieldsplit_0_sub_) 1 MPI process
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (fieldsplit_0_sub_) 1 MPI process
type: icc
out-of-place factorization
0 levels of fill
tolerance for zero pivot 2.22045e-14
using Manteuffel shift [POSITIVE_DEFINITE]
matrix ordering: natural
factor fill ratio given 1., needed 1.
Factored matrix follows:
Mat Object: (fieldsplit_0_sub_) 1 MPI process
type: seqsbaij
rows=2121, cols=2121
package used to perform factorization: petsc
total: nonzeros=21972, allocated nonzeros=21972
block size is 1
linear system matrix = precond matrix:
Mat Object: (fieldsplit_0_sub_) 1 MPI process
type: seqsbaij
rows=2121, cols=2121
total: nonzeros=21972, allocated nonzeros=21972
total number of mallocs used during MatSetValues calls=0
block size is 1
linear system matrix = precond matrix:
Mat Object: (fieldsplit_0_) 4 MPI processes
type: mpisbaij
rows=10116, cols=10116
total: nonzeros=105912, allocated nonzeros=105912
total number of mallocs used during MatSetValues calls=0
block size is 1
KSP solver for S = A11 - A10 inv(A00) A01
KSP Object: (fieldsplit_1_) 4 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=0.0001, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (fieldsplit_1_) 4 MPI processes
type: hypre
HYPRE BoomerAMG preconditioning
Cycle type V
Maximum number of levels 25
Maximum number of iterations PER hypre call 1
Convergence tolerance PER hypre call 0.
Threshold for strong coupling 0.25
Interpolation truncation factor 0.
Interpolation: max elements per row 0
Number of levels of aggressive coarsening 0
Number of paths for aggressive coarsening 1
Maximum row sums 0.9
Sweeps down 1
Sweeps up 1
Sweeps on coarse 1
Relax down symmetric-SOR/Jacobi
Relax up symmetric-SOR/Jacobi
Relax on coarse Gaussian-elimination
Relax weight (all) 1.
Outer relax weight (all) 1.
Maximum size of coarsest grid 9
Minimum size of coarsest grid 1
Using CF-relaxation
Not using more complex smoothers.
Measure type local
Coarsen type Falgout
Interpolation type classical
SpGEMM type hypre
linear system matrix followed by preconditioner matrix:
Mat Object: (fieldsplit_1_) 4 MPI processes
type: schurcomplement
rows=5712, cols=5712
Schur complement A11 - A10 inv(A00) A01
A11
Mat Object: (fieldsplit_1_) 4 MPI processes
type: mpisbaij
rows=5712, cols=5712
total: nonzeros=19992, allocated nonzeros=19992
total number of mallocs used during MatSetValues calls=0
block size is 1
A10
Mat Object: 4 MPI processes
type: hermitiantranspose
rows=5712, cols=10116
KSP solver for A00 block viewable with the additional option -fieldsplit_0_ksp_view
A01
Mat Object: 4 MPI processes
type: mpiaij
rows=10116, cols=5712
total: nonzeros=85680, allocated nonzeros=85680
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 692 nodes, limit used is 5
Mat Object: 4 MPI processes
type: mpiaij
rows=5712, cols=5712
total: nonzeros=134208, allocated nonzeros=134208
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 470 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpisbaij
rows=15828, cols=15828
total: nonzeros=211584, allocated nonzeros=211584
total number of mallocs used during MatSetValues calls=0
block size is 1
0 KSP Residual norm 3.873169750889e+00
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 25
1 KSP Residual norm 1.182487410353e-01
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 28
2 KSP Residual norm 1.102241338764e-02
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 26
3 KSP Residual norm 2.301967727466e-03
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 27
4 KSP Residual norm 1.597010741933e-04
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 28
5 KSP Residual norm 5.540316293543e-05
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 29
6 KSP Residual norm 6.398182217802e-06
KSP Object: 4 MPI processes
type: fgmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: fieldsplit
FieldSplit with Schur preconditioner, factorization FULL
Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's diagonal's inverse
Split info:
Split number 0 Defined by IS
Split number 1 Defined by IS
KSP solver for A00 block
KSP Object: (fieldsplit_0_) 4 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (fieldsplit_0_) 4 MPI processes
type: bjacobi
number of blocks = 4
Local solver information for first block is in the following KSP and PC objects on rank 0:
Use -fieldsplit_0_ksp_view ::ascii_info_detail to display information for all blocks
KSP Object: (fieldsplit_0_sub_) 1 MPI process
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (fieldsplit_0_sub_) 1 MPI process
type: ilu
out-of-place factorization
0 levels of fill
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 1., needed 1.
Factored matrix follows:
Mat Object: (fieldsplit_0_sub_) 1 MPI process
type: seqaij
rows=2121, cols=2121
package used to perform factorization: petsc
total: nonzeros=41823, allocated nonzeros=41823
using I-node routines: found 686 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: (fieldsplit_0_sub_) 1 MPI process
type: seqaij
rows=2121, cols=2121
total: nonzeros=41823, allocated nonzeros=41823
total number of mallocs used during MatSetValues calls=0
using I-node routines: found 686 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: (fieldsplit_0_) 4 MPI processes
type: mpiaij
rows=10116, cols=10116
total: nonzeros=201708, allocated nonzeros=201708
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 686 nodes, limit used is 5
KSP solver for S = A11 - A10 inv(A00) A01
KSP Object: (fieldsplit_1_) 4 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=0.0001, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (fieldsplit_1_) 4 MPI processes
type: hypre
HYPRE BoomerAMG preconditioning
Cycle type V
Maximum number of levels 25
Maximum number of iterations PER hypre call 1
Convergence tolerance PER hypre call 0.
Threshold for strong coupling 0.25
Interpolation truncation factor 0.
Interpolation: max elements per row 0
Number of levels of aggressive coarsening 0
Number of paths for aggressive coarsening 1
Maximum row sums 0.9
Sweeps down 1
Sweeps up 1
Sweeps on coarse 1
Relax down symmetric-SOR/Jacobi
Relax up symmetric-SOR/Jacobi
Relax on coarse Gaussian-elimination
Relax weight (all) 1.
Outer relax weight (all) 1.
Maximum size of coarsest grid 9
Minimum size of coarsest grid 1
Using CF-relaxation
Not using more complex smoothers.
Measure type local
Coarsen type Falgout
Interpolation type classical
SpGEMM type hypre
linear system matrix followed by preconditioner matrix:
Mat Object: (fieldsplit_1_) 4 MPI processes
type: schurcomplement
rows=5712, cols=5712
Schur complement A11 - A10 inv(A00) A01
A11
Mat Object: (fieldsplit_1_) 4 MPI processes
type: mpiaij
rows=5712, cols=5712
total: nonzeros=34272, allocated nonzeros=34272
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 470 nodes, limit used is 5
A10
Mat Object: 4 MPI processes
type: mpiaij
rows=5712, cols=10116
total: nonzeros=85680, allocated nonzeros=85680
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 469 nodes, limit used is 5
KSP solver for A00 block viewable with the additional option -fieldsplit_0_ksp_view
A01
Mat Object: 4 MPI processes
type: mpiaij
rows=10116, cols=5712
total: nonzeros=85680, allocated nonzeros=85680
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 692 nodes, limit used is 5
Mat Object: 4 MPI processes
type: mpiaij
rows=5712, cols=5712
total: nonzeros=134208, allocated nonzeros=134208
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 470 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=15828, cols=15828
total: nonzeros=407340, allocated nonzeros=407340
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 965 nodes, limit used is 5
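The nonzero counts in the two logs are consistent with SBAIJ storing only the upper triangular part of each symmetric matrix, diagonal included. Assume an n x n matrix with a structurally symmetric pattern, all n diagonal entries stored, and nnz_AIJ entries in AIJ format. Then nnz_AIJ = n + 2u, where u is the number of strictly upper triangular entries, and SBAIJ stores n + u = (nnz_AIJ + n)/2 entries. The small C sketch below checks that formula against the counts copied from the logs. The function names are hypothetical, and the diagonal assumption is not stated in the logs, only inferred from the numbers matching.

```c
#include <stddef.h>

/* Entries stored by an SBAIJ matrix (upper triangle incl. diagonal, block size 1),
   given the AIJ (full) nonzero count of the same n x n matrix.
   Assumes a structurally symmetric pattern with all n diagonal entries stored:
     nnz_aij   = n + 2 * upper   (upper = strictly upper triangular entries)
     nnz_sbaij = n + upper       = (nnz_aij + n) / 2                        */
long sbaij_nnz_from_aij(long nnz_aij, long n)
{
  return (nnz_aij + n) / 2;
}

/* Counts copied from the two -ksp_view logs: {n, nnz AIJ, nnz SBAIJ}. */
static const long logged[][3] = {
  {15828, 407340, 211584}, /* global A */
  {10116, 201708, 105912}, /* A00 (fieldsplit_0_) */
  {5712, 34272, 19992},    /* A11 (fieldsplit_1_) */
  {2121, 41823, 21972},    /* local A00 block on rank 0 (fieldsplit_0_sub_) */
};

/* Returns how many logged rows do not satisfy the formula above. */
int count_mismatches(void)
{
  int bad = 0;
  for (size_t i = 0; i < sizeof logged / sizeof logged[0]; ++i)
    if (sbaij_nnz_from_aij(logged[i][1], logged[i][0]) != logged[i][2]) ++bad;
  return bad;
}
```

All four logged pairs satisfy the formula. For the global matrix, 211584/407340 is about 0.52, so roughly half as many coefficients are stored, and thus assembled, in the SBAIJ run as in the AIJ run.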
