[petsc-users] LU factorization and solution of independent matrices does not scale, why?

2012-12-21 Thread Thomas Witkowski
In my multilevel FETI-DP code, I have localized coarse matrices, which

[petsc-users] LU factorization and solution of independent matrices does not scale, why?

2012-12-21 Thread Alexander Grayver
Thomas, I'm missing one point... You run N sequential factorizations (i.e. each has its own matrix to work with and no need to communicate?) independently within ONE node? Or are there N factorizations that run on N nodes? Jed, > MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded). Any reas

[petsc-users] LU factorization and solution of independent matrices does not scale, why?

2012-12-21 Thread Thomas Witkowski
I use a modified MPICH version. On the system I use for these benchmarks I cannot use another MPI library. I'm not fixed to MUMPS. Superlu_dist, for example, works also perfectly for this. But there is still the following problem I cannot solve: When I increase the number of coarse space mat
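
To make the setup concrete, here is a minimal sketch (not Thomas's code; the helper name, variable names, and the group size of 4 ranks are illustrative) of how one such independent coarse solve can be wired up with the PETSc 3.3-era API, with the factorization delegated to SuperLU_DIST on a sub-communicator:

#include <petscksp.h>

/* Sketch only. Each coarse matrix is assumed to live on a small sub-communicator,
 * e.g. obtained with MPI_Comm_split(MPI_COMM_WORLD, rank/4, rank, &subcomm),
 * and is factored and solved there with a parallel direct solver. */
PetscErrorCode CoarseSolveOnSubcomm(Mat Acoarse, Vec rhs, Vec sol)
{
  PetscErrorCode ierr;
  MPI_Comm       subcomm;
  KSP            ksp;
  PC             pc;

  PetscFunctionBegin;
  ierr = PetscObjectGetComm((PetscObject)Acoarse, &subcomm);CHKERRQ(ierr);
  ierr = KSPCreate(subcomm, &ksp);CHKERRQ(ierr);
  /* PETSc 3.3-era KSPSetOperators() still takes the MatStructure flag */
  ierr = KSPSetOperators(ksp, Acoarse, Acoarse, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);                  /* pure direct solve */
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, rhs, sol);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

In practice one would of course keep the KSP (and with it the factorization) around and reuse it for all repeated coarse solves instead of recreating it each time.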

[petsc-users] getting a sub matrix from a matrix

2012-12-21 Thread Nachiket Gokhale
seems to be more involved. Thanks, -Nachiket

[petsc-users] getting a sub matrix from a matrix

2012-12-21 Thread Barry Smith
On Dec 21, 2012, at 3:16 PM, Nachiket Gokhale wrote: > I have a dense matrix A (100x100) and I want to extract a matrix B from it > consisting of the first N columns of A. Is there a better way to do it > than getting the column using MatGetColumnVector, followed by VecGetArray, > and
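
For reference, a minimal sketch of pulling out the leading columns in one call with MatGetSubMatrix() (the PETSc 3.3-era name of the routine). This is one possible approach, not necessarily the one suggested in the truncated reply, and it assumes the 100x100 dense matrix lives on a single process; the helper name is illustrative.

#include <petscmat.h>

/* Sketch: extract the first ncols columns of A into a new matrix B. */
PetscErrorCode ExtractLeadingColumns(Mat A, PetscInt ncols, Mat *B)
{
  PetscErrorCode ierr;
  PetscInt       rstart, rend;
  IS             isrow, iscol;
  MPI_Comm       comm;

  PetscFunctionBegin;
  ierr = PetscObjectGetComm((PetscObject)A, &comm);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  /* all locally owned rows ... */
  ierr = ISCreateStride(comm, rend - rstart, rstart, 1, &isrow);CHKERRQ(ierr);
  /* ... and columns 0 .. ncols-1 */
  ierr = ISCreateStride(comm, ncols, 0, 1, &iscol);CHKERRQ(ierr);
  ierr = MatGetSubMatrix(A, isrow, iscol, MAT_INITIAL_MATRIX, B);CHKERRQ(ierr);
  ierr = ISDestroy(&isrow);CHKERRQ(ierr);
  ierr = ISDestroy(&iscol);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

For a parallel matrix the column index set would need to be split across the processes rather than being the full stride on every rank.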

[petsc-users] VecSetBlockSize with release 3.3

2012-12-21 Thread Aldo Bonfiglioli
Dear all, I am in the process of upgrading from 3.2 to 3.3. I am a little bit puzzled by the following change: > VecSetBlockSize() cannot be called after VecCreateSeq() or > VecCreateMPI() and must be called before VecSetUp() or > VecSetFromOptions() or before either VecSetType() or VecSetSizes()
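
A minimal sketch of the staged calling sequence the 3.3 change note implies (the helper name and placeholder arguments are illustrative, not taken from the release notes):

#include <petscvec.h>

/* Sketch, assuming PETSc 3.3: the block size is set during the staged setup
 * instead of after VecCreateMPI(). N is assumed to be a multiple of bs. */
PetscErrorCode CreateBlockedVector(MPI_Comm comm, PetscInt N, PetscInt bs, Vec *v)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = VecCreate(comm, v);CHKERRQ(ierr);
  ierr = VecSetBlockSize(*v, bs);CHKERRQ(ierr);       /* before VecSetType()/VecSetFromOptions() */
  ierr = VecSetSizes(*v, PETSC_DECIDE, N);CHKERRQ(ierr);
  ierr = VecSetFromOptions(*v);CHKERRQ(ierr);         /* takes the role of VecCreateMPI() */
  PetscFunctionReturn(0);
}

With this staged setup the block size is fixed before the vector type and layout are finalized, which is what the quoted restriction amounts to.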

[petsc-users] LU factorization and solution of independent matrices does not scale, why?

2012-12-21 Thread Matthew Knepley
On Fri, Dec 21, 2012 at 10:51 AM, Thomas Witkowski wrote: > I use a modified MPICH version. On the system I use for these benchmarks I > cannot use another MPI library. > > I'm not fixed to MUMPS. Superlu_dist, for example, works also perfectly for > this. But there is still the following problem

[petsc-users] LU factorization and solution of independent matrices does not scale, why?

2012-12-21 Thread Thomas Witkowski
Okay, I did a similar benchmark now with PETSc's event logging:

UMFPACK
16p: Local solve   350 1.0 2.3025e+01 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 63  0  0  0 52  63  0  0  0 51     0
64p: Local solve   350 1.0 2.3208e+01 1.1 5.00e+04 1.0 0.0e+00 0.0e+00 7.0e+02 60  0  0  0
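
For context, the "Local solve" line above is a user-registered logging event; a minimal sketch of how such an event is typically registered and used follows (function and event names are illustrative, not from Thomas's code):

#include <petscksp.h>

/* Sketch: time the independent coarse solves under a user-registered event so
 * they appear as their own line in -log_summary. */
PetscErrorCode TimedLocalSolve(KSP coarse_ksp, Vec rhs, Vec sol)
{
  PetscErrorCode       ierr;
  static PetscLogEvent LOCAL_SOLVE;
  static PetscBool     registered = PETSC_FALSE;

  PetscFunctionBegin;
  if (!registered) {
    ierr = PetscLogEventRegister("Local solve", KSP_CLASSID, &LOCAL_SOLVE);CHKERRQ(ierr);
    registered = PETSC_TRUE;
  }
  ierr = PetscLogEventBegin(LOCAL_SOLVE, 0, 0, 0, 0);CHKERRQ(ierr);
  ierr = KSPSolve(coarse_ksp, rhs, sol);CHKERRQ(ierr);
  ierr = PetscLogEventEnd(LOCAL_SOLVE, 0, 0, 0, 0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}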

[petsc-users] PetscKernel_A_gets_inverse_A_

2012-12-21 Thread Aldo Bonfiglioli
Dear all, would it be possible to have a unified interface (also Fortran callable) to the PetscKernel_A_gets_inverse_A_ routines? I find them very useful within my own piece of Fortran code to solve small dense linear systems (which I have to do very frequently). I have my own interface, at present,
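
As an illustration of the kind of small dense solve being described, here is a minimal sketch built on the LAPACK wrappers PETSc ships in petscblaslapack.h. The helper name and error handling are assumptions, and this is not the PetscKernel_A_gets_inverse_A_ interface itself (which, as the name suggests, overwrites A with its inverse).

#include <petscsys.h>
#include <petscblaslapack.h>

/* Sketch: factor a small n-by-n dense matrix (column-major) and solve
 * A x = b, overwriting b with the solution. */
PetscErrorCode SmallDenseSolve(PetscBLASInt n, PetscScalar *a, PetscScalar *b)
{
  PetscErrorCode ierr;
  PetscBLASInt   lda = n, ldb = n, nrhs = 1, info, *ipiv;

  PetscFunctionBegin;
  ierr = PetscMalloc(n*sizeof(PetscBLASInt), &ipiv);CHKERRQ(ierr);
  LAPACKgetrf_(&n, &n, a, &lda, ipiv, &info);                   /* LU factorization */
  if (info) SETERRQ1(PETSC_COMM_SELF, PETSC_ERR_LIB, "GETRF failed, info = %d", (int)info);
  LAPACKgetrs_("N", &n, &nrhs, a, &lda, ipiv, b, &ldb, &info);  /* forward/back substitution */
  if (info) SETERRQ1(PETSC_COMM_SELF, PETSC_ERR_LIB, "GETRS failed, info = %d", (int)info);
  ierr = PetscFree(ipiv);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}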

[petsc-users] LU factorization and solution of independent matrices does not scale, why?

2012-12-21 Thread Jed Brown
seeing.

[petsc-users] LU factorization and solution of independent matrices does not scale, why?

2012-12-21 Thread Jed Brown
> communicator, which is a subset of MPI::COMM_WORLD. The LU factorization of
> the matrices is computed with either MUMPS or superlu_dist, but both show
> some scaling property I really wonder of: When the overall problem size is
> increased, the solve with the LU factorization of the local matrices does
> not scale! But why not? I just increase the number of local matrices, but
> all of them are independent of each other. Some example: I use 64 cores,
> each coarse matrix is spanned by 4 cores so there are 16 MPI communicators
> with 16 coarse space matrices. The problem need to solve 192 times with the
> coarse space systems, and this takes together 0.09 seconds. Now I increase
> the number of cores to 256, but let the local coarse space be defined again
> on only 4 cores. Again, 192 solutions with these coarse spaces are
> required, but now this takes 0.24 seconds. The same for 1024 cores, and we
> are at 1.7 seconds for the local coarse space solver!
>
> For me, this is a total mystery! Any idea how to explain, debug and
> eventually how to resolve this problem?
>
> Thomas

[petsc-users] VecSetBlockSize with release 3.3

2012-12-21 Thread Matthew Knepley
On Fri, Dec 21, 2012 at 7:04 AM, Aldo Bonfiglioli wrote: > Dear all, > I am in the process of upgrading from 3.2 to 3.3. > > I am a little bit puzzled by the following change: >> VecSetBlockSize() cannot be called after VecCreateSeq() or >> VecCreateMPI() and must be called before VecSetUp() or >

[petsc-users] LU factorization and solution of independent matrices does not scale, why?

2012-12-21 Thread Jed Brown
gt;> cores, >>>>> each coarse matrix is spanned by 4 cores so there are 16 MPI >>>>> communicators >>>>> with 16 coarse space matrices. The problem need to solve 192 times >>>>> with the >>>>> coarse space systems, and this takes together 0.09 seconds. Now I >>>>> increase >>>>> the number of cores to 256, but let the local coarse space be defined >>>>> again >>>>> on only 4 cores. Again, 192 solutions with these coarse spaces are >>>>> required, but now this takes 0.24 seconds. The same for 1024 cores, >>>>> and we >>>>> are at 1.7 seconds for the local coarse space solver! >>>>> >>>>> For me, this is a total mystery! Any idea how to explain, debug and >>>>> eventually how to resolve this problem? >>>>> >>>>> Thomas >>>>> >>>> >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> > > -- next part -- An HTML attachment was scrubbed... URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20121221/8968e584/attachment-0001.html>