You are not using hypre; you are using block Jacobi with ILU on
the blocks.
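
As a side note, here is a minimal sketch of the runtime options involved
(assuming your code calls KSPSetFromOptions() so that command-line options
take effect). The parallel default you are currently getting is equivalent to

    -pc_type bjacobi -sub_pc_type ilu

while actually using hypre's BoomerAMG would be

    -pc_type hypre -pc_hypre_type boomeramg

You can run with -ksp_view to confirm which preconditioner is really
being used.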
The number of iterations goes from around 4000 with 4 processes to
around 5000 with 8 (compare the MatMult counts in the two logs). Block
Jacobi gets weaker as the number of blocks grows, and this growth in
iterations is why you do not see a better speedup.
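
If you want to see the growth directly, a small sketch in C (assuming a
KSP object ksp and vectors b and x as in your existing code, with the
usual PetscErrorCode ierr error checking):

    PetscInt its;
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    /* Number of iterations the last solve needed; comparing this on
       4 and 8 processes shows the increase described above. */
    ierr = KSPGetIterationNumber(ksp, &its);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "KSP iterations: %D\n", its);CHKERRQ(ierr);

Running with -ksp_monitor or -ksp_converged_reason gives the same
information without changing the code.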
Barry
On Jun 6, 2008, at 8:07 PM, Ben Tay wrote:
> Hi,
>
> I have coded in parallel using PETSc and Hypre. I found that going
> from 1 to 4 processors gives an almost 4-fold speedup, but going
> from 4 to 8 processors improves performance only by a factor of
> 1.2-1.5 instead of 2.
>
> Is the slowdown due to the matrix not being large enough? Currently
> I am using a 600x2160 matrix for the benchmark. Even when I increase
> the matrix size to 900x3240 or 1200x2160, the performance gain is
> not much better. Is it possible to use -log_summary to find out the
> problem? I have attached the -log_summary output for the 4- and
> 8-processor runs. Some events, like VecScatterEnd, VecNorm and
> MatAssemblyBegin, have much higher ratios. Does that indicate
> something? Another strange thing is that MatAssemblyBegin has a much
> higher ratio for the 4-processor run than for the 8-processor run.
> I thought there should be less communication in the 4-processor
> case, so the ratio should be lower. Does it mean there is a
> communication problem at that point?
>
> Thank you very much.
>
> Regards
>
>
> ************************************************************************************************************************
> ***        WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document              ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./a.out on a atlas3-mp named atlas3-c43 with 4 processors, by g0306332 Fri Jun 6 17:29:26 2008
> Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b
>
> Max Max/Min Avg Total
> Time (sec): 1.750e+03 1.00043 1.750e+03
> Objects: 4.200e+01 1.00000 4.200e+01
> Flops: 6.961e+10 1.00074 6.959e+10 2.784e+11
> Flops/sec: 3.980e+07 1.00117 3.978e+07 1.591e+08
> MPI Messages: 8.168e+03 2.00000 6.126e+03 2.450e+04
> MPI Message Lengths: 5.525e+07 2.00000 6.764e+03 1.658e+08
> MPI Reductions: 3.203e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>                           and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:  ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                       Avg     %Total     Avg     %Total    counts  %Total     Avg        %Total    counts  %Total
>  0: Main Stage: 1.7495e+03 100.0%  2.7837e+11 100.0%  2.450e+04 100.0%  6.764e+03 100.0%  1.281e+04 100.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops/sec: Max - maximum over all processors
>                        Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
>
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was run without the PreLoadBegin() #
> # macros. To get timing results we always recommend #
> # preloading. otherwise timing numbers may be #
> # meaningless. #
> ##########################################################
>
>
> Event               Count     Time (sec)       Flops/sec                        --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max        Ratio  Max      Ratio  Mess    Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult             4082 1.0 8.2037e+01 1.5 4.67e+08 1.5 2.4e+04 6.8e+03 0.0e+00  4 37100100  0   4 37100100  0  1240
> MatSolve            1976 1.0 1.3250e+02 1.5 2.52e+08 1.5 0.0e+00 0.0e+00 0.0e+00  6 31  0  0  0   6 31  0  0  0   655
> MatLUFactorNum       300 1.0 3.8260e+01 1.2 2.07e+08 1.2 0.0e+00 0.0e+00 0.0e+00  2  9  0  0  0   2  9  0  0  0   668
> MatILUFactorSym        1 1.0 2.2550e-01 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatConvert             1 1.0 2.9182e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin     301 1.0 1.0776e+02 1228.9 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+02  4  0  0  0  5   4  0  0  0  5     0
> MatAssemblyEnd       301 1.0 9.6146e+00 1.1 0.00e+00 0.0 1.2e+01 3.6e+03 3.1e+02  1  0  0  0  2   1  0  0  0  2     0
> MatGetRow         324000 1.0 1.2161e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ            3 1.0 5.0068e-06 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         1 1.0 2.1279e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSetup             601 1.0 2.5108e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve             600 1.0 1.2353e+03 1.0 5.64e+07 1.0 2.4e+04 6.8e+03 8.3e+03 71100100100 65  71100100100 65   225
> PCSetUp              601 1.0 4.0116e+01 1.2 1.96e+08 1.2 0.0e+00 0.0e+00 5.0e+00  2  9  0  0  0   2  9  0  0  0   637
> PCSetUpOnBlocks      300 1.0 3.8513e+01 1.2 2.06e+08 1.2 0.0e+00 0.0e+00 3.0e+00  2  9  0  0  0   2  9  0  0  0   664
> PCApply             4682 1.0 1.0566e+03 1.0 2.12e+07 1.0 0.0e+00 0.0e+00 0.0e+00 59 31  0  0  0  59 31  0  0  0    82
> VecDot              4812 1.0 8.2762e+00 1.1 4.00e+08 1.1 0.0e+00 0.0e+00 4.8e+03  0  4  0  0 38   0  4  0  0 38  1507
> VecNorm             3479 1.0 9.2739e+01 8.3 3.15e+08 8.3 0.0e+00 0.0e+00 3.5e+03  4  5  0  0 27   4  5  0  0 27   152
> VecCopy              900 1.0 2.0819e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              5882 1.0 9.4626e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY             5585 1.0 1.5397e+01 1.5 4.67e+08 1.5 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0  1273
> VecAYPX             2879 1.0 1.0303e+01 1.6 4.45e+08 1.6 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0  1146
> VecWAXPY            2406 1.0 7.7902e+00 1.6 3.14e+08 1.6 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   801
> VecAssemblyBegin    1200 1.0 8.4259e+00 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+03  0  0  0  0 28   0  0  0  0 28     0
> VecAssemblyEnd      1200 1.0 2.4173e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin     4082 1.0 1.2512e-01 1.5 0.00e+00 0.0 2.4e+04 6.8e+03 0.0e+00  0  0100100  0   0  0100100  0     0
> VecScatterEnd       4082 1.0 2.0954e+01 53.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory   Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
> Matrix 7 7 321241092 0
> Krylov Solver 3 3 8 0
> Preconditioner 3 3 528 0
> Index Set 7 7 7785600 0
> Vec 20 20 46685344 0
> Vec Scatter 2 2 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 1.90735e-07
> Average time for MPI_Barrier(): 1.45912e-05
> Average time for zero size MPI_Send(): 7.27177e-06
> OptionTable: -log_summary test4_600
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
> Configure run at: Tue Jan 8 22:22:08 2008
> Configure options: --with-memcmp-ok --sizeof_char=1 --sizeof_void_p=8 --sizeof_short=2 --sizeof_int=4 --sizeof_long=8 --sizeof_long_long=8 --sizeof_float=4 --sizeof_double=8 --bits_per_byte=8 --sizeof_MPI_Comm=4 --sizeof_MPI_Fint=4 --with-vendor-compilers=intel --with-x=0 --with-hypre-dir=/home/enduser/g0306332/lib/hypre --with-debugging=0 --with-batch=1 --with-mpi-shared=0 --with-mpi-include=/usr/local/topspin/mpi/mpich/include --with-mpi-lib=/usr/local/topspin/mpi/mpich/lib/libmpich.a --with-mpirun=/usr/local/topspin/mpi/mpich/bin/mpirun --with-blas-lapack-dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared=0
> -----------------------------------------
> Libraries compiled on Tue Jan 8 22:34:13 SGT 2008 on atlas3-c01
> Machine characteristics: Linux atlas3-c01 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
> Using PETSc directory: /nfs/home/enduser/g0306332/petsc-2.3.3-p8
> Using PETSc arch: atlas3-mpi
> -----------------------------------------
> Using C compiler: mpicc -fPIC -O
> Using Fortran compiler: mpif90 -I. -fPIC -O
> -----------------------------------------
> Using include paths: -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8 -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8/bmake/atlas3-mpi -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8/include -I/home/enduser/g0306332/lib/hypre/include -I/usr/local/topspin/mpi/mpich/include
> ------------------------------------------
> Using C linker: mpicc -fPIC -O
> Using Fortran linker: mpif90 -I. -fPIC -O
> Using libraries: -Wl,-rpath,/nfs/home/enduser/g0306332/petsc-2.3.3-p8/lib/atlas3-mpi -L/nfs/home/enduser/g0306332/petsc-2.3.3-p8/lib/atlas3-mpi
> -lpetscts -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
> -Wl,-rpath,/home/enduser/g0306332/lib/hypre/lib -L/home/enduser/g0306332/lib/hypre/lib -lHYPRE
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64
> -Wl,-rpath,/usr/local/topspin/mpi/mpich/lib -L/usr/local/topspin/mpi/mpich/lib -lmpich
> -Wl,-rpath,/opt/intel/cmkl/8.1.1/lib/em64t -L/opt/intel/cmkl/8.1.1/lib/em64t -lmkl_lapack -lmkl_em64t -lguide -lpthread
> -Wl,-rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/lib -ldl -lmpich -libverbs -libumad -lpthread -lrt
> -Wl,-rpath,/opt/intel/cce/9.1.049/lib -L/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/
> -Wl,-rpath,/usr/lib64 -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -lmpichf90nc
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64
> -Wl,-rpath,/opt/intel/fce/9.1.045/lib -L/opt/intel/fce/9.1.045/lib -lifport -lifcore -lm
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lm
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/lib -ldl -lmpich -Wl,-rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 -libverbs -libumad -lpthread -lrt
> -Wl,-rpath,/opt/intel/cce/9.1.049/lib -L/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -ldl -lc
> ------------------------------------------
> ************************************************************************************************************************
> ***        WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document              ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./a.out on a atlas3-mp named atlas3-c18 with 8 processors, by g0306332 Fri Jun 6 17:23:25 2008
> Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b
>
> Max Max/Min Avg Total
> Time (sec): 1.140e+03 1.00019 1.140e+03
> Objects: 4.200e+01 1.00000 4.200e+01
> Flops: 4.620e+10 1.00158 4.619e+10 3.695e+11
> Flops/sec: 4.053e+07 1.00177 4.051e+07 3.241e+08
> MPI Messages: 9.954e+03 2.00000 8.710e+03 6.968e+04
> MPI Message Lengths: 7.224e+07 2.00000 7.257e+03 5.057e+08
> MPI Reductions: 1.716e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>                           and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:  ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                       Avg     %Total     Avg     %Total    counts  %Total     Avg        %Total    counts  %Total
>  0: Main Stage: 1.1402e+03 100.0%  3.6953e+11 100.0%  6.968e+04 100.0%  7.257e+03 100.0%  1.372e+04 100.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops/sec: Max - maximum over all processors
>                        Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
>
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was run without the PreLoadBegin() #
> # macros. To get timing results we always recommend #
> # preloading. otherwise timing numbers may be #
> # meaningless. #
> ##########################################################
>
>
> Event               Count     Time (sec)       Flops/sec                        --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max        Ratio  Max      Ratio  Mess    Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult             4975 1.0 7.8154e+01 1.9 4.19e+08 1.9 7.0e+04 7.3e+03 0.0e+00  5 38100100  0   5 38100100  0  1798
> MatSolve            2855 1.0 1.0870e+02 1.8 2.57e+08 1.8 0.0e+00 0.0e+00 0.0e+00  7 34  0  0  0   7 34  0  0  0  1153
> MatLUFactorNum       300 1.0 2.3238e+01 1.5 2.07e+08 1.5 0.0e+00 0.0e+00 0.0e+00  2  7  0  0  0   2  7  0  0  0  1099
> MatILUFactorSym        1 1.0 6.1973e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatConvert             1 1.0 1.4168e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin     301 1.0 6.9683e+01 8.6 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+02  4  0  0  0  4   4  0  0  0  4     0
> MatAssemblyEnd       301 1.0 6.2247e+00 1.2 0.00e+00 0.0 2.8e+01 3.6e+03 3.1e+02  0  0  0  0  2   0  0  0  0  2     0
> MatGetRow         162000 1.0 6.0330e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ            3 1.0 9.0599e-06 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         1 1.0 5.6710e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSetup             601 1.0 1.5631e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve             600 1.0 8.1668e+02 1.0 5.66e+07 1.0 7.0e+04 7.3e+03 9.2e+03 72100100100 67  72100100100 67   452
> PCSetUp              601 1.0 2.4372e+01 1.5 1.93e+08 1.5 0.0e+00 0.0e+00 5.0e+00  2  7  0  0  0   2  7  0  0  0  1048
> PCSetUpOnBlocks      300 1.0 2.3303e+01 1.5 2.07e+08 1.5 0.0e+00 0.0e+00 3.0e+00  2  7  0  0  0   2  7  0  0  0  1096
> PCApply             5575 1.0 6.5344e+02 1.1 2.57e+07 1.1 0.0e+00 0.0e+00 0.0e+00 55 34  0  0  0  55 34  0  0  0   192
> VecDot              4840 1.0 6.8932e+00 1.3 3.07e+08 1.3 0.0e+00 0.0e+00 4.8e+03  1  3  0  0 35   1  3  0  0 35  1820
> VecNorm             4365 1.0 1.2250e+02 3.6 6.82e+07 3.6 0.0e+00 0.0e+00 4.4e+03  8  5  0  0 32   8  5  0  0 32   153
> VecCopy              900 1.0 1.4297e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              6775 1.0 8.1405e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY             6485 1.0 1.0003e+01 1.9 5.73e+08 1.9 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0  2420
> VecAYPX             3765 1.0 7.8289e+00 2.0 5.17e+08 2.0 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0  2092
> VecWAXPY            2420 1.0 3.8504e+00 1.9 3.80e+08 1.9 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1629
> VecAssemblyBegin    1200 1.0 9.2808e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+03  1  0  0  0 26   1  0  0  0 26     0
> VecAssemblyEnd      1200 1.0 2.3313e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin     4975 1.0 2.2727e-01 2.6 0.00e+00 0.0 7.0e+04 7.3e+03 0.0e+00  0  0100100  0   0  0100100  0     0
> VecScatterEnd       4975 1.0 2.7557e+01 68.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory   Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
> Matrix 7 7 160595412 0
> Krylov Solver 3 3 8 0
> Preconditioner 3 3 528 0
> Index Set 7 7 3897600 0
> Vec 20 20 23357344 0
> Vec Scatter 2 2 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 1.19209e-07
> Average time for MPI_Barrier(): 2.10285e-05
> Average time for zero size MPI_Send(): 7.59959e-06
> OptionTable: -log_summary test8_600
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
> Configure run at: Tue Jan 8 22:22:08 2008
> Configure options: --with-memcmp-ok --sizeof_char=1 --sizeof_void_p=8 --sizeof_short=2 --sizeof_int=4 --sizeof_long=8 --sizeof_long_long=8 --sizeof_float=4 --sizeof_double=8 --bits_per_byte=8 --sizeof_MPI_Comm=4 --sizeof_MPI_Fint=4 --with-vendor-compilers=intel --with-x=0 --with-hypre-dir=/home/enduser/g0306332/lib/hypre --with-debugging=0 --with-batch=1 --with-mpi-shared=0 --with-mpi-include=/usr/local/topspin/mpi/mpich/include --with-mpi-lib=/usr/local/topspin/mpi/mpich/lib/libmpich.a --with-mpirun=/usr/local/topspin/mpi/mpich/bin/mpirun --with-blas-lapack-dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared=0
> -----------------------------------------
> Libraries compiled on Tue Jan 8 22:34:13 SGT 2008 on atlas3-c01
> Machine characteristics: Linux atlas3-c01 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
> Using PETSc directory: /nfs/home/enduser/g0306332/petsc-2.3.3-p8
> Using PETSc arch: atlas3-mpi
> -----------------------------------------
> Using C compiler: mpicc -fPIC -O
> Using Fortran compiler: mpif90 -I. -fPIC -O
> -----------------------------------------
> Using include paths: -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8 -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8/bmake/atlas3-mpi -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8/include -I/home/enduser/g0306332/lib/hypre/include -I/usr/local/topspin/mpi/mpich/include
> ------------------------------------------
> Using C linker: mpicc -fPIC -O
> Using Fortran linker: mpif90 -I. -fPIC -O
> Using libraries: -Wl,-rpath,/nfs/home/enduser/g0306332/petsc-2.3.3-p8/lib/atlas3-mpi -L/nfs/home/enduser/g0306332/petsc-2.3.3-p8/lib/atlas3-mpi
> -lpetscts -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
> -Wl,-rpath,/home/enduser/g0306332/lib/hypre/lib -L/home/enduser/g0306332/lib/hypre/lib -lHYPRE
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64
> -Wl,-rpath,/usr/local/topspin/mpi/mpich/lib -L/usr/local/topspin/mpi/mpich/lib -lmpich
> -Wl,-rpath,/opt/intel/cmkl/8.1.1/lib/em64t -L/opt/intel/cmkl/8.1.1/lib/em64t -lmkl_lapack -lmkl_em64t -lguide -lpthread
> -Wl,-rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/lib -ldl -lmpich -libverbs -libumad -lpthread -lrt
> -Wl,-rpath,/opt/intel/cce/9.1.049/lib -L/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/
> -Wl,-rpath,/usr/lib64 -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -lmpichf90nc
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64
> -Wl,-rpath,/opt/intel/fce/9.1.045/lib -L/opt/intel/fce/9.1.045/lib -lifport -lifcore -lm
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lm
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64
> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/lib -ldl -lmpich -Wl,-rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 -libverbs -libumad -lpthread -lrt
> -Wl,-rpath,/opt/intel/cce/9.1.049/lib -L/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -ldl -lc
> ------------------------------------------