[OMPI users] trouble_MPI
Dear Madam/Sir,

I have a serial Fortran (f90) code that deals with matrix-diagonalization subroutines, and I recently wrote a parallel version to speed up parts that are not feasible with the serial program. I initialize MPI in the code with the following calls:

---
call MPI_INIT(ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, p, ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)

CPU requirement >> pmem=1500mb,nodes=5:ppn=8 <<
---

Everything looks OK when the matrix dimensions are smaller than 1000x1000. When I increase the matrix dimensions to larger values, the parallel code fails with the following error:

--
mpirun noticed that process rank 6 with PID 1566 on node node1082 exited on signal 11 (Segmentation fault)
--

There is no such error with the serial version, even for matrix dimensions larger than 2400x2400. I then thought the problem might be caused by the number of nodes and the amount of memory I was requesting, so I changed the request to

pmem=10gb,nodes=20:ppn=2

which is more or less similar to what I use for serial jobs (mem=10gb,nodes=1:ppn=1). The problem still persists. Is there a limit on the amount of data MPI subroutines can transfer, or could the problem have some other cause?

Best Regards,
Mohammad
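For context, a minimal, self-contained sketch of the initialization described above might look like the following. The variable names (p, my_rank, ierr) come from the quoted snippet; the matrix name a, the dimension n, and the use of an ALLOCATABLE (heap) array are assumptions added for illustration, not part of the original code.

    program init_sketch
      use mpi
      implicit none
      integer :: ierr, p, my_rank, n
      double precision, allocatable :: a(:,:)

      ! Initialization as described in the post above
      call MPI_INIT(ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, p, ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)

      n = 2400                   ! example dimension mentioned in the post
      allocate(a(n, n))          ! heap allocation; large local (stack) arrays
      a = 0.0d0                  ! are a frequent cause of segfaults

      if (my_rank == 0) print *, 'running on ', p, ' processes'

      deallocate(a)
      call MPI_FINALIZE(ierr)
    end program init_sketch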
Re: [OMPI users] trouble_MPI
As the error states, your code is segfaulting. Your best bet for finding out where is to run a debugger (e.g., gdb) on the core dump, or to use a parallel debugger if you have one.

On Sep 18, 2012, at 2:14 PM, Alidoust wrote:

> mpirun noticed that process rank 6 with PID 1566 on node node1082 exited on signal 11 (Segmentation fault)
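One common way to act on this advice in an MPI job (a sketch, not something from the reply above) is to have every rank print its PID and make the suspect rank spin until a debugger attaches. The subroutine and variable names are made up for illustration, and getpid() and sleep() are gfortran extensions.

    ! Call early in the program, e.g. right after MPI_COMM_RANK.
    subroutine pause_for_debugger(my_rank, rank_to_debug)
      implicit none
      integer, intent(in) :: my_rank, rank_to_debug
      integer, volatile   :: hold

      print *, 'rank ', my_rank, ' has PID ', getpid()
      if (my_rank == rank_to_debug) then
         hold = 1
         do while (hold /= 0)    ! in gdb: attach <PID>, then "set var hold = 0"
            call sleep(1)
         end do
      end if
    end subroutine pause_for_debugger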
Re: [OMPI users] trouble_MPI
On Tue, Sep 18, 2012 at 2:14 PM, Alidoust wrote:

> Is there a limit on the amount of data MPI subroutines can transfer, or could the problem have some other cause?

I believe the send/recv/bcast calls are all limited to sending 2 GB of data, since they use a signed 32-bit integer to denote the size. If your matrices require a lot of space per element, I suppose this limit could be reached.

Brian
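If that limit turns out to be the issue, one possible workaround (a sketch, not part of the thread; it assumes double-precision data, and the routine and variable names are hypothetical) is to split a large buffer into pieces so that no single call's count exceeds the range of a default 32-bit integer. The receiving side would have to post matching chunked receives.

    subroutine send_in_chunks(buf, nelem, dest, comm)
      use mpi
      implicit none
      integer(kind=8),  intent(in) :: nelem
      double precision, intent(in) :: buf(nelem)
      integer,          intent(in) :: dest, comm
      integer          :: ierr
      integer(kind=8)  :: offset, this_chunk
      integer(kind=8), parameter :: chunk = 100000000_8   ! well below 2**31 - 1

      offset = 0_8
      do while (offset < nelem)
         this_chunk = min(chunk, nelem - offset)
         call MPI_SEND(buf(offset+1), int(this_chunk), MPI_DOUBLE_PRECISION, &
                       dest, 0, comm, ierr)
         offset = offset + this_chunk
      end do
    end subroutine send_in_chunks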