Dear all,

I am not sure if this is the right forum for this question, so my apologies if it is not. I am using ScaLAPACK, and of course MPI (Open MPI), in an electromagnetic solver program running on a cluster. I see very strange behaviour when I run my code on a large number of processors for very large problems. In these cases the computation itself finishes successfully, but the job then hangs until the wall time limit is exceeded and it is killed by the queue manager (I use qsub to submit jobs). This happens when, for example, I use more than 80 processors for a problem which needs more than 700 GB of memory. For smaller problems everything is fine and all output files are generated correctly, whereas when the hang occurs the output files are empty. I am almost sure that there is a synchronization problem and that some processes fail to reach the finalization point while others are done.
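For reference, this is the kind of per-rank trace that would show which processes actually reach the end before shutdown (just a sketch using plain MPI calls, not code from my solver; the function name is made up for this example):

#include <mpi.h>
#include <iostream>

// Sketch of a per-rank trace before finalization: every rank reports that it
// reached this point, so a hung rank shows up as a missing line in stderr.
void report_reached_finalize()
{
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char host[MPI_MAX_PROCESSOR_NAME];
    int len = 0;
    MPI_Get_processor_name(host, &len);

    std::cerr << "rank " << rank << " on " << host
              << " reached finalization\n" << std::flush;
}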
My code is written in C++, and in "main" I call a routine called "Solver". My "Solver" function looks like this:

void Solver()
{
    for (std::vector<double>::iterator ti = times.begin(); ti != times.end(); ++ti)
    {
        Stopwatch iwatch, dwatch, twatch;

        // some ScaLAPACK operations

        if (iamroot())
        {
            // some operations for the root process only
        }
    }

    blacs::gridexit(ictxt);
    blacs::exit(1);
}

and my "main" function, which calls "Solver", looks like this:

int main()
{
    // some preparatory operations

    Solver();

    if (rank == 0)
        std::cout << "Total execution time: " << time.tick() << " s\n" << std::flush;

    err = MPI_Finalize();
    if (MPI_SUCCESS != err)
    {
        std::cerr << "MPI_Finalize failed: " << err << "\n";
        return err;
    }

    return 0;
}

I did put a "blacs::barrier(ictxt, 'A')" at the end of the "Solver" routine, just before the call to "blacs::exit(1)", to make sure that all processes arrive there before MPI_Finalize, but that did not solve the problem.

Do you have any idea where the problem is?

Thanks in advance,

--
Danesh Daroui
Ph.D Student
Lulea University of Technology
http://www.ltu.se
danesh.dar...@ltu.se
+46-704-399847
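P.S. For completeness, this is how the end of "Solver" looks with the barrier I tried, pulled out here into a small helper just for readability (the helper name "finalize_blacs" is only for this sketch; in my code these lines are inline at the end of "Solver"):

// Sketch only: the shutdown sequence at the end of Solver(), with the barrier added.
void finalize_blacs(int ictxt)
{
    blacs::barrier(ictxt, 'A');  // 'A' scope: every process in the grid waits here
    blacs::gridexit(ictxt);      // release the BLACS grid/context
    blacs::exit(1);              // nonzero argument: MPI stays alive for MPI_Finalize in main()
}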