Hi,
You should consult the CPMD manual on how to run the program in parallel - this doesn't look like a problem in Open MPI. The error comes from MPI_ABORT being called by rank 0. As the rank 0 process is the one that reads all the input data and prepares the computation, I would say that the most probable reason for the crash is an inconsistency in the program input. It could be that some of the parameters specified there are not compatible with running the program with 4 processes. It can also happen (at least with some DFT codes) if you try to continue a previous simulation that was performed on a different number of processes. Quantum Espresso also uses a similar technique to abort, but at least it prints a cryptic error message before the crash :)

Hope that helps!

Kind regards,
Hristo

--
Hristo Iliev, Ph.D. -- High Performance Computing
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Abhra Paul
Sent: Thursday, July 19, 2012 1:35 PM
To: us...@open-mpi.org
Subject: [OMPI users] mpirun command gives ERROR

Respected developers and users,

I am trying to run the parallel program CPMD with the command "/usr/local/bin/mpirun -np 4 ./cpmd.x 1-h2-wave.inp > 1-h2-wave.out &", and it is giving the following error:

======================================================================================================
[testcpmd@slater CPMD_3_15_3]$ /usr/local/bin/mpirun -np 4 ./cpmd.x 1-h2-wave.inp > 1-h2-wave.out &
[1] 1769
[testcpmd@slater CPMD_3_15_3]$
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 999.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 1770 on
node slater.rcamos.iacs exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[1]+  Exit 231  /usr/local/bin/mpirun -np 4 ./cpmd.x 1-h2-wave.inp > 1-h2-wave.out
======================================================================================================

I am unable to find out the reason of that error. Please help. My Open MPI version is 1.6.

With regards,
Abhra Paul
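To illustrate the pattern described in the reply, here is a minimal sketch in C of how an MPI application can produce exactly this kind of output: rank 0 reads and validates the input, and on finding an inconsistency calls MPI_Abort with a non-zero error code, which makes mpirun kill all ranks and print the "MPI_ABORT was invoked on rank 0" notice. This is not CPMD's actual source; input_is_consistent is a hypothetical placeholder for whatever checks the real input parser performs.

/* sketch of a rank-0 input check that aborts the whole job */
#include <mpi.h>
#include <stdio.h>

/* Hypothetical stand-in for the program's input validation. */
static int input_is_consistent(const char *filename, int nprocs)
{
    (void)filename;          /* a real check would parse the file here */
    return nprocs > 0;       /* placeholder logic only */
}

int main(int argc, char **argv)
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {
        const char *input = (argc > 1) ? argv[1] : "input.inp";
        if (!input_is_consistent(input, nprocs)) {
            fprintf(stderr, "inconsistent input, aborting\n");
            /* Kills ALL ranks; no rank reaches MPI_Finalize, so mpirun
             * also reports the "exiting improperly" condition. */
            MPI_Abort(MPI_COMM_WORLD, 999);
        }
    }

    /* ... the actual computation would go here ... */

    MPI_Finalize();
    return 0;
}

Running this with "mpirun -np 4 ./a.out bad.inp" and a failing check would reproduce an abort message with errorcode 999 like the one quoted above.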