Hi,


You should consult the CPMD manual on how to run the program in parallel -
this doesn't look like a problem in Open MPI. The error comes from MPI_ABORT
being called by rank 0. Since the rank 0 process is the one that reads all the
input data and prepares the computation, I would say that the most probable
reason for the crash is an inconsistency in the program input. It could be that
some of the parameters specified there are not compatible with running the
program on 4 processes. It can also happen (at least with some DFT codes)
if you try to continue a previous simulation that was performed on a different
number of processes. Quantum ESPRESSO uses a similar technique to abort,
but at least it prints a cryptic error message before the crash :)
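
For illustration, here is a minimal sketch (in C, and of course not CPMD's
actual code) of the pattern I am describing: rank 0 reads and validates the
input and calls MPI_Abort with an application-chosen error code (999 here, to
match your output) when it finds a problem. The input file name and the check
itself are made up; the point is only that such an abort on rank 0 produces
exactly the MPI_ABORT banner you see.

/* Minimal sketch, not CPMD's actual code: rank 0 validates the input and
 * calls MPI_Abort with a program-specific error code when the input is
 * inconsistent with the requested number of processes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {
        /* Hypothetical input check: real codes do far more validation. */
        FILE *fp = fopen("input.inp", "r");
        int input_ok = (fp != NULL);
        if (fp) fclose(fp);

        if (!input_ok) {
            fprintf(stderr, "ERROR: input inconsistent with %d processes\n",
                    nprocs);
            /* This call is what produces the "MPI_ABORT was invoked on
             * rank 0" banner printed by Open MPI's mpirun. */
            MPI_Abort(MPI_COMM_WORLD, 999);
        }
    }

    /* ... normal computation would follow here ... */

    MPI_Finalize();
    return 0;
}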



Hope that helps!



Kind regards,

Hristo

--

Hristo Iliev, Ph.D. -- High Performance Computing

RWTH Aachen University, Center for Computing and Communication

Rechen- und Kommunikationszentrum der RWTH Aachen

Seffenter Weg 23, D-52074 Aachen (Germany)

Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367



From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Abhra Paul
Sent: Thursday, July 19, 2012 1:35 PM
To: us...@open-mpi.org
Subject: [OMPI users] mpirun command gives ERROR



Respected developers and users



I am trying to run the parallel program CPMD with the command
"/usr/local/bin/mpirun -np 4 ./cpmd.x 1-h2-wave.inp > 1-h2-wave.out &", but it
is giving the following error:

==============================================================================



[testcpmd@slater CPMD_3_15_3]$ /usr/local/bin/mpirun -np 4 ./cpmd.x
1-h2-wave.inp > 1-h2-wave.out &
[1] 1769
[testcpmd@slater CPMD_3_15_3]$
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 999.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 1770 on
node slater.rcamos.iacs exiting improperly. There are two reasons this could
occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

[1]+  Exit 231                /usr/local/bin/mpirun -np 4 ./cpmd.x
1-h2-wave.inp > 1-h2-wave.out
==============================================================================

I am unable to find out the reason for this error. Please help. My Open MPI
version is 1.6.



With regards

Abhra Paul
