Hi Tim,
Ok, thank you for all these clarifications. I also added "static int
pls_poe_cancel_operation(void)" as in your patch, and I could continue the
compilation. But I ran into another problem: in ompi/mpi/cxx/mpicxx.cc,
three constants clash with names that are already defined, because the
preprocessor replaces SEEK_SET, SEEK_CUR and SEEK_END with the
corresponding C constants. So I commented out these lines:
//const int SEEK_SET = MPI_SEEK_SET;
//const int SEEK_CUR = MPI_SEEK_CUR;
//const int SEEK_END = MPI_SEEK_END;
After that, I managed to compile Open MPI. I didn't try to launch it
in rsh mode, but I did try to launch it with POE.
But first, let me recall my experience with Open MPI 1.1.x on IBM. My
machine has some restrictions, but I have two ways of launching an
application:
- interactive mode: Open MPI didn't work in this mode. I got this error:
$ export MP_PROCS=2
$ mpiexec -n 2 myprog.exe
ERROR: 0031-125 Fewer nodes (1) specified in
/tmpdir/inter/int.ssos181-130093928631562/a-UWUb than tasks (2).
I think this is due to my machine's configuration.
- batch mode (for queuing): Open MPI worked, but some functions didn't
work (MPI_Comm_spawn, for example). Communication performance also seemed
very poor between nodes (within a node, though, it matched the performance
of the vendor's MPI).
I hoped Open MPI 1.2.xxx would work on the SP4, but I have the same problem in
interactive mode. And in batch mode, I get this error:
[0,0,0] ORTE_ERROR_LOG: Not implemented in file errmgr_hnp.c at line 90
--------------------------------------------------------------------------
mpiexec was unable to cleanly terminate the daemons for this job.
Returned value Not implemented instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
I think that, as you said before, POE support isn't implemented yet.
I was interested in Open MPI because it supports MPI-2. Since Open MPI
1.1.1, I have installed every version on my SP4 for testing. My impressions are:
- it seems very difficult for the developers to port Open MPI to the
SP4, and I hope they succeed one day ;)
- in my context, my institution puts many restrictions on the use of our
machine, which is why my tests are incomplete. (Likewise, the rsh
command is forbidden between our nodes...)
So, many thanks for your explanations and clarifications.
Best Regards,
**************************************
NGUYEN Anh-Khai Laurent
User Support Team
Email: laurent.ngu...@idris.fr
Tel: 01.69.35.85.66
Address: IDRIS - Institut du Développement et des Ressources en
Informatique Scientifique
CNRS
Batiment 506
BP 167
F - 91403 ORSAY Cedex
Website: http://www.idris.fr
**************************************
Tim Prins wrote:
Hi Laurent,
Unfortunately, as far as I know, none of the current Open MPI developers has
access to a system with POE, so the POE process launcher has fallen into
disrepair. Attached is a patch that should allow you to compile (however, you
may also need to add #include <signal.h> to pls_poe_module.c).
Though this should allow the compile to succeed, launching with POE may not
work (it has not been tested for quite a while). If it doesn't work, you
should use the rsh launcher instead (pass -mca pls rsh on the command line,
or set the parameter using one of the methods here:
http://www.open-mpi.org/faq/?category=tuning#setting-mca-params).
Sorry about this. We have an IBM machine at my institution which I am told
will have POE on it 'soon', but I am not sure when. Once it does, we will be
working on getting POE well supported again.
I should mention that we do use LoadLeveler on one of our machines and Open
MPI seems to work with it quite well. I would be interested in hearing how it
works for you.
Hope this helps, let me know if this works.
Thanks,
Tim
On Thursday 10 May 2007 02:57 am, Laurent Nguyen wrote:
Hello,
I tried to install Open MPI 1.2, but I ran into problems when compiling
the POE files. When Open MPI 1.2.1 was released, I saw in the bug fixes
that this problem had been fixed, so I tried again, but it still doesn't
work. The problem comes from orte/mca/pls/poe/pls_poe_module.c.
A static function, "static int pls_poe_cancel_operation(void);", is
declared but never defined in that file. I don't know whether my
configuration is causing the bug.
So, if anyone has managed to install Open MPI 1.2.1 on an IBM machine, I
would appreciate some advice.
Thank you for your help,
PS: I have attached some output files from my installation.
------------------------------------------------------------------------
Index: orte/mca/pls/poe/pls_poe_module.c
===================================================================
--- orte/mca/pls/poe/pls_poe_module.c (revision 14640)
+++ orte/mca/pls/poe/pls_poe_module.c (working copy)
@@ -37,6 +37,7 @@
#include "opal/mca/base/mca_base_param.h"
#include "opal/util/argv.h"
#include "opal/util/opal_environ.h"
+#include "opal/util/output.h"
#include "orte/mca/errmgr/errmgr.h"
#include "orte/mca/gpr/gpr.h"
@@ -69,7 +70,10 @@
static int pls_poe_signal_job(orte_jobid_t jobid, int32_t signal, opal_list_t *attrs);
static int pls_poe_signal_proc(const orte_process_name_t *name, int32_t signal);
static int pls_poe_finalize(void);
-static int pls_poe_cancel_operation(void);
+static int pls_poe_cancel_operation(void) {
+ return ORTE_ERR_NOT_IMPLEMENTED;
+}
+
orte_pls_base_module_t orte_pls_poe_module = {
pls_poe_launch_job,