Hi Tim,

OK, thank you for all these details. Like you, I added a definition for "static int pls_poe_cancel_operation(void)", and the compilation could continue. But then I ran into another problem. In ompi/mpi/cxx/mpicxx.cc, three of the names being defined already exist as macros: the preprocessor replaces them with the C library's constants. So I commented out these lines:
  //const int SEEK_SET = MPI_SEEK_SET;
  //const int SEEK_CUR = MPI_SEEK_CUR;
  //const int SEEK_END = MPI_SEEK_END;
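For context: `<stdio.h>` defines `SEEK_SET`, `SEEK_CUR` and `SEEK_END` as macros, so once that header is visible, a line such as `const int SEEK_SET = MPI_SEEK_SET;` is rewritten by the preprocessor before the compiler ever sees it (on a library where `SEEK_SET` is 0, it becomes roughly `const int 0 = ...;`). The minimal C sketch below is not Open MPI's actual fix, and the `DEMO_` names are hypothetical. It checks that the names really are macros and shows one way to avoid the clash: give the constants names the C library does not use.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdio.h>   /* defines SEEK_SET, SEEK_CUR, SEEK_END as macros */

/* If <stdio.h> did not define these as macros, there would be no clash. */
#ifndef SEEK_SET
#error "expected <stdio.h> to define SEEK_SET as a macro"
#endif

/* The C standard requires the three values to be distinct. */
static_assert(SEEK_SET != SEEK_CUR && SEEK_CUR != SEEK_END &&
              SEEK_SET != SEEK_END,
              "SEEK_* values must be distinct");

/* Hypothetical prefixed names: the preprocessor leaves these alone,
 * so the declarations compile even with <stdio.h> included. */
static const int DEMO_SEEK_SET = SEEK_SET;
static const int DEMO_SEEK_CUR = SEEK_CUR;
static const int DEMO_SEEK_END = SEEK_END;
```

Renaming keeps the constants available, whereas commenting the lines out drops those three definitions altogether.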

After that, I managed to compile OpenMPI. I didn't try launching it in rsh mode, but I did try launching it with POE.

But first, let me recall my experience with OpenMPI 1.1.x on IBM. My machine has some restrictions, but I have two ways of launching an application:
- interactive mode: OpenMPI didn't work in this mode. I got this error:
   $ export MP_PROCS=2
   $ mpiexec -n 2 myprog.exe
ERROR: 0031-125 Fewer nodes (1) specified in /tmpdir/inter/int.ssos181-130093928631562/a-UWUb than tasks (2).

 I think this is due to my machine's configuration.

- batch mode (through the queuing system): OpenMPI worked, but some functions (such as MPI_Comm_spawn) did not. Communication performance also seemed very poor, although within a single node it matched the vendor's MPI.

I then hoped that OpenMPI 1.2.xxx would work on the SP4, but I get the same problem in interactive mode. In batch mode, I get this error:
[0,0,0] ORTE_ERROR_LOG: Not implemented in file errmgr_hnp.c at line 90
--------------------------------------------------------------------------
mpiexec was unable to cleanly terminate the daemons for this job. Returned value Not implemented instead of ORTE_SUCCESS.

--------------------------------------------------------------------------

I think it is as you said before: POE isn't implemented yet.

I was interested in OpenMPI because it supports MPI-2. Since OpenMPI 1.1.1, I have installed every version on my SP4 for testing. My impressions are:
- it seems to be very difficult for the developers to support OpenMPI on the SP4, and I hope they succeed one day ;)
- in my context, my institution puts many restrictions on the use of our machine, which is why my tests are incomplete. (Likewise, the rsh command is forbidden between our nodes...)

So, thank you very much for your explanations and details.

Best Regards,


**************************************
NGUYEN Anh-Khai Laurent
Equipe Support Utilisateur

Email    :    laurent.ngu...@idris.fr
Tél      :    01.69.35.85.66
Adresse  :    IDRIS - Institut du Développement et des Ressources en
              Informatique Scientifique
              CNRS
              Batiment 506
              BP 167
              F - 91403 ORSAY Cedex
Site Web :    http://www.idris.fr
**************************************

Tim Prins wrote:
Hi Laurent,

Unfortunately, as far as I know, none of the current Open MPI developers has access to a system with POE, so the POE process launcher has fallen into disrepair. Attached is a patch that should allow you to compile (however, you may also need to add #include <signal.h> to pls_poe_module.c). Though this should allow the compile to succeed, launching with POE may not work (it has not been tested for quite a while). If it doesn't work, you should use the rsh launcher instead (pass -mca pls rsh on the command line, or set the parameter using one of the methods here: http://www.open-mpi.org/faq/?category=tuning#setting-mca-params). Sorry about this. We have an IBM machine at my institution which I am told will have POE on it 'soon', but I am not sure when. Once it does, we will be working on getting POE well supported again.

I should mention that we do use LoadLeveler on one of our machines and Open MPI seems to work with it quite well. I would be interested in hearing how it works for you.

Hope this helps, let me know if this works.

Thanks,

Tim

On Thursday 10 May 2007 02:57 am, Laurent Nguyen wrote:
Hello,

I tried to install OpenMPI 1.2, but I ran into some problems when
compiling the POE files. When OpenMPI 1.2.1 was released, I saw in the
bug fixes that this problem had been fixed. So I tried again, but it still
doesn't work. The problem comes from orte/mca/pls/poe/pls_poe_module.c:
a static function "static int pls_poe_cancel_operation(void);" is
declared but never defined in that file. I don't know whether my
configuration is what triggers this bug.
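To illustrate why this breaks the build: in C, a `static` function that is declared and then used (for example, by placing its address in a module's table of function pointers) must also be defined in the same file, otherwise the compiler or linker rejects it. The minimal C sketch below reproduces that pattern together with the stub fix used in the patch further down. The `demo_` names, `demo_pls_module_t` and the return-code values are hypothetical stand-ins, not ORTE's real definitions.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */

/* Hypothetical stand-ins for ORTE_SUCCESS / ORTE_ERR_NOT_IMPLEMENTED;
 * the numeric values are made up for this sketch. */
#define DEMO_SUCCESS             0
#define DEMO_ERR_NOT_IMPLEMENTED (-1)

/* Hypothetical module type: a table of function pointers. */
typedef struct {
    int (*launch_job)(int jobid);
    int (*cancel_operation)(void);
} demo_pls_module_t;

/* Forward declarations, as at the top of a component source file. */
static int demo_launch_job(int jobid);
static int demo_cancel_operation(void);

/* The table takes the address of both functions, so both must be
 * defined in this file; a declaration alone is not enough. */
static const demo_pls_module_t demo_module = {
    demo_launch_job,
    demo_cancel_operation
};

static int demo_launch_job(int jobid)
{
    (void)jobid;  /* a real launcher would start the job here */
    return DEMO_SUCCESS;
}

/* Stub definition, same idea as the patch: report "not implemented". */
static int demo_cancel_operation(void)
{
    return DEMO_ERR_NOT_IMPLEMENTED;
}

/* Calls through the table, the way a framework would. */
static void demo_check_module(const demo_pls_module_t *m)
{
    assert(m->launch_job != NULL && m->cancel_operation != NULL);
    assert(m->launch_job(1) == DEMO_SUCCESS);
    assert(m->cancel_operation() == DEMO_ERR_NOT_IMPLEMENTED);
}
```

Deleting the `demo_cancel_operation` definition while keeping its declaration and table entry reproduces the kind of build failure described above.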

So, if anyone has managed to install OpenMPI 1.2.1 on IBM, I would
appreciate some advice.

Thank you for your help,

PS: I have attached some output files from my installation.

------------------------------------------------------------------------

Index: orte/mca/pls/poe/pls_poe_module.c
===================================================================
--- orte/mca/pls/poe/pls_poe_module.c   (revision 14640)
+++ orte/mca/pls/poe/pls_poe_module.c   (working copy)
@@ -37,6 +37,7 @@
 #include "opal/mca/base/mca_base_param.h"
 #include "opal/util/argv.h"
 #include "opal/util/opal_environ.h"
+#include "opal/util/output.h"
 #include "orte/mca/errmgr/errmgr.h"
 #include "orte/mca/gpr/gpr.h"
@@ -69,7 +70,10 @@
 static int pls_poe_signal_job(orte_jobid_t jobid, int32_t signal, opal_list_t *attrs);
 static int pls_poe_signal_proc(const orte_process_name_t *name, int32_t signal);
 static int pls_poe_finalize(void);
-static int pls_poe_cancel_operation(void);
+static int pls_poe_cancel_operation(void) {
+    return ORTE_ERR_NOT_IMPLEMENTED;
+}
+
 orte_pls_base_module_t orte_pls_poe_module = {
     pls_poe_launch_job,
