[OMPI users] OpenMPI 1.2.1: cannot install on IBM SP4
Hello,

I tried to install OpenMPI 1.2 but ran into problems compiling the POE-related files. When OpenMPI 1.2.1 was released, the list of bug fixes said this problem had been fixed, so I tried again, but it still doesn't work.

The problem comes from orte/mca/pls/poe/pls_poe_module.c: a static function, "static int pls_poe_cancel_operation(void);", is declared but never defined in the source files. I don't know whether my configuration is what triggers the failure.

If anyone has managed to install OpenMPI 1.2.1 on an IBM machine, I would appreciate some advice.

Thank you for your help,

PS: I have attached some output files from my installation (files_out.tar.gz).

--
NGUYEN Anh-Khai Laurent
Equipe Support Utilisateur
Email: laurent.ngu...@idris.fr
Tél: 01.69.35.85.66
Adresse: IDRIS - Institut du Développement et des Ressources en Informatique Scientifique, CNRS, Batiment 506, BP 167, F - 91403 ORSAY Cedex
Site Web: http://www.idris.fr
Re: [OMPI users] Newbie question. Please help.
I have previously been running parallel VASP happily with an old, prerelease version of OpenMPI:

[terry@nocona Vasp.4.6-OpenMPI]$ head /home/terry/Install_trees/OpenMPI-1.0rc6/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.0rc6, which was
generated by GNU Autoconf 2.59. Invocation command line was

  $ ./configure --enable-static --disable-shared --prefix=/home/terry/bin/Local --enable-picky --disable-heterogeneous --without-libnuma --without-slurm --without-tm F77=ifort

In my VASP makefile:

FC=/home/terry/bin/Local/bin/mpif90
OFLAG= -O3 -xP -tpp7
CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC -Dkind8 -DNGZhalf -DCACHE_SIZE=12000 -DPGF90 -Davoidalloc -DMPI_BLOCK=500 -DRPROMU_DGEMV -DRACCMU_DGEMV
FFLAGS = -FR -lowercase -assume byterecl

As far as I can see (it was a long time ago!) I didn't use BLACS or SCALAPACK libraries. I used ATLAS.

Maybe this will help.

--
Dr Terry Frankcombe
Physical Chemistry, Department of Chemistry
Göteborgs Universitet
SE-412 96 Göteborg Sweden
Ph: +46 76 224 0887
Skype: terry.frankcombe
Re: [OMPI users] OpenMPI 1.2.1: cannot install on IBM SP4
Hi Laurent,

Unfortunately, as far as I know, none of the current Open MPI developers has access to a system with POE, so the POE process launcher has fallen into disrepair. Attached is a patch that should allow you to compile (however, you may also need to add #include to pls_poe_module.c).

Though this should allow the compile to succeed, launching with POE may not work (it has not been tested for quite a while). If it doesn't work, you should use the rsh launcher instead (pass -mca pls rsh on the command line, or set the parameter using one of the methods here: http://www.open-mpi.org/faq/?category=tuning#setting-mca-params).

Sorry about this. We have an IBM machine at my institution which I am told will have POE on it 'soon', but I am not sure when. Once it does, we will be working on getting POE well supported again.

I should mention that we do use LoadLeveler on one of our machines and Open MPI seems to work with it quite well. I would be interested in hearing how it works for you.

Hope this helps, let me know if this works.

Thanks,

Tim

On Thursday 10 May 2007 02:57 am, Laurent Nguyen wrote:
> The problem comes from orte/mca/pls/poe/pls_poe_module.c.
> A static function "static int pls_poe_cancel_operation(void);" is
> declared but not defined in the files.

Index: orte/mca/pls/poe/pls_poe_module.c
===================================================================
--- orte/mca/pls/poe/pls_poe_module.c (revision 14640)
+++ orte/mca/pls/poe/pls_poe_module.c (working copy)
@@ -37,6 +37,7 @@
 #include "opal/mca/base/mca_base_param.h"
 #include "opal/util/argv.h"
 #include "opal/util/opal_environ.h"
+#include "opal/util/output.h"
 
 #include "orte/mca/errmgr/errmgr.h"
 #include "orte/mca/gpr/gpr.h"
@@ -69,7 +70,10 @@
 static int pls_poe_signal_job(orte_jobid_t jobid, int32_t signal, opal_list_t *attrs);
 static int pls_poe_signal_proc(const orte_process_name_t *name, int32_t signal);
 static int pls_poe_finalize(void);
-static int pls_poe_cancel_operation(void);
+static int pls_poe_cancel_operation(void) {
+    return ORTE_ERR_NOT_IMPLEMENTED;
+}
+
 
 orte_pls_base_module_t orte_pls_poe_module = {
     pls_poe_launch_job,
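For readers wondering why a missing definition stops the build, here is a minimal sketch that is independent of the Open MPI sources. It assumes the static function is referenced from a table of function pointers, which is a guess about how pls_poe_cancel_operation is used. All names (demo_module_t, demo_cancel_operation, DEMO_ERR_NOT_IMPLEMENTED) are hypothetical stand-ins. The stub mirrors the idea of the patch: supply a definition that simply reports "not implemented".

```c
#include <stdio.h>

/* Hypothetical error codes; the real patch returns ORTE_ERR_NOT_IMPLEMENTED. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_NOT_IMPLEMENTED = -1 };

/* Hypothetical module table, loosely modelled on a launcher component. */
typedef struct {
    int (*launch_job)(int jobid);
    int (*cancel_operation)(void);
} demo_module_t;

/* Forward declarations. A static function that is referenced below
 * must also be defined somewhere in this same file. */
static int demo_launch_job(int jobid);
static int demo_cancel_operation(void);

/* The table takes the address of both functions, so both need bodies. */
static const demo_module_t demo_module = {
    demo_launch_job,
    demo_cancel_operation
};

static int demo_launch_job(int jobid)
{
    printf("launching job %d\n", jobid);
    return DEMO_SUCCESS;
}

/* Stub definition: enough to compile and link, and callers get a
 * clear "not implemented" status instead of undefined behaviour. */
static int demo_cancel_operation(void)
{
    return DEMO_ERR_NOT_IMPLEMENTED;
}

/* Calls through the table, as a user of the module would. */
int demo_try_cancel(void)
{
    return demo_module.cancel_operation();
}
```

A stub like this only gets the build through; any code path that really needs the operation will still see the "not implemented" status at run time.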
Re: [OMPI users] OpenMPI 1.2.1: cannot install on IBM SP4
Hi Tim,

Thank you for all these details. I also added a definition of "static int pls_poe_cancel_operation(void)" similar to yours, and the compilation could continue. But then I hit another problem: in ompi/mpi/cxx/mpicxx.cc, three variables are already defined, because the preprocessor replaces their names with the C constants. So I commented out these lines:

//const int SEEK_SET = MPI_SEEK_SET;
//const int SEEK_CUR = MPI_SEEK_CUR;
//const int SEEK_END = MPI_SEEK_END;

After that, I was able to compile OpenMPI. I didn't try launching in rsh mode, but I did try launching with POE.

First, a reminder of my experience with OpenMPI 1.1.x on IBM. My machine has some restrictions, but I have two ways of launching an application:

- interactive mode: OpenMPI didn't work in this mode. I get this error:

  $ export MP_PROCS=2
  $ mpiexec -n 2 myprog.exe
  ERROR: 0031-125 Fewer nodes (1) specified in /tmpdir/inter/int.ssos181-130093928631562/a-UWUb than tasks (2).

  I think this is because of my machine's configuration.

- batch mode (for queuing): OpenMPI worked, but some functions didn't (such as MPI_Comm_spawn), and communication performance seemed very bad. (Within a node, however, performance was the same as with the vendor's MPI.)

I had hoped OpenMPI 1.2.x would work on the SP4, but I have the same problem in interactive mode. And in batch mode, I get this error:

[0,0,0] ORTE_ERROR_LOG: Not implemented in file errmgr_hnp.c at line 90
--
mpiexec was unable to cleanly terminate the daemons for this job. Returned value Not implemented instead of ORTE_SUCCESS.
--

I think it is as you said: POE isn't implemented yet.

I am interested in OpenMPI because it supports MPI-2. Since OpenMPI 1.1.1, I have installed every version on my SP4 for testing. My impressions are:

- it seems to be very difficult for the developers to support OpenMPI on the SP4, and I hope that one day they will manage it ;)
- in my context, my institution places many restrictions on the use of our machine, which is why my tests are incomplete. (In the same way, the rsh command is forbidden between our nodes...)

So thank you very much for your explanations and details.

Best Regards,

NGUYEN Anh-Khai Laurent
[OMPI users] newbie question
I'm trying to run a job specifically over tcp and the eth1 interface. It seems to be barfing on trying to listen via ipv6. I don't want ipv6. How can I disable it?

Here's my mpirun line:

[root@vic12-10g ~]# mpirun --n 2 --host vic12,vic20 --mca btl self,tcp -mca btl_tcp_if_include eth1 /root/IMB_2.3/src/IMB-MPI1 sendrecv
[vic12][0,1,0][btl_tcp_component.c:489:mca_btl_tcp_component_create_listen] socket() failed: Address family not supported by protocol (97)
[vic12-10g:15771] mca_btl_tcp_component: IPv6 listening socket failed
[vic20][0,1,1][btl_tcp_component.c:489:mca_btl_tcp_component_create_listen] socket() failed: Address family not supported by protocol (97)
[vic20-10g:23977] mca_btl_tcp_component: IPv6 listening socket failed

Thanks,

Steve.
Re: [OMPI users] OpenMPI 1.2.1: cannot install on IBM SP4
On Thursday 10 May 2007 11:35 am, Laurent Nguyen wrote:
> But, I had another problem. In ompi/mpi/cxx/mpicxx.cc,
> three variables are already defined. The preprocessor set them to the
> constant of C. So, I put theses lines in comment:
> //const int SEEK_SET = MPI_SEEK_SET;
> //const int SEEK_CUR = MPI_SEEK_CUR;
> //const int SEEK_END = MPI_SEEK_END;

I remember there was a problem with these constants earlier. You should be able to disable them by passing --disable-mpi-cxx-seek to configure.

> - in my context, my institution puts many restrictions on the use of our
> machine, that's why my tests are incomplete. (On the same way, rsh
> command is forbidden between our nodes...)

Note that the name 'rsh' is a bit of a misnomer. The rsh launcher actually uses ssh by default.

Tim

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
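The SEEK_SET, SEEK_CUR and SEEK_END clash discussed above comes from the fact that <stdio.h> defines those three names as preprocessor macros, so any declaration that reuses them is rewritten before the compiler sees it. The sketch below illustrates this in C (the file in the thread is C++, but the mechanism is the same). The DEMO_ names and their values are arbitrary and hypothetical; they are not the real MPI constants.

```c
#include <stdio.h>   /* defines SEEK_SET, SEEK_CUR and SEEK_END as macros */

/* Hypothetical stand-ins for MPI_SEEK_SET and friends; arbitrary values. */
enum { DEMO_MPI_SEEK_SET = 100, DEMO_MPI_SEEK_CUR = 101, DEMO_MPI_SEEK_END = 102 };

/* With the macros in scope, a line such as
 *     const int SEEK_SET = DEMO_MPI_SEEK_SET;
 * is expanded by the preprocessor (typically to "const int 0 = ...")
 * and cannot compile. Non-clashing names avoid the problem. */
static const int demo_seek_set = DEMO_MPI_SEEK_SET;
static const int demo_seek_cur = DEMO_MPI_SEEK_CUR;
static const int demo_seek_end = DEMO_MPI_SEEK_END;

/* Returns 1 if all three names are macros in this translation unit. */
int demo_seek_names_are_macros(void)
{
#if defined(SEEK_SET) && defined(SEEK_CUR) && defined(SEEK_END)
    return 1;
#else
    return 0;
#endif
}

/* Returns the sum of the renamed constants, just to show they are usable. */
int demo_seek_sum(void)
{
    return demo_seek_set + demo_seek_cur + demo_seek_end;
}
```

Commenting the declarations out, as Laurent did, or disabling them at configure time, as Tim suggests, both sidestep the same macro expansion.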
Re: [OMPI users] newbie question
Brian --

Didn't you add something to fix exactly this problem recently? I have a dim recollection of seeing a commit go by about this...?

(I advised Steve in IM to use --disable-ipv6 in the meantime)

On May 10, 2007, at 1:25 PM, Steve Wise wrote:
> I'm trying to run a job specifically over tcp and the eth1 interface.
> It seems to be barfing on trying to listen via ipv6. I don't want
> ipv6. How can I disable it?

--
Jeff Squyres
Cisco Systems
Re: [OMPI users] newbie question
On Thu, 2007-05-10 at 20:07 -0400, Jeff Squyres wrote:
> Brian --
>
> Didn't you add something to fix exactly this problem recently? I
> have a dim recollection of seeing a commit go by about this...?
>
> (I advised Steve in IM to use --disable-ipv6 in the meantime)

Yes, disabling it worked. ;-)
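The "(97)" in the error output earlier in this thread is the errno value; on Linux, 97 is EAFNOSUPPORT, which socket() returns when the kernel has no support for the requested address family. The following sketch, which is not taken from Open MPI, probes whether a host can create an IPv6 TCP socket at all. The function name demo_ipv6_socket_available is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to create an IPv6 TCP socket.
 * Returns 1 if it works, 0 if the address family is unsupported
 * (the situation reported in the thread), -1 on any other error. */
int demo_ipv6_socket_available(void)
{
    int fd = socket(AF_INET6, SOCK_STREAM, 0);

    if (fd >= 0) {
        close(fd);
        return 1;
    }
    if (errno == EAFNOSUPPORT) {
        /* Same condition as "Address family not supported by protocol". */
        return 0;
    }
    fprintf(stderr, "socket(AF_INET6) failed: %s\n", strerror(errno));
    return -1;
}
```

A result of 0 on a node would be consistent with the messages Steve saw there.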
Re: [OMPI users] Newbie question. Please help.
Good to know. This suggests that VASP, when built properly against Open MPI, should work; perhaps there's some secret sauce in the Makefile somewhere...?

Off list, someone cited the following to me:

-
Also VASP has a forum for things like this too.
http://cms.mpi.univie.ac.at/vasp-forum/forum.php

From there it looks like people have been having problems with ifort 9.1.043 with vasp.

and from this post it looks like I'm not the only one to use openMPI and VASP
http://cms.mpi.univie.ac.at/vasp-forum/forum_viewtopic.php?2.550
-

I have not received a reply from the VASP author yet.

On May 10, 2007, at 8:52 AM, Terry Frankcombe wrote:
> I have previously been running parallel VASP happily with an old,
> prerelease version of OpenMPI: [...]

--
Jeff Squyres
Cisco Systems
[OMPI users] debugging my program in openmpi
I am a newbie in openmpi. I have just compiled a program with -g -pg (an MPI program with a listener thread, within which all MPI calls except initialization and MPI_Finalize are placed) and run it. However, it crashes and I can't find any core dump, even though I set the core dump max size to 10 with

ulimit -c 10

Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:(nil)
[0] func:raytrace [0x8185581]
[1] func:[0xe440]
[2] func:raytrace [0x8056736]
[3] func:/lib/tls/libpthread.so.0 [0x40063b63]
[4] func:/lib/tls/libc.so.6(__clone+0x5a) [0x4014618a]
*** End of error message ***

I tried to use gdb and I ran:

gdb mpirun

run --hostfile ../hostfile n 16 raytrace -finputs/car.env

When I type

backtrace

after it crashes, it just says "no stack".

I really want to find out which lines in which function are responsible for the crash. What can I do to find the culprit?
Re: [OMPI users] debugging my program in openmpi
On Thursday 10 May 2007 07:19 pm, Code Master wrote:
> I am a newbie in openmpi. I have just compiled a program with -g -pg (an
> mpi program with a listener thread, which all MPI calls except
> initialization and MPI_Finalize are placed within) and I run it. However
> it crashes and I can't find any core dump, even I set the core dump max
> size to 10 by
>
> ulimit -c 10

You probably need to set the ulimit in your .bashrc to get a core dump, since processes are (by default) started via ssh.

> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:(nil)
> [0] func:raytrace [0x8185581]
> [1] func:[0xe440]
> [2] func:raytrace [0x8056736]
> [3] func:/lib/tls/libpthread.so.0 [0x40063b63]
> [4] func:/lib/tls/libc.so.6(__clone+0x5a) [0x4014618a]
> *** End of error message ***
> I tried to use gdb and I ran:
> gdb mpirun
>
> run --hostfile ../hostfile n 16 raytrace -finputs/car.env
>
> when I type
>
> backtrace
>
> after it crashes, it just said "no stack"

This is because you are debugging mpirun, and not your application. Mpirun runs to completion successfully, but it is your program which is crashing.

Hope this helps,

Tim

> I really want to find out what lines in what function are responsible for
> the crash. What can I do to find out the culprit?
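Tim's point is that a limit set in an interactive shell does not reach processes that are started through ssh. As an alternative that nobody in the thread proposed, a program can raise its own core-file limit at startup. The sketch below assumes a POSIX system and a hard limit that is not zero; the function name demo_enable_core_dumps is hypothetical.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit for this process.
 * Returns 0 on success and -1 on failure. If the hard limit itself is 0,
 * the call succeeds but no core file will be written. */
int demo_enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    lim.rlim_cur = lim.rlim_max;

    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

Calling this early in main(), before MPI initialization, would apply the limit in every rank regardless of how the rank was launched. Whether a core file then appears still depends on system settings outside the program's control.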
Re: [OMPI users] debugging my program in openmpi
On 5/11/07, Tim Prins wrote:
> You probably need to set the ulimit in your .bashrc to get a core dump,
> since processes are (by default) started via ssh.
>
> [...]
>
> This is because you are debugging mpirun, and not your application.
> Mpirun runs to completion successfully, but it is your program which is
> crashing.

That's great, but how can I debug my program under MPI?
Re: [OMPI users] debugging my program in openmpi
Check out the FAQ:

http://www.lam-mpi.org/faq/category6.php3

On May 10, 2007, at 9:50 PM, Code Master wrote:
> That's great, but how can I debug my program under mpi?

--
Jeff Squyres
Cisco Systems
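One widely used way to debug the application itself rather than mpirun is to have each process announce its PID and then wait until a debugger attaches. The sketch below is offered under that assumption and is not quoted from the linked FAQ. The environment variable DEMO_WAIT_FOR_DEBUGGER and the function name are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L  /* for gethostname() under strict -std modes */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* If DEMO_WAIT_FOR_DEBUGGER is set, print this process's PID and host,
 * then spin until a debugger changes "attached" to a non-zero value.
 * Returns 0 if no wait was requested and 1 after a debugger released it. */
int demo_wait_for_debugger(void)
{
    volatile int attached = 0;   /* volatile so the loop re-reads it */
    char host[256];

    if (getenv("DEMO_WAIT_FOR_DEBUGGER") == NULL) {
        return 0;
    }

    if (gethostname(host, sizeof(host)) != 0) {
        snprintf(host, sizeof(host), "unknown-host");
    }
    host[sizeof(host) - 1] = '\0';

    fprintf(stderr, "PID %ld on %s is waiting for a debugger\n",
            (long)getpid(), host);
    fflush(stderr);

    while (!attached) {
        sleep(5);
    }
    return 1;
}
```

The idea is to call this near the start of main(), launch the job as usual with the variable set, then attach gdb to the printed PID on the printed host. In gdb, move up to the frame of this function, set the variable `attached` to 1, and continue; the backtrace at the crash will then belong to the application and not to mpirun.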