Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-26 Thread Siegmar Gross

Hi Ralph and Gilles,

it's strange that the program works with "--host" and "--slot-list"
in your environment but not in mine. I get the following output if
I run the program in gdb without a breakpoint.


loki spawn 142 gdb /usr/local/openmpi-1.10.3_64_gcc/bin/mpiexec
GNU gdb (GDB; SUSE Linux Enterprise 12) 7.9.1
...
(gdb) set args -np 1 --host loki --slot-list 0:0-1,1:0-1 simple_spawn
(gdb) run
Starting program: /usr/local/openmpi-1.10.3_64_gcc/bin/mpiexec -np 1 --host 
loki --slot-list 0:0-1,1:0-1 simple_spawn
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Detaching after fork from child process 18031.
[pid 18031] starting up!
0 completed MPI_Init
Parent [pid 18031] about to spawn!
Detaching after fork from child process 18033.
Detaching after fork from child process 18034.
[pid 18033] starting up!
[pid 18034] starting up!
[loki:18034] *** Process received signal ***
[loki:18034] Signal: Segmentation fault (11)
...



I get different output if I run the program in gdb with
a breakpoint.

gdb /usr/local/openmpi-1.10.3_64_gcc/bin/mpiexec
(gdb) set args -np 1 --host loki --slot-list 0:0-1,1:0-1 simple_spawn
(gdb) set follow-fork-mode child
(gdb) break ompi_proc_self
(gdb) run
(gdb) next

Repeating "next" very often results in the following output.

...
Starting program: 
/home/fd1026/work/skripte/master/parallel/prog/mpi/spawn/simple_spawn
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[pid 13277] starting up!
[New Thread 0x742ef700 (LWP 13289)]

Breakpoint 1, ompi_proc_self (size=0x7fffc060)
at ../../openmpi-1.10.3rc3/ompi/proc/proc.c:413
413 ompi_proc_t **procs = (ompi_proc_t**) malloc(sizeof(ompi_proc_t*));
(gdb) n
414 if (NULL == procs) {
(gdb)
423 OBJ_RETAIN(ompi_proc_local_proc);
(gdb)
424 *procs = ompi_proc_local_proc;
(gdb)
425 *size = 1;
(gdb)
426 return procs;
(gdb)
427 }
(gdb)
ompi_comm_init () at ../../openmpi-1.10.3rc3/ompi/communicator/comm_init.c:138
138 group->grp_my_rank   = 0;
(gdb)
139 group->grp_proc_count= (int)size;
...
193 ompi_comm_reg_init();
(gdb)
196 ompi_comm_request_init ();
(gdb)
198 return OMPI_SUCCESS;
(gdb)
199 }
(gdb)
ompi_mpi_init (argc=0, argv=0x0, requested=0, provided=0x7fffc21c)
at ../../openmpi-1.10.3rc3/ompi/runtime/ompi_mpi_init.c:738
738 if (OMPI_SUCCESS != (ret = ompi_file_init())) {
(gdb)
744 if (OMPI_SUCCESS != (ret = ompi_win_init())) {
(gdb)
750 if (OMPI_SUCCESS != (ret = ompi_attr_init())) {
...
988 ompi_mpi_initialized = true;
(gdb)
991 if (ompi_enable_timing && 0 == OMPI_PROC_MY_NAME->vpid) {
(gdb)
999 return MPI_SUCCESS;
(gdb)
1000}
(gdb)
PMPI_Init (argc=0x0, argv=0x0) at pinit.c:94
94  if (MPI_SUCCESS != err) {
(gdb)
104 return MPI_SUCCESS;
(gdb)
105 }
(gdb)
0x00400d0c in main ()
(gdb)
Single stepping until exit from function main,
which has no line number information.
0 completed MPI_Init
Parent [pid 13277] about to spawn!
[New process 13472]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
process 13472 is executing new program: 
/usr/local/openmpi-1.10.3_64_gcc/bin/orted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New process 13474]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
process 13474 is executing new program: 
/home/fd1026/work/skripte/master/parallel/prog/mpi/spawn/simple_spawn
[pid 13475] starting up!
[pid 13476] starting up!
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[pid 13474] starting up!
[New Thread 0x7491b700 (LWP 13480)]
[Switching to Thread 0x77ff1740 (LWP 13474)]

Breakpoint 1, ompi_proc_self (size=0x7fffba30)
at ../../openmpi-1.10.3rc3/ompi/proc/proc.c:413
413 ompi_proc_t **procs = (ompi_proc_t**) malloc(sizeof(ompi_proc_t*));
(gdb)
414 if (NULL == procs) {
...
426 return procs;
(gdb)
427 }
(gdb)
ompi_comm_init () at ../../openmpi-1.10.3rc3/ompi/communicator/comm_init.c:138
138 group->grp_my_rank   = 0;
(gdb)
139 group->grp_proc_count= (int)size;
(gdb)
140 OMPI_GROUP_SET_INTRINSIC (group);
...
193 ompi_comm_reg_init();
(gdb)
196 ompi_comm_request_init ();
(gdb)
198 return OMPI_SUCCESS;
(gdb)
199 }
(gdb)
ompi_mpi_init (argc=0, argv=0x0, requested=0, provided=0x7fffbbec)
at ../../openmpi-1.10.3rc3/ompi/runtime/ompi_mpi_init.c:738
738 if (OMPI_SUCCESS != (ret = ompi_file_init())) {
(gdb)
744 if (OMPI_SUCCESS != (ret = ompi_win_init())) {
(gdb)
750 if (OMPI_SUCCESS != (ret = ompi_attr_init())) {
...
863 if (OMPI_SUCCESS != (ret = ompi_

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-26 Thread Ralph Castain
I’m afraid I honestly can’t make any sense of it. It seems you at least have a 
simple workaround (use a hostfile instead of -host), yes?
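
A hostfile variant of the failing command might look like the following (the
hostfile name and slot count are placeholders, not taken from this thread):

# my_hosts
loki slots=4

mpiexec -np 1 --hostfile my_hosts --slot-list 0:0-1,1:0-1 simple_spawn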


> On May 26, 2016, at 5:48 AM, Siegmar Gross wrote:
> 
> Hi Ralph and Gilles,
> 
> it's strange that the program works with "--host" and "--slot-list"
> in your environment but not in mine. I get the following output if
> I run the program in gdb without a breakpoint.
> ...

Re: [OMPI users] users Digest, Vol 3510, Issue 2

2016-05-26 Thread Megdich Islem
Thank you all for your suggestions!
I found an answer to a similar case in the Open MPI FAQ (Question 15,
"Running MPI jobs", on www.open-mpi.org), which suggests using mpirun's
--prefix command line option or using the mpirun wrapper.

I modified my command to the following:

mpirun --prefix /opt/openfoam30/platforms/linux64GccDPInt32Opt/lib/Openmpi-system -np 1 pimpleDyMFoam -case OF
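
For comparison, --prefix normally names the top-level directory of the Open
MPI installation (the one containing bin/ and lib/) rather than a lib/
subdirectory; with a placeholder prefix, such a call would look like:

mpirun --prefix /path/to/openmpi -np 1 pimpleDyMFoam -case OF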
But I got an error (see attached picture). Is the syntax correct? How can I
solve the problem? The first method seems easier than using the mpirun
wrapper.
Otherwise, how can I use the mpirun wrapper?
Regards,
Islem

On Wednesday, 25 May 2016 at 16:40, Dave Love wrote:

 I wrote: 

> You could wrap one (set of) program(s) in a script to set the
> appropriate environment before invoking the real program.  

I realize I should have said something like "program invocations",
i.e. if you have no control over something invoking mpirun for programs
using different MPIs, then an mpirun wrapper needs to check what it's
being asked to run.
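
A minimal sketch of such a wrapper, dispatching on which MPI library the
executable is linked against (the two install paths and the "mpich" pattern
below are assumptions for illustration, not taken from this thread):

#!/bin/sh
# Hypothetical mpirun wrapper: find the program among the arguments and
# run the real mpirun that matches the MPI library it links against.
OMPI_RUN=/usr/local/openmpi/bin/mpirun    # assumed Open MPI install
MPICH_RUN=/opt/mpich/bin/mpirun           # assumed MPICH install
for arg in "$@"; do
    case $arg in -*) continue ;; esac                     # skip option flags
    exe=$(command -v "$arg" 2>/dev/null) || continue      # not an executable
    libs=$(ldd "$exe" 2>/dev/null | grep libmpi) || continue
    case $libs in
        *mpich*) exec "$MPICH_RUN" "$@" ;;   # linked against MPICH
        *)       exec "$OMPI_RUN" "$@" ;;    # otherwise assume Open MPI
    esac
done
exec "$OMPI_RUN" "$@"    # no MPI executable found; fall back to Open MPI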




Re: [OMPI users] users Digest, Vol 3510, Issue 2

2016-05-26 Thread Jeff Squyres (jsquyres)
You're still intermingling your Open MPI and MPICH installations.

You need to ensure that you use the wrapper compilers and mpirun/mpiexec from
the same MPI implementation.

For example, if you use mpicc/mpifort from Open MPI to build your program, then 
you must use Open MPI's mpirun/mpiexec.

If you absolutely need to have both MPI implementations in your PATH /
LD_LIBRARY_PATH, you might want to use absolute path names for
mpicc/mpifort/mpirun/mpiexec.
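
For example, building and launching with a single implementation end to end
via absolute paths (the install prefix below is hypothetical):

/usr/local/openmpi/bin/mpicc hello_mpi.c -o hello_mpi
/usr/local/openmpi/bin/mpirun -np 4 ./hello_mpi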



> On May 26, 2016, at 3:46 PM, Megdich Islem wrote:
> 
> Thank you all for your suggestions!
> 
> I found an answer to a similar case in the Open MPI FAQ (Question 15,
> "Running MPI jobs", on www.open-mpi.org), which suggests using mpirun's
> --prefix command line option or using the mpirun wrapper.
> ...


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/