I cannot replicate the problem - both scenarios work fine for me. I'm not 
convinced your test code is correct, however, as you call MPI_Comm_free on the 
inter-communicator but don't call MPI_Comm_disconnect. Check out the attached 
code for a correct version and see if it works for you.
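For reference, here is a minimal sketch of the pattern I mean (this is NOT the
attached simple_spawn.c; spawning four copies of the same binary is just an
assumption for illustration). Both sides disconnect the inter-communicator
before finalizing instead of freeing it:

/* Parent spawns children; both parent and children call
 * MPI_Comm_disconnect on the inter-communicator before MPI_Finalize. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent, intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (MPI_COMM_NULL == parent) {
        /* Parent process: spawn 4 copies of this same executable. */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&intercomm);
    } else {
        /* Child process: disconnect from the parent. */
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}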

FWIW: I don't know how many cores you have on your sockets, but if you have 6 
cores/socket, then your slot-list is equivalent to "--bind-to none", as the 
slot-list applies to every process being launched.

Attachment: simple_spawn.c


> On May 23, 2016, at 6:26 AM, Siegmar Gross 
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
> 
> Hi,
> 
> I installed openmpi-1.10.3rc2 on my "SUSE Linux Enterprise Server
> 12 (x86_64)" with Sun C 5.13 and gcc-6.1.0. Unfortunately I get
> a segmentation fault with "--slot-list" for one of my small programs.
> 
> 
> loki spawn 119 ompi_info | grep -e "OPAL repo revision:" -e "C compiler 
> absolute:"
>      OPAL repo revision: v1.10.2-201-gd23dda8
>     C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc
> 
> 
> loki spawn 120 mpiexec -np 1 --host loki,loki,loki,loki,loki spawn_master
> 
> Parent process 0 running on loki
>  I create 4 slave processes
> 
> Parent process 0: tasks in MPI_COMM_WORLD:                    1
>                  tasks in COMM_CHILD_PROCESSES local group:  1
>                  tasks in COMM_CHILD_PROCESSES remote group: 4
> 
> Slave process 0 of 4 running on loki
> Slave process 1 of 4 running on loki
> Slave process 2 of 4 running on loki
> spawn_slave 2: argv[0]: spawn_slave
> Slave process 3 of 4 running on loki
> spawn_slave 0: argv[0]: spawn_slave
> spawn_slave 1: argv[0]: spawn_slave
> spawn_slave 3: argv[0]: spawn_slave
> 
> 
> 
> 
> loki spawn 121 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master
> 
> Parent process 0 running on loki
>  I create 4 slave processes
> 
> [loki:17326] *** Process received signal ***
> [loki:17326] Signal: Segmentation fault (11)
> [loki:17326] Signal code: Address not mapped (1)
> [loki:17326] Failing at address: 0x8
> [loki:17326] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f4e469b3870]
> [loki:17326] [ 1] *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [loki:17324] Local abort before MPI_INIT completed successfully; not able to 
> aggregate error messages, and not able to guarantee that all other processes 
> were killed!
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7f4e46c165b0]
> [loki:17326] [ 2] 
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7f4e46bf5b08]
> [loki:17326] [ 3] *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [loki:17325] Local abort before MPI_INIT completed successfully; not able to 
> aggregate error messages, and not able to guarantee that all other processes 
> were killed!
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7f4e46c1be8a]
> [loki:17326] [ 4] 
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x180)[0x7f4e46c5828e]
> [loki:17326] [ 5] spawn_slave[0x40097e]
> [loki:17326] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f4e4661db05]
> [loki:17326] [ 7] spawn_slave[0x400a54]
> [loki:17326] *** End of error message ***
> -------------------------------------------------------
> Child job 2 terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>  Process name: [[56340,2],0]
>  Exit code:    1
> --------------------------------------------------------------------------
> loki spawn 122
> 
> 
> 
> 
> I would be grateful if somebody could fix the problem. Thank you
> very much for any help in advance.
> 
> 
> Kind regards
> 
> Siegmar
