Hi!
Thank you, Jeff, for your suggestion. However, I still don't understand what
you mean by using absolute path names for mpicc/mpifort/mpirun/mpiexec.
This is what my .bashrc looks like:
source /opt/openfoam30/etc/bashrc
export PATH=/home/Desktop/mpich/bin:$PATH
export LD_LIBRARY_PATH="/home/islem/Desktop/mpich/lib/:$LD_LIBRARY_PATH"
export MPICH_F90=gfortran
export MPICH_CC=/opt/intel/bin/icc
export MPICH_CXX=/opt/intel/bin/icpc
export MPICH_LINK_CXX="-L/home/Desktop/mpich/lib/ -Wl,-rpath -Wl,/home/islem/Desktop/mpich/lib -lmpichcxx -lmpich -lopa -lmpl -lrt -lpthread"
export PATH=$PATH:/opt/intel/bin/
LD_LIBRARY_PATH="/opt/intel/lib/intel64:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH
source /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/mpivars.sh intel64
alias startEMPIRE=". /home/islem/software/empire/EMPIRE-Core/etc/bashrc.sh ICC"
mpirun --version reports MPICH 3.0.4.
This is how I run one example that couples two clients through the server
EMPIRE. I use three terminals and run one of these command lines in each:

mpiexec -np 1 Emperor emperorInput.xml    (I get a message in the terminal saying that Empire started)
mpiexec -np 1 dummyCSM dummyCSMInput      (I get a message that Emperor acknowledged the connection)
mpiexec -np 1 pimpleDyMFoam -case OF      (I get no message in the terminal, which means no connection)

How should I use mpirun here, and where do I write any modifications?
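Or do you mean launching each code with the full path of its own implementation's
launcher? For example, something like the lines below (only a sketch on my side;
I am assuming that the system mpirun under /usr/bin belongs to Open MPI, which
OpenFOAM was built against, and that the MPICH mpiexec is the one under
/home/islem/Desktop/mpich/bin):

/home/islem/Desktop/mpich/bin/mpiexec -np 1 Emperor emperorInput.xml
/home/islem/Desktop/mpich/bin/mpiexec -np 1 dummyCSM dummyCSMInput
/usr/bin/mpirun -np 1 pimpleDyMFoam -case OF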
Regards,
Islem

    On Friday, 27 May 2016 at 17:00, "users-requ...@open-mpi.org"
<users-requ...@open-mpi.org> wrote:
 

 Send users mailing list submissions to
    us...@open-mpi.org

To subscribe or unsubscribe via the World Wide Web, visit
    https://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
    users-requ...@open-mpi.org

You can reach the person managing the list at
    users-ow...@open-mpi.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."


Today's Topics:

  1. Re: users Digest, Vol 3510, Issue 2 (Jeff Squyres (jsquyres))
  2. Re: segmentation fault for slot-list and openmpi-1.10.3rc2
      (Siegmar Gross)
  3. OpenMPI virtualization aware (Marco D'Amico)
  4. Re: OpenMPI virtualization aware (Ralph Castain)


----------------------------------------------------------------------

Message: 1
Date: Thu, 26 May 2016 23:28:17 +0000
From: "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>
To: Megdich Islem <megdich_is...@yahoo.fr>, "Open MPI User's List"
    <us...@open-mpi.org>
Cc: Dave Love <d.l...@liverpool.ac.uk>
Subject: Re: [OMPI users] users Digest, Vol 3510, Issue 2
Message-ID: <441f803d-fdbb-443d-82aa-74ff3845a...@cisco.com>
Content-Type: text/plain; charset="utf-8"

You're still intermingling your Open MPI and MPICH installations.

You need to ensure that you use the wrapper compilers and mpirun/mpiexec from
the same MPI implementation.

For example, if you use mpicc/mpifort from Open MPI to build your program, then 
you must use Open MPI's mpirun/mpiexec.

If you absolutely need to have both MPI implementations in your PATH /
LD_LIBRARY_PATH, you might want to use absolute path names for
mpicc/mpifort/mpirun/mpiexec.
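
For example, something like this (only a sketch; the install prefixes
/opt/openmpi and /opt/mpich below are placeholders, not the actual locations on
your system):

# Build and run entirely with Open MPI (placeholder prefix):
/opt/openmpi/bin/mpicc my_prog.c -o my_prog
/opt/openmpi/bin/mpirun -np 4 ./my_prog

# Build and run entirely with MPICH (placeholder prefix):
/opt/mpich/bin/mpicc my_prog.c -o my_prog
/opt/mpich/bin/mpiexec -np 4 ./my_prog

The key point is that the mpicc and the mpirun/mpiexec used for a given program
come from the same installation prefix.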



> On May 26, 2016, at 3:46 PM, Megdich Islem <megdich_is...@yahoo.fr> wrote:
> 
> Thank you all for your suggestions !!
> 
> I found an answer to a similar case in Open MPI FAQ (Question 15)
> FAQ: Running MPI jobs
> which suggests using mpirun's --prefix command-line option or using the
> mpirun wrapper.
> 
> I modified my command to the following:
>  mpirun --prefix 
>/opt/openfoam30/platforms/linux64GccDPInt32Opt/lib/Openmpi-system -np 1 
>pimpleDyMFoam -case OF
> 
> But I got an error (see attached picture). Is the syntax correct? How can I
> solve the problem? The first method seems to be easier than using the mpirun
> wrapper.
> 
> Otherwise, how can I use the mpirun wrapper?
> 
> Regards,
> islem
> 
> 
> On Wednesday, 25 May 2016 at 16:40, Dave Love <d.l...@liverpool.ac.uk> wrote:
> 
> 
> I wrote:
> 
> 
> > You could wrap one (set of) program(s) in a script to set the
> > appropriate environment before invoking the real program. 
> 
> 
> I realize I should have said something like "program invocations",
> i.e. if you have no control over something invoking mpirun for programs
> using different MPIs, then an mpirun wrapper needs to check what it's
> being asked to run.
> 
> 
> 
> <mpirun-error.png><path-to-open-mpi.png>_______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29317.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/


------------------------------

Message: 2
Date: Fri, 27 May 2016 08:16:41 +0200
From: Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de>
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] segmentation fault for slot-list and
    openmpi-1.10.3rc2
Message-ID:
    <f5653a5c-174f-4569-c730-082a9db82...@informatik.hs-fulda.de>
Content-Type: text/plain; charset=windows-1252; format=flowed

Hi Ralph,


On 26.05.2016 at 17:38, Ralph Castain wrote:
> I'm afraid I honestly can't make any sense of it. It seems
> you at least have a simple workaround (use a hostfile instead
> of -host), yes?

Only the combination of "--host" and "--slot-list" breaks.
Everything else works as expected. One more remark: as you
can see below, this combination worked when stepping through
gdb with "next" after the breakpoint. The process blocks if I
keep the Enter key pressed down, and I have to kill simple_spawn
in another window to get control back in gdb (<Ctrl-c> or
anything else didn't work). I got this error yesterday evening.

...
(gdb)
ompi_mpi_init (argc=0, argv=0x0, requested=0, provided=0x7fffffffbc0c)
    at ../../openmpi-1.10.3rc3/ompi/runtime/ompi_mpi_init.c:738
738        if (OMPI_SUCCESS != (ret = ompi_file_init())) {
(gdb)
744        if (OMPI_SUCCESS != (ret = ompi_win_init())) {
(gdb)
750        if (OMPI_SUCCESS != (ret = ompi_attr_init())) {
(gdb)
758        if (OMPI_SUCCESS != (ret = ompi_proc_complete_init())) {
(gdb)
764        ret = MCA_PML_CALL(enable(true));
(gdb)
765        if( OMPI_SUCCESS != ret ) {
(gdb)
771        if (NULL == (procs = ompi_proc_world(&nprocs))) {
(gdb)
775        ret = MCA_PML_CALL(add_procs(procs, nprocs));
(gdb)
776        free(procs);
(gdb)
780        if (OMPI_ERR_UNREACH == ret) {
(gdb)
785        } else if (OMPI_SUCCESS != ret) {
(gdb)
790        MCA_PML_CALL(add_comm(&ompi_mpi_comm_world.comm));
(gdb)
791        MCA_PML_CALL(add_comm(&ompi_mpi_comm_self.comm));
(gdb)
796        if (ompi_mpi_show_mca_params) {
(gdb)
803        ompi_rte_wait_for_debugger();
(gdb)
807        if (ompi_enable_timing && 0 == OMPI_PROC_MY_NAME->vpid) {
(gdb)
817        coll = OBJ_NEW(ompi_rte_collective_t);
(gdb)
818        coll->id = ompi_process_info.peer_init_barrier;
(gdb)
819        coll->active = true;
(gdb)
820        if (OMPI_SUCCESS != (ret = ompi_rte_barrier(coll))) {
(gdb)
825        OMPI_WAIT_FOR_COMPLETION(coll->active);
(gdb)














Program received signal SIGTERM, Terminated.
0x00007ffff7a7acd0 in opal_progress@plt ()
    from /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12
(gdb)
Single stepping until exit from function opal_progress@plt,
which has no line number information.
[Thread 0x7ffff491b700 (LWP 19602) exited]

Program terminated with signal SIGTERM, Terminated.
The program no longer exists.
(gdb)
The program is not being run.
(gdb)
...



Kind regards

Siegmar


>> On May 26, 2016, at 5:48 AM, Siegmar Gross 
>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>
>> Hi Ralph and Gilles,
>>
>> it's strange that the program works with "--host" and "--slot-list"
>> in your environment and not in mine. I get the following output, if
>> I run the program in gdb without a breakpoint.
>>
>>
>> loki spawn 142 gdb /usr/local/openmpi-1.10.3_64_gcc/bin/mpiexec
>> GNU gdb (GDB; SUSE Linux Enterprise 12) 7.9.1
>> ...
>> (gdb) set args -np 1 --host loki --slot-list 0:0-1,1:0-1 simple_spawn
>> (gdb) run
>> Starting program: /usr/local/openmpi-1.10.3_64_gcc/bin/mpiexec -np 1 --host 
>> loki --slot-list 0:0-1,1:0-1 simple_spawn
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Detaching after fork from child process 18031.
>> [pid 18031] starting up!
>> 0 completed MPI_Init
>> Parent [pid 18031] about to spawn!
>> Detaching after fork from child process 18033.
>> Detaching after fork from child process 18034.
>> [pid 18033] starting up!
>> [pid 18034] starting up!
>> [loki:18034] *** Process received signal ***
>> [loki:18034] Signal: Segmentation fault (11)
>> ...
>>
>>
>>
>> I get a different output, if I run the program in gdb with
>> a breakpoint.
>>
>> gdb /usr/local/openmpi-1.10.3_64_gcc/bin/mpiexec
>> (gdb) set args -np 1 --host loki --slot-list 0:0-1,1:0-1 simple_spawn
>> (gdb) set follow-fork-mode child
>> (gdb) break ompi_proc_self
>> (gdb) run
>> (gdb) next
>>
>> Repeating "next" very often results in the following output.
>>
>> ...
>> Starting program: 
>> /home/fd1026/work/skripte/master/parallel/prog/mpi/spawn/simple_spawn
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> [pid 13277] starting up!
>> [New Thread 0x7ffff42ef700 (LWP 13289)]
>>
>> Breakpoint 1, ompi_proc_self (size=0x7fffffffc060)
>>    at ../../openmpi-1.10.3rc3/ompi/proc/proc.c:413
>> 413        ompi_proc_t **procs = (ompi_proc_t**) 
>> malloc(sizeof(ompi_proc_t*));
>> (gdb) n
>> 414        if (NULL == procs) {
>> (gdb)
>> 423        OBJ_RETAIN(ompi_proc_local_proc);
>> (gdb)
>> 424        *procs = ompi_proc_local_proc;
>> (gdb)
>> 425        *size = 1;
>> (gdb)
>> 426        return procs;
>> (gdb)
>> 427    }
>> (gdb)
>> ompi_comm_init () at 
>> ../../openmpi-1.10.3rc3/ompi/communicator/comm_init.c:138
>> 138        group->grp_my_rank      = 0;
>> (gdb)
>> 139        group->grp_proc_count    = (int)size;
>> ...
>> 193        ompi_comm_reg_init();
>> (gdb)
>> 196        ompi_comm_request_init ();
>> (gdb)
>> 198        return OMPI_SUCCESS;
>> (gdb)
>> 199    }
>> (gdb)
>> ompi_mpi_init (argc=0, argv=0x0, requested=0, provided=0x7fffffffc21c)
>>    at ../../openmpi-1.10.3rc3/ompi/runtime/ompi_mpi_init.c:738
>> 738        if (OMPI_SUCCESS != (ret = ompi_file_init())) {
>> (gdb)
>> 744        if (OMPI_SUCCESS != (ret = ompi_win_init())) {
>> (gdb)
>> 750        if (OMPI_SUCCESS != (ret = ompi_attr_init())) {
>> ...
>> 988        ompi_mpi_initialized = true;
>> (gdb)
>> 991        if (ompi_enable_timing && 0 == OMPI_PROC_MY_NAME->vpid) {
>> (gdb)
>> 999        return MPI_SUCCESS;
>> (gdb)
>> 1000    }
>> (gdb)
>> PMPI_Init (argc=0x0, argv=0x0) at pinit.c:94
>> 94          if (MPI_SUCCESS != err) {
>> (gdb)
>> 104        return MPI_SUCCESS;
>> (gdb)
>> 105    }
>> (gdb)
>> 0x0000000000400d0c in main ()
>> (gdb)
>> Single stepping until exit from function main,
>> which has no line number information.
>> 0 completed MPI_Init
>> Parent [pid 13277] about to spawn!
>> [New process 13472]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> process 13472 is executing new program: 
>> /usr/local/openmpi-1.10.3_64_gcc/bin/orted
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> [New process 13474]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> process 13474 is executing new program: 
>> /home/fd1026/work/skripte/master/parallel/prog/mpi/spawn/simple_spawn
>> [pid 13475] starting up!
>> [pid 13476] starting up!
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> [pid 13474] starting up!
>> [New Thread 0x7ffff491b700 (LWP 13480)]
>> [Switching to Thread 0x7ffff7ff1740 (LWP 13474)]
>>
>> Breakpoint 1, ompi_proc_self (size=0x7fffffffba30)
>>    at ../../openmpi-1.10.3rc3/ompi/proc/proc.c:413
>> 413        ompi_proc_t **procs = (ompi_proc_t**) 
>> malloc(sizeof(ompi_proc_t*));
>> (gdb)
>> 414        if (NULL == procs) {
>> ...
>> 426        return procs;
>> (gdb)
>> 427    }
>> (gdb)
>> ompi_comm_init () at 
>> ../../openmpi-1.10.3rc3/ompi/communicator/comm_init.c:138
>> 138        group->grp_my_rank      = 0;
>> (gdb)
>> 139        group->grp_proc_count    = (int)size;
>> (gdb)
>> 140        OMPI_GROUP_SET_INTRINSIC (group);
>> ...
>> 193        ompi_comm_reg_init();
>> (gdb)
>> 196        ompi_comm_request_init ();
>> (gdb)
>> 198        return OMPI_SUCCESS;
>> (gdb)
>> 199    }
>> (gdb)
>> ompi_mpi_init (argc=0, argv=0x0, requested=0, provided=0x7fffffffbbec)
>>    at ../../openmpi-1.10.3rc3/ompi/runtime/ompi_mpi_init.c:738
>> 738        if (OMPI_SUCCESS != (ret = ompi_file_init())) {
>> (gdb)
>> 744        if (OMPI_SUCCESS != (ret = ompi_win_init())) {
>> (gdb)
>> 750        if (OMPI_SUCCESS != (ret = ompi_attr_init())) {
>> ...
>> 863        if (OMPI_SUCCESS != (ret = ompi_pubsub_base_select())) {
>> (gdb)
>> 869        if (OMPI_SUCCESS != (ret = 
>> mca_base_framework_open(&ompi_dpm_base_framework, 0))) {
>> (gdb)
>> 873        if (OMPI_SUCCESS != (ret = ompi_dpm_base_select())) {
>> (gdb)
>> 884        if ( OMPI_SUCCESS !=
>> (gdb)
>> 894        if (OMPI_SUCCESS !=
>> (gdb)
>> 900        if (OMPI_SUCCESS !=
>> (gdb)
>> 911        if (OMPI_SUCCESS != (ret = ompi_dpm.dyn_init())) {
>> (gdb)
>> Parent done with spawn
>> Parent sending message to child
>> 2 completed MPI_Init
>> Hello from the child 2 of 3 on host loki pid 13476
>> 1 completed MPI_Init
>> Hello from the child 1 of 3 on host loki pid 13475
>> 921        if (OMPI_SUCCESS != (ret = ompi_cr_init())) {
>> (gdb)
>> 931        opal_progress_event_users_decrement();
>> (gdb)
>> 934        opal_progress_set_yield_when_idle(ompi_mpi_yield_when_idle);
>> (gdb)
>> 937        if (ompi_mpi_event_tick_rate >= 0) {
>> (gdb)
>> 946        if (OMPI_SUCCESS != (ret = ompi_mpiext_init())) {
>> (gdb)
>> 953        if (ret != OMPI_SUCCESS) {
>> (gdb)
>> 972        OBJ_CONSTRUCT(&ompi_registered_datareps, opal_list_t);
>> (gdb)
>> 977        OBJ_CONSTRUCT( &ompi_mpi_f90_integer_hashtable, 
>> opal_hash_table_t);
>> (gdb)
>> 978        opal_hash_table_init(&ompi_mpi_f90_integer_hashtable, 16 /* why 
>> not? */);
>> (gdb)
>> 980        OBJ_CONSTRUCT( &ompi_mpi_f90_real_hashtable, opal_hash_table_t);
>> (gdb)
>> 981        opal_hash_table_init(&ompi_mpi_f90_real_hashtable, 
>> FLT_MAX_10_EXP);
>> (gdb)
>> 983        OBJ_CONSTRUCT( &ompi_mpi_f90_complex_hashtable, 
>> opal_hash_table_t);
>> (gdb)
>> 984        opal_hash_table_init(&ompi_mpi_f90_complex_hashtable, 
>> FLT_MAX_10_EXP);
>> (gdb)
>> 988        ompi_mpi_initialized = true;
>> (gdb)
>> 991        if (ompi_enable_timing && 0 == OMPI_PROC_MY_NAME->vpid) {
>> (gdb)
>> 999        return MPI_SUCCESS;
>> (gdb)
>> 1000    }
>> (gdb)
>> PMPI_Init (argc=0x0, argv=0x0) at pinit.c:94
>> 94          if (MPI_SUCCESS != err) {
>> (gdb)
>> 104        return MPI_SUCCESS;
>> (gdb)
>> 105    }
>> (gdb)
>> 0x0000000000400d0c in main ()
>> (gdb)
>> Single stepping until exit from function main,
>> which has no line number information.
>> 0 completed MPI_Init
>> Hello from the child 0 of 3 on host loki pid 13474
>>
>> Child 2 disconnected
>> Child 1 disconnected
>> Child 0 received msg: 38
>> Parent disconnected
>> 13277: exiting
>>
>> Program received signal SIGTERM, Terminated.
>> 0x0000000000400f0a in main ()
>> (gdb)
>> Single stepping until exit from function main,
>> which has no line number information.
>> [tcsetpgrp failed in terminal_inferior: No such process]
>> [Thread 0x7ffff491b700 (LWP 13480) exited]
>>
>> Program terminated with signal SIGTERM, Terminated.
>> The program no longer exists.
>> (gdb)
>> The program is not being run.
>> (gdb)
>> The program is not being run.
>> (gdb) info break
>> Num    Type          Disp Enb Address            What
>> 1      breakpoint    keep y  0x00007ffff7aa35c7 in ompi_proc_self
>>                                                  at 
>>../../openmpi-1.10.3rc3/ompi/proc/proc.c:413 inf 8, 7, 6, 5, 4, 3, 2, 1
>>        breakpoint already hit 2 times
>> (gdb) delete 1
>> (gdb) r
>> Starting program: 
>> /home/fd1026/work/skripte/master/parallel/prog/mpi/spawn/simple_spawn
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> [pid 16708] starting up!
>> 0 completed MPI_Init
>> Parent [pid 16708] about to spawn!
>> [New process 16720]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> process 16720 is executing new program: 
>> /usr/local/openmpi-1.10.3_64_gcc/bin/orted
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> [New process 16722]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> process 16722 is executing new program: 
>> /home/fd1026/work/skripte/master/parallel/prog/mpi/spawn/simple_spawn
>> [pid 16723] starting up!
>> [pid 16724] starting up!
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> [pid 16722] starting up!
>> Parent done with spawn
>> Parent sending message to child
>> 1 completed MPI_Init
>> Hello from the child 1 of 3 on host loki pid 16723
>> 2 completed MPI_Init
>> Hello from the child 2 of 3 on host loki pid 16724
>> 0 completed MPI_Init
>> Hello from the child 0 of 3 on host loki pid 16722
>> Child 0 received msg: 38
>> Child 0 disconnected
>> Parent disconnected
>> Child 1 disconnected
>> Child 2 disconnected
>> 16708: exiting
>> 16724: exiting
>> 16723: exiting
>> [New Thread 0x7ffff491b700 (LWP 16729)]
>>
>> Program received signal SIGTERM, Terminated.
>> [Switching to Thread 0x7ffff7ff1740 (LWP 16722)]
>> __GI__dl_debug_state () at dl-debug.c:74
>> 74      dl-debug.c: No such file or directory.
>> (gdb) 
>> --------------------------------------------------------------------------
>> WARNING: A process refused to die despite all the efforts!
>> This process may still be running and/or consuming resources.
>>
>> Host: loki
>> PID:  16722
>>
>> --------------------------------------------------------------------------
>>
>>
>> The following simple_spawn processes exist now.
>>
>> loki spawn 171 ps -aef | grep simple_spawn
>> fd1026  11079 11053  0 14:00 pts/0    00:00:00 
>> /usr/local/openmpi-1.10.3_64_gcc/bin/mpiexec -np 1 --host loki --slot-list 
>> 0:0-1,1:0-1 simple_spawn
>> fd1026  11095 11079 29 14:01 pts/0    00:09:37 [simple_spawn] <defunct>
>> fd1026  16722    1  0 14:31 ?        00:00:00 [simple_spawn] <defunct>
>> fd1026  17271 29963  0 14:33 pts/2    00:00:00 grep simple_spawn
>> loki spawn 172
>>
>>
>> Is it possible that there is a race condition? How can I help
>> to get a solution for my problem?
>>
>>
>> Kind regards
>>
>> Siegmar
>>
>> On 24.05.2016 at 16:54, Ralph Castain wrote:
>>> Works perfectly for me, so I believe this must be an environment issue - I 
>>> am using gcc 6.0.0 on CentOS7 with x86:
>>>
>>> $ mpirun -n 1 -host bend001 --slot-list 0:0-1,1:0-1 --report-bindings 
>>> ./simple_spawn
>>> [bend001:17599] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 
>>> 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: 
>>> [BB/BB/../../../..][BB/BB/../../../..]
>>> [pid 17601] starting up!
>>> 0 completed MPI_Init
>>> Parent [pid 17601] about to spawn!
>>> [pid 17603] starting up!
>>> [bend001:17599] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 
>>> 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: 
>>> [BB/BB/../../../..][BB/BB/../../../..]
>>> [bend001:17599] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket 
>>> 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: 
>>> [BB/BB/../../../..][BB/BB/../../../..]
>>> [bend001:17599] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 
>>> 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: 
>>> [BB/BB/../../../..][BB/BB/../../../..]
>>> [pid 17604] starting up!
>>> [pid 17605] starting up!
>>> Parent done with spawn
>>> Parent sending message to child
>>> 0 completed MPI_Init
>>> Hello from the child 0 of 3 on host bend001 pid 17603
>>> Child 0 received msg: 38
>>> 1 completed MPI_Init
>>> Hello from the child 1 of 3 on host bend001 pid 17604
>>> 2 completed MPI_Init
>>> Hello from the child 2 of 3 on host bend001 pid 17605
>>> Child 0 disconnected
>>> Child 2 disconnected
>>> Parent disconnected
>>> Child 1 disconnected
>>> 17603: exiting
>>> 17605: exiting
>>> 17601: exiting
>>> 17604: exiting
>>> $
>>>
>>>> On May 24, 2016, at 7:18 AM, Siegmar Gross 
>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>
>>>> Hi Ralph and Gilles,
>>>>
>>>> the program breaks only if I combine "--host" and "--slot-list". Perhaps this
>>>> information is helpful. I use a different machine now, so that you can see that
>>>> the problem is not restricted to "loki".
>>>>
>>>>
>>>> pc03 spawn 115 ompi_info | grep -e "OPAL repo revision:" -e "C compiler 
>>>> absolute:"
>>>>    OPAL repo revision: v1.10.2-201-gd23dda8
>>>>    C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc
>>>>
>>>>
>>>> pc03 spawn 116 uname -a
>>>> Linux pc03 3.12.55-52.42-default #1 SMP Thu Mar 3 10:35:46 UTC 2016 
>>>> (4354e1d) x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>>
>>>> pc03 spawn 117 cat host_pc03.openmpi
>>>> pc03.informatik.hs-fulda.de slots=12 max_slots=12
>>>>
>>>>
>>>> pc03 spawn 118 mpicc simple_spawn.c
>>>>
>>>>
>>>> pc03 spawn 119 mpiexec -np 1 --report-bindings a.out
>>>> [pc03:03711] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: 
>>>> [BB/../../../../..][../../../../../..]
>>>> [pid 3713] starting up!
>>>> 0 completed MPI_Init
>>>> Parent [pid 3713] about to spawn!
>>>> [pc03:03711] MCW rank 0 bound to socket 1[core 6[hwt 0-1]], socket 1[core 
>>>> 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 
>>>> 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: 
>>>> [../../../../../..][BB/BB/BB/BB/BB/BB]
>>>> [pc03:03711] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
>>>> 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 
>>>> 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: 
>>>> [BB/BB/BB/BB/BB/BB][../../../../../..]
>>>> [pc03:03711] MCW rank 2 bound to socket 1[core 6[hwt 0-1]], socket 1[core 
>>>> 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 
>>>> 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: 
>>>> [../../../../../..][BB/BB/BB/BB/BB/BB]
>>>> [pid 3715] starting up!
>>>> [pid 3716] starting up!
>>>> [pid 3717] starting up!
>>>> Parent done with spawn
>>>> Parent sending message to child
>>>> 0 completed MPI_Init
>>>> Hello from the child 0 of 3 on host pc03 pid 3715
>>>> 1 completed MPI_Init
>>>> Hello from the child 1 of 3 on host pc03 pid 3716
>>>> 2 completed MPI_Init
>>>> Hello from the child 2 of 3 on host pc03 pid 3717
>>>> Child 0 received msg: 38
>>>> Child 0 disconnected
>>>> Child 2 disconnected
>>>> Parent disconnected
>>>> Child 1 disconnected
>>>> 3713: exiting
>>>> 3715: exiting
>>>> 3716: exiting
>>>> 3717: exiting
>>>>
>>>>
>>>> pc03 spawn 120 mpiexec -np 1 --hostfile host_pc03.openmpi --slot-list 
>>>> 0:0-1,1:0-1 --report-bindings a.out
>>>> [pc03:03729] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
>>>> 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: 
>>>> [BB/BB/../../../..][BB/BB/../../../..]
>>>> [pid 3731] starting up!
>>>> 0 completed MPI_Init
>>>> Parent [pid 3731] about to spawn!
>>>> [pc03:03729] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
>>>> 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: 
>>>> [BB/BB/../../../..][BB/BB/../../../..]
>>>> [pc03:03729] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
>>>> 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: 
>>>> [BB/BB/../../../..][BB/BB/../../../..]
>>>> [pc03:03729] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
>>>> 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: 
>>>> [BB/BB/../../../..][BB/BB/../../../..]
>>>> [pid 3733] starting up!
>>>> [pid 3734] starting up!
>>>> [pid 3735] starting up!
>>>> Parent done with spawn
>>>> Parent sending message to child
>>>> 2 completed MPI_Init
>>>> Hello from the child 2 of 3 on host pc03 pid 3735
>>>> 1 completed MPI_Init
>>>> Hello from the child 1 of 3 on host pc03 pid 3734
>>>> 0 completed MPI_Init
>>>> Hello from the child 0 of 3 on host pc03 pid 3733
>>>> Child 0 received msg: 38
>>>> Child 0 disconnected
>>>> Child 2 disconnected
>>>> Child 1 disconnected
>>>> Parent disconnected
>>>> 3731: exiting
>>>> 3734: exiting
>>>> 3733: exiting
>>>> 3735: exiting
>>>>
>>>>
>>>> pc03 spawn 121 mpiexec -np 1 --host pc03 --slot-list 0:0-1,1:0-1 
>>>> --report-bindings a.out
>>>> [pc03:03744] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
>>>> 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: 
>>>> [BB/BB/../../../..][BB/BB/../../../..]
>>>> [pid 3746] starting up!
>>>> 0 completed MPI_Init
>>>> Parent [pid 3746] about to spawn!
>>>> [pc03:03744] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
>>>> 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: 
>>>> [BB/BB/../../../..][BB/BB/../../../..]
>>>> [pc03:03744] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
>>>> 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: 
>>>> [BB/BB/../../../..][BB/BB/../../../..]
>>>> [pid 3748] starting up!
>>>> [pid 3749] starting up!
>>>> [pc03:03749] *** Process received signal ***
>>>> [pc03:03749] Signal: Segmentation fault (11)
>>>> [pc03:03749] Signal code: Address not mapped (1)
>>>> [pc03:03749] Failing at address: 0x8
>>>> [pc03:03749] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7fe6f0d1f870]
>>>> [pc03:03749] [ 1] 
>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7fe6f0f825b0]
>>>> [pc03:03749] [ 2] 
>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7fe6f0f61b08]
>>>> [pc03:03749] [ 3] 
>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7fe6f0f87e8a]
>>>> [pc03:03749] [ 4] 
>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x1a0)[0x7fe6f0fc42ae]
>>>> [pc03:03749] [ 5] a.out[0x400d0c]
>>>> [pc03:03749] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fe6f0989b05]
>>>> [pc03:03749] [ 7] a.out[0x400bf9]
>>>> [pc03:03749] *** End of error message ***
>>>> --------------------------------------------------------------------------
>>>> mpiexec noticed that process rank 2 with PID 3749 on node pc03 exited on 
>>>> signal 11 (Segmentation fault).
>>>> --------------------------------------------------------------------------
>>>> pc03 spawn 122
>>>>
>>>>
>>>>
>>>> Kind regards
>>>>
>>>> Siegmar
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 05/24/16 15:44, Ralph Castain wrote:
>>>>>
>>>>>> On May 24, 2016, at 6:21 AM, Siegmar Gross 
>>>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>>>
>>>>>> Hi Ralph,
>>>>>>
>>>>>> I copy the relevant lines to this place, so that it is easier to see what
>>>>>> happens. "a.out" is your program, which I compiled with mpicc.
>>>>>>
>>>>>>>> loki spawn 153 ompi_info | grep -e "OPAL repo revision:" -e "C compiler
>>>>>>>> absolute:"
>>>>>>>>    OPAL repo revision: v1.10.2-201-gd23dda8
>>>>>>>>  C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc
>>>>>>>> loki spawn 154 mpicc simple_spawn.c
>>>>>>
>>>>>>>> loki spawn 155 mpiexec -np 1 a.out
>>>>>>>> [pid 24008] starting up!
>>>>>>>> 0 completed MPI_Init
>>>>>> ...
>>>>>>
>>>>>> "mpiexec -np 1 a.out" works.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I don't know what 'a.out' is, but it looks like there is some memory
>>>>>>> corruption there.
>>>>>>
>>>>>> "a.out" is still your program. I get the same error on different
>>>>>> machines, so that it is not very likely, that the (hardware) memory
>>>>>> is corrupted.
>>>>>>
>>>>>>
>>>>>>>> loki spawn 156 mpiexec -np 1 --host loki --slot-list 0-5 a.out
>>>>>>>> [pid 24102] starting up!
>>>>>>>> 0 completed MPI_Init
>>>>>>>> Parent [pid 24102] about to spawn!
>>>>>>>> [pid 24104] starting up!
>>>>>>>> [pid 24105] starting up!
>>>>>>>> [loki:24105] *** Process received signal ***
>>>>>>>> [loki:24105] Signal: Segmentation fault (11)
>>>>>>>> [loki:24105] Signal code: Address not mapped (1)
>>>>>> ...
>>>>>>
>>>>>> "mpiexec -np 1 --host loki --slot-list 0-5 a.out" breaks with a 
>>>>>> segmentation
>>>>>> faUlt. Can I do something, so that you can find out, what happens?
>>>>>
>>>>> I honestly have no idea - perhaps Gilles can help, as I have no access to 
>>>>> that kind of environment. We aren't seeing such problems elsewhere, so it
>>>>> is likely something local.
>>>>>
>>>>>>
>>>>>>
>>>>>> Kind regards
>>>>>>
>>>>>> Siegmar
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/24/16 15:07, Ralph Castain wrote:
>>>>>>>
>>>>>>>> On May 24, 2016, at 4:19 AM, Siegmar Gross
>>>>>>>> <siegmar.gr...@informatik.hs-fulda.de
>>>>>>>> <mailto:siegmar.gr...@informatik.hs-fulda.de>> wrote:
>>>>>>>>
>>>>>>>> Hi Ralph,
>>>>>>>>
>>>>>>>> thank you very much for your answer and your example program.
>>>>>>>>
>>>>>>>> On 05/23/16 17:45, Ralph Castain wrote:
>>>>>>>>> I cannot replicate the problem - both scenarios work fine for me. I'm not
>>>>>>>>> convinced your test code is correct, however, as you call Comm_free on the
>>>>>>>>> inter-communicator but didn't call Comm_disconnect. Check out the attached
>>>>>>>>> for a correct code and see if it works for you.
>>>>>>>>
>>>>>>>> I thought that I only need MPI_Comm_disconnect if I had established a
>>>>>>>> connection with MPI_Comm_connect before. The man page for MPI_Comm_free states
>>>>>>>>
>>>>>>>> "This  operation marks the communicator object for deallocation. The
>>>>>>>> handle is set to MPI_COMM_NULL. Any pending operations that use this
>>>>>>>> communicator will complete normally; the object is actually 
>>>>>>>> deallocated only
>>>>>>>> if there are no other active references to it.".
>>>>>>>>
>>>>>>>> The man page for MPI_Comm_disconnect states
>>>>>>>>
>>>>>>>> "MPI_Comm_disconnect waits for all pending communication on comm to 
>>>>>>>> complete
>>>>>>>> internally, deallocates the communicator object, and sets the handle to
>>>>>>>> MPI_COMM_NULL. It is  a  collective operation.".
>>>>>>>>
>>>>>>>> I don't see a difference for my spawned processes, because both functions
>>>>>>>> will "wait" until all pending operations have finished before the object is
>>>>>>>> destroyed. Nevertheless, perhaps my small example program has worked all
>>>>>>>> these years by chance.
>>>>>>>>
>>>>>>>> However, I don't understand why my program works with
>>>>>>>> "mpiexec -np 1 --host loki,loki,loki,loki,loki spawn_master" and breaks with
>>>>>>>> "mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master". You are
>>>>>>>> right, my slot-list is equivalent to "-bind-to none". I could also have used
>>>>>>>> "mpiexec -np 1 --host loki --oversubscribe spawn_master", which works as well.
>>>>>>>
>>>>>>> Well, you are only giving us one slot when you specify "-host loki", and then
>>>>>>> you are trying to launch multiple processes into it. The "slot-list" option only
>>>>>>> tells us what cpus to bind each process to - it doesn't allocate process slots.
>>>>>>> So you have to tell us how many processes are allowed to run on this node.
>>>>>>>
>>>>>>>>
>>>>>>>> The program breaks with "There are not enough slots available in the system
>>>>>>>> to satisfy ..." if I only use "--host loki" or different host names without
>>>>>>>> mentioning five host names, "slot-list", or "oversubscribe". Unfortunately,
>>>>>>>> "--host <host name>:<number of slots>" isn't available in openmpi-1.10.3rc2
>>>>>>>> to specify the number of available slots.
>>>>>>>
>>>>>>> Correct - we did not backport the new syntax
>>>>>>>
>>>>>>>>
>>>>>>>> Your program behaves the same way as mine, so MPI_Comm_disconnect will not
>>>>>>>> solve my problem. I had to modify your program in a negligible way to get it
>>>>>>>> to compile.
>>>>>>>>
>>>>>>>> loki spawn 153 ompi_info | grep -e "OPAL repo revision:" -e "C 
>>>>>>>> compiler absolute:"
>>>>>>>>  OPAL repo revision: v1.10.2-201-gd23dda8
>>>>>>>>  C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc
>>>>>>>> loki spawn 154 mpicc simple_spawn.c
>>>>>>>> loki spawn 155 mpiexec -np 1 a.out
>>>>>>>> [pid 24008] starting up!
>>>>>>>> 0 completed MPI_Init
>>>>>>>> Parent [pid 24008] about to spawn!
>>>>>>>> [pid 24010] starting up!
>>>>>>>> [pid 24011] starting up!
>>>>>>>> [pid 24012] starting up!
>>>>>>>> Parent done with spawn
>>>>>>>> Parent sending message to child
>>>>>>>> 0 completed MPI_Init
>>>>>>>> Hello from the child 0 of 3 on host loki pid 24010
>>>>>>>> 1 completed MPI_Init
>>>>>>>> Hello from the child 1 of 3 on host loki pid 24011
>>>>>>>> 2 completed MPI_Init
>>>>>>>> Hello from the child 2 of 3 on host loki pid 24012
>>>>>>>> Child 0 received msg: 38
>>>>>>>> Child 0 disconnected
>>>>>>>> Child 1 disconnected
>>>>>>>> Child 2 disconnected
>>>>>>>> Parent disconnected
>>>>>>>> 24012: exiting
>>>>>>>> 24010: exiting
>>>>>>>> 24008: exiting
>>>>>>>> 24011: exiting
>>>>>>>>
>>>>>>>>
>>>>>>>> Is something wrong with my command line? I didn't use slot-list before, so
>>>>>>>> I'm not sure whether I'm using it in the intended way.
>>>>>>>
>>>>>>> I don't know what 'a.out' is, but it looks like there is some memory
>>>>>>> corruption there.
>>>>>>>
>>>>>>>>
>>>>>>>> loki spawn 156 mpiexec -np 1 --host loki --slot-list 0-5 a.out
>>>>>>>> [pid 24102] starting up!
>>>>>>>> 0 completed MPI_Init
>>>>>>>> Parent [pid 24102] about to spawn!
>>>>>>>> [pid 24104] starting up!
>>>>>>>> [pid 24105] starting up!
>>>>>>>> [loki:24105] *** Process received signal ***
>>>>>>>> [loki:24105] Signal: Segmentation fault (11)
>>>>>>>> [loki:24105] Signal code: Address not mapped (1)
>>>>>>>> [loki:24105] Failing at address: 0x8
>>>>>>>> [loki:24105] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f39aa76f870]
>>>>>>>> [loki:24105] [ 1]
>>>>>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7f39aa9d25b0]
>>>>>>>> [loki:24105] [ 2]
>>>>>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7f39aa9b1b08]
>>>>>>>> [loki:24105] [ 3] *** An error occurred in MPI_Init
>>>>>>>> *** on a NULL communicator
>>>>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now 
>>>>>>>> abort,
>>>>>>>> ***    and potentially your MPI job)
>>>>>>>> [loki:24104] Local abort before MPI_INIT completed successfully; not 
>>>>>>>> able to
>>>>>>>> aggregate error messages, and not able to guarantee that all other 
>>>>>>>> processes
>>>>>>>> were killed!
>>>>>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7f39aa9d7e8a]
>>>>>>>> [loki:24105] [ 4]
>>>>>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x1a0)[0x7f39aaa142ae]
>>>>>>>> [loki:24105] [ 5] a.out[0x400d0c]
>>>>>>>> [loki:24105] [ 6] 
>>>>>>>> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f39aa3d9b05]
>>>>>>>> [loki:24105] [ 7] a.out[0x400bf9]
>>>>>>>> [loki:24105] *** End of error message ***
>>>>>>>> -------------------------------------------------------
>>>>>>>> Child job 2 terminated normally, but 1 process returned
>>>>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>>>>> -------------------------------------------------------
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> mpiexec detected that one or more processes exited with non-zero 
>>>>>>>> status, thus
>>>>>>>> causing
>>>>>>>> the job to be terminated. The first process to do so was:
>>>>>>>>
>>>>>>>> Process name: [[49560,2],0]
>>>>>>>> Exit code:    1
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> loki spawn 157
>>>>>>>>
>>>>>>>>
>>>>>>>> Hopefully, you will find out what happens. Please let me know if I can
>>>>>>>> help you in any way.
>>>>>>>>
>>>>>>>> Kind regards
>>>>>>>>
>>>>>>>> Siegmar
>>>>>>>>
>>>>>>>>
>>>>>>>>> FWIW: I don't know how many cores you have on your sockets, but if you
>>>>>>>>> have 6 cores/socket, then your slot-list is equivalent to "--bind-to none"
>>>>>>>>> as the slot-list applies to every process being launched
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On May 23, 2016, at 6:26 AM, Siegmar Gross
>>>>>>>>>> <siegmar.gr...@informatik.hs-fulda.de
>>>>>>>>>> <mailto:siegmar.gr...@informatik.hs-fulda.de>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I installed openmpi-1.10.3rc2 on my "SUSE Linux Enterprise Server
>>>>>>>>>> 12 (x86_64)" with Sun C 5.13  and gcc-6.1.0. Unfortunately I get
>>>>>>>>>> a segmentation fault for "--slot-list" for one of my small programs.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> loki spawn 119 ompi_info | grep -e "OPAL repo revision:" -e "C 
>>>>>>>>>> compiler
>>>>>>>>>> absolute:"
>>>>>>>>>>  OPAL repo revision: v1.10.2-201-gd23dda8
>>>>>>>>>> C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> loki spawn 120 mpiexec -np 1 --host loki,loki,loki,loki,loki 
>>>>>>>>>> spawn_master
>>>>>>>>>>
>>>>>>>>>> Parent process 0 running on loki
>>>>>>>>>> I create 4 slave processes
>>>>>>>>>>
>>>>>>>>>> Parent process 0: tasks in MPI_COMM_WORLD:                    1
>>>>>>>>>>              tasks in COMM_CHILD_PROCESSES local group:  1
>>>>>>>>>>              tasks in COMM_CHILD_PROCESSES remote group: 4
>>>>>>>>>>
>>>>>>>>>> Slave process 0 of 4 running on loki
>>>>>>>>>> Slave process 1 of 4 running on loki
>>>>>>>>>> Slave process 2 of 4 running on loki
>>>>>>>>>> spawn_slave 2: argv[0]: spawn_slave
>>>>>>>>>> Slave process 3 of 4 running on loki
>>>>>>>>>> spawn_slave 0: argv[0]: spawn_slave
>>>>>>>>>> spawn_slave 1: argv[0]: spawn_slave
>>>>>>>>>> spawn_slave 3: argv[0]: spawn_slave
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> loki spawn 121 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 
>>>>>>>>>> spawn_master
>>>>>>>>>>
>>>>>>>>>> Parent process 0 running on loki
>>>>>>>>>> I create 4 slave processes
>>>>>>>>>>
>>>>>>>>>> [loki:17326] *** Process received signal ***
>>>>>>>>>> [loki:17326] Signal: Segmentation fault (11)
>>>>>>>>>> [loki:17326] Signal code: Address not mapped (1)
>>>>>>>>>> [loki:17326] Failing at address: 0x8
>>>>>>>>>> [loki:17326] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f4e469b3870]
>>>>>>>>>> [loki:17326] [ 1] *** An error occurred in MPI_Init
>>>>>>>>>> *** on a NULL communicator
>>>>>>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now 
>>>>>>>>>> abort,
>>>>>>>>>> ***    and potentially your MPI job)
>>>>>>>>>> [loki:17324] Local abort before MPI_INIT completed successfully; not 
>>>>>>>>>> able to
>>>>>>>>>> aggregate error messages, and not able to guarantee that all other 
>>>>>>>>>> processes
>>>>>>>>>> were killed!
>>>>>>>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7f4e46c165b0]
>>>>>>>>>> [loki:17326] [ 2]
>>>>>>>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7f4e46bf5b08]
>>>>>>>>>> [loki:17326] [ 3] *** An error occurred in MPI_Init
>>>>>>>>>> *** on a NULL communicator
>>>>>>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now 
>>>>>>>>>> abort,
>>>>>>>>>> ***    and potentially your MPI job)
>>>>>>>>>> [loki:17325] Local abort before MPI_INIT completed successfully; not 
>>>>>>>>>> able to
>>>>>>>>>> aggregate error messages, and not able to guarantee that all other 
>>>>>>>>>> processes
>>>>>>>>>> were killed!
>>>>>>>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7f4e46c1be8a]
>>>>>>>>>> [loki:17326] [ 4]
>>>>>>>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x180)[0x7f4e46c5828e]
>>>>>>>>>> [loki:17326] [ 5] spawn_slave[0x40097e]
>>>>>>>>>> [loki:17326] [ 6] 
>>>>>>>>>> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f4e4661db05]
>>>>>>>>>> [loki:17326] [ 7] spawn_slave[0x400a54]
>>>>>>>>>> [loki:17326] *** End of error message ***
>>>>>>>>>> -------------------------------------------------------
>>>>>>>>>> Child job 2 terminated normally, but 1 process returned
>>>>>>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>>>>>>> -------------------------------------------------------
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> mpiexec detected that one or more processes exited with non-zero 
>>>>>>>>>> status,
>>>>>>>>>> thus causing
>>>>>>>>>> the job to be terminated. The first process to do so was:
>>>>>>>>>>
>>>>>>>>>> Process name: [[56340,2],0]
>>>>>>>>>> Exit code:    1
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> loki spawn 122
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I would be grateful if somebody could fix the problem. Thank you
>>>>>>>>>> very much in advance for any help.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Kind regards
>>>>>>>>>>
>>>>>>>>>> Siegmar
>>>>>>>>>> _______________________________________________
>>>>>>>>>> users mailing list
>>>>>>>>>> us...@open-mpi.org <mailto:us...@open-mpi.org>
>>>>>>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>> Link to this post:
>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2016/05/29281.php
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> us...@open-mpi.org <mailto:us...@open-mpi.org>
>>>>>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>> Link to this
>>>>>>>>> post: http://www.open-mpi.org/community/lists/users/2016/05/29284.php
>>>>>>>>>
>>>>>>>> <simple_spawn_modified.c>_______________________________________________
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org <mailto:us...@open-mpi.org>
>>>>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>> Link to this post: 
>>>>>>>> http://www.open-mpi.org/community/lists/users/2016/05/29300.php
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>> Link to this post: 
>>>>>>> http://www.open-mpi.org/community/lists/users/2016/05/29301.php
>>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post: 
>>>>>> http://www.open-mpi.org/community/lists/users/2016/05/29304.php
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post: 
>>>>> http://www.open-mpi.org/community/lists/users/2016/05/29307.php
>>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/users/2016/05/29308.php
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2016/05/29309.php
>>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2016/05/29315.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29316.php
>



------------------------------

Message: 3
Date: Fri, 27 May 2016 09:14:42 +0000
From: "Marco D'Amico" <marco.damic...@gmail.com>
To: us...@open-mpi.org
Subject: [OMPI users] OpenMPI virtualization aware
Message-ID:
    <CABi-01XH+vdi2egBD=knen_cyxpecg0j-+3rtvnfnc6mtd+...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi, I have recently been investigating virtualization in the HPC field, and I
found out that MVAPICH has a "virtualization aware" version that makes it
possible to overcome the large latency penalties of using a virtualized
environment for HPC.

My question is whether there is any similar effort in Open MPI, since I would
eventually like to contribute to it.

Best regards,
Marco D'Amico

------------------------------

Message: 4
Date: Fri, 27 May 2016 06:45:05 -0700
From: Ralph Castain <r...@open-mpi.org>
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] OpenMPI virtualization aware
Message-ID: <bbeb8e66-40b0-4688-8284-2113252e1...@open-mpi.org>
Content-Type: text/plain; charset="utf-8"

Hi Marco

OMPI has integrated support for the Singularity container:

http://singularity.lbl.gov/index.html

https://groups.google.com/a/lbl.gov/forum/#!forum/singularity

It is in OMPI master now, and an early version is in 2.0 - the full integration 
will be in 2.1. Singularity is undergoing changes for its 2.0 release (so we'll 
need to do some updating of the OMPI integration), and there is still plenty 
that can be done to further optimize its integration - so contributions would 
be welcome!

Ralph



> On May 27, 2016, at 2:14 AM, Marco D'Amico <marco.damic...@gmail.com> wrote:
> 
> Hi, I have recently been investigating virtualization in the HPC field, and I
> found out that MVAPICH has a "virtualization aware" version that makes it
> possible to overcome the large latency penalties of using a virtualized
> environment for HPC.
> 
> My question is whether there is any similar effort in Open MPI, since I would
> eventually like to contribute to it.
> 
> Best regards,
> Marco D'Amico
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29320.php


------------------------------

Subject: Digest Footer

_______________________________________________
users mailing list
us...@open-mpi.org
https://www.open-mpi.org/mailman/listinfo.cgi/users

------------------------------

End of users Digest, Vol 3514, Issue 1
**************************************

