I don’t see that LD_PRELOAD showing up on the ssh path, Andy

> /usr/bin/ssh mic1 PATH=/home/ariebs/mic/mpi-nightly/bin:$PATH ; export
> PATH ; LD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$LD_LIBRARY_PATH ;
> export LD_LIBRARY_PATH ;
> DYLD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$DYLD_LIBRARY_PATH ;
> export DYLD_LIBRARY_PATH ; /home/ariebs/mic/mpi-nightly/bin/orted
> --hnp-topo-sig 0N:1S:0L3:61L2:61L1:61C:244H:k1om -mca ess "env" -mca
> orte_ess_jobid "1901330432" -mca orte_ess_vpid 1 -mca orte_ess_num_procs "2"
> -mca orte_hnp_uri
> "1901330432.0;usock;tcp://16.113.180.125,192.0.0.121:34249;ud://2359370.86.1"
> --tree-spawn --mca spml "yoda" --mca btl "sm,self,tcp" --mca plm_base_verbose
> "5" --mca memheap_base_verbose "100" -mca plm "rsh" -mca rmaps_ppr_n_pernode
> "2"
The -x option doesn’t impact the ssh line - it only forwards the value to the
application’s environment. You’ll need to include the path in your
LD_LIBRARY_PATH.

> On Apr 13, 2015, at 1:06 PM, Andy Riebs <andy.ri...@hp.com> wrote:
>
> Progress! I can run my trivial program on the local PHI, but not the other
> PHI, on the system. Here are the interesting parts:
>
> A pretty good recipe with last night's nightly master:
>
> $ ./configure --prefix=/home/ariebs/mic/mpi-nightly CC="icc -mmic" CXX="icpc -mmic" \
>     --build=x86_64-unknown-linux-gnu --host=x86_64-k1om-linux \
>     AR=x86_64-k1om-linux-ar RANLIB=x86_64-k1om-linux-ranlib LD=x86_64-k1om-linux-ld \
>     --enable-mpirun-prefix-by-default --disable-io-romio --disable-mpi-fortran \
>     --enable-orterun-prefix-by-default \
>     --enable-debug
> $ make && make install
> $ shmemrun -x SHMEM_SYMMETRIC_HEAP_SIZE=1M -H localhost -N 2 --mca spml yoda --mca btl sm,self,tcp $PWD/mic.out
> Hello World from process 0 of 2
> Hello World from process 1 of 2
> $ shmemrun -x SHMEM_SYMMETRIC_HEAP_SIZE=1M -H localhost -N 2 --mca spml yoda --mca btl openib,sm,self $PWD/mic.out
> Hello World from process 0 of 2
> Hello World from process 1 of 2
> $
>
> However, I can't seem to cross the fabric. I can ssh freely back and forth
> between mic0 and mic1. However, running the next 2 tests from mic0, it
> certainly seems like the second one should work, too:
>
> $ shmemrun -x SHMEM_SYMMETRIC_HEAP_SIZE=1M -H mic0 -N 2 --mca spml yoda --mca btl sm,self,tcp $PWD/mic.out
> Hello World from process 0 of 2
> Hello World from process 1 of 2
> $ shmemrun -x SHMEM_SYMMETRIC_HEAP_SIZE=1M -H mic1 -N 2 --mca spml yoda --mca btl sm,self,tcp $PWD/mic.out
> /home/ariebs/mic/mpi-nightly/bin/orted: error while loading shared libraries:
> libimf.so: cannot open shared object file: No such file or directory
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
>
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
>
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to use.
>
> * compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
> ...
> $
>
> (Note that I get the same results with "--mca btl openib,sm,self"....)
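A concrete way to follow that advice, sketched here with the libimf.so
location and orted path that appear elsewhere in this message (it assumes the
MIC cards run bash, which sources ~/.bashrc for non-interactive ssh commands):

# On mic1 (and mic0), append this to ~/.bashrc so the shell that ssh spawns
# for orted can resolve the Intel runtime; the -x values only reach the
# application processes, never the daemon.
export LD_LIBRARY_PATH=/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic:$LD_LIBRARY_PATH

# Then, from mic0, check whether the remotely launched daemon would now link:
ssh mic1 'ldd /home/ariebs/mic/mpi-nightly/bin/orted | grep libimf'

If the ldd check still reports "not found", the non-interactive shell on the
card is not reading the file that was edited, and that shell's environment is
what the rsh/ssh launcher actually gets.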
>
> $ ssh mic1 file /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libimf.so
> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libimf.so: ELF 64-bit
> LSB shared object, Intel Xeon Phi coprocessor (k1om), version 1 (SYSV),
> dynamically linked, not stripped
> $ shmemrun -x LD_PRELOAD=/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libimf.so -H mic1 -N 2 --mca spml yoda --mca btl sm,self,tcp $PWD/mic.out
> /home/ariebs/mic/mpi-nightly/bin/orted: error while loading shared libraries:
> libimf.so: cannot open shared object file: No such file or directory
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
>
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
>
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to use.
>
> * compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
>
> Following here is
> - IB information
> - Running the failing case with lots of debugging information. (As you might
>   imagine, I've tried 17 ways from Sunday to try to ensure that libimf.so is
>   found.)
> > $ ibv_devices > device node GUID > ------ ---------------- > mlx4_0 24be05ffffa57160 > scif0 4c79bafffe4402b6 > $ ibv_devinfo > hca_id: mlx4_0 > transport: InfiniBand (0) > fw_ver: 2.11.1250 > node_guid: 24be:05ff:ffa5:7160 > sys_image_guid: 24be:05ff:ffa5:7163 > vendor_id: 0x02c9 > vendor_part_id: 4099 > hw_ver: 0x0 > phys_port_cnt: 2 > port: 1 > state: PORT_ACTIVE (4) > max_mtu: 2048 (4) > active_mtu: 2048 (4) > sm_lid: 8 > port_lid: 86 > port_lmc: 0x00 > link_layer: InfiniBand > > port: 2 > state: PORT_DOWN (1) > max_mtu: 2048 (4) > active_mtu: 2048 (4) > sm_lid: 0 > port_lid: 0 > port_lmc: 0x00 > link_layer: InfiniBand > > hca_id: scif0 > transport: SCIF (2) > fw_ver: 0.0.1 > node_guid: 4c79:baff:fe44:02b6 > sys_image_guid: 4c79:baff:fe44:02b6 > vendor_id: 0x8086 > vendor_part_id: 0 > hw_ver: 0x1 > phys_port_cnt: 1 > port: 1 > state: PORT_ACTIVE (4) > max_mtu: 4096 (5) > active_mtu: 4096 (5) > sm_lid: 1 > port_lid: 1001 > port_lmc: 0x00 > link_layer: SCIF > > $ shmemrun -x > LD_PRELOAD=/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libimf.so > -H mic1 -N 2 --mca spml yoda --mca btl sm,self,tcp --mca plm_base_verbose 5 > --mca memheap_base_verbose 100 $PWD/mic.out > [atl1-01-mic0:191024] mca:base:select:( plm) Querying component [rsh] > [atl1-01-mic0:191024] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh > path NULL > [atl1-01-mic0:191024] mca:base:select:( plm) Query of component [rsh] set > priority to 10 > [atl1-01-mic0:191024] mca:base:select:( plm) Querying component [isolated] > [atl1-01-mic0:191024] mca:base:select:( plm) Query of component [isolated] > set priority to 0 > [atl1-01-mic0:191024] mca:base:select:( plm) Querying component [slurm] > [atl1-01-mic0:191024] mca:base:select:( plm) Skipping component [slurm]. > Query failed to return a module > [atl1-01-mic0:191024] mca:base:select:( plm) Selected component [rsh] > [atl1-01-mic0:191024] plm:base:set_hnp_name: initial bias 191024 nodename > hash 4121194178 > [atl1-01-mic0:191024] plm:base:set_hnp_name: final jobfam 29012 > [atl1-01-mic0:191024] [[29012,0],0] plm:rsh_setup on agent ssh : rsh path NULL > [atl1-01-mic0:191024] [[29012,0],0] plm:base:receive start comm > [atl1-01-mic0:191024] [[29012,0],0] plm:base:setup_job > [atl1-01-mic0:191024] [[29012,0],0] plm:base:setup_vm > [atl1-01-mic0:191024] [[29012,0],0] plm:base:setup_vm creating map > [atl1-01-mic0:191024] [[29012,0],0] setup:vm: working unmanaged allocation > [atl1-01-mic0:191024] [[29012,0],0] using dash_host > [atl1-01-mic0:191024] [[29012,0],0] checking node mic1 > [atl1-01-mic0:191024] [[29012,0],0] plm:base:setup_vm add new daemon > [[29012,0],1] > [atl1-01-mic0:191024] [[29012,0],0] plm:base:setup_vm assigning new daemon > [[29012,0],1] to node mic1 > [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: launching vm > [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: local shell: 0 (bash) > [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: assuming same remote shell as > local shell > [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: remote shell: 0 (bash) > [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: final template argv: > /usr/bin/ssh <template> > PATH=/home/ariebs/mic/mpi-nightly/bin:$PATH ; export PATH ; > LD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$LD_LIBRARY_PATH ; export > LD_LIBRARY_PATH ; > DYLD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$DYLD_LIBRARY_PATH ; > export DYLD_LIBRARY_PATH ; /home/ariebs/mic/mpi-nightly/bin/orted > --hnp-topo-sig 0N:1S:0L3:61L2:61L1:61C:244H:k1om -mca ess "env" -mca > orte_ess_jobid "1901330432" -mca 
orte_ess_vpid "<template>" -mca > orte_ess_num_procs "2" -mca orte_hnp_uri > "1901330432.0;usock;tcp://16.113.180.125,192.0.0.121:34249;ud://2359370.86.1" > --tree-spawn --mca spml "yoda" --mca btl "sm,self,tcp" --mca plm_base_verbose > "5" --mca memheap_base_verbose "100" -mca plm "rsh" -mca rmaps_ppr_n_pernode > "2" > [atl1-01-mic0:191024] [[29012,0],0] plm:rsh:launch daemon 0 not a child of > mine > [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: adding node mic1 to launch list > [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: activating launch event > [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: recording launch of daemon > [[29012,0],1] > [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: executing: (/usr/bin/ssh) > [/usr/bin/ssh mic1 PATH=/home/ariebs/mic/mpi-nightly/bin:$PATH ; export > PATH ; LD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$LD_LIBRARY_PATH ; > export LD_LIBRARY_PATH ; > DYLD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$DYLD_LIBRARY_PATH ; > export DYLD_LIBRARY_PATH ; /home/ariebs/mic/mpi-nightly/bin/orted > --hnp-topo-sig 0N:1S:0L3:61L2:61L1:61C:244H:k1om -mca ess "env" -mca > orte_ess_jobid "1901330432" -mca orte_ess_vpid 1 -mca orte_ess_num_procs "2" > -mca orte_hnp_uri > "1901330432.0;usock;tcp://16.113.180.125,192.0.0.121:34249;ud://2359370.86.1" > --tree-spawn --mca spml "yoda" --mca btl "sm,self,tcp" --mca plm_base_verbose > "5" --mca memheap_base_verbose "100" -mca plm "rsh" -mca rmaps_ppr_n_pernode > "2"] > /home/ariebs/mic/mpi-nightly/bin/orted: error while loading shared libraries: > libimf.so: cannot open shared object file: No such file or directory > [atl1-01-mic0:191024] [[29012,0],0] daemon 1 failed with status 127 > [atl1-01-mic0:191024] [[29012,0],0] plm:base:orted_cmd sending orted_exit > commands > -------------------------------------------------------------------------- > ORTE was unable to reliably start one or more daemons. > This usually is caused by: > > * not finding the required libraries and/or binaries on > one or more nodes. Please check your PATH and LD_LIBRARY_PATH > settings, or configure OMPI with --enable-orterun-prefix-by-default > > * lack of authority to execute on one or more specified nodes. > Please verify your allocation and authorities. > > * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base). > Please check with your sys admin to determine the correct location to use. > > * compilation of the orted with dynamic libraries when static are required > (e.g., on Cray). Please check your configure cmd line and consider using > one of the contrib/platform definitions for your system type. > > * an inability to create a connection back to mpirun due to a > lack of common network interfaces and/or no route found between > them. Please check network connectivity (including firewalls > and network routing requirements). 
> -------------------------------------------------------------------------- > [atl1-01-mic0:191024] [[29012,0],0] plm:base:receive stop comm > > > > On 04/13/2015 08:50 AM, Andy Riebs wrote: >> Hi Ralph, >> >> Here are the results with last night's "master" nightly, >> openmpi-dev-1487-g9c6d452.tar.bz2, and adding the memheap_base_verbose >> option (yes, it looks like the "ERROR_LOG" problem has gone away): >> >> $ cat /proc/sys/kernel/shmmax >> 33554432 >> $ cat /proc/sys/kernel/shmall >> 2097152 >> $ cat /proc/sys/kernel/shmmni >> 4096 >> $ export SHMEM_SYMMETRIC_HEAP=1M >> $ shmemrun -H localhost -N 2 --mca sshmem mmap --mca plm_base_verbose 5 >> --mca memheap_base_verbose 100 $PWD/mic.out >> [atl1-01-mic0:190439] mca:base:select:( plm) Querying component [rsh] >> [atl1-01-mic0:190439] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh >> path NULL >> [atl1-01-mic0:190439] mca:base:select:( plm) Query of component [rsh] set >> priority to 10 >> [atl1-01-mic0:190439] mca:base:select:( plm) Querying component [isolated] >> [atl1-01-mic0:190439] mca:base:select:( plm) Query of component [isolated] >> set priority to 0 >> [atl1-01-mic0:190439] mca:base:select:( plm) Querying component [slurm] >> [atl1-01-mic0:190439] mca:base:select:( plm) Skipping component [slurm]. >> Query failed to return a module >> [atl1-01-mic0:190439] mca:base:select:( plm) Selected component [rsh] >> [atl1-01-mic0:190439] plm:base:set_hnp_name: initial bias 190439 nodename >> hash 4121194178 >> [atl1-01-mic0:190439] plm:base:set_hnp_name: final jobfam 31875 >> [atl1-01-mic0:190439] [[31875,0],0] plm:rsh_setup on agent ssh : rsh path >> NULL >> [atl1-01-mic0:190439] [[31875,0],0] plm:base:receive start comm >> [atl1-01-mic0:190439] [[31875,0],0] plm:base:setup_job >> [atl1-01-mic0:190439] [[31875,0],0] plm:base:setup_vm >> [atl1-01-mic0:190439] [[31875,0],0] plm:base:setup_vm creating map >> [atl1-01-mic0:190439] [[31875,0],0] setup:vm: working unmanaged allocation >> [atl1-01-mic0:190439] [[31875,0],0] using dash_host >> [atl1-01-mic0:190439] [[31875,0],0] checking node atl1-01-mic0 >> [atl1-01-mic0:190439] [[31875,0],0] ignoring myself >> [atl1-01-mic0:190439] [[31875,0],0] plm:base:setup_vm only HNP in allocation >> [atl1-01-mic0:190439] [[31875,0],0] complete_setup on job [31875,1] >> [atl1-01-mic0:190439] [[31875,0],0] plm:base:launch_apps for job [31875,1] >> [atl1-01-mic0:190439] [[31875,0],0] plm:base:launch wiring up iof for job >> [31875,1] >> [atl1-01-mic0:190439] [[31875,0],0] plm:base:launch [31875,1] registered >> [atl1-01-mic0:190439] [[31875,0],0] plm:base:launch job [31875,1] is not a >> dynamic spawn >> [atl1-01-mic0:190441] mca: base: components_register: registering memheap >> components >> [atl1-01-mic0:190441] mca: base: components_register: found loaded component >> buddy >> [atl1-01-mic0:190441] mca: base: components_register: component buddy has no >> register or open function >> [atl1-01-mic0:190442] mca: base: components_register: registering memheap >> components >> [atl1-01-mic0:190442] mca: base: components_register: found loaded component >> buddy >> [atl1-01-mic0:190442] mca: base: components_register: component buddy has no >> register or open function >> [atl1-01-mic0:190442] mca: base: components_register: found loaded component >> ptmalloc >> [atl1-01-mic0:190442] mca: base: components_register: component ptmalloc has >> no register or open function >> [atl1-01-mic0:190441] mca: base: components_register: found loaded component >> ptmalloc >> [atl1-01-mic0:190441] mca: base: 
components_register: component ptmalloc has >> no register or open function >> [atl1-01-mic0:190441] mca: base: components_open: opening memheap components >> [atl1-01-mic0:190441] mca: base: components_open: found loaded component >> buddy >> [atl1-01-mic0:190441] mca: base: components_open: component buddy open >> function successful >> [atl1-01-mic0:190441] mca: base: components_open: found loaded component >> ptmalloc >> [atl1-01-mic0:190441] mca: base: components_open: component ptmalloc open >> function successful >> [atl1-01-mic0:190442] mca: base: components_open: opening memheap components >> [atl1-01-mic0:190442] mca: base: components_open: found loaded component >> buddy >> [atl1-01-mic0:190442] mca: base: components_open: component buddy open >> function successful >> [atl1-01-mic0:190442] mca: base: components_open: found loaded component >> ptmalloc >> [atl1-01-mic0:190442] mca: base: components_open: component ptmalloc open >> function successful >> [atl1-01-mic0:190442] base/memheap_base_alloc.c:38 - >> mca_memheap_base_alloc_init() Memheap alloc memory: 270532608 byte(s), 1 >> segments by method: 1 >> [atl1-01-mic0:190441] base/memheap_base_alloc.c:38 - >> mca_memheap_base_alloc_init() Memheap alloc memory: 270532608 byte(s), 1 >> segments by method: 1 >> [atl1-01-mic0:190442] base/memheap_base_static.c:205 - _load_segments() add: >> 00600000-00601000 rw-p 00000000 00:11 6029314 >> /home/ariebs/bench/hello/mic.out >> [atl1-01-mic0:190441] base/memheap_base_static.c:205 - _load_segments() add: >> 00600000-00601000 rw-p 00000000 00:11 6029314 >> /home/ariebs/bench/hello/mic.out >> [atl1-01-mic0:190442] base/memheap_base_static.c:75 - >> mca_memheap_base_static_init() Memheap static memory: 3824 byte(s), 2 >> segments >> [atl1-01-mic0:190442] base/memheap_base_register.c:39 - >> mca_memheap_base_reg() register seg#00: 0x0xff000000 - 0x0x10f200000 >> 270532608 bytes type=0x1 id=0xFFFFFFFF >> [atl1-01-mic0:190441] base/memheap_base_static.c:75 - >> mca_memheap_base_static_init() Memheap static memory: 3824 byte(s), 2 >> segments >> [atl1-01-mic0:190441] base/memheap_base_register.c:39 - >> mca_memheap_base_reg() register seg#00: 0x0xff000000 - 0x0x10f200000 >> 270532608 bytes type=0x1 id=0xFFFFFFFF >> [atl1-01-mic0:190442] Error base/memheap_base_register.c:130 - >> _reg_segment() Failed to register segment >> [atl1-01-mic0:190441] Error base/memheap_base_register.c:130 - >> _reg_segment() Failed to register segment >> [atl1-01-mic0:190442] Error: pshmem_init.c:61 - shmem_init() SHMEM failed to >> initialize - aborting >> [atl1-01-mic0:190441] Error: pshmem_init.c:61 - shmem_init() SHMEM failed to >> initialize - aborting >> -------------------------------------------------------------------------- >> It looks like SHMEM_INIT failed for some reason; your parallel process is >> likely to abort. There are many reasons that a parallel process can >> fail during SHMEM_INIT; some of which are due to configuration or environment >> problems. This failure appears to be an internal failure; here's some >> additional information (which may only be relevant to an Open SHMEM >> developer): >> >> mca_memheap_base_select() failed >> --> Returned "Error" (-1) instead of "Success" (0) >> -------------------------------------------------------------------------- >> -------------------------------------------------------------------------- >> SHMEM_ABORT was invoked on rank 0 (pid 190441, host=atl1-01-mic0) with >> errorcode -1. 
>> -------------------------------------------------------------------------- >> -------------------------------------------------------------------------- >> A SHMEM process is aborting at a time when it cannot guarantee that all >> of its peer processes in the job will be killed properly. You should >> double check that everything has shut down cleanly. >> >> Local host: atl1-01-mic0 >> PID: 190441 >> -------------------------------------------------------------------------- >> ------------------------------------------------------- >> Primary job terminated normally, but 1 process returned >> a non-zero exit code.. Per user-direction, the job has been aborted. >> ------------------------------------------------------- >> [atl1-01-mic0:190439] [[31875,0],0] plm:base:orted_cmd sending orted_exit >> commands >> -------------------------------------------------------------------------- >> shmemrun detected that one or more processes exited with non-zero status, >> thus causing >> the job to be terminated. The first process to do so was: >> >> Process name: [[31875,1],0] >> Exit code: 255 >> -------------------------------------------------------------------------- >> [atl1-01-mic0:190439] 1 more process has sent help message >> help-shmem-runtime.txt / shmem_init:startup:internal-failure >> [atl1-01-mic0:190439] Set MCA parameter "orte_base_help_aggregate" to 0 to >> see all help / error messages >> [atl1-01-mic0:190439] 1 more process has sent help message >> help-shmem-api.txt / shmem-abort >> [atl1-01-mic0:190439] 1 more process has sent help message >> help-shmem-runtime.txt / oshmem shmem abort:cannot guarantee all killed >> [atl1-01-mic0:190439] [[31875,0],0] plm:base:receive stop comm >> >> >> >> On 04/12/2015 03:09 PM, Ralph Castain wrote: >>> Sorry about that - I hadn’t brought it over to the 1.8 branch yet. I’ve >>> done so now, which means the ERROR_LOG shouldn’t show up any more. It won’t >>> fix the memheap problem, though. >>> >>> You might try adding “--mca memheap_base_verbose 100” to your cmd line so >>> we can see why none of the memheap components are being selected. >>> >>> >>>> On Apr 12, 2015, at 11:30 AM, Andy Riebs <andy.ri...@hp.com >>>> <mailto:andy.ri...@hp.com>> wrote: >>>> >>>> Hi Ralph, >>>> >>>> Here's the output with openmpi-v1.8.4-202-gc2da6a5.tar.bz2 >>>> <https://www.open-mpi.org/nightly/v1.8/openmpi-v1.8.4-202-gc2da6a5.tar.bz2>: >>>> >>>> $ shmemrun -H localhost -N 2 --mca sshmem mmap --mca plm_base_verbose 5 >>>> $PWD/mic.out >>>> [atl1-01-mic0:190189] mca:base:select:( plm) Querying component [rsh] >>>> [atl1-01-mic0:190189] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : >>>> rsh path NULL >>>> [atl1-01-mic0:190189] mca:base:select:( plm) Query of component [rsh] set >>>> priority to 10 >>>> [atl1-01-mic0:190189] mca:base:select:( plm) Querying component [isolated] >>>> [atl1-01-mic0:190189] mca:base:select:( plm) Query of component >>>> [isolated] set priority to 0 >>>> [atl1-01-mic0:190189] mca:base:select:( plm) Querying component [slurm] >>>> [atl1-01-mic0:190189] mca:base:select:( plm) Skipping component [slurm]. 
>>>> Query failed to return a module >>>> [atl1-01-mic0:190189] mca:base:select:( plm) Selected component [rsh] >>>> [atl1-01-mic0:190189] plm:base:set_hnp_name: initial bias 190189 nodename >>>> hash 4121194178 >>>> [atl1-01-mic0:190189] plm:base:set_hnp_name: final jobfam 32137 >>>> [atl1-01-mic0:190189] [[32137,0],0] plm:rsh_setup on agent ssh : rsh path >>>> NULL >>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:receive start comm >>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:setup_job >>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:setup_vm >>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:setup_vm creating map >>>> [atl1-01-mic0:190189] [[32137,0],0] setup:vm: working unmanaged allocation >>>> [atl1-01-mic0:190189] [[32137,0],0] using dash_host >>>> [atl1-01-mic0:190189] [[32137,0],0] checking node atl1-01-mic0 >>>> [atl1-01-mic0:190189] [[32137,0],0] ignoring myself >>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:setup_vm only HNP in >>>> allocation >>>> [atl1-01-mic0:190189] [[32137,0],0] complete_setup on job [32137,1] >>>> [atl1-01-mic0:190189] [[32137,0],0] ORTE_ERROR_LOG: Not found in file >>>> base/plm_base_launch_support.c at line 440 >>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:launch_apps for job [32137,1] >>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:launch wiring up iof for job >>>> [32137,1] >>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:launch [32137,1] registered >>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:launch job [32137,1] is not a >>>> dynamic spawn >>>> -------------------------------------------------------------------------- >>>> It looks like SHMEM_INIT failed for some reason; your parallel process is >>>> likely to abort. There are many reasons that a parallel process can >>>> fail during SHMEM_INIT; some of which are due to configuration or >>>> environment >>>> problems. This failure appears to be an internal failure; here's some >>>> additional information (which may only be relevant to an Open SHMEM >>>> developer): >>>> >>>> mca_memheap_base_select() failed >>>> --> Returned "Error" (-1) instead of "Success" (0) >>>> -------------------------------------------------------------------------- >>>> [atl1-01-mic0:190191] Error: pshmem_init.c:61 - shmem_init() SHMEM failed >>>> to initialize - aborting >>>> [atl1-01-mic0:190192] Error: pshmem_init.c:61 - shmem_init() SHMEM failed >>>> to initialize - aborting >>>> -------------------------------------------------------------------------- >>>> SHMEM_ABORT was invoked on rank 1 (pid 190192, host=atl1-01-mic0) with >>>> errorcode -1. >>>> -------------------------------------------------------------------------- >>>> -------------------------------------------------------------------------- >>>> A SHMEM process is aborting at a time when it cannot guarantee that all >>>> of its peer processes in the job will be killed properly. You should >>>> double check that everything has shut down cleanly. >>>> >>>> Local host: atl1-01-mic0 >>>> PID: 190192 >>>> -------------------------------------------------------------------------- >>>> ------------------------------------------------------- >>>> Primary job terminated normally, but 1 process returned >>>> a non-zero exit code.. Per user-direction, the job has been aborted. 
>>>> ------------------------------------------------------- >>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:orted_cmd sending orted_exit >>>> commands >>>> -------------------------------------------------------------------------- >>>> shmemrun detected that one or more processes exited with non-zero status, >>>> thus causing >>>> the job to be terminated. The first process to do so was: >>>> >>>> Process name: [[32137,1],0] >>>> Exit code: 255 >>>> -------------------------------------------------------------------------- >>>> [atl1-01-mic0:190189] 1 more process has sent help message >>>> help-shmem-runtime.txt / shmem_init:startup:internal-failure >>>> [atl1-01-mic0:190189] Set MCA parameter "orte_base_help_aggregate" to 0 to >>>> see all help / error messages >>>> [atl1-01-mic0:190189] 1 more process has sent help message >>>> help-shmem-api.txt / shmem-abort >>>> [atl1-01-mic0:190189] 1 more process has sent help message >>>> help-shmem-runtime.txt / oshmem shmem abort:cannot guarantee all killed >>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:receive stop comm >>>> >>>> >>>> On 04/11/2015 07:41 PM, Ralph Castain wrote: >>>>> Got it - thanks. I fixed that ERROR_LOG issue (I think- please verify). I >>>>> suspect the memheap issue relates to something else, but I probably need >>>>> to let the OSHMEM folks comment on it >>>>> >>>>> >>>>>> On Apr 11, 2015, at 9:52 AM, Andy Riebs <andy.ri...@hp.com >>>>>> <mailto:andy.ri...@hp.com>> wrote: >>>>>> >>>>>> Everything is built on the Xeon side, with the icc "-mmic" switch. I >>>>>> then ssh into one of the PHIs, and run shmemrun from there. >>>>>> >>>>>> >>>>>> On 04/11/2015 12:00 PM, Ralph Castain wrote: >>>>>>> Let me try to understand the setup a little better. Are you running >>>>>>> shmemrun on the PHI itself? Or is it running on the host processor, and >>>>>>> you are trying to spawn a process onto the Phi? >>>>>>> >>>>>>> >>>>>>>> On Apr 11, 2015, at 7:55 AM, Andy Riebs <andy.ri...@hp.com >>>>>>>> <mailto:andy.ri...@hp.com>> wrote: >>>>>>>> >>>>>>>> Hi Ralph, >>>>>>>> >>>>>>>> Yes, this is attempting to get OSHMEM to run on the Phi. >>>>>>>> >>>>>>>> I grabbed openmpi-dev-1484-g033418f.tar.bz2 and configured it with >>>>>>>> >>>>>>>> $ ./configure --prefix=/home/ariebs/mic/mpi-nightly CC=icc -mmic >>>>>>>> CXX=icpc -mmic \ >>>>>>>> --build=x86_64-unknown-linux-gnu --host=x86_64-k1om-linux \ >>>>>>>> AR=x86_64-k1om-linux-ar RANLIB=x86_64-k1om-linux-ranlib >>>>>>>> LD=x86_64-k1om-linux-ld \ >>>>>>>> --enable-mpirun-prefix-by-default --disable-io-romio >>>>>>>> --disable-mpi-fortran \ >>>>>>>> --enable-debug >>>>>>>> --enable-mca-no-build=btl-usnic,btl-openib,common-verbs,oob-ud >>>>>>>> >>>>>>>> (Note that I had to add "oob-ud" to the "--enable-mca-no-build" >>>>>>>> option, as the build complained that mca oob/ud needed mca >>>>>>>> common-verbs.) >>>>>>>> >>>>>>>> With that configuration, here is what I am seeing now... 
>>>>>>>> >>>>>>>> $ export SHMEM_SYMMETRIC_HEAP_SIZE=1G >>>>>>>> $ shmemrun -H localhost -N 2 --mca sshmem mmap --mca plm_base_verbose >>>>>>>> 5 $PWD/mic.out >>>>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Querying component [rsh] >>>>>>>> [atl1-01-mic0:189895] [[INVALID],INVALID] plm:rsh_lookup on agent ssh >>>>>>>> : rsh path NULL >>>>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Query of component [rsh] >>>>>>>> set priority to 10 >>>>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Querying component >>>>>>>> [isolated] >>>>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Query of component >>>>>>>> [isolated] set priority to 0 >>>>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Querying component >>>>>>>> [slurm] >>>>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Skipping component >>>>>>>> [slurm]. Query failed to return a module >>>>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Selected component [rsh] >>>>>>>> [atl1-01-mic0:189895] plm:base:set_hnp_name: initial bias 189895 >>>>>>>> nodename hash 4121194178 >>>>>>>> [atl1-01-mic0:189895] plm:base:set_hnp_name: final jobfam 32419 >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:rsh_setup on agent ssh : rsh >>>>>>>> path NULL >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:receive start comm >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:setup_job >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:setup_vm >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:setup_vm creating map >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] setup:vm: working unmanaged >>>>>>>> allocation >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] using dash_host >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] checking node atl1-01-mic0 >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] ignoring myself >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:setup_vm only HNP in >>>>>>>> allocation >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] complete_setup on job [32419,1] >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] ORTE_ERROR_LOG: Not found in file >>>>>>>> base/plm_base_launch_support.c at line 440 >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:launch_apps for job >>>>>>>> [32419,1] >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:launch wiring up iof for >>>>>>>> job [32419,1] >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:launch [32419,1] >>>>>>>> registered >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:launch job [32419,1] is >>>>>>>> not a dynamic spawn >>>>>>>> [atl1-01-mic0:189899] Error: pshmem_init.c:61 - shmem_init() SHMEM >>>>>>>> failed to initialize - aborting >>>>>>>> [atl1-01-mic0:189898] Error: pshmem_init.c:61 - shmem_init() SHMEM >>>>>>>> failed to initialize - aborting >>>>>>>> -------------------------------------------------------------------------- >>>>>>>> It looks like SHMEM_INIT failed for some reason; your parallel process >>>>>>>> is >>>>>>>> likely to abort. There are many reasons that a parallel process can >>>>>>>> fail during SHMEM_INIT; some of which are due to configuration or >>>>>>>> environment >>>>>>>> problems. 
This failure appears to be an internal failure; here's some >>>>>>>> additional information (which may only be relevant to an Open SHMEM >>>>>>>> developer): >>>>>>>> >>>>>>>> mca_memheap_base_select() failed >>>>>>>> --> Returned "Error" (-1) instead of "Success" (0) >>>>>>>> -------------------------------------------------------------------------- >>>>>>>> -------------------------------------------------------------------------- >>>>>>>> SHMEM_ABORT was invoked on rank 1 (pid 189899, host=atl1-01-mic0) with >>>>>>>> errorcode -1. >>>>>>>> -------------------------------------------------------------------------- >>>>>>>> -------------------------------------------------------------------------- >>>>>>>> A SHMEM process is aborting at a time when it cannot guarantee that all >>>>>>>> of its peer processes in the job will be killed properly. You should >>>>>>>> double check that everything has shut down cleanly. >>>>>>>> >>>>>>>> Local host: atl1-01-mic0 >>>>>>>> PID: 189899 >>>>>>>> -------------------------------------------------------------------------- >>>>>>>> ------------------------------------------------------- >>>>>>>> Primary job terminated normally, but 1 process returned >>>>>>>> a non-zero exit code.. Per user-direction, the job has been aborted. >>>>>>>> ------------------------------------------------------- >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:orted_cmd sending >>>>>>>> orted_exit commands >>>>>>>> -------------------------------------------------------------------------- >>>>>>>> shmemrun detected that one or more processes exited with non-zero >>>>>>>> status, thus causing >>>>>>>> the job to be terminated. The first process to do so was: >>>>>>>> >>>>>>>> Process name: [[32419,1],1] >>>>>>>> Exit code: 255 >>>>>>>> -------------------------------------------------------------------------- >>>>>>>> [atl1-01-mic0:189895] 1 more process has sent help message >>>>>>>> help-shmem-runtime.txt / shmem_init:startup:internal-failure >>>>>>>> [atl1-01-mic0:189895] Set MCA parameter "orte_base_help_aggregate" to >>>>>>>> 0 to see all help / error messages >>>>>>>> [atl1-01-mic0:189895] 1 more process has sent help message >>>>>>>> help-shmem-api.txt / shmem-abort >>>>>>>> [atl1-01-mic0:189895] 1 more process has sent help message >>>>>>>> help-shmem-runtime.txt / oshmem shmem abort:cannot guarantee all killed >>>>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:receive stop comm >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 04/10/2015 06:37 PM, Ralph Castain wrote: >>>>>>>>> Andy - could you please try the current 1.8.5 nightly tarball and see >>>>>>>>> if it helps? The error log indicates that it is failing to get the >>>>>>>>> topology from some daemon, I’m assuming the one on the Phi? >>>>>>>>> >>>>>>>>> You might also add --enable-debug to that configure line and then put >>>>>>>>> -mca plm_base_verbose on the shmemrun cmd to get more help >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Apr 10, 2015, at 11:55 AM, Andy Riebs <andy.ri...@hp.com >>>>>>>>>> <mailto:andy.ri...@hp.com>> wrote: >>>>>>>>>> >>>>>>>>>> Summary: MPI jobs work fine, SHMEM jobs work just often enough to be >>>>>>>>>> tantalizing, on an Intel Xeon Phi/MIC system. 
>>>>>>>>>> >>>>>>>>>> Longer version >>>>>>>>>> >>>>>>>>>> Thanks to the excellent write-up last June >>>>>>>>>> (<https://www.open-mpi.org/community/lists/users/2014/06/24711.php>), >>>>>>>>>> I have been able to build a version of Open MPI for the Xeon Phi >>>>>>>>>> coprocessor that runs MPI jobs on the Phi coprocessor with no >>>>>>>>>> problem, but not SHMEM jobs. Just at the point where I was about to >>>>>>>>>> document the problems I was having with SHMEM, my trivial SHMEM job >>>>>>>>>> worked. And then failed when I tried to run it again, immediately >>>>>>>>>> afterwards. I have a feeling I may be in uncharted territory here. >>>>>>>>>> >>>>>>>>>> Environment >>>>>>>>>> RHEL 6.5 >>>>>>>>>> Intel Composer XE 2015 >>>>>>>>>> Xeon Phi/MIC >>>>>>>>>> ---------------- >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Configuration >>>>>>>>>> >>>>>>>>>> $ export PATH=/usr/linux-k1om-4.7/bin/:$PATH >>>>>>>>>> $ source /opt/intel/15.0/composer_xe_2015/bin/compilervars.sh intel64 >>>>>>>>>> $ ./configure --prefix=/home/ariebs/mic/mpi \ >>>>>>>>>> CC="icc -mmic" CXX="icpc -mmic" \ >>>>>>>>>> --build=x86_64-unknown-linux-gnu --host=x86_64-k1om-linux \ >>>>>>>>>> AR=x86_64-k1om-linux-ar RANLIB=x86_64-k1om-linux-ranlib \ >>>>>>>>>> LD=x86_64-k1om-linux-ld \ >>>>>>>>>> --enable-mpirun-prefix-by-default --disable-io-romio \ >>>>>>>>>> --disable-vt --disable-mpi-fortran \ >>>>>>>>>> --enable-mca-no-build=btl-usnic,btl-openib,common-verbs >>>>>>>>>> $ make >>>>>>>>>> $ make install >>>>>>>>>> >>>>>>>>>> ---------------- >>>>>>>>>> >>>>>>>>>> Test program >>>>>>>>>> >>>>>>>>>> #include <stdio.h> >>>>>>>>>> #include <stdlib.h> >>>>>>>>>> #include <shmem.h> >>>>>>>>>> int main(int argc, char **argv) >>>>>>>>>> { >>>>>>>>>> int me, num_pe; >>>>>>>>>> shmem_init(); >>>>>>>>>> num_pe = num_pes(); >>>>>>>>>> me = my_pe(); >>>>>>>>>> printf("Hello World from process %d of %d\n", me, num_pe); >>>>>>>>>> exit(0); >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> ---------------- >>>>>>>>>> >>>>>>>>>> Building the program >>>>>>>>>> >>>>>>>>>> export PATH=/home/ariebs/mic/mpi/bin:$PATH >>>>>>>>>> export PATH=/usr/linux-k1om-4.7/bin/:$PATH >>>>>>>>>> source /opt/intel/15.0/composer_xe_2015/bin/compilervars.sh intel64 >>>>>>>>>> export >>>>>>>>>> LD_LIBRARY_PATH=/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic:$LD_LIBRARY_PATH >>>>>>>>>> >>>>>>>>>> icc -mmic -std=gnu99 -I/home/ariebs/mic/mpi/include -pthread \ >>>>>>>>>> -Wl,-rpath -Wl,/home/ariebs/mic/mpi/lib >>>>>>>>>> -Wl,--enable-new-dtags \ >>>>>>>>>> -L/home/ariebs/mic/mpi/lib -loshmem -lmpi -lopen-rte >>>>>>>>>> -lopen-pal \ >>>>>>>>>> -lm -ldl -lutil \ >>>>>>>>>> -Wl,-rpath >>>>>>>>>> -Wl,/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic \ >>>>>>>>>> -L/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic \ >>>>>>>>>> -o mic.out shmem_hello.c >>>>>>>>>> >>>>>>>>>> ---------------- >>>>>>>>>> >>>>>>>>>> Running the program >>>>>>>>>> >>>>>>>>>> (Note that the program had been consistently failing. Then, when I >>>>>>>>>> logged back into the system to capture the results, it worked once, >>>>>>>>>> and then immediately failed when I tried again, as shown below. >>>>>>>>>> Logging in and out isn't sufficient to correct the problem. Overall, >>>>>>>>>> I think I had 3 successful runs in 30-40 attempts.) 
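An aside on the build step quoted above: the OSHMEM layer of Open MPI also
installs a wrapper compiler, oshcc, which supplies the include and link flags
that the long icc command lists by hand. A minimal sketch, assuming the
wrappers under /home/ariebs/mic/mpi/bin are used and that -mmic still has to
be passed explicitly (whether the wrapper already recorded it from
CC="icc -mmic" can be checked first):

export PATH=/home/ariebs/mic/mpi/bin:/usr/linux-k1om-4.7/bin:$PATH
source /opt/intel/15.0/composer_xe_2015/bin/compilervars.sh intel64

# Show the underlying command line the wrapper would run, then build with it.
oshcc --showme
oshcc -mmic -std=gnu99 -o mic.out shmem_hello.c

Either build should behave the same at run time; the wrapper mainly removes
the risk of a hand-written link line drifting out of sync with the install.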
>>>>>>>>>> >>>>>>>>>> $ shmemrun -H localhost -N 2 --mca sshmem mmap ./mic.out >>>>>>>>>> [atl1-01-mic0:189372] [[30936,0],0] ORTE_ERROR_LOG: Not found in >>>>>>>>>> file base/plm_base_launch_support.c at line 426 >>>>>>>>>> Hello World from process 0 of 2 >>>>>>>>>> Hello World from process 1 of 2 >>>>>>>>>> $ shmemrun -H localhost -N 2 --mca sshmem mmap ./mic.out >>>>>>>>>> [atl1-01-mic0:189381] [[30881,0],0] ORTE_ERROR_LOG: Not found in >>>>>>>>>> file base/plm_base_launch_support.c at line 426 >>>>>>>>>> [atl1-01-mic0:189383] Error: pshmem_init.c:61 - shmem_init() SHMEM >>>>>>>>>> failed to initialize - aborting >>>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>>> It looks like SHMEM_INIT failed for some reason; your parallel >>>>>>>>>> process is >>>>>>>>>> likely to abort. There are many reasons that a parallel process can >>>>>>>>>> fail during SHMEM_INIT; some of which are due to configuration or >>>>>>>>>> environment >>>>>>>>>> problems. This failure appears to be an internal failure; here's >>>>>>>>>> some >>>>>>>>>> additional information (which may only be relevant to an Open SHMEM >>>>>>>>>> developer): >>>>>>>>>> >>>>>>>>>> mca_memheap_base_select() failed >>>>>>>>>> --> Returned "Error" (-1) instead of "Success" (0) >>>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>>> SHMEM_ABORT was invoked on rank 0 (pid 189383, host=atl1-01-mic0) >>>>>>>>>> with errorcode -1. >>>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>>> A SHMEM process is aborting at a time when it cannot guarantee that >>>>>>>>>> all >>>>>>>>>> of its peer processes in the job will be killed properly. You should >>>>>>>>>> double check that everything has shut down cleanly. >>>>>>>>>> >>>>>>>>>> Local host: atl1-01-mic0 >>>>>>>>>> PID: 189383 >>>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>>> ------------------------------------------------------- >>>>>>>>>> Primary job terminated normally, but 1 process returned >>>>>>>>>> a non-zero exit code.. Per user-direction, the job has been aborted. >>>>>>>>>> ------------------------------------------------------- >>>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>>> shmemrun detected that one or more processes exited with non-zero >>>>>>>>>> status, thus causing >>>>>>>>>> the job to be terminated. The first process to do so was: >>>>>>>>>> >>>>>>>>>> Process name: [[30881,1],0] >>>>>>>>>> Exit code: 255 >>>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>>> >>>>>>>>>> Any thoughts about where to go from here? 
>>>>>>>>>> >>>>>>>>>> Andy >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Andy Riebs >>>>>>>>>> Hewlett-Packard Company >>>>>>>>>> High Performance Computing >>>>>>>>>> +1 404 648 9024 >>>>>>>>>> My opinions are not necessarily those of HP