Re: [OMPI users] Several threads making progress - How to disable them

2016-08-04 Thread r...@open-mpi.org
Yep, there are indeed two progress threads running - and no, you cannot disable them. They are, however, “blocked” so they aren’t eating any cycles during normal operation unless an event that requires their attention wakes them up. So they shouldn’t interfere with your app. > On Aug 4, 2016,

Re: [OMPI users] OPENSHMEM ERROR with 2+ Distributed Machines

2016-08-12 Thread r...@open-mpi.org
Just as a suggestion: most of us are leery of opening Word attachments on mailing lists. I’d suggest sending this to us as plain text if you want us to read it. > On Aug 12, 2016, at 4:03 AM, Debendra Das wrote: > > I have installed OpenMPI-2.0.0 in 5 systems with IP addresses 172.16.5.29, >

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-12 Thread r...@open-mpi.org
IIRC, the rationale behind adding the check was that someone using SGE wanted to specify a custom launch agent, and we were overriding it with qrsh. However, the check is incorrect as that MCA param cannot be NULL. I have updated this on master - can you see if this fixes the problem for you? h

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-08-12 Thread r...@open-mpi.org
Sorry for the delay - I had to catchup on some other things before I could come back to checking this one. Took me awhile to track this down, but the change is in test for master: https://github.com/open-mpi/ompi/pull/1958 Once complete, I’ll set it up for inclusion in v2.0.1 Thanks for report

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-12 Thread r...@open-mpi.org
On Aug 12, 2016, at 12:15 PM, Reuti wrote: > >> >> Am 12.08.2016 um 16:52 schrieb r...@open-mpi.org <mailto:r...@open-mpi.org>: >> >> IIRC, the rationale behind adding the check was that someone using SGE >> wanted to specify a custom launch agent, and we

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-12 Thread r...@open-mpi.org
> On Aug 12, 2016, at 1:48 PM, Reuti wrote: > > > Am 12.08.2016 um 21:44 schrieb r...@open-mpi.org <mailto:r...@open-mpi.org>: > >> Don’t know about the toolchain issue - I use those same versions, and don’t >> have a problem. I’m on CentOS-7, so that might

Re: [OMPI users] Problems with mpirun in openmpi-1.8.1 and -2.0.0

2016-08-19 Thread r...@open-mpi.org
The rdma error sounds like something isn’t right with your machine’s Infiniband installation. The cross-version problem sounds like you installed both OMPI versions into the same location - did you do that?? If so, then that might be the root cause of both problems. You need to install them in
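A minimal sketch of keeping the two builds apart, assuming installs from source (prefixes are illustrative, not from the thread):
  # run in each version's source tree, installing into its own prefix
  ./configure --prefix=$HOME/opt/openmpi-1.8.1 && make -j8 && make install
  ./configure --prefix=$HOME/opt/openmpi-2.0.0 && make -j8 && make install
  # then point PATH / LD_LIBRARY_PATH at exactly one of them
  export PATH=$HOME/opt/openmpi-2.0.0/bin:$PATH
  export LD_LIBRARY_PATH=$HOME/opt/openmpi-2.0.0/lib:$LD_LIBRARY_PATH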

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-22 Thread r...@open-mpi.org
Hmmm...perhaps we can break this out a bit? The stdin will be going to your rank=0 proc. It sounds like you have some subsequent step that calls MPI_Bcast? Can you first verify that the input is being correctly delivered to rank=0? This will help us isolate if the problem is in the IO forwarding

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-22 Thread r...@open-mpi.org
0 > #8 0x005c5b5d in LAMMPS_NS::Input::file() () at ../input.cpp:203 > #9 0x005d4236 in main () at ../main.cpp:31 > > Thanks, > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <mailto:u

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-22 Thread r...@open-mpi.org
n look into the issue and fix it. > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <mailto:r...@open-mpi.org> mailto:r...

Re: [OMPI users] OS X El Capitan 10.11.6 ld: symbol(s) not found for architecture x86_64

2016-08-23 Thread r...@open-mpi.org
I’m confused - you keep talking about MPICH, but the symbol you are looking for is from OMPI. You cannot mix the two MPI libraries - is that what you are trying to do? > On Aug 23, 2016, at 1:30 PM, Richard G French wrote: > > Thanks for the suggestion, Doug - but I can't seem to find the miss

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-23 Thread r...@open-mpi.org
; 402-472-6400 > From: users <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <mailto:r...@open-mpi.org> mailto:r...@open-mpi.org>> > Sent: Monday, August 22, 2016 10:23:42 PM > To: Open MPI Users > Subject: Re: [OMPI users] stdin issue with op

Re: [OMPI users] Using Open MPI with PBS Pro

2016-08-23 Thread r...@open-mpi.org
I’ve never heard of that, and cannot imagine what it has to do with the resource manager. Can you point to where you heard that one? FWIW: we don’t ship OMPI with anything in the default mca params file, so somebody must have put it in there for you. > On Aug 23, 2016, at 4:48 PM, Andy Riebs

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-23 Thread r...@open-mpi.org
-Jingchao > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <mailto:r...@open-mpi.org> mailto:r...@open-mpi.org>> &

Re: [OMPI users] Using Open MPI with PBS Pro

2016-08-24 Thread r...@open-mpi.org
/341> :-) > > In any case, thanks for the information about the default params file -- I > won't worry too much about modifying it then. > > Andy > > I > On 08/23/2016 08:08 PM, r...@open-mpi.org wrote: >> I’ve never heard of that, and cannot imagine what i

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-24 Thread r...@open-mpi.org
Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <mailto:r...@open-mpi.org> mailto:r...@open-mpi.org>> > Sent: Tuesday, August 23, 2016 8:14:48 PM &

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-24 Thread r...@open-mpi.org
s attached. > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <mailto:r...@open-mpi.org> mailto:r...@open-mpi.org>> > S

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-25 Thread r...@open-mpi.org
chao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <mailto:r...@open-mpi.org> mailto:r...@open-mpi.org>> > Sent: Wednesday, Augus

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-27 Thread r...@open-mpi.org
tter - this print statement will tell me what I need to know. Thanks! Ralph > On Aug 25, 2016, at 8:19 AM, Jeff Squyres (jsquyres) > wrote: > > The IOF fix PR for v2.0.1 was literally just merged a few minutes ago; it > wasn't in last night's tarball. > > >

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-29 Thread r...@open-mpi.org
Init > Rank 4 has cleared MPI_Init > Rank 8 has cleared MPI_Init > Rank 0 has cleared MPI_Init > Rank 6 has cleared MPI_Init > Rank 7 has cleared MPI_Init > Rank 14 has cleared MPI_Init > Rank 15 has cleared MPI_Init > Rank 16 has cleared MPI_Init > Rank 18 has cleared MPI_I

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
fd, ORTE_NAME_PRINT(dst_name)); > > /* don't do this if the dst vpid is invalid or the fd is negative! */ > if (ORTE_VPID_INVALID == dst_name->vpid || fd < 0) { > return ORTE_SUCCESS; > } > > /*OPAL_OUTPUT_VERBOSE((1, ort

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
o:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <mailto:r...@open-mpi.org> mailto:r...@open-mpi.org>> > Sent: Tuesday, August 30, 2016 12:56:33 PM > To: Open MPI Users > Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0 > >

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
; > Please see attached for the outputs. > > Thank you Ralph. I am willing to provide whatever information you need. > > From: users <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <mailto:r...@open-mpi.org> mailto:r...@open-mpi.org>>

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-07 Thread r...@open-mpi.org
The usual cause of this problem is that the nodename in the machinefile is given as a00551, while Torque is assigning the node name as a00551.science.domain. Thus, mpirun thinks those are two separate nodes and winds up spawning an orted on its own node. You might try ensuring that your machine
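For example, making the machinefile entries match the form Torque reports avoids the duplicate-node behavior (the first hostname is from the thread; the second host and the slot counts are illustrative):
  # machinefile using Torque's fully qualified node names
  a00551.science.domain slots=16
  a00552.science.domain slots=16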

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-07 Thread r...@open-mpi.org
You aren’t looking in the right place - there is an “openmpi” directory underneath that one, and the mca_xxx libraries are down there > On Sep 7, 2016, at 7:43 AM, Oswin Krause > wrote: > > Hi Gilles, > > I do not have this library. Maybe this helps already... > > libmca_common_sm.so libmpi
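For example, with a default install the components sit under the installation's lib/openmpi directory (prefix and component names shown are only illustrative; the exact set varies by build):
  ls /usr/local/lib/openmpi
  # mca_btl_tcp.so  mca_plm_rsh.so  mca_pml_ob1.so  ...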

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-08 Thread r...@open-mpi.org
s each are available and i request > -l nodes=3:ppn=1 > i guess this is a different scheduler configuration, and i cannot change that. > > Could you please have a look at this ? > > Cheers, > > Gilles > > On 9/7/2016 11:15 PM, r...@open-mpi.org wrote: >> T

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread r...@open-mpi.org
If you are correctly analyzing things, then there would be an issue in the code. When we get an allocation from a resource manager, we set a flag indicating that it is “gospel” - i.e., that we do not directly sense the number of cores on a node and set the #slots equal to that value. Instead, we

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-08 Thread r...@open-mpi.org
I’m pruning this email thread so I can actually read the blasted thing :-) Guys: you are off in the wilderness chasing ghosts! Please stop. When I say that Torque uses an “ordered” file, I am _not_ saying that all the host entries of the same name have to be listed consecutively. I am saying tha

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread r...@open-mpi.org
Maybe I’m missing something, but “mpirun -n 1” doesn’t include the name of an application to execute. The error message prior to that error indicates that you have some cruft sitting in your tmpdir. You just need to clean it out - look for something that starts with “openmpi” > On Sep 22, 201
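A sketch of that cleanup, assuming the default session-directory naming (the pattern matches the directory quoted later in this thread):
  # remove stale Open MPI session directories from the temp dir
  rm -rf ${TMPDIR:-/tmp}/openmpi-sessions-*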

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread r...@open-mpi.org
t; Aborting. > ---------- > > and when I type "ls" the directory > "openmpi-sessions-501@Justins-MacBook-Pro-2_0" reappeared. Unless > there's a different directory I need to look for? > > On T

Re: [OMPI users] Openmpi 1.10.x, mpirun and Slurm 15.08 problem

2016-09-23 Thread r...@open-mpi.org
This isn’t an issue with the SLURM integration - this is the problem of our OOB not correctly picking the right subnet for connecting back to mpirun. In this specific case, you probably want -mca btl_tcp_if_include em4 -mca oob_tcp_if_include em4 since it is the em4 network that ties the comput
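For example (the interface name em4 comes from the thread; process count and executable are illustrative):
  mpirun -mca btl_tcp_if_include em4 -mca oob_tcp_if_include em4 -n 16 ./a.out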

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread r...@open-mpi.org
Spawn definitely does not work with srun. I don’t recognize the name of the file that segfaulted - what is “ptl.c”? Is that in your manager program? > On Sep 29, 2016, at 6:06 AM, Gilles Gouaillardet > wrote: > > Hi, > > I do not expect spawn can work with direct launch (e.g. srun) > > Do y

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread r...@open-mpi.org
uess is that ptl.c comes from PSM lib ... > > Cheers, > > Gilles > > On Thursday, September 29, 2016, r...@open-mpi.org <mailto:r...@open-mpi.org> > mailto:r...@open-mpi.org>> wrote: > Spawn definitely does not work with srun. I don’t recognize the name of the >

Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-03 Thread r...@open-mpi.org
FWIW: the socket option seems to work fine for me: $ mpirun -n 12 -map-by socket:pe=2 -host rhc001 --report-bindings hostname [rhc001:200408] MCW rank 1 bound to socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][BB/BB/../../../../../../../../../..] [rh

Re: [OMPI users] how to tell if pmi or pmi2 is being used?

2016-10-13 Thread r...@open-mpi.org
If you are using mpirun, then neither PMI1 nor PMI2 is involved at all. ORTE has its own internal mechanism for handling wireup. > On Oct 13, 2016, at 10:43 AM, David Shrader wrote: > > Hello All, > > I'm using Open MPI 1.10.3 with Slurm and would like to ask how do I find out > if pmi1 or p

[OMPI users] Supercomputing 2016: Birds-of-a-Feather meetings

2016-10-24 Thread r...@open-mpi.org
Hello all. This year, we will again be hosting Birds-of-a-Feather meetings for Open MPI and PMIx. Open MPI: Wed, Nov 16th, 5:15-7pm http://sc16.supercomputing.org/presentation/?id=bof103&sess=sess322 PMIx: Wed, Nov 16th, 12

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-10-27 Thread r...@open-mpi.org
Hey Andy Is there a SLURM envar that would tell us the binding option from the srun cmd line? We automatically bind when direct launched due to user complaints of poor performance if we don’t. If the user specifies a binding option, then we detect that we were already bound and don’t do it. Ho

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-10-27 Thread r...@open-mpi.org
RM_CPU_BIND_VERBOSE=quiet > SLURM_CPU_BIND_TYPE=mask_cpu: > SLURM_CPU_BIND_LIST=0x,0x > SLURM_CPU_BIND=quiet,mask_cpu:0x,0x > SLURM_CPU_BIND_VERBOSE=quiet > SLURM_CPU_BIND_TYPE=mask_cpu: > SLURM_CPU_BIND_LIST=0x,0x > SLURM_CPU_BIND=quiet,mask_cpu:0x,0x22

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-10-27 Thread r...@open-mpi.org
urm configuration options that could conceivably > change the behavior from system to system: > SelectType = select/cons_res > SelectTypeParameters= CR_CPU > > > On 10/27/2016 01:17 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote: >> And i

Re: [OMPI users] MCA compilation later

2016-10-28 Thread r...@open-mpi.org
You don’t need any of the hardware - you just need the headers. Things like libfabric and libibverbs are all publicly available, and so you can build all that support even if you cannot run it on your machine. Once your customer installs the binary, the various plugins will check for their requ
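A sketch of such a build, assuming the relevant development headers are installed (these are standard configure options; the prefix is illustrative):
  # headers are enough at build time; the resulting plugins are only loaded
  # at run time if the matching hardware/library is actually present
  ./configure --prefix=/opt/openmpi --with-verbs --with-libfabric
  make -j8 && make install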

Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-28 Thread r...@open-mpi.org
FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the OMPI BoF meeting at SC’16, for those who can attend > On Oct 11, 2016, at 8:16 AM, Dave Love wrote: > > Wirawan Purwanto writes: > >> Instead of the scenario above, I was trying to get the MPI processes >> side-by-side

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-10-28 Thread r...@open-mpi.org
FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the OMPI BoF meeting at SC’16, for those who can attend. Will try to explain the rationale as well as the mechanics of the options > On Oct 11, 2016, at 8:09 AM, Dave Love wrote: > > Gilles Gouaillardet mailto:gil...@rist.or.

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-10-28 Thread r...@open-mpi.org
's our foot, and we have been doing a good job of shooting it. ;-) > > -- bennet > > > > > On Fri, Oct 28, 2016 at 7:18 PM, r...@open-mpi.org wrote: >> FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the >> OMPI BoF meeting at SC’16, for those who c

Re: [OMPI users] mpi4py+OpenMPI: Qs about submitting bugs and examples

2016-10-31 Thread r...@open-mpi.org
> On Oct 31, 2016, at 10:39 AM, Jason Maldonis wrote: > > Hello everyone, > > I am using mpi4py with OpenMPI for a simulation that uses dynamic resource > allocation via `mpi_spawn_multiple`. I've been working on this problem for > about 6 months now and I have some questions and potential b

Re: [OMPI users] MCA compilation later

2016-10-31 Thread r...@open-mpi.org
les? > > (I'm a bit surprised only header files are necessary. Shouldn't the plugin > require at least runtime linking with a low-level transport library?) > > -Sean > > -- > Sean Ahern > Computational Engineering International > 919-363-0883 > >

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-11-01 Thread r...@open-mpi.org
rs] Slurm binding not propagated to MPI jobs > > Hi Ralph, > > I haven't played around in this code, so I'll flip the question over to the > Slurm list, and report back here when I learn anything. > > Cheers > Andy > > On 10/27/2016 01:44 PM, r...@open-mpi

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
you mistyped the option - it is “--map-by node”. Note the space between “by” and “node” - you had typed it with a “-“ instead of a “space” > On Nov 4, 2016, at 4:28 AM, Mahesh Nanavalla > wrote: > > Hi all, > > I am using openmpi-1.10.3,using quad core processor(node). > > I am running 3 pr
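For illustration (executable and process count assumed):
  # wrong: parsed as a single unknown option
  mpirun --map-by-node -n 3 ./a.out
  # right: "node" is an argument to --map-by
  mpirun --map-by node -n 3 ./a.out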

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
file only.. > kindly help me. > > > On Fri, Nov 4, 2016 at 5:00 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> > mailto:r...@open-mpi.org>> wrote: > you mistyped the option - it is “--map-by node”. Note the space between “by” > and “no

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
pawn threads to the remaining processors. >> >> Just a thought, -- bennet >> >> >> >> >> >> On Fri, Nov 4, 2016 at 8:39 AM, Mahesh Nanavalla >> wrote: >>> s... >>> >>> Thanks for responding me. >>>

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-11-04 Thread r...@open-mpi.org
some Slurm prolog scripts to effect that. > > Thanks Ralph! > > On 11/01/2016 11:36 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote: >> Ah crumby!! We already solved this on master, but it cannot be backported to >> the 1.10 series without considerable pain.

Re: [OMPI users] malloc related crash inside openmpi

2016-11-23 Thread r...@open-mpi.org
It looks like the library may not have been fully installed on that node - can you see if the prefix location is present, and that the LD_LIBRARY_PATH on that node is correctly set? The referenced component did not exist prior to the 2.0 series, so I’m betting that your LD_LIBRARY_PATH isn’t cor
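A quick way to check both on the suspect node (hostname and prefix are illustrative):
  # confirm the install exists and that the non-interactive remote environment sees it
  ssh node01 'ls /opt/openmpi-2.0.0/lib/openmpi | wc -l; echo $LD_LIBRARY_PATH'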

Re: [OMPI users] malloc related crash inside openmpi

2016-11-24 Thread r...@open-mpi.org
at 2:31 PM, Noam Bernstein > wrote: > > >> On Nov 23, 2016, at 5:26 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> >> wrote: >> >> It looks like the library may not have been fully installed on that node - >> can you see if the prefix location

Re: [OMPI users] question about "--rank-by slot" behavior

2016-11-30 Thread r...@open-mpi.org
I think you have confused “slot” with a physical “core”. The two have absolutely nothing to do with each other. A “slot” is nothing more than a scheduling entry in which a process can be placed. So when you --rank-by slot, the ranks are assigned round-robin by scheduler entry - i.e., you assign

Re: [OMPI users] question about "--rank-by slot" behavior

2016-11-30 Thread r...@open-mpi.org
hing I changed in my > examples) and this results in ranks being assigned differently? > > Thanks again, > David > > On 11/30/2016 01:23 PM, r...@open-mpi.org wrote: >> I think you have confused “slot” with a physical “core”. The two have >> absolutely nothing to do

Re: [OMPI users] Signal propagation in 2.0.1

2016-12-01 Thread r...@open-mpi.org
Yeah, that’s a bug - we’ll have to address it Thanks Ralph > On Nov 28, 2016, at 9:29 AM, Noel Rycroft wrote: > > I'm seeing different behaviour between Open MPI 1.8.4 and 2.0.1 with regards > to signal propagation. > > With version 1.8.4 mpirun seems to propagate SIGTERM to the tasks it star

Re: [OMPI users] Signal propagation in 2.0.1

2016-12-02 Thread r...@open-mpi.org
Fix is on the way: https://github.com/open-mpi/ompi/pull/2498 <https://github.com/open-mpi/ompi/pull/2498> Thanks Ralph > On Dec 1, 2016, at 10:49 AM, r...@open-mpi.org wrote: > > Yeah, that’s a bug - we’ll have to address it > > Thanks > Ralph > >> On Nov 2

Re: [OMPI users] Abort/ Deadlock issue in allreduce

2016-12-07 Thread r...@open-mpi.org
Hi Christof Sorry if I missed this, but it sounds like you are saying that one of your procs abnormally terminates, and we are failing to kill the remaining job? Is that correct? If so, I just did some work that might relate to that problem that is pending in PR #2528: https://github.com/open-

Re: [OMPI users] device failed to appear .. Connection timed out

2016-12-08 Thread r...@open-mpi.org
Sounds like something didn’t quite get configured right, or maybe you have a library installed that isn’t quite setup correctly, or... Regardless, we generally advise building from source to avoid such problems. Is there some reason not to just do so? > On Dec 8, 2016, at 6:16 AM, Daniele Tarta

Re: [OMPI users] Abort/ Deadlock issue in allreduce

2016-12-08 Thread r...@open-mpi.org
debug build ?), so I include it at the very >>>> bottom just in case. >>>> >>>> Off-list Gilles Gouaillardet suggested to set breakpoints at exit, >>>> __exit etc. to try to catch signals. Would that be useful ? I need a >>>> moment to fi

[OMPI users] Release of OMPI v1.10.5

2016-12-19 Thread r...@open-mpi.org
The Open MPI Team, representing a consortium of research, academic, and industry partners, is pleased to announce the release of Open MPI version 1.10.5. v1.10.5 is a bug fix release that includes an important performance regression fix. All users are encouraged to upgrade to v1.10.5 when possi

Re: [OMPI users] OpenMPI + InfiniBand

2016-12-23 Thread r...@open-mpi.org
Also check to ensure you are using the same version of OMPI on all nodes - this message usually means that a different version was used on at least one node. > On Dec 23, 2016, at 1:58 AM, gil...@rist.or.jp wrote: > > Serguei, > > > this looks like a very different issue, orted cannot be rem
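For example (node names illustrative):
  # compare the installed Open MPI version on every node
  for h in node01 node02 node03; do ssh $h 'ompi_info --version | head -1'; done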

Re: [OMPI users] still segmentation fault with openmpi-2.0.2rc3 on Linux

2017-01-10 Thread r...@open-mpi.org
I think there is some relevant discussion here: https://github.com/open-mpi/ompi/issues/1569 It looks like Gilles had (at least at one point) a fix for master when enable-heterogeneous, but I don’t know if that was committed. > On Jan 9, 2017, at

Re: [OMPI users] Startup limited to 128 remote hosts in some situations?

2017-01-17 Thread r...@open-mpi.org
As I recall, the problem was that qrsh isn’t available on the backend compute nodes, and so we can’t use a tree for launch. If that isn’t true, then we can certainly adjust it. > On Jan 17, 2017, at 9:37 AM, Mark Dixon wrote: > > Hi, > > While commissioning a new cluster, I wanted to run HPL

Re: [OMPI users] Startup limited to 128 remote hosts in some situations?

2017-01-19 Thread r...@open-mpi.org
I’ll create a patch that you can try - if it works okay, we can commit it > On Jan 18, 2017, at 3:29 AM, William Hay wrote: > > On Tue, Jan 17, 2017 at 09:56:54AM -0800, r...@open-mpi.org wrote: >> As I recall, the problem was that qrsh isn’t available on the backend >>

Re: [OMPI users] Startup limited to 128 remote hosts in some situations?

2017-01-20 Thread r...@open-mpi.org
; On Jan 19, 2017, at 5:29 PM, r...@open-mpi.org wrote: > > I’ll create a patch that you can try - if it works okay, we can commit it > >> On Jan 18, 2017, at 3:29 AM, William Hay wrote: >> >> On Tue, Jan 17, 2017 at 09:56:54AM -0800, r...@open-mpi.org wrote: >>

Re: [OMPI users] MPI_Comm_spawn question

2017-01-31 Thread r...@open-mpi.org
What version of OMPI are you using? > On Jan 31, 2017, at 7:33 AM, elistrato...@info.sgu.ru wrote: > > Hi, > > I am trying to write trivial master-slave program. Master simply creates > slaves, sends them a string, they print it out and exit. Everything works > just fine, however, when I add a d

Re: [OMPI users] Performance Issues on SMP Workstation

2017-02-01 Thread r...@open-mpi.org
Simple test: replace your executable with “hostname”. If you see multiple hosts come out on your cluster, then you know why the performance is different. > On Feb 1, 2017, at 2:46 PM, Andy Witzig wrote: > > Honestly, I’m not exactly sure what scheme is being used. I am using the > default tem
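That is, something like (process count illustrative):
  # if more than one hostname appears, the job is spread across nodes
  mpirun -n 8 hostname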

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread r...@open-mpi.org
I do see a diff between 2.0.1 and 2.0.2 that might have a related impact. The way we handled the MCA param that specifies the launch agent (ssh, rsh, or whatever) was modified, and I don’t think the change is correct. It basically says that we don’t look for qrsh unless the MCA param has been ch

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread r...@open-mpi.org
/ompi/pull/1960/files> > > > Glenn > > On Fri, Feb 3, 2017 at 10:56 AM, r...@open-mpi.org <mailto:r...@open-mpi.org> > mailto:r...@open-mpi.org>> wrote: > I do see a diff between 2.0.1 and 2.0.2 that might have a related impact. The > way we handled the MCA par

Re: [OMPI users] MPI_Comm_spawn question

2017-02-03 Thread r...@open-mpi.org
We know v2.0.1 has problems with comm_spawn, and so you may be encountering one of those. Regardless, there is indeed a timeout mechanism in there. It was added because people would execute a comm_spawn, and then would hang and eat up their entire allocation time for nothing. In v2.0.2, I see i

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-12 Thread r...@open-mpi.org
Yeah, I’ll fix it this week. The problem is that you can’t check the source as being default as the default is ssh - so the only way to get the current code to check for qrsh is to specify something other than the default ssh (it doesn’t matter what you specify - anything will get you past the e

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-13 Thread r...@open-mpi.org
clarified the logic in the OMPI master repo. However, I don’t know how long it will be before a 2.0.3 release is issued, so GridEngine users might want to locally fix things in the interim. > On Feb 12, 2017, at 1:52 PM, r...@open-mpi.org wrote: > > Yeah, I’ll fix it this week. The p

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-15 Thread r...@open-mpi.org
Ah, yes - I know what the problem is. We weren’t expecting a PE value of 1 - the logic is looking expressly for values > 1 as we hadn’t anticipated this use-case. I can make that change. I’m off to a workshop for the next day or so, but can probably do this on the plane. > On Feb 15, 2017, at

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-15 Thread r...@open-mpi.org
> On Feb 15, 2017, at 5:45 AM, Mark Dixon wrote: > > On Wed, 15 Feb 2017, r...@open-mpi.org wrote: > >> Ah, yes - I know what the problem is. We weren’t expecting a PE value of 1 - >> the logic is looking expressly for values > 1 as we hadn’t anticipated this

Re: [OMPI users] Specify the core binding when spawning a process

2017-02-15 Thread r...@open-mpi.org
Sorry for slow response - was away for awhile. What version of OMPI are you using? > On Feb 8, 2017, at 1:59 PM, Allan Ma wrote: > > Hello, > > I'm designing a program on a dual socket system that needs the parent process > and spawned child process to be at least running on (or bound to) th

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
Nothing immediate comes to mind - all sbatch does is create an allocation and then run your script in it. Perhaps your script is using a different “mpirun” command than when you type it interactively? > On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina > wrote: > > Hi, > > I am trying to us
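A quick check, assuming the job script is a shell script, is to log which mpirun the batch environment resolves:
  # add near the top of the sbatch script
  which mpirun
  mpirun --version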

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread r...@open-mpi.org
If we knew what line in that file was causing the compiler to barf, we could at least address it. There is probably something added in recent commits that is causing problems for the compiler. So checking to see what commit might be triggering the failure would be most helpful. > On Feb 15, 2

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
it yet :( So > I even cannot check if it works with OpenMPI 2.0.2. > > On 15 February 2017 at 16:04, Howard Pritchard <mailto:hpprit...@gmail.com>> wrote: > Hi Anastasia, > > Definitely check the mpirun when in batch environment but you may also want > to upgrade

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
> Jason > >> On Wed, Feb 15, 2017 at 1:09 PM, Anastasia Kruchinina >> wrote: >> Hi! >> >> I am doing like this: >> >> sbatch -N 2 -n 5 ./job.sh >> >> where job.sh is: >> >> #!/bin/bash -l >> module load openmpi/

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread r...@open-mpi.org
e-Linux.x86_64.64_cc/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.lo > > loki openmpi-master 148 find > openmpi-master-201702100209-51def91-Linux.x86_64.64_cc -name pmix_esh.lo > loki openmpi-master 149 > > Which files do you need? Which commands shall I run to get differenc

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-02-17 Thread r...@open-mpi.org
Depends on the version, but if you are using something in the v2.x range, you should be okay with just one installed version > On Feb 17, 2017, at 4:41 AM, Mark Dixon wrote: > > Hi, > > We have some users who would like to try out openmpi MPI_THREAD_MULTIPLE > support on our InfiniBand cluste
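For reference, a sketch of the build flag being asked about (v2.x from source; the prefix is illustrative):
  ./configure --prefix=/opt/openmpi-2.1 --enable-mpi-thread-multiple
  make -j8 && make install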

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-17 Thread r...@open-mpi.org
nwhile, feel free to manually apply the attached patch > > > > Cheers, > > > Gilles > > > On 2/16/2017 8:09 AM, r...@open-mpi.org wrote: >> I guess it was the next nightly tarball, but not next commit. However, it >> was almost certainly 7acef48 f

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-17 Thread r...@open-mpi.org
Mark - this is now available in master. Will look at what might be required to bring it to 2.0 > On Feb 15, 2017, at 5:49 AM, r...@open-mpi.org wrote: > > >> On Feb 15, 2017, at 5:45 AM, Mark Dixon wrote: >> >> On Wed, 15 Feb 2017, r...@open-mpi.org wrote: >&g

Re: [OMPI users] OpenMPI and Singularity

2017-02-17 Thread r...@open-mpi.org
The embedded Singularity support hasn’t made it into the OMPI 2.x release series yet, though OMPI will still work within a Singularity container anyway. Compatibility across the container boundary is always a problem, as your examples illustrate. If the system is using one OMPI version and the c

Re: [OMPI users] OpenMPI and Singularity

2017-02-17 Thread r...@open-mpi.org
d still run, > but it would fall back to non-verbs communication, so it would just be > commensurately slower. > > Let me know if I've garbled things. Otherwise, wish me luck, and have > a good weekend! > > Thanks, -- bennet > > > > On Fri, Feb 17, 2017 at 7:2

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-02-18 Thread r...@open-mpi.org
b 2017, r...@open-mpi.org wrote: > >> Depends on the version, but if you are using something in the v2.x range, >> you should be okay with just one installed version > > Thanks Ralph. > > How good is MPI_THREAD_MULTIPLE support these days and how far up the > wishlis

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-02-18 Thread r...@open-mpi.org
> exceptions across processes) > > So it is good to hear there is progress. > > On Feb 18, 2017 7:43 AM, "r...@open-mpi.org <mailto:r...@open-mpi.org>" > mailto:r...@open-mpi.org>> wrote: > We have been making a concerted effort to resolve outstanding issues a

Re: [OMPI users] OpenMPI and Singularity

2017-02-20 Thread r...@open-mpi.org
iagnostics you would like, I can try to > provide those. I will be gone starting Thu for a week. > > -- bennet > > > > > On Fri, Feb 17, 2017 at 11:20 PM, r...@open-mpi.org wrote: >> I -think- that is correct, but you may need the verbs library as well - I >&

Re: [OMPI users] Segmentation Fault when using OpenMPI 1.10.6 and PGI 17.1.0 on POWER8

2017-02-21 Thread r...@open-mpi.org
Can you provide a backtrace with line numbers from a debug build? We don’t get much testing with lsf, so it is quite possible there is a bug in there. > On Feb 21, 2017, at 7:39 PM, Hammond, Simon David (-EXP) > wrote: > > Hi OpenMPI Users, > > Has anyone successfully tested OpenMPI 1.10.6 wi

Re: [OMPI users] More confusion about --map-by!

2017-02-23 Thread r...@open-mpi.org
From the mpirun man page: Open MPI employs a three-phase procedure for assigning process locations and ranks: mapping assigns a default location to each process; ranking assigns an MPI_COMM_WORLD rank value to each process; binding constrains each process to run on specific proce
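For example, all three phases can be set explicitly on the command line (values and executable are illustrative):
  mpirun -n 8 --map-by socket --rank-by core --bind-to core ./a.out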

Re: [OMPI users] More confusion about --map-by!

2017-02-23 Thread r...@open-mpi.org
rank 7 bound to socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/BB/../../../../../../../..] “span” causes ORTE to treat all the sockets etc. as being on a single giant node. HTH Ralph > On Feb 23, 2017, at 6:38 AM, r...@open-mpi.org wr

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-23 Thread r...@open-mpi.org
You might want to try using the DVM (distributed virtual machine) mode in ORTE. You can start it on an allocation using the “orte-dvm” cmd, and then submit jobs to it with “mpirun --hnp foo”, where foo is either the contact info printed out by orte-dvm, or the name of the file you told orte-dvm to
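A sketch of that workflow (file and executable names are illustrative):
  # start the DVM across the allocation and write its contact info to a file
  orte-dvm --report-uri dvm_uri.txt &
  # submit jobs to the running DVM
  mpirun --hnp file:dvm_uri.txt -n 4 ./task_a
  mpirun --hnp file:dvm_uri.txt -n 4 ./task_b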

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread r...@open-mpi.org
> On Feb 27, 2017, at 4:58 AM, Angel de Vicente wrote: > > Hi, > > "r...@open-mpi.org" writes: >> You might want to try using the DVM (distributed virtual machine) >> mode in ORTE. You can start it on an allocation using the “orte-dvm” >> cmd, an

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread r...@open-mpi.org
> On Feb 27, 2017, at 9:39 AM, Reuti wrote: > > >> Am 27.02.2017 um 18:24 schrieb Angel de Vicente : >> >> […] >> >> For a small group of users if the DVM can run with my user and there is >> no restriction on who can use it or if I somehow can authorize others to >> use it (via an authority

Re: [OMPI users] State of the DVM in Open MPI

2017-02-28 Thread r...@open-mpi.org
Hi Reuti The DVM in master seems to be fairly complete, but several organizations are in the process of automating tests for it so it gets more regular exercise. If you are using a version in OMPI 2.x, those are early prototype - we haven’t updated the code in the release branches. The more pro

Re: [OMPI users] Issues with different IB adapters and openmpi 2.0.2

2017-02-28 Thread r...@open-mpi.org
The root cause is that the nodes are defined as “heterogeneous” because the difference in HCAs causes a difference in selection logic. For scalability purposes, we don’t circulate the choice of PML as that isn’t something mpirun can “discover” and communicate. One option we could pursue is to p

Re: [OMPI users] MPI for microcontrolles without OS

2017-03-08 Thread r...@open-mpi.org
OpenMPI has been ported to microcontrollers before, but it does require at least a minimal OS to provide support (e.g., TCP for communications). Most IoT systems already include an OS on them for just that reason. I personally have OMPI running on a little Edison board using the OS that comes wi

Re: [OMPI users] MPI for microcontrolles without OS

2017-03-08 Thread r...@open-mpi.org
nd a solutions. > About this ported version - was it working properly? > > Thanks in advance, > Mateusz Tasz > > > 2017-03-08 18:23 GMT+01:00 r...@open-mpi.org : >> OpenMPI has been ported to microcontrollers before, but it does require at >> least a minimal OS to provi

Re: [OMPI users] OpenMPI in docker container

2017-03-11 Thread r...@open-mpi.org
Past attempts have indicated that only TCP works well with Docker - if you want to use OPA, you’re probably better off using Singularity as your container. http://singularity.lbl.gov/ The OMPI master has some optimized integration for Singularity, but 2.0.2 will wo
