Re: [OMPI users] MPI_Comm_accept()

2017-03-13 Thread r...@open-mpi.org
You should consider it a bug for now - it won’t work in the 2.0 series, and I don’t think it will work in the upcoming 2.1.0 release. Probably will be fixed after that. > On Mar 13, 2017, at 5:17 AM, Adam Sylvester wrote: > > As a follow-up, I tried this with Open MPI 1.10.4 and this worked a

Re: [OMPI users] MPI_Comm_accept()

2017-03-14 Thread r...@open-mpi.org
fixed? > > Thanks. > > On Mon, Mar 13, 2017 at 10:45 AM, r...@open-mpi.org > wrote: > You should consider it a bug for now - it won’t work in the 2.0 series, and I > don’t think it will work in the upcomi

Re: [OMPI users] How to launch ompi-server?

2017-03-19 Thread r...@open-mpi.org
Well, your initial usage looks correct - you don’t launch ompi-server via mpirun. However, it sounds like there is probably a bug somewhere if it hangs as you describe. Scratching my head, I can only recall less than a handful of people ever using these MPI functions to cross-connect jobs, so i

Re: [OMPI users] OpenMPI-2.1.0 problem with executing orted when using SGE

2017-03-22 Thread r...@open-mpi.org
Sorry folks - for some reason (probably timing for getting 2.1.0 out), the fix for this got pushed to v2.1.1 - see the PR here: https://github.com/open-mpi/ompi/pull/3163 > On Mar 22, 2017, at 7:49 AM, Reuti wrote: > >> >> Am 22.03.2017 um 15:31

Re: [OMPI users] Communicating MPI processes running in Docker containers in the same host by means of shared memory?

2017-03-26 Thread r...@open-mpi.org
There are a couple of things you’d need to resolve before worrying about code: * IIRC, there is a separate ORTE daemon in each Docker container since OMPI thinks these are separate nodes. So you’ll first need to find some way those daemons can “discover” that they are on the same physical node.

Re: [OMPI users] Performance degradation of OpenMPI 1.10.2 when oversubscribed?

2017-03-27 Thread r...@open-mpi.org
I’m confused - mpi_yield_when_idle=1 is precisely the “oversubscribed” setting. So why would you expect different results? > On Mar 27, 2017, at 3:52 AM, Jordi Guitart wrote: > > Hi Ben, > > Thanks for your feedback. As described here > (https://www.open-mpi.org/faq/?category=running#oversubs

Re: [OMPI users] Install openmpi.2.0.2 with certain option

2017-04-04 Thread r...@open-mpi.org
--without-cuda --without-slurm should do the trick > On Apr 4, 2017, at 4:49 AM, Andrey Shtyrov via users > wrote: > > Dear openmpi communite, > > I am need to install openmpi.2.0.2 on sistem with slurm, and cuda, without > support it. > > I have tried write ".configure ... (--without-cuda
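
A minimal sketch of the suggested build, with a hypothetical install prefix and make flags that are not from the original message:

  ./configure --prefix=$HOME/openmpi-2.0.2 --without-cuda --without-slurm
  make -j 4 all && make install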

Re: [OMPI users] No more default core binding since 2.0.2?

2017-04-09 Thread r...@open-mpi.org
There has been no change in the policy - however, if you are oversubscribed, we did fix a bug to ensure that we don’t auto-bind in that situation Can you pass along your cmd line? So far as I can tell, it still seems to be working. > On Apr 9, 2017, at 3:40 AM, Reuti wrote: > > Hi, > > While

Re: [OMPI users] No more default core binding since 2.0.2?

2017-04-09 Thread r...@open-mpi.org
> On Apr 9, 2017, at 1:49 PM, Reuti wrote: > > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Hi, > > Am 09.04.2017 um 16:35 schrieb r...@open-mpi.org: > >> There has been no change in the policy - however, if you are

Re: [OMPI users] No more default core binding since 2.0.2?

2017-04-09 Thread r...@open-mpi.org
Let me try to clarify. If you launch a job that has only 1 or 2 processes in it (total), then we bind to core by default. This is done because a job that small is almost always some kind of benchmark. If there are more than 2 processes in the job (total), then we default to binding to NUMA (if
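
The effective default can be checked with --report-bindings; a sketch, with process counts and executable name purely illustrative:

  mpirun -np 2 --report-bindings ./a.out     # 1-2 procs total: bound to core by default
  mpirun -np 16 --report-bindings ./a.out    # more than 2 procs total: bound to NUMA by default
  mpirun -np 16 --bind-to core ./a.out       # override the default explicitly if desired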

Re: [OMPI users] No more default core binding since 2.0.2?

2017-04-10 Thread r...@open-mpi.org
> On Apr 10, 2017, at 1:37 AM, Reuti wrote: > >> >> Am 10.04.2017 um 01:58 schrieb r...@open-mpi.org: >> >> Let me try to clarify. If you launch a job that has only 1 or 2 processes in >> it (total), then we bind to core b

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-10 Thread r...@open-mpi.org
I’m not entirely sure I understand your reference to “real cores”. When we bind you to a core, we bind you to all the HTs that comprise that core. So, yes, with HT enabled, the binding report will list things by HT, but you’ll always be bound to the full core if you tell us bind-to core. The de

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-12 Thread r...@open-mpi.org
d #0002(pid 07514), > 025, Cpus_allowed_list: > 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39 > MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0003(pid 07514), > 029, Cpus_allowed_list: > 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39 > MPI In

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-13 Thread r...@open-mpi.org
You can always specify a particular number of cpus to use for each process by adding it to the map-by directive: mpirun -np 8 --map-by ppr:2:socket:pe=5 --use-hwthread-cpus -report-bindings --mca plm_rsh_agent "qrsh" ./myid would map 2 processes to each socket, binding each process to 5 HTs on

Re: [OMPI users] fatal error for openmpi-master-201704200300-ded63c with SuSE Linux and gcc-6.3.0

2017-04-20 Thread r...@open-mpi.org
This is a known issue due to something in the NVIDIA library and its interactions with hwloc. Your tarball tag indicates you should have the attempted fix in it, so likely that wasn’t adequate. See https://github.com/open-mpi/ompi/pull/3283 for the

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-22 Thread r...@open-mpi.org
/../../../../..][../../../../../../../../../..] > from OpenMPI directly? > > Cheers and thanks again, > > Ado > > On 13.04.2017 17:34, r...@open-mpi.org wrote: >> Yeah, we need libnuma to set the memory binding. There is a param to turn >> off the warning if inst

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-24 Thread r...@open-mpi.org
> [../../../../../B./B./B./B./B.][../../../../../../../../../..] > [pascal-3-07:21027] ... > [../../../../../../../../../..][B./B./B./B./B./../../../../..] > [pascal-3-07:21027] ... > [../../../../../../../../../..][../../../../../B./B./B./B./B./] > > Cheers, > > A

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
What is in your hostfile? > On Apr 25, 2017, at 11:39 AM, Eric Chamberland > wrote: > > Hi, > > just testing the 3.x branch... I launch: > > mpirun -n 8 echo "hello" > > and I get: > > -- > There are not enough slots a

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
only node list is > # created by the RAS component named "localhost" if no other RAS > # components were able to find any hosts to run on (this behavior can > # be disabled by excluding the localhost RAS component by specifying > # the value "^localhost" [without t

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
for job [40219,1] > [zorg:22463] [[40219,0],0] hostfile: checking hostfile > /opt/openmpi-3.x_debug/etc/openmpi-default-hostfile for nodes > [zorg:22463] [[40219,0],0] plm:base:launch wiring up iof for job [40219,1] > [zorg:22463] [[40

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
OWN > dancer30: flags=0x10 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN > dancer31: flags=0x10 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN > = > -- > There are not enough slots available in the system to satisfy the 4 slot

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
0 slots_inuse=0 state=UNKNOWN > dancer29: flags=0x10 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN > dancer30: flags=0x10 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN > dancer31: flags=0x10 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN > ==

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
his (I > don't usually restrict my runs to a subset of the nodes). > > George. > > > On Tue, Apr 25, 2017 at 4:53 PM, r...@open-mpi.org > wrote: > I suspect it read the file just fine - wha

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
Sure - there is always an MCA param for everything: OMPI_MCA_rmaps_base_oversubscribe=1 > On Apr 25, 2017, at 2:10 PM, Eric Chamberland > wrote: > > On 25/04/17 04:36 PM, r...@open-mpi.org wrote: >> add --oversubscribe to the cmd line > > good, it works! :) >
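
Both forms of the same setting, sketched with an illustrative process count and executable:

  mpirun --oversubscribe -np 8 ./a.out
  # or, for every subsequent mpirun in this shell:
  export OMPI_MCA_rmaps_base_oversubscribe=1
  mpirun -np 8 ./a.out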

Re: [OMPI users] OpenMPI 2.1.0: FAIL: opal_path_nfs

2017-04-26 Thread r...@open-mpi.org
You can probably safely ignore it. > On Apr 26, 2017, at 2:29 PM, Prentice Bisbal wrote: > > I'm trying to build OpenMPI 2.1.0 with GCC 5.4.0 on CentOS 6.8. After working > around the '-Lyes/lib' errors I reported in my previous post, opal_path_nfs > fails during 'make check' (see below). Is t

Re: [OMPI users] OpenMPI 2.1.0: FAIL: opal_path_nfs

2017-04-26 Thread r...@open-mpi.org
appearing, etc). So all it is saying is “found something I don’t recognize”. > On Apr 26, 2017, at 3:19 PM, Prentice Bisbal wrote: > > That's what I figured, but I wanted to check first. Any idea of exactly what > it's trying to check? > > Prentice > >

Re: [OMPI users] Closing pipes associated with repeated MPI comm spawns

2017-04-28 Thread r...@open-mpi.org
What version of OMPI are you using? > On Apr 28, 2017, at 8:26 AM, Austin Herrema wrote: > > Hello all, > > I am using mpi4py in an optimization code that iteratively spawns an MPI > analysis code (fortran-based) via "MPI.COMM_SELF.Spawn" (I gather that this > is not an ideal use for comm spa

Re: [OMPI users] [OMPI USERS] Jumbo frames

2017-05-05 Thread r...@open-mpi.org
If you are looking to use TCP packets, then you want to set the send/recv buffer size in the TCP btl, not the openib one, yes? Also, what version of OMPI are you using? > On May 5, 2017, at 7:16 AM, Alberto Ortiz wrote: > > Hi, > I have a program running with openMPI over a network using a gig
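
Assuming the goal is tuning the TCP path, a sketch of the kind of command meant here - the parameter names and values are assumptions that should be confirmed against the installed version with ompi_info:

  ompi_info --param btl tcp --level 9 | grep buf
  mpirun -np 4 --mca btl tcp,self --mca btl_tcp_sndbuf 1048576 --mca btl_tcp_rcvbuf 1048576 ./a.out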

Re: [OMPI users] Can OpenMPI support torque and slurm at the same time?

2017-05-10 Thread r...@open-mpi.org
Certainly. Just make sure you have the headers for both on the node where you build OMPI so we build the required components. Then we will auto-detect which one we are running under, so nothing further is required > On May 10, 2017, at 11:41 AM, Belgin, Mehmet > wrote: > > Hello everyone, >
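
A sketch of such a build; the install paths are hypothetical, and pointing configure at both sets of headers explicitly is just one way to make sure both components get built:

  ./configure --prefix=$HOME/ompi --with-tm=/opt/torque --with-slurm
  make -j 4 install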

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread r...@open-mpi.org
If I might interject here before lots of time is wasted. Spectrum MPI is an IBM -product- and is not free. What you are likely running into is that their license manager is blocking you from running, albeit without a really nice error message. I’m sure that’s something they are working on. If y

Re: [OMPI users] pmix, lxc, hpcx

2017-05-26 Thread r...@open-mpi.org
You can also get around it by configuring OMPI with “--disable-pmix-dstore” > On May 26, 2017, at 3:02 PM, Howard Pritchard wrote: > > Hi John, > > In the 2.1.x release stream a shared memory capability was introduced into > the PMIx component. > > I know nothing about LXC containers, but it

Re: [OMPI users] MPI_Comm_accept()

2017-05-27 Thread r...@open-mpi.org
> > On Tue, Mar 14, 2017 at 10:24 AM, r...@open-mpi.org > wrote: > I don’t see an issue right away, though I know it has been brought up before. > I hope to resolve it either this week or next - will rep

Re: [OMPI users] How to launch ompi-server?

2017-05-27 Thread r...@open-mpi.org
report for this one > somewhere, just let me know. > > -Adam > > On Sun, Mar 19, 2017 at 2:46 PM, r...@open-mpi.org > wrote: > Well, your initial usage looks correct - you don’t launch ompi-server via

Re: [OMPI users] Closing pipes associated with repeated MPI comm spawns

2017-05-29 Thread r...@open-mpi.org
fied that initially, sorry. Running on > Ubuntu 12.04.5. > > On Fri, Apr 28, 2017 at 10:29 AM, r...@open-mpi.org > wrote: > What version of OMPI are you using? > >> On Apr 28, 2017, at 8:26 AM, Austi

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread r...@open-mpi.org
This behavior is as-expected. When you specify "-host foo,bar”, you have told us to assign one slot to each of those nodes. Thus, running 3 procs exceeds the number of slots you assigned. You can tell it to set the #slots to the #cores it discovers on the node by using “-host foo:*,bar:*” I ca
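
Concretely, with two hypothetical hosts foo and bar:

  mpirun -np 3 --host foo,bar ./a.out        # fails: one slot assigned per listed host
  mpirun -np 3 --host foo:*,bar:* ./a.out    # slots set to the number of cores found on each host
  mpirun -np 3 --host foo:2,bar:2 ./a.out    # or give an explicit slot count per host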

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread r...@open-mpi.org
verbose 5 hostname > mpiexec -np 1 --host exin --mca plm_base_verbose 5 hostname > mpiexec -np 1 --host exin ldd ./hello_1_mpi > > if Open MPI is not installed on a shared filesystem (NFS for example), please > also double check > both install were built from the same source a

Re: [OMPI users] Issue in installing PMIx

2017-06-07 Thread r...@open-mpi.org
Ummm...what version of OMPI and PMIx are you talking about? > On Jun 6, 2017, at 2:20 PM, Marc Cooper wrote: > > Hi, > > I've been trying to install PMIx external to OpenMPI, with separate libevent > and hwloc. My configuration script is > > ./configure --prefix= --with-platform=optimized

Re: [OMPI users] Issue in installing PMIx

2017-06-07 Thread r...@open-mpi.org
It built fine for me - on your configure path-to-pmix, what did you tell it? It wants the path supplied as when you configured pmix itself. > On Jun 7, 2017, at 2:50 PM, Marc Cooper wrote: > > OpenMPI 2.1.1 and PMIx v1.1 > > On 7 June 2017 at 11:54, r...@open-mpi.org

Re: [OMPI users] Issue in installing PMIx

2017-06-07 Thread r...@open-mpi.org
I guess I should also have clarified - I tested with PMIx v1.1.5 as that is the latest in the 1.1 series. > On Jun 7, 2017, at 8:23 PM, r...@open-mpi.org wrote: > > It built fine for me - on your configure path-to-pmix, what did you tell it? > It wants the path supplied as when yo

Re: [OMPI users] "undefined reference to `MPI_Comm_create_group'" error message when using Open MPI 1.6.2

2017-06-09 Thread r...@open-mpi.org
Sure - just configure OMPI with “--enable-static --disable-shared” > On Jun 9, 2017, at 5:50 AM, Arham Amouie via users > wrote: > > Thank you very much. Could you please answer another somewhat related > question? I'd like to know if ORTE could be linked statically like a library > in order
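
A sketch of the static build, with an illustrative prefix and source file; with shared libraries disabled, mpicc links the MPI/ORTE code from the .a archives into the executable:

  ./configure --prefix=$HOME/ompi-static --enable-static --disable-shared
  make -j 4 install
  $HOME/ompi-static/bin/mpicc -o myapp myapp.c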

Re: [OMPI users] Node failure handling

2017-06-09 Thread r...@open-mpi.org
It has been awhile since I tested it, but I believe the --enable-recovery option might do what you want. > On Jun 8, 2017, at 6:17 AM, Tim Burgess wrote: > > Hi! > > So I know from searching the archive that this is a repeated topic of > discussion here, and apologies for that, but since it's

Re: [OMPI users] "undefined reference to `MPI_Comm_create_group'" error message when using Open MPI 1.6.2

2017-06-09 Thread r...@open-mpi.org
the hard disks of > compute nodes. > > Now I know that I can install Open MPI in a shared directory. But is it > possible to make executable files that don't look for any Open MPI's files on > disk? > > Arham > > > From: "r...@open-mpi.org"

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-15 Thread r...@open-mpi.org
Here is how the system is working: Master: each process is put into its own process group upon launch. When we issue a “kill”, however, we only issue it to the individual process (instead of the process group that is headed by that child process). This is probably a bug as I don’t believe that

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-15 Thread r...@open-mpi.org
.1.1 process 1 doesn't abort, but stops after it is finished sleeping > > Sincerely, > > Ted Sussman > > On 15 Jun 2017 at 9:18, r...@open-mpi.org wrote: > >> Here is how the system is working: >> >> Master: each process is put into its own process

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-15 Thread r...@open-mpi.org
d not be aborted. > > And users might have several layers of shells in between mpirun and the > executable. > > So now I will look for the latest version of Open MPI that has the 1.4.3 > behavior. > > Sincerely, > > Ted Sussman > > On 15 Jun 2017 at

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-19 Thread r...@open-mpi.org
s > remains after all the signals are sent. > > On 19 Jun 2017 at 10:10, r...@open-mpi.org wrote: > >> >> That is typical behavior when you throw something into "sleep" - not much we >> can do about it, I >> think. >> >> On Jun 19,

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread r...@open-mpi.org
You can add "OMPI_MCA_plm=rsh OMPI_MCA_sec=^munge” to your environment > On Jun 22, 2017, at 7:28 AM, John Hearns via users > wrote: > > Michael, try > --mca plm_rsh_agent ssh > > I've been fooling with this myself recently, in the contect of a PBS cluster > > On 22 June 2017 at 16:16, Mic
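
The same selection can also be made per run on the command line; both forms are sketched below with an illustrative executable:

  export OMPI_MCA_plm=rsh OMPI_MCA_sec=^munge
  mpirun -np 4 ./a.out
  # or equivalently
  mpirun --mca plm rsh --mca sec ^munge -np 4 ./a.out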

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread r...@open-mpi.org
I gather you are using OMPI 2.x, yes? And you configured it --with-pmi=, then moved the executables/libs to your workstation? I suppose I could state the obvious and say “don’t do that - just rebuild it”, and I fear that (after checking the 2.x code) you really have no choice. OMPI v3.0 will ha

Re: [OMPI users] --host works but --hostfile does not

2017-06-22 Thread r...@open-mpi.org
From “man mpirun” - note that not specifying “slots=N” in a hostfile defaults to slots=#cores on that node (as it states in the text): Specifying Host Nodes Host nodes can be identified on the mpirun command line with the -host option or in a hostfile. For example, mpir
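
A small example hostfile and launch, with hostnames and counts purely illustrative:

  # hostfile
  node01             # no slots= given: defaults to the number of cores on node01
  node02 slots=4     # explicitly provide 4 slots on node02

  mpirun -np 8 --hostfile hostfile ./a.out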

Re: [OMPI users] waiting for message either from MPI communicator or from TCP socket

2017-06-25 Thread r...@open-mpi.org
I suspect nobody is answering because the question makes no sense to us. You are implying that the TCP socket is outside of MPI since it isn’t tied to a communicator. If you want to setup a non-MPI communication path between two procs and monitor it separate from the MPI library, you can certain

Re: [OMPI users] Node failure handling

2017-06-26 Thread r...@open-mpi.org
e recent work > on ompi master. Even though the mpiruns will all be associated to the > same ompi-server, do you think this could be sufficient to isolate the > failures? > > Cheers, > Tim > > > > On 10 June 2017 at 00:56, r...@open-mpi.org wrote: >> It has b

Re: [OMPI users] Node failure handling

2017-06-26 Thread r...@open-mpi.org
--------- > [bud96:20652] [[8878,0],0] orted_cmd: received halt_vm cmd > [bud96:20652] [[8878,0],0] orted_cmd: all routes and children gone - exiting > ``` > > On 27 June 2017 at 12:19, r...@open-mpi.org wrote: >> Ah - you should hav

Re: [OMPI users] Node failure handling

2017-06-27 Thread r...@open-mpi.org
r. I was not aware that this > capability exists in the master version of ORTE, but if it does then it makes > our life easier. > > George. > > > On Tue, Jun 27, 2017 at 6:14 AM, r...@open-mpi.org > wr

Re: [OMPI users] Node failure handling

2017-06-27 Thread r...@open-mpi.org
Okay, this should fix it - https://github.com/open-mpi/ompi/pull/3771 > On Jun 27, 2017, at 6:31 AM, r...@open-mpi.org wrote: > > Actually, the error message is coming from mpirun to indicate that it lost > connection to one (or mo

Re: [OMPI users] OpenMPI 2.1.1, --map-to socket, application context files

2017-06-29 Thread r...@open-mpi.org
It’s a difficult call to make as to which is the correct behavior. In Example 1, you are executing a single app_context that has two procs in it. In Example 2, you are executing two app_contexts, each with a single proc in it. Now some people say that the two should be treated the same, with the

Re: [OMPI users] OpenMPI 2.1.1, --map-to socket, application context files

2017-06-30 Thread r...@open-mpi.org
iles. > > How can I do this? > > Sincerely, > > Ted Sussman > > On 29 Jun 2017 at 19:09, r...@open-mpi.org wrote: > >> >> It’s a difficult call to make as to which is the correct behavior. In >> Example 1, you are executing a >> single app_c

Re: [OMPI users] OpenMPI 2.1.1, --map-to socket, application context files

2017-06-30 Thread r...@open-mpi.org
y parameters work with > the app context files. > > I tried an app context file of the format > > > -np 1 afftest01.exe; -np 1 afftest01.exe > > but it didn't work. Only rank 0 was created. Is there a different syntax that > will work? > > Sincerely,

Re: [OMPI users] Issue handling SIGUSR1 in OpenMPI

2017-07-25 Thread r...@open-mpi.org
I’m afraid we don’t currently support that use-case. We forward signals sent by the user to mpiexec (i.e., the user “hits” mpiexec with a signal), but we don’t do anything to support an application proc attempting to raise a signal and asking it to be propagated. If you are using OMPI master, o

Re: [OMPI users] Issue handling SIGUSR1 in OpenMPI

2017-07-25 Thread r...@open-mpi.org
ignal); > >printf("My pid is: %d\n", getpid()); > > for (;;) { > printf("\nSleeping for 10 seconds\n"); > sleep(10); > > MPI_Finalize(); > } > > When I run with 3 processes using mpirun -np 3 ./test, I expect the statement

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-25 Thread r...@open-mpi.org
> On Jul 25, 2017, at 3:48 PM, Kulshrestha, Vipul > wrote: > > I have several questions about integration of openmpi with resource queuing > systems. > > 1. > I understand that openmpi supports integration with various resource > distribution systems such as SGE, LSF, torque etc. > > I ne

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread r...@open-mpi.org
, because I don’t understand how mpirun gets access to information > about RAM requirement. > > qsub -pe orte 8 -b y -V -l m_mem_free=40G -cwd mpirun -np 8 a.out > > > Regards, > Vipul > > > From: users [mailto:users-boun...@lists.open-mpi.or

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread r...@open-mpi.org
s [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Reuti > Sent: Wednesday, July 26, 2017 9:25 AM > To: Open MPI Users > Subject: Re: [OMPI users] Questions about integration with resource > distribution systems > > >> Am 26.07.2017 um 15:09 schrieb r...@open-mpi.or

Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread r...@open-mpi.org
?? Doesn't that tell pbs to allocate 1 node with 2 slots on it? I don't see where you get 4. Sent from my iPad > On Jul 31, 2017, at 10:00 AM, Mahmood Naderan wrote: > > OK. The next question is how to use it with torque (PBS)? Currently we write > this directive > > Nodes=1:ppn=2 > > which

Re: [OMPI users] error building openmpi-v2.* with SUN C 5.15 on SuSE Linux

2017-08-08 Thread r...@open-mpi.org
Should be fixed for 2.x here: https://github.com/open-mpi/ompi/pull/4054 > On Jul 31, 2017, at 5:56 AM, Siegmar Gross > wrote: > > Hi, > > I've been able to install openmpi-v2.0.x-201707270322-239c439 and > openmpi-v2.x-201707271804-3b1e9fe on my

Re: [OMPI users] mpirun 2.1.1 refuses to start a Torque 6.1.1.1 job if I change the scheduler to Maui 3.3.1

2017-08-09 Thread r...@open-mpi.org
sounds to me like your maui scheduler didn’t provide any allocated slots on the nodes - did you check $PBS_NODEFILE? > On Aug 9, 2017, at 12:41 PM, A M wrote: > > > Hello, > > I have just ran into a strange issue with "mpirun". Here is what happened: > > I successfully installed Torque 6.1.1

Re: [OMPI users] Setting LD_LIBRARY_PATH for orted

2017-08-22 Thread r...@open-mpi.org
I’m afraid not - that only applies the variable to the application, not the daemons. Truly, your only real option is to put something in your .bashrc since you cannot modify the configure. Or, if you are running in a managed environment, you can ask to have your resource manager forward your e

Re: [OMPI users] Slot count parameter in hostfile ignored

2017-09-07 Thread r...@open-mpi.org
My best guess is that SLURM has only allocated 2 slots, and we respect the RM regardless of what you say in the hostfile. You can check this by adding --display-allocation to your cmd line. You probably need to tell slurm to allocate more cpus/node. > On Sep 7, 2017, at 3:33 AM, Maksym Planeta
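
Sketch of the diagnostic and of one possible fix on the SLURM side; the option values are illustrative:

  mpirun --display-allocation -np 24 ./a.out                 # shows the slots SLURM actually granted
  salloc -N 1 --ntasks-per-node=24 mpirun -np 24 ./a.out     # request the cpus/node up front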

Re: [OMPI users] Slot count parameter in hostfile ignored

2017-09-08 Thread r...@open-mpi.org
>> Cheers, >>> >>> >>> Gilles >>> >>> >>> On 9/8/2017 4:19 PM, Maksym Planeta wrote: >>>> Indeed mpirun shows slots=1 per node, but I create allocation with >>>> --ntasks-per-node 24, so I do have all cores of the

Re: [OMPI users] OpenMPI 1.10.5 oversubscribing cores

2017-09-08 Thread r...@open-mpi.org
What you probably want to do is add --cpu-list a,b,c... to each mpirun command, where each one lists the cores you want to assign to that job. > On Sep 8, 2017, at 6:46 AM, twu...@goodyear.com wrote: > > > I posted this question last year and we ended up not upgrading to the newer > openmpi.
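
For example, two concurrent runs pinned to disjoint cores on the same machine, using the option suggested above with illustrative core numbers:

  mpirun -np 4 --cpu-list 0,1,2,3 ./job_a &
  mpirun -np 4 --cpu-list 4,5,6,7 ./job_b &
  wait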

Re: [OMPI users] Honor host_aliases file for tight SGE integration

2017-09-15 Thread r...@open-mpi.org
Hi Reuti - As far as I am concerned, you SGE users “own” the SGE support - so feel free to submit a patch! Ralph > On Sep 13, 2017, at 9:10 AM, Reuti wrote: > > Hi, > > I wonder whether it ever came up in discussion that SGE can have a similar > behavior to Torque/PBS regarding the mangli

Re: [OMPI users] Fwd: Make All error regarding either "Conflicting" or "Previous Declaration" among others

2017-09-19 Thread r...@open-mpi.org
Err...you might want to ask the MPICH folks. This is the Open MPI mailing list :-) > On Sep 19, 2017, at 7:38 AM, Aragorn Inocencio > wrote: > > Good evening, > > Thank you for taking the time to develop and assist in the use of this tool. > > I am trying to install the latest mpich-3.2 vers

Re: [OMPI users] Fwd: OpenMPI does not obey hostfile

2017-09-26 Thread r...@open-mpi.org
That is correct. If you don’t specify a slot count, we auto-discover the number of cores on each node and set #slots to that number. If an RM is involved, then we use what they give us. Sent from my iPad > On Sep 26, 2017, at 8:11 PM, Anthony Thyssen > wrote: > > > I have been having problem

Re: [OMPI users] OpenMPI 3.0.0, compilation using Intel icc 11.1 on Linux, error when compiling pmix_mmap

2017-10-01 Thread r...@open-mpi.org
Afraid I’m rather stumped on this one. There is no such include file in pmix_mmap, nor is there any include file that might lead to it. You might try starting again from scratch to ensure you aren’t getting some weird artifact. > On Sep 29, 2017, at 1:12 PM, Ted Sussman wrote: > > Hello all,

Re: [OMPI users] OpenMPI 3.0.0, compilation using Intel icc 11.1 on Linux, error when compiling pmix_mmap

2017-10-02 Thread r...@open-mpi.org
e > declared in , which is definitely #included by > opal/threads/condition.h. > > Since this error occurred with Intel 11.x but didn't occur with later > versions of the Intel compiler, I'm wondering if the Intel 11.x compiler > suite didn't support (struct t

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-02 Thread r...@open-mpi.org
One thing I can see is that the local host (where mpirun executed) shows as “node21” in the allocation, while all others show their FQDN. This might be causing some confusion. You might try adding "--mca orte_keep_fqdn_hostnames 1” to your cmd line and see if that helps. > On Oct 2, 2017, at
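
i.e., something along the lines of the following, with process count and executable illustrative:

  mpirun --mca orte_keep_fqdn_hostnames 1 -np 16 ./a.out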

Re: [OMPI users] OMPI users] upgraded mpi and R and now cannot find slots

2017-10-03 Thread r...@open-mpi.org
As Gilles said, we default to slots = cores, not HTs. If you want to treat HTs as independent cpus, then you need to add OMPI_MCA_hwloc_base_use_hwthreads_as_cpus=1 in your environment. > On Oct 3, 2017, at 7:27 AM, Jim Maas wrote: > > Tried this and got this error, and slots are available, no

Re: [OMPI users] OMPI users] upgraded mpi and R and now cannot find slots

2017-10-03 Thread r...@open-mpi.org
You can add it to the default MCA param file, if you want - /etc/openmpi-mca-params.conf > On Oct 3, 2017, at 12:44 PM, Jim Maas wrote: > > Thanks RHC where do I put that so it will be in the environment? > > J > > On 3 October 2017 at 16:01, r...@open-mpi.org
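
i.e., the parameter goes into that file without the OMPI_MCA_ prefix; a sketch assuming the file lives in the install's etc directory:

  # $prefix/etc/openmpi-mca-params.conf
  hwloc_base_use_hwthreads_as_cpus = 1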

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-03 Thread r...@open-mpi.org
Can you try a newer version of OMPI, say the 3.0.0 release? Just curious to know if we perhaps “fixed” something relevant. > On Oct 3, 2017, at 5:33 PM, Anthony Thyssen wrote: > > FYI... > > The problem is discussed further in > > Redhat Bugzilla: Bug 1321154 - numa enabled torque don't wor

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-06 Thread r...@open-mpi.org
No problem - glad you were able to work it out! > On Oct 5, 2017, at 11:22 PM, Anthony Thyssen > wrote: > > Sorry r...@open-mpi.org as Gilles Gouaillardet > pointed out to me the problem wasn't OpenMPI, but with the specific EPEL

Re: [OMPI users] Controlling spawned process

2017-10-06 Thread r...@open-mpi.org
Couple of things you can try: * add --oversubscribe to your mpirun cmd line so it doesn’t care how many slots there are * modify your MPI_INFO to be “host”, “node0:22” so it thinks there are more slots available It’s possible that the “host” info processing has a bug in it, but this will tell

Re: [OMPI users] Failed to register memory (openmpi 2.0.2)

2017-10-18 Thread r...@open-mpi.org
Put “oob=tcp” in your default MCA param file > On Oct 18, 2017, at 9:00 AM, Mark Dixon wrote: > > Hi, > > We're intermittently seeing messages (below) about failing to register memory > with openmpi 2.0.2 on centos7 / Mellanox FDR Connect-X 3 and the vanilla IB > stack as shipped by centos. >
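
i.e., one added line in the default param file, or the equivalent per-run option:

  # $prefix/etc/openmpi-mca-params.conf
  oob = tcp

  mpirun --mca oob tcp -np 16 ./a.out     # per-run equivalent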

Re: [OMPI users] IpV6 Openmpi mpirun failed

2017-10-18 Thread r...@open-mpi.org
Looks like there is a firewall or something blocking communication between those nodes? > On Oct 18, 2017, at 1:29 PM, Mukkie wrote: > > Adding a verbose output. Please check for failed and advise. Thank you. > > [mselvam@ipv-rhel73 examples]$ mpirun -hostfile host --mca oob_base_verbose > 10

Re: [OMPI users] IpV6 Openmpi mpirun failed

2017-10-19 Thread r...@open-mpi.org
:52 PM, Mukkie wrote: > Thanks for your suggestion. However my firewalls are already disabled on > both the machines. > > Cordially, > Muku. > > On Wed, Oct 18, 2017 at 2:38 PM, r...@open-mpi.org

Re: [OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-27 Thread r...@open-mpi.org
Two questions: 1. are you running this on node04? Or do you have ssh access to node04? 2. I note you are building this against an old version of PMIx for some reason. Does it work okay if you build it with the embedded PMIx (which is 2.0)? Does it work okay if you use PMIx v1.2.4, the latest re

Re: [OMPI users] [OMPI devel] Open MPI 2.0.4rc2 available for testing

2017-11-02 Thread r...@open-mpi.org
I would suggest also considering simply updating to v3.0, or at least to v2.1. I’d rather not play “whack-a-mole” with the Sun compiler again :-( > On Nov 2, 2017, at 6:06 AM, Howard Pritchard wrote: > > HI Siegmar, > > Could you check if you also see a similar problem with OMPI master when y

Re: [OMPI users] Can't connect using MPI Ports

2017-11-03 Thread r...@open-mpi.org
What version of OMPI are you using? > On Nov 3, 2017, at 7:48 AM, Florian Lindner wrote: > > Hello, > > I'm working on a sample program to connect two MPI communicators launched > with mpirun using Ports. > > Firstly, I use MPI_Open_port to obtain a name and write that to a file: > > if (op

Re: [OMPI users] Can't connect using MPI Ports

2017-11-03 Thread r...@open-mpi.org
ote: > > > Am 03.11.2017 um 16:18 schrieb r...@open-mpi.org: >> What version of OMPI are you using? > > 2.1.1 @ Arch Linux. > > Best, > Florian > ___ > users mailing list > users@lists.open-mpi.org > http

Re: [OMPI users] Can't connect using MPI Ports

2017-11-05 Thread r...@open-mpi.org
> On Nov 5, 2017, at 6:48 AM, Florian Lindner wrote: > > Am 04.11.2017 um 00:05 schrieb r...@open-mpi.org: >> Yeah, there isn’t any way that is going to work in the 2.x series. I’m not >> sure it was ever fixed, but you might try the

Re: [OMPI users] Can't connect using MPI Ports

2017-11-06 Thread r...@open-mpi.org
> On Nov 6, 2017, at 7:46 AM, Florian Lindner wrote: > > Am 05.11.2017 um 20:57 schrieb r...@open-mpi.org: >> >>> On Nov 5, 2017, at 6:48 AM, Florian Lindner >> wrote: >>> >>> Am 04.11.2017 um 00:05 schrie

Re: [OMPI users] OpenMPI 1.10.x handling of simultaneous MPI_Abort calls

2017-11-08 Thread r...@open-mpi.org
Nik > > 2017-11-07 19:00 GMT-07:00 r...@open-mpi.org: > Glad to hear it has already been fixed :-) > > Thanks! > >> On Nov 7, 2017, at 4:13 PM, Tru Huynh > wrote:

Re: [OMPI users] OpenMPI 1.10.x handling of simultaneous MPI_Abort calls

2017-11-08 Thread r...@open-mpi.org
a failure to > exit mpirun around 25-30% of the time with 2 processes, causing an > inconsistent hang in both my example program and my larger application. > > -Nik > > On Nov 8, 2017 11:40, "r...@open-mpi.org"

Re: [OMPI users] Can't connect using MPI Ports

2017-11-09 Thread r...@open-mpi.org
I did a quick check across the v2.1 and v3.0 OMPI releases and both failed, though with different signatures. Looks like a problem in the OMPI dynamics integration (i.e., the PMIx library looked like it was doing the right things). I’d suggest filing an issue on the OMPI github site so someone c

Re: [OMPI users] --map-by

2017-11-16 Thread r...@open-mpi.org
Do not include the “bind-to core” option - the mapping directive already forces that. Sent from my iPad > On Nov 16, 2017, at 7:44 AM, Noam Bernstein > wrote: > > Hi all - I’m trying to run mixed MPI/OpenMP, so I ideally want binding of > each MPI process to a small set of cores (to allow for
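
A sketch of a hybrid launch in that spirit, with counts purely illustrative: the pe=N modifier both reserves N cpus per rank and binds to them, so no separate bind-to option is needed:

  export OMP_NUM_THREADS=4
  mpirun -np 8 --map-by ppr:2:socket:pe=4 --report-bindings ./hybrid_app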

Re: [OMPI users] OMPI 2.1.2 and SLURM compatibility

2017-11-16 Thread r...@open-mpi.org
What Charles said was true but not quite complete. We still support the older PMI libraries, but you likely have to point us to wherever slurm put them. However, we definitely recommend using PMIx as you will get a faster launch. Sent from my iPad > On Nov 16, 2017, at 9:11 AM, Bennet Fauber wro

Re: [OMPI users] --map-by

2017-11-20 Thread r...@open-mpi.org
> On Nov 16, 2017, at 7:08 AM, Noam Bernstein > wrote: > > >> On Nov 16, 2017, at 9:49 AM, r...@open-mpi.org >> wrote: >> >> Do not include the “bind-to core” option - the mapping directive already >> forces that >

Re: [OMPI users] --map-by

2017-11-21 Thread r...@open-mpi.org
ern within the current context. > On Nov 21, 2017, at 5:34 AM, Noam Bernstein > wrote: > >> >> On Nov 20, 2017, at 7:02 PM, r...@open-mpi.org >> wrote: >> >> So there are two options here that will work and hopefully provi

Re: [OMPI users] signal handling with mpirun

2017-11-21 Thread r...@open-mpi.org
Try upgrading to the v3.0, or at least to the latest in the v2.x series. The v1.10 series is legacy and no longer maintained. > On Nov 21, 2017, at 8:20 AM, Kulshrestha, Vipul > wrote: > > Hi, > > I am finding that on Ctrl-C, mpirun immediately stops and does not sends > SIGTERM to the chil

Re: [OMPI users] OMPI 2.1.2 and SLURM compatibility

2017-11-29 Thread r...@open-mpi.org
no event_base set. > [warn] opal_libevent2022_event_active: event has no event_base set. > slurmstepd: error: *** STEP 116.0 ON bn1 CANCELLED AT 2017-11-29T08:42:54 *** > srun: Job step aborted: Waiting up to 32 seconds for job step to finish. > slurmstepd: error: *** JOB 11

Re: [OMPI users] OMPI 2.1.2 and SLURM compatibility

2017-12-11 Thread r...@open-mpi.org
MIx. > > $ gcc -I/tmp/build/openmpi-3.0.0/opal/mca/pmix/pmix2x/pmix/include \ > pmix-test.c > pmix-test.c:95:2: error: #error "not version 3" > #error "not version 3" > ^ > > But the config.log generated when using the internal version of PMIx >

Re: [OMPI users] OMPI 3.0.0 crashing at mpi_init on OS X using Fortran

2017-12-11 Thread r...@open-mpi.org
FWIW: I just cloned the v3.0.x branch to get the latest 3.0.1 release candidate, built and ran it on Mac OSX High Sierra. Everything built and ran fine for both C and Fortran codes. You might want to test the same - could be this was already fixed. > On Dec 11, 2017, at 12:43 PM, Ricardo Parrei
