Re: [OMPI users] Error in file base/plm_base_launch_support.c: OPAL_HWLOC_TOPO

2018-07-21 Thread r...@open-mpi.org
More than likely the problem is the difference in hwloc versions - sounds like the topology to/from xml is different between the two versions, and the older one doesn’t understand the new one. > On Jul 21, 2018, at 12:04 PM, Brian Smith > wrote: > > Greetings, > > I'm having trouble getting

Re: [OMPI users] Locking down TCP ports used

2018-07-07 Thread r...@open-mpi.org
I suspect the OOB is working just fine and you are seeing the TCP BTL opening the other ports. There are two TCP elements at work here: the OOB (which sends management messages between daemons) and the BTL (which handles the MPI traffic). In addition to what you provided, you also need to provid

Re: [OMPI users] Verbose output for MPI

2018-07-04 Thread r...@open-mpi.org
Try adding PMIX_MCA_ptl_base_verbose=10 to your environment > On Jul 4, 2018, at 8:51 AM, Maksym Planeta > wrote: > > Thanks for quick response, > > I tried this out and I do get more output: https://pastebin.com/JkXAYdM4. But > the line I need does not appear in the output. > > On 04/07/18
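
A minimal sketch of that suggestion (the process count and binary name are placeholders, not from the thread):

    # turn on PMIx PTL debug output for this shell, then launch as usual
    export PMIX_MCA_ptl_base_verbose=10
    mpirun -np 2 ./a.out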

Re: [OMPI users] Disable network interface selection

2018-06-22 Thread r...@open-mpi.org
> On Jun 22, 2018, at 8:25 PM, r...@open-mpi.org wrote: > > > >> On Jun 22, 2018, at 7:31 PM, Gilles Gouaillardet >> mailto:gilles.gouaillar...@gmail.com>> wrote: >> >> Carlos, >> >> By any chance, could >> >> mpiru

Re: [OMPI users] Disable network interface selection

2018-06-22 Thread r...@open-mpi.org
> On Jun 22, 2018, at 7:31 PM, Gilles Gouaillardet > wrote: > > Carlos, > > By any chance, could > > mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ... > > work for you ? > > Which Open MPI version are you running ? > > > IIRC, subnets are internally translated
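
Spelled out, the suggested command looks roughly like this (the subnet comes from the thread; process count and binary are placeholders):

    # keep the out-of-band (OOB) channel off the 192.168.100.0/24 network
    mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 -np 4 ./a.out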

Re: [OMPI users] new core binding issues?

2018-06-22 Thread r...@open-mpi.org
Afraid I’m not familiar with that option, so I really don’t know :-( > On Jun 22, 2018, at 10:13 AM, Noam Bernstein > wrote: > >> On Jun 22, 2018, at 1:00 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> >> wrote: >> >> I suspect it is okay. Keep in min

Re: [OMPI users] new core binding issues?

2018-06-22 Thread r...@open-mpi.org
I suspect it is okay. Keep in mind that OMPI itself is starting multiple progress threads, so that is likely what you are seeing. The binding pattern in the mpirun output looks correct as the default would be to map-by socket and you asked that we bind-to core. > On Jun 22, 2018, at 9:33 AM, No

Re: [OMPI users] Enforcing specific interface and subnet usage

2018-06-19 Thread r...@open-mpi.org
nment > On Jun 19, 2018, at 2:08 AM, Maksym Planeta > wrote: > > But what about remote connections parameter? Why is it not set? > > On 19/06/18 00:58, r...@open-mpi.org wrote: >> I’m not entirely sure I understand what you are trying to do. The >> PMIX_SERVER_UR

Re: [OMPI users] Enforcing specific interface and subnet usage

2018-06-18 Thread r...@open-mpi.org
I’m not entirely sure I understand what you are trying to do. The PMIX_SERVER_URI2 envar tells local clients how to connect to their local PMIx server (i.e., the OMPI daemon on that node). This is always done over the loopback device since it is a purely local connection that is never used for

Re: [OMPI users] srun works, mpirun does not

2018-06-18 Thread r...@open-mpi.org
t; the daemon itself. We cannot recover from this failure, and >> therefore will terminate the job. >> -- >> >> That makes sense, I guess. >> >> I'll keep you posted as to what hap

Re: [OMPI users] srun works, mpirun does not

2018-06-18 Thread r...@open-mpi.org
ad of ORTE_SUCCESS > > > At one point, I am almost certain that OMPI mpirun did work, and I am > at a loss to explain why it no longer does. > > I have also tried the 3.1.1rc1 version. I am now going to try 3.0.0, > and we'll try downgrading SLURM to a prior version.

Re: [OMPI users] srun works, mpirun does not

2018-06-18 Thread r...@open-mpi.org
TIME NODES > NODELIST(REASON) > 158 standard bash bennet R 14:30 1 cav01 > [bennet@cavium-hpc ~]$ srun hostname > cav01.arc-ts.umich.edu > [ repeated 23 more times ] > > As always, your help is much appreciated, > > -- bennet > > On S

Re: [OMPI users] Fwd: srun works, mpirun does not

2018-06-17 Thread r...@open-mpi.org
Add --enable-debug to your OMPI configure cmd line, and then add --mca plm_base_verbose 10 to your mpirun cmd line. For some reason, the remote daemon isn’t starting - this will give you some info as to why. > On Jun 17, 2018, at 9:07 AM, Bennet Fauber wrote: > > I have a compiled binary that
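
As a sketch of those two steps (install prefix and binary name are placeholders):

    # rebuild Open MPI with debug support
    ./configure --prefix=$HOME/ompi-debug --enable-debug
    make -j4 install
    # then launch with daemon-launch debugging enabled
    mpirun --mca plm_base_verbose 10 -np 2 ./a.out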

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-08 Thread r...@open-mpi.org
olved. Thanks very much Ralph and Artem for your > help! > > -- bennet > > > On Thu, Jun 7, 2018 at 11:06 AM r...@open-mpi.org <mailto:r...@open-mpi.org> > mailto:r...@open-mpi.org>> wrote: > Odd - Artem, do you have any suggestions? > > >

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-07 Thread r...@open-mpi.org
-utils/lib > > > I don't have a saved build log, but I can rebuild this and save the > build logs, in case any information in those logs would help. > > I will also mention that we have, in the past, used the > --disable-dlopen and --enable-shared flags, which we did not use

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-07 Thread r...@open-mpi.org
root 16 Jun 1 08:32 > /opt/slurm/lib64/slurm/mpi_pmix.so -> ./mpi_pmix_v2.so > -rwxr-xr-x 1 root root 828232 May 30 15:20 > /opt/slurm/lib64/slurm/mpi_pmix_v2.so > > > Let me know if anything else would be helpful. > > Thanks,-- bennet > > On Thu,

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-07 Thread r...@open-mpi.org
You didn’t show your srun direct launch cmd line or what version of Slurm is being used (and how it was configured), so I can only provide some advice. If you want to use PMIx, then you have to do two things: 1. Slurm must be configured to use PMIx - depending on the version, that might be ther
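
A quick way to check both points from the command line (a hedged sketch; plugin names vary with the Slurm version and how it was built):

    # list the MPI plugins this Slurm installation knows about
    srun --mpi=list
    # direct-launch through the PMIx plugin if it is present
    srun --mpi=pmix_v2 -n 4 ./a.out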

Re: [OMPI users] --oversubscribe option

2018-06-06 Thread r...@open-mpi.org
I’m not entirely sure what you are asking here. If you use oversubscribe, we do not bind your processes and you suffer some performance penalty for it. If you want to run one process/thread and retain binding, then do not use --oversubscribe and instead use --use-hwthread-cpus > On Jun 6, 2018
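
The two alternatives as one-liners (process count and binary are placeholders):

    # oversubscribed: extra ranks allowed, but no binding is applied
    mpirun --oversubscribe -np 8 ./a.out
    # treat hardware threads as slots and keep binding
    mpirun --use-hwthread-cpus -np 8 ./a.out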

Re: [OMPI users] A hang in Rmpi at PMIx_Disconnect

2018-06-04 Thread r...@open-mpi.org
olutely sure* that your application will successfully > and correctly survive a call to fork(), you may disable this warning > by setting the mpi_warn_on_fork MCA parameter to 0. > -- > And the process hangs as wel

Re: [OMPI users] A hang in Rmpi at PMIx_Disconnect

2018-06-04 Thread r...@open-mpi.org
ient disconnect >> [localhost.localdomain:11646] ext2x:client disconnect >> >> In your example it's only called once per process. >> >> Do you have any suspicion where the second call comes from? Might this be >> the reason for the hang? >> >>

Re: [OMPI users] A hang in Rmpi at PMIx_Disconnect

2018-06-04 Thread r...@open-mpi.org
Try running the attached example dynamic code - if that works, then it likely is something to do with how R operates. [attachment: simple_spawn.c] > On Jun 4, 2018, at 3:43 AM, marcin.krotkiewski > wrote: > Hi, > > I have some problems running R + Rmpi with OpenMPI 3.1.0 + P

Re: [OMPI users] slurm configuration override mpirun command line process mapping

2018-05-17 Thread r...@open-mpi.org
mpirun takes the #slots for each node from the slurm allocation. Your hostfile (at least, what you provided) retained that information and shows 2 slots on each node. So the original allocation _and_ your constructed hostfile are both telling mpirun to assign 2 slots on each node. Like I s
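
For reference, a hostfile that declares slot counts explicitly looks like this (hostnames are placeholders):

    # myhostfile: two slots per node
    node01 slots=2
    node02 slots=2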

Re: [OMPI users] slurm configuration override mpirun command line process mapping

2018-05-16 Thread r...@open-mpi.org
The problem here is that you have made an incorrect assumption. In the older OMPI versions, the -H option simply indicated that the specified hosts were available for use - it did not imply the number of slots on that host. Since you have specified 2 slots on each host, and you told mpirun to la

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread r...@open-mpi.org
You got that error because the orted is looking for its rank on the cmd line and not finding it. > On May 14, 2018, at 12:37 PM, Max Mellette wrote: > > Hi Gus, > > Thanks for the suggestions. The correct version of openmpi seems to be > getting picked up; I also prepended .bashrc with the i

Re: [OMPI users] Openmpi-3.1.0 + slurm (fixed)

2018-05-08 Thread r...@open-mpi.org
Good news - thanks! > On May 8, 2018, at 7:02 PM, Bill Broadley wrote: > > > Sorry all, > > Chris S over on the slurm list spotted it right away. I didn't have the > MpiDefault set to pmix_v2. > > I can confirm that Ubuntu 18.04, gcc-7.3, openmpi-3.1.0, pmix-2.1.1, and > slurm-17.11.5 seem t

Re: [OMPI users] Memory leak with pmix_finalize not being called

2018-05-04 Thread r...@open-mpi.org
Ouch - trivial fix. I’m inserting it into PMIx and will provide it to the OMPI team > On May 4, 2018, at 5:20 AM, Saurabh T wrote: > > This is with valgrind 3.0.1 on a Centos 6 system. It appears pmix_finalize > isnt called and this reports leaks from valgrind despite the provided > suppressi

Re: [OMPI users] openmpi/slurm/pmix

2018-04-25 Thread r...@open-mpi.org
> On Apr 25, 2018, at 8:16 AM, Michael Di Domenico > wrote: > > On Mon, Apr 23, 2018 at 6:07 PM, r...@open-mpi.org wrote: >> Looks like the problem is that you didn’t wind up with the external PMIx. >> The component listed in your error is the internal PMIx one

Re: [OMPI users] Error in hello_cxx.cc

2018-04-23 Thread r...@open-mpi.org
and 3) we no longer support Windows. You could try using the cygwin port instead. > On Apr 23, 2018, at 7:52 PM, Nathan Hjelm wrote: > > Two things. 1) 1.4 is extremely old and you will not likely get much help > with it, and 2) the c++ bindings were deprecated in MPI-2.2 (2009) and > removed

Re: [OMPI users] openmpi/slurm/pmix

2018-04-23 Thread r...@open-mpi.org
Hi Michael Looks like the problem is that you didn’t wind up with the external PMIx. The component listed in your error is the internal PMIx one which shouldn’t have built given that configure line. Check your config.out and see what happened. Also, ensure that your LD_LIBRARY_PATH is properly

Re: [OMPI users] Error in hello_cxx.cc

2018-04-23 Thread r...@open-mpi.org
Also, I note from the screenshot that you appear to be running on Windows with a Windows binary. Correct? > On Apr 23, 2018, at 7:08 AM, Jeff Squyres (jsquyres) > wrote: > > Can you send all the information listed here: > >https://www.open-mpi.org/community/help/ > > > >> On Apr 22, 2

Re: [OMPI users] ARM/Allinea DDT

2018-04-11 Thread r...@open-mpi.org
This inadvertently was sent directly to me, and Ryan asked if I could please post it on his behalf - he needs to get fully approved on the mailing list. Ralph > On Apr 11, 2018, at 1:19 PM, Ryan Hulguin wrote: > > Hello Charlie Taylor, > > I have replied to your support ticket indicating th

Re: [OMPI users] ARM/Allinea DDT

2018-04-11 Thread r...@open-mpi.org
You probably should provide a little more info here. I know the MPIR attach was broken in the v2.x series, but we fixed that - could be something remains broken in OMPI 3.x. FWIW: I doubt it's an Allinea problem. > On Apr 11, 2018, at 11:54 AM, Charles A Taylor wrote: > > > Contacting ARM se

Re: [OMPI users] OpenMPI 3.0.0 on RHEL-7

2018-03-07 Thread r...@open-mpi.org
So long as it is libevent-devel-2.0.22, you should be okay. You might want to up PMIx to v1.2.5 as Slurm 16.05 should handle that okay. OMPI v3.0.0 has PMIx 2.0 in it, but should be okay with 1.2.5 last I checked (but it has been awhile and I can’t swear to it). > On Mar 7, 2018, at 2:03 PM, C

Re: [OMPI users] libopen-pal not found

2018-03-02 Thread r...@open-mpi.org
Not that I’ve heard - you need to put it in your LD_LIBRARY_PATH > On Mar 2, 2018, at 10:15 AM, Mahmood Naderan wrote: > > Hi, > After a successful installation of opmi v3 with cuda enabled, I see that ldd > can not find a right lib file although it exists. /usr/local/lib is one of > the defa
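
For example (the library prefix shown is only illustrative):

    # make the Open MPI runtime libraries findable at run time
    export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH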

Re: [OMPI users] Cannot compile 1.10.2 under CentOS 7 rdma-core-devel-13-7.el7.x86_64

2018-02-28 Thread r...@open-mpi.org
Not unless you have a USNIC card in your machine! > On Feb 28, 2018, at 8:08 AM, William T Jones wrote: > > Thank you! > > Will that have any adverse side effects? > Performance penalties? > > On 02/28/2018 10:57 AM, r...@open-mpi.org wrote: >> Add --without-u

Re: [OMPI users] Cannot compile 1.10.2 under CentOS 7 rdma-core-devel-13-7.el7.x86_64

2018-02-28 Thread r...@open-mpi.org
Add --without-usnic > On Feb 28, 2018, at 7:50 AM, William T Jones wrote: > > I realize that OpenMPI 1.10.2 is quite old, however, for compatibility I > am attempting to compile it after a system upgrade to CentOS 7. > > This system does include infiniband and I have configured as follows > us
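
That is, something along these lines (prefix and make options are placeholders; only --without-usnic comes from the reply):

    # disable the usNIC BTL when configuring
    ./configure --prefix=/opt/openmpi-1.10.2 --without-usnic
    make -j4 && make install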

Re: [OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread r...@open-mpi.org
There are a couple of problems here. First the “^tcp,self,sm” is telling OMPI to turn off all three of those transports, which probably leaves you with nothing. What you really want is to restrict to shared memory, so your param should be “-mca btl self,sm”. This will disable all transports othe
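
So the corrected invocation would look roughly like this (process count and binary are placeholders):

    # restrict MPI traffic to shared memory plus the self loopback
    mpirun --mca btl self,sm -np 4 ./a.out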

[OMPI users] Fabric manager interactions: request for comments

2018-02-05 Thread r...@open-mpi.org
Hello all The PMIx community is starting work on the next phase of defining support for network interactions, looking specifically at things we might want to obtain and/or control via the fabric manager. A very preliminary draft is shown here: https://pmix.org/home/pmix-standard/fabric-manager-

Re: [OMPI users] Setting mpirun default parameters in a file

2018-01-10 Thread r...@open-mpi.org
Set the MCA param “rmaps_base_oversubscribe=1” in your default MCA param file, or in your environment > On Jan 10, 2018, at 4:42 AM, Florian Lindner wrote: > > Hello, > > a recent openmpi update on my Arch machine seems to have enabled > --nooversubscribe, as described in the manpage. Since I
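
Either form should work (the per-user parameter file path is the usual default location; adjust as needed):

    # one-off, via the environment
    export OMPI_MCA_rmaps_base_oversubscribe=1
    # or persistently, in the per-user MCA parameter file
    echo "rmaps_base_oversubscribe = 1" >> $HOME/.openmpi/mca-params.conf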

Re: [OMPI users] latest Intel CPU bug

2018-01-05 Thread r...@open-mpi.org
s > > > On 1/5/2018 3:54 PM, John Chludzinski wrote: > That article gives the best technical assessment I've seen of Intel's > architecture bug. I noted the discussion's subject and thought I'd add some > clarity. Nothing more. > > For the TL;DR c

Re: [OMPI users] latest Intel CPU bug

2018-01-04 Thread r...@open-mpi.org
rassing policies until forced to disclose by a > governmental agency being exploited by a foreign power is another example > that shines a harsh light on their ‘best practices’ line. There are many more > like this. Intel isn’t to be trusted for security practices or disclosures > becau

Re: [OMPI users] latest Intel CPU bug

2018-01-04 Thread r...@open-mpi.org
“problem”. * containers and VMs don’t fully resolve the problem - the only solution other than the patches is to limit allocations to single users on a node HTH Ralph > On Jan 3, 2018, at 10:47 AM, r...@open-mpi.org wrote: > > Well, it appears from that article that the primary imp

Re: [OMPI users] latest Intel CPU bug

2018-01-03 Thread r...@open-mpi.org
Well, it appears from that article that the primary impact comes from accessing kernel services. With an OS-bypass network, that shouldn’t happen all that frequently, and so I would naively expect the impact to be at the lower end of the reported scale for those environments. TCP-based systems,

Re: [OMPI users] Q: Binding to cores on AWS?

2017-12-22 Thread r...@open-mpi.org
> has a big impact. > > Thanks again, and merry Christmas! > - Brian > > > On Fri, Dec 22, 2017 at 1:53 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> > mailto:r...@open-mpi.org>> wrote: > Actually, that message is telling you that binding to core is available,

Re: [OMPI users] Q: Binding to cores on AWS?

2017-12-22 Thread r...@open-mpi.org
Actually, that message is telling you that binding to core is available, but that we cannot bind memory to be local to that core. You can verify the binding pattern by adding --report-bindings to your cmd line. > On Dec 22, 2017, at 11:58 AM, Brian Dobbins wrote: > > > Hi all, > > We're t
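
For example (process count and binary are placeholders):

    # print where each rank was bound before the application starts
    mpirun --report-bindings --bind-to core -np 4 ./a.out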

Re: [OMPI users] openmpi-v3.0.0: error for --map-by

2017-12-20 Thread r...@open-mpi.org
t; an internal error - the locale of the following process was > not set by the mapper code: > ... > > > Kind regards > > Siegmar > > > On 12/20/17 09:22, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote: >> I just checked the head of both the master and 3.0.x bran

Re: [OMPI users] openmpi-v3.0.0: error for --map-by

2017-12-20 Thread r...@open-mpi.org
I just checked the head of both the master and 3.0.x branches, and they both work fine: $ mpirun --map-by ppr:1:socket:pe=1 date [rhc001:139231] SETTING BINDING TO CORE [rhc002.cluster:203672] SETTING BINDING TO CORE Wed Dec 20 00:20:55 PST 2017 Wed Dec 20 00:20:55 PST 2017 Tue Dec 19 18:37:03 P

Re: [OMPI users] OpenMPI & Slurm: mpiexec/mpirun vs. srun

2017-12-19 Thread r...@open-mpi.org
eriences are welcome. Also, if > anyone is interested in the tmpdir spank plugin, you can contact me. We are > happy to share. > > Best and Merry Christmas to all, > > Charlie Taylor > UF Research Computing > > > >> On Dec 18, 2017, at 8:12 PM, r...@open-m

Re: [OMPI users] OpenMPI & Slurm: mpiexec/mpirun vs. srun

2017-12-18 Thread r...@open-mpi.org
We have had reports of applications running faster when executing under OMPI’s mpiexec versus when started by srun. Reasons aren’t entirely clear, but are likely related to differences in mapping/binding options (OMPI provides a very large range compared to srun) and optimization flags provided

Re: [OMPI users] OMPI 3.0.0 crashing at mpi_init on OS X using Fortran

2017-12-11 Thread r...@open-mpi.org
FWIW: I just cloned the v3.0.x branch to get the latest 3.0.1 release candidate, built and ran it on Mac OSX High Sierra. Everything built and ran fine for both C and Fortran codes. You might want to test the same - could be this was already fixed. > On Dec 11, 2017, at 12:43 PM, Ricardo Parrei

Re: [OMPI users] OMPI 2.1.2 and SLURM compatibility

2017-12-11 Thread r...@open-mpi.org
MIx. > > $ gcc -I/tmp/build/openmpi-3.0.0/opal/mca/pmix/pmix2x/pmix/include \ > pmix-test.c > pmix-test.c:95:2: error: #error "not version 3" > #error "not version 3" > ^ > > But the config.log generated when using the internal version of PMIx >

Re: [OMPI users] OMPI 2.1.2 and SLURM compatibility

2017-11-29 Thread r...@open-mpi.org
no event_base set. > [warn] opal_libevent2022_event_active: event has no event_base set. > slurmstepd: error: *** STEP 116.0 ON bn1 CANCELLED AT 2017-11-29T08:42:54 *** > srun: Job step aborted: Waiting up to 32 seconds for job step to finish. > slurmstepd: error: *** JOB 11

Re: [OMPI users] signal handling with mpirun

2017-11-21 Thread r...@open-mpi.org
Try upgrading to the v3.0, or at least to the latest in the v2.x series. The v1.10 series is legacy and no longer maintained. > On Nov 21, 2017, at 8:20 AM, Kulshrestha, Vipul > wrote: > > Hi, > > I am finding that on Ctrl-C, mpirun immediately stops and does not send > SIGTERM to the chil

Re: [OMPI users] --map-by

2017-11-21 Thread r...@open-mpi.org
ern within the current context. > On Nov 21, 2017, at 5:34 AM, Noam Bernstein > wrote: > >> >> On Nov 20, 2017, at 7:02 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> >> wrote: >> >> So there are two options here that will work and hopefully provi

Re: [OMPI users] --map-by

2017-11-20 Thread r...@open-mpi.org
> On Nov 16, 2017, at 7:08 AM, Noam Bernstein > wrote: > > >> On Nov 16, 2017, at 9:49 AM, r...@open-mpi.org <mailto:r...@open-mpi.org> >> wrote: >> >> Do not include the “bind-to core” option.the mapping directive already >> forces that > &

Re: [OMPI users] OMPI 2.1.2 and SLURM compatibility

2017-11-16 Thread r...@open-mpi.org
What Charles said was true but not quite complete. We still support the older PMI libraries but you likely have to point us to wherever slurm put them. However, we definitely recommend using PMIx as you will get a faster launch. Sent from my iPad > On Nov 16, 2017, at 9:11 AM, Bennet Fauber wro

Re: [OMPI users] --map-by

2017-11-16 Thread r...@open-mpi.org
Do not include the “bind-to core” option. The mapping directive already forces that. Sent from my iPad > On Nov 16, 2017, at 7:44 AM, Noam Bernstein > wrote: > > Hi all - I’m trying to run mixed MPI/OpenMP, so I ideally want binding of > each MPI process to a small set of cores (to allow for

Re: [OMPI users] Can't connect using MPI Ports

2017-11-09 Thread r...@open-mpi.org
I did a quick check across the v2.1 and v3.0 OMPI releases and both failed, though with different signatures. Looks like a problem in the OMPI dynamics integration (i.e., the PMIx library looked like it was doing the right things). I’d suggest filing an issue on the OMPI github site so someone c

Re: [OMPI users] OpenMPI 1.10.x handling of simultaneous MPI_Abort calls

2017-11-08 Thread r...@open-mpi.org
a failure to > exit mpirun around 25-30% of the time with 2 processes, causing an > inconsistent hang in both my example program and my larger application. > > -Nik > > On Nov 8, 2017 11:40, "r...@open-mpi.org <mailto:r...@open-mpi.org>" > mailto:r...@open-m

Re: [OMPI users] OpenMPI 1.10.x handling of simultaneous MPI_Abort calls

2017-11-08 Thread r...@open-mpi.org
Nik > > 2017-11-07 19:00 GMT-07:00 r...@open-mpi.org <mailto:r...@open-mpi.org> > mailto:r...@open-mpi.org>>: > Glad to hear it has already been fixed :-) > > Thanks! > >> On Nov 7, 2017, at 4:13 PM, Tru Huynh > <mailto:t...@pasteur.fr>> wrote:

Re: [OMPI users] Can't connect using MPI Ports

2017-11-06 Thread r...@open-mpi.org
> On Nov 6, 2017, at 7:46 AM, Florian Lindner wrote: > > Am 05.11.2017 um 20:57 schrieb r...@open-mpi.org: >> >>> On Nov 5, 2017, at 6:48 AM, Florian Lindner >> <mailto:mailingli...@xgm.de>> wrote: >>> >>> Am 04.11.2017 um 00:05 schrie

Re: [OMPI users] Can't connect using MPI Ports

2017-11-05 Thread r...@open-mpi.org
> On Nov 5, 2017, at 6:48 AM, Florian Lindner wrote: > > Am 04.11.2017 um 00:05 schrieb r...@open-mpi.org <mailto:r...@open-mpi.org>: >> Yeah, there isn’t any way that is going to work in the 2.x series. I’m not >> sure it was ever fixed, but you might try the

Re: [OMPI users] Can't connect using MPI Ports

2017-11-03 Thread r...@open-mpi.org
ote: > > > Am 03.11.2017 um 16:18 schrieb r...@open-mpi.org: >> What version of OMPI are you using? > > 2.1.1 @ Arch Linux. > > Best, > Florian > ___ > users mailing list > users@lists.open-mpi.org > http

Re: [OMPI users] Can't connect using MPI Ports

2017-11-03 Thread r...@open-mpi.org
What version of OMPI are you using? > On Nov 3, 2017, at 7:48 AM, Florian Lindner wrote: > > Hello, > > I'm working on a sample program to connect two MPI communicators launched > with mpirun using Ports. > > Firstly, I use MPI_Open_port to obtain a name and write that to a file: > > if (op

Re: [OMPI users] [OMPI devel] Open MPI 2.0.4rc2 available for testing

2017-11-02 Thread r...@open-mpi.org
I would suggest also considering simply updating to v3.0, or at least to v2.1. I’d rather not play “whack-a-mole” with the Sun compiler again :-( > On Nov 2, 2017, at 6:06 AM, Howard Pritchard wrote: > > HI Siegmar, > > Could you check if you also see a similar problem with OMPI master when y

Re: [OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-27 Thread r...@open-mpi.org
Two questions: 1. are you running this on node04? Or do you have ssh access to node04? 2. I note you are building this against an old version of PMIx for some reason. Does it work okay if you build it with the embedded PMIx (which is 2.0)? Does it work okay if you use PMIx v1.2.4, the latest re

Re: [OMPI users] IpV6 Openmpi mpirun failed

2017-10-19 Thread r...@open-mpi.org
:52 PM, Mukkie <mailto:mukunthh...@gmail.com>> wrote: > Thanks for your suggestion. However my firewall's are already disabled on > both the machines. > > Cordially, > Muku. > > On Wed, Oct 18, 2017 at 2:38 PM, r...@open-mpi.org <mailto:r...@open-mpi.org&g

Re: [OMPI users] IpV6 Openmpi mpirun failed

2017-10-18 Thread r...@open-mpi.org
Looks like there is a firewall or something blocking communication between those nodes? > On Oct 18, 2017, at 1:29 PM, Mukkie wrote: > > Adding a verbose output. Please check for failed and advise. Thank you. > > [mselvam@ipv-rhel73 examples]$ mpirun -hostfile host --mca oob_base_verbose > 10

Re: [OMPI users] Failed to register memory (openmpi 2.0.2)

2017-10-18 Thread r...@open-mpi.org
Put “oob=tcp” in your default MCA param file > On Oct 18, 2017, at 9:00 AM, Mark Dixon wrote: > > Hi, > > We're intermittently seeing messages (below) about failing to register memory > with openmpi 2.0.2 on centos7 / Mellanox FDR Connect-X 3 and the vanilla IB > stack as shipped by centos. >
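
That is, roughly (per-user file shown; a system-wide etc/openmpi-mca-params.conf under the install prefix works as well):

    # force the out-of-band channel onto TCP
    echo "oob = tcp" >> $HOME/.openmpi/mca-params.conf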

Re: [OMPI users] Controlling spawned process

2017-10-06 Thread r...@open-mpi.org
Couple of things you can try: * add --oversubscribe to your mpirun cmd line so it doesn’t care how many slots there are * modify your MPI_INFO to be “host”, “node0:22” so it thinks there are more slots available. It’s possible that the “host” info processing has a bug in it, but this will tell

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-06 Thread r...@open-mpi.org
No problem - glad you were able to work it out! > On Oct 5, 2017, at 11:22 PM, Anthony Thyssen > wrote: > > Sorry r...@open-mpi.org <mailto:r...@open-mpi.org> as Gilles Gouaillardet > pointed out to me the problem wasn't OpenMPI, but with the specific EPEL &g

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-03 Thread r...@open-mpi.org
Can you try a newer version of OMPI, say the 3.0.0 release? Just curious to know if we perhaps “fixed” something relevant. > On Oct 3, 2017, at 5:33 PM, Anthony Thyssen wrote: > > FYI... > > The problem is discussed further in > > Redhat Bugzilla: Bug 1321154 - numa enabled torque don't wor

Re: [OMPI users] OMPI users] upgraded mpi and R and now cannot find slots

2017-10-03 Thread r...@open-mpi.org
You can add it to the default MCA param file, if you want - /etc/openmpi-mca-params.conf > On Oct 3, 2017, at 12:44 PM, Jim Maas wrote: > > Thanks RHC where do I put that so it will be in the environment? > > J > > On 3 October 2017 at 16:01, r...@open-mpi.org <

Re: [OMPI users] OMPI users] upgraded mpi and R and now cannot find slots

2017-10-03 Thread r...@open-mpi.org
As Gilles said, we default to slots = cores, not HTs. If you want to treat HTs as independent cpus, then you need to add OMPI_MCA_hwloc_base_use_hwthreads_as_cpus=1 in your environment. > On Oct 3, 2017, at 7:27 AM, Jim Maas wrote: > > Tried this and got this error, and slots are available, no
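
For example (process count and binary are placeholders):

    # count hardware threads, not cores, as schedulable slots
    export OMPI_MCA_hwloc_base_use_hwthreads_as_cpus=1
    mpirun -np 8 ./a.out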

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-02 Thread r...@open-mpi.org
One thing I can see is that the local host (where mpirun executed) shows as “node21” in the allocation, while all others show their FQDN. This might be causing some confusion. You might try adding "--mca orte_keep_fqdn_hostnames 1” to your cmd line and see if that helps. > On Oct 2, 2017, at
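
That is (process count and binary are placeholders):

    # keep fully-qualified hostnames so "node21" and its FQDN match up
    mpirun --mca orte_keep_fqdn_hostnames 1 -np 16 ./a.out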

Re: [OMPI users] OpenMPI 3.0.0, compilation using Intel icc 11.1 on Linux, error when compiling pmix_mmap

2017-10-02 Thread r...@open-mpi.org
e > declared in , which is definitely #included by > opal/threads/condition.h. > > Since this error occurred with Intel 11.x but didn't occur with later > versions of the Intel compiler, I'm wondering if the Intel 11.x compiler > suite didn't support (struct t

Re: [OMPI users] OpenMPI 3.0.0, compilation using Intel icc 11.1 on Linux, error when compiling pmix_mmap

2017-10-01 Thread r...@open-mpi.org
Afraid I’m rather stumped on this one. There is no such include file in pmix_mmap, nor is there any include file that might lead to it. You might try starting again from scratch to ensure you aren’t getting some weird artifact. > On Sep 29, 2017, at 1:12 PM, Ted Sussman wrote: > > Hello all,

Re: [OMPI users] Fwd: OpenMPI does not obey hostfile

2017-09-26 Thread r...@open-mpi.org
That is correct. If you don’t specify a slot count, we auto-discover the number of cores on each node and set #slots to that number. If an RM is involved, then we use what they give us. Sent from my iPad > On Sep 26, 2017, at 8:11 PM, Anthony Thyssen > wrote: > > > I have been having problem

Re: [OMPI users] Fwd: Make All error regarding either "Conflicting" or "Previous Declaration" among others

2017-09-19 Thread r...@open-mpi.org
Err...you might want to ask the MPICH folks. This is the Open MPI mailing list :-) > On Sep 19, 2017, at 7:38 AM, Aragorn Inocencio > wrote: > > Good evening, > > Thank you for taking the time to develop and assist in the use of this tool. > > I am trying to install the latest mpich-3.2 vers

Re: [OMPI users] Honor host_aliases file for tight SGE integration

2017-09-15 Thread r...@open-mpi.org
Hi Reuti As far as I am concerned, you SGE users “own” the SGE support - so feel free to submit a patch! Ralph > On Sep 13, 2017, at 9:10 AM, Reuti wrote: > > Hi, > > I wonder whether it came ever to the discussion, that SGE can have a similar > behavior like Torque/PBS regarding the mangli

Re: [OMPI users] OpenMPI 1.10.5 oversubscribing cores

2017-09-08 Thread r...@open-mpi.org
What you probably want to do is add --cpu-list a,b,c... to each mpirun command, where each one lists the cores you want to assign to that job. > On Sep 8, 2017, at 6:46 AM, twu...@goodyear.com wrote: > > > I posted this question last year and we ended up not upgrading to the newer > openmpi.
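
For instance, to pin two concurrent jobs to disjoint cores (core numbers and binaries are placeholders):

    # first job on cores 0-3, second job on cores 4-7
    mpirun --cpu-list 0,1,2,3 -np 4 ./job1 &
    mpirun --cpu-list 4,5,6,7 -np 4 ./job2 &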

Re: [OMPI users] Slot count parameter in hostfile ignored

2017-09-08 Thread r...@open-mpi.org
>> Cheers, >>> >>> >>> Gilles >>> >>> >>> On 9/8/2017 4:19 PM, Maksym Planeta wrote: >>>> Indeed mpirun shows slots=1 per node, but I create allocation with >>>> --ntasks-per-node 24, so I do have all cores of the

Re: [OMPI users] Slot count parameter in hostfile ignored

2017-09-07 Thread r...@open-mpi.org
My best guess is that SLURM has only allocated 2 slots, and we respect the RM regardless of what you say in the hostfile. You can check this by adding --display-allocation to your cmd line. You probably need to tell slurm to allocate more cpus/node. > On Sep 7, 2017, at 3:33 AM, Maksym Planeta
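
For example (process count and binary are placeholders):

    # show the slots mpirun believes the resource manager handed it
    mpirun --display-allocation -np 4 ./a.out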

Re: [OMPI users] Setting LD_LIBRARY_PATH for orted

2017-08-22 Thread r...@open-mpi.org
I’m afraid not - that only applies the variable to the application, not the daemons. Truly, your only real option is to put something in your .bashrc since you cannot modify the configure. Or, if you are running in a managed environment, you can ask to have your resource manager forward your e

Re: [OMPI users] mpirun 2.1.1 refuses to start a Torque 6.1.1.1 job if I change the scheduler to Maui 3.3.1

2017-08-09 Thread r...@open-mpi.org
sounds to me like your maui scheduler didn’t provide any allocated slots on the nodes - did you check $PBS_NODEFILE? > On Aug 9, 2017, at 12:41 PM, A M wrote: > > > Hello, > > I have just ran into a strange issue with "mpirun". Here is what happened: > > I successfully installed Torque 6.1.1
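
From inside the job script, a quick check would be:

    # inspect what Torque/Maui actually allocated
    echo $PBS_NODEFILE
    cat $PBS_NODEFILE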

Re: [OMPI users] error building openmpi-v2.* with SUN C 5.15 on SuSE Linux

2017-08-08 Thread r...@open-mpi.org
Should be fixed for 2.x here: https://github.com/open-mpi/ompi/pull/4054 > On Jul 31, 2017, at 5:56 AM, Siegmar Gross > wrote: > > Hi, > > I've been able to install openmpi-v2.0.x-201707270322-239c439 and > openmpi-v2.x-201707271804-3b1e9fe on my

Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread r...@open-mpi.org
?? Doesn't that tell pbs to allocate 1 node with 2 slots on it? I don't see where you get 4. Sent from my iPad > On Jul 31, 2017, at 10:00 AM, Mahmood Naderan wrote: > > OK. The next question is how to use it with torque (PBS)? currently we write > this directive > > Nodes=1:ppn=2 > > which

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread r...@open-mpi.org
s [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Reuti > Sent: Wednesday, July 26, 2017 9:25 AM > To: Open MPI Users > Subject: Re: [OMPI users] Questions about integration with resource > distribution systems > > >> Am 26.07.2017 um 15:09 schrieb r...@open-mpi.or

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread r...@open-mpi.org
, because I don’t understand how mpirun gets access to information > about RAM requirement. > > qsub –pe orte 8 –b y –V –l m_mem_free=40G –cwd mpirun –np 8 a.out > > > Regards, > Vipul > > >   <> > From: users [mailto:users-boun...@lists.open-mpi.or

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-25 Thread r...@open-mpi.org
> On Jul 25, 2017, at 3:48 PM, Kulshrestha, Vipul > wrote: > > I have several questions about integration of openmpi with resource queuing > systems. > > 1. > I understand that openmpi supports integration with various resource > distribution systems such as SGE, LSF, torque etc. > > I ne

Re: [OMPI users] Issue handling SIGUSR1 in OpenMPI

2017-07-25 Thread r...@open-mpi.org
ignal); > >printf("My pid is: %d\n", getpid()); > > for (;;) { > printf("\nSleeping for 10 seconds\n"); > sleep(10); > > MPI_Finalize(); > } > > When I run with 3 processes using mpirun -np 3 ./test, I expect the statement

Re: [OMPI users] Issue handling SIGUSR1 in OpenMPI

2017-07-25 Thread r...@open-mpi.org
I’m afraid we don’t currently support that use-case. We forward signals sent by the user to mpiexec (i.e., the user “hits” mpiexec with a signal), but we don’t do anything to support an application proc attempting to raise a signal and asking it to be propagated. If you are using OMPI master, o

Re: [OMPI users] OpenMPI 2.1.1, --map-to socket, application context files

2017-06-30 Thread r...@open-mpi.org
y parameters work with > the app context files. > > I tried an app context file of the format > > > -np 1 afftest01.exe; -np 1 afftest01.exe > > but it didn't work. Only rank 0 was created. Is there a different syntax that > will work? > > Sincerely, &g

Re: [OMPI users] OpenMPI 2.1.1, --map-to socket, application context files

2017-06-30 Thread r...@open-mpi.org
iles. > > How can I do this? > > Sincerely, > > Ted Sussman > > On 29 Jun 2017 at 19:09, r...@open-mpi.org wrote: > >> >> It’s a difficult call to make as to which is the correct behavior. In >> Example 1, you are executing a >> single app_c

Re: [OMPI users] OpenMPI 2.1.1, --map-to socket, application context files

2017-06-29 Thread r...@open-mpi.org
It’s a difficult call to make as to which is the correct behavior. In Example 1, you are executing a single app_context that has two procs in it. In Example 2, you are executing two app_contexts, each with a single proc in it. Now some people say that the two should be treated the same, with the

Re: [OMPI users] Node failure handling

2017-06-27 Thread r...@open-mpi.org
Okay, this should fix it - https://github.com/open-mpi/ompi/pull/3771 <https://github.com/open-mpi/ompi/pull/3771> > On Jun 27, 2017, at 6:31 AM, r...@open-mpi.org wrote: > > Actually, the error message is coming from mpirun to indicate that it lost > connection to one (or mo

Re: [OMPI users] Node failure handling

2017-06-27 Thread r...@open-mpi.org
r. I was not aware that this > capability exists in the master version of ORTE, but if it does then it makes > our life easier. > > George. > > > On Tue, Jun 27, 2017 at 6:14 AM, r...@open-mpi.org <mailto:r...@open-mpi.org> > mailto:r...@open-mpi.org>> wr

Re: [OMPI users] Node failure handling

2017-06-26 Thread r...@open-mpi.org
--------- > [bud96:20652] [[8878,0],0] orted_cmd: received halt_vm cmd > [bud96:20652] [[8878,0],0] orted_cmd: all routes and children gone - exiting > ``` > > On 27 June 2017 at 12:19, r...@open-mpi.org wrote: >> Ah - you should hav

Re: [OMPI users] Node failure handling

2017-06-26 Thread r...@open-mpi.org
e recent work > on ompi master. Even though the mpiruns will all be associated to the > same ompi-server, do you think this could be sufficient to isolate the > failures? > > Cheers, > Tim > > > > On 10 June 2017 at 00:56, r...@open-mpi.org wrote: >> It has b

Re: [OMPI users] waiting for message either from MPI communicator or from TCP socket

2017-06-25 Thread r...@open-mpi.org
I suspect nobody is answering because the question makes no sense to us. You are implying that the TCP socket is outside of MPI since it isn’t tied to a communicator. If you want to setup a non-MPI communication path between two procs and monitor it separate from the MPI library, you can certain
