Re: [OMPI users] Question on Mapping and Binding

2014-12-22 Thread Ralph Castain
FWIW: it looks like we are indeed binding to core if PE is set, so if you are seeing something different, then we may have a bug somewhere. If you add --report-bindings to your cmd line, you should see where we bound the procs - does that look correct? > On Dec 22, 2014, at 9:49 AM, Ra

Re: [OMPI users] Question on Mapping and Binding

2014-12-22 Thread Ralph Castain
just no MPI standard way of providing it to you. > > Thank you, > Saliya > > On Mon, Dec 22, 2014 at 1:18 PM, Ralph Castain wrote: > FWIW: it looks like we are indeed binding to core if PE is set, so if you are > seeing something

Re: [OMPI users] Question on Mapping and Binding

2014-12-22 Thread Ralph Castain
bout ways of getting the info, though they all involve a > collective operation. I’m working on an MPI extension for OMPI to access it > as each proc already has binding/location info for every proc in the job - > just no MPI standard way of providing it to you. > > >> >

Re: [OMPI users] processes hang with openmpi-dev-602-g82c02b4

2014-12-24 Thread Ralph Castain
I’d be a little cautious here - I’m not sure that hetero operations are completely fixed. The README is probably a bit over-stated (reflecting an earlier state), but I’m certain we haven’t extensively tested hetero operations and suspect there are still lingering issues. > On Dec 23, 2014, at

Re: [OMPI users] OMPI users] What could cause a segfault in OpenMPI?

2014-12-28 Thread Ralph Castain
Might also be worth checking to ensure that UD is enabled on your IB installation as we depend upon it for wireup of IB connections. > On Dec 28, 2014, at 12:18 AM, Gilles Gouaillardet > wrote: > > Where does the error occur ? > MPI_Init ? > MPI_Finalize ? > In between ? > > In the first ca

Re: [OMPI users] OMPI users] What could cause a segfault in OpenMPI?

2014-12-28 Thread Ralph Castain
Have the admin try running the ibv_ud_pingpong test - that will exercise the portion of the system under discussion. > On Dec 28, 2014, at 2:31 PM, Saliya Ekanayake wrote: > > What I heard from the administrator is that, > > "The tests that work are the simple utilities ib_read_lat and ib_re

Re: [OMPI users] OMPI users] What could cause a segfault in OpenMPI?

2014-12-28 Thread Ralph Castain
tting ulimit -l unlimited worked. > > [1] http://lists.openfabrics.org/pipermail/general/2007-June/036941.html > > Saliya > > On Sun, Dec 28, 2014 at 5:57 PM, Ralph Castain

Re: [OMPI users] Help about Open CA

2015-01-02 Thread Ralph Castain
I suggest you contact the appropriate mailing list - this is Open MPI, and we don’t have anything to do with OpenCA > On Jan 2, 2015, at 5:24 AM, Huynh Thuc Cuoc wrote: > > I have the problem with OpenCA installed on CentOS with msg " > OpenCA Error: Server is not online or does not accept requ

Re: [OMPI users] difference of behaviour for MPI_Publish_name between openmpi-1.4.5 and openmpi-1.8.4

2015-01-07 Thread Ralph Castain
Hmmm…I confess this API gets little, if any, testing as it is so seldom used, so it is quite possible that a buglet has crept into it. I’ll take a look and try to have something in 1.8.5. Thanks! Ralph > On Jan 7, 2015, at 3:14 AM, Bernard Secher wrote: > > Hello, > > With the version openmp

Re: [OMPI users] difference of behaviour for MPI_Publish_name between openmpi-1.4.5 and openmpi-1.8.4

2015-01-07 Thread Ralph Castain
PI_Info info; > MPI_Info_create(&info); > MPI_Info_set(info, "ompi_unique", "true"); > > and then invoke MPI_Publish_name() with info instead of MPI_INFO_NULL > > an updated version of the program > > Cheers, > > Gilles > > On 2015/01/08 10

Re: [OMPI users] difference of behaviour for MPI_Publish_name between openmpi-1.4.5 and openmpi-1.8.4

2015-01-07 Thread Ralph Castain
> ompi_unique) > > Cheers, > > Gilles > > commit 7d2e3028d608163247975397a09f30dbe7bd192a > Author: Ralph Castain > Date: Wed Aug 14 04:24:17 2013 + > >Add unique info_key to documentation > > On 2015/01/08 11:51, Ralph Castain wrote: >> Do

Re: [OMPI users] libevent hangs on app finalize stage

2015-01-15 Thread Ralph Castain
Given that you could only reproduce it with either your custom compiler or by forcibly introducing a delay, is this indicating an issue with the custom compiler? It does seem strange that we don't see this anywhere else, given the number of times that code gets run. Only alternative solution I

Re: [OMPI users] libevent hangs on app finalize stage

2015-01-15 Thread Ralph Castain
you are working, yes? If so, give the changed version a try and see if your problem is resolved. > On Jan 15, 2015, at 12:55 AM, Ralph Castain wrote: > > Given that you could only reproduce it with either your custom compiler or by > forcibly introducing a delay, is this indicat

Re: [OMPI users] libevent hangs on app finalize stage

2015-01-15 Thread Ralph Castain
t won't cause any > other troubles. > > I've tried to update my master branch to the latest version (including your > fix) but now it just crashes for me on *all* benchmarks that I am trying > (both with gcc and our compiler). > > On 15.01.2015 18:57, Ralph Casta

Re: [OMPI users] libevent hangs on app finalize stage

2015-01-15 Thread Ralph Castain
Ah, indeed - I found the problem. Fix coming momentarily > On Jan 15, 2015, at 10:31 AM, Ralph Castain wrote: > > Hmmm…I’m not seeing a failure. Let me try on another system. > > > Modifying libevent is not a viable solution :-( > > >> On Jan 15, 2015, at 10:2

Re: [OMPI users] libevent hangs on app finalize stage

2015-01-15 Thread Ralph Castain
Fixed - sorry about that! > On Jan 15, 2015, at 10:39 AM, Ralph Castain wrote: > > Ah, indeed - I found the problem. Fix coming momentarily > >> On Jan 15, 2015, at 10:31 AM, Ralph Castain wrote: >> >> Hmmm…I’m not seeing a failure. Let me try on anothe

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Ralph Castain
Theoretically, yes - see the ORCM project, which basically does what you ask. The launch system in there isn’t fully implemented yet, but the fundamental idea is valid and supports some range of capability. We used to have a cmd line option in ORTE for what you propose - it wouldn’t be too hard

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Ralph Castain
Sorry - should have included the link to the ORCM project: https://github.com/open-mpi/orcm/wiki > On Jan 21, 2015, at 8:16 AM, Ralph Castain wrote: > > Theoretically, yes - see the ORCM project, which basically does what you ask. >

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Ralph Castain
scription of my > use-case. > > Thanks! > > Mark > > >> On 21 Jan 2015, at 17:18 , Ralph Castain wrote: >> >> Sorry - should have included the link to the ORCM project: >> >> https://github.com/open-mpi/orcm/wiki >> >> >>> On

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Ralph Castain
ute a shell script that runs the tasks within that allocation - yes? > > The recent discussion we had on Spawn() on Cray's also originates here. > I want to free myself from having to use aprun for every task, and therefore > I am interested to see if ompi and/or orte can be the vehi

Re: [OMPI users] independent startup of orted and orterun

2015-01-21 Thread Ralph Castain
HTH Ralph > On Jan 21, 2015, at 12:52 PM, Mark Santcroos > wrote: > > Hi Ralph, > >> On 21 Jan 2015, at 21:20 , Ralph Castain wrote: >> >> Hi Mark >> >>> On Jan 21, 2015, at 11:21 AM, Mark Santcroos >>> wrote: >>> >&

Re: [OMPI users] 1.8.1 [SEC=UNCLASSIFIED]

2015-01-28 Thread Ralph Castain
I'm not entirely clear on the sequence of commands here. Is the user requesting a new allocation from maui/torque for each run? In this case, it's possible we aren't correctly picking up the external binding from Torque. This would likely be a bug we would need to fix. Or is the user obtaining a s

Re: [OMPI users] 1.8.1 query [SEC=UNCLASSIFIED]

2015-01-28 Thread Ralph Castain
econd 16 core job, > the cpu utilisation of each core of the first job immediately climbs back > to 100%. Any suggestions please, on where > I might start looking for the solution to this problem? > Greg Doherty > ANSTO

Re: [OMPI users] independent startup of orted and orterun

2015-02-01 Thread Ralph Castain
:07 PM, Mark Santcroos > wrote: > > Hi Ralph, > > All makes sense! Thanks a lot! > > Looking forward to your modifications. > Please don't hesitate to throw things with rough edges to me! > > Cheers, > > Mark > >> On 21 Jan 2015, at 23:

Re: [OMPI users] slurm openmpi 1.8.3 core bindings

2015-02-01 Thread Ralph Castain
Yeah, I don’t think that the slurm bindings will work for you. Problem is that the slurm directive gets applied to the launch of our daemon, not the application procs. So what you’ve done is bind our daemon to 3 cpus. This has nothing to do with the OMPI-Slurm integration - you told slurm to bin

Re: [OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-02-01 Thread Ralph Castain
Well, I can reproduce it - but I won’t have time to address it until I return later this week. Whether or not procs get spawned onto a remote host depends on the number of local slots. You asked for 8 processes, so if there are more than 8 slots on the node, then it will launch them all on the

Re: [OMPI users] use accept/connect to merge a new intra-comm

2015-02-01 Thread Ralph Castain
Which OMPI version? > On Jan 25, 2015, at 5:41 AM, haozi wrote: > > Hi guys. > > I am interested in an example from OpenMPI, as attachment: > singleton_client_server.c. > So, I wrote another example. And some error happened. > My example includes two servers and one client. > First, server1

Re: [OMPI users] independent startup of orted and orterun

2015-02-02 Thread Ralph Castain
/odls_base_default_fns.c at line 433 > > * It only seems to work for single nodes (probably related to the previous > point). > > > Is this all expected behaviour given the current implementation? > > > Thanks! > > Mark > > > > > On 02 Feb 2015, at

Re: [OMPI users] independent startup of orted and orterun

2015-02-02 Thread Ralph Castain
21:26 , Mark Santcroos > wrote: > > > > Ok, let me check on some other systems too though, it might be Cray > specific. > > > > > >> On 02 Feb 2015, at 19:07 , Ralph Castain wrote: > >> > >> Yikes - looks like a bug crept into there at the last m

Re: [OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-02-03 Thread Ralph Castain
BTW: I've confirmed this only happens if you provide the hostfile info key. A simple comm_spawn without the hostfile key works just fine. On Sun, Feb 1, 2015 at 8:53 PM, Ralph Castain wrote: > Well, I can reproduce it - but I won’t have time to address it until I > return late

Re: [OMPI users] use accept/connect to merge a new intra-comm

2015-02-03 Thread Ralph Castain
That's pretty ancient - could you try the nightly 1.8 tarball? On Mon, Feb 2, 2015 at 5:58 PM, haozi wrote: > mpiexec (OpenRTE) 1.4.3 > > > > > > > At 2015-02-02 12:54:11, "Ralph Castain" wrote: > > Which OMPI version? > > On Jan 25, 20

Re: [OMPI users] prob in running two mpi merged program

2015-02-03 Thread Ralph Castain
I'm afraid I don't quite understand what you are saying, so let's see if I can clarify. You have two fortran MPI programs. You start one using "mpiexec". You then start the other one as a singleton - i.e., you just run "myapp" without using mpiexec. The two apps are attempting to execute an MPI_Con

Re: [OMPI users] independent startup of orted and orterun

2015-02-03 Thread Ralph Castain
wrote: > On 03 Feb 2015, at 0:20 , Ralph Castain wrote: > > Okay, thanks - I'll get on it tonight. Looks like a fairly simple bug, > so hopefully I'll have it ironed out tonight. > > Sorry, I was not completely accurate. Let me be more specific: > > * The orte-subm

Re: [OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-02-03 Thread Ralph Castain
If you add the following to your environment, you should run on multiple nodes: OMPI_MCA_rmaps_base_mapping_policy=node OMPI_MCA_orte_default_hostfile= The first tells OMPI to map-by node. The second passes in your default hostfile so you don't need to specify it as an Info key. HTH Ralph On T

Re: [OMPI users] independent startup of orted and orterun

2015-02-03 Thread Ralph Castain
Hmmm…no, I wasn't seeing those warnings/errors, but I only ran one submit job. I'll investigate. On Tue, Feb 3, 2015 at 11:38 AM, Mark Santcroos wrote: > Hi Ralph, > > > On 03 Feb 2015, at 16:28 , Ralph Castain wrote: > > I think I fixed some of the handshake iss

Re: [OMPI users] independent startup of orted and orterun

2015-02-03 Thread Ralph Castain
ntegration with our own tool and the ORTE > abstraction maps like a charm! > ( > https://github.com/radical-cybertools/radical.pilot/commit/2d36e886081bf8531097edfc95ada1826257e460 > ) > > > On 03 Feb 2015, at 20:38 , Mark Santcroos > wrote: > > > > Hi Ralph, &

Re: [OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-02-03 Thread Ralph Castain
n fault. > > Evan > > On Tue, Feb 3, 2015 at 10:09 AM, Ralph Castain wrote: > >> If you add the following to your environment, you should run on multiple >> nodes: >> >> OMPI_MCA_rmaps_base_mapping_policy=node >> OMPI_MCA_orte_default_hostfile= >>

Re: [OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-02-03 Thread Ralph Castain
ument of MPI_Comm_spawn with > MPI_INFO_NULL. > > On Tue, Feb 3, 2015 at 5:54 PM, Ralph Castain wrote: > >> When running your comm_spawn code, did you remove the Info key code? You >> wouldn't need to provide a hostfile or hosts any more, which is why it >> should

Re: [OMPI users] use accept/connect to merge a new intra-comm

2015-02-03 Thread Ralph Castain
support it (I believe it might), but otherwise I'll make sure we do. Sorry about that - I'm afraid that's a use-case we've never seen before :-/ Ralph On Mon, Feb 2, 2015 at 9:21 PM, Ralph Castain wrote: > That's pretty ancient - could you try the nightly 1.8 t

Re: [OMPI users] independent startup of orted and orterun

2015-02-04 Thread Ralph Castain
Feb 2015, at 2:53 , Ralph Castain wrote: > > > > Appreciate your patience! I'm somewhat limited this week by being on > travel to our HQ, so I don't have access to my usual test cluster. I'll be > better situated to complete the implementation once I get home. >

Re: [OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-02-05 Thread Ralph Castain
alling the snapshot on the remote nodes though. Should I do that? It > looked to me like this error happened well before we got to a remote node, so > that's why I didn't. > > Your thoughts? > > Evan > > > > On Tue, Feb 3, 2015 at 7:40 PM, Ralph Castain

Re: [OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-02-06 Thread Ralph Castain
mentioned in the > pull request? Should it handle symlinks? Apologies if I'm misguided. > > Evan > > On Thu, Feb 5, 2015 at 9:51 AM, Ralph Castain wrote: > Okay, I tracked this down - thanks for your patience! I have a fix pending >

Re: [OMPI users] cross-compiling openmpi-1.8.4 with static linking

2015-02-08 Thread Ralph Castain
Well, the first thing is that there is no reason to cross compile in this arrangement. Your target and host are the same, and the configuration won’t do anything with it. Normally you would set host and target. However, like I said, in this case you are providing the same argument to both, and

Re: [OMPI users] Is there a way to define a dynamic installation path for OpenMPI?

2015-02-16 Thread Ralph Castain
Please see the following: http://www.open-mpi.org/faq/?category=building#installdirs > On Feb 16, 2015, at 10:38 AM, Mehmet Belgin > wrote: > > I am sure the subject line is confusing, so let me try to clarify. We > installed open

Re: [OMPI users] 1.8.1 query [SEC=UNCLASSIFIED]

2015-02-16 Thread Ralph Castain
Yo Greg Is this still an issue for you? If so, could you please provide me with the requested info? I'm getting ready to start the release cycle on 1.8.5, so now would be the time to address this > On Jan 28, 2015, at 1:55 PM, Ralph Castain wrote: > > Ah, indeed - sounds l

Re: [OMPI users] mpirun error on MAC OSX 10.6.8

2015-02-17 Thread Ralph Castain
OSX 10.6.8?? Are you sure? That is incredibly old - I haven’t seen such a system in quite some time. > On Feb 17, 2015, at 8:04 AM, Tarandeep Kalra wrote: > > Hello friends, > > I am using mpi for the first time on my MAC OSX (10.6.8). The MPI that I > installed is Open MPI. I have installe

Re: [OMPI users] mpirun error on MAC OSX 10.6.8

2015-02-17 Thread Ralph Castain
nk is old. > > Taran > > On Tue, Feb 17, 2015 at 11:21 AM, Ralph Castain wrote: > OSX 10.6.8?? Are you sure? That is incredibly old - I haven’t seen such a > system in quite some time. > > >> On Feb 17, 2015, at 8:04 AM, Tarand

Re: [OMPI users] Slave machine shutdown

2015-02-23 Thread Ralph Castain
I would have expected the job to automatically abort if any processes were located on the slave that shut down - that is the default behavior. > On Feb 23, 2015, at 8:07 AM, Aleix Gimeno Vives wrote: > > Dear Open MPI support team, > > I am running a program using 1 master machine and 4 slave

Re: [OMPI users] Slave machine shutdown

2015-02-23 Thread Ralph Castain
ain? (the job will take several days, so I'd > rather not run it again if possible). > > Regards, > > Aleix > > 2015-02-23 17:20 GMT+01:00 Ralph Castain: > I would have expected the job to automatically abort if any processes w

Re: [OMPI users] machinefile binding error

2015-02-24 Thread Ralph Castain
Ah, now that’s a “feature” :-) Seriously, it *is* actually a new feature of the 1.8 series. We now go out and actually sense the number of cores on the system and set the number of slots to that value unless you tell us otherwise. It was something people continually nagged us about, and so we m

Re: [OMPI users] machinefile binding error

2015-02-24 Thread Ralph Castain
made to bind to that would result in binding more > processes than cpus on a resource: > >Bind to: NONE >Node:tebow125 >#processes: 2 >#cpus: 1 > > You can override this protection by adding the "overload-allowed" > option to y

Re: [OMPI users] machinefile binding error

2015-02-25 Thread Ralph Castain
ts/users/2014/05/24467.php> > > Thanks for the help, > --Jack > > > From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain > Sent: Tuesday, February 24, 2015 3:24 PM > To: Open MPI Users > Subject: Re: [OMPI users] machinefile binding error

Re: [OMPI users] using MPI_Comm_spawn in OpenMPI 1.8.4 with SLURM

2015-02-26 Thread Ralph Castain
Pretty sure Slurm doesn’t support dynamic spawn - we don’t support it when direct launched using srun. It is only supported when launched by mpiexec. Might change in future releases, assuming Slurm has or adds the support. > On Feb 26, 2015, at 5:26 AM, Lev Givon wrote: > > I've been using Open

Re: [OMPI users] compiling OpenMPI 1.8.4 on system with multiarched SLURM libs (Ubuntu 15.04 prerelease)

2015-02-26 Thread Ralph Castain
Looks like we don’t have a separate pmi-libdir configure option, so it may not work. I can add one to the master and set to pull it across to 1.8.5. > On Feb 26, 2015, at 1:07 PM, Lev Givon wrote: > > I recently tried to build OpenMPI 1.8.4 on a daily release of what will > eventually become

Re: [OMPI users] LAM/MPI -> OpenMPI

2015-02-27 Thread Ralph Castain
> On Feb 27, 2015, at 6:42 AM, Sasso, John (GE Power & Water, Non-GE) > wrote: > > Unfortunately, we have a few apps which use LAM/MPI instead of OpenMPI (and > this is something I have NO control over). I have been making an effort to > try and convince those who handle such apps to move o

Re: [OMPI users] compiling OpenMPI 1.8.4 on system with multiarched SLURM libs (Ubuntu 15.04 prerelease)

2015-03-01 Thread Ralph Castain
http://www.open-mpi.org/nightly/v1.8/ > On Feb 26, 2015, at 1:19 PM, Lev Givon wrote: > > Received from Ralph Castain on Thu, Feb 26, 2015 at 04:14:05PM EST: >>> On Feb 26, 2015, at 1:07 PM, Lev Givon wrote: >>> >>> I recently tried to build Open

Re: [OMPI users] compiling OpenMPI 1.8.4 on system with multiarched SLURM libs (Ubuntu 15.04 prerelease)

2015-03-04 Thread Ralph Castain
Rats - the backport missed that part. I’ll fix it. Thanks! > On Mar 3, 2015, at 9:41 AM, Lev Givon wrote: > > Received from Ralph Castain on Sun, Mar 01, 2015 at 10:31:15AM EST: >>> On Feb 26, 2015, at 1:19 PM, Lev Givon wrote: >>> >>> Received from Ralph C

Re: [OMPI users] Strange rank 0 behavior on Mac OS

2015-03-04 Thread Ralph Castain
Have you tried just running the example codes we provide? If you run the ring_c example, with odd numbers, does it work? > On Mar 4, 2015, at 12:30 PM, Oliver wrote: > > hi all, > > I have openmpi 1.8.4 installed on a Mac laptop. > > When running a mpi job on localhost for testing, I've notic

Re: [OMPI users] Process Binding Warning

2015-03-12 Thread Ralph Castain
You are missing the numactl and numactl-devel packages on that compute node, and so we cannot bind the memory to the same location as your proc. As the warning indicates, it can impact performance but won't stop you from running > On Mar 12, 2015, at 12:51 PM, Saliya Ekanayake wrote: > > Hi,

Re: [OMPI users] problem with MPI_Comm_spawn in 1.6.5 but not 1.4.3 or 1.8.4

2015-03-12 Thread Ralph Castain
Indeed sounds like a bug in 1.6.5, but we no longer maintain that series. I'm afraid an upgrade to the 1.8 series is the only solution. > On Mar 12, 2015, at 6:11 PM, Chris Paciorek > wrote: > > I'm having an issue with MPI_Comm_spawn not starting workers on the > nodes provided via -machinef

Re: [OMPI users] Process Binding Warning

2015-03-13 Thread Ralph Castain
You shouldn’t have to do so. > On Mar 13, 2015, at 7:14 AM, Saliya Ekanayake wrote: > > Thanks Ralph. Do I need to specify where to find numactl-devel when compiling > OpenMPI? > > On Thu, Mar 12, 2015 at 7:17 PM, Ralph Castain wrote:

Re: [OMPI users] problem with MPI_Comm_spawn in 1.6.5 but not 1.4.3 or 1.8.4

2015-03-13 Thread Ralph Castain
Appreciate it - but as I said in my prior note, we aren’t maintaining 1.6 any more, so an upgrade to 1.8 (which worked, as you noted) is in order > On Mar 13, 2015, at 8:23 AM, Chris Paciorek > wrote: > > And the promised attachment. > > On Thu, Mar 12, 2015 at 6:11 PM, Chris Paciorek > mail

Re: [OMPI users] will openmpi with infiniband support fall back to ethernet if infiniband not available?

2015-03-16 Thread Ralph Castain
Should work just fine. If it finds IB libraries on the machines that don’t have IB hardware, you might see a warning that it couldn’t find an IB NIC (not sure, but I think it might). No configuration tweaks should be required. > On Mar 16, 2015, at 3:59 AM, Pablo Escobar Lopez > wrote: > >

Re: [OMPI users] compiling OpenMPI 1.8.4 on system with multiarched SLURM libs (Ubuntu 15.04 prerelease)

2015-03-16 Thread Ralph Castain
This is now in the nightly tarball - can you see if this meets the need? > On Mar 4, 2015, at 7:03 AM, Ralph Castain wrote: > > Rats - the backport missed that part. I’ll fix it. Thanks! > >> On Mar 3, 2015, at 9:41 AM, Lev Givon wrote: >> >> Received from Ralp

Re: [OMPI users] monitoring the status of processors

2015-03-17 Thread Ralph Castain
Not at the moment - at least, not integrated into OMPI at this time. We used to have sensors for such purposes in the OMPI code itself, but they weren’t used and so we removed them. The resource manager generally does keep track of such things - see for example ORCM: https://github.com/open-mp

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
Hey Mark Your original error flag indicates that you are picking up a connection from some proc built against a different OMPI installation. It’s a very low-level check that looks for matching version numbers. Not sure who is trying to connect, but that is the problem. Check your LD_LIBRARY_PAT

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
Yeah, I removed the need for this flag on Cray. As I said in my other note, this is a red-herring - the issue is in the mismatched libraries. > On Mar 25, 2015, at 8:36 AM, Mark Santcroos > wrote: > > >> On 25 Mar 2015, at 15:46 , Howard Pritchard wrote: >> turn off the disable getpwuid. >

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
space and PATH and LD_LIBRARY_PATH look good. > Any suggestion on how to get more relevant debugging info above the table? > > Thanks > > Mark > > >> On 25 Mar 2015, at 16:33 , Ralph Castain wrote: >> >> Hey Mark >> >> Your original error flag in

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
ng for connect ack from [[8913,0],1] > [nid25257:09350] [[8913,0],0] connect ack received from [[8913,0],1] > [nid25257:09350] [[8913,0],0] connect-ack version from [[8913,0],1] matches > ours > [nid25257:09350] [[8913,0],0] ORTE_ERROR_LOG: Authentication failed in file > ../../../..

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
OHO! You have munge running on the head node, but not on the backends! Okay, all you have to do is set the MCA param “sec” to “basic” in your environment, or add “-mca sec basic” on your cmd line > On Mar 25, 2015, at 8:53 AM, Mark Santcroos > wrote: > > nid25257:09727] sec: munge validate_c

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
It’s working just fine, Howard - we found the problem. > On Mar 25, 2015, at 9:12 AM, Howard Pritchard wrote: > > Mark, > > If you're wanting to use the orte-submit feature, you will need to get mpirun > working. > > Could you rerun using the mpirun launch method but with > > --mca oob_base_

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
ngs > just to get something to work. sounds like time for yet another cray > specific component. > > > > 2015-03-25 10:14 GMT-06:00 Ralph Castain: > It’s working just fine, Howard - we found the problem. > >> On Mar

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
becomes difficult to resolve authentications. Let me ponder a bit. We can resolve it easily enough, but I want to ensure we don’t do it by creating a security hole. > On Mar 25, 2015, at 9:25 AM, Mark Santcroos > wrote: > > >> On 25 Mar 2015, at 17:06 , Ralph Castain wrot

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
04, inblocks > ~35, outblocks ~58 > > >> On 25 Mar 2015, at 17:29 , Ralph Castain wrote: >> >> Yeah, what’s happening is that mpirun is picking one security mechanism for >> authenticating connections, but the backend daemons are picking another, and >> h

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
Much appreciated! Interesting problem/configuration :-) > On Mar 25, 2015, at 9:42 AM, Mark Santcroos > wrote: > > >> On 25 Mar 2015, at 17:39 , Ralph Castain wrote: >> Not surprising - I’m surprised to find munge on the mom’s node anyway given >> that you are

Re: [OMPI users] compiling OpenMPI 1.8.4 on system with multiarched SLURM libs (Ubuntu 15.04 prerelease)

2015-03-25 Thread Ralph Castain
Thanks for confirming it!! > On Mar 25, 2015, at 10:57 AM, Lev Givon wrote: > > Received from Ralph Castain on Wed, Mar 04, 2015 at 10:03:06AM EST: >>> On Mar 3, 2015, at 9:41 AM, Lev Givon wrote: >>> >>> Received from Ralph Castain on Sun, Mar 01, 201

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
, see if this fixes the problem. https://github.com/open-mpi/ompi/pull/497 > On Mar 25, 2015, at 9:43 AM, Ralph Castain wrote: > > Much appreciated! Interesting problem/configuration :-) > >> On Mar 25, 2015, at 9:42 AM, M

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
> On Mar 25, 2015, at 1:59 PM, Mark Santcroos > wrote: > > Hi Ralph, > >> On 25 Mar 2015, at 21:25 , Ralph Castain wrote: >> I think I have this resolved, >> though I still suspect there is something wrong on that system. You >> shouldn’t have

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
9:55:16 (1427331316) > TTL: 300 > CIPHER: aes128 (4) > MAC: sha1 (3) > ZIP: none (0) > UID: gouaillardet (1011) > GID: gouaillardet (1011) > LENGTH: 7 > > coucou > > On 2015/03

Re: [OMPI users] open mpi on blue waters

2015-03-26 Thread Ralph Castain
“mixed mode” setups. I’ll apply the PR. I’ve also thought of a way to resolve the reverse problem (where the connection initiator is in the higher security zone), but I’ll do that one tomorrow. HTH Ralph > On Mar 25, 2015, at 7:24 PM, Ralph Castain wrote: > > I’ve asked Mark to check

Re: [OMPI users] open mpi on blue waters

2015-03-26 Thread Ralph Castain
need to work in multiple security domains is going to exist into the future. HTH Ralph > > If I misunderstood the drift, please ignore ;-) > > Mark > > >> On 26 Mar 2015, at 5:38 , Gilles Gouaillardet >> wrote: >> >> On 2015/03/26 13:00, Ralph Castain w

Re: [OMPI users] open mpi on blue waters

2015-03-26 Thread Ralph Castain
> On Mar 25, 2015, at 9:38 PM, Gilles Gouaillardet > wrote: > > On 2015/03/26 13:00, Ralph Castain wrote: >> Well, I did some digging around, and this PR looks like the right solution. > ok then :-) > > following stuff is not directly related to ompi, but you migh

Re: [OMPI users] Errors on POWER8 Ubuntu 14.04u2

2015-03-26 Thread Ralph Castain
Could you please send us your configure line? > On Mar 26, 2015, at 4:47 PM, Hammond, Simon David (-EXP) > wrote: > > Hi everyone, > > We are trying to compile custom installs of OpenMPI 1.8.4 on our POWER8 > Ubuntu system. We can configure and build correctly but when running > ompi_info we

Re: [OMPI users] Connection problem on Linux cluster

2015-03-28 Thread Ralph Castain
You mentioned running this in a VM - is that IP address correct for getting across the VMs? > On Mar 28, 2015, at 8:38 AM, LOTFIFAR F. wrote: > > Hi , > > I am wondering how can I solve this problem. > System Spec: > 1- Linux cluster with two nodes (master and slave) with Ubuntu 12.04 LTS

Re: [OMPI users] Connection problem on Linux cluster

2015-03-28 Thread Ralph Castain
he jobs with both of them (both ip > addresses) but it makes no difference. > I have just installed openmpi 1.6.5 to see how this version works. In > this case I get nothing and I have to press Ctrl+C. No output or error is > shown. > > > From: users [users-boun...

Re: [OMPI users] Connection problem on Linux cluster

2015-03-28 Thread Ralph Castain
uerying component [rsh] > [fehg-node-7:02660] mca:base:select:( plm) Query of component [rsh] set > priority to 10 > [fehg-node-7:02660] mca:base:select:( plm) Selected component [rsh] > > and it freezes here. > > > Regards, > Karos > > From: users [use

Re: [OMPI users] Connection problem on Linux cluster

2015-03-28 Thread Ralph Castain
t; Subject: Re: [OMPI users] Connection problem on Linux cluster > > fehg_node_1 and fehg-node-7 are the same. it is just a typo. > > Correction: VM names are fehg-node-0 and fehg-node-7. > > > Regards, > > From: users [users-boun...@open-mpi.org <mailto:users-bou

Re: [OMPI users] Connection problem on Linux cluster

2015-03-28 Thread Ralph Castain
configured and sent it to us. > On Mar 28, 2015, at 1:36 PM, LOTFIFAR F. wrote: > > surprisingly, it is all that I get!! nothing else comes after. This is the > same for openmpi-1.6.5. > > > From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain > [r...@

Re: [OMPI users] Connection problem on Linux cluster

2015-03-28 Thread Ralph Castain
rsh (MCA v2.0, API v2.0, Component v1.6.5) >> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.6.5) >> MCA ess: env (MCA v2.0, API v2.0, Component v1.6.5) >> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.6.5) >>

Re: [OMPI users] Connection problem on Linux cluster

2015-03-29 Thread Ralph Castain
entities) *between* > your two VMs may be affecting firewalling policies. This is quite common in > cloud environments. > > - You can use any TCP ping-pong test to verify TCP connectivity between VMs > -- i.e., programs that use random TCP ports to communicate; not the "
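The snippet above suggests checking VM-to-VM connectivity with any TCP ping-pong test that uses random ports. A minimal sketch of such a test in Python follows; it is not an Open MPI tool, and the script name `pingpong.py`, the `serve`/`ping` subcommands, and all function names are illustrative assumptions. The listener binds to an OS-chosen (ephemeral) port, so a firewall that only allows ssh will typically make the `ping` side time out or be refused.

```python
import socket
import sys
import threading
import time


def serve(bind_host="0.0.0.0", port=0, ready=None):
    """Accept one connection on an (ephemeral, if port=0) TCP port and echo the payload back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((bind_host, port))
        srv.listen(1)
        chosen = srv.getsockname()[1]  # the port the OS actually picked
        if ready is not None:
            ready(chosen)  # hand the port to an in-process caller
        else:
            print(f"listening on port {chosen}", flush=True)
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024))  # echo: the "pong"
        return chosen


def ping(host, port, payload=b"ping", timeout=5.0):
    """Connect, send a small payload, wait for the echo; return (reply, round-trip seconds)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as c:
        c.sendall(payload)
        reply = c.recv(1024)
    return reply, time.perf_counter() - start


def local_check():
    """Loopback self-test: run serve() in a thread and ping it on 127.0.0.1."""
    box, ready = [], threading.Event()

    def on_ready(p):
        box.append(p)
        ready.set()

    t = threading.Thread(target=serve,
                         kwargs={"bind_host": "127.0.0.1", "ready": on_ready},
                         daemon=True)
    t.start()
    if not ready.wait(5):
        raise RuntimeError("server did not start")
    result = ping("127.0.0.1", box[0])
    t.join(5)
    return result


if __name__ == "__main__":
    if len(sys.argv) == 2 and sys.argv[1] == "serve":
        serve()
    elif len(sys.argv) == 4 and sys.argv[1] == "ping":
        reply, rtt = ping(sys.argv[2], int(sys.argv[3]))
        print(f"reply={reply!r} rtt={rtt * 1000:.2f} ms")
    else:
        print("usage: pingpong.py serve | pingpong.py ping HOST PORT")
```

Usage: run `python pingpong.py serve` on fehg-node-7, note the printed port, then run `python pingpong.py ping <node-7 address> <port>` from fehg-node-0 (and repeat in the other direction). A loopback success combined with a cross-VM failure points at the network path between the VMs rather than at Open MPI.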

Re: [OMPI users] Question about mpirun mca_oob_tcp_recv_handler error.

2016-05-10 Thread Ralph Castain
This usually indicates that the remote process is using a different OMPI version. You might check to ensure that the paths on the remote nodes are correct. On Tue, May 10, 2016 at 8:46 AM, lzfneu wrote: > Hi everyone, > > I have a problem to consult you, when I cd to the /examples folder > cont
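One way to act on the advice above (check that the remote nodes pick up the same OMPI installation) is to compare the version that a non-interactive ssh shell reports on each node with the local one. The sketch below is a hypothetical helper, not part of Open MPI; it assumes `mpirun --version` prints a line of the form `mpirun (Open MPI) X.Y.Z` and that passwordless ssh to each host is already set up.

```python
import re
import subprocess
import sys

# Assumed output format: "mpirun (Open MPI) 1.10.2"
VERSION_RE = re.compile(r"\(Open MPI\)\s+(\S+)")


def parse_ompi_version(text):
    """Return the version string from `mpirun --version` output, or None if absent."""
    m = VERSION_RE.search(text)
    return m.group(1) if m else None


def remote_version(host, timeout=10):
    """Ask `host` for its mpirun version through a non-interactive ssh shell,
    which is roughly the environment the ssh launcher starts remote daemons in."""
    out = subprocess.run(["ssh", host, "mpirun --version"],
                         capture_output=True, text=True, timeout=timeout)
    return parse_ompi_version(out.stdout + out.stderr)


def find_mismatches(local_version, remote_versions):
    """Hosts whose reported version differs from the local one (None = mpirun not found)."""
    return {h: v for h, v in remote_versions.items() if v != local_version}


if __name__ == "__main__" and len(sys.argv) > 1:
    local = subprocess.run(["mpirun", "--version"], capture_output=True, text=True)
    local_version = parse_ompi_version(local.stdout + local.stderr)
    remote = {h: remote_version(h) for h in sys.argv[1:]}
    print(f"local: {local_version}")
    for host, v in sorted(find_mismatches(local_version, remote).items()):
        print(f"MISMATCH {host}: {v}")
```

A `None` entry usually means `mpirun` is not on the PATH of a non-interactive shell on that node; running `ssh <host> which mpirun` the same way shows which installation, if any, is being found.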

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread Ralph Castain
This is a known problem - I committed the fix for PSM with a link down just today. > On May 11, 2016, at 7:34 PM, dpchoudh . wrote: > > Hello Gilles > > Thank you for your continued support. With your help, I have a better > understanding of what is happening. Here are the details. > > 1. Y

Re: [OMPI users] slot problem on "SUSE Linux, Enterprise Server 12 (x86_64)"

2016-05-14 Thread Ralph Castain
> On May 7, 2016, at 1:13 AM, Siegmar Gross > wrote: > > Hi, > > yesterday I installed openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux > Enterprise Server 12 (x86_64)" with Sun C 5.13 and gcc-5.3.0. The > following programs don't run anymore. > > > loki hello_2 112 ompi_info | grep -e "OPAL

Re: [OMPI users] slot problem on "SUSE Linux, Enterprise Server 12 (x86_64)"

2016-05-15 Thread Ralph Castain
:0-5 hello_2_slave_mpi > Process 1 of 4 running on loki > Process 2 of 4 running on loki > Process 0 of 4 running on loki > Process 3 of 4 running on loki > ... > > > Hopefully you know what happens and why it happens so that > you can fix the problem for openmpi-1.10.x and openm

Re: [OMPI users] slot problem on "SUSE Linux, Enterprise Server 12 (x86_64)"

2016-05-15 Thread Ralph Castain
at the slot-list is an MCA param and can have only one value. Probably something for the future. > On May 15, 2016, at 7:55 AM, Ralph Castain wrote: > > You are showing different cmd lines than last time :-) > > I’ll try to take a look as time permits > >> On May 15, 2

Re: [OMPI users] Question about mpirun mca_oob_tcp_recv_handler error.

2016-05-16 Thread Ralph Castain
> > On Tuesday, May 17, 2016, Dave Love wrote: > Ralph Castain writes: > > > This usually indicates that the remote process is using a different OMPI > > version. You might check to ensure that the paths on the remote nodes

Re: [OMPI users] ORTE has lost communication

2016-05-16 Thread Ralph Castain
We used to do so, but don’t currently support that model - folks are working on restoring it. No timetable, though I don’t think it will be too much longer before it is in master. Can’t say when it will hit a release. > On May 16, 2016, at 8:25 AM, Zabiziz Zaz wrote: > > Hi Llolsten, > the proble

Re: [OMPI users] ORTE has lost communication

2016-05-16 Thread Ralph Castain
I honestly have no idea… > On May 16, 2016, at 10:39 AM, Zabiziz Zaz wrote: > > Ok. > Could you please tell me the latest version that is supported? > > Regards, > Guilherme. > > On Mon, May 16, 2016 at 12:30 PM, Ralph Castain <mailto:r...@open-mpi.org>>

Re: [OMPI users] MPI_Finalize() small issue with mutex destruction

2016-05-19 Thread Ralph Castain
No issue at all - I’ll check the latest versions and ensure the problem is present in them. Out of curiosity - what version of OMPI are you describing? > On May 19, 2016, at 9:06 AM, Nicolas Joly wrote: > > > Hi, > > I just discovered a small issue with MPI_Finalize(). When sanity > checking

Re: [OMPI users] MPI_Finalize() small issue with mutex destruction

2016-05-19 Thread Ralph Castain
Here’s the 1.10 version of the PR: https://github.com/open-mpi/ompi-release/pull/1172 > On May 19, 2016, at 9:18 AM, Nicolas Joly wrote: > > On Thu, May 19, 2016 at 09:13:15AM -0700, Ralph Castain wrote: >> No issue at
