[OMPI users] Question on using Github to see bugs fixed in past versions

2016-10-04 Thread Blosch, Edwin L
Apologies for the dumb question... There used to be a way to dive in to see exactly what bugs and features came into 1.10.4, 1.10.3, and on back to 1.8.8. Is there a way to do that on github? Ed

Re: [OMPI users] EXTERNAL: Re: Question on run-time error "ORTE was unable to reliably start"

2016-08-11 Thread Blosch, Edwin L
a similar error? Does the application call comm_spawn, for example? Or is it a script that eventually attempts to launch another job? > On Jul 28, 2016, at 6:24 PM, Blosch, Edwin L wrote: > > Cray CS400, RedHat 6.5, PBS Pro (but OpenMPI is built --without-tm), > OpenMPI 1.8

Re: [OMPI users] EXTERNAL: Re: Question on run-time error "ORTE was unable to reliably start"

2016-07-28 Thread Blosch, Edwin L
] Question on run-time error "ORTE was unable to reliably start" What kind of system was this on? ssh, slurm, ...? > On Jul 28, 2016, at 1:55 PM, Blosch, Edwin L wrote: > > I am running cases that are starting just fine and running for a few hours, > then they die with a mes

[OMPI users] Question on run-time error "ORTE was unable to reliably start"

2016-07-28 Thread Blosch, Edwin L
I am running cases that are starting just fine and running for a few hours, then they die with a message that seems like a startup type of failure. Message shown below. The message appears in standard output from rank 0 process. I'm assuming there is a failing card or port or something. What

[OMPI users] Question on OpenMPI backwards compatibility

2016-02-26 Thread Blosch, Edwin L
I am confused about backwards-compatibility. FAQ #111 says: Open MPI reserves the right to break ABI compatibility at new feature release series. MPI applications compiled/linked against Open MPI 1.6.x will not be ABI compatible with Open MPI 1.7.x. But the versioning documentation says:

Re: [OMPI users] How can I discover valid values for MCA parameters?

2015-05-29 Thread Blosch, Edwin L
place and installed in another, even after I set OPAL_PREFIX to reflect the installed location. From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Blosch, Edwin L Sent: Friday, May 29, 2015 11:06 AM To: Open MPI Users (us...@open-mpi.org) Subject: EXTERNAL: [OMPI users] How can I discover

[OMPI users] How can I discover valid values for MCA parameters?

2015-05-29 Thread Blosch, Edwin L
Sometimes I want to use one of the option flags; for example, today it is mtl_mxm_verbose. How do I discover the valid possible values of various MCA parameters? I've tried ompi_info --all but it does not show the possible values, only the current value. I've tried ompi_info --param all b

Re: [OMPI users] EXTERNAL: Re: Question on mapping processes to hosts file

2014-11-11 Thread Blosch, Edwin L
Nov 11, 2014, at 6:11 AM, Blosch, Edwin L wrote: OK, that’s what I was suspecting. It’s a bug, right? I asked for 4 processes and I supplied a host file with 4 lines in it, and mpirun didn’t launch the processes where I told it to launch them. Actual

Re: [OMPI users] EXTERNAL: Re: Question on mapping processes to hosts file

2014-11-11 Thread Blosch, Edwin L
file to override the default behavior On Nov 7, 2014, at 8:52 AM, Blosch, Edwin L wrote: Here’s my command: /bin/mpirun --machinefile hosts.dat -np 4 Here’s my hosts.dat file: % cat hosts.dat node01 node02 node03 node04 All 4 ranks are launched on

[OMPI users] Question on mapping processes to hosts file

2014-11-07 Thread Blosch, Edwin L
Here's my command: /bin/mpirun --machinefile hosts.dat -np 4 Here's my hosts.dat file: % cat hosts.dat node01 node02 node03 node04 All 4 ranks are launched on node01. I don't believe I've ever seen this before. I had to do a sanity check, so I tried MVAPICH2-2.1a and got what I expected:

Re: [OMPI users] EXTERNAL: Re: Application hangs in 1.8.1 related to collective operations

2014-09-28 Thread Blosch, Edwin L
post the output when you run with mpirun --mca coll_base_verbose 10 "other mpirun args you've been using" that would be great Also, if you know the sizes (number of elements) involved in the reduce and allreduce operations it would be helpful to know this as well. Thanks, H

[OMPI users] Application hangs in 1.8.1 related to collective operations

2014-09-25 Thread Blosch, Edwin L
I had an application suddenly stop making progress. By killing the last process out of 208 processes, then looking at the stack trace, I found 3 of 208 processes were in an MPI_REDUCE call. The other 205 had progressed in their execution to another routine, where they were waiting in an unrela

[OMPI users] Question on process and memory affinity with 1.8.1

2014-07-21 Thread Blosch, Edwin L
In making the leap from 1.6 to 1.8, how can I check whether or not process/memory affinity is supported? I've built OpenMPI on a system where the numactl-devel package was not installed, and another where it was, but I can't see anything in the output of ompi_info that suggests any difference b

Re: [OMPI users] Problem building OpenMPI 1.8 on RHEL6

2014-04-07 Thread Blosch, Edwin L
Sent: Tuesday, April 01, 2014 11:20 AM To: Open MPI Users Subject: Re: [OMPI users] Problem building OpenMPI 1.8 on RHEL6 On Apr 1, 2014, at 10:26 AM, "Blosch, Edwin L" wrote: > I am getting some errors building 1.8 on RHEL6. I tried autoreconf as > suggested, but it failed f

Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

2014-04-07 Thread Blosch, Edwin L
k the remote shell. On Apr 7, 2014, at 1:53 PM, Blosch, Edwin L wrote: > Thanks Noam, that makes sense. > > Yes, I did mean to do ". hello" (with space in between). That was an attempt > to replicate whatever OpenMPI is doing. > > In the first post I mentioned that

Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

2014-04-07 Thread Blosch, Edwin L
m: users [mailto:users-boun...@open-mpi.org] On Behalf Of Noam Bernstein Sent: Monday, April 07, 2014 3:41 PM To: Open MPI Users Subject: Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh On Apr 7, 2014, at 4:36 PM, Blosch, Edwin L wrote: > I

Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

2014-04-07 Thread Blosch, Edwin L
22:04, Blosch, Edwin L wrote: > I am submitting a job for execution under SGE. My default shell is /bin/csh. Where - in SGE or on the interactive command line you get? > The script that is submitted has #!/bin/bash at the top. The script runs on > the 1st node allocated to the

Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

2014-04-07 Thread Blosch, Edwin L
shell when launching jobs with OpenMPI 1.6.5 rsh Looks to me like the problem is here: /bin/.: Permission denied. Appears you don't have permission to exec bash?? On Apr 7, 2014, at 1:04 PM, Blosch, Edwin L wrote: I am submitting a job for execution u

[OMPI users] Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

2014-04-07 Thread Blosch, Edwin L
I am submitting a job for execution under SGE. My default shell is /bin/csh. The script that is submitted has #!/bin/bash at the top. The script runs on the 1st node allocated to the job. The script runs a Python wrapper that ultimately issues the following mpirun command: /apps/local/test/

[OMPI users] Problem building OpenMPI 1.8 on RHEL6

2014-04-01 Thread Blosch, Edwin L
I am getting some errors building 1.8 on RHEL6. I tried autoreconf as suggested, but it failed for the same reason. Is there a minimum version of m4 required that is newer than that provided by RHEL6? Thanks aclocal.m4:16: warning: this file was generated for autoconf 2.69. You have another v

[OMPI users] Questions on MPI I/O and ompi_info

2014-02-13 Thread Blosch, Edwin L
Why does ompi_info -c say "MPI I/O Support: yes" even though I configured using -disable-io-romio? If ompi_info is going to tell me MPI I/O is supported, then shouldn't I expect my test program (attached) to work correctly? (it doesn't). I didn't disable "built-in" mpi-io, only io-romio. --

Re: [OMPI users] EXTERNAL: Re: What's the status of OpenMPI and thread safety?

2013-12-19 Thread Blosch, Edwin L
CP BTL), not so good for others (e.g., openib is flat-out not thread safe). On Dec 18, 2013, at 3:57 PM, Blosch, Edwin L wrote: I was wondering if the FAQ entry below is considered current opinion or perhaps a little stale. Is multi-threading still c

[OMPI users] What's the status of OpenMPI and thread safety?

2013-12-18 Thread Blosch, Edwin L
I was wondering if the FAQ entry below is considered current opinion or perhaps a little stale. Is multi-threading still considered to be 'lightly tested'? Are there known open bugs? Thank you, Ed 7. Is Open MPI thread safe? Support for MPI_THREAD_MULTIPLE (i.e., multiple threads executing

Re: [OMPI users] EXTERNAL: Re: Application hangs on mpi_waitall

2013-06-27 Thread Blosch, Edwin L
r mpirun command. If this allows your application to run to completion then we know exactly where to start looking. George. On Jun 27, 2013, at 19:59 , "Blosch, Edwin L" wrote: The debug version also hung, roughly the same amount of progre

Re: [OMPI users] Application hangs on mpi_waitall

2013-06-27 Thread Blosch, Edwin L
[mailto:users-boun...@open-mpi.org] On Behalf Of Blosch, Edwin L Sent: Thursday, June 27, 2013 12:48 PM To: Open MPI Users Subject: EXTERNAL: Re: [OMPI users] Application hangs on mpi_waitall Attached is the message list for rank 0 for the communication step that is failing. There are about 160

Re: [OMPI users] EXTERNAL: Re: Application hangs on mpi_waitall

2013-06-27 Thread Blosch, Edwin L
The debug version also hung, roughly the same amount of progress in the computations (although of course it took much longer to make that progress in comparison to the optimized version). On the bright side, the idea of putting an mpi_barrier after the irecvs and before the isends appears to ha

Re: [OMPI users] Application hangs on mpi_waitall

2013-06-27 Thread Blosch, Edwin L
Attached is the message list for rank 0 for the communication step that is failing. There are about 160 isends and irecvs. The ‘message size’ is actually a number of cells. On some steps only one 8-byte word per cell is communicated, at another step we exchange 7 words, and another step we ex

[OMPI users] Application hangs on mpi_waitall

2013-06-18 Thread Blosch, Edwin L
I'm running OpenMPI 1.6.4 and seeing a problem where mpi_waitall never returns. The case runs fine with MVAPICH. The logic associated with the communications has been extensively debugged in the past; we don't think it has errors. Each process posts non-blocking receives, non-blocking sends,

Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

2013-06-12 Thread Blosch, Edwin L
'd have to >> leave it to Mellanox to advise. >> >> >> On Jun 11, 2013, at 6:55 AM, "Blosch, Edwin L" >> wrote: >> >>> I tried adding "-mca btl openib,sm,self" but it did not make any >

Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

2013-06-11 Thread Blosch, Edwin L
't using IB for some reason when extended to the other nodes. What does your cmd line look like? Have you tried adding "-mca btl openib,sm,self" just to ensure it doesn't use TCP for some reason? On Jun 9, 2013, at 4:31 PM, "Blosch, Edwin L"

Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

2013-06-09 Thread Blosch, Edwin L
, just to be sure - when you run 320 "cores", you are running across 20 nodes? Just want to ensure we are using "core" the same way - some people confuse cores with hyperthreads. On Jun 9, 2013, at 3:50 PM, "Blosch, Edwin L" wro

Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

2013-06-09 Thread Blosch, Edwin L
okay thru 160, and then things fall apart after that point. How many cores are on a node? On Jun 9, 2013, at 1:59 PM, "Blosch, Edwin L" wrote: I'm having some trouble getting good scaling with OpenMPI 1.6.4 and I don't know whe

[OMPI users] Need advice on performance problem

2013-06-09 Thread Blosch, Edwin L
I'm having some trouble getting good scaling with OpenMPI 1.6.4 and I don't know where to start looking. This is an Infiniband FDR network with Sandy Bridge nodes. I am using affinity (--bind-to-core) but no other options. As the number of cores goes up, the message sizes are typically going do

Re: [OMPI users] Sandy Bridge performance question

2013-06-07 Thread Blosch, Edwin L
f processes is a power of two. You'll see that n8 > is faster than n7, so this is likely the situation. > > > On Jun 6, 2013, at 4:10 PM, "Blosch, Edwin L" wrote: > >> I am running single-node Sandy Bridge cases with OpenMPI and looking at >> scaling. >

[OMPI users] Sandy Bridge performance question

2013-06-06 Thread Blosch, Edwin L
I am running single-node Sandy Bridge cases with OpenMPI and looking at scaling. I'm using -bind-to-core without any other options (default is -bycore I believe). These numbers indicate number of cores first, then the second digit is the run number (except for n=1, all runs repeated 3 times).

Re: [OMPI users] How to diagnose bus error with 1.6.4

2013-06-05 Thread Blosch, Edwin L
n11 ~]$ From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Blosch, Edwin L Sent: Wednesday, June 05, 2013 11:14 AM To: Open MPI Users (us...@open-mpi.org) Subject: EXTERNAL: [OMPI users] How to diagnose bus error with 1.6.4 I am running into a bus error that doe

[OMPI users] How to diagnose bus error with 1.6.4

2013-06-05 Thread Blosch, Edwin L
I am running into a bus error that does not happen with MVAPICH, and I am guessing it has something to do with shared-memory communication. Has anyone had a similar experience or have any insights on what this could be? Thanks [k1n08:12688] mca: base: components_open: Looking for shmem compone

Re: [OMPI users] Problem building OpenMPI 1.6.4 with PGI 13.4

2013-05-29 Thread Blosch, Edwin L
com] Sent: Wednesday, May 29, 2013 3:31 PM To: Open MPI Users Subject: EXTERNAL: Re: [OMPI users] Problem building OpenMPI 1.6.4 with PGI 13.4 Edwin -- Can you ask PGI support about this? I swear that the PGI compiler suite has supported offsetof before. On May 29, 2013, at 5:26 PM, "

Re: [OMPI users] EXTERNAL: Re: Problem building OpenMPI 1.6.4 with PGI 13.4

2013-05-29 Thread Blosch, Edwin L
ff Squyres (jsquyres) wrote: > Edwin -- > > Can you ask PGI support about this? I swear that the PGI compiler suite has > supported offsetof before. > > > On May 29, 2013, at 5:26 PM, "Blosch, Edwin L" > wrote: > > > I'm having trouble building OpenMPI

[OMPI users] Problem building OpenMPI 1.6.4 with PGI 13.4

2013-05-29 Thread Blosch, Edwin L
I'm having trouble building OpenMPI 1.6.4 with PGI 13.4. Suggestions? checking alignment of double... 8 checking alignment of long double... 8 checking alignment of float _Complex... 4 checking alignment of double _Complex... 8 checking alignment of long double _Complex... 8 checking alignment of

[OMPI users] Question on building OpenMPI with support for memory affinity

2013-05-29 Thread Blosch, Edwin L
The FAQ talks about building support for memory affinity by adding -with-libnuma= However, I did not do that, and yet when I check ompi_info, it looks like there is support from the hwloc module. Can I assume the FAQ is a little stale and that -with-libnuma is not really necessary anymore? [b

Re: [OMPI users] EXTERNAL: Re: basic questions about compiling OpenMPI

2013-05-23 Thread Blosch, Edwin L
From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Tim Prince [n...@aol.com] Sent: Wednesday, May 22, 2013 10:24 AM To: us...@open-mpi.org Subject: EXTERNAL: Re: [OMPI users] basic questions about compiling OpenMPI On 5/22/2013 11:34 AM, Paul Kapinos wro

[OMPI users] basic questions about compiling OpenMPI

2013-05-22 Thread Blosch, Edwin L
Apologies for not exploring the FAQ first. If I want to use Intel or PGI compilers but link against the OpenMPI that ships with RedHat Enterprise Linux 6 (compiled with g++ I presume), are there any issues to watch out for, during linking? Thanks, Ed

Re: [OMPI users] EXTERNAL: Re: Problems with shared libraries while launching jobs

2012-12-18 Thread Blosch, Edwin L
users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Reuti Sent: Tuesday, December 18, 2012 4:14 AM To: Open MPI Users Subject: Re: [OMPI users] EXTERNAL: Re: Problems with shared libraries while launching jobs On 17.12.2012 at 16:42, Blosch, Edwin L wrote: >

Re: [OMPI users] EXTERNAL: Re: Problems with shared libraries while launching jobs

2012-12-17 Thread Blosch, Edwin L
t: EXTERNAL: Re: [OMPI users] Problems with shared libraries while launching jobs Add -mca plm_base_verbose 5 --leave-session-attached to the cmd line - that will show the ssh command being used to start each orted. On Dec 14, 2012, at 12:17 PM, "Blosch, Edwin L"

[OMPI users] Problems with shared libraries while launching jobs

2012-12-14 Thread Blosch, Edwin L
I am having a weird problem launching cases with OpenMPI 1.4.3. It is most likely a problem with a particular node of our cluster, as the jobs will run fine on some submissions, but not other submissions. It seems to depend on the node list. I just am having trouble diagnosing which node, and

Re: [OMPI users] EXTERNAL: Re: Best way to map MPI processes to sockets?

2012-11-08 Thread Blosch, Edwin L
ion. If so, you could do: mpirun -npersocket 2 -bind-to-socket ... That would put two processes in each socket, bind them to that socket, and rank them in series. So ranks 0-1 would be bound to the first socket, ranks 2-3 to the second. Ralph On Thu, Nov 8, 2012 at 6:52 AM, Blosc

Re: [OMPI users] EXTERNAL: Re: How is hwloc used by OpenMPI

2012-11-08 Thread Blosch, Edwin L
Thanks, I definitely appreciate the new hotness of hwloc. I just couldn't tell from the documentation or the web page how or if it was being used by OpenMPI. I still work with OpenMPI 1.4.x and now that I've looked into the builds, I think I understand that PLPA is used in 1.4 and hwloc is br

Re: [OMPI users] EXTERNAL: Re: Best way to map MPI processes to sockets?

2012-11-08 Thread Blosch, Edwin L
Yes it is a Westmere system. Socket L#0 (P#0 CPUModel="Intel(R) Xeon(R) CPU E7- 8870 @ 2.40GHz" CPUType=x86_64) L3Cache L#0 (size=30720KB linesize=64 ways=24) L2Cache L#0 (size=256KB linesize=64 ways=8) L1dCache L#0 (size=32KB linesize=64 ways=8) L1iCache L#0

Re: [OMPI users] Best way to map MPI processes to sockets?

2012-11-07 Thread Blosch, Edwin L
>>> In your desired ordering you have rank 0 on (socket,core) (0,0) and >>> rank 1 on (0,2). Is there an architectural reason for that? Meaning >>> are cores 0 and 1 hardware threads in the same core, or is there a >>> cache level (say L2 or L3) connecting cores 0 and 1 separate from >>> cores

[OMPI users] How is hwloc used by OpenMPI

2012-11-07 Thread Blosch, Edwin L
I see hwloc is a subproject hosted under OpenMPI but, in reading the documentation, I was unable to figure out if hwloc is a module within OpenMPI, or if some of the code base is borrowed into OpenMPI, or something else. Is hwloc used by OpenMPI internally? Is it a layer above libnuma? Or is

[OMPI users] Best way to map MPI processes to sockets?

2012-11-07 Thread Blosch, Edwin L
I am trying to map MPI processes to sockets in a somewhat compacted pattern and I am wondering the best way to do it. Say there are 2 sockets (0 and 1) and each processor has 4 cores (0,1,2,3) and I have 4 MPI processes, each of which will use 2 OpenMP processes. I've re-ordered my parallel wor

[OMPI users] Question on shmem MCA parameter

2012-11-07 Thread Blosch, Edwin L
I am using this parameter "shmem_mmap_relocate_backing_file" and noticed that the relocation variable is identified as "shmem_mmap_opal_shmem_mmap_backing_file_base_dir" in its documentation, but then the next parameter that appears from ompi_info is spelled differently, namely "shmem_mmap_back

[OMPI users] Trouble with PSM "Could not detect network connectivity"

2012-11-02 Thread Blosch, Edwin L
I am getting a problem where something called "PSM" is failing to start and that in turn is preventing my job from running. Command and output are below. I would like to understand what's going on. Apparently this version of OpenMPI decided to build itself with support for PSM, but if it's no

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-07 Thread Blosch, Edwin L
users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage On 11/05/2011 09:11 AM, Blosch, Edwin L wrote: .. > > I know where you're coming from, and I probably didn't title the post > correctly because I wasn't sure what to ask. But I definitely saw it,

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-04 Thread Blosch, Edwin L
Thanks, Ralph, > Having a local /tmp is typically required by Linux for proper operation as > the OS itself needs to ensure its usage is protected, as was > previously > stated and is reiterated in numerous books on managing Linux systems. There is a /tmp, but it's not local. I don't know if

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-04 Thread Blosch, Edwin L
conclusion of each batch job, an epilogue > process runs that removes all files belonging to the owner of the > current batch job from /tmp (and also looks for and kills orphan > processes belonging to the user). This epilogue had to be written > by our systems staff. > > I believe th

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-03 Thread Blosch, Edwin L
ilto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain Sent: Thursday, November 03, 2011 5:22 PM To: Open MPI Users Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage On Nov 3, 2011, at 2:55 PM, Blosch, Edwin L wrote: > I might be missing someth

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-03 Thread Blosch, Edwin L
as missing "btl".) On 11/3/2011 11:19 AM, Blosch, Edwin L wrote: > I don't tell OpenMPI what BTLs to use. The default uses sm and puts a session > file on /tmp, which is NFS-mounted and thus not a good choice. > > Are you suggesting something like --mca ^sm? > >

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-03 Thread Blosch, Edwin L
I don't tell OpenMPI what BTLs to use. The default uses sm and puts a session file on /tmp, which is NFS-mounted and thus not a good choice. Are you suggesting something like --mca ^sm? -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of

Re: [OMPI users] EXTERNAL: Re: Shared-memory problems

2011-11-03 Thread Blosch, Edwin L
ns. > If you create temporary files using mktemp is it being created in > /dev/shm or /tmp? > > > On Thu, Nov 3, 2011 at 11:50 AM, Bogdan Costescu wrote: >> On Thu, Nov 3, 2011 at 15:54, Blosch, Edwin L >> wrote: >>> -/dev/shm is 12 GB and has 755 permis

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-03 Thread Blosch, Edwin L
de /tmp for OpenMPI usage On Nov 1, 2011, at 7:31 PM, Blosch, Edwin L wrote: > I'm getting this message below which is observing correctly that /tmp is > NFS-mounted. But there is no other directory which has user or group write > permissions. So I think I'm kind of st

Re: [OMPI users] EXTERNAL: Re: Shared-memory problems

2011-11-03 Thread Blosch, Edwin L
and /dev/shm is (always) local, /dev/shm seems to be the right place for shared memory transactions. If you create temporary files using mktemp is it being created in /dev/shm or /tmp? On Thu, Nov 3, 2011 at 11:50 AM, Bogdan Costescu wrote: > On Thu, Nov 3, 2011 at 15:54, Blosch, Edwin L wr

[OMPI users] Shared-memory problems

2011-11-03 Thread Blosch, Edwin L
Can anyone guess what the problem is here? I was under the impression that OpenMPI (1.4.4) would look for /tmp and would create its shared-memory backing file there, i.e. if you don't set orte_tmpdir_base to anything. Well, there IS a /tmp and yet it appears that OpenMPI has chosen to use /dev

[OMPI users] How to set up state-less node /tmp for OpenMPI usage

2011-11-01 Thread Blosch, Edwin L
I'm getting this message below which is observing correctly that /tmp is NFS-mounted. But there is no other directory which has user or group write permissions. So I think I'm kind of stuck, and it sounds like a serious issue. Before I ask the administrators to change their image, i.e. mount

[OMPI users] Performance slowdown for large cases

2011-10-07 Thread Blosch, Edwin L
All, I'm using OpenMPI 1.4.3 and have been running a particular case on 120, 240, 480 and 960 processes. My time-per-work metric reports 60, 30, 15, 15. If I do the same run with MVAPICH 1.2, I get 60, 30, 15, 8. There is something running very slowly with OpenMPI 1.4.3 as the process count g

Re: [OMPI users] EXTERNAL: Re: Unresolved reference 'mbind' and 'get_mempolicy'

2011-09-29 Thread Blosch, Edwin L
eep using them. Thanks again, Ed -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Blosch, Edwin L Sent: Wednesday, September 28, 2011 4:02 PM To: Open MPI Users Subject: Re: [OMPI users] EXTERNAL: Re: Unresolved reference 'mbind' and

Re: [OMPI users] EXTERNAL: Re: Unresolved reference 'mbind' and 'get_mempolicy'

2011-09-28 Thread Blosch, Edwin L
you have libnuma installed? If so, do you have the .h and .so files? Do you have the .a file? Can you send the last few lines of output from a failed "make V=1" in that tree? (it'll show us the exact commands used to compile/link, etc.) On Sep 28, 2011, at 11:55 AM, Blosch,

[OMPI users] Unresolved reference 'mbind' and 'get_mempolicy'

2011-09-28 Thread Blosch, Edwin L
I am getting some undefined references in building OpenMPI 1.5.4 and I would like to know how to work around it. The errors look like this: /scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o): In function `hwloc_linux_alloc_membind': topology-linux.c:(.text+0x1da): und

Re: [OMPI users] EXTERNAL: Re: Trouble compiling 1.4.3 with PGI 10.9 compilers

2011-09-27 Thread Blosch, Edwin L
onday, September 26, 2011 6:16 PM To: Open MPI Users Subject: Re: [OMPI users] EXTERNAL: Re: Trouble compiling 1.4.3 with PGI 10.9 compilers On Sep 26, 2011, at 6:53 PM, Blosch, Edwin L wrote: > Actually I can download OpenMPI 1.5.4, 1.4.4rc3 or 1.4.3 - and ALL of them > build just fine. >

Re: [OMPI users] EXTERNAL: Re: Trouble compiling 1.4.3 with PGI 10.9 compilers

2011-09-26 Thread Blosch, Edwin L
we fixed some libtool issues in the 1.4.4 tarball; could you try the 1.4.4rc and see if that fixes the issue? If not, we might have missed some patches to bring over to the v1.4 branch. http://www.open-mpi.org/software/ompi/v1.4/ On Sep 20, 2011, at 1:16 PM, Blosch, Edwin L wrot

Re: [OMPI users] Question about compilng with fPIC

2011-09-21 Thread Blosch, Edwin L
irecv, mpi_isend, mpi_waitall; perhaps there is something unhealthy in the semantics there. -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Blosch, Edwin L Sent: Wednesday, September 21, 2011 10:44 AM To: Open MPI Users Subject: EXTERNAL: [OMPI

Re: [OMPI users] EXTERNAL: Re: Question about compilng with fPIC

2011-09-21 Thread Blosch, Edwin L
-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Tim Prince Sent: Wednesday, September 21, 2011 10:53 AM To: us...@open-mpi.org Subject: EXTERNAL: Re: [OMPI users] Question about compilng with fPIC On 9/21/2011 11:44 AM, Blosch, Edwin L wrote: > Follow-up to a mislabeled thread: "How co

[OMPI users] Question about compilng with fPIC

2011-09-21 Thread Blosch, Edwin L
.@open-mpi.org] On Behalf Of Blosch, Edwin L Sent: Tuesday, September 20, 2011 11:46 AM To: Open MPI Users Subject: Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results? Thank you for this explanation. I will assume that my problem here is some

Re: [OMPI users] Trouble compiling 1.4.3 with PGI 10.9 compilers

2011-09-20 Thread Blosch, Edwin L
check. -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Blosch, Edwin L Sent: Tuesday, September 20, 2011 12:17 PM To: Open MPI Users Subject: EXTERNAL: [OMPI users] Trouble compiling 1.4.3 with PGI 10.9 compilers I'm having troubl

[OMPI users] Trouble compiling 1.4.3 with PGI 10.9 compilers

2011-09-20 Thread Blosch, Edwin L
I'm having trouble building 1.4.3 using PGI 10.9. I searched the list archives briefly but I didn't stumble across anything that looked like the same problem, so I thought I'd ask if an expert might recognize the nature of the problem here. The configure command: ./configure --prefix=/release

Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Blosch, Edwin L
(or whatever) or are you confirming that the back-end compiler is seeing the same flags? The MPI compiler wrapper (mpicc, et al.) can add flags. E.g., as I remember it, "mpicc" with no flags means no optimization with OMPI but with optimization

Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Blosch, Edwin L
Subject: Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results? On 9/20/2011 10:50 AM, Blosch, Edwin L wrote: > It appears to be a side effect of linkage that is able to change a > compute-only routine's answers. > > I have assumed that max/sqrt/

Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Blosch, Edwin L
nce: > On 9/20/2011 7:25 AM, Reuti wrote: >> Hi, >> >> On 20.09.2011 at 00:41, Blosch, Edwin L wrote: >> >>> I am observing differences in floating-point results from an application >>> program that appear to be related to whether I link with OpenMPI 1.4

[OMPI users] How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-19 Thread Blosch, Edwin L
I am observing differences in floating-point results from an application program that appear to be related to whether I link with OpenMPI 1.4.3 or MVAPICH 1.2.0. Both packages were built with the same installation of Intel 11.1, as well as the application program; identical flags passed to the

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-15 Thread Blosch, Edwin L
Sent: Thursday, September 15, 2011 4:37 AM To: Open MPI Users Subject: Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun? On 15.09.2011 at 01:15, Blosch, Edwin L wrote: > I would appreciate trying to fix the multi-word argument to > orte_launch_agent,

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Blosch, Edwin L
know if you want me to pursue this. Ralph On Sep 14, 2011, at 3:31 PM, Blosch, Edwin L wrote: Thank you - I did pursue this kind of workaround, and it worked, but you'll be happy to know that nothing had to be owned by root. ASIDE Just to remind: The job script is a shell scr

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Blosch, Edwin L
ep 14, 2011 at 12:56 PM, Reuti wrote: On 14.09.2011 at 19:02, Blosch, Edwin L wrote: > Thanks for trying. > > Do you feel that this is an impossible request without the assistance of some > process running as root, for example, as Reuti mentioned

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Blosch, Edwin L
To: Open MPI Users > Subject: Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes > created by mpirun? > > > On Sep 14, 2011, at 9:39 AM, Blosch, Edwin L wrote: > >> Thanks, Ralph, >> >> I get the failure messages, unfortunately: >>

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Blosch, Edwin L
ilto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain Sent: Wednesday, September 14, 2011 11:33 AM To: Open MPI Users Subject: Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun? On Sep 14, 2011, at 9:39 AM, Blosch, Edwin L wrote: > Thanks, Ralph, >

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Blosch, Edwin L
Thanks, Ralph, I get the failure messages, unfortunately: setgid FAILED setgid FAILED setgid FAILED I actually had attempted to call setgid from within the application previously, which looks similar to what you've done, but it failed. That was when I initiated the post to the mailing list. My

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Blosch, Edwin L
oun...@open-mpi.org] On Behalf Of Reuti Sent: Tuesday, September 13, 2011 5:36 PM To: Open MPI Users Subject: Re: [OMPI users] EXTERNAL: Re: Problem running under SGE Am 14.09.2011 um 00:25 schrieb Blosch, Edwin L: > Your comment guided me in the right direction, Reuti. And overlapped with >

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Blosch, Edwin L
Sent: Tuesday, September 13, 2011 4:27 PM To: Open MPI Users Subject: EXTERNAL: Re: [OMPI users] Problem running under SGE Am 13.09.2011 um 23:18 schrieb Blosch, Edwin L: > I'm able to run this command below from an interactive shell window: > > /bin/mpirun --machinefile mpiho

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Blosch, Edwin L
Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Reuti Sent: Tuesday, September 13, 2011 4:27 PM To: Open MPI Users Subject: EXTERNAL: Re: [OMPI users] Problem running under SGE Am 13.09.2011 um 23:18 schrieb Blosch, Edwin L: > I'm able to

[OMPI users] Problem running under SGE

2011-09-13 Thread Blosch, Edwin L
I'm able to run this command below from an interactive shell window: /bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent /usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup but it does not work if I put it into a shell script and 'qsub' that script to SGE. I get the message shown at th

Re: [OMPI users] EXTERNAL: Re: Question on using rsh

2011-09-13 Thread Blosch, Edwin L
sue might be, but I would check for a typo - we don't check that mca params are spelled correctly, nor do we check for params that don't exist (e.g., because you spelled it wrong). On Sep 12, 2011, at 3:03 PM, Blosch, Edwin L wrote: I have a hello world program that runs without pro

Re: [OMPI users] EXTERNAL: Re: qp memory allocation problem

2011-09-12 Thread Blosch, Edwin L
, 2011 12:05 PM To: Open MPI Users Subject: Re: [OMPI users] EXTERNAL: Re: qp memory allocation problem On Mon, 12 Sep 2011, Blosch, Edwin L wrote: > Nathan, I found these parameters under /sys/module/mlx4_core/parameters. > How do you incorporate a changed value? What to rest

[OMPI users] Question on using rsh

2011-09-12 Thread Blosch, Edwin L
I have a hello world program that runs without prompting for a password with plm_rsh_agent but not with orte_rsh_agent; with the latter it runs, but only after prompting for a password: /bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent /usr/bin/rsh ./test_setup Hello from process

Re: [OMPI users] qp memory allocation problem

2011-09-12 Thread Blosch, Edwin L
utierrez Los Alamos National Laboratory On Sep 12, 2011, at 9:23 AM, Blosch, Edwin L wrote: I am getting this error message below and I don't know what it means or how to fix it. It only happens when I run on a large number of processes, e.g. 960. Things work fine on 480, and I don

Re: [OMPI users] EXTERNAL: Re: qp memory allocation problem

2011-09-12 Thread Blosch, Edwin L
ue pairs by > default. Do they buy us anything? For what it is worth, we have stopped > using them on all of our large systems here at LANL. > > Thanks, > > Samuel K. Gutierrez > Los Alamos National Laboratory > > On Sep 12, 2011, at 9:23 AM, Blosch, Edwin L wrote: >

Re: [OMPI users] EXTERNAL: Re: qp memory allocation problem

2011-09-12 Thread Blosch, Edwin L
Do they buy us anything? For what it is worth, we have stopped using them on all of our large systems here at LANL. Thanks, Samuel K. Gutierrez Los Alamos National Laboratory On Sep 12, 2011, at 9:23 AM, Blosch, Edwin L wrote: I am getting this error message below and I don't know what

[OMPI users] qp memory allocation problem

2011-09-12 Thread Blosch, Edwin L
I am getting this error message below and I don't know what it means or how to fix it. It only happens when I run on a large number of processes, e.g. 960. Things work fine on 480, and I don't think the application has a bug. Any help is appreciated... [c1n01][[30697,1],3][connect/btl_openi

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-08 Thread Blosch, Edwin L
stain [mailto:r...@open-mpi.org] >> Sent: Wednesday, September 07, 2011 8:53 AM >> To: Open MPI Users >> Subject: Re: [OMPI users] Can you set the gid of the processes created by > mpirun? >> >> On Sep 7, 2011, at 7:38 AM, Blosch, Edwin L wrote: >> >>

Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Blosch, Edwin L
Can you set the gid of the processes created by mpirun? On Sep 7, 2011, at 7:38 AM, Blosch, Edwin L wrote: The mpirun command is invoked when the user's group is 'set group' to group 650. When the rank 0 process creates files, they have group ownership 650. But the user's

[OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Blosch, Edwin L
The mpirun command is invoked when the user's group is 'set group' to group 650. When the rank 0 process creates files, they have group ownership 650. But the user's login group is group 1040. The child processes that get started on other nodes run with group 1040, and the files they create ha
