[OMPI users] Ubuntu and MPI

2015-11-19 Thread dave
Hello- I have an Ubuntu 12.04 distro, running on a 32-bit platform. I installed http://www.open-mpi.org/software/ompi/v1.10/downloads/openm . I have hello_c.c in the examples subdirectory. I installed the C compiler. When I run mpicc hello_c.c the screen dump shows: dave@ubuntu-desk:~/Deskt
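For reference, a minimal sketch of building and running that example, assuming a default installation with Open MPI's bin directory on PATH and its lib directory on LD_LIBRARY_PATH:

    cd examples                 # the examples subdirectory of the unpacked source tree
    mpicc hello_c.c -o hello_c  # compile with the Open MPI wrapper compiler
    mpirun -np 2 ./hello_c      # run two ranks on the local machine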

[O-MPI users] libtool error

2006-01-27 Thread Dave Hudak
4.4, XCode 2.2, plus assorted utilities installed from darwinports and fink. Regards, Dave Hudak dhudak-error.tgz Description: Binary data --- David E. Hudak, Ph.D. dhu...@osc.edu

Re: [OMPI users] OpenMPI-ROMIO-OrangeFS

2014-03-25 Thread Dave Love
Edgar Gabriel writes: > I am still looking into the PVFS2 with ROMIO problem with the 1.6 > series, where (as I mentioned yesterday) the problem I am having right > now is that the data is wrong. Not sure what causes it, but since I have > teach this afternoon again, it might be friday until I ca

Re: [OMPI users] OpenMPI-ROMIO-OrangeFS

2014-03-25 Thread Dave Love
Edgar Gabriel writes: > yes, the patch has been submitted to the 1.6 branch for review, not sure > what the precise status of it is. The problems found are more or less > independent of the PVFS2 version. Thanks; I should have looked in the tracker.

Re: [OMPI users] OpenMPI-ROMIO-OrangeFS

2014-03-27 Thread Dave Love
rom 1.7 looked similar to what's in mpich, but hard-wired rather than autoconfiscated, whereas the patch for 1.6 on the tracker sets the entries to NULL instead. > Edgar > > On 3/25/2014 9:21 AM, Rob Latham wrote: >> >> >> On 03/25/2014 07:32 AM, Dave Love wrote:

Re: [OMPI users] busy waiting and oversubscriptions

2014-03-27 Thread Dave Love
Gus Correa writes: > Torque+Maui, SGE/OGE, and Slurm are free. [OGE certainly wasn't free, but it apparently no longer exists -- another thing Oracle screwed up and eventually dumped.] > If you build the queue system with cpuset control, a node can be > shared among several jobs, but the cpus/c

Re: [OMPI users] busy waiting and oversubscriptions

2014-03-27 Thread Dave Love
Gus Correa writes: > On 03/27/2014 05:05 AM, Andreas Schäfer wrote: >>> >Queue systems won't allow resources to be oversubscribed. [Maybe that meant that resource managers can, and typically do, prevent resources being oversubscribed.] >> I'm fairly confident that you can configure Slurm to ove

Re: [OMPI users] Mapping ranks to hosts (from MPI error messages)

2014-03-27 Thread Dave Love
Reuti writes: > Do all of them have an internal bookkeeping of granted cores to slots > - i.e. not only the number of scheduled slots per job per node, but > also which core was granted to which job? Does Open MPI read this > information would be the next question then. OMPI works with the bindi

Re: [OMPI users] change in behaviour 1.6 -> 1.8 under sge

2014-11-03 Thread Dave Love
Mark Dixon writes: > Hi there, > > We've started looking at moving to the openmpi 1.8 branch from 1.6 on > our CentOS6/Son of Grid Engine cluster and noticed an unexpected > difference when binding multiple cores to each rank. > > Has openmpi's definition 'slot' changed between 1.6 and 1.8? You

Re: [OMPI users] change in behaviour 1.6 -> 1.8 under sge

2014-11-04 Thread Dave Love
I wrote: > #$ -l exclusive > export OMP_NUM_THREADS=2 > exec mpirun --loadbalance --cpus-per-proc $OMP_NUM_THREADS --np > $(($NSLOTS/$OMP_NUM_THREADS)) ... I should have said core binding is the default here [so Intel MPI does
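A fuller sketch of that job script; the parallel environment name, slot count and binary are hypothetical, and the options are the 1.6-era ones quoted above (1.8 replaces them with --map-by <unit>:PE=n):

    #!/bin/bash
    #$ -pe mpi 24
    #$ -l exclusive
    export OMP_NUM_THREADS=2
    # one rank per OMP_NUM_THREADS slots granted by SGE
    exec mpirun --loadbalance --cpus-per-proc $OMP_NUM_THREADS \
         -np $(($NSLOTS/$OMP_NUM_THREADS)) ./hybrid_app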

Re: [OMPI users] change in behaviour 1.6 -> 1.8 under sge

2014-11-04 Thread Dave Love
Ralph Castain writes: > If you only have one allocated PE on a node, then mpirun will > correctly tell you that it can’t launch with PE>1 as there aren’t > enough resources to meet your request. IIRC, we may have been ignoring > this under SGE and running as many procs as we wanted on an allocate

Re: [OMPI users] change in behaviour 1.6 -> 1.8 under sge

2014-11-05 Thread Dave Love
Ralph Castain writes: > I confirmed that things are working as intended. I could have been more explicit saying so before. > If you have 12 cores on a machine, and you do > > mpirun -map-by socket:PE=2 > > we will execute 6 copies of foo on the node because 12 cores/2pe/core = 6 > procs. For
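Spelled out as a command (the binary name is hypothetical), with --report-bindings to confirm the arithmetic:

    # 12-core node, 2 processing elements per rank => 12/2 = 6 ranks on the node
    mpirun --map-by socket:PE=2 --report-bindings ./foo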

Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-05 Thread Dave Love
"Jeff Squyres (jsquyres)" writes: > Yes, this is a correct report. > > In short, the MPI_SIZEOF situation before the upcoming 1.8.4 was a bit > of a mess; it actually triggered a bunch of discussion up in the MPI > Forum Fortran working group (because the design of MPI_SIZEOF actually > has some

Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-10 Thread Dave Love
"Jeff Squyres (jsquyres)" writes: > There were several commits; this was the first one: > > https://github.com/open-mpi/ompi/commit/d7eaca83fac0d9783d40cac17e71c2b090437a8c I don't have time to follow this properly, but am I reading right that that says mpi_sizeof will now _not_ work with gcc <

Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-11 Thread Dave Love
"Jeff Squyres (jsquyres)" writes: > There are several reasons why MPI implementations have not added explicit > interfaces to their mpif.h files, mostly boiling down to: they may/will break > real world MPI programs. > > 1. All modern compilers have ignore-TKR syntax, Hang on! (An equivalent

Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-11 Thread Dave Love
"Jeff Squyres (jsquyres)" writes: > On Nov 10, 2014, at 8:27 AM, Dave Love wrote: > >>> https://github.com/open-mpi/ompi/commit/d7eaca83fac0d9783d40cac17e71c2b090437a8c >> >> I don't have time to follow this properly, but am I reading right that >&g

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-11 Thread Dave Love
"SLIM H.A." writes: > We switched on hyper threading on our cluster with two eight core > sockets per node (32 threads per node). Assuming that's Xeon-ish hyperthreading, the best advice is not to. It will typically hurt performance of HPC applications, not least if it defeats core binding, and

Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-12 Thread Dave Love
"Jeff Squyres (jsquyres)" writes: > Yeah, we don't actually share man pages. I suppose it wouldn't save much anyhow at this stage of the game. > I think the main issue would be just to edit the *.3in pages here: > > https://github.com/open-mpi/ompi/tree/master/ompi/mpi/man/man3 > > They're

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-12 Thread Dave Love
Ralph Castain writes: > You might also add the —display-allocation flag to mpirun so we can > see what it thinks the allocation looks like. If there are only 16 > slots on the node, it seems odd that OMPI would assign 32 procs to it > unless it thinks there is only 1 node in the job, and oversubs

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-12 Thread Dave Love
Reuti writes: >> If so, I’m wondering if that NULL he shows in there is the source of the >> trouble. The parser doesn’t look like it would handle that very well, though >> I’d need to test it. Is that NULL expected? Or is the NULL not really in the >> file? > > I must admit here: for me the f

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-12 Thread Dave Love
"SLIM H.A." writes: > Dear Reuti and Ralph > > Below is the output of the run for openmpi 1.8.3 with this line > > mpirun -np $NSLOTS --display-map --display-allocation --cpus-per-proc 1 $exe -np is redundant with tight integration unless you're using fewer than NSLOTS from SGE. > ompi_info | g

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-13 Thread Dave Love
Ralph Castain writes: cn6050 16 par6.q@cn6050 cn6045 16 par6.q@cn6045 >> >> The above looks like the PE_HOSTFILE. So it should be 16 slots per node. > > Hey Reuti > > Is that the standard PE_HOSTFILE format? I’m looking at the ras/gridengine > module, and it looks like it is expecti
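For reference, the sge_pe(5) hostfile format under discussion is one line per host — hostname, slot count, queue instance, processor range — with the last field commonly UNDEFINED (or, as in this thread, apparently <NULL>) when no binding was requested; a sketch using the hostnames quoted above:

    cn6050 16 par6.q@cn6050 <NULL>
    cn6045 16 par6.q@cn6045 <NULL>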

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-13 Thread Dave Love
Ralph Castain writes: >> I think there's a problem with documentation at least not being >> explicit, and it would really help to have it clarified unless I'm >> missing some. > > Not quite sure I understand this comment - the problem is that we > aren’t correctly reading the allocation, as evide

[OMPI users] mpi_wtime implementation

2014-11-17 Thread Dave Love
I discovered from looking at the mpiP profiler that OMPI always uses gettimeofday rather than clock_gettime to implement mpi_wtime on GNU/Linux, and that looks sub-optimal. I don't remember what the resolution of gettimeofday is in practice, but I did need to write a drop-in replacement for benchm

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-17 Thread Dave Love
Ralph Castain writes: >> On Nov 13, 2014, at 3:36 PM, Dave Love wrote: >> >> Ralph Castain writes: >> >>>>>> cn6050 16 par6.q@cn6050 >>>>>> cn6045 16 par6.q@cn6045 >>>> >>>> The above looks like the PE_

Re: [OMPI users] mpi_wtime implementation

2014-11-19 Thread Dave Love
"Daniels, Marcus G" writes: > On Mon, 2014-11-17 at 17:31 +, Dave Love wrote: >> I discovered from looking at the mpiP profiler that OMPI always uses >> gettimeofday rather than clock_gettime to implement mpi_wtime on >> GNU/Linux, and that looks sub-optimal.

[OMPI users] "default-only MCA variable"?

2014-11-27 Thread Dave Love
Why can't I set parameters like this (not the only one) with 1.8.3? WARNING: A user-supplied value attempted to override the default-only MCA variable named "btl_sm_use_knem".
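For reference, either of the usual ways of setting an MCA parameter produces that warning when the build lacks knem support (the rest of the command line is illustrative):

    mpirun --mca btl_sm_use_knem 1 -np 16 ./app
    # or equivalently via the environment:
    export OMPI_MCA_btl_sm_use_knem=1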

Re: [OMPI users] "default-only MCA variable"?

2014-11-28 Thread Dave Love
Gilles Gouaillardet writes: > It could be because configure did not find the knem headers and hence knem is > not supported and hence this mca parameter is read-only Yes, in that case (though knem was meant to be used and it's annoying that configure doesn't abort if it doesn't find something y
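A sketch of forcing the issue at configure time — an explicit --with-knem is expected to make configure abort if the headers cannot be found (the install prefixes are hypothetical):

    ./configure --with-knem=/opt/knem --prefix=$HOME/ompi-1.8.3
    make -j8 && make install
    ompi_info --param btl sm | grep knem   # confirm the parameter is now settable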

Re: [OMPI users] "default-only MCA variable"?

2014-11-28 Thread Dave Love
Gustavo Correa writes: > Hi Dave, Gilles, list > > There is a problem with knem in OMPI 1.8.3. > A fix is supposed to come on OMPI 1.8.4. > Please, see this long thread: > http://www.open-mpi.org/community/lists/users/2014/10/25511.php > > Note also, as documented in th

[OMPI users] using multiple IB connections between hosts

2015-01-28 Thread Dave Turner
node, or is the system simply ignoring the 10 Gbps cards because they are the slower option. Any clarification on this would be helpful. The only posts I've found are very old and discuss mostly channel bonding of 1 Gbps cards. Dave Turner -- Work: davetur...@ks

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-06 Thread Dave Love
Gus Correa writes: > Hi Giacomo > > Some programs fail with segmentation fault > because the stack size is too small. Yes, the default for Intel Fortran is to allocate large-ish amounts on the stack, which may matter when the compiled program runs. However, look at the backtrace. It's apparent

Re: [OMPI users] Building vs packaging

2016-05-16 Thread Dave Love
"Rob Malpass" writes: > Almost in desperation, I cheated: Why is that cheating? Unless you specifically want a different version, it seems sensible to me, especially as you then have access to packaged versions of at least some MPI programs. Likewise with rpm-based systems, which I'm afraid I

Re: [OMPI users] No core dump in some cases

2016-05-16 Thread Dave Love
Gilles Gouaillardet writes: > Are you sure ulimit -c unlimited is *really* applied on all hosts > > > can you please run the simple program below and confirm that ? Nothing specifically wrong with that, but it's worth installing procenv(1) as a general solution to checking the (generalized) envi
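Along the same lines, Open MPI itself can be used to check every allocated node in one go:

    mpirun --pernode sh -c 'echo "$(hostname): core limit $(ulimit -c)"'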

Re: [OMPI users] Question about mpirun mca_oob_tcp_recv_handler error.

2016-05-16 Thread Dave Love
Ralph Castain writes: > This usually indicates that the remote process is using a different OMPI > version. You might check to ensure that the paths on the remote nodes are > correct. That seems quite a common problem with non-obvious failure modes. Is it not possible to have a mechanism that c

Re: [OMPI users] Building vs packaging

2016-05-20 Thread Dave Love
dani writes: > I don't know about .deb packages, but at least in the rpms there is a > post install scriptlet that re-runs ldconfig to ensure the new libs > are in the ldconfig cache. MPI packages following the Fedora guidelines don't do that (and rpmlint complains bitterly as a consequence). T

Re: [OMPI users] OpenMPI 1.6.5 on CentOS 7.1, silence ib-locked-pages?

2016-05-20 Thread Dave Love
Ryan Novosielski writes: > I’m pretty sure this is no longer relevant (having read Roland’s > messages about it from a couple of years ago now). Can you please > confirm that for me, and then let me know if there is any way that I > can silence this old copy of OpenMPI that I need to use with som

[OMPI users] wtime implementation in 1.10

2016-05-23 Thread Dave Love
I thought the 1.10 branch had been fixed to use clock_gettime for MPI_Wtime where it's available, a la https://www.open-mpi.org/community/lists/users/2016/04/28899.php -- and have been telling people so! However, I realize it hasn't, and it looks as if 1.10 is still being maintained. Is there a g

Re: [OMPI users] wtime implementation in 1.10

2016-05-24 Thread Dave Love
w backports or even put things in a bug tracker. 1.10 isn't used here, and I just subvert gettimeofday whenever I'm running something that might use it for timing short intervals. > I’ll create the PR and copy you for review > > >> On May 23, 2016, at 9:17 AM, Dave Love wrot

Re: [OMPI users] users Digest, Vol 3510, Issue 2

2016-05-24 Thread Dave Love
Megdich Islem writes: > Yes, Empire does the fluid structure coupling. It couples OpenFoam (fluid > analysis) and Abaqus (structural analysis). > Does all the software need to have the same MPI architecture in order to > communicate ? I doubt it's doing that, and presumably you have no control

Re: [OMPI users] users Digest, Vol 3510, Issue 2

2016-05-25 Thread Dave Love
I wrote: > You could wrap one (set of) program(s) in a script to set the > appropriate environment before invoking the real program. I realize I should have said something like "program invocations", i.e. if you have no control over something invoking mpirun for programs using different MPIs,
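A minimal sketch of such a wrapper, with hypothetical paths:

    #!/bin/sh
    # Force the environment for the MPI this binary was built against,
    # then hand over to the real program.
    export PATH=/opt/openmpi/bin:$PATH
    export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
    exec /opt/app/bin/real_program "$@"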

[OMPI users] 2.0 documentation

2016-06-22 Thread Dave Love
I know it's not traditional, but is there any chance of complete documentation of the important changes in v2.0? Currently NEWS mentions things like minor build issues, but there's nothing, for instance, on the addition and removal of whole frameworks, one of which I've been trying to understand.

Re: [OMPI users] Big jump from OFED 1.5.4.1 -> recent (stable). Any suggestions?

2016-06-22 Thread Dave Love
"Llolsten Kaonga" writes: > Hello Grigory, > > I am not sure what Redhat does exactly but when you install the OS, there is > always an InfiniBand Support module during the installation process. We > never check/install that module when we do OS installations because it is > usually several versi

Re: [OMPI users] Docker Cluster Queue Manager

2016-06-22 Thread Dave Love
Rob Nagler writes: > Thanks, John. I sometimes wonder if I'm the only one out there with this > particular problem. > > Ralph, thanks for sticking with me. :) Using a pool of uids doesn't really > work due to the way cgroups/containers works. It also would require > changing the permissions of al

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-18 Thread Dave Love
"Jeff Squyres (jsquyres)" writes: > On Aug 16, 2016, at 3:07 PM, Reuti wrote: >> >> Thx a bunch - that was it. Despite searching for a solution I found >> only hints that didn't solve the issue. > > FWIW, we talk about this in the HACKING file, but I admit that's not > necessarily the easiest p

Re: [OMPI users] An equivalent to btl_openib_include_if when MXM over Infiniband ?

2016-08-18 Thread Dave Love
"Audet, Martin" writes: > Hi Josh, > > Thanks for your reply. I did try setting MXM_RDMA_PORTS=mlx4_0:1 for all my > MPI processes > and it did improve performance but the performance I obtain isn't completely > satisfying. I raised the issue of MXM hurting p2p latency here a while ago, but do

Re: [OMPI users] Certain files for mpi missing when building mpi4py

2016-08-31 Thread Dave Love
"Mahdi, Sam" writes: > HI everyone, > > I am using a linux fedora. I downloaded/installed > openmpi-1.7.3-1.fc20(64-bit) and openmpi-devel-1.7.3-1.fc20(64-bit). As > well as pypar-openmpi-2.1.5_108-3.fc20(64-bit) and > python3-mpi4py-openmpi-1.3.1-1.fc20(64-bit). The problem I am having is > buil

[OMPI users] mpi4py/fc20 (was: users Digest, Vol 3592, Issue 1)

2016-09-01 Thread Dave Love
"Mahdi, Sam" writes: > To dave, from the installation guide I found, it seemed I couldnt just > directly download it from the package list, but rather Id need to use the > mpicc wrapper to compile and install. That makes no sense to a maintainer of some openmpi Fedora packa

Re: [OMPI users] MPI libraries

2016-09-12 Thread Dave Love
Gilles Gouaillardet writes: > Mahmood, > > mpi_siesta is a siesta library, not an Open MPI library. > > fwiw, you might want to try again from scratch with > MPI_INTERFACE=libmpi_f90.a > DEFS_MPI=-DMPI > in your arch.make > > i do not think libmpi_f90.a is related to an OpenMPI library. libmpi_f

Re: [OMPI users] MPI libraries

2016-09-13 Thread Dave Love
I wrote: > Gilles Gouaillardet writes: > >> Mahmood, >> >> mpi_siesta is a siesta library, not an Open MPI library. >> >> fwiw, you might want to try again from scratch with >> MPI_INTERFACE=libmpi_f90.a >> DEFS_MPI=-DMPI >> in your arch.make >> >> i do not think libmpi_f90.a is related to an Op

Re: [OMPI users] Compilation without NVML support

2016-09-20 Thread Dave Love
Brice Goglin writes: > Hello > Assuming this NVML detection is actually done by hwloc, I guess there's > nothing in OMPI to disable it. It's not the first time we get such an > issue with OMPI not having all hwloc's --disable-foo options, but I > don't think we actually want to propagate all of t

[OMPI users] specifying memory affinity

2016-09-20 Thread Dave Love
I don't think it's possible, but just to check: can you specify memory affinity distinct from core binding somehow with OMPI (i.e. not with hwloc-bind as a shim under mpirun)? It seems to be relevant in Knights Landing "hybrid" mode with separate MCDRAM NUMA nodes as I assume you still want core
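A sketch of the shim alluded to above; numactl stands in for hwloc-bind here, and NUMA node 1 standing in for the MCDRAM node is an assumption:

    # Open MPI handles the core binding, the wrapper handles memory placement.
    mpirun -np 4 --bind-to core numactl --membind=1 ./app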

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-10-11 Thread Dave Love
Gilles Gouaillardet writes: > Bennet, > > > my guess is mapping/binding to sockets was deemed the best compromise > from an > > "out of the box" performance point of view. > > > iirc, we did fix some bugs that occured when running under asymmetric > cpusets/cgroups. > > if you still have some iss

Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-11 Thread Dave Love
Wirawan Purwanto writes: > Instead of the scenario above, I was trying to get the MPI processes > side-by-side (more like "fill_up" policy in SGE scheduler), i.e. fill > node 0 first, then fill node 1, and so on. How do I do this properly? > > I tried a few attempts that fail: > > $ export OMP_NU
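For comparison, a sketch of the two placements with the 1.8-series option names, assuming two 8-slot nodes (counts are illustrative):

    mpirun --map-by slot -np 16 ./app   # fill node 0's slots first, then node 1 ("fill_up")
    mpirun --map-by node -np 16 ./app   # round-robin ranks across the two nodes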

Re: [OMPI users] Using Open MPI with multiple versions of GCC and G++

2016-10-11 Thread Dave Love
"Jeff Squyres (jsquyres)" writes: > Especially with C++, the Open MPI team strongly recommends you > building Open MPI with the target versions of the compilers that you > want to use. Unexpected things can happen when you start mixing > versions of compilers (particularly across major versions

Re: [OMPI users] How to yield CPU more when not computing (was curious behavior during wait for broadcast: 100% cpu)

2016-11-07 Thread Dave Love
[Some time ago] Jeff Hammond writes: > If you want to keep long-waiting MPI processes from clogging your CPU > pipeline and heating up your machines, you can turn blocking MPI > collectives into nicer ones by implementing them in terms of MPI-3 > nonblocking collectives using something like the f

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-11-07 Thread Dave Love
"r...@open-mpi.org" writes: > Yes, I’ve been hearing a growing number of complaints about cgroups for that > reason. Our mapping/ranking/binding options will work with the cgroup > envelope, but it generally winds up with a result that isn’t what the user > wanted or expected. How? I don't u

Re: [OMPI users] Redusing libmpi.so size....

2016-11-07 Thread Dave Love
Mahesh Nanavalla writes: > Hi all, > > I am using openmpi-1.10.3. > > openmpi-1.10.3 compiled for arm(cross compiled on X86_64 for openWRT > linux) libmpi.so.12.0.3 size is 2.4MB,but if i compiled on X86_64 (linux) > libmpi.so.12.0.3 size is 990.2KB. > > can anyone tell how to reduce the size o
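One possible, unconfirmed explanation is unstripped debug information in the cross build; a quick check, with a hypothetical toolchain prefix:

    arm-openwrt-linux-strip --strip-unneeded libmpi.so.12.0.3
    ls -lh libmpi.so.12.0.3   # compare against the native build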

Re: [OMPI users] mpi4py+OpenMPI: Qs about submitting bugs and examples

2016-11-07 Thread Dave Love
"r...@open-mpi.org" writes: >> Is this mailing list a good spot to submit bugs for OpenMPI? Or do I >> use github? > > You can use either - I would encourage the use of github “issues” when > you have a specific bug, and the mailing list for general questions I was told not to do that, and to se

Re: [OMPI users] How to yield CPU more when not computing (was curious behavior during wait for broadcast: 100% cpu)

2016-11-09 Thread Dave Love
Jeff Hammond writes: >> I see sleeping for ‘0s’ typically taking ≳50μs on Linux (measured on >> RHEL 6 or 7, without specific tuning, on recent Intel). It doesn't look >> like something you want in paths that should be low latency, but maybe >> there's something you can do to improve that? (sch

Re: [OMPI users] An old code compatibility

2016-11-15 Thread Dave Love
Mahmood Naderan writes: > Hi, > The following mpifort command fails with a syntax error. It seems that the > code is compatible with old gfortran, but I am not aware of that. Any idea > about that? > > mpifort -ffree-form -ffree-line-length-0 -ff2c -fno-second-underscore > -I/opt/fftw-3.3.5/inclu

Re: [OMPI users] How to yield CPU more when not computing (was curious behavior during wait for broadcast: 100% cpu)

2016-12-08 Thread Dave Love
Jeff Hammond writes: >> >> >> > Note that MPI implementations may be interested in taking advantage of >> > https://software.intel.com/en-us/blogs/2016/10/06/intel- >> xeon-phi-product-family-x200-knl-user-mode-ring-3-monitor-and-mwait. >> >> Is that really useful if it's KNL-specific and MSR-bas

[OMPI users] MPI+OpenMP core binding redux

2016-12-08 Thread Dave Love
I think there was a suggestion that the SC16 material would explain how to get appropriate core binding for MPI+OpenMP (i.e. OMP_NUM_THREADS cores/process), but it doesn't as far as I can see. Could someone please say how you're supposed to do that in recent versions (without relying on bound DRM
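For reference, a sketch of the recipe understood to be the 1.8/2.x answer — one rank per OMP_NUM_THREADS cores, each rank bound to that many cores (the mapping unit and binary are illustrative):

    export OMP_NUM_THREADS=4
    export OMP_PROC_BIND=true   # keep the OpenMP threads inside the MPI binding
    mpirun --map-by socket:PE=$OMP_NUM_THREADS --report-bindings ./hybrid_app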

Re: [OMPI users] How to yield CPU more when not computing (was curious behavior during wait for broadcast: 100% cpu)

2016-12-12 Thread Dave Love
Andreas Schäfer writes: >> Yes, as root, and there are N different systems to at least provide >> unprivileged read access on HPC systems, but that's a bit different, I >> think. > > LIKWID[1] uses a daemon to provide limited RW access to MSRs for > applications. I wouldn't wonder if support for

Re: [OMPI users] rdmacm and udcm failure in 2.0.1 on RoCE

2016-12-15 Thread Dave Turner
lease verify that rdmacm is not currently working in 2.0.1? And therefore I'm assuming that 2.0.1 has not been successfully tested on RoCE??? Dave > -- > > Message: 1 > Date: Wed, 14

[OMPI users] epoll add error with OpenMPI 2.0.1 and SGE

2016-12-17 Thread Dave Turner
I've solved this problem by omitting --with-libevent=/usr from the configuration to force it to use the internal version. I thought I had tried this before posting but evidently did something wrong. Dave On Tue, Dec 13, 2016 at 9:57 PM, wrote: > Send users

[OMPI users] openib/mpi_alloc_mem pathology

2017-03-06 Thread Dave Love
I've been looking at a new version of an application (cp2k, for what it's worth) which is calling mpi_alloc_mem/mpi_free_mem, and I don't think it did so in the previous version I looked at. I found on an IB-based system it's spending about half its time in those allocation routines (according to

Re: [OMPI users] openib/mpi_alloc_mem pathology

2017-03-09 Thread Dave Love
Paul Kapinos writes: > Hi Dave, > > > On 03/06/17 18:09, Dave Love wrote: >> I've been looking at a new version of an application (cp2k, for what >> it's worth) which is calling mpi_alloc_mem/mpi_free_mem, and I don't > > Welcome to the club! :o)

Re: [OMPI users] openib/mpi_alloc_mem pathology

2017-03-09 Thread Dave Love
Nathan Hjelm writes: > If this is with 1.10.x or older run with --mca memory_linux_disable > 1. There is a bad interaction between ptmalloc2 and psm2 support. This > problem is not present in v2.0.x and newer. Is that applicable to openib too?

Re: [OMPI users] openib/mpi_alloc_mem pathology

2017-03-15 Thread Dave Love
Paul Kapinos writes: > Nathan, > unfortunately '--mca memory_linux_disable 1' does not help on this > issue - it does not change the behaviour at all. > Note that the pathological behaviour is present in Open MPI 2.0.2 as > well as in /1.10.x, and Intel OmniPath (OPA) network-capable nodes are >

Re: [OMPI users] openib/mpi_alloc_mem pathology

2017-03-21 Thread Dave Love
I wrote: > But it works OK with libfabric (ofi mtl). Is there a problem with > libfabric? Apparently there is, or at least with ompi 1.10. I've now realized IMB pingpong latency on a QDR IB system with ompi 1.10.6+libfabric is ~2.5μs, which it isn't with ompi 1.6 openib.

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-27 Thread Dave Love
"r...@open-mpi.org" writes: > Oh no, that's not right. Mpirun launches daemons using qrsh and those > daemons spawn the app's procs. SGE has no visibility of the app at all Oh no, that's not right. The whole point of tight integration with remote startup using qrsh is to report resource usage a

Re: [OMPI users] NUMA interaction with Open MPI

2017-07-27 Thread Dave Love
Gilles Gouaillardet writes: > Adam, > > keep in mind that by default, recent Open MPI bind MPI tasks > - to cores if -np 2 > - to NUMA domain otherwise Not according to ompi_info from the latest release; it says socket. > (which is a socket in most cases, unless > you are running on a Xeon Phi)
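A quick way to see what a particular build actually reports as its default binding (parameter name as in the 1.8/2.x series):

    ompi_info --all | grep hwloc_base_binding_policy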

[OMPI users] absolute paths printed by info programs

2017-08-01 Thread Dave Love
ompi_info et al print absolute compiler paths for some reason. What would they ever be used for, and are they intended to refer to the OMPI build or application building? They're an issue for packaging in Guix, at least. Similarly, what's io_romio_complete_configure_params intended to be used fo

[OMPI users] --enable-builtin-atomics

2017-08-01 Thread Dave Love
What are the pros and cons of configuring with --enable-builtin-atomics? I haven't spotted any discussion of the option.

Re: [OMPI users] Questions about integration with resource distribution systems

2017-08-01 Thread Dave Love
Gilles Gouaillardet writes: > Dave, > > > unless you are doing direct launch (for example, use 'srun' instead of > 'mpirun' under SLURM), > > this is the way Open MPI is working : mpirun will use whatever the > resource manager provides > > in

Re: [OMPI users] --enable-builtin-atomics

2017-08-02 Thread Dave Love
Nathan Hjelm writes: > So far only cons. The gcc and sync builtin atomic provide slower > performance on x86-64 (and possible other platforms). I plan to > investigate this as part of the investigation into requiring C11 > atomics from the C compiler. Thanks. Is that a gcc deficiency, or do the

Re: [OMPI users] --enable-builtin-atomics

2017-08-02 Thread Dave Love
"Barrett, Brian via users" writes: > Well, if you’re trying to get Open MPI running on a platform for which > we don’t have atomics support, built-in atomics solves a problem for > you… That's not an issue in this case, I think. (I'd expect it to default to intrinsic if extrinsic support is mis

Re: [OMPI users] Questions about integration with resource distribution systems

2017-08-02 Thread Dave Love
Reuti writes: >> I should qualify that by noting that ENABLE_ADDGRP_KILL has apparently >> never propagated through remote startup, > > Isn't it a setting inside SGE which the sge_execd is aware of? I never > exported any environment variable for this purpose. Yes, but this is surely off-topic,

[OMPI users] built-in memchecker support

2017-08-24 Thread Dave Love
Apropos configuration parameters for packaging: Is there a significant benefit to configuring built-in memchecker support, rather than using the valgrind preload library? I doubt being able to use another PMPI tool directly at the same time counts. Also, are there measurements of the performance
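For comparison, a sketch of the two approaches being weighed; the valgrind wrapper library's name and location vary with platform and packaging:

    # built-in memchecker: decided at configure time, adds overhead to the library
    ./configure --enable-memchecker --with-valgrind=/usr ...
    # plain valgrind with its MPI wrapper preloaded into each rank
    mpirun -np 2 sh -c \
      'LD_PRELOAD=/usr/lib64/valgrind/libmpiwrap-amd64-linux.so valgrind ./app'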

Re: [OMPI users] built-in memchecker support

2017-08-24 Thread Dave Love
Christoph Niethammer writes: > Hi Dave, > > The memchecker interface is an addition which allows other tools to be > used as well. Do you mean it allows other things to be hooked in other than through PMPI? > A more recent one is memPin [1]. Thanks, but Pin is proprietary, so

Re: [OMPI users] built-in memchecker support

2017-08-24 Thread Dave Love
Gilles Gouaillardet writes: > Dave, > > the builtin memchecker can detect MPI usage errors such as modifying > the buffer passed to MPI_Isend() before the request completes OK, thanks. The implementation looks rather different, and it's not clear without checking the code

Re: [OMPI users] Do MPI calls ever sleep?

2010-07-21 Thread Dave Goodell
Are you sure that the planner is always running in parallel? What OS > and OMPI version are you using? sched_yield doesn't work as expected in late 2.6 Linux kernels: http://kerneltrap.org/Linux/CFS_and_sched_yield If this scheduling behavior change is affecting you, you might be able to fix it with: echo "1" >/proc/sys/kernel/sched_compat_yield -Dave

Re: [OMPI users] OpenMPI on the ARM processor architecture?

2010-09-22 Thread Dave Love
Jeff Squyres writes: > I believe that the first step would be to get some assembly for the > ARM platform for some of OMPI's key routines (locks, atomics, etc.). > Beyond that, it *might* "just work"...? Is http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579505 relevant/useful?

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-10-12 Thread Dave Love
Chris Jewell writes: > I've scrapped this system now in favour of the new SGE core binding feature. How does that work, exactly? I thought the OMPI SGE integration didn't support core binding, but good if it does.

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-10-14 Thread Dave Love
Reuti writes: > With the default binding_instance set to "set" (the default) the > shepherd should bind the processes to cores already. With other types > of binding_instance these selected cores must be forward to the > application via an environment variable or in the hostfile. My question was
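For reference, a sketch of what the submission-side request looks like in SGE 6.2u5 (PE name and amounts are hypothetical):

    qsub -pe mpi 16 -binding set linear:2 job.sh
    # "env" or "pe" instead of "set" exports the chosen cores (SGE_BINDING,
    # or the pe_hostfile's last column) for the application or MPI to apply itself.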

Re: [OMPI users] Hair depleting issue with Ompi143 and one program

2011-01-20 Thread Dave Goodell
that you might have found a bug in Valgrind itself. It doesn't happen often, but the SSE code can be complicated and isn't exercised as often as the non-vector portions of Valgrind. Good luck, -Dave [1] http://valgrind.org/docs/manual/mc-manual.html#mc-manual.machine [2] http://

[OMPI users] bizarre failure with IMB/openib

2011-03-21 Thread Dave Love
I'm trying to test some new nodes with ConnectX adaptors, and failing to get (so far just) IMB to run on them. The binary runs on the same cluster using TCP, or using PSM on some other IB nodes. A rebuilt PMB and various existing binaries work with openib on the ConnectX nodes running it exactly

[OMPI users] 1.5.3 and SGE integration?

2011-03-21 Thread Dave Love
I've just tried 1.5.3 under SGE with tight integration, which seems to be broken. I built and ran in the same way as for 1.4.{1,3}, which works, and ompi_info reports the same gridengine parameters for 1.5 as for 1.4. The symptoms are that it reports a failure to communicate using ssh, whereas it

Re: [OMPI users] bizarre failure with IMB/openib

2011-03-21 Thread Dave Love
Peter Kjellström writes: > Are you sure you launched it correctly and that you have (re)built OpenMPI > against your Redhat-5 ib stack? Yes. I had to rebuild because I'd omitted openib when we only needed psm. As I said, I did exactly the same thing successfully with PMB (initially because I

Re: [OMPI users] 1.5.3 and SGE integration?

2011-03-21 Thread Dave Love
Terry Dontje writes: > Dave what version of Grid Engine are you using? 6.2u5, plus irrelevant patches. It's fine with ompi 1.4. (All I did to switch was to load the 1.5.3 modules environment.) > The plm checks for the following env-var's to determine if you are >

Re: [OMPI users] 1.5.3 and SGE integration?

2011-03-21 Thread Dave Love
Ralph Castain writes: > Just looking at this for another question. Yes, SGE integration is broken in > 1.5. Looking at how to fix now. > > Meantime, you can get it work by adding "-mca plm ^rshd" to your mpirun cmd > line. Thanks. I'd forgotten about plm when checking, though I guess that wou

Re: [OMPI users] bizarre failure with IMB/openib

2011-03-22 Thread Dave Love
Dave Love writes: > I'm trying to test some new nodes with ConnectX adaptors, and failing to > get (so far just) IMB to run on them. I suspect this is https://svn.open-mpi.org/trac/ompi/ticket/1919. I'm rather surprised it isn't an FAQ (actually frequently asked, not m

Re: [OMPI users] 1.5.3 and SGE integration?

2011-03-22 Thread Dave Love
Ralph Castain writes: >> Should rshd be mentioned in the release notes? > > Just starting the discussion on the best solution going forward. I'd > rather not have to tell SGE users to add this to their cmd line. :-( Sure. I just thought a new component would normally be mentioned in the notes.

Re: [OMPI users] Deadlock with mpi_init_thread + mpi_file_set_view

2011-04-04 Thread Dave Goodell
FWIW, we solved this problem with ROMIO in MPICH2 by making the "big global lock" a recursive mutex. In the past it was implicitly so because of the way that recursive MPI calls were handled. In current MPICH2 it's explicitly initialized with type PTHREAD_MUTEX_RECURSIVE inst

[OMPI users] using openib and psm together

2011-04-21 Thread Dave Love
We have an installation with both Mellanox and Qlogic IB adaptors (in distinct islands), so I built open-mpi 1.4.3 with openib and psm support. Now I've just read this in the OFED source, but I can't see any relevant issue in the open-mpi tracker: OpenMPI support --- It is recom
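A sketch of how such a dual-fabric build and per-island selection might look, using the option names of that era:

    ./configure --with-openib --with-psm ...
    # pick the transport explicitly per island rather than relying on auto-selection
    mpirun --mca pml cm  --mca mtl psm ...             # QLogic/PSM island
    mpirun --mca pml ob1 --mca btl openib,sm,self ...  # Mellanox island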

Re: [OMPI users] using openib and psm together

2011-04-26 Thread Dave Love
Jeff Squyres writes: > I believe it was mainly a startup issue -- there's a complicated > sequence of events that happens during MPI_INIT. IIRC, the issue was > that if OMPI had software support for PSM, it assumed that the lack of > PSM hardware was effectively an error. Thanks. For what it's

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-03 Thread Dave Love
Brock Palen writes: > We managed to have another user hit the bug that causes collectives (this > time MPI_Bcast() ) to hang on IB that was fixed by setting: > > btl_openib_cpc_include rdmacm Could someone explain this? We also have problems with collective hangs with openib/mlx4 (specifically
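If the workaround is needed across the board, it can be made the installation default rather than repeated on every command line ($PREFIX is the Open MPI install prefix):

    echo "btl_openib_cpc_include = rdmacm" >> $PREFIX/etc/openmpi-mca-params.conf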

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-11 Thread Dave Love
Jeff Squyres writes: > We had a user-reported issue of some hangs that the IB vendors have > been unable to replicate in their respective labs. We *suspect* that > it may be an issue with the oob openib CPC, but that code is pretty > old and pretty mature, so all of us would be at least somewhat

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-11 Thread Dave Love
Ralph Castain writes: > I'll go back to my earlier comments. Users always claim that their > code doesn't have the sync issue, but it has proved to help more often > than not, and costs nothing to try, Could you point to that post, or tell us what to try excatly, given we're running IMB? Thanks

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-13 Thread Dave Love
Jeff Squyres writes: > On May 11, 2011, at 3:21 PM, Dave Love wrote: > >> We can reproduce it with IMB. We could provide access, but we'd have to >> negotiate with the owners of the relevant nodes to give you interactive >> access to them. Maybe Brock's
