Re: [OMPI users] ompio on Lustre

2018-10-09 Thread Dave Love
"Gabriel, Edgar" writes: > Hm, thanks for the report, I will look into this. I did not run the > romio tests, but the hdf5 tests are run regularly and with 3.1.2 you > should not have any problems on a regular unix fs. How many processes > did you use, and which tests did you run specifically? Th

Re: [OMPI users] ompio on Lustre

2018-10-10 Thread Dave Love
"Gabriel, Edgar" writes: > Ok, thanks. I usually run these test with 4 or 8, but the major item > is that atomicity is one of the areas that are not well supported in > ompio (along with data representations), so a failure in those tests > is not entirely surprising . If it's not expected to wo

Re: [OMPI users] no openmpi over IB on new CentOS 7 system

2018-10-10 Thread Dave Love
RDMA was just broken in the last-but-one(?) RHEL7 kernel release, in case that's the problem. (Fixed in 3.10.0-862.14.4.)
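
A quick way to rule that in or out on a node (a sketch; the update command assumes a stock RHEL/CentOS 7 box with yum):

  uname -r               # a 3.10.0-862 kernel earlier than 3.10.0-862.14.4.el7 is suspect
  sudo yum update kernel # then reboot into the fixed kernel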

Re: [OMPI users] ompio on Lustre

2018-10-15 Thread Dave Love
For what it's worth, I found the following from running ROMIO's tests with OMPIO on Lustre mounted without flock (or localflock). I used 48 processes on two nodes with Lustre for tests which don't require a specific number. OMPIO fails tests atomicity, misc, and error on ext4; it additionally fai
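
For anyone reproducing this, a sketch of how the two MPI-IO stacks can be compared on the same Lustre mount (the ROMIO component name varies by release, e.g. romio314 or romio321; the test binary is one of the ROMIO tests named above):

  mount -t lustre                               # check whether flock/localflock is in the mount options
  mpirun -np 48 --mca io ompio    ./atomicity
  mpirun -np 48 --mca io romio314 ./atomicity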

Re: [OMPI users] ompio on Lustre

2018-10-16 Thread Dave Love
"Latham, Robert J." writes: > it's hard to implement fcntl-lock-free versions of Atomic mode and > Shared file pointer so file systems like PVFS don't support those modes > (and return an error indicating such at open time). Ah. For some reason I thought PVFS had the support to pass the tests s

Re: [OMPI users] ompio on Lustre

2018-10-16 Thread Dave Love
"Gabriel, Edgar" writes: > a) if we detect a Lustre file system without flock support, we can > printout an error message. Completely disabling MPI I/O is on the > ompio architecture not possible at the moment, since the Lustre > component can disqualify itself, but the generic Unix FS component

[OMPI users] filesystem-dependent failure building Fortran interfaces

2018-12-04 Thread Dave Love
If you try to build somewhere out of tree, not in a subdir of the source, the Fortran build is likely to fail because mpi-ext-module.F90 does include '/openmpi-4.0.0/ompi/mpiext/pcollreq/mpif-h/mpiext_pcollreq_mpifh.h' and can exceed the fixed line length. It either needs to add (the com
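
Until that is fixed, two hedged workarounds (untested sketches): configure from a build directory whose absolute source path keeps the generated include line short, or relax the compiler's line-length limit, e.g. with gfortran:

  mkdir -p build && cd build
  /short/path/openmpi-4.0.0/configure --prefix=$HOME/ompi FCFLAGS="-ffree-line-length-none"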

Re: [OMPI users] filesystem-dependent failure building Fortran interfaces

2018-12-05 Thread Dave Love
"Jeff Squyres (jsquyres) via users" writes: > Hi Dave; thanks for reporting. > > Yes, we've fixed this -- it should be included in 4.0.1. > > https://github.com/open-mpi/ompi/pull/6121 Good, but I'm confused; I checked the repo before reporting it. [I w

Re: [OMPI users] filesystem-dependent failure building Fortran interfaces

2018-12-11 Thread Dave Love
Jeff Hammond writes: > Preprocessor is fine in Fortran compilers. We’ve used in NWChem for many > years, and NWChem supports “all the compilers”. > > Caveats: > - Cray dislikes recursive preprocessing logic that other compilers handle. > You won’t use this so please ignore. > - IBM XLF requires -

[OMPI users] relocating an installation

2019-04-09 Thread Dave Love
Is it possible to use the environment or mpirun flags to run an OMPI that's been relocated from where it was configured/installed? (Say you've unpacked a system package that expects to be under /usr and want to run it from home without containers etc.) I thought that was possible, but I haven't f

Re: [OMPI users] relocating an installation

2019-04-09 Thread Dave Love
Reuti writes: > export OPAL_PREFIX= > > to point it to the new location of installation before you start `mpiexec`. Thanks; that's now familiar, and I don't know how I missed it with strings. It should be documented. I'd have expected --prefix to have the same effect, and for there to be an MC
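
In other words (paths purely illustrative):

  export OPAL_PREFIX=$HOME/relocated/openmpi
  $OPAL_PREFIX/bin/mpiexec -np 4 ./a.out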

Re: [OMPI users] relocating an installation

2019-04-10 Thread Dave Love
Reuti writes: >> It should be documented. > > There is this FAQ entry: > > https://www.open-mpi.org/faq/?category=building#installdirs For what it's worth, I looked under "running" in the FAQ, as I was after a runtime switch. I expect FAQs to point to the actual documentation, though, and an en

Re: [OMPI users] relocating an installation

2019-04-10 Thread Dave Love
"Jeff Squyres (jsquyres) via users" writes: > Reuti's right. > > Sorry about the potentially misleading use of "--prefix" -- we > basically inherited that CLI option from a different MPI > implementation (i.e., people asked for it). So we were locked into > that meaning for the "--prefix" CLI op

Re: [OMPI users] relocating an installation

2019-04-10 Thread Dave Love
In fact, setting OPAL_PREFIX doesn't work for a relocated tree (with OMPI 1.10 or 3.0). You also need $OPAL_PREFIX/lib and $OPAL_PREFIX/lib/openmpi on LD_LIBRARY_PATH (assuming $MPI_LIB=$MPI_HOME/lib): $ OPAL_PREFIX=$(pwd)/usr/lib64/openmpi3 ./usr/lib64/openmpi3/bin/mpirun mpirun true ./usr/
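
A sketch of the full recipe described here, for an RPM-style tree unpacked into the current directory:

  export OPAL_PREFIX=$(pwd)/usr/lib64/openmpi3
  export LD_LIBRARY_PATH=$OPAL_PREFIX/lib:$OPAL_PREFIX/lib/openmpi:$LD_LIBRARY_PATH
  $OPAL_PREFIX/bin/mpirun -np 2 true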

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-02-04 Thread Dave Love
Jeff Squyres writes: > Could the nodes be running out of shared memory and/or temp filesystem > space? I'm also seeing this non-reproducibly (on OpenSuSE 10.3, with Sun's Clustertools 8.1 prerelease on dual Barcelona nodes during PMB runs under SGE). I haven't had time to build the final 1.3 re

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-03-19 Thread Dave Love
Prentice Bisbal writes: > I just installed OpenMPI 1.3 with tight integration for SGE. Version > 1.2.8 was working just fine for several months in the same arrangement. > > Now that I've upgraded to 1.3, I get the following errors in my standard > error file: > > mca_common_sm_mmap_init: open /tm
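
If the underlying cause is a full or unusable /tmp, one hedged workaround is to move the session directory somewhere with space (parameter names can be confirmed with ompi_info for the installed version):

  mpirun --mca orte_tmpdir_base /scratch/$USER -np 8 ./a.out
  # or: export TMPDIR=/scratch/$USER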

Re: [OMPI users] OpenMPI 1.3.1 + BLCR build problem

2009-03-31 Thread Dave Love
M C writes: > --- MCA component crs:blcr (m4 configuration macro) > checking for MCA component crs:blcr compile mode... dso > checking --with-blcr value... sanity check ok (/opt/blcr) > checking --with-blcr-libdir value... sanity check ok (/opt/blcr/lib) > configure: WARNING: BLCR support request

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-03-31 Thread Dave Love
Rolf Vandevaart writes: >> However, I found that if I explicitly specify the "-machinefile >> $TMPDIR/machines", all 8 mpi processes were spawned within a single >> node, i.e. node0002. I had that sort of behaviour recently when the tight integration was broken on the installation we'd been give

Re: [OMPI users] OpenMPI 1.3.1 + BLCR build problem

2009-04-01 Thread Dave Love
Josh Hursey writes: > The configure flag that you are looking for is: > --with-ft=cr Is there a good reason why --with-blcr doesn't imply it? > You may also want to consider using the thread options too for > improved C/R response: > --enable-mpi-threads --enable-ft-thread Incidentally, the
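
Putting the thread's flags together, a 1.3.1 configure line with BLCR under /opt/blcr might look like this (prefix illustrative):

  ./configure --prefix=/opt/openmpi-1.3.1 \
      --with-blcr=/opt/blcr --with-blcr-libdir=/opt/blcr/lib \
      --with-ft=cr --enable-mpi-threads --enable-ft-thread
  make -j4 && make install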

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread Dave Love
Rolf Vandevaart writes: > No, orte_leave_session_attached is needed to avoid the errno=2 errors > from the sm btl. (It is fixed in 1.3.2 and trunk) [It does cause other trouble, but I forget what the exact behaviour was when I lost it as a default.] >> Yes, but there's a problem with the recomm

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-02 Thread Dave Love
I wrote: > E.g. on > 8-core nodes, if you submit a 16-process job, there are four cores left > over on the relevant nodes which might get something else scheduled on > them. Of course, that doesn't make much sense because I thought `12' and typed `16' for some reason... Thanks to Rolf for off-li

Re: [OMPI users] OpenMPI 1.3.1 + BLCR build problem

2009-04-02 Thread Dave Love
Josh Hursey writes: > Thanks. I'll fix this and post a new draft soon (I have a few other > items to put in there anyway). One thing to note in the mean time is that building with BLCR failed for me with the PGI compiler with a link-time message about a bad file format. I assume it's a libtool

[OMPI users] MX questions

2009-06-25 Thread Dave Love
It's not reproducible, but I sometimes see messages like [node01:29645] MX BTL delete procs running 1.3.1 with Open-MX and the MX BTL. Looking at the code, it's a dummy routine, but I didn't get as far as figuring out why it's (sometimes) called and what its significance is. Can someone expl

Re: [OMPI users] MX questions

2009-06-26 Thread Dave Love
Scott Atchley writes: > I believe the answer is yes as long as all NICs are in the same fabric > (they usually are). Thanks. Do you mean it won't if, in this case, the two NICs are on separate switches?

Re: [OMPI users] MX questions

2009-06-26 Thread Dave Love
George Bosilca writes: > It is not the BTL who open the second endpoint, it is the MTL. It's a > very long story, but unfortunately right now the two components (MTL > and BTL) each open an endpoint. Once the upper level complete the > selection of the component for the run, one of the endpoints

Re: [OMPI users] MX questions

2009-06-28 Thread Dave Love
Scott Atchley writes: > George's answer supersedes mine. You must be using the MX bonding > driver to use more than one NIC per host. Will that be relevant for Open-MX, which I'm using rather than normal MX? (I'm afraid I don't know anything about how MX systems work generally.) For what it's

[OMPI users] allreduce produces "error(8) registering gm memory"

2006-08-21 Thread Dave Grote
nce the default integer size used by g95 is 8 bytes but the openmpi fortran interface was compiled with f77 which uses 4 byte integers. Any suggestions on what to look for? Thanks for the help, Dave

      program parallel_sum_mmnts
      real(kind=8):: zmmnts(0:360,28,0:8)
c Use reduct

[OMPI users] x11 forwarding

2006-11-29 Thread Dave Grote
useful. This is a major issue since my parallel code heavily depends on having the ability to open X windows on the remote machine. Any and all help would be appreciated! Thanks! Dave

Re: [OMPI users] x11 forwarding

2006-11-30 Thread Dave Grote
issue with the X server (xorg) or with the version of linux, so I am also seeking help from the person who maintains caos linux. If it matters, the machine uses myrinet for the interconnects. Thanks! Dave Galen Shipman wrote: what does your command line look like? - Galen On Nov 29,

Re: [OMPI users] x11 forwarding

2006-11-30 Thread Dave Grote
I don't think that that is the problem. As far as I can tell, the DISPLAY environment variable is being set properly on the slave (it will sometimes have a different value than in the shell where mpirun was executed). Dave Ralph H Castain

Re: [OMPI users] x11 forwarding

2006-12-01 Thread Dave Grote
my problem. Dave Galen Shipman wrote: I think this might be as simple as adding "-d" to the mpirun command line If I run: mpirun -np 2 -d -mca pls_rsh_agent "ssh -X" xterm -e gdb ./mpi-ping All is well, I get the

Re: [OMPI users] x11 forwarding

2006-12-01 Thread Dave Grote
eing picky. Thanks! Dave Galen Shipman wrote: -d leaves the ssh session open Try using: mpirun -d -host boxtop2 -mca pls_rsh_agent "ssh -X -n" xterm -e cat Note the "ssh -X -n", this will tell ssh not to open stdin.. You should then be

Re: [OMPI users] x11 forwarding

2006-12-01 Thread Dave Grote
Is there a place where I can hack the openmpi code to force it to keep the ssh sessions open without the -d option? I looked through some of the code, including orterun.c and a few other places, but don't have the familiarity with the code to find the place. Thanks! Dave Galen Sh

Re: [OMPI users] x11 forwarding

2006-12-04 Thread Dave Grote
ew command line flag to keep the ssh sessions running without turning on the debugging output. I know that others have the same XForwarding problem and this would offer a general solution. Thanks for all of your help!! Dave Ralph Castain wrote: I’m afraid that would be a rather signi

Re: [OMPI users] Problem building OpenMPI 1.8 on RHEL6

2014-04-01 Thread Dave Goodell (dgoodell)
run "autoreconf" by hand, make sure to run the "./autogen.sh" script that is packaged with OMPI. It will also check your versions and warn you if they are out of date. Do you need to build OMPI from the SVN source? Or would a (pre-autogen'ed) release tarball work for you? -Dave
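
For reference, the suggested sequence for a checkout (release tarballs ship pre-autogen'ed and skip the first step):

  ./autogen.sh
  ./configure --prefix=$HOME/ompi
  make -j8 && make install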

Re: [OMPI users] usNIC point-to-point messaging module

2014-04-01 Thread Dave Goodell (dgoodell)
not pass a value, then it is "/usr/local". Then reinstall (with "make install" in the OMPI build tree). What I think is happening is that you still have an "mca_btl_usnic.so" file leftover from the last time you installed OMPI (before passing "--enable-mca-no-build=btl-usnic"). So OMPI is using this shared library and you get exactly the same problem. -Dave
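
A sketch of the cleanup being described, assuming the default /usr/local prefix:

  rm -f /usr/local/lib/openmpi/mca_btl_usnic.*
  cd /path/to/ompi-build && make install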

Re: [OMPI users] usNIC point-to-point messaging module

2014-04-02 Thread Dave Goodell (dgoodell)
On Apr 2, 2014, at 12:57 PM, Filippo Spiga wrote: > I still do not understand why this keeps appearing... > > srun: cluster configuration lacks support for cpu binding > > Any clue? I don't know what causes that message. Ralph, any thoughts here? -Dave

Re: [OMPI users] mpirun runs in serial even I set np to several processors

2014-04-14 Thread Dave Goodell (dgoodell)
a different MPI implementation than you are using to run it (e.g., MPICH vs. Open MPI). -Dave

Re: [OMPI users] OMPI 1.8.1 Deadlock in mpi_finalize with mpi_init_thread

2014-04-29 Thread Dave Goodell (dgoodell)
I don't know of any workaround. I've created a ticket to track this, but it probably won't be very high priority in the short term: https://svn.open-mpi.org/trac/ompi/ticket/4575 -Dave On Apr 25, 2014, at 3:27 PM, Jamil Appa wrote: > > Hi > > The fol

Re: [OMPI users] importing to MPI data already in memory from another process

2014-06-27 Thread Dave Goodell (dgoodell)
ent, since any page you gift away should probably come from mmap(2) directly). Otherwise, as George mentioned, I would investigate converting your current data collector processes to also be MPI processes so that they can simply communicate the data to the rest of the cluster. -Dave

Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Dave Goodell (dgoodell)
/3772826/158513. -Dave On Sep 29, 2014, at 1:34 PM, Ralph Castain wrote: > Afraid I cannot replicate a problem with singleton behavior in the 1.8 series: > > 11:31:52 /home/common/openmpi/v1.8/orte/test/mpi$ ./hello foo bar > Hello, World, I am 0 of 1 [0 local peers]:

Re: [OMPI users] mpi_wtime implementation

2014-11-24 Thread Dave Goodell (dgoodell)
On Nov 24, 2014, at 12:06 AM, George Bosilca wrote: > https://github.com/open-mpi/ompi/pull/285 is a potential answer. I would like > to hear Dave Goodell comment on this before pushing it upstream. > > George. I'll take a look at it today. My notification settings were m

Re: [OMPI users] send and receive vectors + variable length

2015-01-09 Thread Dave Goodell (dgoodell)
requests (assuming they can be progressed). The following should not deadlock:
✂
  for (...) MPI_Isend(...)
  for (...) MPI_Irecv(...)
  MPI_Waitall(send_requests...)
  MPI_Waitall(recv_requests...)
✂
-Dave

Re: [OMPI users] New to (Open)MPI

2016-09-02 Thread Dave Goodell (dgoodell)
ption here. -Dave > On Sep 2, 2016, at 5:35 AM, Jeff Squyres (jsquyres) > wrote: > > Greetings Lachlan. > > Yes, Gilles and John are correct: on Cisco hardware, our usNIC transport is > the lowest latency / best HPC-performance transport. I'm not aware of any >

Re: [OMPI users] trying to use personal copy of 1.7.4

2014-03-12 Thread Dave Goodell (dgoodell)
ary -- But what we're getting is: app ---> /usr/OMPI \ --> library ---> ~ross/OMPI If one of them was first linked against the /usr/OMPI and managed to get an RPATH then it could override your LD_LIBRARY_PATH. -Dave On Mar 12, 2014, at 5:39 AM, Jeff Squyres (jsquyres)
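
A quick way to see which copy each piece actually resolves to, and whether an RPATH baked in at link time is overriding LD_LIBRARY_PATH (binary and library names are placeholders):

  ldd ./app | grep -i mpi
  readelf -d ./app               | grep -iE 'rpath|runpath'
  readelf -d /path/to/library.so | grep -iE 'rpath|runpath'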

Re: [OMPI users] Bug: Disabled mpi_leave_pinned for GPUDirect and InfiniBand during run-time caused by GCC optimizations

2015-06-08 Thread Dave Goodell (dgoodell)
ound for now though, and the "volatile" approach seems fine to me. -Dave

Re: [OMPI users] Using POSIX shared memory as send buffer

2015-09-28 Thread Dave Goodell (dgoodell)
numa_maps". There's lots of info about NUMA affinity here: https://queue.acm.org/detail.cfm?id=2513149 -Dave
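
For example, for a rank with process id $pid, the NUMA placement of its mappings (including /dev/shm segments used as the send buffer) can be inspected with something like:

  grep shm /proc/$pid/numa_maps   # the grep is just a convenience; drop it to see everything
  numactl --hardware              # how much memory each NUMA node actually has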

[OMPI users] experience on POWER?

2020-10-24 Thread Dave Love via users
Can anyone report experience with recent OMPI on POWER (ppc64le) hardware, e.g. Summit? When I tried on similar nodes to Summit's (but fewer!), the IMB-RMA benchmark SEGVs early on. Before I try to debug it, I'd be interested to know if anyone else has investigated that or had better luck and, if

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-23 Thread Dave Love via users
Mark Dixon via users writes: > Surely I cannot be the only one who cares about using a recent openmpi > with hdf5 on lustre? I generally have similar concerns. I dug out the romio tests, assuming something more basic is useful. I ran them with ompi 4.0.5+ucx on Mark's lustre system (similar to

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-25 Thread Dave Love via users
I wrote: > The perf test says romio performs a bit better. Also -- from overall > time -- it's faster on IMB-IO (which I haven't looked at in detail, and > ran with suboptimal striping). I take that back. I can't reproduce a significant difference for total IMB-IO runtime, with both run in par

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-27 Thread Dave Love via users
Mark Dixon via users writes: > But remember that IMB-IO doesn't cover everything. I don't know what useful operations it omits, but it was the obvious thing to run, that should show up pathology, with simple things first. It does at least run, which was the first concern. > For example, hdf5's

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-30 Thread Dave Love via users
As a check of mpiP, I ran HDF5 testpar/t_bigio under it. This was on one node with four ranks (interactively) on lustre with its default of one 1MB stripe, ompi-4.0.5 + ucx-1.9, hdf5-1.10.7, MCA defaults. I don't know how useful it is, but here's the summary: romio: @--- Aggregate Time (top t
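
For anyone repeating this: mpiP is normally applied either by linking with -lmpiP or by preloading its shared library at run time; a sketch of the latter (library path illustrative):

  export LD_PRELOAD=$HOME/mpiP/lib/libmpiP.so
  mpirun -np 4 --mca io ompio ./t_bigio    # and again with the romio component for comparison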

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-12-02 Thread Dave Love via users
Mark Allen via users writes: > At least for the topic of why romio fails with HDF5, I believe this is the > fix we need (has to do with how romio processes the MPI datatypes in its > flatten routine). I made a different fix a long time ago in SMPI for that, > then somewhat more recently it was r

[OMPI users] RMA breakage

2020-12-07 Thread Dave Love via users
After seeing several failures with RMA with the change needed to get 4.0.5 through IMB, I looked for simple tests. So, I built the mpich 3.4b1 tests -- or the ones that would build, and I haven't checked why some fail -- and ran the rma set. Three out of 180 passed. Many (most?) aborted in ucx,

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-12-07 Thread Dave Love via users
Ralph Castain via users writes: > Just a point to consider. OMPI does _not_ want to get in the mode of > modifying imported software packages. That is a blackhole of effort we > simply cannot afford. It's already done that, even in flatten.c. Otherwise updating to the current version would be t

Re: [OMPI users] [EXTERNAL] RMA breakage

2020-12-11 Thread Dave Love via users
"Pritchard Jr., Howard" writes: > Hello Dave, > > There's an issue opened about this - > > https://github.com/open-mpi/ompi/issues/8252 Thanks. I don't know why I didn't find that, unless I searched before it appeared. Obviously I was wrong to think it

[OMPI users] 4.1 mpi-io test failures on lustre

2021-01-14 Thread Dave Love via users
I tried mpi-io tests from mpich 4.3 with openmpi 4.1 on the ac922 system that I understand was used to fix ompio problems on lustre. I'm puzzled that I still see failures. I don't know why there are disjoint sets in mpich's test/mpi/io and src/mpi/romio/test, but I ran all the non-Fortran ones wi

[OMPI users] bad defaults with ucx

2021-01-14 Thread Dave Love via users
Why does 4.1 still not use the right defaults with UCX? Without specifying osc=ucx, IMB-RMA crashes as it did with 4.0.5. I haven't checked whatever else UCX says you must set for openmpi to avoid memory corruption, at least, but I guess that won't be right either. Users surely shouldn't have to explore
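
Concretely, the kind of invocation that avoids the crashes (component names as of the 4.x series; the btl exclusion follows the UCX documentation's advice quoted in the next message):

  mpirun -np 48 --mca pml ucx --mca osc ucx --mca btl ^uct ./IMB-RMA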

Re: [OMPI users] bad defaults with ucx

2021-01-14 Thread Dave Love via users
"Jeff Squyres (jsquyres)" writes: > Good question. I've filed > https://github.com/open-mpi/ompi/issues/8379 so that we can track > this. For the benefit of the list: I mis-remembered that osc=ucx was general advice. The UCX docs just say you need to avoid the uct btl, which can cause memory

Re: [OMPI users] 4.1 mpi-io test failures on lustre

2021-01-15 Thread Dave Love via users
"Gabriel, Edgar via users" writes: > I will have a look at those tests. The recent fixes were not > correctness, but performance fixes. > Nevertheless, we used to pass the mpich tests, but I admit that it is > not a testsuite that we run regularly, I will have a look at them. The > atomicity test

Re: [OMPI users] 4.1 mpi-io test failures on lustre

2021-01-18 Thread Dave Love via users
"Gabriel, Edgar via users" writes: >> How should we know that's expected to fail? It at least shouldn't fail like >> that; set_atomicity doesn't return an error (which the test is prepared for >> on a filesystem like pvfs2). >> I assume doing nothing, but appearing to, can lead to corrupt da

[OMPI users] vectorized reductions

2021-07-19 Thread Dave Love via users
I meant to ask a while ago about vectorized reductions after I saw a paper that I can't now find. I didn't understand what was behind it. Can someone explain why you need to hand-code the avx implementations of the reduction operations now used on x86_64? As far as I remember, the paper didn't j

Re: [OMPI users] vectorized reductions

2021-07-20 Thread Dave Love via users
Gilles Gouaillardet via users writes: > One motivation is packaging: a single Open MPI implementation has to be > built, that can run on older x86 processors (supporting only SSE) and the > latest ones (supporting AVX512). I take dispatch on micro-architecture for granted, but it doesn't require
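
For what it's worth, the hand-written kernels live in the selectable op/avx component, so the dispatch can at least be inspected and switched off at run time (a sketch; component names as of 4.1):

  ompi_info --param op avx --level 9   # shows which AVX flavours were built in
  mpirun --mca op ^avx -np 4 ./a.out   # exclude the AVX component and use the generic reductions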

Re: [OMPI users] Using OSU benchmarks for checking network

2022-02-08 Thread Dave Turner via users
work. It doesn't run global tests, but does point-to-point unidirectional, bi-directional, and aggregate and may give you some information about the performance change at 16 KB and whether it is coming from OpenMPI or IB. https://netpipe.cs.ksu.edu Dave Turner On Tue,
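
For reference, a typical NetPIPE MPI run between two hosts looks something like this (build target and binary name per the NetPIPE documentation; treat as a sketch):

  make mpi                            # builds the NPmpi executable
  mpirun -np 2 -H node1,node2 NPmpi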

[OMPI users] ucx configuration

2023-01-05 Thread Dave Love via users
I see assorted problems with OMPI 4.1 on IB, including failing many of the mpich tests (non-mpich-specific ones) particularly with RMA. Now I wonder if UCX build options could have anything to do with it, but I haven't found any relevant information. What configure options would be recommended wi
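
For reference, the usual pairing is UCX's own release configuration plus Open MPI's --with-ucx (a sketch; versions and paths illustrative):

  cd ucx-1.13.x       && ./contrib/configure-release --prefix=$HOME/ucx            && make -j8 install
  cd ../openmpi-4.1.x && ./configure --with-ucx=$HOME/ucx --prefix=$HOME/ompi      && make -j8 install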

Re: [OMPI users] ucx configuration

2023-01-11 Thread Dave Love via users
Gilles Gouaillardet via users writes: > Dave, > > If there is a bug you would like to report, please open an issue at > https://github.com/open-mpi/ompi/issues and provide all the required > information > (in this case, it should also include the UCX library you are usin

[OMPI users] Set maximum number of CPU (or threads) for a user

2023-06-26 Thread Dave Martin via users
to use -np 2 will not suffice. Thank you, Dave Martin
