Re: [OMPI users] ucx configuration

2023-01-11 Thread Dave Love via users
Gilles Gouaillardet via users writes:
> Dave,
>
> If there is a bug you would like to report, please open an issue at
> https://github.com/open-mpi/ompi/issues and provide all the required information
> (in this case, it should also include the UCX library you are using and how it was obtaine…

[OMPI users] ucx configuration

2023-01-05 Thread Dave Love via users
I see assorted problems with OMPI 4.1 on IB, including failing many of the mpich tests (the non-mpich-specific ones), particularly with RMA. Now I wonder whether UCX build options could have anything to do with it, but I haven't found any relevant information. What configure options would be recommended wi…
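One way to take distro packaging out of the picture when chasing problems like this is to build UCX and Open MPI explicitly against each other, so the UCX build options are known. A sketch, with hypothetical install prefixes; `contrib/configure-release` is UCX's recommended release-build wrapper, and `--with-ucx` is the Open MPI configure option that points at it:

```shell
# Hypothetical prefixes -- a sketch, not a recommendation of specific options.
# Build UCX with its release settings:
./contrib/configure-release --prefix=$HOME/ucx-install
make -j && make install

# Then build Open MPI against exactly that UCX:
./configure --prefix=$HOME/ompi-install \
    --with-ucx=$HOME/ucx-install --without-verbs
make -j && make install
```

`--without-verbs` avoids the legacy openib path so UCX is the only IB transport in play.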

Re: [OMPI users] vectorized reductions

2021-07-20 Thread Dave Love via users
Gilles Gouaillardet via users writes:
> One motivation is packaging: a single Open MPI implementation has to be
> built, that can run on older x86 processors (supporting only SSE) and the
> latest ones (supporting AVX512).

I take dispatch on micro-architecture for granted, but it doesn't require…

[OMPI users] vectorized reductions

2021-07-19 Thread Dave Love via users
I meant to ask a while ago about vectorized reductions, after I saw a paper that I can't now find. I didn't understand what was behind it. Can someone explain why you need to hand-code the AVX implementations of the reduction operations now used on x86_64? As far as I remember, the paper didn't j…

Re: [OMPI users] 4.1 mpi-io test failures on lustre

2021-01-18 Thread Dave Love via users
"Gabriel, Edgar via users" writes:
>> How should we know that's expected to fail? It at least shouldn't fail like
>> that; set_atomicity doesn't return an error (which the test is prepared for
>> on a filesystem like pvfs2).
>> I assume doing nothing, but appearing to, can lead to corrupt da…

Re: [OMPI users] 4.1 mpi-io test failures on lustre

2021-01-15 Thread Dave Love via users
"Gabriel, Edgar via users" writes:
> I will have a look at those tests. The recent fixes were not
> correctness, but performance fixes.
> Nevertheless, we used to pass the mpich tests, but I admit that it is
> not a testsuite that we run regularly, I will have a look at them. The
> atomicity test…

Re: [OMPI users] bad defaults with ucx

2021-01-14 Thread Dave Love via users
"Jeff Squyres (jsquyres)" writes:
> Good question. I've filed
> https://github.com/open-mpi/ompi/issues/8379 so that we can track
> this.

For the benefit of the list: I mis-remembered that osc=ucx was general advice. The UCX docs just say you need to avoid the uct btl, which can cause memory…

[OMPI users] bad defaults with ucx

2021-01-14 Thread Dave Love via users
Why does 4.1 still not use the right defaults with UCX? Without specifying osc=ucx, IMB-RMA crashes like 4.0.5. I haven't checked what else UCX says you must set for Open MPI to avoid memory corruption, at least, but I guess that won't be right either. Users surely shouldn't have to explore…
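For reference, the workaround being discussed is to select the UCX components explicitly rather than relying on the defaults. A sketch of the invocation (the benchmark name is from this thread; whether all three settings are needed on a given system is exactly the open question here):

```shell
# Force the UCX PML and one-sided components, and exclude the uct btl
# (which the UCX docs warn can cause memory corruption).
mpirun --mca pml ucx --mca osc ucx --mca btl ^uct ./IMB-RMA
```

The same settings can be made site-wide in an `mca-params.conf` file instead of on every command line.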

[OMPI users] 4.1 mpi-io test failures on lustre

2021-01-14 Thread Dave Love via users
I tried mpi-io tests from mpich 3.4 with openmpi 4.1 on the ac922 system that I understand was used to fix ompio problems on lustre. I'm puzzled that I still see failures. I don't know why there are disjoint sets in mpich's test/mpi/io and src/mpi/romio/test, but I ran all the non-Fortran ones wi…

Re: [OMPI users] [EXTERNAL] RMA breakage

2020-12-11 Thread Dave Love via users
"Pritchard Jr., Howard" writes:
> Hello Dave,
>
> There's an issue opened about this:
> https://github.com/open-mpi/ompi/issues/8252

Thanks. I don't know why I didn't find that, unless I searched before it appeared. Obviously I was wrong to think it didn't look system-specific without time…

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-12-07 Thread Dave Love via users
Ralph Castain via users writes:
> Just a point to consider. OMPI does _not_ want to get in the mode of
> modifying imported software packages. That is a blackhole of effort we
> simply cannot afford.

It's already done that, even in flatten.c. Otherwise updating to the current version would be t…

[OMPI users] RMA breakage

2020-12-07 Thread Dave Love via users
After seeing several failures with RMA with the change needed to get 4.0.5 through IMB, I looked for simple tests. So I built the mpich 3.4b1 tests -- or the ones that would build, and I haven't checked why some fail -- and ran the rma set. Three out of 180 passed. Many (most?) aborted in ucx…

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-12-02 Thread Dave Love via users
Mark Allen via users writes:
> At least for the topic of why romio fails with HDF5, I believe this is the
> fix we need (has to do with how romio processes the MPI datatypes in its
> flatten routine). I made a different fix a long time ago in SMPI for that,
> then somewhat more recently it was r…

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-30 Thread Dave Love via users
As a check of mpiP, I ran HDF5 testpar/t_bigio under it. This was on one node with four ranks (interactively) on lustre with its default of one 1MB stripe, ompi-4.0.5 + ucx-1.9, hdf5-1.10.7, MCA defaults. I don't know how useful it is, but here's the summary:

romio: @--- Aggregate Time (top t…

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-27 Thread Dave Love via users
Mark Dixon via users writes:
> But remember that IMB-IO doesn't cover everything.

I don't know what useful operations it omits, but it was the obvious thing to run, that should show up pathology, with simple things first. It does at least run, which was the first concern.

> For example, hdf5's…

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-25 Thread Dave Love via users
I wrote:
> The perf test says romio performs a bit better. Also -- from overall
> time -- it's faster on IMB-IO (which I haven't looked at in detail, and
> ran with suboptimal striping).

I take that back. I can't reproduce a significant difference for total IMB-IO runtime, with both run in par…

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-23 Thread Dave Love via users
Mark Dixon via users writes:
> Surely I cannot be the only one who cares about using a recent openmpi
> with hdf5 on lustre?

I generally have similar concerns. I dug out the romio tests, assuming something more basic is useful. I ran them with ompi 4.0.5+ucx on Mark's lustre system (similar to…
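For anyone wanting to reproduce this kind of OMPIO-vs-ROMIO comparison: Open MPI lets you pick the MPI-IO backend per run through the `io` MCA framework. A sketch (the test binary is a placeholder; the bundled ROMIO component is named `romio321` in the 4.x series, so check `ompi_info` for the name your build actually has):

```shell
# Run the same MPI-IO test under each backend:
mpirun --mca io ompio    ./io_test    # OMPIO
mpirun --mca io romio321 ./io_test    # bundled ROMIO
```

`ompi_info --all | grep -i romio` shows which ROMIO component is present.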

[OMPI users] experience on POWER?

2020-10-24 Thread Dave Love via users
Can anyone report experience with recent OMPI on POWER (ppc64le) hardware, e.g. Summit? When I tried on similar nodes to Summit's (but fewer!), the IMB-RMA benchmark SEGVs early on. Before I try to debug it, I'd be interested to know if anyone else has investigated that or had better luck and, if…