Re: [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-22 Thread Gilles Gouaillardet

Daniel,

I double checked this and I cannot make any sense of these logs.

If coll_ml_priority is zero, then I do not see any way
ml_coll_hier_barrier_setup could be invoked.


Could you please run again with --mca coll_base_verbose 100,
both with and without --mca coll ^ml?
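For example (assuming two processes, as in your earlier test):

mpirun -n 2 --mca coll_base_verbose 100 ./hello
mpirun -n 2 --mca coll_base_verbose 100 --mca coll ^ml ./hello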

Cheers,

Gilles

On 6/22/2015 12:08 AM, Gilles Gouaillardet wrote:

Daniel,

ok, thanks

It seems that even if the priority is zero, some code gets executed.
I will confirm this tomorrow and send you a patch to work around the
issue if my guess proves right.


Cheers,

Gilles

On Sunday, June 21, 2015, Daniel Letai wrote:


MCA coll: parameter "coll_ml_priority" (current value: "0", data
source: default, level: 9 dev/all, type: int)

Not sure how to read this, but for any n>1 mpirun only works with
--mca coll ^ml

Thanks for helping

On 06/18/2015 04:36 PM, Gilles Gouaillardet wrote:

This is really odd...

you can run
ompi_info --all
and search coll_ml_priority

it will display the current value and the origin
(e.g. default, system wide config, user config, cli, environment
variable)
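
For example, piping through grep keeps just the relevant lines (assuming a
Unix shell):

ompi_info --all | grep coll_ml_priority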

Cheers,

Gilles

On Thursday, June 18, 2015, Daniel Letai wrote:

No, that's the issue.
I had to disable it to get things working.

That's why I included my config settings - I couldn't figure
out which option enabled it, so I could remove it from the
configuration...

On 06/18/2015 02:43 PM, Gilles Gouaillardet wrote:

Daniel,

ML module is not ready for production and is disabled by
default.

Did you explicitly enable this module ?
If yes, I encourage you to disable it

Cheers,

Gilles

On Thursday, June 18, 2015, Daniel Letai wrote:

given a simple hello.c:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    int size, rank, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &len);

    printf("%s: Process %d out of %d\n", name, rank, size);

    MPI_Finalize();
}
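
built with the Open MPI wrapper compiler, e.g. (assuming mpicc is on the
PATH; the exact flags should not matter here):

mpicc hello.c -o hello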

for n=1
mpirun -n 1 ./hello
it works correctly.

for n>1 it segfaults with signal 11.
I used gdb to trace the problem to the ml coll module:

Program received signal SIGSEGV, Segmentation fault.
0x76750845 in ml_coll_hier_barrier_setup()
from /lib/openmpi/mca_coll_ml.so

running with
mpirun -n 2 --mca coll ^ml ./hello
works correctly
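
The same exclusion can also be made persistent (assuming the standard MCA
parameter handling), either via the environment:

export OMPI_MCA_coll=^ml

or via a line in $HOME/.openmpi/mca-params.conf:

coll = ^ml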

using mellanox ofed 2.3-2.0.5-rhel6.4-x86_64, if it's at
all relevant.
openmpi 1.8.5 was built with following options:
rpmbuild --rebuild --define 'configure_options
--with-verbs=/usr --with-verbs-libdir=/usr/lib64 CC=gcc
CXX=g++ FC=gfortran CFLAGS="-g -O3"
--enable-mpirun-prefix-by-default
--enable-orterun-prefix-by-default --disable-debug
--with-knem=/opt/knem-1.1.1.90mlnx
--with-platform=optimized --without-mpi-param-check
--with-contrib-vt-flags=--disable-iotrace
--enable-builtin-atomics --enable-cxx-exceptions
--enable-sparse-groups --enable-mpi-thread-multiple
--enable-memchecker --enable-btl-openib-failover
--with-hwloc=internal --with-verbs --with-x --with-slurm
--with-pmi=/opt/slurm --with-fca=/opt/mellanox/fca
--with-mxm=/opt/mellanox/mxm
--with-hcoll=/opt/mellanox/hcoll' openmpi-1.8.5-1.src.rpm

gcc version 5.1.1

Thanks in advance

[OMPI users] PMIx 2.0: seeking input

2015-06-22 Thread Ralph Castain
Hello all

Sorry to "spam" the list, but we'd really like to get as wide a range of
input as possible on features for the next release of PMIx (see below). I
haven't attached

Ralph
==

PMIx 1.0 has now been released. If you haven’t looked at it, I invite you
to please do so. I’ve attached the API definitions, and you’ll find more
(slightly outdated, I’m afraid) here:

https://github.com/open-mpi/pmix/wiki

As a reminder, the intent behind PMIx is to transparently provide backward
compatibility for PMI-1 and PMI-2, while extending the APIs to support
advanced capabilities and providing exascale performance. Support by SLURM,
ORCM, and other RMs will be coming later this year. I am working right now
on completing the embedded support for OpenMPI, and hope to release that in
the next week or two - at that time, any job executed via mpirun will have
full support for PMIx functions.

I’d like to invite your input for the upcoming v2.0 APIs. Our initial plan
is to release 2.0 in time for SC15, with the expectation that we may not
have all the features implemented yet - whether we add them during the 2.0
series, or delay some to 3.0 remains TBD.

The initial thought is to focus 2.0 in the following areas - please note
that we would deeply appreciate the involvement of each relevant community,
so please feel free to forward this note and/or reach out to relevant
representatives:

1. Performance improvements
   * dynamic spawn/reap of listening threads to achieve target performance
of completing 1000 client connections in < 1 sec
   * shared memory use to reduce memory footprint (Elena has already sent
out some thoughts on this)

2. Fault response support
We currently provide application notification of faults (existing and
impending) that includes information on the impacted processes. However,
the response is currently limited to calling PMIx_Abort - i.e., the app can
take internal action, but the only request it can make of the RM is to
abort. We do allow for abort of specific procs as opposed to the entire
job, but we’d like to support a broader set of options. For example, the
app might request a coordinated checkpoint, ask for replacement nodes to be
allocated, or request immediate restart at a reduced size. (A minimal sketch
of the current abort-only pattern appears after this list.)

3. File system support
We would like to begin supporting file positioning directives - e.g.,
hot/warm/cold data movement, persistence requests to maintain files and/or
shared memory regions across job steps, and burst buffer management.

4. Network/fabric support
The existing notification capability can be used to notify of network
issues. However, there has been interest expressed in further interactions
that would allow an application to specify quality of service and security
requirements, request information on network topology, etc.

5. Power directives
On very large scale systems, it is expected that some form of power
management will be required or desired. Most of that happens at allocation
request time, but there may be some possible directives an app could want
to pass during execution. We’re open to suggestion.

6. Workflow support
We have the "spawn" support in PMIx 1.0, but that was designed expressly
for support of MPI applications. Other programming models may require
different or additional support. PMIx is intended to support a wide range
of models, and we'd welcome input on how workflows can be better supported.
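
To make item 2 above concrete: the only RM-directed response available today
is PMIx_Abort. A minimal, hypothetical sketch (not part of the proposal; it
assumes the PMIx 1.0 header, and abort_failed_peer is an illustrative name)
might look like:

#include <string.h>
#include <pmix.h>

/* hypothetical helper: ask the RM to abort only the failed peer,
   since PMIx_Abort is currently the sole action an app can request */
static void abort_failed_peer(const char *nspace, int rank)
{
    pmix_proc_t victim;

    memset(&victim, 0, sizeof(victim));
    strncpy(victim.nspace, nspace, PMIX_MAX_NSLEN);
    victim.rank = rank;

    /* the status code and message are reported back through the RM */
    (void) PMIx_Abort(1, "peer failed, aborting it", &victim, 1);
}

The richer responses listed under item 2 (coordinated checkpoint, replacement
nodes, restart at reduced size) would need new APIs beyond this call.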

Any other topics of interest are always welcome!
Ralph


[OMPI users] Anyone successfully running Abaqus with OpenMPI?

2015-06-22 Thread Belgin, Mehmet
Hi everyone,

We recently upgraded the hwloc library used by the schedulers (along with 
openmpi and all applications compiled with it), but apparently Abaqus is using 
an internal MPI (PMPI) which still points to old hwloc versions that are not 
compatible. As a result, Abaqus either crashes or runs extremely slow.

Abaqus documentation suggests that it may be possible to run it using an 
external MPI stack, and I am hoping to make it work with our stock 
openmpi/1.8.4 that knows how to talk with the scheduler's hwloc. Unfortunately, 
however, all of my attempts failed miserably so far (no specific instructions 
for openmpi).

I was wondering if anyone had success with getting Abaqus running with openmpi. 
Even the information of whether it is possible or not will help us a great deal.

Thanks for your help!

-Mehmet


PS: I sent a similar question to the mvapich2 list as well. If they respond 
with some information that can be applied to openmpi, I will happily share them 
here.


Re: [OMPI users] Anyone successfully running Abaqus with OpenMPI?

2015-06-22 Thread Brice Goglin
Hello
Can you send more details about the incompatibility between hwloc old
and recent versions? Maybe there's a workaround there.
hwloc is supposed to maintain compatibility but we've seen cases where
XML export/import doesn't work because the old version exports buggy
XMLs that the recent version considers as invalid.
Do you get hwloc warnings or error messages?
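One way to check the XML round-trip (a sketch; the install prefixes below
are assumptions, use whatever paths the old and new hwloc live under):

old-hwloc-prefix/bin/lstopo topo.xml           # export with the old hwloc
new-hwloc-prefix/bin/lstopo --input topo.xml   # re-import with the new hwloc

If the second command complains about the XML, that would match the
export/import incompatibility described above.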
Brice



On 23/06/2015 00:06, Belgin, Mehmet wrote:
>
> Hi everyone,
>
> We recently upgraded the hwloc library used by the schedulers (along
> with openmpi and all applications compiled with it), but apparently
> Abaqus is using an internal MPI (PMPI) which still points to old hwloc
> versions that are not compatible. As a result, Abaqus either crashes
> or runs extremely slow.
>
> Abaqus documentation suggests that it may be possible to run it using
> an external MPI stack, and I am hoping to make it work with our stock
> openmpi/1.8.4 that knows how to talk with the scheduler's hwloc.
> Unfortunately, however, all of my attempts failed miserably so far (no
> specific instructions for openmpi).
>
> I was wondering if anyone had success with getting Abaqus running with
> openmpi. Even the information of whether it is possible or not will
> help us a great deal.
>
> Thanks for your help!
>
> -Mehmet
>
>
> PS: I sent a similar question to the mvapich2 list as well. If they
> respond with some information that can be applied to openmpi, I will
> happily share them here.
>
>
>



Re: [OMPI users] Anyone successfully running Abaqus with OpenMPI?

2015-06-22 Thread Tim Prince


On 6/22/2015 6:06 PM, Belgin, Mehmet wrote:
>
>
> Abaqus documentation suggests that it may be possible to run it using
> an external MPI stack, and I am hoping to make it work with our stock
> openmpi/1.8.4 that knows how to talk with the scheduler's hwloc.
> Unfortunately, however, all of my attempts failed miserably so far (no
> specific instructions for openmpi).
>
> I was wondering if anyone had success with getting Abaqus running with
> openmpi. Even the information of whether it is possible or not will
> help us a great deal.
>
>
Data type encodings are incompatible between openmpi and mpich
derivatives, and, I think, with the HP or Platform-MPI normally used by
past Abaqus releases.  You should be looking at Abaqus release notes for
your version.
Comparing include files between the various MPI families should give you
a clue about type encoding compatibility.  Lack of instructions for
openmpi probably means something.
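
For illustration (abridged and from memory, not verbatim from either header -
check the actual mpi.h files): mpich-derived implementations define handles
as plain integers, while openmpi defines them as pointers to opaque structs,
so the two ABIs cannot be mixed:

/* mpich-family mpi.h (abridged): handles are integer constants */
typedef int MPI_Comm;
#define MPI_COMM_WORLD ((MPI_Comm)0x44000000)

/* openmpi mpi.h (abridged): handles are pointers to opaque structs */
typedef struct ompi_communicator_t *MPI_Comm;
/* MPI_COMM_WORLD resolves to the address of a predefined global object */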

-- 
Tim Prince