Re: [OMPI users] stdout/stderr question

2018-09-10 Thread Ralph H Castain
I’m not sure why this would be happening. These error outputs go through the 
“show_help” functionality, and we specifically target it at stderr:

/* create an output stream for us */
OBJ_CONSTRUCT(&lds, opal_output_stream_t);
lds.lds_want_stderr = true;
orte_help_output = opal_output_open(&lds);

Jeff: is it possible the opal_output system is ignoring the request and pushing 
it to stdout??
Ralph


> On Sep 5, 2018, at 4:11 AM, emre brookes  wrote:
> 
> Thanks Gilles,
> 
> My goal is to separate openmpi errors from the stdout of the MPI program 
> itself so that errors can be identified externally (in particular in an 
> external framework running MPI jobs from various developers).
> 
> My not so "well written MPI program" was doing this:
>   MPI_Finalize();
>   exit( errorcode );
> Which I assume you are telling me was bad practice & will replace with
>   MPI_Abort( MPI_COMM_WORLD, errorcode );
>   MPI_Finalize();
>   exit( errorcode );
> I was previously a bit put off of MPI_Abort due to the vagueness of the man 
> page:
>> _Description_
>> This routine makes a "best attempt" to abort all tasks in the group of comm. 
>> This function does not require that the invoking environment take any action 
>> with the error code. However, a UNIX or POSIX environment should handle this 
>> as a return errorcode from the main program or an abort (errorcode). 
> & I didn't really have an MPI issue to "Abort", but had used this for a user 
> input or parameter issue.
> Nevertheless, I accept your best practice recommendation.
> 
> It was not only the originally reported message, other messages went to 
> stdout.
> Initially used the Ubuntu 16 LTS  "$ apt install openmpi-bin libopenmpi-dev" 
> which got me version (1.10.2),
> but this morning compiled and tested 2.1.5, with the same behavior, e.g.:
> 
> $ /src/ompi-2.1.5/bin/mpicxx test_using_mpi_abort.cpp
> $ /src/ompi-2.1.5/bin/mpirun -np 2 a.out > stdout
> [domain-name-embargoed:26078] 1 more process has sent help message 
> help-mpi-api.txt / mpi-abort
> [domain-name-embargoed:26078] Set MCA parameter "orte_base_help_aggregate" to 
> 0 to see all help / error messages
> $ cat stdout
> hello from 0
> hello from 1
> --
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode -1.
> 
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --
> $
> 
> Tested 3.1.2, where this has been *somewhat* fixed:
> 
> $ /src/ompi-3.1.2/bin/mpicxx test_using_mpi_abort.cpp
> $ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
> --
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode -1.
> 
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --
> [domain-name-embargoed:19784] 1 more process has sent help message 
> help-mpi-api.txt / mpi-abort
> [domain-name-embargoed:19784] Set MCA parameter "orte_base_help_aggregate" to 
> 0 to see all help / error messages
> $ cat stdout
> hello from 1
> hello from 0
> $
> 
> But the originally reported error still goes to stdout:
> 
> $ /src/ompi-3.1.2/bin/mpicxx test_without_mpi_abort.cpp
> $ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
> --
> mpirun detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>  Process name: [[22380,1],0]
>  Exit code:255
> --
> $ cat stdout
> hello from 0
> hello from 1
> ---
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> ---
> $
> 
> Summary:
> 1.10.2, 2.1.5 both send most openmpi generated messages to stdout.
> 3.1.2 sends at least one type of openmpi generated messages to stdout.
> I'll continue with my "wrapper" strategy for now, as it seems safest and
> most broadly deployable [e.g. on compute resources where I need to use admin 
> installed versions of MPI],
> but it would be nice for openmpi to ensure all generated messages end up in 
> stderr.
> 
> -Emre
> 
> Gilles Gouaillardet wrote:
>> Open MPI should likely write this message on stderr, I will have a look at 
>> that.
>> 
>> 
>> That being said, and though I have no intention to dodge the question, this 
>> case should not happen.
>> 
>> A well w
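
For reference, the two test programs discussed in this thread were not posted;
minimal sketches along the following lines reproduce the behavior shown above
(error codes chosen to match the reported output):

    // test_using_mpi_abort.cpp (sketch)
    #include <mpi.h>
    #include <cstdio>

    int main( int argc, char **argv ) {
        MPI_Init( &argc, &argv );
        int rank = 0;
        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        printf( "hello from %d\n", rank );
        fflush( stdout );
        // abort with errorcode -1, as recommended for error paths
        MPI_Abort( MPI_COMM_WORLD, -1 );
        MPI_Finalize();   // not reached
        return 0;
    }

    // test_without_mpi_abort.cpp (sketch)
    #include <mpi.h>
    #include <cstdio>
    #include <cstdlib>

    int main( int argc, char **argv ) {
        MPI_Init( &argc, &argv );
        int rank = 0;
        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        printf( "hello from %d\n", rank );
        fflush( stdout );
        MPI_Finalize();
        // rank 0 returns a non-zero status after MPI_Finalize();
        // mpirun reports it as exit code 255
        exit( rank == 0 ? -1 : 0 );
    }

Redirecting the two streams separately shows which one carries each message:

    $ mpirun -np 2 a.out > app.stdout 2> app.stderr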

[OMPI users] *** Error in `orted': double free or corruption (out): 0x00002aaab4001680 ***, in some node combos.

2018-09-10 Thread Balazs HAJGATO
Dear list readers,

I have some problems with OpenMPI 3.1.1. In some node combos, I got the error 
(libibverbs: GRH is mandatory For RoCE address handle; *** Error in 
`/apps/brussel/CO7/ivybridge-ib/software/OpenMPI/3.1.1-GCC-7.3.0-2.30/bin/orted':
 double free or corruption (out): 0x2aaab4001680 ***), see details in file 
114_151.out.bz2, even with the simplest run, such as
mpirun -host nic114,nic151 hostname
In the file 114_151.out.bz2, you can see the output if I run the command from 
nic114. If I run the same command from nic151, it simply spits out the 
hostnames, without any errors. 

I also enclosed the ompi_info --all --parsable outputs from nic114 (nic151 is 
identical, see ompi.nic114.bz2). I do not have the config.log file, although I 
still have the config output (see config.out.bz2). The nodes have identical 
operating systems (as we use the same image), and OpenMPI is also loaded from a 
central directory shared amongst the nodes. We have an infiniband network (with 
IP over IB) and an Ethernet network. Intel MPI works without a problem (and I 
confirmed that the network is IB when I use Intel MPI). It is not clear 
whether the orted error is a consequence of the libibverbs error, nor is it 
clear why OpenMPI wants to use RoCE at all. (ibv_devinfo is also attached, 
we do have a somewhat creative infiniband topology, based on fat-tree, but 
changing the topology did not solve the problem). The /tmp directory is 
writable and not full. As a matter of fact, I get the same error with 
OpenMPI 2.0.2 and 2.1.1, and I do not get it with OpenMPI 
1.10.2 and 1.10.3. Does anyone have any thoughts about this issue?

Regards,

Balazs Hajgato


ibv_dev.nic114
Description: ibv_dev.nic114


ibv_dev.nic151
Description: ibv_dev.nic151


114_151.out.bz2
Description: 114_151.out.bz2


config.out.bz2
Description: config.out.bz2


ompi.nic114.bz2
Description: ompi.nic114.bz2

Re: [OMPI users] RDMA over Ethernet in Open MPI - RoCE on AWS?

2018-09-10 Thread Barrett, Brian via users
It sounds like what you’re asking is “how do I get the best performance from 
Open MPI in AWS?”.

The TCP BTL is your best option for performance in AWS.  RoCE is going to be a 
bunch of work to get set up, and you’ll still end up with host processing of 
every packet.  There are a couple of simple instance tweaks that can make a big 
difference.  AWS has published a very nice guide for setting up an EDA workload 
environment [1], which has a number of useful tweaks, particularly if you’re 
using C4 or earlier compute instances.  The biggest improvement, however, is to 
make sure you’re using Open MPI 2.1.2 or newer.  We fixed some fairly serious 
performance issues in the Open MPI TCP stack in 2.1.2 (issues that, humorously 
enough, were also in the MPICH TCP stack and have been fixed there as well).

Given that your application is fairly asynchronous, you might want to 
experiment with the btl_tcp_progress_thread MCA parameter.  If your application 
benefits from asynchronous progress, using a progress thread might be the best 
option.
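
For example, one way to try it (illustrative only; the process count and binary 
name are placeholders):

    mpirun --mca btl_tcp_progress_thread 1 -np 16 ./your_app

Whether it helps depends on how much communication the application can overlap 
with computation.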

Brian

> On Sep 6, 2018, at 7:10 PM, Benjamin Brock  wrote:
> 
> I'm setting up a cluster on AWS, which will have a 10Gb/s or 25Gb/s Ethernet 
> network.  Should I expect to be able to get RoCE to work in Open MPI on AWS?
> 
> More generally, what optimizations and performance tuning can I do to an Open 
> MPI installation to get good performance on an Ethernet network?
> 
> My codes use a lot of random access AMOs and asynchronous block transfers, so 
> it seems to me like setting up RDMA over Ethernet would be essential to 
> getting good performance, but I can't seem to find much information about it 
> online.
> 
> Any pointers you have would be appreciated.
> 
> Ben

Re: [OMPI users] stdout/stderr question

2018-09-10 Thread Gilles Gouaillardet
I investigated this a bit and found that the (latest?) v3 branches 
have the expected behavior


(e.g. the error messages are sent to stderr)


Since it is very unlikely Open MPI 2.1 will ever be updated, I can 
simply encourage you to upgrade to a newer Open MPI version.


The latest fully supported versions are currently 3.1.2 and 3.0.2.



Cheers,

Gilles




On 9/11/2018 2:27 AM, Ralph H Castain wrote:

I’m not sure why this would be happening. These error outputs go through the 
“show_help” functionality, and we specifically target it at stderr:

 /* create an output stream for us */
 OBJ_CONSTRUCT(&lds, opal_output_stream_t);
 lds.lds_want_stderr = true;
 orte_help_output = opal_output_open(&lds);

Jeff: is it possible the opal_output system is ignoring the request and pushing 
it to stdout??
Ralph



On Sep 5, 2018, at 4:11 AM, emre brookes  wrote:

Thanks Gilles,

My goal is to separate openmpi errors from the stdout of the MPI program itself 
so that errors can be identified externally (in particular in an external 
framework running MPI jobs from various developers).

My not so "well written MPI program" was doing this:
   MPI_Finalize();
   exit( errorcode );
Which I assume you are telling me was bad practice & will replace with
   MPI_Abort( MPI_COMM_WORLD, errorcode );
   MPI_Finalize();
   exit( errorcode );
I was previously a bit put off of MPI_Abort due to the vagueness of the man 
page:

_Description_
This routine makes a "best attempt" to abort all tasks in the group of comm. 
This function does not require that the invoking environment take any action with the 
error code. However, a UNIX or POSIX environment should handle this as a return errorcode 
from the main program or an abort (errorcode).

& I didn't really have an MPI issue to "Abort", but had used this for a user 
input or parameter issue.
Nevertheless, I accept your best practice recommendation.

It was not only the originally reported message, other messages went to stdout.
Initially used the Ubuntu 16 LTS  "$ apt install openmpi-bin libopenmpi-dev" 
which got me version (1.10.2),
but this morning compiled and tested 2.1.5, with the same behavior, e.g.:

$ /src/ompi-2.1.5/bin/mpicxx test_using_mpi_abort.cpp
$ /src/ompi-2.1.5/bin/mpirun -np 2 a.out > stdout
[domain-name-embargoed:26078] 1 more process has sent help message 
help-mpi-api.txt / mpi-abort
[domain-name-embargoed:26078] Set MCA parameter "orte_base_help_aggregate" to 0 
to see all help / error messages
$ cat stdout
hello from 0
hello from 1
--
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--
$

Tested 3.1.2, where this has been *somewhat* fixed:

$ /src/ompi-3.1.2/bin/mpicxx test_using_mpi_abort.cpp
$ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
--
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--
[domain-name-embargoed:19784] 1 more process has sent help message 
help-mpi-api.txt / mpi-abort
[domain-name-embargoed:19784] Set MCA parameter "orte_base_help_aggregate" to 0 
to see all help / error messages
$ cat stdout
hello from 1
hello from 0
$

But the originally reported error still goes to stdout:

$ /src/ompi-3.1.2/bin/mpicxx test_without_mpi_abort.cpp
$ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
--
mpirun detected that one or more processes exited with non-zero status, thus 
causing
the job to be terminated. The first process to do so was:

  Process name: [[22380,1],0]
  Exit code:255
--
$ cat stdout
hello from 0
hello from 1
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
---
$

Summary:
1.10.2, 2.1.5 both send most openmpi generated messages to stdout.
3.1.2 sends at least one type of openmpi generated messages to stdout.
I'll continue with my "wrapper" strategy for now, as it seems safest and
most broadly deployable [e.g. on compute resources where I need to use admin 
installed versions of MPI],
but it would be nice for openmpi to ensure all generated messages end up in 
stderr.

-Emre

Gilles Gouaillardet wrote:

Open 

Re: [OMPI users] stdout/stderr question

2018-09-10 Thread emre brookes

Gilles Gouaillardet wrote:
I investigated this a bit and found that the (latest?) v3 branches 
have the expected behavior


(e.g. the error messages are sent to stderr)


Since it is very unlikely Open MPI 2.1 will ever be updated, I can 
simply encourage you to upgrade to a newer Open MPI version.


The latest fully supported versions are currently 3.1.2 and 3.0.2.



Cheers,

Gilles



So you tested 3.1.2 or something newer with this error?


But the originally reported error still goes to stdout:

$ /src/ompi-3.1.2/bin/mpicxx test_without_mpi_abort.cpp
$ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
-- 

mpirun detected that one or more processes exited with non-zero 
status, thus causing

the job to be terminated. The first process to do so was:

  Process name: [[22380,1],0]
  Exit code:255
-- 


$ cat stdout
hello from 0
hello from 1
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
---
$

-Emre





On 9/11/2018 2:27 AM, Ralph H Castain wrote:
I’m not sure why this would be happening. These error outputs go 
through the “show_help” functionality, and we specifically target it 
at stderr:


 /* create an output stream for us */
 OBJ_CONSTRUCT(&lds, opal_output_stream_t);
 lds.lds_want_stderr = true;
 orte_help_output = opal_output_open(&lds);

Jeff: is it possible the opal_output system is ignoring the request 
and pushing it to stdout??

Ralph



On Sep 5, 2018, at 4:11 AM, emre brookes  wrote:

Thanks Gilles,

My goal is to separate openmpi errors from the stdout of the MPI 
program itself so that errors can be identified externally (in 
particular in an external framework running MPI jobs from various 
developers).


My not so "well written MPI program" was doing this:
   MPI_Finalize();
   exit( errorcode );
Which I assume you are telling me was bad practice & will replace with
   MPI_Abort( MPI_COMM_WORLD, errorcode );
   MPI_Finalize();
   exit( errorcode );
I was previously a bit put off of MPI_Abort due to the vagueness of 
the man page:

_Description_
This routine makes a "best attempt" to abort all tasks in the group 
of comm. This function does not require that the invoking 
environment take any action with the error code. However, a UNIX or 
POSIX environment should handle this as a return errorcode from the 
main program or an abort (errorcode).
& I didn't really have an MPI issue to "Abort", but had used this 
for a user input or parameter issue.

Nevertheless, I accept your best practice recommendation.

It was not only the originally reported message, other messages went 
to stdout.
Initially used the Ubuntu 16 LTS  "$ apt install openmpi-bin 
libopenmpi-dev" which got me version (1.10.2),
but this morning compiled and tested 2.1.5, with the same behavior, 
e.g.:


$ /src/ompi-2.1.5/bin/mpicxx test_using_mpi_abort.cpp
$ /src/ompi-2.1.5/bin/mpirun -np 2 a.out > stdout
[domain-name-embargoed:26078] 1 more process has sent help message 
help-mpi-api.txt / mpi-abort
[domain-name-embargoed:26078] Set MCA parameter 
"orte_base_help_aggregate" to 0 to see all help / error messages

$ cat stdout
hello from 0
hello from 1
-- 


MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
-- 


$

Tested 3.1.2, where this has been *somewhat* fixed:

$ /src/ompi-3.1.2/bin/mpicxx test_using_mpi_abort.cpp
$ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
-- 


MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
-- 

[domain-name-embargoed:19784] 1 more process has sent help message 
help-mpi-api.txt / mpi-abort
[domain-name-embargoed:19784] Set MCA parameter 
"orte_base_help_aggregate" to 0 to see all help / error messages

$ cat stdout
hello from 1
hello from 0
$

But the originally reported error still goes to stdout:

$ /src/ompi-3.1.2/bin/mpicxx test_without_mpi_abort.cpp
$ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
-- 

mpirun detected that one or more processes exited with non-zero 
status, thus causing

th

Re: [OMPI users] stdout/stderr question

2018-09-10 Thread Gilles Gouaillardet

It seems I got it wrong :-(


Can you please give the attached patch a try?


FWIW, another option would be to opal_output(orte_help_output, ...), but 
we would have to make orte_help_output public first.
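
Roughly, the alternative would look like this (sketch only; it assumes 
orte_help_output were exported rather than kept private to show_help.c, and 
"msg" is a placeholder for the message text):

    extern int orte_help_output;               /* the stream opened with lds_want_stderr = true */
    opal_output( orte_help_output, "%s", msg );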



Cheers,


Gilles




On 9/11/2018 11:14 AM, emre brookes wrote:

Gilles Gouaillardet wrote:
I investigated this a bit and found that the (latest?) v3 branches 
have the expected behavior


(e.g. the error messages are sent to stderr)


Since it is very unlikely Open MPI 2.1 will ever be updated, I can 
simply encourage you to upgrade to a newer Open MPI version.


The latest fully supported versions are currently 3.1.2 and 3.0.2.



Cheers,

Gilles



So you tested 3.1.2 or something newer with this error?


But the originally reported error still goes to stdout:

$ /src/ompi-3.1.2/bin/mpicxx test_without_mpi_abort.cpp
$ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
-- 

mpirun detected that one or more processes exited with non-zero 
status, thus causing

the job to be terminated. The first process to do so was:

  Process name: [[22380,1],0]
  Exit code:    255
-- 


$ cat stdout
hello from 0
hello from 1
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
---
$

-Emre





On 9/11/2018 2:27 AM, Ralph H Castain wrote:
I’m not sure why this would be happening. These error outputs go 
through the “show_help” functionality, and we specifically target it 
at stderr:


 /* create an output stream for us */
 OBJ_CONSTRUCT(&lds, opal_output_stream_t);
 lds.lds_want_stderr = true;
 orte_help_output = opal_output_open(&lds);

Jeff: is it possible the opal_output system is ignoring the request 
and pushing it to stdout??

Ralph



On Sep 5, 2018, at 4:11 AM, emre brookes  wrote:

Thanks Gilles,

My goal is to separate openmpi errors from the stdout of the MPI 
program itself so that errors can be identified externally (in 
particular in an external framework running MPI jobs from various 
developers).


My not so "well written MPI program" was doing this:
   MPI_Finalize();
   exit( errorcode );
Which I assume you are telling me was bad practice & will replace with
   MPI_Abort( MPI_COMM_WORLD, errorcode );
   MPI_Finalize();
   exit( errorcode );
I was previously a bit put off of MPI_Abort due to the vagueness of 
the man page:

_Description_
This routine makes a "best attempt" to abort all tasks in the 
group of comm. This function does not require that the invoking 
environment take any action with the error code. However, a UNIX 
or POSIX environment should handle this as a return errorcode from 
the main program or an abort (errorcode).
& I didn't really have an MPI issue to "Abort", but had used this 
for a user input or parameter issue.

Nevertheless, I accept your best practice recommendation.

It was not only the originally reported message, other messages 
went to stdout.
Initially used the Ubuntu 16 LTS  "$ apt install openmpi-bin 
libopenmpi-dev" which got me version (1.10.2),
but this morning compiled and tested 2.1.5, with the same behavior, 
e.g.:


$ /src/ompi-2.1.5/bin/mpicxx test_using_mpi_abort.cpp
$ /src/ompi-2.1.5/bin/mpirun -np 2 a.out > stdout
[domain-name-embargoed:26078] 1 more process has sent help message 
help-mpi-api.txt / mpi-abort
[domain-name-embargoed:26078] Set MCA parameter 
"orte_base_help_aggregate" to 0 to see all help / error messages

$ cat stdout
hello from 0
hello from 1
-- 


MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
-- 


$

Tested 3.1.2, where this has been *somewhat* fixed:

$ /src/ompi-3.1.2/bin/mpicxx test_using_mpi_abort.cpp
$ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
-- 


MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
-- 

[domain-name-embargoed:19784] 1 more process has sent help message 
help-mpi-api.txt / mpi-abort
[domain-name-embargoed:19784] Set MCA parameter 
"orte_base_help_aggregate" to 0 to see all help / error messages

$ cat stdout
hello from 1
hello from 0
$

But the originally reported error still goes to stdout

Re: [OMPI users] stdout/stderr question

2018-09-10 Thread Ralph H Castain
Looks like there is a place in orte/mca/state/state_base_fns.c:850 that also 
outputs to orte_clean_output instead of using show_help. Outside of those two 
places, everything else seems to go to show_help.
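
In other words, something along these lines (illustrative only; the actual 
message, help file, and topic at that call site differ):

    /* current pattern at state_base_fns.c:850: message goes through the generic stream */
    opal_output( orte_clean_output, "%s", message );

    /* show_help pattern: routed through the stderr-targeted help machinery */
    orte_show_help( "help-orte-runtime.txt", "some-topic", true, message );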


> On Sep 10, 2018, at 8:58 PM, Gilles Gouaillardet  wrote:
> 
> It seems I got it wrong :-(
> 
> 
> Can you please give the attached patch a try?
> 
> 
> FWIW, another option would be to opal_output(orte_help_output, ...), but we 
> would have to make orte_help_output public first.
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> 
> 
> 
> On 9/11/2018 11:14 AM, emre brookes wrote:
>> Gilles Gouaillardet wrote:
>>> I investigated this a bit and found that the (latest?) v3 branches have 
>>> the expected behavior
>>> 
>>> (e.g. the error messages are sent to stderr)
>>> 
>>> 
>>> Since it is very unlikely Open MPI 2.1 will ever be updated, I can simply 
>>> encourage you to upgrade to a newer Open MPI version.
>>> 
>>> The latest fully supported versions are currently 3.1.2 and 3.0.2.
>>> 
>>> 
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> 
>> So you tested 3.1.2 or something newer with this error?
>> 
>>> But the originally reported error still goes to stdout:
>>> 
>>> $ /src/ompi-3.1.2/bin/mpicxx test_without_mpi_abort.cpp
>>> $ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
>>> -- 
>>> mpirun detected that one or more processes exited with non-zero status, 
>>> thus causing
>>> the job to be terminated. The first process to do so was:
>>> 
>>>   Process name: [[22380,1],0]
>>>   Exit code:255
>>> -- 
>>> $ cat stdout
>>> hello from 0
>>> hello from 1
>>> ---
>>> Primary job  terminated normally, but 1 process returned
>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>> ---
>>> $
>> -Emre
>> 
>> 
>> 
>>> 
>>> On 9/11/2018 2:27 AM, Ralph H Castain wrote:
 I’m not sure why this would be happening. These error outputs go through 
 the “show_help” functionality, and we specifically target it at stderr:
 
  /* create an output stream for us */
  OBJ_CONSTRUCT(&lds, opal_output_stream_t);
  lds.lds_want_stderr = true;
  orte_help_output = opal_output_open(&lds);
 
 Jeff: is it possible the opal_output system is ignoring the request and 
 pushing it to stdout??
 Ralph
 
 
> On Sep 5, 2018, at 4:11 AM, emre brookes  wrote:
> 
> Thanks Gilles,
> 
> My goal is to separate openmpi errors from the stdout of the MPI program 
> itself so that errors can be identified externally (in particular in an 
> external framework running MPI jobs from various developers).
> 
> My not so "well written MPI program" was doing this:
>MPI_Finalize();
>exit( errorcode );
> Which I assume you are telling me was bad practice & will replace with
>MPI_Abort( MPI_COMM_WORLD, errorcode );
>MPI_Finalize();
>exit( errorcode );
> I was previously a bit put off of MPI_Abort due to the vagueness of the 
> man page:
>> _Description_
>> This routine makes a "best attempt" to abort all tasks in the group of 
>> comm. This function does not require that the invoking environment take 
>> any action with the error code. However, a UNIX or POSIX environment 
>> should handle this as a return errorcode from the main program or an 
>> abort (errorcode).
> & I didn't really have an MPI issue to "Abort", but had used this for a 
> user input or parameter issue.
> Nevertheless, I accept your best practice recommendation.
> 
> It was not only the originally reported message, other messages went to 
> stdout.
> Initially used the Ubuntu 16 LTS  "$ apt install openmpi-bin 
> libopenmpi-dev" which got me version (1.10.2),
> but this morning compiled and tested 2.1.5, with the same behavior, e.g.:
> 
> $ /src/ompi-2.1.5/bin/mpicxx test_using_mpi_abort.cpp
> $ /src/ompi-2.1.5/bin/mpirun -np 2 a.out > stdout
> [domain-name-embargoed:26078] 1 more process has sent help message 
> help-mpi-api.txt / mpi-abort
> [domain-name-embargoed:26078] Set MCA parameter 
> "orte_base_help_aggregate" to 0 to see all help / error messages
> $ cat stdout
> hello from 0
> hello from 1
> --
>  
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode -1.
> 
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --
>  
> $
> 
> Teste