Looks like there is a place in orte/mca/state/state_base_fns.c:850 that also outputs to orte_clean_output instead of using show_help. Outside of those two places, everything else seems to go to show_help.
> On Sep 10, 2018, at 8:58 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>
> It seems I got it wrong :-(
>
> Can you please give the attached patch a try?
>
> FWIW, another option would be to opal_output(orte_help_output, ...) but we
> would have to make orte_help_output "public" first.
>
> Cheers,
>
> Gilles
>
> On 9/11/2018 11:14 AM, emre brookes wrote:
>> Gilles Gouaillardet wrote:
>>> I investigated this a bit and found that the (latest ?) v3 branches have
>>> the expected behavior
>>>
>>> (e.g. the error message is sent to stderr)
>>>
>>> Since it is very unlikely Open MPI 2.1 will ever be updated, I can simply
>>> encourage you to upgrade to a newer Open MPI version.
>>>
>>> Latest fully supported versions are currently such as 3.1.2 or 3.0.2
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>> So you tested 3.1.2 or something newer with this error?
>>
>>> But the originally reported error still goes to stdout:
>>>
>>> $ /src/ompi-3.1.2/bin/mpicxx test_without_mpi_abort.cpp
>>> $ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
>>> --------------------------------------------------------------------------
>>> mpirun detected that one or more processes exited with non-zero status,
>>> thus causing
>>> the job to be terminated. The first process to do so was:
>>>
>>> Process name: [[22380,1],0]
>>> Exit code: 255
>>> --------------------------------------------------------------------------
>>> $ cat stdout
>>> hello from 0
>>> hello from 1
>>> -------------------------------------------------------
>>> Primary job terminated normally, but 1 process returned
>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>> -------------------------------------------------------
>>> $
>> -Emre
>>
>>>
>>> On 9/11/2018 2:27 AM, Ralph H Castain wrote:
>>>> I’m not sure why this would be happening.
>>>> These error outputs go through
>>>> the “show_help” functionality, and we specifically target it at stderr:
>>>>
>>>>     /* create an output stream for us */
>>>>     OBJ_CONSTRUCT(&lds, opal_output_stream_t);
>>>>     lds.lds_want_stderr = true;
>>>>     orte_help_output = opal_output_open(&lds);
>>>>
>>>> Jeff: is it possible the opal_output system is ignoring the request and
>>>> pushing it to stdout??
>>>> Ralph
>>>>
>>>>> On Sep 5, 2018, at 4:11 AM, emre brookes <broo...@uthscsa.edu> wrote:
>>>>>
>>>>> Thanks Gilles,
>>>>>
>>>>> My goal is to separate openmpi errors from the stdout of the MPI program
>>>>> itself so that errors can be identified externally (in particular in an
>>>>> external framework running MPI jobs from various developers).
>>>>>
>>>>> My not so "well written MPI program" was doing this:
>>>>>     MPI_Finalize();
>>>>>     exit( errorcode );
>>>>> Which I assume you are telling me was bad practice & will replace with
>>>>>     MPI_Abort( MPI_COMM_WORLD, errorcode );
>>>>>     MPI_Finalize();
>>>>>     exit( errorcode );
>>>>> I was previously a bit put off of MPI_Abort due to the vagueness of the
>>>>> man page:
>>>>>> _Description_
>>>>>> This routine makes a "best attempt" to abort all tasks in the group of
>>>>>> comm. This function does not require that the invoking environment take
>>>>>> any action with the error code. However, a UNIX or POSIX environment
>>>>>> should handle this as a return errorcode from the main program or an
>>>>>> abort (errorcode).
>>>>> & I didn't really have an MPI issue to "Abort", but had used this for a
>>>>> user input or parameter issue.
>>>>> Nevertheless, I accept your best practice recommendation.
>>>>>
>>>>> It was not only the originally reported message, other messages went to
>>>>> stdout.
>>>>> Initially used the Ubuntu 16 LTS "$ apt install openmpi-bin
>>>>> libopenmpi-dev" which got me version (1.10.2),
>>>>> but this morning compiled and tested 2.1.5, with the same behavior, e.g.:
>>>>>
>>>>> $ /src/ompi-2.1.5/bin/mpicxx test_using_mpi_abort.cpp
>>>>> $ /src/ompi-2.1.5/bin/mpirun -np 2 a.out > stdout
>>>>> [domain-name-embargoed:26078] 1 more process has sent help message
>>>>> help-mpi-api.txt / mpi-abort
>>>>> [domain-name-embargoed:26078] Set MCA parameter
>>>>> "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>> $ cat stdout
>>>>> hello from 0
>>>>> hello from 1
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>>> with errorcode -1.
>>>>>
>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>> You may or may not see output from other processes, depending on
>>>>> exactly when Open MPI kills them.
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> $
>>>>>
>>>>> Tested 3.1.2, where this has been *somewhat* fixed:
>>>>>
>>>>> $ /src/ompi-3.1.2/bin/mpicxx test_using_mpi_abort.cpp
>>>>> $ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>>> with errorcode -1.
>>>>>
>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>> You may or may not see output from other processes, depending on
>>>>> exactly when Open MPI kills them.
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> [domain-name-embargoed:19784] 1 more process has sent help message
>>>>> help-mpi-api.txt / mpi-abort
>>>>> [domain-name-embargoed:19784] Set MCA parameter
>>>>> "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>> $ cat stdout
>>>>> hello from 1
>>>>> hello from 0
>>>>> $
>>>>>
>>>>> But the originally reported error still goes to stdout:
>>>>>
>>>>> $ /src/ompi-3.1.2/bin/mpicxx test_without_mpi_abort.cpp
>>>>> $ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> mpirun detected that one or more processes exited with non-zero status,
>>>>> thus causing
>>>>> the job to be terminated. The first process to do so was:
>>>>>
>>>>> Process name: [[22380,1],0]
>>>>> Exit code: 255
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> $ cat stdout
>>>>> hello from 0
>>>>> hello from 1
>>>>> -------------------------------------------------------
>>>>> Primary job terminated normally, but 1 process returned
>>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>>> -------------------------------------------------------
>>>>> $
>>>>>
>>>>> Summary:
>>>>> 1.10.2, 2.1.5 both send most openmpi generated messages to stdout.
>>>>> 3.1.2 sends at least one type of openmpi generated messages to stdout.
>>>>> I'll continue with my "wrapper" strategy for now, as it seems safest and
>>>>> most broadly deployable [e.g. on compute resources where I need to use
>>>>> admin installed versions of MPI],
>>>>> but it would be nice for openmpi to ensure all generated messages end up
>>>>> in stderr.
>>>>>
>>>>> -Emre
>>>>>
>>>>> Gilles Gouaillardet wrote:
>>>>>> Open MPI should likely write this message on stderr, I will have a look
>>>>>> at that.
>>>>>>
>>>>>> That being said, and though I have no intention to dodge the question,
>>>>>> this case should not happen.
>>>>>>
>>>>>> A well written (MPI) program should either exit(0) or have main() return
>>>>>> 0, so this scenario
>>>>>>
>>>>>> (e.g. all MPI tasks call MPI_Finalize() and then at least one MPI task
>>>>>> exits with a non zero error code)
>>>>>>
>>>>>> should not happen.
>>>>>>
>>>>>> If your program might fail, it should call MPI_Abort() with a non zero
>>>>>> error code *before* calling MPI_Finalize().
>>>>>>
>>>>>> note this error can occur if your main() subroutine does not return any
>>>>>> value (e.g. it returns an undefined value, that might be non zero)
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>> On 9/5/2018 6:08 AM, emre brookes wrote:
>>>>>>> Background:
>>>>>>> ---
>>>>>>> Running on ubuntu 16.04 with apt install openmpi-bin libopenmpi-dev
>>>>>>> $ mpirun --version
>>>>>>> mpirun (Open MPI) 1.10.2
>>>>>>>
>>>>>>> I did search thru the docs a bit (ok, maybe I missed something obvious,
>>>>>>> my apologies if so)
>>>>>>> ---
>>>>>>> Question:
>>>>>>>
>>>>>>> Is there some setting to turn off the extra messages generated by
>>>>>>> openmpi ?
>>>>>>>
>>>>>>> e.g.
>>>>>>> $ mpirun -np 2 my_job > my_job.stdout
>>>>>>> adds this message to my_job.stdout
>>>>>>> -------------------------------------------------------
>>>>>>> Primary job terminated normally, but 1 process returned
>>>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>>>> -------------------------------------------------------
>>>>>>> which strangely goes to stdout and not stderr.
>>>>>>> I would intuitively expect error or notice messages to go to stderr.
>>>>>>> Is there a way to redirect these messages to stderr or some specified
>>>>>>> file?
>>>>>>>
>>>>>>> I need to separate this from the collected stdout of the job processes
>>>>>>> themselves.
>>>>>>>
>>>>>>> Somewhat kludgy options that come to mind:
>>>>>>>
>>>>>>> 1. I can use --output-filename outfile, which does separate the
>>>>>>> "openmpi" messages,
>>>>>>> but this creates a file for each process and I'd rather keep them as
>>>>>>> produced in one file,
>>>>>>> but without any messages from openmpi, which I'd like to keep
>>>>>>> separately.
>>>>>>>
>>>>>>> 2. Or I could write a script to filter the output and separate. A bit
>>>>>>> risky as someone could conceivably put something that looks like an
>>>>>>> openmpi message pattern in the mpi executable output.
>>>>>>>
>>>>>>> 3. hack the source code of openmpi.
>>>>>>>
>>>>>>> Any suggestions as to a more elegant or standard way of dealing with
>>>>>>> this?
>>>>>>>
>>>>>>> TIA,
>>>>>>> Emre.
>>>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users@lists.open-mpi.org
>>>>> https://lists.open-mpi.org/mailman/listinfo/users
>
> <default_hnp_abort.diff>
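
For reference, option 2 from the original question (filtering the captured stdout) could look roughly like the sketch below. It is only an illustration under assumptions: the names is_banner_line, filter_state, line_is_runtime_message and the MIN_BANNER_DASHES threshold are hypothetical, and the "line of dashes" heuristic is inferred from the messages quoted in this thread, not from anything Open MPI documents or guarantees.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: a banner is a line consisting only of '-' characters and at
 * least MIN_BANNER_DASHES long. The threshold is a guess based on the
 * messages quoted in this thread; it is not an Open MPI guarantee. */
#define MIN_BANNER_DASHES 50

/* Returns true if the line (ignoring its line ending) looks like a banner. */
static bool is_banner_line(const char *line)
{
    size_t n = strcspn(line, "\r\n");   /* length without the line ending */
    if (n < MIN_BANNER_DASHES) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        if (line[i] != '-') {
            return false;
        }
    }
    return true;
}

/* Tracks whether we are between an opening and a closing banner. */
typedef struct {
    bool inside_banner;
} filter_state;

/* Returns true if the line belongs to a banner-delimited block (the banner
 * lines themselves included), false if it looks like ordinary job output. */
static bool line_is_runtime_message(filter_state *st, const char *line)
{
    if (is_banner_line(line)) {
        st->inside_banner = !st->inside_banner;  /* banners open and close a block */
        return true;
    }
    return st->inside_banner;
}
```

A wrapper would feed each captured line through line_is_runtime_message and send the two classes of lines to different files. The caveat from the original question still applies: a job that prints its own long line of dashes would be misclassified, so this is a stopgap, not a substitute for the messages going to stderr in the first place.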