Gilles: Can you submit a PR to fix these 2 places? Thanks!
> On Sep 11, 2018, at 9:10 AM, emre brookes <broo...@uthscsa.edu> wrote:
>
> Gilles Gouaillardet wrote:
>> It seems I got it wrong :-(
> Ah, you've joined the rest of us :)
>>
>> Can you please give the attached patch a try ?
>>
> Working with a git clone of 3.1.x, patch applied
>
> $ /src/ompi-3.1.x/bin/mpicxx test.cpp
> $ /src/ompi-3.1.x/bin/mpirun a.out > stdout
> --------------------------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status,
> thus causing the job to be terminated. The first process to do so was:
>
>   Process name: [[2667,1],2]
>   Exit code:    255
> --------------------------------------------------------------------------
> $ cat stdout
> hello from 1
> hello from 2
> hello from 3
> hello from 5
> hello from 0
> hello from 4
> $
>
> Works correctly for this error message.
>
> Thanks,
> -Emre
>
>> FWIW, another option would be to opal_output(orte_help_output, ...), but we
>> would have to make orte_help_output "public" first.
>>
>> Cheers,
>>
>> Gilles
>>
>> On 9/11/2018 11:14 AM, emre brookes wrote:
>>> Gilles Gouaillardet wrote:
>>>> I investigated this a bit and found that the (latest ?) v3 branches have
>>>> the expected behavior (e.g. the error message is sent to stderr).
>>>>
>>>> Since it is very unlikely Open MPI 2.1 will ever be updated, I can simply
>>>> encourage you to upgrade to a newer Open MPI version.
>>>> The latest fully supported versions are currently 3.1.2 and 3.0.2.
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>> So you tested 3.1.2 or something newer with this error?
>>>
>>>> But the originally reported error still goes to stdout:
>>>>
>>>> $ /src/ompi-3.1.2/bin/mpicxx test_without_mpi_abort.cpp
>>>> $ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
>>>> --------------------------------------------------------------------------
>>>> mpirun detected that one or more processes exited with non-zero status,
>>>> thus causing the job to be terminated. The first process to do so was:
>>>>
>>>>   Process name: [[22380,1],0]
>>>>   Exit code:    255
>>>> --------------------------------------------------------------------------
>>>> $ cat stdout
>>>> hello from 0
>>>> hello from 1
>>>> -------------------------------------------------------
>>>> Primary job terminated normally, but 1 process returned
>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>> -------------------------------------------------------
>>>> $
>>> -Emre
>>>
>>>> On 9/11/2018 2:27 AM, Ralph H Castain wrote:
>>>>> I’m not sure why this would be happening. These error outputs go through
>>>>> the “show_help” functionality, and we specifically target it at stderr:
>>>>>
>>>>>     /* create an output stream for us */
>>>>>     OBJ_CONSTRUCT(&lds, opal_output_stream_t);
>>>>>     lds.lds_want_stderr = true;
>>>>>     orte_help_output = opal_output_open(&lds);
>>>>>
>>>>> Jeff: is it possible the opal_output system is ignoring the request and
>>>>> pushing it to stdout??
>>>>> Ralph
>>>>>
>>>>>> On Sep 5, 2018, at 4:11 AM, emre brookes <broo...@uthscsa.edu> wrote:
>>>>>>
>>>>>> Thanks Gilles,
>>>>>>
>>>>>> My goal is to separate openmpi errors from the stdout of the MPI program
>>>>>> itself so that errors can be identified externally (in particular in an
>>>>>> external framework running MPI jobs from various developers).
>>>>>>
>>>>>> My not so "well written MPI program" was doing this:
>>>>>>
>>>>>>     MPI_Finalize();
>>>>>>     exit( errorcode );
>>>>>>
>>>>>> which I assume you are telling me was bad practice, and which I will
>>>>>> replace with:
>>>>>>
>>>>>>     MPI_Abort( MPI_COMM_WORLD, errorcode );
>>>>>>     MPI_Finalize();
>>>>>>     exit( errorcode );
>>>>>>
>>>>>> I was previously a bit put off MPI_Abort due to the vagueness of the
>>>>>> man page:
>>>>>>
>>>>>>> _Description_
>>>>>>> This routine makes a "best attempt" to abort all tasks in the group of
>>>>>>> comm. This function does not require that the invoking environment take
>>>>>>> any action with the error code. However, a UNIX or POSIX environment
>>>>>>> should handle this as a return errorcode from the main program or an
>>>>>>> abort (errorcode).
>>>>>>
>>>>>> and I didn't really have an MPI issue to "Abort", but had used this for a
>>>>>> user input or parameter issue.
>>>>>> Nevertheless, I accept your best practice recommendation.
>>>>>>
>>>>>> It was not only the originally reported message; other messages went to
>>>>>> stdout as well.
>>>>>> Initially I used the Ubuntu 16 LTS "$ apt install openmpi-bin
>>>>>> libopenmpi-dev", which got me version 1.10.2,
>>>>>> but this morning I compiled and tested 2.1.5, with the same behavior, e.g.:
>>>>>>
>>>>>> $ /src/ompi-2.1.5/bin/mpicxx test_using_mpi_abort.cpp
>>>>>> $ /src/ompi-2.1.5/bin/mpirun -np 2 a.out > stdout
>>>>>> [domain-name-embargoed:26078] 1 more process has sent help message
>>>>>> help-mpi-api.txt / mpi-abort
>>>>>> [domain-name-embargoed:26078] Set MCA parameter
>>>>>> "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>>> $ cat stdout
>>>>>> hello from 0
>>>>>> hello from 1
>>>>>> --------------------------------------------------------------------------
>>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>>>> with errorcode -1.
>>>>>>
>>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>>> You may or may not see output from other processes, depending on
>>>>>> exactly when Open MPI kills them.
>>>>>> --------------------------------------------------------------------------
>>>>>> $
>>>>>>
>>>>>> Tested 3.1.2, where this has been *somewhat* fixed:
>>>>>>
>>>>>> $ /src/ompi-3.1.2/bin/mpicxx test_using_mpi_abort.cpp
>>>>>> $ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
>>>>>> --------------------------------------------------------------------------
>>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>>>> with errorcode -1.
>>>>>>
>>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>>> You may or may not see output from other processes, depending on
>>>>>> exactly when Open MPI kills them.
>>>>>> --------------------------------------------------------------------------
>>>>>> [domain-name-embargoed:19784] 1 more process has sent help message
>>>>>> help-mpi-api.txt / mpi-abort
>>>>>> [domain-name-embargoed:19784] Set MCA parameter
>>>>>> "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>>> $ cat stdout
>>>>>> hello from 1
>>>>>> hello from 0
>>>>>> $
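
For concreteness, the abort-on-error pattern being discussed boils down to
something like the sketch below. The actual test source was not posted in the
thread, so the file name (test_using_mpi_abort.cpp) and the error condition
here are assumptions.

    // Minimal sketch of a test_using_mpi_abort.cpp-style program (assumed).
    #include <mpi.h>
    #include <cstdio>

    int main( int argc, char **argv )
    {
        MPI_Init( &argc, &argv );

        int rank;
        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        std::printf( "hello from %d\n", rank );

        bool user_input_error = true;   // stand-in for an application-level failure

        if ( user_input_error ) {
            // Abort with a non-zero error code instead of exiting after
            // MPI_Finalize(); mpirun then prints the MPI_ABORT help message
            // shown in the transcripts above.
            MPI_Abort( MPI_COMM_WORLD, -1 );
        }

        MPI_Finalize();
        return 0;
    }

Run as in the transcripts (mpirun -np 2 a.out > stdout), the MPI_ABORT help
text ends up on stdout with 2.1.5 and on stderr with 3.1.2.
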
>>>>>> But the originally reported error still goes to stdout:
>>>>>>
>>>>>> $ /src/ompi-3.1.2/bin/mpicxx test_without_mpi_abort.cpp
>>>>>> $ /src/ompi-3.1.2/bin/mpirun -np 2 a.out > stdout
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun detected that one or more processes exited with non-zero status,
>>>>>> thus causing the job to be terminated. The first process to do so was:
>>>>>>
>>>>>>   Process name: [[22380,1],0]
>>>>>>   Exit code:    255
>>>>>> --------------------------------------------------------------------------
>>>>>> $ cat stdout
>>>>>> hello from 0
>>>>>> hello from 1
>>>>>> -------------------------------------------------------
>>>>>> Primary job terminated normally, but 1 process returned
>>>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>>>> -------------------------------------------------------
>>>>>> $
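
For reference, this remaining case presumably comes from a program along the
following lines; again a sketch, since test_without_mpi_abort.cpp itself was
not posted, and the exit value is inferred from "Exit code: 255" in the
transcript.

    // Minimal sketch of a test_without_mpi_abort.cpp-style program (assumed).
    #include <mpi.h>
    #include <cstdio>
    #include <cstdlib>

    int main( int argc, char **argv )
    {
        MPI_Init( &argc, &argv );

        int rank;
        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        std::printf( "hello from %d\n", rank );

        MPI_Finalize();

        // Exiting with a non-zero status *after* MPI_Finalize() is what makes
        // mpirun print "Primary job terminated normally, but 1 process
        // returned a non-zero exit code", which 3.1.2 still sends to stdout.
        std::exit( -1 );    // exit status 255, matching the transcript
    }

As noted elsewhere in the thread, the recommended practice for an error path
is MPI_Abort() before MPI_Finalize(); this sketch only exists to reproduce the
message routing.
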
>>>>>>
>>>>>> Summary:
>>>>>> 1.10.2 and 2.1.5 both send most openmpi-generated messages to stdout.
>>>>>> 3.1.2 sends at least one type of openmpi-generated message to stdout.
>>>>>> I'll continue with my "wrapper" strategy for now, as it seems safest and
>>>>>> most broadly deployable [e.g. on compute resources where I need to use
>>>>>> admin-installed versions of MPI],
>>>>>> but it would be nice for openmpi to ensure all generated messages end up
>>>>>> on stderr.
>>>>>>
>>>>>> -Emre
>>>>>>
>>>>>> Gilles Gouaillardet wrote:
>>>>>>> Open MPI should likely write this message on stderr; I will have a look
>>>>>>> at that.
>>>>>>>
>>>>>>> That being said, and though I have no intention to dodge the question,
>>>>>>> this case should not happen.
>>>>>>>
>>>>>>> A well written (MPI) program should either exit(0) or have main()
>>>>>>> return 0, so this scenario
>>>>>>> (e.g. all MPI tasks call MPI_Finalize() and then at least one MPI task
>>>>>>> exits with a non-zero error code)
>>>>>>> should not happen.
>>>>>>>
>>>>>>> If your program might fail, it should call MPI_Abort() with a non-zero
>>>>>>> error code *before* calling MPI_Finalize().
>>>>>>>
>>>>>>> Note that this error can also occur if your main() does not return any
>>>>>>> value (e.g. it returns an undefined value, which might be non-zero).
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>> On 9/5/2018 6:08 AM, emre brookes wrote:
>>>>>>>> Background:
>>>>>>>> ---
>>>>>>>> Running on Ubuntu 16.04 with apt install openmpi-bin libopenmpi-dev
>>>>>>>> $ mpirun --version
>>>>>>>> mpirun (Open MPI) 1.10.2
>>>>>>>>
>>>>>>>> I did search through the docs a bit (ok, maybe I missed something
>>>>>>>> obvious, my apologies if so).
>>>>>>>> ---
>>>>>>>> Question:
>>>>>>>>
>>>>>>>> Is there some setting to turn off the extra messages generated by
>>>>>>>> openmpi?
>>>>>>>>
>>>>>>>> e.g.
>>>>>>>> $ mpirun -np 2 my_job > my_job.stdout
>>>>>>>> adds this message to my_job.stdout:
>>>>>>>> -------------------------------------------------------
>>>>>>>> Primary job terminated normally, but 1 process returned
>>>>>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>>>>>> -------------------------------------------------------
>>>>>>>> which strangely goes to stdout and not stderr.
>>>>>>>> I would intuitively expect error or notice messages to go to stderr.
>>>>>>>> Is there a way to redirect these messages to stderr or some specified
>>>>>>>> file?
>>>>>>>>
>>>>>>>> I need to separate this from the collected stdout of the job processes
>>>>>>>> themselves.
>>>>>>>>
>>>>>>>> Somewhat kludgy options that come to mind:
>>>>>>>>
>>>>>>>> 1. I can use --output-filename outfile, which does separate the
>>>>>>>> "openmpi" messages, but this creates a file for each process, and I'd
>>>>>>>> rather keep the output as produced in one file, without any messages
>>>>>>>> from openmpi, which I'd like to keep separately.
>>>>>>>>
>>>>>>>> 2. Or I could write a script to filter the output and separate it. A
>>>>>>>> bit risky, as someone could conceivably put something that looks like
>>>>>>>> an openmpi message pattern in the mpi executable output.
>>>>>>>>
>>>>>>>> 3. Hack the source code of openmpi.
>>>>>>>>
>>>>>>>> Any suggestions as to a more elegant or standard way of dealing with
>>>>>>>> this?
>>>>>>>>
>>>>>>>> TIA,
>>>>>>>> Emre.

--
Jeff Squyres
jsquy...@cisco.com

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users