Okay, I have this fixed and the man page updated as of r20396.

Thanks again for finding and reporting this bug!

Ralph


On Feb 2, 2009, at 5:55 AM, Ralph Castain wrote:

Hmmm... well, it shouldn't crash (so I'll have to fix that), but it should fail with an error. The --report-pid option takes an argument, which wasn't provided here. I'll check the man page to ensure it is up to date.

What it should tell you is that --report-pid takes either '-' to indicate that the PID should be written to stdout, '+' for stderr, or a filename.
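The three forms Ralph describes might be used as sketched below. The mpirun invocations are shown as comments since they need a working build; the `report_pid` function is purely our own stand-in illustrating the three output modes, not Open MPI code:

```shell
#!/bin/sh
# The three documented argument forms (comments only -- they need mpirun):
#
#   mpirun --report-pid -          -np 2 ./MPITest   # PID to stdout
#   mpirun --report-pid +          -np 2 ./MPITest   # PID to stderr
#   mpirun --report-pid mpirun.pid -np 2 ./MPITest   # PID to a file
#
# Stand-in that mimics the three modes using this shell's own PID:
report_pid() {
  case "$1" in
    -) echo "$$" ;;         # '-'  -> stdout
    +) echo "$$" >&2 ;;     # '+'  -> stderr
    *) echo "$$" > "$1" ;;  # else -> treat the argument as a filename
  esac
}

# Typical use of the file form: read the PID back and check the process.
report_pid mpirun.pid
pid=$(cat mpirun.pid)
kill -0 "$pid" && echo "process $pid is alive"
```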

Thanks for smoke testing it!
Ralph

On Feb 2, 2009, at 3:06 AM, jody wrote:

Hi Ralph
One more thing I noticed while trying out orte_iof again.

The option --report-pid crashes mpirun:
[jody@localhost neander]$ mpirun -report-pid -np 2 ./MPITest
[localhost:31146] *** Process received signal ***
[localhost:31146] Signal: Segmentation fault (11)
[localhost:31146] Signal code: Address not mapped (1)
[localhost:31146] Failing at address: 0x24
[localhost:31146] [ 0] [0x11040c]
[localhost:31146] [ 1] /opt/openmpi/lib/openmpi/mca_odls_default.so [0x1e8f9d]
[localhost:31146] [ 2]
/opt/openmpi/lib/libopen-rte.so.0(orte_daemon_cmd_processor+0x4d1)
[0x132541]
[localhost:31146] [ 3] /opt/openmpi/lib/libopen-pal.so.0 [0x170248]
[localhost:31146] [ 4]
/opt/openmpi/lib/libopen-pal.so.0(opal_event_loop+0x27) [0x170497]
[localhost:31146] [ 5]
/opt/openmpi/lib/libopen-pal.so.0(opal_progress+0xcb) [0x16399b]
[localhost:31146] [ 6]
/opt/openmpi/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x30d)
[0x1441ad]
[localhost:31146] [ 7] /opt/openmpi/lib/openmpi/mca_plm_rsh.so [0x1c833b]
[localhost:31146] [ 8] mpirun [0x804acf6]
[localhost:31146] [ 9] mpirun [0x804a0a6]
[localhost:31146] [10] /lib/libc.so.6(__libc_start_main+0xe0) [0x98d390]
[localhost:31146] [11] mpirun [0x8049fd1]
[localhost:31146] *** End of error message ***
Segmentation fault

This always happens, irrespective of the number of processes
and of whether I run locally only or with remote machines.

Jody

On Mon, Feb 2, 2009 at 10:55 AM, jody <jody....@gmail.com> wrote:
Hi Ralph
The new options are great stuff!
Following your suggestion, I downloaded and installed

http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz

and tested the new options. (I have a simple cluster of
8 machines over TCP.) Not everything worked as specified, though:
* timestamp-output : works
* xterm : doesn't work completely -
with a comma-separated rank list,
an xterm is opened only for the local processes. The processes
on remote machines just write to the stdout of the
calling window.
(Just to be sure, I started my own script for opening separate xterms
- that did work for the remotes, too.)

If a '-1' is given instead of a list of ranks, it fails (locally &
with remotes):
  [jody@localhost neander]$  mpirun -np 4 --xterm -1 ./MPITest
--------------------------------------------------------------------------
  Sorry!  You were supposed to get help about:
      orte-odls-base:xterm-rank-out-of-bounds
  from the file:
      help-odls-base.txt
  But I couldn't find any file matching that name.  Sorry!
--------------------------------------------------------------------------
--------------------------------------------------------------------------
  mpirun was unable to start the specified application as it
encountered an error
  on node localhost. More information may be available above.
--------------------------------------------------------------------------
* output-filename : doesn't work here:
[jody@localhost neander]$ mpirun -np 4 --output-filename gnagna ./MPITest
 [jody@localhost neander]$ ls -l gna*
 -rw-r--r-- 1 jody morpho 549 2009-02-02 09:07 gnagna.%10lu

There is output on stdout from the processes on remote machines, but none
from the local ones.
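For reference, what --output-filename presumably should produce is one file per rank, named base.rank; the "gnagna.%10lu" above looks like the rank-suffix format string escaping unexpanded. A stand-in sketch of the intended layout (plain shell, no mpirun; file names are from the mail, contents are invented):

```shell
#!/bin/sh
# Simulate the intended per-rank files for -np 4 with
# --output-filename gnagna: gnagna.0 through gnagna.3.
base=gnagna
for rank in 0 1 2 3; do
  echo "output of rank $rank" > "$base.$rank"
done
ls "$base".*   # lists gnagna.0 through gnagna.3
```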


A question about installing: I installed the usual way (configure,
make all install),
but the new man pages apparently weren't copied to their destination:
if I do 'man mpirun' I am shown the contents of an old man page
(without the new options).
I had to do 'less /opt/openmpi-1.4a1r20394/share/man/man1/mpirun.1'
to see them.

About the xterm option: when the application ends, all xterms are
closed immediately.
(When doing things 'by hand' I used the -hold option for xterm.)
Would it be possible to add this feature to your xterm option?
Perhaps by adding a '!' at the end of the rank list?
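For comparison, the "by hand" approach with -hold might look like the hypothetical wrapper below (our own sketch, not anything shipped with Open MPI; xterm is only launched when a display is actually available, otherwise the command line is just printed):

```shell
#!/bin/sh
# Hypothetical wrapper: run a per-rank command inside xterm with -hold,
# so the window stays open after the program exits.
run_in_xterm() {
  if command -v xterm >/dev/null 2>&1 && [ -n "$DISPLAY" ]; then
    xterm -hold -e "$@" &
  else
    # No display available: just show what would have been run.
    echo "would run: xterm -hold -e $*"
  fi
}

run_in_xterm ./MPITest
```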

About orte_iof: with the new version it works, but no matter which
rank I specify,
it only prints out rank 0's output:
[jody@localhost ~]$ orte-iof --pid 31049   --rank 4 --stdout
[localhost]I am #0/9 before the barrier



Thanks

Jody

On Sun, Feb 1, 2009 at 10:49 PM, Ralph Castain <r...@lanl.gov> wrote:
I'm afraid we discovered a bug in optimized builds with r20392. Please use
any tarball with r20394 or above.

Sorry for the confusion
Ralph


On Feb 1, 2009, at 5:27 AM, Jeff Squyres wrote:

On Jan 31, 2009, at 11:39 AM, Ralph Castain wrote:

For anyone following this thread:

I have completed the IOF options discussed below. Specifically, I have
added the following:

* a new "timestamp-output" option that timestamps each line of output

* a new "output-filename" option that redirects each proc's output to a
separate rank-named file.

* a new "xterm" option that redirects the output of the specified ranks
to a separate xterm window.
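As a rough illustration of the first option, the effect can be mimicked in plain shell by prefixing each line of a command's output with the current time (a stand-in sketch only, not the Open MPI implementation):

```shell
#!/bin/sh
# Stand-in for --timestamp-output: prefix every line read on stdin
# with the current time.
timestamp() {
  while IFS= read -r line; do
    printf '[%s] %s\n' "$(date +%H:%M:%S)" "$line"
  done
}

printf 'hello\nworld\n' | timestamp
```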

You can obtain a copy of the updated code at:

http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz

Sweet stuff.  :-)

Note that the URL/tarball that Ralph cites is a nightly snapshot and will expire after a while -- we only keep the 5 most recent nightly tarballs available. You can find Ralph's new IOF stuff in any 1.4a1 nightly tarball after the one he cited above. Note that the last part of the tarball name refers to the Subversion commit number (which increases monotonically); any 1.4 nightly snapshot tarball beyond "r20392" will contain this new IOF
stuff.  Here's where to get our nightly snapshot tarballs:

http://www.open-mpi.org/nightly/trunk/

Don't read anything into the "1.4" version number -- we've just bumped the version number internally to be different from the current stable series (1.3). We haven't yet branched for the v1.4 series; hence, "1.4a1"
currently refers to our development trunk.

--
Jeff Squyres
Cisco Systems

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


