Hi Ralph

Thanks for the fixes and the "!".

--xterm:
The "!" works, but I still don't get any xterms from my remote nodes, even
with all the xhost+ and -x DISPLAY tricks explained below :(
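(As a sanity check - a sketch, assuming the same testhosts hostfile and
that DISPLAY has been exported as described below - one can let every rank
try to open a window directly, bypassing the --xterm option:

  # each rank opens an xterm on the workstation's X server and runs
  # hostname in it; ranks whose node cannot reach the display will
  # print an X connection error to stdout instead
  mpirun -np 8 -hostfile testhosts -x DISPLAY xterm -hold -e hostname

If windows appear for all ranks, the X setup itself is fine and the
problem is specific to --xterm.)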
--output-filename:
It creates files, but only for the local processes:

  [jody@localhost neander]$ mpirun -np 8 -hostfile testhosts --output-filename gnana ./MPITest
  ... output ...
  [jody@localhost neander]$ ls -l gna*
  -rw-r--r-- 1 jody morpho 549 2009-02-03 18:02 gnana.0
  -rw-r--r-- 1 jody morpho 549 2009-02-03 18:02 gnana.1
  -rw-r--r-- 1 jody morpho 549 2009-02-03 18:02 gnana.2

(I set slots=3 on my workstation.)

---

Regarding xterms - I'm no big expert on xterms either, but I managed to
get things working in my environment.

Generally, to enable X forwarding, I *would* set the option

  X11Forwarding yes

in /etc/ssh/sshd_config on the server, and

  ForwardX11 yes

in /etc/ssh/ssh_config on the client (note that the client-side option is
spelled differently from the server-side one). I say 'would', because to
actually use X forwarding you need to call ssh with the '-X' option.
Correct me if I'm wrong, but I suspect the -X option is not used when
Open MPI makes a connection.

So this is what I currently do to get my xterms running: on my workstation
I call

  xhost +<hostname>

for every machine in my hostfile, to allow it to use the X server on my
workstation. Then I set my DISPLAY variable to point to my workstation:

  export DISPLAY=<mymachine>:0.0

Finally, I call mpirun with the -x option (to export the DISPLAY variable
to all nodes):

  mpirun -np 4 -hostfile myfiles -x DISPLAY run_xterm.sh MyApplication arg1 arg2

Here run_xterm.sh is a shell script which creates a useful title for the
xterm window and calls the application with all its arguments (-hold
leaves the xterm open after the program terminates):

  #!/bin/sh -f
  # feedback on the command line
  echo "Running on node `hostname`"

  # version 1.3 sets the documented OMPI_COMM_WORLD_RANK;
  # version 1.2 only had the undocumented OMPI_MCA_ns_nds_vpid
  ID=$OMPI_COMM_WORLD_RANK
  if [ "X$ID" = "X" ]; then
      ID=$OMPI_MCA_ns_nds_vpid
  fi
  TITLE="node #$ID"

  # start the application in an xterm; "$@" (rather than $*) keeps
  # arguments containing spaces intact
  xterm -T "$TITLE" -hold -e "$@"
  exit 0

(I have similar scripts to run gdb or valgrind in xterm windows.)
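For example, a gdb variant along the same lines might look like the
following sketch (the name run_gdb.sh and the use of gdb's --args flag
are illustrative, not taken from the thread):

  #!/bin/sh -f
  # same rank detection as in run_xterm.sh above
  ID=$OMPI_COMM_WORLD_RANK
  if [ "X$ID" = "X" ]; then
      ID=$OMPI_MCA_ns_nds_vpid
  fi

  # run the application under gdb inside an xterm;
  # 'gdb --args prog arg1 ...' passes the program's arguments through
  xterm -T "gdb node #$ID" -hold -e gdb --args "$@"
  exit 0

invoked exactly like run_xterm.sh:

  mpirun -np 4 -hostfile myfiles -x DISPLAY run_gdb.sh MyApplication arg1 arg2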
I know that 'xhost +' is a horror for certain sysadmins, but I feel quite
safe, because the machines listed in my hostfile are not accessible from
outside our department. I haven't found any other way to get nice xterms
when I can't use 'ssh -X'.

To come back to the '--xterm' option: I just ran my xterm script after
doing the xhost+ and DISPLAY things described above, and it worked - all
local and remote processes created their xterm windows. (In other words,
the environment was set up so that my remote nodes could use xterms on my
workstation.) Immediately thereafter I called the same application with

  mpirun -np 8 -hostfile testhosts --xterm 2,3,4,5! -x DISPLAY ./MPITest

but still, only the local process (#2) created an xterm.

Do you think it would be possible to have Open MPI make its ssh
connections with '-X', or are there technical or security-related
objections?

Regards
  Jody

On Mon, Feb 2, 2009 at 4:47 PM, Ralph Castain <r...@lanl.gov> wrote:
>
> On Feb 2, 2009, at 2:55 AM, jody wrote:
>
>> Hi Ralph
>> The new options are great stuff!
>> Following your suggestion, I downloaded and installed
>>
>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz
>>
>> and tested the new options. (I have a simple cluster of 8 machines
>> over tcp.) Not everything worked as specified, though:
>>
>> * timestamp-output : works
>
> good!
>
>> * xterm : doesn't work completely -
>> with a comma-separated rank list, an xterm is opened only for the
>> local processes. The other processes (the ones on remote machines)
>> only write to the stdout of the calling window.
>> (Just to be sure, I started my own script for opening separate
>> xterms - that did work for the remotes, too.)
>
> This is a problem we wrestled with for some time. The issue is that we
> really aren't comfortable modifying the DISPLAY envar on the remote
> nodes the way you do in your script. It is fine for a user to do
> whatever they want, but for OMPI to do it... that's another matter. We
> can't even know for sure what to do because of the wide range of
> scenarios that might occur (e.g., is mpirun local to you, or on a
> remote node connected to you via xterm, or...?).
>
> What you (the user) need to do is ensure that X11 is set up properly so
> that an X window opened on the remote host is displayed on your screen.
> In this case, I believe you have to enable X forwarding - I'm not an
> xterm expert, so I can't advise you on how to do this. I suspect you
> may already know - in which case, can you please pass it along and I'll
> add it to our docs? :-)
>
>> If a '-1' is given instead of a list of ranks, it fails (locally &
>> with remotes):
>>
>> [jody@localhost neander]$ mpirun -np 4 --xterm -1 ./MPITest
>> --------------------------------------------------------------------------
>> Sorry! You were supposed to get help about:
>>     orte-odls-base:xterm-rank-out-of-bounds
>> from the file:
>>     help-odls-base.txt
>> But I couldn't find any file matching that name. Sorry!
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun was unable to start the specified application as it
>> encountered an error on node localhost. More information may be
>> available above.
>> --------------------------------------------------------------------------
>
> Fixed as of r20398 - this was a bug; I had an if statement out of
> sequence.
>
>> * output-filename : doesn't work here:
>>
>> [jody@localhost neander]$ mpirun -np 4 --output-filename gnagna ./MPITest
>> [jody@localhost neander]$ ls -l gna*
>> -rw-r--r-- 1 jody morpho 549 2009-02-02 09:07 gnagna.%10lu
>>
>> There is output from the processes on remote machines on stdout, but
>> none from the local ones.
>
> Fixed as of r20400 - it had a format statement syntax that was okay in
> some compilers, but not others.
>
>> A question about installing: I installed the usual way (configure,
>> make all install), but the new man files apparently weren't copied to
>> their destination: if I do 'man mpirun' I am shown the contents of an
>> old man file (without the new options). I had to do
>> 'less /opt/openmpi-1.4a1r20394/share/man/man1/mpirun.1' to see them.
>
> Strange - the install should put them in the right place, but I wonder
> whether you updated your manpath to point at it?
>
>> About the xterm option: when the application ends, all xterms are
>> closed immediately. (When doing things 'by hand' I used the -hold
>> option for xterm.) Would it be possible to add this feature to your
>> xterm option? Perhaps by adding a '!' at the end of the rank list?
>
> Done! A "!" at the end of the list will activate -hold as of r20398.
>
>> About orte-iof: with the new version it works, but no matter which
>> rank I specify, it only prints out rank 0's output:
>>
>> [jody@localhost ~]$ orte-iof --pid 31049 --rank 4 --stdout
>> [localhost]I am #0/9 before the barrier
>
> The problem here is that the option name changed from "rank" to
> "ranks", since you can now specify any number of ranks as
> comma-separated ranges. I have updated orte-iof so it will gracefully
> fail if you provide an unrecognized cmd line option and output the
> "help" detailing the accepted options.
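> So the command above would presumably now be written along these lines
> (sketched - check orte-iof's help output for the exact range syntax):
>
>   orte-iof --pid 31049 --ranks 4 --stdout
>   orte-iof --pid 31049 --ranks 2,4-6 --stdout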
>
>> Thanks
>> Jody
>>
>> On Sun, Feb 1, 2009 at 10:49 PM, Ralph Castain <r...@lanl.gov> wrote:
>>>
>>> I'm afraid we discovered a bug in optimized builds with r20392.
>>> Please use any tarball with r20394 or above.
>>>
>>> Sorry for the confusion
>>> Ralph
>>>
>>> On Feb 1, 2009, at 5:27 AM, Jeff Squyres wrote:
>>>
>>>> On Jan 31, 2009, at 11:39 AM, Ralph Castain wrote:
>>>>
>>>>> For anyone following this thread:
>>>>>
>>>>> I have completed the IOF options discussed below. Specifically, I
>>>>> have added the following:
>>>>>
>>>>> * a new "timestamp-output" option that timestamps each line of
>>>>> output
>>>>>
>>>>> * a new "output-filename" option that redirects each proc's output
>>>>> to a separate rank-named file
>>>>>
>>>>> * a new "xterm" option that redirects the output of the specified
>>>>> ranks to separate xterm windows
>>>>>
>>>>> You can obtain a copy of the updated code at:
>>>>>
>>>>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz
>>>>
>>>> Sweet stuff. :-)
>>>>
>>>> Note that the URL/tarball that Ralph cites is a nightly snapshot
>>>> and will expire after a while -- we only keep the 5 most recent
>>>> nightly tarballs available. You can find Ralph's new IOF stuff in
>>>> any 1.4a1 nightly tarball after the one he cited above. Note that
>>>> the last part of the tarball name refers to the subversion commit
>>>> number (which increases monotonically); any 1.4 nightly snapshot
>>>> tarball beyond "r20392" will contain this new IOF stuff. Here's
>>>> where to get our nightly snapshot tarballs:
>>>>
>>>> http://www.open-mpi.org/nightly/trunk/
>>>>
>>>> Don't read anything into the "1.4" version number -- we've just
>>>> bumped the version number internally to be different from the
>>>> current stable series (1.3). We haven't yet branched for the v1.4
>>>> series; hence, "1.4a1" currently refers to our development trunk.
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users