(off list) Are you sure about OMPI_MCA_* params not being treated specially? I know for a fact that they *used* to be. I.e., we bundled up all env variables that began with OMPI_MCA_* and sent them with the job to back-end nodes. It allowed sysadmins to set global MCA param values without editing the MCA param file on every node.
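(For example, a sysadmin could put something like the following into a site-wide shell profile - the exact location is deliberately left vague here, and the parameter is just one taken from later in this thread:

  # any variable named OMPI_MCA_<param> sets the MCA parameter <param>
  export OMPI_MCA_btl_tcp_if_include=ib0

If OMPI_MCA_* variables are indeed bundled up and sent with the job as described above, the value then takes effect on the back-end nodes without editing openmpi-mca-params.conf on every node.)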
It looks like this is still happening on the trunk:

[14:38] svbu-mpi:~ % cat run
#!/bin/csh -f
echo on `hostname`, foo is: $OMPI_MCA_foo
exit 0
[14:38] svbu-mpi:~ % setenv OMPI_MCA_foo bar
[14:38] svbu-mpi:~ % ./run
on svbu-mpi.cisco.com, foo is: bar
[14:38] svbu-mpi:~ % mpirun -np 2 --bynode run
on svbu-mpi044, foo is: bar
on svbu-mpi043, foo is: bar
[14:38] svbu-mpi:~ % unsetenv OMPI_MCA_foo
[14:38] svbu-mpi:~ % mpirun -np 2 --bynode run
OMPI_MCA_foo: Undefined variable.
OMPI_MCA_foo: Undefined variable.
-------------------------------------------------------
While the primary job terminated normally, 2 processes returned
non-zero exit codes.. Further examination may be required.
-------------------------------------------------------
[14:38] svbu-mpi:~ %

(I did not read this thread too carefully, so perhaps I missed an inference in here somewhere...)

On Nov 25, 2011, at 5:21 PM, Ralph Castain wrote:

> On Nov 25, 2011, at 12:29 PM, Paul Kapinos wrote:
>
>> Hello again,
>>
>>>> Ralph Castain wrote:
>>>>> Yes, that would indeed break things. The 1.5 series isn't correctly checking connections across multiple interfaces until it finds one that works - it just uses the first one it sees. :-(
>>>> Yahhh!! This behaviour - catching a random interface and hanging forever if something is wrong with it - is somewhat less than perfect.
>>>>
>>>> From my perspective - the user's one - Open MPI should try to use either *all* available networks (as 1.4 does...), starting with the high-performance ones, or *only* those interfaces to which the hostnames from the hostfile are bound.
>>> It is indeed supposed to do the former - as I implied, this is a bug in the 1.5 series.
>>
>> Thanks for the clarification. I was not sure whether this is a bug or a feature :-)
>>
>>>> Also, there should be timeouts (if you cannot connect to a node within a minute, you probably will never ever be connected...)
>>> We have debated this for some time - there is a timeout MCA param one can set, but we'll consider again making it the default.
>>>> If some connection runs into a timeout, a warning would be great (and a hint to take the interface out via oob_tcp_if_exclude, btl_tcp_if_exclude).
>>>> Should it not? Maybe you can file it as a "call for enhancement"...
>>> Probably the right approach at this time.
>>
>> Ahhh.. sorry, I did not understand what you mean. Did you file it, or someone else, or should I do it in some way? Or should it not be filed?
>
> I'll take care of it, and copy you on the ticket so you can see what happens.
>
> I'll also do the same for the connection bug - sorry for the problem :-(
>
>>>> But then I ran into yet another issue. In http://www.open-mpi.org/faq/?category=tuning#setting-mca-params the way to define MCA parameters via environment variables is described.
>>>>
>>>> I tried it:
>>>> $ export OMPI_MCA_oob_tcp_if_include=ib0
>>>> $ export OMPI_MCA_btl_tcp_if_include=ib0
>>>>
>>>> I checked it:
>>>> $ ompi_info --param all all | grep oob_tcp_if_include
>>>>   MCA oob: parameter "oob_tcp_if_include" (current value: <ib0>, data source: environment or cmdline)
>>>> $ ompi_info --param all all | grep btl_tcp_if_include
>>>>   MCA btl: parameter "btl_tcp_if_include" (current value: <ib0>, data source: environment or cmdline)
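(As an aside: the same two settings can be written in three equivalent forms; a minimal sketch, reusing the parameter names from above - the application name is only a placeholder:

  # 1. environment variables (the form tried above)
  export OMPI_MCA_oob_tcp_if_include=ib0
  export OMPI_MCA_btl_tcp_if_include=ib0

  # 2. on the mpiexec command line
  mpiexec -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0 -np 2 ./app

  # 3. in the default MCA parameter file, $prefix/etc/openmpi-mca-params.conf
  oob_tcp_if_include = ib0
  btl_tcp_if_include = ib0

Whether form 1 actually reaches the remote daemons under an rsh/ssh launch is exactly what is disputed in this thread; forms 2 and 3 are the ones discussed below as workarounds.)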
>>>> But then I got the hang-up issue again!
>>>>
>>>> ==> It seems mpiexec does not understand these environment variables and only picks up the command-line options. Should this not be so?
>>> No, that isn't what is happening. The problem lies in the behavior of rsh/ssh. This environment does not forward environmental variables. Because of limits on cmd line length, we don't automatically forward MCA params from the environment, but only from the cmd line. It is an annoying limitation, but one outside our control.
>>
>> We know about "ssh does not forward environmental variables." But in this case, are these parameters not parameters of mpiexec itself, too?
>>
>> The crucial thing is that setting the parameters works via the command line but *does not work* via the environment variables (as described in http://www.open-mpi.org/faq/?category=tuning#setting-mca-params). This looks like a bug to me!
>>
>>> Put those envars in the default mca param file and the problem will be resolved.
>>
>> You mean e.g. $prefix/etc/openmpi-mca-params.conf, as described in point 4 of http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
>>
>> Well, this is possible, but not flexible enough for us (because there are some machines which can only run if the parameters are *not* set - on those, ssh goes over just these eth0 devices).
>>
>> For now we use the command-line parameters and hope the envvar way will work someday.
>>
>>>> (I also tried to provide the envvars with -x OMPI_MCA_oob_tcp_if_include -x OMPI_MCA_btl_tcp_if_include - nothing changed.
>>> I'm surprised by that - they should be picked up and forwarded. Could be a bug.
>>
>> Well, I also think this is a bug, but as said, not in providing the values of the envvars but in detecting these parameters at all. Or maybe both.
>>
>>>> Well, they are OMPI_ variables and should be provided in any case).
>>> No, they aren't - they are not treated differently than any other envar.
>>
>> [after performing some RTFM...]
>> At least the man page of mpiexec says the OMPI_ environment variables are always provided and thus treated *differently* than other envvars:
>>
>> $ man mpiexec
>> ....
>> Exported Environment Variables
>>   All environment variables that are named in the form OMPI_* will automatically be exported to new processes on the local and remote nodes.
>>
>> So, does the man page lie, is this a removed feature, or something else?
>>
>> Best wishes,
>>
>> Paul Kapinos
>>
>>>>> Specifying both include and exclude should generate an error, as those are mutually exclusive options - I think this was also missed in early 1.5 releases and was recently patched.
>>>>> HTH
>>>>> Ralph
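(To make the include/exclude point concrete, a minimal sketch of the include-only form for both the OOB and TCP BTL layers, using the interface name from Paul's setup - the hostfile and application names are placeholders:

  mpiexec --hostfile hostfile \
          -mca oob_tcp_if_include ib0 \
          -mca btl_tcp_if_include ib0 \
          ./app

That is, name only the interface(s) to be used and leave oob_tcp_if_exclude / btl_tcp_if_exclude unset; giving both an include and an exclude list for the same layer is the combination that should generate an error.)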
>>>>> On Nov 23, 2011, at 12:14 PM, TERRY DONTJE wrote:
>>>>>> On 11/23/2011 2:02 PM, Paul Kapinos wrote:
>>>>>>> Hello Ralph, hello all,
>>>>>>>
>>>>>>> Two news, as usual a good and a bad one.
>>>>>>>
>>>>>>> The good: we believe we have found out *why* it hangs.
>>>>>>>
>>>>>>> The bad: it seems to me this is a bug, or at least an undocumented feature, of Open MPI 1.5.x.
>>>>>>>
>>>>>>> In detail: As said, we see mysterious hang-ups when starting on some nodes using some permutations of hostnames. Usually removing "some bad" nodes helps; sometimes a permutation of the node names in the hostfile is enough(!). The behaviour is reproducible.
>>>>>>>
>>>>>>> The machines have at least 2 networks:
>>>>>>>
>>>>>>> *eth0* is used for installation, monitoring, ... - this Ethernet is very slim.
>>>>>>>
>>>>>>> *ib0* is the "IP over IB" interface and is used for everything else: the file systems, ssh and so on. The hostnames are bound to the ib0 network; our idea was not to use eth0 for MPI at all.
>>>>>>>
>>>>>>> All machines are reachable from any other over ib0 (they are in one network).
>>>>>>>
>>>>>>> But on eth0 there are at least two different networks; in particular, the computer linuxbsc025 is in a different network than the others and is not reachable from the other nodes over eth0! (It is reachable over ib0; the name used in the hostfile resolves to the IP of ib0.)
>>>>>>>
>>>>>>> So I believe that Open MPI 1.5.x tries to communicate over eth0, cannot do it, and hangs. 1.4.3 does not hang, so this issue is 1.5.x-specific (seen in 1.5.3 and 1.5.4). A bug?
>>>>>>>
>>>>>>> I also tried to disable eth0 completely:
>>>>>>>
>>>>>>> $ mpiexec -mca btl_tcp_if_exclude eth0,lo -mca btl_tcp_if_include ib0 ...
>>>>>>>
>>>>>> I believe if you give "-mca btl_tcp_if_include ib0" you do not need to specify the exclude parameter.
>>>>>>> ...but this does not help. All right, the above command should disable the use of eth0 for the MPI communication itself, but it hangs just before MPI is started, doesn't it? (Because one process is missing, MPI_Init cannot complete.)
>>>>>> By "just before MPI is started" do you mean while orte is launching the processes? I wonder if you need to specify "-mca oob_tcp_if_include ib0" also, but I think that may depend on which oob you are using.
>>>>>>> Now a question: is there a way to forbid mpiexec to use some interfaces at all?
>>>>>>>
>>>>>>> Best wishes,
>>>>>>>
>>>>>>> Paul Kapinos
>>>>>>>
>>>>>>> P.S. Of course we know about the good idea of bringing all nodes into the same net on eth0, but at this point that is impossible for technical reasons...
>>>>>>>
>>>>>>> P.S.2 I'm not sure that the issue is really rooted in the above-mentioned misconfiguration of eth0, but I have no better idea at this point...
>>>>>>>
>>>>>>>>> The map seems to be correctly built; also the output of the daemons seems to be the same (see helloworld.txt).
>>>>>>>> Unfortunately, it appears that OMPI was not built with --enable-debug as there is no debug info in the output. Without a debug installation of OMPI, the ability to determine the problem is pretty limited.
>>>>>>> Well, this will be the next option we will activate. We also have another issue here, on (not) using uDAPL..
>>>>>>>
>>>>>>>>>> You should also try putting that long list of nodes in a hostfile - see if that makes a difference. It will process the nodes thru a different code path, so if there is some problem in --host, this will tell us.
>>>>>>>>> No, with the hostfile instead of the host list on the command line the behaviour is the same.
>>>>>>>>>
>>>>>>>>> But I just found out that 1.4.3 does *not* hang on this constellation. The next thing I will try will be the installation of 1.5.4 :o)
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Paul
>>>>>>>>>
>>>>>>>>> P.S. It was
started: >>>>>>>>> >>>>>>>>> $ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile >>>>>>>>> hostfile-mini -mca odls_base_verbose 5 --leave-session-attached >>>>>>>>> --display-map helloworld 2>&1 | tee helloworld.txt >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote: >>>>>>>>>>> Hello Open MPI volks, >>>>>>>>>>> >>>>>>>>>>> We use OpenMPI 1.5.3 on our pretty new 1800+ nodes InfiniBand >>>>>>>>>>> cluster, and we have some strange hangups if starting OpenMPI >>>>>>>>>>> processes. >>>>>>>>>>> >>>>>>>>>>> The nodes are named linuxbsc001,linuxbsc002,... (with some lacuna >>>>>>>>>>> due of offline nodes). Each node is accessible from each other >>>>>>>>>>> over SSH (without password), also MPI programs between any two >>>>>>>>>>> nodes are checked to run. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> So long, I tried to start some bigger number of processes, one >>>>>>>>>>> process per node: >>>>>>>>>>> $ mpiexec -np NN --host linuxbsc001,linuxbsc002,... >>>>>>>>>>> MPI_FastTest.exe >>>>>>>>>>> >>>>>>>>>>> Now the problem: there are some constellations of names in the host >>>>>>>>>>> list on which mpiexec reproducible hangs forever; and more >>>>>>>>>>> surprising: other *permutation* of the *same* node names may run >>>>>>>>>>> without any errors! >>>>>>>>>>> >>>>>>>>>>> Example: the command in laueft.txt runs OK, the command in >>>>>>>>>>> haengt.txt hangs. Note: the only difference is that the node >>>>>>>>>>> linuxbsc025 is put on the end of the host list. Amazed, too? >>>>>>>>>>> >>>>>>>>>>> Looking on the particular nodes during the above mpiexec hangs, we >>>>>>>>>>> found the orted daemons started on *each* node and the binary on >>>>>>>>>>> all but one node (orted.txt, MPI_FastTest.txt). >>>>>>>>>>> Again amazing that the node with no user process started (leading >>>>>>>>>>> to hangup in MPI_Init of all processes and thus to hangup, I >>>>>>>>>>> believe) was always the same, linuxbsc005, which is NOT the >>>>>>>>>>> permuted item linuxbsc025... >>>>>>>>>>> >>>>>>>>>>> This behaviour is reproducible. The hang-on only occure if the >>>>>>>>>>> started application is a MPI application ("hostname" does not hang). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Any Idea what is gonna on? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> >>>>>>>>>>> Paul Kapinos >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> P.S: no alias names used, all names are real ones >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Dipl.-Inform. Paul Kapinos - High Performance Computing, >>>>>>>>>>> RWTH Aachen University, Center for Computing and Communication >>>>>>>>>>> Seffenter Weg 23, D 52074 Aachen (Germany) >>>>>>>>>>> Tel: +49 241/80-24915 >>>>>>>>>>> linuxbsc001: STDOUT: 24323 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc002: STDOUT: 2142 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc003: STDOUT: 69266 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc004: STDOUT: 58899 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc006: STDOUT: 68255 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc007: STDOUT: 62026 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc008: STDOUT: 54221 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc009: STDOUT: 55482 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc010: STDOUT: 59380 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc011: STDOUT: 58312 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc014: STDOUT: 56013 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc016: STDOUT: 58563 ? 
SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc017: STDOUT: 54693 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc018: STDOUT: 54187 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc020: STDOUT: 55811 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc021: STDOUT: 54982 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc022: STDOUT: 50032 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc023: STDOUT: 54044 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc024: STDOUT: 51247 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc025: STDOUT: 18575 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc027: STDOUT: 48969 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc028: STDOUT: 52397 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc029: STDOUT: 52780 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc030: STDOUT: 47537 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc031: STDOUT: 54609 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> linuxbsc032: STDOUT: 52833 ? SLl 0:00 MPI_FastTest.exe >>>>>>>>>>> $ timex /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec -np 27 >>>>>>>>>>> --host >>>>>>>>>>> linuxbsc001,linuxbsc002,linuxbsc003,linuxbsc004,linuxbsc005,linuxbsc006,linuxbsc007,linuxbsc008,linuxbsc009,linuxbsc010,linuxbsc011,linuxbsc014,linuxbsc016,linuxbsc017,linuxbsc018,linuxbsc020,linuxbsc021,linuxbsc022,linuxbsc023,linuxbsc024,linuxbsc025,linuxbsc027,linuxbsc028,linuxbsc029,linuxbsc030,linuxbsc031,linuxbsc032 >>>>>>>>>>> MPI_FastTest.exe >>>>>>>>>>> $ timex /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec -np 27 >>>>>>>>>>> --host >>>>>>>>>>> linuxbsc001,linuxbsc002,linuxbsc003,linuxbsc004,linuxbsc005,linuxbsc006,linuxbsc007,linuxbsc008,linuxbsc009,linuxbsc010,linuxbsc011,linuxbsc014,linuxbsc016,linuxbsc017,linuxbsc018,linuxbsc020,linuxbsc021,linuxbsc022,linuxbsc023,linuxbsc024,linuxbsc027,linuxbsc028,linuxbsc029,linuxbsc030,linuxbsc031,linuxbsc032,linuxbsc025 >>>>>>>>>>> MPI_FastTest.exe >>>>>>>>>>> linuxbsc001: STDOUT: 24322 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 1 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc002: STDOUT: 2141 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 2 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc003: STDOUT: 69265 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 3 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc004: STDOUT: 58898 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 4 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc005: STDOUT: 65642 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 5 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc006: STDOUT: 68254 ? 
Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 6 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc007: STDOUT: 62025 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 7 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc008: STDOUT: 54220 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 8 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc009: STDOUT: 55481 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 9 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc010: STDOUT: 59379 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 10 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc011: STDOUT: 58311 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 11 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc014: STDOUT: 56012 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 12 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc016: STDOUT: 58562 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 13 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc017: STDOUT: 54692 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 14 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc018: STDOUT: 54186 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 15 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc020: STDOUT: 55810 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 16 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc021: STDOUT: 54981 ? 
Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 17 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc022: STDOUT: 50031 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 18 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc023: STDOUT: 54043 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 19 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc024: STDOUT: 51246 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 20 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc025: STDOUT: 18574 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 21 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc027: STDOUT: 48968 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 22 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc028: STDOUT: 52396 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 23 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc029: STDOUT: 52779 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 24 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc030: STDOUT: 47536 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 25 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc031: STDOUT: 54608 ? Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 26 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> linuxbsc032: STDOUT: 52832 ? 
Ss 0:00 >>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess >>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 27 -mca >>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri >>>>>>>>>>> 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> users mailing list >>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>> _______________________________________________ >>>>>>>>>> users mailing list >>>>>>>>>> us...@open-mpi.org >>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>> -- >>>>>>>>> Dipl.-Inform. Paul Kapinos - High Performance Computing, >>>>>>>>> RWTH Aachen University, Center for Computing and Communication >>>>>>>>> Seffenter Weg 23, D 52074 Aachen (Germany) >>>>>>>>> Tel: +49 241/80-24915 >>>>>>>>> linuxbsc005 slots=1 >>>>>>>>> linuxbsc006 slots=1 >>>>>>>>> linuxbsc007 slots=1 >>>>>>>>> linuxbsc008 slots=1 >>>>>>>>> linuxbsc009 slots=1 >>>>>>>>> linuxbsc010 slots=1 >>>>>>>>> linuxbsc011 slots=1 >>>>>>>>> linuxbsc014 slots=1 >>>>>>>>> linuxbsc016 slots=1 >>>>>>>>> linuxbsc017 slots=1 >>>>>>>>> linuxbsc018 slots=1 >>>>>>>>> linuxbsc020 slots=1 >>>>>>>>> linuxbsc021 slots=1 >>>>>>>>> linuxbsc022 slots=1 >>>>>>>>> linuxbsc023 slots=1 >>>>>>>>> linuxbsc024 slots=1 >>>>>>>>> linuxbsc025 slots=1[linuxc2.rz.RWTH-Aachen.DE:22229] >>>>>>>>> mca:base:select:( odls) Querying component [default] >>>>>>>>> [linuxc2.rz.RWTH-Aachen.DE:22229] mca:base:select:( odls) Query of >>>>>>>>> component [default] set priority to 1 >>>>>>>>> [linuxc2.rz.RWTH-Aachen.DE:22229] mca:base:select:( odls) Selected >>>>>>>>> component [default] >>>>>>>>> >>>>>>>>> ======================== JOB MAP ======================== >>>>>>>>> >>>>>>>>> Data for node: linuxbsc005 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 0 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc006 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 1 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc007 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 2 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc008 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 3 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc009 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 4 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc010 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 5 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc011 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 6 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc014 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 7 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc016 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 8 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc017 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 9 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc018 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 10 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc020 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 11 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc021 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 12 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc022 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 13 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc023 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 14 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc024 Num procs: 1 >>>>>>>>> 
Process OMPI jobid: [87,1] Process rank: 15 >>>>>>>>> >>>>>>>>> Data for node: linuxbsc025 Num procs: 1 >>>>>>>>> Process OMPI jobid: [87,1] Process rank: 16 >>>>>>>>> >>>>>>>>> ============================================================= >>>>>>>>> [linuxbsc007.rz.RWTH-Aachen.DE:07574] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc007.rz.RWTH-Aachen.DE:07574] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc007.rz.RWTH-Aachen.DE:07574] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc016.rz.RWTH-Aachen.DE:03146] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc016.rz.RWTH-Aachen.DE:03146] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc016.rz.RWTH-Aachen.DE:03146] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc005.rz.RWTH-Aachen.DE:22051] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc005.rz.RWTH-Aachen.DE:22051] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc005.rz.RWTH-Aachen.DE:22051] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc011.rz.RWTH-Aachen.DE:07131] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc011.rz.RWTH-Aachen.DE:07131] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc011.rz.RWTH-Aachen.DE:07131] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc025.rz.RWTH-Aachen.DE:43153] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc025.rz.RWTH-Aachen.DE:43153] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc025.rz.RWTH-Aachen.DE:43153] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc017.rz.RWTH-Aachen.DE:05044] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc017.rz.RWTH-Aachen.DE:05044] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc017.rz.RWTH-Aachen.DE:05044] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc018.rz.RWTH-Aachen.DE:01840] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc018.rz.RWTH-Aachen.DE:01840] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc018.rz.RWTH-Aachen.DE:01840] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc024.rz.RWTH-Aachen.DE:79549] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc024.rz.RWTH-Aachen.DE:79549] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc024.rz.RWTH-Aachen.DE:79549] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc022.rz.RWTH-Aachen.DE:73501] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc022.rz.RWTH-Aachen.DE:73501] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc022.rz.RWTH-Aachen.DE:73501] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc023.rz.RWTH-Aachen.DE:03364] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> 
[linuxbsc023.rz.RWTH-Aachen.DE:03364] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc023.rz.RWTH-Aachen.DE:03364] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc006.rz.RWTH-Aachen.DE:16811] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc006.rz.RWTH-Aachen.DE:16811] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc006.rz.RWTH-Aachen.DE:16811] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc014.rz.RWTH-Aachen.DE:10206] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc014.rz.RWTH-Aachen.DE:10206] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc014.rz.RWTH-Aachen.DE:10206] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc008.rz.RWTH-Aachen.DE:00858] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc008.rz.RWTH-Aachen.DE:00858] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc008.rz.RWTH-Aachen.DE:00858] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc010.rz.RWTH-Aachen.DE:09727] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc010.rz.RWTH-Aachen.DE:09727] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc010.rz.RWTH-Aachen.DE:09727] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc020.rz.RWTH-Aachen.DE:06680] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc020.rz.RWTH-Aachen.DE:06680] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc020.rz.RWTH-Aachen.DE:06680] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc009.rz.RWTH-Aachen.DE:05145] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc009.rz.RWTH-Aachen.DE:05145] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc009.rz.RWTH-Aachen.DE:05145] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> [linuxbsc021.rz.RWTH-Aachen.DE:01405] mca:base:select:( odls) >>>>>>>>> Querying component [default] >>>>>>>>> [linuxbsc021.rz.RWTH-Aachen.DE:01405] mca:base:select:( odls) Query >>>>>>>>> of component [default] set priority to 1 >>>>>>>>> [linuxbsc021.rz.RWTH-Aachen.DE:01405] mca:base:select:( odls) >>>>>>>>> Selected component [default] >>>>>>>>> _______________________________________________ >>>>>>>>> users mailing list >>>>>>>>> us...@open-mpi.org >>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> users mailing list >>>>>>>> us...@open-mpi.org >>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> users mailing list >>>>>>> us...@open-mpi.org >>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>> -- >>>>>> <Mail Attachment.gif> >>>>>> Terry D. 
Dontje | Principal Software Engineer
>>>>>> Developer Tools Engineering | +1.781.442.2631
>>>>>> Oracle * - Performance Technologies*
>>>>>> 95 Network Drive, Burlington, MA 01803
>>>>>> Email terry.don...@oracle.com
>>
>> --
>> Dipl.-Inform. Paul Kapinos - High Performance Computing,
>> RWTH Aachen University, Center for Computing and Communication
>> Seffenter Weg 23, D 52074 Aachen (Germany)
>> Tel: +49 241/80-24915

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/