On Nov 25, 2011, at 12:29 PM, Paul Kapinos wrote:

> Hello again,
> 
>>> Ralph Castain wrote:
>>>> Yes, that would indeed break things. The 1.5 series isn't correctly 
>>>> checking connections across multiple interfaces until it finds one that 
>>>> works - it just uses the first one it sees. :-(
>>> Yahhh!!
>>> This behaviour - picking a random interface and hanging forever if something 
>>> is wrong with it - is somewhat less than perfect.
>>> 
>>> From my perspective - the user's one - Open MPI should try to use either 
>>> *all* available networks (as 1.4 does...), starting with the high 
>>> performance ones, or *only* those interfaces to which the hostnames from 
>>> the hostfile are bound.
>> It is indeed supposed to do the former - as I implied, this is a bug in the 
>> 1.5 series.
> 
> Thanks for the clarification. I was not sure whether this is a bug or a feature :-)
> 
> 
> 
>>> Also, there should be timeouts (if you cannot connect to a node within a 
>>> minute, you will probably never be connected...)
>> We have debated this for some time - there is a timeout MCA param one 
>> can set, but we'll consider again making it the default.
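>> (The exact parameter name may differ between versions - something like
>> 
>> $ ompi_info --param oob tcp | grep -i timeout
>> 
>> should show whether your build has it and what it is currently set to; it 
>> can then be set with "-mca <param_name> <seconds>" on the mpiexec cmd line.)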
>>> If some connection runs into a timeout, a warning would be great (and a hint 
>>> to exclude the interface via oob_tcp_if_exclude, btl_tcp_if_exclude).
>>> 
>>> Should it not?
>>> Maybe you can file it as a "call for enhancement"...
>> Probably the right approach at this time.
> 
> Ahh, sorry, I did not understand what you meant.
> Did you file it, did someone else, or should I do it somehow? Or should it 
> not be filed at all?

I'll take care of it, and copy you on the ticket so you can see what happens.

I'll also do the same for the connection bug - sorry for the problem :-(


> 
>>> But then I ran into yet another issue. 
>>> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
>>> describes how to set MCA parameters via environment variables.
>>> 
>>> I tried it:
>>> $ export OMPI_MCA_oob_tcp_if_include=ib0
>>> $ export OMPI_MCA_btl_tcp_if_include=ib0
>>> 
>>> 
>>> I checked it:
>>> $ ompi_info --param all all | grep oob_tcp_if_include
>>>                MCA oob: parameter "oob_tcp_if_include" (current value: 
>>> <ib0>, data source: environment or cmdline)
>>> $ ompi_info --param all all | grep btl_tcp_if_include
>>>                MCA btl: parameter "btl_tcp_if_include" (current value: 
>>> <ib0>, data source: environment or cmdline)
>>> 
>>> 
>>> But then I got the hang-up issue again!
>>> 
>>> ==> It seems mpiexec does not understand these environment variables and 
>>> only honours the command-line options. Surely this should not be so?
>> No, that isn't what is happening. The problem lies in the behavior of 
>> rsh/ssh: that environment does not forward environment variables. Because 
>> of limits on cmd line length, we don't automatically forward MCA params from 
>> the environment, but only from the cmd line. It is an annoying limitation, 
>> but one outside our control.
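>> So instead of exporting them, put the params directly on the cmd line, e.g.:
>> 
>> $ mpiexec -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0 ...
>> 
>> - cmd line params are forwarded to the remote daemons.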
> 
> We know that "ssh does not forward environment variables." But in this 
> case, aren't these parameters also parameters of mpiexec itself?
> 
> The crucial thing is that setting the parameters works via the command 
> line but *does not work* via the envvar way (as described in 
> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params). 
> This looks like a bug to me!
> 
> 
> 
> 
> 
>> Put those envars in the default mca param file and the problem will be 
>> resolved.
> 
> You mean e.g. $prefix/etc/openmpi-mca-params.conf, as described in item 4 of 
> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
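> i.e., entries in the usual key = value form, something like:
> 
> # openmpi-mca-params.conf
> oob_tcp_if_include = ib0
> btl_tcp_if_include = ib0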
> 
> Well, this is possible, but not flexible enough for us (because there are 
> some machines which can only run if the parameters are *not* set - on those, 
> ssh goes over exactly these eth0 devices).
> 
> For now we use the command-line parameters and hope the envvar way will work 
> someday.
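> (A small wrapper around mpiexec could automate this for us - an untested 
> sketch:
> 
> #!/bin/sh
> # add the if_include params only on machines that actually have an ib0 interface
> ARGS=""
> if /sbin/ip link show ib0 >/dev/null 2>&1; then
>     ARGS="-mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0"
> fi
> exec mpiexec $ARGS "$@"
> )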
> 
> 
>>> (I also tried to forward the envvars via -x 
>>> OMPI_MCA_oob_tcp_if_include -x OMPI_MCA_btl_tcp_if_include - nothing 
>>> changed.
>> I'm surprised by that - they should be picked up and forwarded. Could be a 
>> bug.
> 
> Well, I also think this is a bug - but, as said, not in forwarding the values 
> of the envvars, rather in detecting these parameters at all. Or maybe both.
> 
> 
> 
> 
>>> Well, they are OMPI_ variables and should be forwarded in any case).
>> No, they aren't - they are not treated any differently from other envars.
> 
> [after performing some RTFM...]
> At least the man page of mpiexec says that OMPI_* environment variables are 
> always forwarded, and thus they *are* treated differently from other envvars:
> 
> $ man mpiexec
> ....
> Exported Environment Variables
>       All environment variables that are named in the form OMPI_* will  
> automatically  be  exported to new processes on the local and remote nodes.
> 
> So, does the man page lie, is this a removed feature, or is it something else?
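> (A quick way to check would be something like
> 
> $ export OMPI_FOO=bar
> $ mpiexec -np 2 --host node1,node2 printenv OMPI_FOO
> 
> with two placeholder nodes - if the man page is right, both processes should 
> print "bar".)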
> 
> 
> Best wishes,
> 
> Paul Kapinos
> 
> 
> 
> 
> 
>>>> Specifying both include and exclude should generate an error as those are 
>>>> mutually exclusive options - I think this was also missed in early 1.5 
>>>> releases and was recently patched.
>>>> HTH
>>>> Ralph
>>>> On Nov 23, 2011, at 12:14 PM, TERRY DONTJE wrote:
>>>>> On 11/23/2011 2:02 PM, Paul Kapinos wrote:
>>>>>> Hello Ralph, hello all,
>>>>>> 
>>>>>> Two pieces of news - as usual, a good one and a bad one.
>>>>>> 
>>>>>> The good: we believe we have found out *why* it hangs.
>>>>>> 
>>>>>> The bad: it seems to me that this is a bug, or at least an undocumented 
>>>>>> feature, of Open MPI 1.5.x.
>>>>>> 
>>>>>> In detail:
>>>>>> As said, we see mysterious hang-ups when starting on some nodes with some 
>>>>>> permutations of hostnames. Usually removing "some bad" nodes helps; 
>>>>>> sometimes a permutation of the node names in the hostfile is enough(!). 
>>>>>> The behaviour is reproducible.
>>>>>> 
>>>>>> The machines have at least 2 networks:
>>>>>> 
>>>>>> *eth0* is used for installation, monitoring, ... - this Ethernet is very 
>>>>>> slim.
>>>>>> 
>>>>>> *ib0* is the "IP over IB" interface and is used for everything: the 
>>>>>> file systems, ssh, and so on. The hostnames are bound to the ib0 network; 
>>>>>> our idea was not to use eth0 for MPI at all.
>>>>>> 
>>>>>> All machines are reachable from any other over ib0 (they are in one network).
>>>>>> 
>>>>>> But on eth0 there are at least two different networks; in particular, the 
>>>>>> computer linuxbsc025 is in a different network than the others and is not 
>>>>>> reachable from the other nodes over eth0! (It is reachable over ib0; the 
>>>>>> name used in the hostfile resolves to the IP of ib0.)
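>>>>>> (For example, from another node a "ping -c 1" to the eth0 address of 
>>>>>> linuxbsc025 times out, while a ping to the hostname - which resolves to 
>>>>>> the ib0 address - answers.)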
>>>>>> 
>>>>>> So I believe that Open MPI 1.5.x tries to communicate over eth0, cannot 
>>>>>> do it, and hangs. 1.4.3 does not hang, so this issue is 1.5.x-specific 
>>>>>> (seen in 1.5.3 and 1.5.4). A bug?
>>>>>> 
>>>>>> I also tried to disable eth0 completely:
>>>>>> 
>>>>>> $ mpiexec -mca btl_tcp_if_exclude eth0,lo  -mca btl_tcp_if_include ib0 
>>>>>> ...
>>>>>> 
>>>>> I believe if you give "-mca btl_tcp_if_include ib0" you do not need to 
>>>>> specify the exclude parameter.
>>>>>> ...but this does not help. All right, the above command should disable 
>>>>>> the usage of eth0 for the MPI communication itself, but it hangs even 
>>>>>> before MPI is started, doesn't it? (Because one process is missing, 
>>>>>> MPI_Init cannot be passed.)
>>>>>> 
>>>>> By "just before the MPI is started" do you mean while orte is launching 
>>>>> the processes.
>>>>> I wonder if you need to specify "-mca oob_tcp_if_include ib0" also but I 
>>>>> think that may depend on which oob you are using.
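>>>>> I.e., something like:
>>>>> 
>>>>> $ mpiexec -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0 ...
>>>>> 
>>>>> so that both the runtime (oob) and the MPI traffic (btl) are pinned to ib0.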
>>>>>> Now a question: is there a way to forbid mpiexec from using certain 
>>>>>> interfaces at all?
>>>>>> 
>>>>>> Best wishes,
>>>>>> 
>>>>>> Paul Kapinos
>>>>>> 
>>>>>> P.S. Of course we know it would be a good idea to bring all nodes into 
>>>>>> the same net on eth0, but at this point that is impossible due to 
>>>>>> technical reason[s]...
>>>>>> 
>>>>>> P.S.2 I'm not sure that the issue is really rooted in the above-mentioned 
>>>>>> misconfiguration of eth0, but I have no better idea at this point...
>>>>>> 
>>>>>> 
>>>>>>>> The map seems to be correctly built; also, the output of the daemons 
>>>>>>>> seems to be the same (see helloworld.txt).
>>>>>>> Unfortunately, it appears that OMPI was not built with --enable-debug 
>>>>>>> as there is no debug info in the output. Without a debug installation 
>>>>>>> of OMPI, the ability to determine the problem is pretty limited.
>>>>>> Well, this will be the next option we activate. We also have another 
>>>>>> issue here, with (not) using uDAPL...
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>>>> You should also try putting that long list of nodes in a hostfile - 
>>>>>>>>> see if that makes a difference.
>>>>>>>>> It will process the nodes through a different code path, so if there is 
>>>>>>>>> some problem in --host, this will tell us.
>>>>>>>> No, with the hostfile instead of the host list on the command line, 
>>>>>>>> the behaviour is the same.
>>>>>>>> 
>>>>>>>> But I just found out that 1.4.3 does *not* hang in this constellation. 
>>>>>>>> The next thing I will try is installing 1.5.4 :o)
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> 
>>>>>>>> Paul
>>>>>>>> 
>>>>>>>> P.S. started:
>>>>>>>> 
>>>>>>>> $ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile 
>>>>>>>> hostfile-mini -mca odls_base_verbose 5 --leave-session-attached 
>>>>>>>> --display-map  helloworld 2>&1 | tee helloworld.txt
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote:
>>>>>>>>>> Hello Open MPI folks,
>>>>>>>>>> 
>>>>>>>>>> We use Open MPI 1.5.3 on our fairly new 1800+ node InfiniBand 
>>>>>>>>>> cluster, and we see some strange hang-ups when starting Open MPI 
>>>>>>>>>> processes.
>>>>>>>>>> 
>>>>>>>>>> The nodes are named linuxbsc001, linuxbsc002, ... (with some gaps due 
>>>>>>>>>> to offline nodes). Each node is accessible from every other over SSH 
>>>>>>>>>> (without password), and MPI programs between any two nodes have been 
>>>>>>>>>> verified to run.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> So far, so good. Then I tried to start a bigger number of processes, 
>>>>>>>>>> one process per node:
>>>>>>>>>> $ mpiexec -np NN  --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe
>>>>>>>>>> 
>>>>>>>>>> Now the problem: there are some constellations of names in the host 
>>>>>>>>>> list on which mpiexec reproducibly hangs forever; and more 
>>>>>>>>>> surprisingly, another *permutation* of the *same* node names may run 
>>>>>>>>>> without any errors!
>>>>>>>>>> 
>>>>>>>>>> Example: the command in laueft.txt runs OK, the command in 
>>>>>>>>>> haengt.txt hangs. Note: the only difference is that the node 
>>>>>>>>>> linuxbsc025 is put at the end of the host list. Amazing, isn't it?
>>>>>>>>>> 
>>>>>>>>>> Looking at the particular nodes while the above mpiexec hangs, we 
>>>>>>>>>> found the orted daemons started on *each* node, and the binary on all 
>>>>>>>>>> but one node (orted.txt, MPI_FastTest.txt).
>>>>>>>>>> Again amazingly, the node with no user process started (leading, I 
>>>>>>>>>> believe, to the hang of all processes in MPI_Init) was always the 
>>>>>>>>>> same one, linuxbsc005, which is NOT the permuted item linuxbsc025...
>>>>>>>>>> 
>>>>>>>>>> This behaviour is reproducible. The hang-up only occurs if the 
>>>>>>>>>> started application is an MPI application ("hostname" does not hang).
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Any idea what is going on?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> 
>>>>>>>>>> Paul Kapinos
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> P.S.: no alias names are used; all names are real ones.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> linuxbsc001: STDOUT: 24323 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc002: STDOUT:  2142 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc003: STDOUT: 69266 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc004: STDOUT: 58899 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc006: STDOUT: 68255 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc007: STDOUT: 62026 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc008: STDOUT: 54221 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc009: STDOUT: 55482 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc010: STDOUT: 59380 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc011: STDOUT: 58312 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc014: STDOUT: 56013 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc016: STDOUT: 58563 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc017: STDOUT: 54693 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc018: STDOUT: 54187 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc020: STDOUT: 55811 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc021: STDOUT: 54982 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc022: STDOUT: 50032 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc023: STDOUT: 54044 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc024: STDOUT: 51247 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc025: STDOUT: 18575 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc027: STDOUT: 48969 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc028: STDOUT: 52397 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc029: STDOUT: 52780 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc030: STDOUT: 47537 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc031: STDOUT: 54609 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> linuxbsc032: STDOUT: 52833 ?        SLl    0:00 MPI_FastTest.exe
>>>>>>>>>> $ timex /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec -np 27  
>>>>>>>>>> --host 
>>>>>>>>>> linuxbsc001,linuxbsc002,linuxbsc003,linuxbsc004,linuxbsc005,linuxbsc006,linuxbsc007,linuxbsc008,linuxbsc009,linuxbsc010,linuxbsc011,linuxbsc014,linuxbsc016,linuxbsc017,linuxbsc018,linuxbsc020,linuxbsc021,linuxbsc022,linuxbsc023,linuxbsc024,linuxbsc025,linuxbsc027,linuxbsc028,linuxbsc029,linuxbsc030,linuxbsc031,linuxbsc032
>>>>>>>>>>  MPI_FastTest.exe
>>>>>>>>>> $ timex /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec -np 27  
>>>>>>>>>> --host 
>>>>>>>>>> linuxbsc001,linuxbsc002,linuxbsc003,linuxbsc004,linuxbsc005,linuxbsc006,linuxbsc007,linuxbsc008,linuxbsc009,linuxbsc010,linuxbsc011,linuxbsc014,linuxbsc016,linuxbsc017,linuxbsc018,linuxbsc020,linuxbsc021,linuxbsc022,linuxbsc023,linuxbsc024,linuxbsc027,linuxbsc028,linuxbsc029,linuxbsc030,linuxbsc031,linuxbsc032,linuxbsc025
>>>>>>>>>>  MPI_FastTest.exe
>>>>>>>>>> linuxbsc001: STDOUT: 24322 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 1 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc002: STDOUT:  2141 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 2 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc003: STDOUT: 69265 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 3 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc004: STDOUT: 58898 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 4 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc005: STDOUT: 65642 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 5 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc006: STDOUT: 68254 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 6 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc007: STDOUT: 62025 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 7 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc008: STDOUT: 54220 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 8 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc009: STDOUT: 55481 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 9 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc010: STDOUT: 59379 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 10 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc011: STDOUT: 58311 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 11 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc014: STDOUT: 56012 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 12 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc016: STDOUT: 58562 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 13 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc017: STDOUT: 54692 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 14 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc018: STDOUT: 54186 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 15 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc020: STDOUT: 55810 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 16 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc021: STDOUT: 54981 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 17 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc022: STDOUT: 50031 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 18 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc023: STDOUT: 54043 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 19 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc024: STDOUT: 51246 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 20 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc025: STDOUT: 18574 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 21 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc027: STDOUT: 48968 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 22 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc028: STDOUT: 52396 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 23 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc029: STDOUT: 52779 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 24 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc030: STDOUT: 47536 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 25 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc031: STDOUT: 54608 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 26 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>>>> linuxbsc032: STDOUT: 52832 ?        Ss     0:00 
>>>>>>>>>> /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess 
>>>>>>>>>> env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 27 -mca 
>>>>>>>>>> orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 
>>>>>>>>>> -mca plm rsh
>>>>>>>> linuxbsc005 slots=1
>>>>>>>> linuxbsc006 slots=1
>>>>>>>> linuxbsc007 slots=1
>>>>>>>> linuxbsc008 slots=1
>>>>>>>> linuxbsc009 slots=1
>>>>>>>> linuxbsc010 slots=1
>>>>>>>> linuxbsc011 slots=1
>>>>>>>> linuxbsc014 slots=1
>>>>>>>> linuxbsc016 slots=1
>>>>>>>> linuxbsc017 slots=1
>>>>>>>> linuxbsc018 slots=1
>>>>>>>> linuxbsc020 slots=1
>>>>>>>> linuxbsc021 slots=1
>>>>>>>> linuxbsc022 slots=1
>>>>>>>> linuxbsc023 slots=1
>>>>>>>> linuxbsc024 slots=1
>>>>>>>> linuxbsc025 slots=1
>>>>>>>> [linuxc2.rz.RWTH-Aachen.DE:22229] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxc2.rz.RWTH-Aachen.DE:22229] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxc2.rz.RWTH-Aachen.DE:22229] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> 
>>>>>>>> ========================   JOB MAP   ========================
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc005    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 0
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc006    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 1
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc007    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 2
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc008    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 3
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc009    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 4
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc010    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 5
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc011    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 6
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc014    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 7
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc016    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 8
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc017    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 9
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc018    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 10
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc020    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 11
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc021    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 12
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc022    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 13
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc023    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 14
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc024    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 15
>>>>>>>> 
>>>>>>>> Data for node: linuxbsc025    Num procs: 1
>>>>>>>>   Process OMPI jobid: [87,1] Process rank: 16
>>>>>>>> 
>>>>>>>> =============================================================
>>>>>>>> [linuxbsc007.rz.RWTH-Aachen.DE:07574] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc007.rz.RWTH-Aachen.DE:07574] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc007.rz.RWTH-Aachen.DE:07574] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc016.rz.RWTH-Aachen.DE:03146] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc016.rz.RWTH-Aachen.DE:03146] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc016.rz.RWTH-Aachen.DE:03146] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc005.rz.RWTH-Aachen.DE:22051] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc005.rz.RWTH-Aachen.DE:22051] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc005.rz.RWTH-Aachen.DE:22051] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc011.rz.RWTH-Aachen.DE:07131] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc011.rz.RWTH-Aachen.DE:07131] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc011.rz.RWTH-Aachen.DE:07131] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc025.rz.RWTH-Aachen.DE:43153] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc025.rz.RWTH-Aachen.DE:43153] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc025.rz.RWTH-Aachen.DE:43153] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc017.rz.RWTH-Aachen.DE:05044] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc017.rz.RWTH-Aachen.DE:05044] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc017.rz.RWTH-Aachen.DE:05044] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc018.rz.RWTH-Aachen.DE:01840] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc018.rz.RWTH-Aachen.DE:01840] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc018.rz.RWTH-Aachen.DE:01840] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc024.rz.RWTH-Aachen.DE:79549] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc024.rz.RWTH-Aachen.DE:79549] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc024.rz.RWTH-Aachen.DE:79549] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc022.rz.RWTH-Aachen.DE:73501] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc022.rz.RWTH-Aachen.DE:73501] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc022.rz.RWTH-Aachen.DE:73501] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc023.rz.RWTH-Aachen.DE:03364] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc023.rz.RWTH-Aachen.DE:03364] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc023.rz.RWTH-Aachen.DE:03364] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc006.rz.RWTH-Aachen.DE:16811] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc006.rz.RWTH-Aachen.DE:16811] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc006.rz.RWTH-Aachen.DE:16811] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc014.rz.RWTH-Aachen.DE:10206] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc014.rz.RWTH-Aachen.DE:10206] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc014.rz.RWTH-Aachen.DE:10206] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc008.rz.RWTH-Aachen.DE:00858] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc008.rz.RWTH-Aachen.DE:00858] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc008.rz.RWTH-Aachen.DE:00858] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc010.rz.RWTH-Aachen.DE:09727] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc010.rz.RWTH-Aachen.DE:09727] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc010.rz.RWTH-Aachen.DE:09727] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc020.rz.RWTH-Aachen.DE:06680] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc020.rz.RWTH-Aachen.DE:06680] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc020.rz.RWTH-Aachen.DE:06680] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc009.rz.RWTH-Aachen.DE:05145] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc009.rz.RWTH-Aachen.DE:05145] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc009.rz.RWTH-Aachen.DE:05145] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc021.rz.RWTH-Aachen.DE:01405] mca:base:select:( odls) Querying 
>>>>>>>> component [default]
>>>>>>>> [linuxbsc021.rz.RWTH-Aachen.DE:01405] mca:base:select:( odls) Query of 
>>>>>>>> component [default] set priority to 1
>>>>>>>> [linuxbsc021.rz.RWTH-Aachen.DE:01405] mca:base:select:( odls) Selected 
>>>>>>>> component [default]
>>>>> -- 
>>>>> Terry D. Dontje | Principal Software Engineer
>>>>> Developer Tools Engineering | +1.781.442.2631
>>>>> Oracle - Performance Technologies
>>>>> 95 Network Drive, Burlington, MA 01803
>>>>> Email terry.don...@oracle.com
>>>>> 
>>>>> 
>>>>> 
>>> 
> 
> 
> -- 
> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
> RWTH Aachen University, Center for Computing and Communication
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241/80-24915

