> On Jun 5, 2016, at 4:30 PM, Du, Fan <fan...@intel.com> wrote:
> 
> Thanks for your reply!
> 
> On 2016/6/5 3:01, Ralph Castain wrote:
>> The closest thing we have to what you describe is the “orte-dvm” - this
>> allows one to launch a persistent collection of daemons. You can then
>> run your applications against it using “mpiexec -hnp <url>” where the
>> url is that of the orte-dvm “head” daemon.
> 
> I tried this; maybe I missed something.
> 
> On host1:
> orte-dvm  --allow-run-as-root
> VMURI: 2783903744.0;tcp://192.168.10.55:47325
> DVM ready
> 
> On host2:
> mpiexec -hnp 2783903744.0;tcp://192.168.10.55:47325 

Your shell will take the semicolon to mean the end of the command, so you have to 
enclose the whole URI in quotes.
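
For example, something like this (reusing the URI that orte-dvm printed above; the 
application and -np value are just placeholders):

   mpiexec -hnp "2783903744.0;tcp://192.168.10.55:47325" -np 4 /tmp/a.out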

> OMPI_MCA_orte_hnp_uri=2783903744.0
> OMPI_MCA_ess=tool
> [grantleyIPDC01:03305] [[21695,0],0] ORTE_ERROR_LOG: Bad parameter in file 
> base/rml_base_contact.c at line 161
> -bash: tcp://192.168.10.55:47325: No such file or directory
> 
> Digging a bit deeper into the code, the URI is expected to contain a job ID and a rank ID.
> Also, how do I make subsequent orte-dvm daemons know where the head orte-dvm is?
> I checked the orte-dvm help; there seems to be no such option.
> 
>> If I understand you correctly, however, then you would want the orte-dvm
>> to assemble itself based on the asynchronous start of the individual
>> daemons. In other words, Mesos would start a daemon on each node as that
>> node became available. Then, once all the daemons have been started,
>> Mesos would execute “mpiexec” to start the application.
>> 
>> Is that correct?
> 
> Yes
> 
>> If so, then we don’t support that mode today, but it could fairly easily
>> be added.
>> However, I don’t see why you couldn’t just write a small
>> standalone tool that collects all the Mesos resources in a file until
>> all have been assembled, and then executes “mpiexec -hostfile <myfile>”.
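
To be concrete, that standalone tool could be as simple as something along these 
lines (the hostnames, slot counts, and -np value below are just placeholders):

   # append each host as Mesos offers it
   echo "node01 slots=4" >> mesos_hosts.txt
   echo "node02 slots=4" >> mesos_hosts.txt
   # once all resources have been assembled, launch the job
   mpiexec -hostfile mesos_hosts.txt -np 8 /tmp/a.out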
> 
> Because mpiexec will eventually rely on ssh to run the MPI proxy on the hosts,

What’s the problem with that? It’s how many HPC clusters work. Is ssh not 
enabled?

> while Mesos works differently: the framework passes information about which
> commands to run on which hosts to the Mesos master, and the Mesos master then
> instructs the hosts to run those commands.
> 
> This is where the Mesos working model doesn't fit into Open MPI.

Easiest thing would be to add a Mesos PLM plugin to OMPI - IIRC, someone once 
did that, but nobody was interested and so it died


> 
>> Is there some reason this won’t work? It would be much simpler and would
>> work with any MPI.
>> 
>> Ralph
>> 
>> 
>>> On Jun 3, 2016, at 5:10 PM, Du, Fan <fan...@intel.com> wrote:
>>> 
>>> 
>>> 
>>> On 2016/6/2 19:14, Gilles Gouaillardet wrote:
>>>> Hi,
>>>> 
>>>> may I ask why you need/want to launch orted manually ?
>>> 
>>> Good question.
>>> 
>>> The intention is to obtain the orted commands and run orted through Mesos.
>>> This all comes from how Mesos works: in essence, it offers resources
>>> (cpu/memory/ports) on a per-host basis to a framework; the framework then
>>> builds information about how to run specific tasks and passes that
>>> information to the Mesos master; finally, the Mesos master instructs the
>>> hosts to execute the framework's tasks.
>>> 
>>> Take MPICH2 as an example; the framework that supports MPICH2 works as
>>> described above:
>>> 1. The framework gets offers from the Mesos master and tells the Mesos
>>> master to run a wrapper around the MPICH2 proxy (hydra_pmi_proxy). At this
>>> point the wrapper waits for the commands needed to execute the proxy.
>>> 
>>> 2. After enough MPICH2 proxy wrappers have been launched on the hosts the
>>> user expects, the framework runs the real mpiexec with '-launcher manual'
>>> to grab the commands for the proxy and passes them to the proxy wrappers,
>>> so the real MPICH2 proxies finally get launched and mpiexec proceeds
>>> normally.
>>> 
>>> That's why I'm looking for functionality similar to MPICH2's
>>> '-launcher manual'.
>>> I'm not a native speaker; I hope I've told the story clearly :)
>>> 
>>> 
>>> 
>>>> unless you are running under a batch manager, Open MPI uses the rsh plm
>>>> to remotely start orted.
>>>> basically, it does
>>>> ssh host orted <orted params>
>>>> 
>>>> the best I can suggest is you do
>>>> 
>>>> mpirun --mca orte_rsh_agent myrshagent.sh --mca orte_launch_agent
>>>> mylaunchagent.sh  ...
>>>> under the hood, mpirun will do
>>>> myrshagent.sh host mylaunchagent.sh <orted params>
>>>> 
>>>> Cheers,
>>>> 
>>>> Gilles
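
FWIW, the rsh-agent override Gilles describes above is probably the quickest way 
to capture the orted command lines today. Here is a minimal sketch of such a 
wrapper (the output file and how you hand the command over to Mesos are 
placeholders; per the description above, mpirun invokes the agent as 
"agent <host> <remote command>"):

   #!/bin/sh
   # myrshagent.sh - called by mpirun instead of ssh:
   #   myrshagent.sh <host> mylaunchagent.sh <orted params>
   host=$1
   shift
   # instead of "ssh $host ...", record the command so an external
   # launcher (e.g. a Mesos framework) can execute it on $host
   echo "$host $*" >> /tmp/orted_commands.txt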
>>>> 
>>>> On Thursday, June 2, 2016, Du, Fan <fan...@intel.com> wrote:
>>>> 
>>>>   Hi folks
>>>> 
>>>>   Starting with Open MPI, I can launch the MPI application a.out as
>>>>   follows on host1:
>>>>   mpirun --allow-run-as-root --host host1,host2 -np 4 /tmp/a.out
>>>> 
>>>>   On host2, I saw that a proxy, orted, was spawned:
>>>>   orted --hnp-topo-sig 4N:2S:4L3:20L2:20L1:20C:40H:x86_64 -mca ess env
>>>>   -mca orte_ess_jobid 1275133952 -mca orte_ess_vpid 1 -mca
>>>>   orte_ess_num_procs 2 -mca orte_hnp_uri
>>>>   1275133952.0;tcp://host1_ip:40024 --tree-spawn -mca plm rsh
>>>> --tree-spawn
>>>> 
>>>>   It seems mpirun uses ssh as the launcher on my system.
>>>>   What if I want to run orted manually, instead of having mpirun do it
>>>>   automatically?
>>>>   I mean, does mpirun have any option to produce the commands for orted?
>>>> 
>>>>   As for the MPICH2 implementation, there is a "-launcher manual" option
>>>>   that makes this work, for example:
>>>>   # mpiexec.hydra -launcher manual -np 4 htop
>>>>   HYDRA_LAUNCH: /usr/local/bin/hydra_pmi_proxy --control-port
>>>>   grantleyIPDC04:34652 --rmk user --launcher manual --demux poll
>>>>   --pgid 0 --retries 10 --usize -2 --proxy-id 0
>>>>   HYDRA_LAUNCH_END
>>>> 
>>>>   Then I can manually run hydra_pmi_proxy with those commands, and
>>>>   mpiexec.hydra will proceed.
>>>> 
>>>>   Thanks!