> On Jun 5, 2016, at 4:30 PM, Du, Fan <fan...@intel.com> wrote:
>
> Thanks for your reply!
>
> On 2016/6/5 3:01, Ralph Castain wrote:
>> The closest thing we have to what you describe is the “orte-dvm” - this
>> allows one to launch a persistent collection of daemons. You can then
>> run your applications against it using “mpiexec -hnp <url>”, where the
>> url is that of the orte-dvm “head” daemon.
>
> I tried this; maybe I missed something.
>
> On host1:
> orte-dvm --allow-run-as-root
> VMURI: 2783903744.0;tcp://192.168.10.55:47325
> DVM ready
>
> On host2:
> mpiexec -hnp 2783903744.0;tcp://192.168.10.55:47325

Your shell will take the semicolon to mean the end of the line - you have to enclose the whole URI in quotes.
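For example, quoting the URI so the shell passes it through intact (this reuses the VMURI printed above and the a.out example from the original question quoted below, purely as an illustration):

    mpiexec -hnp "2783903744.0;tcp://192.168.10.55:47325" -np 4 /tmp/a.out

Single quotes work just as well; the point is that the semicolon has to reach mpiexec as part of the URI instead of being interpreted by the shell.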
> OMPI_MCA_orte_hnp_uri=2783903744.0
> OMPI_MCA_ess=tool
> [grantleyIPDC01:03305] [[21695,0],0] ORTE_ERROR_LOG: Bad parameter in file
> base/rml_base_contact.c at line 161
> -bash: tcp://192.168.10.55:47325: No such file or directory
>
> Digging into the code a bit deeper, the uri is expected to contain a job id and a rank id.
> Also, how do the subsequent orte-dvm daemons learn where the head orte-dvm is?
> I checked the orte-dvm help; there seems to be no option for that.
>
>> If I understand you correctly, however, then you would want the orte-dvm
>> to assemble itself based on the asynchronous start of the individual
>> daemons. In other words, Mesos would start a daemon on each node as that
>> node became available. Then, once all the daemons have been started,
>> Mesos would execute “mpiexec” to start the application.
>>
>> Is that correct?
>
> Yes
>
>> If so, then we don’t support that mode today, but it could fairly easily
>> be added. However, I don’t see why you couldn’t just write a small
>> standalone tool that collects all the Mesos resources in a file until
>> all have been assembled, and then executes “mpiexec -hostfile <myfile>”.
>
> Because mpiexec will eventually rely on ssh to run the MPI proxy on the hosts,

What’s the problem with that? It’s how many HPC clusters work. Is ssh not enabled?

> while in Mesos it works like this: the framework decides which commands to run
> on which hosts, passes that information to the Mesos master, and the Mesos
> master then instructs the hosts to run those commands.
>
> This is where the Mesos work model doesn't fit Open MPI.

The easiest thing would be to add a Mesos PLM plugin to OMPI - IIRC, someone once did that, but nobody was interested and so it died.
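To make the hostfile suggestion quoted above concrete, a minimal sketch could look like the following (the file path, host names, and slot counts are purely illustrative; the part that learns which hosts Mesos has offered is up to your framework):

    # append one line per host as the Mesos offers come in, e.g.
    echo "host1 slots=2" >> /tmp/mesos_hosts
    echo "host2 slots=2" >> /tmp/mesos_hosts

    # once the allocation is complete, launch the job against those hosts
    mpiexec -hostfile /tmp/mesos_hosts -np 4 /tmp/a.out

This still uses ssh (or whatever launcher mpiexec selects) to start the orted daemons, which is exactly the objection raised above.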
>> Is there some reason this won’t work? It would be much simpler and would
>> work with any MPI.
>>
>> Ralph
>>
>>> On Jun 3, 2016, at 5:10 PM, Du, Fan <fan...@intel.com> wrote:
>>>
>>> On 2016/6/2 19:14, Gilles Gouaillardet wrote:
>>>> Hi,
>>>>
>>>> may I ask why you need/want to launch orted manually ?
>>>
>>> Good question.
>>>
>>> The intention is to get the orted commands and run orted through Mesos.
>>> This all comes from how Mesos works: in essence, it offers resources
>>> (cpu/memory/ports) on a per-host basis to a framework; the framework then
>>> builds the information describing how to run specific tasks and passes it
>>> to the Mesos master, and finally Mesos instructs the hosts to execute the
>>> framework's tasks.
>>>
>>> Take MPICH2 as an example; the framework supporting MPICH2 works as above:
>>> 1. The framework gets offers from the Mesos master and tells the master to
>>> run a wrapper around the MPICH2 proxy (hydra_pmi_proxy). At this point the
>>> wrapper waits for the command it should use to execute the proxy.
>>>
>>> 2. After enough MPICH2 proxy wrappers have been launched on the hosts, the
>>> real mpiexec program is run with '-launcher manual' to grab the proxy
>>> commands, which are then passed to the proxy wrappers. The real MPICH2
>>> proxies finally get launched, and mpiexec proceeds normally.
>>>
>>> That's why I'm looking for functionality similar to MPICH2's '-launcher manual'.
>>> Non-native speaker here; I hope I told the story clearly :)
>>>
>>>> unless you are running under a batch manager, Open MPI uses the rsh plm
>>>> to remotely start orted. basically, it does
>>>>
>>>> ssh host orted <orted params>
>>>>
>>>> the best I can suggest is you do
>>>>
>>>> mpirun --mca orte_rsh_agent myrshagent.sh --mca orte_launch_agent mylaunchagent.sh ...
>>>>
>>>> under the hood, mpirun will do
>>>>
>>>> myrshagent.sh host mylaunchagent.sh <orted params>
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Thursday, June 2, 2016, Du, Fan <fan...@intel.com> wrote:
>>>>
>>>>     Hi folks
>>>>
>>>>     Starting with Open MPI, I can launch an MPI application a.out as
>>>>     follows on host1:
>>>>     mpirun --allow-run-as-root --host host1,host2 -np 4 /tmp/a.out
>>>>
>>>>     On host2, I see that a proxy, orted in this case, is spawned:
>>>>     orted --hnp-topo-sig 4N:2S:4L3:20L2:20L1:20C:40H:x86_64 -mca ess env
>>>>     -mca orte_ess_jobid 1275133952 -mca orte_ess_vpid 1 -mca
>>>>     orte_ess_num_procs 2 -mca orte_hnp_uri
>>>>     1275133952.0;tcp://host1_ip:40024 --tree-spawn -mca plm rsh
>>>>     --tree-spawn
>>>>
>>>>     It seems mpirun uses ssh as the launcher on my system.
>>>>     What if I want to run the orted side manually instead of having
>>>>     mpirun do it automatically - does mpirun have any option to produce
>>>>     the commands for orted?
>>>>
>>>>     The MPICH2 implementation has a "-launcher manual" option that makes
>>>>     this work, for example:
>>>>     # mpiexec.hydra -launcher manual -np 4 htop
>>>>     HYDRA_LAUNCH: /usr/local/bin/hydra_pmi_proxy --control-port
>>>>     grantleyIPDC04:34652 --rmk user --launcher manual --demux poll
>>>>     --pgid 0 --retries 10 --usize -2 --proxy-id 0
>>>>     HYDRA_LAUNCH_END
>>>>
>>>>     Then I can manually run hydra_pmi_proxy with those commands, and
>>>>     mpiexec.hydra will proceed.
>>>>
>>>>     Thanks!
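A brief note on how Gilles' agent suggestion above could connect to the Mesos workflow: the rsh agent is invoked by mpirun as "<agent> <host> <orted command ...>", so a wrapper script can capture the orted command line and hand it to something other than ssh. The sketch below is only a rough illustration; "myrshagent.sh" comes from Gilles' example, while "mesos-run-on-host" is a hypothetical placeholder for however your framework asks Mesos to run a command on a specific host (Mesos ships no such tool).

    #!/bin/sh
    # myrshagent.sh - mpirun calls this as: myrshagent.sh <host> <orted command ...>
    host="$1"
    shift
    # Instead of ssh'ing to the host, forward the orted command line to the
    # Mesos framework so it launches orted on that host.
    # NOTE: mesos-run-on-host is a made-up placeholder, not a real Mesos command.
    exec mesos-run-on-host "$host" -- "$@"

Used as "mpirun --mca orte_rsh_agent /path/to/myrshagent.sh ...", this is roughly the Open MPI counterpart of capturing the proxy command that MPICH2's '-launcher manual' prints: the orted command line ends up in your hands, and it is then up to the framework to start it on the right host.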