> On Jun 7, 2016, at 7:17 AM, Du, Fan <fan...@intel.com> wrote:
>
> On 2016/6/6 18:00, Ralph Castain wrote:
>> Perhaps it would help if you could give us some idea of the interest here? The prior Mesos integration was done as an academic project, which is why it died once the student graduated.
>
> Could you point me to the repo of the previous work?
Unfortunately, it was never committed and has long since disappeared.

> The intention is simple: to provide a framework that can run Open MPI applications on top of a cluster, and it's not an academic project. One of my customers uses Open MPI, and they want to deploy a Mesos cluster and move their Open MPI workload onto it; right now it's at an early stage of the evaluation phase.
>
> A sample Mesos framework to support MPICH can be found here:
> [1]: https://github.com/apache/mesos/tree/master/mpi <- mpd ring version
> [2]: https://github.com/mesosphere/mesos-hydra <- hydra version

I can take a look, but cannot promise how soon I can do this - do you have a timeline?

> Another question here:
> An application compiled with Open MPI should run OK with MPICH's mpirun, because the API is compatible across MPI implementations, right? IOW, is an application built with Open MPI able to run in an MPICH environment?

I’m afraid not - while the MPI-level functions are standard, the underlying implementation is not compatible.

>> Is there some long-term interest here? Or is this part of an academic effort?
>>
>>> On Jun 5, 2016, at 7:22 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> On Jun 5, 2016, at 4:30 PM, Du, Fan <fan...@intel.com> wrote:
>>>>
>>>> Thanks for your reply!
>>>>
>>>> On 2016/6/5 3:01, Ralph Castain wrote:
>>>>> The closest thing we have to what you describe is the “orte-dvm” - this allows one to launch a persistent collection of daemons. You can then run your applications against it using “mpiexec -hnp <url>”, where the url is that of the orte-dvm “head” daemon.
>>>>
>>>> I tried this; maybe I missed something.
>>>>
>>>> On host1:
>>>> orte-dvm --allow-run-as-root
>>>> VMURI: 2783903744.0;tcp://192.168.10.55:47325
>>>> DVM ready
>>>>
>>>> On host2:
>>>> mpiexec -hnp 2783903744.0;tcp://192.168.10.55:47325
>>>
>>> Your shell will take the semi-colon to mean the end of the line - you have to enclose it all in quotes.
>>>
>>>> OMPI_MCA_orte_hnp_uri=2783903744.0
>>>> OMPI_MCA_ess=tool
>>>> [grantleyIPDC01:03305] [[21695,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rml_base_contact.c at line 161
>>>> -bash: tcp://192.168.10.55:47325: No such file or directory
>>>>
>>>> Digging into the code a bit deeper, the uri is expected to contain a job id and a rank id.
>>>> Also, how do I make the subsequent orte-dvm daemons know where the head orte-dvm is? I checked the orte-dvm help; there seems to be no such option.
>>>>
>>>>> If I understand you correctly, however, then you would want the orte-dvm to assemble itself based on the asynchronous start of the individual daemons. In other words, Mesos would start a daemon on each node as that node became available. Then, once all the daemons have been started, Mesos would execute “mpiexec” to start the application.
>>>>>
>>>>> Is that correct?
>>>>
>>>> Yes
>>>>
>>>>> If so, then we don’t support that mode today, but it could fairly easily be added. However, I don’t see why you couldn’t just write a small standalone tool that collects all the Mesos resources in a file until all have been assembled, and then executes “mpiexec -hostfile <myfile>”.
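
For illustration, a minimal and untested sketch of such a collector tool might look like the following. It assumes something on the Mesos side feeds it one hostname per line on stdin as offers are accepted; the script name, the "slots=1" choice and the mpiexec arguments are all placeholders, not part of anything Open MPI provides.

#!/bin/sh
# collect_hosts.sh -- hypothetical glue: gather one hostname per line on stdin
# (fed by whatever Mesos-side logic claims the offers), then launch the job.
# NUM_NODES, HOSTFILE and the mpiexec arguments are placeholders.

NUM_NODES=${1:?usage: collect_hosts.sh <num_nodes> <app> [args...]}
shift
HOSTFILE=$(mktemp /tmp/mesos_hosts.XXXXXX)

count=0
while [ "$count" -lt "$NUM_NODES" ] && read -r host; do
    echo "$host slots=1" >> "$HOSTFILE"     # one hostfile line per allocated node
    count=$((count + 1))
done

[ "$count" -eq "$NUM_NODES" ] || { echo "only got $count hosts" >&2; exit 1; }

# All resources assembled -- hand the hostfile to mpiexec as suggested above.
exec mpiexec --hostfile "$HOSTFILE" -np "$NUM_NODES" "$@"

It would then be invoked as something like "your_mesos_glue | ./collect_hosts.sh 4 /tmp/a.out", with the application binary already present on every node; ssh access between the nodes is still assumed, exactly as in the suggestion above.
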
>>>> Because mpiexec will eventually rely on ssh to run the MPI proxy on the hosts,
>>>
>>> What’s the problem with that? It’s how many HPC clusters work. Is ssh not enabled?
>>>
>>>> while in Mesos it works like this: the framework passes information about which commands to run on which host to the Mesos master, and the Mesos master then instructs the hosts to run those commands.
>>>>
>>>> This is where the Mesos work model doesn't fit into Open MPI.
>>>
>>> The easiest thing would be to add a Mesos PLM plugin to OMPI - IIRC, someone once did that, but nobody was interested and so it died.
>>>
>>>>> Is there some reason this won’t work? It would be much simpler and would work with any MPI.
>>>>>
>>>>> Ralph
>>>>>
>>>>>> On Jun 3, 2016, at 5:10 PM, Du, Fan <fan...@intel.com> wrote:
>>>>>>
>>>>>> On 2016/6/2 19:14, Gilles Gouaillardet wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> may I ask why you need/want to launch orted manually ?
>>>>>>
>>>>>> Good question.
>>>>>>
>>>>>> The intention is to get the orted commands and run orted with Mesos. This all comes from how Mesos works: in essence, it offers resources (cpu/memory/ports) on a per-host basis to a framework; the framework then builds the information describing how to run specific tasks and passes it to the Mesos master; finally, Mesos instructs the hosts to execute the framework's tasks.
>>>>>>
>>>>>> Take MPICH2 as an example; the framework that supports MPICH2 works as above:
>>>>>> 1. The framework gets offers from the Mesos master and tells the Mesos master to run a wrapper around the MPICH2 proxy (hydra_pmi_proxy); at this point the wrapper waits for the commands with which to execute the proxy.
>>>>>>
>>>>>> 2. After enough MPICH2 proxy wrappers have been launched on the hosts, as the user expects, run the real mpiexec program with '-launcher manual' to grab the commands for the proxies, then pass those commands to the proxy wrappers, so the real MPICH2 proxies finally get launched and mpiexec proceeds normally.
>>>>>>
>>>>>> That's why I'm looking for functionality similar to MPICH2's '-launcher manual'.
>>>>>> I'm not a native speaker; I hope I told the story clearly :)
>>>>>>
>>>>>>> Unless you are running under a batch manager, Open MPI uses the rsh plm to remotely start orted. Basically, it does
>>>>>>> ssh host orted <orted params>
>>>>>>>
>>>>>>> The best I can suggest is you do
>>>>>>> mpirun --mca orte_rsh_agent myrshagent.sh --mca orte_launch_agent mylaunchagent.sh ...
>>>>>>> Under the hood, mpirun will do
>>>>>>> myrshagent.sh host mylaunchagent.sh <orted params>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
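
To make that suggestion concrete, here is a minimal, untested sketch of what the myrshagent.sh from that command line could look like if the launch agent is left at its default (orted), so mpirun invokes it as "myrshagent.sh <host> orted <params>". The mesos-submit-task helper is purely a placeholder for whatever mechanism your framework uses to ask Mesos to run a command on a given host; no such tool ships with Mesos or Open MPI.

#!/bin/sh
# myrshagent.sh -- hypothetical stand-in for ssh, per the suggestion above.
# mpirun invokes it as:  myrshagent.sh <host> orted <orted params>
host=$1
shift                                   # everything left is the orted command line

# Keep a record of what would have been launched where, for debugging.
echo "launch on $host: $*" >> /tmp/orted_launches.log

# Placeholder: hand the command to whatever part of your framework can ask
# Mesos to run it on that host. orted will then report back to mpirun using
# the -mca orte_hnp_uri value already embedded in its arguments.
exec mesos-submit-task --host "$host" -- "$@"

With that in place, "mpirun --mca orte_rsh_agent ./myrshagent.sh --host ... -np ..." should never touch ssh; whether mpirun is happy with the agent process handing off and exiting will depend on how the Mesos side keeps the daemons alive, so treat this only as a starting point.
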
>>>>>>> On Thursday, June 2, 2016, Du, Fan <fan...@intel.com> wrote:
>>>>>>>
>>>>>>> Hi folks
>>>>>>>
>>>>>>> Starting with Open MPI, I can launch an MPI application a.out as follows on host1:
>>>>>>> mpirun --allow-run-as-root --host host1,host2 -np 4 /tmp/a.out
>>>>>>>
>>>>>>> On host2, I saw that a proxy, orted, is spawned:
>>>>>>> orted --hnp-topo-sig 4N:2S:4L3:20L2:20L1:20C:40H:x86_64 -mca ess env -mca orte_ess_jobid 1275133952 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 -mca orte_hnp_uri 1275133952.0;tcp://host1_ip:40024 --tree-spawn -mca plm rsh --tree-spawn
>>>>>>>
>>>>>>> It seems mpirun uses ssh as the launcher on my system.
>>>>>>> What if I want to run the orted part manually, rather than having mpirun do it automatically? I mean, does mpirun have any option to produce the commands for orted?
>>>>>>>
>>>>>>> As for the MPICH2 implementation, there is a "-launcher manual" option that makes this work, for example:
>>>>>>> # mpiexec.hydra -launcher manual -np 4 htop
>>>>>>> HYDRA_LAUNCH: /usr/local/bin/hydra_pmi_proxy --control-port grantleyIPDC04:34652 --rmk user --launcher manual --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
>>>>>>> HYDRA_LAUNCH_END
>>>>>>>
>>>>>>> Then I can manually run hydra_pmi_proxy with those commands, and mpiexec.hydra will proceed.
>>>>>>>
>>>>>>> Thanks!
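
For what it's worth, the closest untested approximation of MPICH's "-launcher manual" with Open MPI that I can think of is to point the rsh agent at a script that merely prints the orted command line instead of executing it, then run that command by hand on the target host. The MCA parameter name below is the one quoted earlier in this thread; the script name and path are placeholders, and whether mpirun waits cleanly for a manually started orted may depend on the Open MPI version, so treat this as a sketch rather than a recipe.

cat > /tmp/print_launch.sh <<'EOF'
#!/bin/sh
# $1 is the target host; the remaining arguments are the orted command
# that mpirun wanted to run there via ssh.
host=$1; shift
echo "OMPI_LAUNCH on $host: $*"
# Stay alive the way an ssh session would, so mpirun keeps waiting for
# the manually started orted to report back.
while :; do sleep 60; done
EOF
chmod +x /tmp/print_launch.sh

mpirun --allow-run-as-root --mca orte_rsh_agent /tmp/print_launch.sh \
       --host host1,host2 -np 4 /tmp/a.out
# Copy the printed "orted ..." line and run it on host2 yourself, quoting
# the orte_hnp_uri value since it contains a semi-colon.
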