> On Jun 7, 2016, at 7:17 AM, Du, Fan <fan...@intel.com> wrote:
> 
> 
> 
> On 2016/6/6 18:00, Ralph Castain wrote:
>> Perhaps it would help if you could give us some idea of the interest
>> here? The prior Mesos integration was done as an academic project, which
>> is why it died once the student graduated.
> 
> Could you point me the repo of previous work?

Unfortunately, it was never committed and has long since disappeared.

> The intention is simple: to provide a framework that can run Open MPI 
> applications on top of a Mesos cluster, and it's not an academic project. 
> One of my customers uses Open MPI; they want to deploy a Mesos cluster and 
> move their Open MPI workloads onto it. We are currently at an early stage 
> of the evaluation phase.
> 
> Sample Mesos frameworks that support MPICH can be found here:
> [1]: https://github.com/apache/mesos/tree/master/mpi   <- mpd ring version
> [2]: https://github.com/mesosphere/mesos-hydra         <- hydra version

I can take a look, but cannot promise how soon I can do this - do you have a 
timeline?

> 
> Another question here:
> An application compiled with Open MPI should run fine with MPICH's mpirun, 
> because the API is compatible across MPI implementations, right? In other 
> words, can an application built against Open MPI run in an MPICH 
> environment?

I’m afraid not - while the MPI-level functions are standardized, the underlying 
implementations (ABI and runtime) are not compatible.
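
For example (illustrative commands; "hello.c" is a hypothetical source file), 
the same source compiles against either implementation, but the binary must be 
launched with the runtime it was linked against:

  mpicc hello.c -o hello     # compiled with Open MPI's wrapper compiler
  mpirun -np 4 ./hello       # must be Open MPI's mpirun

Launching an Open MPI binary with MPICH's mpiexec (or the reverse) will not 
work, since the runtimes and ABIs differ.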

> 
> 
>> Is there some long-term interest here? Or is this part of an academic
>> effort?
>> 
>> 
>>> On Jun 5, 2016, at 7:22 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>>> 
>>>> On Jun 5, 2016, at 4:30 PM, Du, Fan <fan...@intel.com> wrote:
>>>> 
>>>> Thanks for your reply!
>>>> 
>>>> On 2016/6/5 3:01, Ralph Castain wrote:
>>>>> The closest thing we have to what you describe is the “orte-dvm” - this
>>>>> allows one to launch a persistent collection of daemons. You can then
>>>>> run your applications against it using “mpiexec -hnp <url>” where the
>>>>> url is that of the orte-dvm “head” daemon.
>>>> 
>>>> I tried this; maybe I am missing something.
>>>> 
>>>> On host1:
>>>> orte-dvm  --allow-run-as-root
>>>> VMURI: 2783903744.0;tcp://192.168.10.55:47325
>>>> DVM ready
>>>> 
>>>> On host2:
>>>> mpiexec -hnp 2783903744.0;tcp://192.168.10.55:47325
>>> 
>>> Your shell will take the semi-colon to mean the end of the line - you
>>> have to enclose it all in quotes
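>>> For example (the URI is taken from your output above; the quotes keep the
>>> shell from treating the semicolon as a command separator, and the
>>> application arguments are just illustrative):
>>> 
>>>   mpiexec -hnp "2783903744.0;tcp://192.168.10.55:47325" -np 4 /tmp/a.out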
>>> 
>>>> OMPI_MCA_orte_hnp_uri=2783903744.0
>>>> OMPI_MCA_ess=tool
>>>> [grantleyIPDC01:03305] [[21695,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>> file base/rml_base_contact.c at line 161
>>>> -bash: tcp://192.168.10.55:47325: No such file or directory
>>>> 
>>>> Digging into the code a bit deeper, the URI is expected to contain a job
>>>> id and a rank id.
>>>> Also, how do I make subsequent orte-dvm daemons know where the head
>>>> orte-dvm is? I checked the orte-dvm help, and there seems to be no such
>>>> option.
>>>> 
>>>>> If I understand you correctly, however, then you would want the orte-dvm
>>>>> to assemble itself based on the asynchronous start of the individual
>>>>> daemons. In other words, Mesos would start a daemon on each node as that
>>>>> node became available. Then, once all the daemons have been started,
>>>>> Mesos would execute “mpiexec” to start the application.
>>>>> 
>>>>> Is that correct?
>>>> 
>>>> Yes
>>>> 
>>>>> If so, then we don’t support that mode today, but it could fairly easily
>>>>> be added.
>>>>> However, I don’t see why you couldn’t just write a small
>>>>> standalone tool that collects all the Mesos resources in a file until
>>>>> all have been assembled, and then executes “mpiexec -hostfile <myfile>”.
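>>>>> 
>>>>> As a rough sketch of such a tool (entirely hypothetical; it assumes each
>>>>> Mesos task invokes it with the offered hostname and that the expected
>>>>> host count is known up front):
>>>>> 
>>>>>   #!/bin/sh
>>>>>   # collect_host.sh <hostname>: append a host; launch once all arrive
>>>>>   EXPECTED=4
>>>>>   HOSTFILE=/tmp/mesos_hosts
>>>>>   echo "$1" >> "$HOSTFILE"
>>>>>   if [ "$(wc -l < "$HOSTFILE")" -eq "$EXPECTED" ]; then
>>>>>     mpiexec -hostfile "$HOSTFILE" -np "$EXPECTED" /tmp/a.out
>>>>>   fi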
>>>> 
>>>> Because mpiexec will eventually rely on ssh to run the MPI proxy on the hosts,
>>> 
>>> What’s the problem with that? It’s how many HPC clusters work. Is ssh
>>> not enabled?
>>> 
>>>> whereas
>>>> in Mesos it works like this: the framework specifies which commands to
>>>> run on which hosts and passes that information to the Mesos master, and
>>>> the Mesos master then instructs the hosts to run those commands.
>>>> 
>>>> This is where the Mesos execution model doesn't fit Open MPI.
>>> 
>>> The easiest thing would be to add a Mesos PLM plugin to OMPI - IIRC,
>>> someone once did that, but nobody was interested, so it died
>>> 
>>> 
>>>> 
>>>>> Is there some reason this won’t work? It would be much simpler and would
>>>>> work with any MPI.
>>>>> 
>>>>> Ralph
>>>>> 
>>>>> 
>>>>>> On Jun 3, 2016, at 5:10 PM, Du, Fan <fan...@intel.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 2016/6/2 19:14, Gilles Gouaillardet wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> may I ask why you need/want to launch orted manually ?
>>>>>> 
>>>>>> Good question.
>>>>>> 
>>>>>> The intention is to capture the orted command lines and run orted via
>>>>>> Mesos.
>>>>>> This all comes from how Mesos works: in essence, it offers resources
>>>>>> (cpu/memory/ports) on a per-host basis to a framework; the framework
>>>>>> then builds a description of how to run specific tasks and passes that
>>>>>> information to the Mesos master, which finally instructs the hosts to
>>>>>> execute the framework's tasks.
>>>>>> 
>>>>>> Take MPICH2 as an example; the framework that supports MPICH2 works as
>>>>>> described above:
>>>>>> 1. The framework gets offers from the Mesos master and tells the master
>>>>>> to run a wrapper around the MPICH2 proxy (hydra_pmi_proxy); at this
>>>>>> point the wrapper waits for the command line with which to execute the
>>>>>> proxy.
>>>>>> 
>>>>>> 2. Once as many MPICH2 proxy wrappers as the user expects have been
>>>>>> launched on the hosts, the real mpiexec program is run with
>>>>>> '-launcher manual' to obtain the proxy command lines, which are then
>>>>>> passed to the proxy wrappers. The real MPICH2 proxies finally get
>>>>>> launched, and mpiexec proceeds normally.
>>>>>> 
>>>>>> That's why I'm looking for functionality similar to MPICH2's
>>>>>> '-launcher manual'.
>>>>>> I'm not a native speaker; I hope I told the story clearly :)
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> Unless you are running under a batch manager, Open MPI uses the
>>>>>>> rsh plm
>>>>>>> to remotely start orted.
>>>>>>> Basically, it does
>>>>>>> ssh host orted <orted params>
>>>>>>> 
>>>>>>> the best I can suggest is you do
>>>>>>> 
>>>>>>> mpirun --mca orte_rsh_agent myrshagent.sh --mca orte_launch_agent
>>>>>>> mylaunchagent.sh  ...
>>>>>>> under the hood, mpirun will do
>>>>>>> myrshagent.sh host mylaunchagent.sh <orted params>
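>>>>>>> 
>>>>>>> As a hypothetical sketch (the script and the my_remote_exec command are
>>>>>>> illustrative, not part of Open MPI), myrshagent.sh could hand the orted
>>>>>>> command line to your own remote-execution mechanism instead of ssh:
>>>>>>> 
>>>>>>>   #!/bin/sh
>>>>>>>   # myrshagent.sh <host> <command ...>
>>>>>>>   host="$1"; shift
>>>>>>>   # forward "orted <params>" to whatever launcher you control, e.g. a
>>>>>>>   # tool that asks the Mesos master to run it on the given host
>>>>>>>   my_remote_exec "$host" "$@"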
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> 
>>>>>>> Gilles
>>>>>>> 
>>>>>>> On Thursday, June 2, 2016, Du, Fan <fan...@intel.com> wrote:
>>>>>>> 
>>>>>>>  Hi folks
>>>>>>> 
>>>>>>>  With Open MPI, I can launch an MPI application a.out as follows on
>>>>>>>  host1:
>>>>>>>  mpirun --allow-run-as-root --host host1,host2 -np 4 /tmp/a.out
>>>>>>> 
>>>>>>>  On host2, I see that a proxy, orted, is spawned:
>>>>>>>  orted --hnp-topo-sig 4N:2S:4L3:20L2:20L1:20C:40H:x86_64 -mca ess env
>>>>>>>  -mca orte_ess_jobid 1275133952 -mca orte_ess_vpid 1 -mca
>>>>>>>  orte_ess_num_procs 2 -mca orte_hnp_uri
>>>>>>>  1275133952.0;tcp://host1_ip:40024 --tree-spawn -mca plm rsh
>>>>>>> --tree-spawn
>>>>>>> 
>>>>>>>  It seems mpirun uses ssh as the launcher on my system.
>>>>>>>  What if I want to run the orted side manually, rather than having
>>>>>>>  mpirun do it automatically?
>>>>>>>  I mean, does mpirun have any option to print the commands for orted?
>>>>>>> 
>>>>>>>  As for the MPICH2 implementation, there is a "-launcher manual"
>>>>>>>  option that makes this work,
>>>>>>>  for example:
>>>>>>>  # mpiexec.hydra -launcher manual -np 4 htop
>>>>>>>  HYDRA_LAUNCH: /usr/local/bin/hydra_pmi_proxy --control-port
>>>>>>>  grantleyIPDC04:34652 --rmk user --launcher manual --demux poll
>>>>>>>  --pgid 0 --retries 10 --usize -2 --proxy-id 0
>>>>>>>  HYDRA_LAUNCH_END
>>>>>>> 
>>>>>>>  Then I can manually run hydra_pmi_proxy with those commands, and
>>>>>>>  mpiexec.hydra will proceed.
>>>>>>> 
>>>>>>>  Thanks!