MPI won't do this - if a node dies, the entire MPI job is terminated.

Take a look at OpenRCM, a subproject of Open MPI:

http://www.open-mpi.org/projects/orcm/

This is designed to do what you describe, as we have a similar (open source) 
project underway at Cisco. If I were writing your system, I would:

(a) add my sensors to the orte/mca/sensor framework. You'll find that we 
already monitor memory usage, for example. Use the orte/mca/db framework to 
store your data in a database. Several different databases are already 
supported, though it is easy to add another if you want (e.g., sqlite support).

(b) add my desired error response to the src/orte/mca/errmgr/orcm module. The 
ability to migrate processes is already implemented, but you may need to do 
something additional to migrate a VM. If you prefer, you can create your own 
module in that area and use one of the other components as an example.
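
As a rough illustration of the data flow in (a), the sketch below stores sampled readings in a database and reads the latest one back. It is not the orte/mca/sensor or orte/mca/db API (those are C frameworks inside ORTE); the table layout, the function names, and the use of Python's sqlite3 module are all assumptions made for the example.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Create the (hypothetical) samples table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "node TEXT, vm TEXT, metric TEXT, value REAL, ts REAL)"
    )
    return conn


def store_sample(conn, node, vm, metric, value, ts=None):
    """Insert one reading; a sensor would call this once per sample."""
    stamp = time.time() if ts is None else ts
    conn.execute(
        "INSERT INTO samples VALUES (?, ?, ?, ?, ?)",
        (node, vm, metric, float(value), stamp),
    )
    conn.commit()


def latest(conn, vm, metric):
    """Return the most recent value for a VM/metric pair, or None."""
    row = conn.execute(
        "SELECT value FROM samples WHERE vm = ? AND metric = ? "
        "ORDER BY ts DESC LIMIT 1",
        (vm, metric),
    ).fetchone()
    return row[0] if row else None
```

A migration decision could then be driven by queries such as latest(conn, "vm7", "mem_mb"), where the VM and metric names are again only placeholders.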

Then let orcm start its daemons across your nodes. Orcm daemons will do the 
monitoring and reporting for you, and will start and monitor the virtual 
machines. If you set the max local restarts to 0, and max global restarts to 
some number, the system will automatically migrate any failures to other nodes.
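
The effect of those two limits can be sketched as a small decision function. This is only a reading of the behaviour described above, not orcm's actual errmgr code; the function name, argument names, and return strings are hypothetical.

```python
def next_action(local_restarts, global_restarts, max_local, max_global):
    """Pick a response to a failure, given the restarts already attempted.

    local_restarts / global_restarts count earlier attempts for this
    process; max_local / max_global are the configured limits.
    """
    if local_restarts < max_local:
        return "restart_local"   # retry on the same node
    if global_restarts < max_global:
        return "migrate"         # restart on a different node
    return "abort"               # both budgets exhausted
```

With max_local set to 0, the first branch can never be taken, so every failure goes straight to migration until the global limit is reached.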

See the June 2010 presentation under "Publications" on the web page above for 
an overview of how it all works. If you decide to go this route, I'll be happy 
to provide advice and further explanation. And of course, you are welcome to 
participate in ORCM if you choose.

Ralph

On Oct 22, 2010, at 6:09 AM, Vasiliy G Tolstov wrote:

> On Fri, 2010-10-22 at 14:07 +0200, Reuti wrote:
>> Hi,
>> 
>> On 22.10.2010 at 10:58, Vasiliy G Tolstov wrote:
>> 
>>> Hello. Maybe this question has already been answered, but I can't find
>>> it in the list archive.
>>> 
>>> I'm running about 60 Xen nodes with about 7-20 virtual machines on
>>> each. I want to gather disk, CPU, memory, and network utilisation from
>>> the virtual machines and put it into a database for later processing.
>>> 
>>> As I see it, my architecture is like this: one or two master servers
>>> run the MPI process with rank 0, which can insert data into the
>>> database. These master servers spawn an MPI process on each Xen node
>>> that gathers statistics from the virtual machines on that node and
>>> sends them to the masters (maybe with a multicast request). On each
>>> virtual machine I have an (MPI) process that can get data and send it
>>> to the MPI process on its Xen node. Virtual machines can migrate to
>>> other Xen nodes....
>> 
>> do you want just to monitor the physical and virtual machines by an 
>> application running under MPI? It sounds like it could be done by Ganglia or 
>> Nagios then.
> 
> No. I want real-time data so I can decide which virtual machine I need
> to migrate to another Xen node because it needs more resources.
> 
> 
> -- 
> Vasiliy G Tolstov <v.tols...@selfip.ru>
> Selfip.Ru
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
