Why would you want to take a perfectly good machine and then virtualize it?
I mean, if I have four quad-core CPUs, I can run a lot of simultaneous map
tasks. However, if I virtualize the box, I lose at least one core per VM, so I
end up with four nodes that have fewer capabilities and less performance than
I would have had on my original box.
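
As a rough sketch of that arithmetic: the four sockets, four cores per socket, and four VMs come from the paragraph above, while the helper name `usable_cores` is made up for illustration. The one-core-per-VM figure is the lower bound stated above ("at least 1 core"), not a measurement.

```python
# Back-of-the-envelope core count for the box described above.
# Assumption: exactly one core is lost per VM (the stated lower bound).
def usable_cores(sockets, cores_per_socket, vms=0, cores_lost_per_vm=1):
    """Cores left for map tasks after subtracting per-VM overhead."""
    total = sockets * cores_per_socket
    return total - vms * cores_lost_per_vm

bare_metal = usable_cores(4, 4)           # no virtualization
virtualized = usable_cores(4, 4, vms=4)   # four VMs, one core lost each
print(bare_metal, virtualized)
```

Under those assumptions the virtualized box has a quarter fewer cores available for map tasks, and the real overhead may be higher.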


-----Original Message-----
From: Saikat Kanjilal [mailto:sxk1...@hotmail.com]
Sent: Friday, September 09, 2011 10:59 AM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: Research projects for hadoop


Hi Folks,

I was looking through the following wiki page:
http://wiki.apache.org/hadoop/HadoopResearchProjects and was wondering if
there's been any work done (or any interest in doing work) on the following
topics:

Integration of Virtualization (such as Xen) with Hadoop tools

- How does one integrate sandboxing of arbitrary user code in C++ and other
  languages in a VM such as Xen with the Hadoop framework? How does this
  interact with SGE, Torque, Condor?
- As each individual machine has more and more cores/CPUs, it makes sense to
  partition each machine into multiple virtual machines. That gives us a
  number of benefits:
  - By assigning a virtual machine to a datanode, we effectively isolate the
    datanode from the load on the machine caused by other processes, making
    the datanode more responsive/reliable.
  - With multiple virtual machines on each machine, we can lower the
    granularity of HOD scheduling units, making it possible to schedule
    multiple tasktrackers on the same machine, improving the overall
    utilization of the whole cluster.
  - With virtualization, we can easily snapshot a virtual cluster before
    releasing it, making it possible to re-activate the same cluster in the
    future and start to work from the snapshot.

Provisioning of long running Services via HOD

- Work on a computation model for services on the grid. The model would
  include:
  - Various tools for defining clients and servers of the service, and at the
    least a C++ and Java instantiation of the abstractions
  - Logical definitions of how to partition work onto a set of servers, i.e.
    a generalized shard implementation
  - A few useful abstractions like locks (exclusive and RW, fairness), leader
    election, transactions
  - Various communication models for groups of servers belonging to a
    service, such as broadcast, unicast, etc.
  - Tools for assuring QoS, reliability, managing pools of servers for a
    service with spares, etc.
  - Integration with HDFS for persistence, as well as access to local
    filesystems
  - Integration with ZooKeeper so that applications can use the namespace

I would like to help out with either a design for the above or prototyping
code. Please let me know whether that is possible and what the process would
be to move forward with this.
Regards

