Thanks a lot, all! An end goal of mine is to make Hadoop as flexible as possible. Along the same lines, but unrelated to the above idea, is another one I encountered, courtesy of http://hadoopblog.blogspot.com/2010/11/hadoop-research-topics.html
The blog mentions the ability to dynamically append input. Specifically, can I append input to the Map and Reduce tasks after they have been started? I haven't been able to find anything like this at a cursory glance, but could someone advise me on this before I dig deeper?

1. Does such functionality exist, or is it being attempted?
2. I would assume most cases would simply require starting a second job for the new input. However, are there practical use cases for such a feature?
3. Are there any other ideas on such "flexibility" of the system that I could contribute to?

Thanks again for your help!

On 14 September 2011 17:24, Arun C Murthy <a...@hortonworks.com> wrote:
>
> On Sep 14, 2011, at 1:27 PM, Bharath Ravi wrote:
>
> > Hi all,
> >
> > I'm a newcomer to Hadoop development, and I'm planning to work on an idea
> > that I wanted to run by the dev community.
> >
> > My apologies if this is not the right place to post this.
> >
> > Amazon has an "Elastic MapReduce" service (
> > http://aws.amazon.com/elasticmapreduce/) that runs on Hadoop.
> > The service allows dynamic/runtime changes in resource allocation: more
> > specifically, varying the number of compute nodes that a job is running on.
> >
> > I was wondering if such a facility could be added to the publicly
> > available Hadoop MapReduce.
>
> For a long while now you have been able to bring up DataNodes or
> TaskTrackers, point them (via config) at the NameNode/JobTracker, and they
> will become part of the cluster.
>
> Similarly, you can just kill a DataNode or TaskTracker and the respective
> masters will deal with the loss.
>
> Arun

--
Bharath Ravi
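For anyone following along, Arun's point about pointing new workers at the masters via config can be sketched as follows. This is only a minimal illustration for a Hadoop 1.x-style cluster; the hostnames and ports are placeholders, not values from this thread:

```
<!-- core-site.xml on the new node: tell the DataNode where the NameNode is -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:9000</value>
</property>

<!-- mapred-site.xml on the new node: tell the TaskTracker where the JobTracker is -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker-host:9001</value>
</property>
```

With that in place, running `bin/hadoop-daemon.sh start datanode` and `bin/hadoop-daemon.sh start tasktracker` on the new machine brings it into the cluster, and stopping (or killing) those daemons removes it, with the NameNode/JobTracker handling the loss as Arun describes.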