Hi Brian, That is a good idea. Other block placement algorithms to try (using HDFS-385) would be: placing blocks using a heat-map topology of the data center; using a dynamic network topology (based on measured network performance instead of the static network topology that HDFS currently uses); simulating a new network topology that reduces the cost of expensive network switches in the data center; etc.
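As a rough illustration of the dynamic-topology idea, here is a toy, Hadoop-free sketch. All class and method names here are hypothetical (the real extension point is the pluggable placement interface from HDFS-385); the point is only that targets are ranked by measured link bandwidth to the writer rather than by the static rack map:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Toy sketch, not the real HDFS-385 interface: choose replica targets by
 * measured link bandwidth from the writer instead of static rack distance.
 */
public class DynamicTopologyPlacement {

    /** Choose up to {@code replicas} datanodes, fastest measured link first. */
    public static List<String> chooseTargets(
            Map<String, Double> measuredMbpsToWriter, int replicas) {
        List<String> nodes = new ArrayList<>(measuredMbpsToWriter.keySet());
        // Sort descending by observed bandwidth from the writer to each node.
        nodes.sort(Comparator.comparingDouble(
                (String n) -> measuredMbpsToWriter.get(n)).reversed());
        return new ArrayList<>(nodes.subList(0, Math.min(replicas, nodes.size())));
    }
}
```

A real policy would refresh the bandwidth map from ongoing transfer statistics, but the ranking step would look much like this.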
thanks,
dhruba

On Tue, Sep 1, 2009 at 5:50 AM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
> Hey all,
>
> One place which would be an exceptionally good research project is the new
> pluggable interface for replica placement:
>
> https://issues.apache.org/jira/browse/HDFS-385
>
> It's something which taps into many lines of CS research (such as
> scheduling) and is meant to be experimental for a release or two. I think
> if you could come up with a few example placement policies, it would help
> Dhruba refine the interface. Because it's only developing a plug-in, the
> barrier to entry is much lower than for core FS features.
>
> To get you started, one problem we've seen is the multi-datacenter problem.
> How do you allocate blocks when there is a wildly heterogeneous network
> topology (such as an HDFS instance spread between two centers with only
> 10Gbps between the two)? How do your scheduling decisions affect the
> performance of MapReduce jobs? How do you balance good performance with
> maximum resiliency (placing copies of blocks in two separate buildings)?
>
> Brian
>
> On Sep 1, 2009, at 5:28 AM, Steve Loughran wrote:
>
>> Hrishikesh Mantri wrote:
>>> Hi All,
>>> I am a Masters student in CS. We are a group of two and are looking at
>>> adding some additional features to HDFS as part of a Distributed
>>> Computing course project. Can someone please provide us with pointers as to
>>> which direction we should take so that it can benefit the Hadoop
>>> community?
>>> Regards,
>>> Hrishi
>>
>> I have some thoughts here:
>> http://www.slideshare.net/steve_l/hadoop-and-universities
>>
>> * I would recommend steering clear of the big HA problem because, while it
>> is the big issue with HDFS, it's the one that someone may set an entire
>> engineering team to solving, at which point your work is going to have
>> a hard time surviving.
>>
>> * It might also be interesting to find some potential in-university users
>> of Hadoop, and work on their use cases.
>>
>> * What's your timescale and location? It would be good if there were other
>> Hadoop developers locally, to give you a bit of in-Apache mentorship.
>>
>> * Don't forget the tests. Apache code is very test-centric. One key
>> benefit of working with an OSS project is that your code gets used, but it
>> does mean you need to embrace the community's test/development process,
>> which means JUnit tests for everything.
>>
>> -Steve
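P.S. To make Brian's last question concrete, here is another toy, Hadoop-free sketch (again, all names are hypothetical and this is not the actual HDFS-385 API): keep most replicas in the writer's datacenter for write performance, but always push one copy across the inter-building link for resiliency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Toy two-datacenter policy: writer-local copies for speed, one remote copy
 * for building-level resiliency. Names are hypothetical sketch names.
 */
public class TwoDatacenterPlacement {

    public static List<String> chooseTargets(
            Map<String, String> nodeToDatacenter, String writerDc, int replicas) {
        List<String> local = new ArrayList<>();
        List<String> remote = new ArrayList<>();
        for (Map.Entry<String, String> e : nodeToDatacenter.entrySet()) {
            (e.getValue().equals(writerDc) ? local : remote).add(e.getKey());
        }
        List<String> targets = new ArrayList<>();
        // One copy across the slow inter-datacenter link survives the loss
        // of a whole building...
        if (!remote.isEmpty() && replicas > 1) {
            targets.add(remote.get(0));
        }
        // ...and the remaining copies stay in the writer's datacenter so the
        // write pipeline avoids the constrained link as much as possible.
        for (String n : local) {
            if (targets.size() == replicas) break;
            targets.add(n);
        }
        return targets;
    }
}
```

Measuring how a policy like this shifts MapReduce job locality versus the default would itself be a reasonable course-project experiment.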