Hi Brian, That is a good idea. Other block placement algorithms to try (using HDFS-385) would be: placing blocks using a heat-map topology of the data center; using a dynamic network topology (based on measured network performance instead of the static network topology that HDFS currently uses); simulating a new network topology that reduces the cost of expensive network switches in the data center; etc.
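As a rough illustration of the dynamic-topology idea, here is a toy, Hadoop-free sketch. All class and method names here are hypothetical (the real extension point is the pluggable placement interface from HDFS-385); the point is only that targets are ranked by measured link bandwidth to the writer rather than by the static rack map:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Toy sketch, not the real HDFS-385 interface: choose replica targets by
 * measured link bandwidth from the writer instead of static rack distance.
 */
public class DynamicTopologyPlacement {

    /** Choose up to {@code replicas} datanodes, fastest measured link first. */
    public static List<String> chooseTargets(
            Map<String, Double> measuredMbpsToWriter, int replicas) {
        List<String> nodes = new ArrayList<>(measuredMbpsToWriter.keySet());
        // Sort descending by observed bandwidth from the writer to each node.
        nodes.sort(Comparator.comparingDouble(
                (String n) -> measuredMbpsToWriter.get(n)).reversed());
        return new ArrayList<>(nodes.subList(0, Math.min(replicas, nodes.size())));
    }
}
```

A real policy would refresh the bandwidth map from ongoing transfer statistics, but the ranking step would look much like this.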
thanks,
dhruba

On Tue, Sep 1, 2009 at 5:50 AM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
> Hey all,
>
> One place which would be an exceptionally good research project is the new
> pluggable interface for replica placement:
>
> https://issues.apache.org/jira/browse/HDFS-385
>
> It's something which taps into many lines of CS research (such as
> scheduling) and is meant to be experimental for a release or two. I think
> if you could come up with a few example placement policies, it would help
> Dhruba refine the interface. Because it's only developing a plug-in, the
> barrier to entry is much lower than for core FS features.
>
> To get you started, one problem we've seen is the multi-datacenter problem.
> How do you allocate blocks when there is a wildly heterogeneous network
> topology (such as an HDFS instance spread between two centers with only
> 10Gbps between the two)? How do your scheduling decisions affect the
> performance of MapReduce jobs? How do you balance good performance with
> maximum resiliency (placing copies of blocks in two separate buildings)?
>
> Brian
>
> On Sep 1, 2009, at 5:28 AM, Steve Loughran wrote:
>
>> Hrishikesh Mantri wrote:
>>> Hi All,
>>> I am a Masters student in CS. We are a group of two and are looking at
>>> adding some additional features to HDFS as part of a Distributed
>>> Computing course project. Can someone please provide us with pointers as to
>>> which direction we should take so that it can benefit the Hadoop
>>> community?
>>> Regards,
>>> Hrishi
>>
>> I have some thoughts here:
>> http://www.slideshare.net/steve_l/hadoop-and-universities
>>
>> * I would recommend steering clear of the big HA problem because, while it
>> is the big issue with HDFS, it's the one that someone may set an entire
>> engineering team to solving, at which point your work is going to have
>> a hard time surviving.
>>
>> * It might also be interesting to find some potential in-university users
>> of Hadoop, and work on their use cases.
>>
>> * What's your timescale and location? It would be good if there were other
>> Hadoop developers locally, to give you a bit of in-Apache mentorship.
>>
>> * Don't forget the tests. Apache code is very test-centric. One key
>> benefit of working with an OSS project is that your code gets used, but it
>> does mean you need to embrace the community's test/development process,
>> which means JUnit tests for everything.
>>
>> -Steve
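P.S. To make Brian's last question concrete, here is another toy, Hadoop-free sketch (again, all names are hypothetical and this is not the actual HDFS-385 API): keep most replicas in the writer's datacenter for write performance, but always push one copy across the inter-building link for resiliency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Toy two-datacenter policy: writer-local copies for speed, one remote copy
 * for building-level resiliency. Names are hypothetical sketch names.
 */
public class TwoDatacenterPlacement {

    public static List<String> chooseTargets(
            Map<String, String> nodeToDatacenter, String writerDc, int replicas) {
        List<String> local = new ArrayList<>();
        List<String> remote = new ArrayList<>();
        for (Map.Entry<String, String> e : nodeToDatacenter.entrySet()) {
            (e.getValue().equals(writerDc) ? local : remote).add(e.getKey());
        }
        List<String> targets = new ArrayList<>();
        // One copy across the slow inter-datacenter link survives the loss
        // of a whole building...
        if (!remote.isEmpty() && replicas > 1) {
            targets.add(remote.get(0));
        }
        // ...and the remaining copies stay in the writer's datacenter so the
        // write pipeline avoids the constrained link as much as possible.
        for (String n : local) {
            if (targets.size() == replicas) break;
            targets.add(n);
        }
        return targets;
    }
}
```

Measuring how a policy like this shifts MapReduce job locality versus the default would itself be a reasonable course-project experiment.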