> Or do you plan to have different data across sites and then run MR jobs
> across them? This would be an interesting problem, but it's way above the FS.

MAPREDUCE-4502 relates to this problem. Please check it out if you
are interested.
https://issues.apache.org/jira/browse/MAPREDUCE-4502

- Tsuyoshi

On Sat, Sep 22, 2012 at 5:39 PM, Steve Loughran <ste...@hortonworks.com> wrote:
> On 11 September 2012 00:29, Sujee Maniyam <su...@sujee.net> wrote:
>
>> Hi devs
>> now that HDFS HA is a reality, how about HDFS spanning multiple
>> data centers?  Are there any discussions / work going on in this area?
>>
>> It could be a single cluster spanning multiple data centers or having
>> a 'standby cluster' in another data center.
>>
>> curious, and thanks for your time!
>>
>> regards
>> Sujee Maniyam
>> http://sujee.net
>
>
> what are your goals here?
>
>    - store 1 of the 3 replicas off-site for (possible) recovery on a site
>    failure
>    - store 2+ replicas on each site for better recovery of site+block
>    failure
>    - be able to back up all of the data to a different site
>    - be able to back up some of the data to a different site
>    - stream the metadata/NN log to a remote site (you could get away with
>    that today)
>
> Or do you plan to have different data across sites and then run MR jobs
> across them? This would be an interesting problem, but it's way above the FS.
>
> There's still a lot of work that could be done for single-site failure
> tolerance, in particular:
>
>    - better failure topology awareness: if you run the site on two external
>    power supplies, as telcos do, then you want at least one copy on each
>    power source
>    - better partition failure awareness: differentiate "loss of rack" from
>    "all the machines on a rack have stopped reporting in", which is how it
>    is treated today
>
> -steve
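
To make the "at least one copy on each power source" point concrete, here is a
minimal sketch. It assumes every node is tagged with exactly one failure domain
(a power feed, or equally a site). The function name, node names and domain
labels are all hypothetical, and this is not HDFS's actual block placement code,
only an illustration of the constraint.

```python
from collections import defaultdict


def place_replicas(nodes, replication=3):
    """Pick replica targets by round-robin over failure domains.

    nodes: dict mapping a (hypothetical) node name to its failure domain,
           e.g. a power source or a site.
    Returns a list of node names. When replication >= number of domains,
    every domain that has a node receives at least one replica.
    """
    by_domain = defaultdict(list)
    for node, domain in sorted(nodes.items()):
        by_domain[domain].append(node)

    chosen = []
    domains = sorted(by_domain)
    while len(chosen) < replication:
        progressed = False
        for domain in domains:
            if by_domain[domain] and len(chosen) < replication:
                chosen.append(by_domain[domain].pop(0))
                progressed = True
        if not progressed:
            # Fewer nodes than requested replicas: return what we have.
            break
    return chosen


if __name__ == "__main__":
    cluster = {"n1": "powerA", "n2": "powerA", "n3": "powerB", "n4": "powerB"}
    print(place_replicas(cluster, replication=3))
```

With two power sources and three replicas, the first two picks land on
different sources, so losing one feed still leaves a readable copy.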



-- 
OZAWA Tsuyoshi
