Hi Lohit,

There are basically three main options here:

1) Symlinks. As you suggested, one of the namespaces could hold top-level
cross-filesystem symlinks to the other namespaces in your cluster. The
downside is that symlinks are currently not well supported by the
FileSystem API, so you may run into serious issues using them with MR
applications.

2) Explicitly reference individual namespaces. This is basically a set of
separate HDFS clusters that share a pool of datanodes. If you are using
namespaces to isolate entirely separate applications, then each app would
just reference its own namenode, with no knowledge that the storage
underneath is pooled. Of course, you may run a job whose input and output
live on different namespaces, and that's completely fine.

3) Use viewfs (client-side mount tables). This is essentially a client-side
mapping of viewfs paths to the other namenodes.
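
To make the mount-table idea in option 3 concrete, here is a rough sketch
in plain Java of what "client-side mapping" means: the client holds a small
table from path prefixes to fully qualified namenode URIs and rewrites each
path before talking to HDFS. This is only an illustration, not Hadoop's
ViewFileSystem code. The class name, the hostnames (nn1.example.com,
nn2.example.com), the port, and the longest-prefix rule are all assumptions
made for the example. In a real cluster the links would come from
configuration properties of the form fs.viewfs.mounttable.<name>.link.<path>
rather than from code like this.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: mimics the idea of a client-side mount table.
// Not Hadoop's implementation; all names and hosts here are hypothetical.
public class MountTableSketch {
    // Mount point (path prefix) -> target URI on one specific namenode.
    private final Map<String, String> links = new TreeMap<>();

    public void addLink(String mountPoint, String target) {
        links.put(mountPoint, target);
    }

    // Rewrite a client-visible path into a fully qualified namenode URI,
    // choosing the longest mount point that matches (an assumption of
    // this sketch, chosen to keep the lookup unambiguous).
    public String resolve(String path) {
        String best = null;
        for (String mount : links.keySet()) {
            boolean matches = path.equals(mount) || path.startsWith(mount + "/");
            if (matches && (best == null || mount.length() > best.length())) {
                best = mount;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No mount point for " + path);
        }
        // Keep whatever follows the mount point and append it to the target.
        return links.get(best) + path.substring(best.length());
    }

    public static void main(String[] args) {
        MountTableSketch table = new MountTableSketch();
        table.addLink("/user", "hdfs://nn1.example.com:8020/user");
        table.addLink("/data", "hdfs://nn2.example.com:8020/data");
        System.out.println(table.resolve("/user/lohit/input"));
        System.out.println(table.resolve("/data/logs/2012-10-29"));
    }
}
```

The resolved strings are the same kind of fully qualified hdfs:// URIs that
a job would use directly under option 2; the mount table just hides them
behind a single client-side view.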

Hope that helps

-Todd

On Mon, Oct 29, 2012 at 4:14 PM, lohit <lohit.vijayar...@gmail.com> wrote:

> Hi Devs,
>
> I am trying to understand cluster setup with Federated NameNodes,
> specifically with YARN (or MR1) on top of it.
> From federation documentation (
>
> http://hadoop.apache.org/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html
> )
> I can see how each namenode will have its own namespace.
> Can somebody help me understand how YARN would work on this? If I have 2
> NameServices, how would YARN work with both of them? YARN or clients would
> look at fs.defaultFS (which points to one NameService) to resolve to DFS,
> right? Is the setup something like this: YARN and others connect to one
> nameservice (call it the top-level nameservice), and admins set up
> symlinks from the different nameservices to this top-level nameservice?
>
> Thanks,
> Lohit
>



-- 
Todd Lipcon
Software Engineer, Cloudera
