Hi Lohit,

There are basically three main options here:

1) Symlinks. As you suggested, one of the namespaces could hold top-level cross-filesystem symlinks pointing to the other namespaces in your cluster. The downside is that symlinks are currently not well supported by the FileSystem API, so you may run into serious issues using them with MR applications.

2) Explicitly reference individual namespaces. This is basically separate HDFS clusters that share a pool of datanodes. If you are using namespaces to isolate entirely separate applications, then each app would just reference its own namenode, with no knowledge that the storage underneath is pooled. Of course, you may run a job whose input and output are on different namesystems, and that's completely fine.

3) Use viewfs (client-side mount tables). This is essentially a client-side mapping of viewfs paths to the other namenodes.

Hope that helps
-Todd

On Mon, Oct 29, 2012 at 4:14 PM, lohit <lohit.vijayar...@gmail.com> wrote:
> Hi Devs,
>
> I am trying to understand cluster setup with Federated NameNodes and
> YARN (or MR1) on top of it specifically.
> From the federation documentation (
> http://hadoop.apache.org/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html
> ) I can see how each namenode will have its own namespace.
> Can somebody help me understand how YARN would work on this? If I have 2
> NameServices, how would YARN work with both of them? YARN or clients would
> look at fs.defaultFS (which points to one NameService) to resolve to DFS,
> right? Is the setup something like: YARN and others would connect to one
> nameservice (call it the top-level name service) and admins would set up
> symlinks from the different nameservices to this top-level name service?
>
> Thanks,
> Lohit
>

--
Todd Lipcon
Software Engineer, Cloudera
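
P.S. To make option 3 a bit more concrete, here is a rough sketch of what a client-side mount table does conceptually: each viewfs path is mapped to a namenode URI by its longest matching mount point. This is only an illustration in Python, not Hadoop's actual implementation; the `resolve` helper, the mount points, and the hostnames (`nn1.example.com`, `nn2.example.com`) are all made up for the example. In a real cluster the mapping lives in client configuration (keys of the form `fs.viewfs.mounttable.<name>.link.<path>`, as far as I recall; please check the ViewFs docs for your version).

```python
# Illustrative only: mimics how a client-side mount table maps viewfs paths
# to namenodes. Not Hadoop code; mount points and hostnames are hypothetical.

MOUNT_TABLE = {
    "/user": "hdfs://nn1.example.com:8020/user",
    "/data": "hdfs://nn2.example.com:8020/data",
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Return the namenode URI for `path` via its longest matching mount point."""
    best = None
    for mount_point in mount_table:
        # Match whole path components only, so /username does not match /user.
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise KeyError("no mount point for " + path)
    # Keep the remainder of the path below the mount point.
    return mount_table[best] + path[len(best):]
```

With a table like this, a job can read from `/user/...` and write to `/data/...` without knowing that the two paths are served by different namenodes; the client resolves each path on its own.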