Hey Eli,

In my experience, static, manual namespace partitioning can really get you 
in trouble - you have to keep things balanced by hand.

The following things can go wrong:

1) One of your pesky users grows unexpectedly by a factor of 10.
2) Your entire system grew so much that there's not enough excess capacity to 
split and balance the cluster into new pieces - the extra bandwidth required 
would drive down production performance too much (or you need downtime to do it 
and can't afford the downtime).
3) Your production system began as a proof of concept, and your file name 
system makes it hard to split in a sane manner because you never planned on 
splitting the proof of concept in the first place!
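
To put rough numbers on (1), here is a small Python sketch. The user names, 
partition names, and usage figures are all made up for illustration; the only 
point is that a static assignment has no way to react when one user grows.

```python
def partition_load(assignment, usage):
    """Sum per-user usage (arbitrary units) for each statically assigned partition."""
    load = {}
    for user, part in assignment.items():
        load[part] = load.get(part, 0) + usage[user]
    return load

# Hypothetical static layout: two users per file system, all the same size.
assignment = {"alice": "fs1", "bob": "fs1", "carol": "fs2", "dave": "fs2"}
usage = {"alice": 10, "bob": 10, "carol": 10, "dave": 10}

before = partition_load(assignment, usage)   # balanced: 20 units on each side

# One pesky user grows by a factor of 10; the assignment stays the same.
usage["dave"] *= 10
after = partition_load(assignment, usage)    # fs2 now carries far more than fs1
```

Getting back to balance means moving carol or dave by hand, which is exactly 
the kind of copy traffic that (2) says you may not be able to afford.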

Any one of these can be solved with enough effort, but it can take a huge 
amount of effort if you don't catch things soon enough!  In fact, I seem to 
remember an ACM Queue article in which the original Google authors cited 
explosive application growth as one reason that manual balancing quickly fell 
out of favor.

I wouldn't deny that symlinks are an incredible tool to fight namespace growth 
- but they're not a 100% solution.

That said, I'm looking forward to symlinks to solve a few local problems!

Brian

On Mar 1, 2010, at 8:15 PM, Eli Collins wrote:

> On Mon, Mar 1, 2010 at 5:42 PM, Ketan Dixit <ketan.di...@gmail.com> wrote:
>> Hello,
>> Thank you Konstantin and Allen for your replies. The information
>> provided really helped to improve my understanding.
>> However, I still have a few questions.
>> How are symlinks/soft links used to solve the problem of partitioning?
>> (Where do the symlinks point to? All the mapping is
>> stored in memory, but symlinks point to file objects? This is a little
>> confusing to me.)
>> Can you please provide insight into this?
> 
> The idea is to use symlinks to present a single namespace to clients
> that is backed by multiple file systems (hdfs or other supported
> hadoop file systems). Eg a "root" HDFS file system could contain links
> to other file systems, eg /dir1 could point to S3, /dir2 could point
> to a local file system, /dir3 could point to another HDFS file system,
> etc. Clients always contact the "root" HDFS file system but are
> transparently redirected to other file systems by symlinks. This way a
> single namespace is partitioned across multiple file systems, but the
> client only needs to know about the root file system. This
> partitioning is static (you have to establish the symlinks), though
> you can grow on the fly by adding file systems and links that point to
> them.
> 
> Thanks,
> Eli
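
For what it's worth, the redirection Eli describes above can be sketched in a 
few lines of Python. This is only an illustration under made-up assumptions: 
the link table, the host names, the bucket name, and the resolve() helper are 
all hypothetical, and this is not how the HDFS symlink code is actually written.

```python
# Hypothetical links stored on the "root" file system: path prefix -> target URI.
ROOT_LINKS = {
    "/dir1": "s3://example-bucket/data",   # made-up bucket
    "/dir2": "file:///mnt/local",          # made-up local mount
    "/dir3": "hdfs://other-nn:8020/",      # made-up second HDFS cluster
}

def resolve(path, links=ROOT_LINKS, root="hdfs://root-nn:8020"):
    """Return the URI a client ends up at for a path in the root namespace."""
    # Try the longest prefix first so nested links would take precedence.
    for prefix in sorted(links, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            rest = path[len(prefix):]
            return links[prefix].rstrip("/") + rest
    # No link matched: the path lives on the root file system itself.
    return root + path
```

The client only ever names paths under the root; growing "on the fly" amounts 
to adding one more entry to the table, which is also why the partitioning stays 
static: nothing rebalances the entries for you.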
