On 3/1/10 8:48 AM, "Ketan Dixit" <ketan.di...@gmail.com> wrote:
> How is LSH better than normal hashing? Either way, a client or a fixed
> namenode still has to decide which namenode to contact. It looks to me
> that if requests for files under the same subtree are directed to the
> same namenode, then performance will be better, since the requests to
> that namenode are clustered around one part of the namespace subtree
> (for example, the part the client is currently operating on). Is this
> assumption correct? Can I get more insight in this regard?
IIRC, the thought process was that this was a scalability feature, not
something being done for performance. There is a general reluctance among
the HDFS devs to store only the hot file metadata structures in memory. So
in order to keep the JVM's heap size from spiraling out of control, using
separate namespaces allows you to divide and conquer.
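
To make the divide-and-conquer point (and the subtree routing the question
assumes) concrete, here is a rough client-side sketch in plain Java -- not
actual HDFS code, and the mount entries and namenode hostnames are made up.
Every path under a given subtree resolves to the same namenode, so each
namenode only has to keep the metadata for its own slice of the tree in heap.

  import java.util.HashMap;
  import java.util.Map;

  /**
   * Toy client-side router: every path under a given namespace subtree is
   * sent to the same namenode, so each namenode only keeps its own slice
   * of the metadata in heap. Mount points and hostnames are made up.
   */
  public class SubtreeRouter {
      // namespace prefix -> URI of the namenode responsible for that subtree
      private final Map<String, String> mounts = new HashMap<String, String>();

      public SubtreeRouter() {
          mounts.put("/user",    "hdfs://nn1.example.com:8020");
          mounts.put("/logs",    "hdfs://nn2.example.com:8020");
          mounts.put("/archive", "hdfs://nn3.example.com:8020");
      }

      /** Longest-prefix match: /user/ketan/a and /user/ketan/b both hit nn1. */
      public String resolve(String path) {
          String prefix = path;
          while (!prefix.isEmpty()) {
              String nn = mounts.get(prefix);
              if (nn != null) {
                  return nn;
              }
              int slash = prefix.lastIndexOf('/');
              prefix = (slash <= 0) ? "" : prefix.substring(0, slash);
          }
          throw new IllegalArgumentException("no namenode mounted for " + path);
      }

      public static void main(String[] args) {
          SubtreeRouter r = new SubtreeRouter();
          System.out.println(r.resolve("/user/ketan/data/part-00000")); // nn1
          System.out.println(r.resolve("/logs/2010/03/01/app.log"));    // nn2
      }
  }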
With symlinks, this feature is essentially a solved problem. The 'who is
the decision maker' issue is now the client's to resolve. As an added
bonus, because it is URI based, the client may get pushed off to a
completely different service. [This is definitely a feature--just think,
you can store really hot files on the local file system, completely
bypassing the overhead that HDFS incurs.]
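
To illustrate the URI point: the scheme of the resolved target decides which
service the client ends up talking to. Below is a rough sketch against the
Hadoop FileSystem API (hostnames and paths are hypothetical, and the hdfs://
lookup obviously needs a reachable namenode to succeed): an hdfs:// target
goes through a namenode, while a file:// target is served straight off the
local filesystem with no HDFS in the path.

  import java.io.IOException;
  import java.net.URI;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;

  public class UriResolutionSketch {
      // FileSystem.get() picks the implementation from the URI scheme, so
      // the routing decision is made entirely on the client side.
      static FileSystem fsFor(String target, Configuration conf) throws IOException {
          return FileSystem.get(URI.create(target), conf);
      }

      public static void main(String[] args) throws IOException {
          Configuration conf = new Configuration();

          // Cold data: resolved through a namenode (needs a live cluster).
          FileSystem hdfs = fsFor("hdfs://nn1.example.com:8020/archive/big.dat", conf);

          // Really hot file: local filesystem, no namenode or datanode involved.
          FileSystem local = fsFor("file:///var/cache/hot.dat", conf);

          System.out.println(hdfs.getUri());  // hdfs://nn1.example.com:8020
          System.out.println(local.getUri()); // file:///
      }
  }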