02, 2010 10:21 AM
Subject: Re: Namespace partitioning using Locality Sensitive Hashing
> Symlinks are a brand new feature in HDFS.
> You can read about it in
> https://issues.apache.org/jira/browse/HDFS-245
> Documentation is here:
> https://issues.apache.org/jira/secure/attachment/1243
Folks,
I am still wondering why an LSH (Locality Sensitive Hashing)
based partitioning scheme would provide better scalability than a normal
cryptographic hash scheme. Is there a chance that LSH will offer
better performance than a normal hash?
Best,
Ketan
On Mon, Mar 1, 2010 at 9:15 PM, Eli Coll
Hey Brian,
Great points. Agree that federating a set of file systems via symlinks
doesn't solve the general problem of scaling a namespace.
I imagine GFS' "Name Spaces" were mostly useful for systems that grew w/o
much need for rebalancing, e.g. log storage.
Thanks,
Eli
On Mon, Mar 1, 2010 at 6:31 PM
Hey Eli,
From past experience, static, manual namespace partitioning can really get you
in trouble - you have to manually keep things balanced.
The following things can go wrong:
1) One of your pesky users grows unexpectedly by a factor of 10.
2) Your entire system grew so much that there's no
Symlinks are a brand new feature in HDFS.
You can read about it in
https://issues.apache.org/jira/browse/HDFS-245
Documentation is here:
https://issues.apache.org/jira/secure/attachment/12434745/design-doc-v4.txt
Symbolic links in HDFS can point to a directory in a different file system,
particula
On Mon, Mar 1, 2010 at 5:42 PM, Ketan Dixit wrote:
> Hello,
> Thank you Konstantin and Allen for your reply. The information
> provided really helped to improve my understanding.
> However, I still have a few questions.
> How are symlinks/soft links used to solve the problem of partitioning?
> (Wher
Hello,
Thank you Konstantin and Allen for your reply. The information
provided really helped to improve my understanding.
However, I still have a few questions.
How are symlinks/soft links used to solve the problem of partitioning?
(Where do the symlinks point to? All the mapping is
stored in memory
Hi Ketan,
AFAIU, hashing is used to map files and directories into different name-nodes.
Suppose you use a simple hash function on a file path h(path), and that files
with the same hash value (or within a hash range) are mapped to the same
name-node.
Then files with the same parent will be rando
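
To make the mapping above concrete, here is a minimal sketch, not how HDFS
actually does it. It assumes a hypothetical cluster of 4 name-nodes and
compares two placement rules: hashing the full path, and hashing only the
parent directory. The second rule is a simple locality-preserving stand-in
used for illustration; it is not a true LSH scheme. All function names and
the example paths are made up.

```python
import hashlib
import posixpath

NUM_NAMENODES = 4  # hypothetical cluster size, chosen only for illustration


def _bucket(key: str, n: int = NUM_NAMENODES) -> int:
    """Map a string to one of n name-node indices with a uniform hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def namenode_by_full_path(path: str, n: int = NUM_NAMENODES) -> int:
    """h(path): each file is placed independently of its siblings."""
    return _bucket(path, n)


def namenode_by_parent(path: str, n: int = NUM_NAMENODES) -> int:
    """h(parent(path)): all files in one directory share a name-node."""
    return _bucket(posixpath.dirname(path), n)


def namenodes_touched(paths, mapper):
    """Set of name-nodes a client must contact to cover all paths."""
    return {mapper(p) for p in paths}


# 100 sibling files in one (made-up) directory.
siblings = [f"/user/ketan/logs/part-{i:05d}" for i in range(100)]

print("full-path hash:", sorted(namenodes_touched(siblings, namenode_by_full_path)))
print("parent hash:   ", sorted(namenodes_touched(siblings, namenode_by_parent)))
```

With the full-path rule, listing one directory generally means asking
several name-nodes; with the parent rule it means asking exactly one, at
the cost of a large directory loading a single name-node.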
On 3/1/10 8:48 AM, "Ketan Dixit" wrote:
> How is LSH better than normal hashing? A client or a fixed
> namenode still has to decide which namenode to contact under any
> hashing scheme. It looks to me that requests to files under the same subtree are
> directed to the same namenode
Hi,
I am a graduate student in the Computer Science department at SUNY Stony Brook.
I am thinking of doing a project on Hadoop for my course "Cloud Computing"
conducted by Prof. Radu Sion.
While going through the links of the "Yahoo open source projects for
students" page I found the idea
"Research o