Re: Namespace partitioning using Locality Sensitive Hashing

2010-03-02 Thread springring
02, 2010 10:21 AM Subject: Re: Namespace partitioning using Locality Sensitive Hashing > Symlinks is a brand new feature in HDFS. > You can read about it in > https://issues.apache.org/jira/browse/HDFS-245 > Documentation is here: > https://issues.apache.org/jira/secure/attachment/1243

Re: Namespace partitioning using Locality Sensitive Hashing

2010-03-01 Thread Ketan Dixit
Folks, I am still wondering why an LSH (Locality Sensitive Hashing) based partitioning scheme would provide better scalability than a normal cryptographic hash scheme. Is there a chance that LSH will offer better performance than a normal one? Best, Ketan On Mon, Mar 1, 2010 at 9:15 PM, Eli Coll

Re: Namespace partitioning using Locality Sensitive Hashing

2010-03-01 Thread Eli Collins
Hey Brian, Great points. Agree that federating a set of file systems via symlinks doesn't solve the general problem of scaling a namespace. Imagine GFS' "Name Spaces" was mostly useful for systems that grew w/o much need for rebalancing, eg log storage. Thanks, Eli On Mon, Mar 1, 2010 at 6:31 PM

Re: Namespace partitioning using Locality Sensitive Hashing

2010-03-01 Thread Brian Bockelman
Hey Eli, From past experience, static, manual namespace partitioning can really get you in trouble - you have to manually keep things balanced. The following things can go wrong: 1) One of your pesky users grows unexpectedly by a factor of 10. 2) Your entire system grew so much that there's no

Re: Namespace partitioning using Locality Sensitive Hashing

2010-03-01 Thread Konstantin Shvachko
Symlinks is a brand new feature in HDFS. You can read about it in https://issues.apache.org/jira/browse/HDFS-245 Documentation is here: https://issues.apache.org/jira/secure/attachment/12434745/design-doc-v4.txt Symbolic links in HDFS can point to a directory in a different file system, particula
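A minimal sketch of the idea that a symlink in one namespace can redirect a path to a directory in another file system. This is an illustrative in-memory model only, not the HDFS API: the host names, the `NAMESPACES` table, and the `resolve` helper are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical symlink tables, one per name-node. Each entry maps a link
# path in that namespace to a fully qualified target in another namespace.
NAMESPACES = {
    "nn1.example.com": {"/data": "hdfs://nn2.example.com/warehouse"},
    "nn2.example.com": {},
}


def resolve(uri: str, max_hops: int = 8) -> str:
    """Follow symlinks across namespaces until no link prefix matches."""
    for _ in range(max_hops):
        parsed = urlparse(uri)
        links = NAMESPACES.get(parsed.netloc, {})
        # Pick the longest link path that is a prefix of the requested path.
        match = None
        for link in links:
            is_prefix = parsed.path == link or parsed.path.startswith(link.rstrip("/") + "/")
            if is_prefix and (match is None or len(link) > len(match)):
                match = link
        if match is None:
            return uri  # nothing left to follow
        # Replace the link prefix with its target, keep the remainder.
        uri = links[match] + parsed.path[len(match):]
    raise RuntimeError("too many levels of symbolic links")


print(resolve("hdfs://nn1.example.com/data/2010/03/part-0"))
```

In this model the client asks the first name-node, is handed a target in a second namespace, and retries there, so the mapping lives in the namespace itself rather than in a separate partitioning function.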

Re: Namespace partitioning using Locality Sensitive Hashing

2010-03-01 Thread Eli Collins
On Mon, Mar 1, 2010 at 5:42 PM, Ketan Dixit wrote: > Hello, > Thank you Konstantin and Allen for your reply. The information > provided really helped to improve my understanding. > However I still have a few questions. > How are symlinks / soft links used to solve the problem of partitioning? > (Wher

Re: Namespace partitioning using Locality Sensitive Hashing

2010-03-01 Thread Ketan Dixit
Hello, Thank you Konstantin and Allen for your reply. The information provided really helped to improve my understanding. However I still have a few questions. How are symlinks / soft links used to solve the problem of partitioning? (Where do the symlinks point to? All the mapping is stored in memory

Re: Namespace partitioning using Locality Sensitive Hashing

2010-03-01 Thread Konstantin Shvachko
Hi Ketan, AFAIU, hashing is used to map files and directories into different name-nodes. Suppose you use a simple hash function on a file path h(path), and that files with the same hash value (or within a hash range) are mapped to the same name-node. Then files with the same parent will be rando
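The contrast between hashing the full path and hashing a key that siblings share can be sketched as follows. Everything here is an assumption made for illustration: four name-nodes, MD5 as the "simple" hash, and the parent directory as the locality-preserving key. The second function is a stand-in for the locality idea, not a real LSH construction, and none of this is HDFS code.

```python
import hashlib
import posixpath

NUM_NAMENODES = 4  # hypothetical cluster size


def _bucket(key: str, n: int = NUM_NAMENODES) -> int:
    """Map a string key to one of n name-nodes with a uniform hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def by_full_path(path: str) -> int:
    """h(path): siblings get unrelated hash values, so they scatter."""
    return _bucket(path)


def by_parent(path: str) -> int:
    """Hash only the parent directory: all siblings share one name-node."""
    return _bucket(posixpath.dirname(path))


# One directory holding 100 files.
paths = [f"/user/ketan/logs/part-{i:05d}" for i in range(100)]

full = {by_full_path(p) for p in paths}
parent = {by_parent(p) for p in paths}

print("name-nodes touched, full-path hash:", len(full))
print("name-nodes touched, parent-dir hash:", len(parent))
```

With the full-path hash, listing one directory means contacting several name-nodes; with the parent-based key, a single name-node answers for the whole directory, at the cost of one large directory landing entirely on one node.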

Re: Namespace partitioning using Locality Sensitive Hashing

2010-03-01 Thread Allen Wittenauer
On 3/1/10 8:48 AM, "Ketan Dixit" wrote: > How is LSH better than normal hashing? A client or a fixed > namenode still has to decide which namenode to contact under any > hashing scheme. It looks to me that requests to files under the same subtree are > directed to the same namenode

Namespace partitioning using Locality Sensitive Hashing

2010-03-01 Thread Ketan Dixit
Hi, I am a graduate student in the Computer Science department at SUNY Stony Brook. I am thinking of doing a project on Hadoop for my course "Cloud Computing" conducted by Prof. Radu Sion. While going through the links of the "Yahoo open source projects for students" page I found the idea "Research o