Re: Hash keys

2011-04-21 Thread Alex Baranau
I needed to be sure it works for the pre-0.90 version *too*, because that is what one of our clusters uses (as do, I believe, most clusters in use by others). It works for newer versions too. The most recent jar is available for download in the project sources root, which one can just download and us

Re: Hash keys

2011-04-21 Thread Eric Charles
Hi Alex, Yep, I saw the "[ANN]: HBaseWD: Distribute Sequential Writes in HBase" thread. Tks for this :) - I will need some more time to test it (quite busy atm). Why did you declare 0.89.20100924 and not 0.91 (or 0.92) as the HBaseWD dependency? Tks, - Eric On 21/04/2011 15:56, Alex Baranau wro

Re: Hash keys

2011-04-21 Thread Alex Baranau
For those looking for a solution to this or a similar issue, this may be useful: take a look at the HBaseWD library (https://github.com/sematext/HBaseWD), which implements a solution close to what Lars described. There is also some info here: http://search-hadoop.com/m/AQ7CG2GkiO Alex Baranau Sematext
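A minimal sketch of the bucket-prefix idea, written in Ruby like the HBase shell snippets in this thread. It is not HBaseWD's code or API: the helper names `salt`/`unsalt`, the bucket count of 16 and the byte-sum bucket choice are assumptions made for illustration. It only shows that consecutive keys get different leading bytes, so sequential writes spread over several regions, while the original key stays recoverable.

```ruby
# Hypothetical helpers, not HBaseWD's API: prefix each key with one bucket byte.
BUCKETS = 16 # assumed bucket count; must stay <= 256 to fit in one byte

# Choose the bucket from the key's own bytes, so the same key always gets the same prefix.
def salt(key, buckets = BUCKETS)
  bucket = key.bytes.sum % buckets
  bucket.chr(Encoding::BINARY) + key.b
end

# Strip the one-byte prefix to get the original key back.
def unsalt(salted)
  salted.byteslice(1, salted.bytesize - 1)
end

%w[row0001 row0002 row0003].each do |k|
  puts "#{k} -> bucket #{salt(k).getbyte(0)}"
end
# row0001 -> bucket 9
# row0002 -> bucket 10
# row0003 -> bucket 11
```

Because the prefix is derived from the key itself, a single get can recompute it; a scan over a key range, however, has to cover every bucket.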

Re: Hash keys

2011-03-17 Thread Eric Charles
Hi Lars, Many tks for your reply. For now, I just rely on random or hashed keys and don't need any range queries. One day I will have to choose a good solution for ordered keys that I need to range-query. I will post the results of the different data models I will try (looking for other t

Re: Hash keys

2011-03-16 Thread Lars George
Hi Eric, Oops, you are right, my example was not clear and actually confused the keys with sequential ones. The hash should map every Nth row key to the same bucket, so that you would, for example, see an interleaved distribution of row keys across regions. Region 1 holds 1, 8, 15,... while region 2 ho
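The interleaving Lars describes can be shown with plain modulo arithmetic. N = 7 is an assumption read off the example (1, 8 and 15 share a bucket), and `bucket_for` is a hypothetical helper; in a real table the bucket number would become a row key prefix.

```ruby
# Hypothetical helper: bucket = key mod N, so every Nth key shares a bucket.
N = 7 # assumed from the example, where 1, 8 and 15 end up together

def bucket_for(key, n = N)
  key % n
end

# Group the keys 1..21 by bucket to show the interleaving.
distribution = (1..21).group_by { |k| bucket_for(k) }
distribution.sort.each do |bucket, keys|
  puts "bucket #{bucket}: #{keys.join(', ')}"
end
# bucket 1: 1, 8, 15
# bucket 2: 2, 9, 16  (and so on)
```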

Re: Hash keys

2011-03-16 Thread Eric Charles
...and the additional hashing probably doesn't help performance. Eric On 16/03/2011 19:17, Eric Charles wrote: A new laptop is definitely in my investment plan :) Tks, Eric On 16/03/2011 18:56, Harsh J wrote: On Wed, Mar 16, 2011 at 8:36 PM, Eric Charles wrote: Cool. Everything is alread

Re: Hash keys

2011-03-16 Thread Eric Charles
A new laptop is definitely in my investment plan :) Tks, Eric On 16/03/2011 18:56, Harsh J wrote: On Wed, Mar 16, 2011 at 8:36 PM, Eric Charles wrote: Cool. Everything is already available. Great! 1 row(s) in 0.0840 seconds 1 row(s) in 0.0420 seconds Interesting, how your test's get time i

Re: Hash keys

2011-03-16 Thread Harsh J
On Wed, Mar 16, 2011 at 8:36 PM, Eric Charles wrote: > Cool. > Everything is already available. Great! > 1 row(s) in 0.0840 seconds >> 1 row(s) in 0.0420 seconds Interesting how your test's get time is exactly double that of my test ;-) -- Harsh J http://harshj.com

Re: Hash keys

2011-03-16 Thread Eric Charles
Cool. Everything is already available. I simply have to import MD5Hash and use the to_java_bytes Ruby method.

hbase(main):001:0> import org.apache.hadoop.hbase.util.MD5Hash
=> Java::OrgApacheHadoopHbaseUtil::MD5Hash
hbase(main):002:0> put 'test', MD5Hash.getMD5AsHex('row1'.to_java_bytes), 'c
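For checking keys outside the shell, a plain-Ruby sketch using the standard `digest` library. It assumes `MD5Hash.getMD5AsHex` returns the MD5 digest as lowercase hex; `hashed_row_key` is a hypothetical name. Hex keys like these no longer sort in the order of the original keys, which is what rules out from/to range scans on them.

```ruby
require 'digest'

# Hypothetical helper: MD5 of the key as 32 lowercase hex characters
# (assumed to match what MD5Hash.getMD5AsHex produces in the shell).
def hashed_row_key(key)
  Digest::MD5.hexdigest(key)
end

puts hashed_row_key('row1')
# MD5 of the empty string is the well-known d41d8cd98f00b204e9800998ecf8427e
puts hashed_row_key('')
```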

Re: Hash keys

2011-03-16 Thread Harsh J
Using Java classes directly is possible from within the HBase shell (since it is JRuby), but yes, some Ruby knowledge helps too! For instance, I can use java.lang.String by simply importing it:

hbase(main):004:0> import java.lang.String
=> Java::JavaLang::String
hbase(main):004:0> get String

Re: Hash keys

2011-03-16 Thread Eric Charles
Hi Lars, Many tks for your explanations! About the DFR (sequential keys) vs. DFW (random keys) distinction, I can imagine different cases (just rephrasing what you said to be sure I get it):
- Keys are truly random (GUID or whatever): you get the distribution for free, still can't do, and probably d

Re: Hash keys

2011-03-16 Thread Lars George
Hi Eric, Socorro is Java and Python; I was just mentioning it as a possible source of inspiration :) You can learn Ruby and implement it (I hear it is easy... *cough*) or write the same thing as a small Java app and use it from the command line or so. And yes, you can range scan using a prefix. We wer
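Assuming the key layout is one bucket byte followed by the original key (an assumption, not Socorro's exact format), a range over original keys turns into one prefixed start/stop pair per bucket. `bucket_ranges` is a hypothetical helper; each pair would drive a separate scan.

```ruby
# Hypothetical helper, assuming keys are stored as <bucket byte><original key>.
# A scan over original keys [start_key, stop_key) becomes one range per bucket.
def bucket_ranges(start_key, stop_key, buckets)
  (0...buckets).map do |b|
    prefix = b.chr(Encoding::BINARY)
    [prefix + start_key.b, prefix + stop_key.b]
  end
end

ranges = bucket_ranges('row0100', 'row0200', 4)
ranges.each { |from, to| puts "#{from.inspect} ... #{to.inspect}" }
```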

Re: Hash keys

2011-03-16 Thread Eric Charles
Hi Lars, Are you talking about http://code.google.com/p/socorro/? I can find Python scripts, but no JRuby ones... Aside from the hash function I could reuse, are you saying that range queries are possible even with hashed (randomly distributed) keys? (If possible with the script, it will also be poss

Re: Hash keys

2011-03-16 Thread Eric Charles
Hi, I understand from your answer that it's possible but not available. Has anyone already implemented such functionality? If not, where should I begin looking (hirb.rb, any tutorial, ...?) - I know nothing about JRuby. Tks, - Eric On 16/03/2011 10:39, Harsh J wrote: (For 2) I think the h

Re: Hash keys

2011-03-16 Thread Lars George
Hi Eric, Mozilla Socorro uses an approach where they bucket ranges using leading hashes to distribute them across servers. When you want to do scans, you need to create N scans, where N is the number of hashes, and then call next() on each scanner, putting all KVs into one sorted list (use the KeyCo
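A sketch of the merge step: each per-bucket scan is modelled as an already-sorted array of salted keys, and the merge repeatedly takes the smallest original key among the scanners' current rows. Comparing the unprefixed key strings stands in for HBase's KeyComparator here, and `merge_scans`/`unsalt` are hypothetical helpers, not HBase APIs.

```ruby
# Hypothetical helper: strip the one-byte bucket prefix.
def unsalt(salted)
  salted.byteslice(1, salted.bytesize - 1)
end

# Merge per-bucket scan results (each already sorted within its bucket)
# into one list ordered by original key.
def merge_scans(scanners)
  positions = Array.new(scanners.size, 0) # next unread row per scanner
  merged = []
  loop do
    live = scanners.each_index.select { |i| positions[i] < scanners[i].size }
    break if live.empty?
    # Take the scanner whose current row has the smallest original key.
    i = live.min_by { |j| unsalt(scanners[j][positions[j]]) }
    merged << unsalt(scanners[i][positions[i]])
    positions[i] += 1
  end
  merged
end

scans = [
  ["\x00row1", "\x00row4"].map(&:b),
  ["\x01row2", "\x01row5"].map(&:b),
  ["\x02row3"].map(&:b)
]
p merge_scans(scans) # => ["row1", "row2", "row3", "row4", "row5"]
```

Picking the minimum this way costs O(N) per row; with many buckets, a priority queue over the scanners' current rows would be the usual choice.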

Re: Hash keys

2011-03-16 Thread Harsh J
(For 2) I think the hash function should work in the shell if it returns a string type (like what a '' literal defines in place). On Wed, Mar 16, 2011 at 2:22 PM, Eric Charles wrote: > Hi, > > To help avoid hotspots, I'm planning to use hashed keys in some tables. > > 1. I wonder if this strategy is advice

Re: Hash keys

2011-03-16 Thread Eric Charles
Oops, forget my first question about range queries (if keys are hashed, they cannot be queried by range...). I'm still curious to get info on a hash function in the shell (2.) and advice on MD5/Jenkins/SHA-1 (3.). Tks, Eric On 16/03/2011 09:52, Eric Charles wrote: Hi, To help avoid hotspots,
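One practical difference between the candidates is key size, since HBase stores the row key with every cell. A quick comparison using Ruby's standard `digest` library (Jenkins is not in the Ruby standard library, so it is left out of this sketch):

```ruby
require 'digest'

# Compare key sizes for two of the candidate hash functions.
key = 'row1'
{ 'MD5' => Digest::MD5, 'SHA-1' => Digest::SHA1 }.each do |name, algo|
  raw = algo.digest(key)    # binary digest
  hex = algo.hexdigest(key) # printable form, two hex chars per byte
  puts "#{name}: #{raw.bytesize} raw bytes, #{hex.size} hex chars"
end
# MD5: 16 raw bytes, 32 hex chars
# SHA-1: 20 raw bytes, 40 hex chars
```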

Hash keys

2011-03-16 Thread Eric Charles
Hi, To help avoid hotspots, I'm planning to use hashed keys in some tables.
1. I wonder if this strategy is advisable for the range query (from/to key) use case, because the rows will be randomly distributed across different regions. Will it cause some performance loss?
2. Is it possible to query fro