HBase Schema: how to sort rows by last update?

2011-04-19 Thread Bui Ngoc Son
Hi everybody, I am designing a two-tier comment system like Facebook's: the system includes main comments, and each main comment has a variable number of sub-comments. My schema is as follows: table comments family "data": "data:content" - content of main comment "data:uid" - u
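
HBase keeps rows sorted by the unsigned byte order of their row keys, so a common way to make the most recently updated entries come first is to embed a reversed timestamp (Long.MAX_VALUE minus the update time) in the key. The pure-Java sketch below only builds and compares such keys. The layout (post id, a 0x00 separator, the reversed timestamp, then the comment id) is a hypothetical example, not the poster's schema, and no HBase client calls are made.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ReverseTimestampKey {
    // Hypothetical layout: [postId][0x00][Long.MAX_VALUE - lastUpdateMillis][commentId].
    static byte[] rowKey(String postId, long lastUpdateMillis, String commentId) {
        if (lastUpdateMillis < 0) throw new IllegalArgumentException("timestamp must be non-negative");
        byte[] post = postId.getBytes(StandardCharsets.UTF_8);
        byte[] comment = commentId.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(post.length + 1 + Long.BYTES + comment.length);
        buf.put(post);
        buf.put((byte) 0); // separator so "post1" keys never interleave with "post10" keys
        // ByteBuffer writes big-endian, so a smaller number (a newer update) sorts first.
        buf.putLong(Long.MAX_VALUE - lastUpdateMillis);
        buf.put(comment);
        return buf.array();
    }

    // Unsigned lexicographic byte comparison, the order in which HBase sorts row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) return Integer.compare(x, y);
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] older = rowKey("post1", 1_000L, "c1");
        byte[] newer = rowKey("post1", 2_000L, "c2");
        System.out.println(compare(newer, older) < 0); // true: the newer comment sorts first
    }
}
```

Row keys can't be changed in place, so moving an entry to the top after an update means writing it under its new key and deleting the old row, or maintaining a separate index table keyed this way.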

Re: HBase and Lucene for realtime search

2011-04-19 Thread tsuna
On Sat, Feb 12, 2011 at 7:13 AM, Jason Rutherglen wrote: >> solr/katta/elasticsearch > > These don't have a distributed solution for realtime search [yet]. Sorry if this is a naive question but can you explain why you consider that ElasticSearch isn't a distributed solution for realtime search?

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-04-19 Thread Alex Baranau
Hi Ted, We currently use this tool in the scenario where data is consumed by MapReduce jobs, so we haven't tested the performance of pure "distributed scan" (i.e. N scans instead of 1) a lot. I expect it to be close to simple scan performance, or maybe sometimes even faster depending on your data
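
If the records from the N bucket scans must come back in their original key order, the per-bucket streams can be merged with a priority queue. This is a generic k-way merge sketch, not HBaseWD's code. It assumes each iterator already yields original (prefix-stripped) keys in ascending order, and it uses strings instead of byte arrays for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class MergedScan {
    // The current smallest unread value of one stream, plus the stream it came from.
    private static final class Head {
        final String value;
        final Iterator<String> source;

        Head(String value, Iterator<String> source) {
            this.value = value;
            this.source = source;
        }
    }

    // k-way merge: O(total * log k) for k sorted input streams.
    static List<String> merge(List<Iterator<String>> sortedStreams) {
        PriorityQueue<Head> heap = new PriorityQueue<>(Comparator.comparing((Head h) -> h.value));
        for (Iterator<String> it : sortedStreams) {
            if (it.hasNext()) heap.add(new Head(it.next(), it));
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            out.add(smallest.value);
            if (smallest.source.hasNext()) {
                heap.add(new Head(smallest.source.next(), smallest.source));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Iterator<String>> buckets = Arrays.asList(
                Arrays.asList("row-01", "row-04").iterator(),
                Arrays.asList("row-02", "row-05").iterator(),
                Arrays.asList("row-03").iterator());
        System.out.println(merge(buckets)); // [row-01, row-02, row-03, row-04, row-05]
    }
}
```

The merge holds only one buffered row per bucket, so its memory cost grows with the number of buckets rather than with the size of the scan.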

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-04-19 Thread Ted Yu
Interesting project, Alex. Since there're bucketsCount scanners compared to one scanner originally, have you performed load testing to see the impact ? Thanks On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau wrote: > Hello guys, > > I'd like to introduce a new small java project/lib around HBase:

Re: Possible dead lock

2011-04-19 Thread Jean-Daniel Cryans
I see what you are saying, and I understand the deadlock, but what escapes me is why ResourceBundle has to go touch all the classes every time to find the locale as I see 2 threads doing the same. Maybe my understanding of what it does is just poor, but I also see that you are using the yourkit pro

Re: Rs: Is it necessary to handle the "Zookeeper.ConnectionLossException" in ZKUtil.getDataAndWatch?

2011-04-19 Thread Jean-Daniel Cryans
If connection loss is followed by session expired, then you can't recover as the region server will be forced offline. In a small cluster, keep only 1 zookeeper on the master node/namenode, and leave the other nodes for regionserver/datanode. Heavy IO can give weird results when mixed with zookeep
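
J-D's distinction can be written as a retry policy: a connection loss may be retried, because the client can reconnect while the session is still valid, but a session expiry can't be retried away. The sketch below uses hypothetical ConnectionLoss and SessionExpired exception classes as stand-ins, not the real ZooKeeper KeeperException hierarchy. It also assumes the retried operation is idempotent (a read such as getDataAndWatch is), since after a connection loss the outcome of the last attempt is unknown.

```java
import java.util.concurrent.Callable;

final class ZkRetry {
    // Hypothetical stand-ins for ZooKeeper's connection-loss and session-expired errors.
    static final class ConnectionLoss extends Exception {}
    static final class SessionExpired extends Exception {}

    // Retries only ConnectionLoss, with linear backoff; everything else propagates at once.
    static <T> T withRetries(Callable<T> op, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        ConnectionLoss last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (ConnectionLoss e) {
                // Transient: the client library may reconnect while the session is still valid.
                last = e;
                if (attempt < maxAttempts) Thread.sleep(backoffMillis * attempt);
            }
            // SessionExpired is deliberately not caught: the session, its watches and its
            // ephemeral nodes are gone, so the caller has to rebuild state (or, for a
            // region server, go offline) instead of retrying.
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String value = withRetries(() -> {
            if (++calls[0] < 3) throw new ConnectionLoss();
            return "data";
        }, 5, 10L);
        System.out.println(value + " after " + calls[0] + " attempts"); // data after 3 attempts
    }
}
```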

Rs: Is it necessary to handle the "Zookeeper.ConnectionLossException" in ZKUtil.getDataAndWatch?

2011-04-19 Thread bijieshan
Thanks J-D. I have learned that there are several things that can lead to a ConnectionLossException, like full GC, heavy swapping, or IO waits. Especially regarding IO waits, do you have any good suggestions about the networking mode? In my current env, I put the Zookeep

Re: 0.90 latency performance, cdh3b4

2011-04-19 Thread Ted Dunning
For a tiny test like this, everything should be in memory and latency should be very low. On Tue, Apr 19, 2011 at 5:39 PM, Dmitriy Lyubimov wrote: > PS so what should latency be for reads in 0.90, assuming moderate thruput? > > On Tue, Apr 19, 2011 at 5:39 PM, Dmitriy Lyubimov wrote: >> for this

Re: 0.90 latency performance, cdh3b4

2011-04-19 Thread Dmitriy Lyubimov
also we had another cluster running previous CDH versions with pre-0.89 hbase and the latencies weren't nearly as bad. On Tue, Apr 19, 2011 at 5:39 PM, Dmitriy Lyubimov wrote: > PS so what should latency be for reads in 0.90, assuming moderate thruput? > > On Tue, Apr 19, 2011 at 5:39 PM, Dmit

Re: HBase - Map Reduce - Client Question

2011-04-19 Thread Bill Graham
We've been using pig to read bulk data from hdfs, transform it and load it into HBase using the HBaseStorage class, which has worked well for us. If you try it out you'll want to build from the 0.9.0 branch (being cut as we speak I believe) or the trunk. There's an open pig JIRA with a patch to dis

Re: 0.90 latency performance, cdh3b4

2011-04-19 Thread Dmitriy Lyubimov
PS so what should latency be for reads in 0.90, assuming moderate thruput? On Tue, Apr 19, 2011 at 5:39 PM, Dmitriy Lyubimov wrote: > for this test, there's just no more than 40 rows in every given table. > This is just a laugh check. > > so i think it's safe to assume it all goes to same region

Re: 0.90 latency performance, cdh3b4

2011-04-19 Thread Dmitriy Lyubimov
for this test, there's just no more than 40 rows in every given table. This is just a laugh check. so i think it's safe to assume it all goes to same region server. But latency would not depend on which server call is going to, would it? Only throughput would, assuming we are not overloading. An

Re: 0.90 latency performance, cdh3b4

2011-04-19 Thread Ted Dunning
How many regions? How are they distributed? Typically it is good to fill the table somewhat and then drive some splits and balance operations via the shell. One more split to make the regions be local and you should be good to go. Make sure you have enough keys in the table to support these sp

0.90 latency performance, cdh3b4

2011-04-19 Thread Dmitriy Lyubimov
Hi, I would like to see how I can attack hbase performance. Right now I am shooting scans returning between 3 and 40 rows and, regardless of data size, at approximately 400-500 QPS. The data tables are almost empty and in-memory, so they surely should fit in the 40% of heap dedicated to them. My loca
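
Numbers like 400-500 QPS can be turned into per-request latency with Little's law (requests in flight = throughput × latency). Assuming, for illustration only, that the load came from a single client thread issuing one synchronous scan at a time, 400-500 QPS corresponds to roughly 2-2.5 ms per request; with more client threads the implied latency is proportionally higher. The post doesn't state the client concurrency, so the thread counts below are assumptions and the helper just does the arithmetic.

```java
final class LittlesLaw {
    // Little's law: concurrency = throughput * latency.
    // Average latency (ms) implied by `concurrency` always-busy synchronous clients at a given QPS.
    static double impliedLatencyMillis(int concurrency, double queriesPerSecond) {
        if (concurrency < 1 || queriesPerSecond <= 0) throw new IllegalArgumentException("inputs must be positive");
        return 1000.0 * concurrency / queriesPerSecond;
    }

    // Throughput that `concurrency` synchronous clients can reach at a given average latency.
    static double achievableQps(int concurrency, double latencyMillis) {
        if (concurrency < 1 || latencyMillis <= 0) throw new IllegalArgumentException("inputs must be positive");
        return 1000.0 * concurrency / latencyMillis;
    }

    public static void main(String[] args) {
        System.out.println(impliedLatencyMillis(1, 500)); // 2.0
        System.out.println(impliedLatencyMillis(1, 400)); // 2.5
        System.out.println(achievableQps(8, 2.0));        // 4000.0
    }
}
```

With a fixed number of synchronous clients, QPS can only rise if latency falls, so a QPS figure from a single-threaded benchmark is effectively a latency measurement.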

HBase - Map Reduce - Client Question

2011-04-19 Thread Peter Haidinyak
Hidey Ho, I went to a talk last week on HBase Do's and Don'ts and discovered the Java client I used to populate my HBase tables is a "don't". I spent the weekend trying to come up with a better way to populate the table but couldn't, so I throw the question to the group. Conditions: Rec

Re: hlog async replay tool.

2011-04-19 Thread Stack
I put a script up in https://issues.apache.org/jira/browse/HBASE-3752. I did some basic testing. Try it out. If it works for you, add a comment to the issue. St.Ack On Mon, Apr 18, 2011 at 10:49 PM, Jack Levin wrote: > In some cases its important to bring hbase up after hdfs crash without > rec

Re: Region replication?

2011-04-19 Thread Jean-Daniel Cryans
That configuration is more like what 2357 would be used for. You wrote: "that you could route all requests for X to the place where X is when you don't want to have X cached" And it's for that case that I say you should not go through the nodes and talk directly to the RS. J-D On Tue, Apr 19, 2

Re: Region replication?

2011-04-19 Thread Otis Gospodnetic
To make Configuration 4 possible (last slide in http://www.edwardcapriolo.com/roller/edwardcapriolo/resource/memcache.odp ) -- Big Request Load, not so Big Data. Otis -- We're hiring HBase hackers for Data Mining and Analytics http://blog.sematext.com/2011/04/18/hiring-data-mining-analytics-ma

Re: Region replication?

2011-04-19 Thread Jean-Daniel Cryans
I don't know why you would want to serve from other region servers if all they did was transfer data; the current situation would be better. J-D On Tue, Apr 19, 2011 at 2:26 PM, Otis Gospodnetic wrote: > Thanks J-D! > > Yeah, what you describe below is also something that I think Edward poin

Re: Region replication?

2011-04-19 Thread Otis Gospodnetic
Thanks J-D! Yeah, what you describe below is also something that I think Edward pointed out in some of his slides - that you could route all requests for X to the place where X is when you don't want to have X cached (in app-level caches and/or OS-level caches) on multiple servers, but that som
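
"Route all requests for X to the place where X is" is essentially how HBase already serves reads: each region owns a contiguous row-key range, and the client looks up the single server holding the range that contains the key. A toy version of that lookup, assuming a TreeMap from range start key to a hypothetical server name and using Java String ordering instead of HBase's unsigned byte ordering:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class KeyRouter {
    // Start key of each range -> the (hypothetical) server that owns [start, next start).
    private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

    KeyRouter(Map<String, String> rangeStarts) {
        // Like the first region of an HBase table, the first range must start at the empty key.
        if (!rangeStarts.containsKey("")) throw new IllegalArgumentException("missing range starting at \"\"");
        startKeyToServer.putAll(rangeStarts);
    }

    // Exactly one server answers for any key, so in this model the key is cached in one place only.
    String serverFor(String rowKey) {
        return startKeyToServer.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        Map<String, String> ranges = new HashMap<>();
        ranges.put("", "rs1");
        ranges.put("g", "rs2");
        ranges.put("p", "rs3");
        KeyRouter router = new KeyRouter(ranges);
        System.out.println(router.serverFor("hello")); // rs2
    }
}
```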

Re: Region replication?

2011-04-19 Thread Edward Capriolo
On Tue, Apr 19, 2011 at 4:09 PM, Ted Dunning wrote: > This is kind of true. > > There is only one regionserver to handle the reads, but there are > multiple copies of the data to handle fail-over. > > On Tue, Apr 19, 2011 at 12:33 PM, Otis Gospodnetic > wrote: >> My question has to do with one of

Re: Region replication?

2011-04-19 Thread Jean-Daniel Cryans
We have something on the menu: https://issues.apache.org/jira/browse/HBASE-2357 Coprocessors: Add read-only region replicas (slaves) for availability and fast region recovery Something to keep in mind is that you have to cache the data for each replica, so a row could be in 3 different caches (whi

Re: Region replication?

2011-04-19 Thread Ted Dunning
This is kind of true. There is only one regionserver to handle the reads, but there are multiple copies of the data to handle fail-over. On Tue, Apr 19, 2011 at 12:33 PM, Otis Gospodnetic wrote: > My question has to do with one of the good comments from Edward Capriolo, who > pointed out that  s

Re: apache hbase 0.90.2 vs CDH3 hbase0.90.1+15.18

2011-04-19 Thread Todd Lipcon
On Tue, Apr 12, 2011 at 11:01 AM, Stack wrote: > On Tue, Apr 12, 2011 at 7:28 AM, 茅旭峰 wrote: > > Hi, > > > > I've noticed that Cloudera has announced the CDH3 release, but the apache > > hbase 0.90.2 is also just released. > > > All should upgrade to the CDH3 release. It includes hdfs-1520, > h

Region replication?

2011-04-19 Thread Otis Gospodnetic
Hi, I imagine lots of HBase folks have read or will want to read http://blog.milford.io/2011/04/why-i-am-very-excited-about-datastaxs-brisk/ , including comments. My question has to do with one of the good comments from Edward Capriolo, who pointed out that some of the Configurations he descr

Latency related configs for 0.90

2011-04-19 Thread George P. Stathis
Hi all, In this chapter of our 0.89 to 0.90 migration saga, we are seeing what we suspect might be latency related artifacts. The setting: - Our EC2 dev environment running our CI builds - CDH3 U0 (both hadoop and hbase) setup in pseudo-clustered mode We have several unit tests that have

Re: hbase 0.90.2 - incredibly slow response

2011-04-19 Thread Venkatesh
I was hoping that too. I don't have scripts to generate the same number of requests from the shell; I will try that. I didn't pre-create regions in 0.20.6 and it handled the same load fine. I'll test performance in 0.90.2 with pre-created regions. Would sharing a single HBaseConfiguration object for all threads hur
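
Pre-creating regions means choosing split keys up front. A minimal sketch, assuming the row keys start with a byte that is roughly uniformly distributed (for example a hash or salt prefix), divides the first-byte space evenly. It does not help raw sequential keys, which would keep hitting a single region regardless of the splits.

```java
final class SplitKeys {
    // Returns numRegions - 1 one-byte boundary keys that split the first-byte space
    // [0x00, 0xFF] into numRegions roughly equal ranges.
    static byte[][] evenOneBytePrefixSplits(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be between 2 and 256");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Integer arithmetic keeps boundaries strictly increasing for numRegions <= 256.
            splits[i - 1] = new byte[] {(byte) (i * 256 / numRegions)};
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] split : evenOneBytePrefixSplits(4)) {
            System.out.printf("0x%02X%n", split[0] & 0xff); // 0x40, 0x80, 0xC0
        }
    }
}
```

Each region then covers the keys whose first byte lies between two consecutive boundaries, so with a uniform prefix every region starts with about the same share of writes.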

Re: Repeating log message in a [custom] unit test

2011-04-19 Thread Jean-Daniel Cryans
Some more digging, the reason it stays stuck is that the DaughterOpener thread uses the region server's CatalogTracker which has a default timeout of Integer.MAX_VALUE and it was stuck in this code: while(!stopped && !metaAvailable.get() && (timeout == 0 || System.currentTimeMillis
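
The quoted loop keeps waiting until its deadline passes, so with a timeout of Integer.MAX_VALUE (almost 25 days, if interpreted as milliseconds) it effectively never gives up. A sketch of the same kind of wait with a finite timeout and an explicit result, using plain wait/notifyAll on a hypothetical AvailabilityLatch class rather than HBase's CatalogTracker:

```java
final class AvailabilityLatch {
    private boolean available;
    private boolean stopped;

    synchronized void markAvailable() {
        available = true;
        notifyAll();
    }

    synchronized void stop() {
        stopped = true;
        notifyAll();
    }

    // Returns true once available; false if stopped before that or if the timeout elapsed.
    synchronized boolean await(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!available && !stopped) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) return false; // give up instead of blocking forever
            wait(remaining); // releases the lock; woken by notifyAll() or the timeout
        }
        return available;
    }

    public static void main(String[] args) throws InterruptedException {
        AvailabilityLatch meta = new AvailabilityLatch();
        System.out.println(meta.await(100)); // false: nobody marked it available
        new Thread(meta::markAvailable).start();
        System.out.println(meta.await(5_000)); // true once the other thread runs
    }
}
```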

Re: HBase is not ready for prime time

2011-04-19 Thread tsuna
On Tue, Apr 12, 2011 at 9:14 AM, Robert Gonzalez wrote: > Seems that Hbase is just too flaky to depend on for a serious system, we've > not had this type of problem to this degree with conventional DB systems. I'm sorry to hear that you ran into those issues. While I agree that running and opera

Re: HBase Performance

2011-04-19 Thread tsuna
On Wed, Apr 6, 2011 at 2:39 PM, Jean-Daniel Cryans wrote: > Look for how Facebook is using HBase for messages. Also look for how > we have been using HBase at StumbleUpon for 2 years now and for both > live and batch queries. Numbers are usually included in the decks. In addition to this, one of

Re: Repeating log message in a [custom] unit test

2011-04-19 Thread Jean-Daniel Cryans
So you have your special lucene region that's opened on some region server and when the master starts shutting down, it doesn't seem to see it because while closing regions it says: 2011-04-18 21:35:09,221 INFO [IPC Server handler 4 on 32141] master.ServerManager(283): Only catalog regions remain

Re: hbase 0.90.2 - incredibly slow response

2011-04-19 Thread Stack
0.90.2 should be faster. Running same query from shell, it gives you same lag? St.Ack On Tue, Apr 19, 2011 at 10:35 AM, Venkatesh wrote: > > > Just upgraded to 0.90.2 from 0.20.6..Doing a simple put to  table (< 100 > bytes per put).. > Only code change was to retrofit the HTable API to work w

hbase 0.90.2 - incredibly slow response

2011-04-19 Thread Venkatesh
Just upgraded to 0.90.2 from 0.20.6. Doing a simple put to a table (< 100 bytes per put). The only code change was to retrofit the HTable API to work with 0.90.2: initializing HBaseConfiguration in servlet.init(), reusing that config for the HTable constructor, and doing the put. Performance is very slow
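
With autoFlush on, each put goes to the region server in its own round trip, so small puts are dominated by network latency. Batching them on the client amortizes that cost; this is the idea behind HTable's write buffer (setAutoFlush(false) followed by flushCommits()). The sketch below models only the batching logic, with a hypothetical sink callback standing in for the RPC that would send one batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class WriteBuffer {
    private final long flushThresholdBytes;
    private final Consumer<List<byte[]>> sink; // hypothetical: ships one batch per call
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes;

    WriteBuffer(long flushThresholdBytes, Consumer<List<byte[]>> sink) {
        if (flushThresholdBytes <= 0) throw new IllegalArgumentException("threshold must be positive");
        this.flushThresholdBytes = flushThresholdBytes;
        this.sink = sink;
    }

    // Buffers one serialized write; sends the whole batch once the threshold is reached.
    void add(byte[] serializedPut) {
        pending.add(serializedPut);
        pendingBytes += serializedPut.length;
        if (pendingBytes >= flushThresholdBytes) flush();
    }

    // Sends whatever is buffered; call this before shutdown so nothing is left behind.
    void flush() {
        if (pending.isEmpty()) return;
        sink.accept(new ArrayList<>(pending));
        pending.clear();
        pendingBytes = 0;
    }

    public static void main(String[] args) {
        WriteBuffer buffer = new WriteBuffer(1_000, batch -> System.out.println("sending " + batch.size() + " puts"));
        for (int i = 0; i < 25; i++) buffer.add(new byte[100]); // ~100-byte puts, as in the post
        buffer.flush();
    }
}
```

This sketch is not thread-safe, so in a servlet each thread would need its own instance (or a lock around it), and anything still buffered is lost if the process dies before flush().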

Re: HBase 0.90.2 CDH3B4 -Compression algorithm 'lzo' previously failed test

2011-04-19 Thread Stack
What was the issue (so the rest of us can learn from your experience?). Thanks Vadim, St.Ack On Tue, Apr 19, 2011 at 10:20 AM, Vadim Keylis wrote: > Thanks so much. Figure the problem that caused lzo not to work. > > Thanks again. > > On Tue, Apr 19, 2011 at 9:50 AM, Vadim Keylis wrote: >> >> Go

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-04-19 Thread Stack
On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau wrote: > Hello guys, > And girls! Thanks for making this addition Alex (and posting the list). Good stuff, St.Ack

[ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-04-19 Thread Alex Baranau
Hello guys, I'd like to introduce a new small Java project/lib around HBase: HBaseWD. It aims to help distribute the load (across regionservers) when writing records that are sequential because of the nature of their row keys. It implements the solution which was discussed several times on this maili
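
The underlying trick is to prefix each sequential key with a small bucket id so that consecutive keys land in different key ranges, and then run one scan per bucket when reading. A minimal sketch of that idea, assuming a one-byte prefix chosen by hashing the original key (HBaseWD's own distributor classes and API are not reproduced here):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SaltedKeys {
    // Prefixes the key with one bucket byte in [0, bucketsCount) derived from its hash.
    static byte[] salt(byte[] originalKey, int bucketsCount) {
        checkBuckets(bucketsCount);
        int bucket = Math.floorMod(Arrays.hashCode(originalKey), bucketsCount);
        return prefixed((byte) bucket, originalKey);
    }

    // Recovers the original key by dropping the bucket byte.
    static byte[] unsalt(byte[] saltedKey) {
        return Arrays.copyOfRange(saltedKey, 1, saltedKey.length);
    }

    // One [startRow, stopRow) pair per bucket: the single original range becomes N scans.
    // Assumes a non-empty stopRow; an open-ended scan would need the next prefix as its stop key.
    static byte[][][] scanRanges(byte[] startRow, byte[] stopRow, int bucketsCount) {
        checkBuckets(bucketsCount);
        byte[][][] ranges = new byte[bucketsCount][][];
        for (int b = 0; b < bucketsCount; b++) {
            ranges[b] = new byte[][] {prefixed((byte) b, startRow), prefixed((byte) b, stopRow)};
        }
        return ranges;
    }

    private static void checkBuckets(int bucketsCount) {
        if (bucketsCount < 1 || bucketsCount > 256) throw new IllegalArgumentException("bucketsCount must be 1..256");
    }

    private static byte[] prefixed(byte prefix, byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = prefix;
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            byte[] key = ("event-" + i).getBytes(StandardCharsets.UTF_8);
            System.out.println("event-" + i + " -> bucket " + (salt(key, 4)[0] & 0xff));
        }
    }
}
```

A hash-based prefix lets a reader recompute the bucket for a point lookup; a prefix assigned some other way (round-robin, say) would force a get to try every bucket.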

Re: HBase 0.90.2 CDH3B4 -Compression algorithm 'lzo' previously failed test

2011-04-19 Thread Vadim Keylis
Thanks so much. Figured out the problem that caused lzo not to work. Thanks again. On Tue, Apr 19, 2011 at 9:50 AM, Vadim Keylis wrote: > Good morning. No it seems test was successful. Native libraries located in /home/hbase/hadoop-lzo-0.4.10/lib/native/Linux-amd64-64/ that were compiled after

Re: HBase 0.90.2 CDH3B4 -Compression algorithm 'lzo' previously failed test

2011-04-19 Thread Vadim Keylis
Good morning. No, it seems the test was successful. The native libraries are located in /home/hbase/hadoop-lzo-0.4.10/lib/native/Linux-amd64-64/ and were compiled after building the project. [hbase@dhbasetest01 shell]$ ../hbase/bin/hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://dhbasetest01.tag-dev.com:

Re: A question about Hmaster startup.

2011-04-19 Thread Stack
Mind making an issue and a patch? We can apply it for 0.90.3 which should be out soon. Thank you Gaojinchao. St.Ack 2011/4/19 Gaojinchao : > I think it need fix. Because Hmaster can't start up when DN is up. > > Can It recover the code ? > > Hmaster logs. > > 2011-04-19 16:49:09,208 DEBUG > org

Re: Do u know more details about facebook use hbase

2011-04-19 Thread Stack
I don't know the details but I believe they had a good idea of the key space since versions of the applications now running on hbase were migrated from elsewhere. In conversations, they've said that they have disabled splitting and run splits manually "on Tuesdays" from which I understand, someone

Re: A question about Hmaster startup.

2011-04-19 Thread Gaojinchao
I think it needs a fix, because HMaster can't start up when the DN is up. Can the code be restored? HMaster logs: 2011-04-19 16:49:09,208 DEBUG org.apache.hadoop.hbase.master.ActiveMasterManager: A master is now available 2011-04-19 16:49:09,400 WARN org.apache.hadoop.hbase.util.FSUtils: Version file

re: A question about Hmaster startup.

2011-04-19 Thread Gaojinchao
It reproduces when HMaster is started for the first time and the NN is started without starting the DN. So it may be nothing. HBase version 0.90.1: public static void waitOnSafeMode(final Configuration conf, final long wait) throws IOException { FileSystem fs = FileSystem.get(conf);