Hi everybody,
I am designing a two-tier comment system like Facebook's: the system
includes main comments, and each main comment has a varying number of
sub-comments. My schema is as follows:
table comments
family "data":
"data:content" - content of main comment
"data:uid" - u
On Sat, Feb 12, 2011 at 7:13 AM, Jason Rutherglen
wrote:
>> solr/katta/elasticsearch
>
> These don't have a distributed solution for realtime search [yet].
Sorry if this is a naive question but can you explain why you consider
that ElasticSearch isn't a distributed solution for realtime search?
Hi Ted,
We currently use this tool in the scenario where data is consumed by
MapReduce jobs, so we haven't tested the performance of pure "distributed
scan" (i.e. N scans instead of 1) a lot. I expect it to be close to simple
scan performance, or maybe sometimes even faster depending on your data
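For anyone curious what the "N scans instead of 1" pattern looks like on the
client side, here is a rough sketch (an illustration only, not HBaseWD's actual
API). It assumes an existing HTable named table, the original unprefixed
start/stop keys, a single-byte bucket prefix on every row key, and the usual
java.util and org.apache.hadoop.hbase.client/util imports.

  int bucketsCount = 8;
  List<ResultScanner> scanners = new ArrayList<ResultScanner>();
  for (int b = 0; b < bucketsCount; b++) {
    byte[] prefix = new byte[] { (byte) b };
    // the same logical range, restricted to one bucket's slice of the key space
    Scan scan = new Scan(Bytes.add(prefix, origStartRow),
        Bytes.add(prefix, origStopRow));
    scanners.add(table.getScanner(scan));
  }
  // a client-side merge over the bucket scanners then restores overall ordering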
Interesting project, Alex.
Since there are bucketsCount scanners compared to the original single
scanner, have you performed load testing to see the impact?
Thanks
On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau wrote:
> Hello guys,
>
> I'd like to introduce a new small java project/lib around HBase:
I see what you are saying, and I understand the deadlock, but what
escapes me is why ResourceBundle has to go touch all the classes every
time to find the locale, as I see two threads doing the same. Maybe my
understanding of what it does is just poor, but I also see that you
are using the yourkit pro
If connection loss is followed by session expired, then you can't
recover as the region server will be forced offline.
In a small cluster, keep only 1 zookeeper on the master node/namenode,
and leave the other nodes for regionserver/datanode. Heavy IO can give
weird results when mixed with ZooKeeper.
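As a concrete illustration of the above, a client pointed at that single
ZooKeeper would carry something like the sketch below (the hostname
"master-node" and the table name are placeholders; the same quorum setting also
belongs in hbase-site.xml on the cluster side):

  Configuration conf = HBaseConfiguration.create();
  conf.set("hbase.zookeeper.quorum", "master-node");        // the master/namenode host
  conf.set("hbase.zookeeper.property.clientPort", "2181");  // default ZK client port
  HTable table = new HTable(conf, "mytable");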
Thanks J-D.
I have learned that several things can lead to
ConnectionLossException, such as full GC, heavy swapping, or IO waits.
Regarding the IO waits in particular, do you have any good suggestions
about the networking mode? In my current env, I put the Zookeep
For a tiny test like this, everything should be in memory and latency
should be very low.
On Tue, Apr 19, 2011 at 5:39 PM, Dmitriy Lyubimov wrote:
> PS so what should latency be for reads in 0.90, assuming moderate thruput?
>
> On Tue, Apr 19, 2011 at 5:39 PM, Dmitriy Lyubimov wrote:
>> for this
also we had another cluster running previous CDH versions with
pre-0.89 hbase and the latencies weren't nearly as bad.
On Tue, Apr 19, 2011 at 5:39 PM, Dmitriy Lyubimov wrote:
> PS so what should latency be for reads in 0.90, assuming moderate thruput?
>
> On Tue, Apr 19, 2011 at 5:39 PM, Dmit
We've been using pig to read bulk data from hdfs, transform it and
load it into HBase using the HBaseStorage class, which has worked well
for us. If you try it out you'll want to build from the 0.9.0 branch
(being cut as we speak, I believe) or the trunk. There's an open pig
JIRA with a patch to dis
PS so what should latency be for reads in 0.90, assuming moderate thruput?
On Tue, Apr 19, 2011 at 5:39 PM, Dmitriy Lyubimov wrote:
> for this test, there's just no more than 40 rows in every given table.
> This is just a laugh check.
>
> so i think it's safe to assume it all goes to same region
for this test, there are no more than 40 rows in any given table.
This is just a laugh check.
so I think it's safe to assume it all goes to the same region server.
But latency would not depend on which server the call is going to, would
it? Only throughput would, assuming we are not overloading.
An
How many regions? How are they distributed?
Typically it is good to fill the table somewhat and then drive some
splits and balance operations via the shell. One more split to make
the regions local and you should be good to go. Make sure you have
enough keys in the table to support these splits
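For those who prefer to pre-create the regions from Java rather than drive
splits afterwards, a hedged sketch, assuming your 0.90.x client has the
createTable(desc, splitKeys) overload; the table name, family and split points
below are made up:

  Configuration conf = HBaseConfiguration.create();
  HBaseAdmin admin = new HBaseAdmin(conf);
  HTableDescriptor desc = new HTableDescriptor("mytable");
  desc.addFamily(new HColumnDescriptor("data"));
  byte[][] splits = new byte[][] {
      Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("t") };
  admin.createTable(desc, splits);  // four regions, split at g, n and t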
Hi,
I would like to see how I can attack HBase performance.
Right now I am shooting scans returning between 3 and 40 rows and,
regardless of data size, getting approximately 400-500 QPS. The data
tables are almost empty and in-memory, so they surely should fit in the
40% of heap dedicated to them.
My loca
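For context, the scan pattern being measured is roughly the sketch below (an
existing HTable named table and the start/stop keys are assumed). With 3-40 row
results, a scanner caching value that covers the whole result keeps each scan
to a single round trip, so per-RPC overhead rather than data size dominates the
latency.

  Scan scan = new Scan(Bytes.toBytes("start"), Bytes.toBytes("stop"));
  scan.setCaching(100);       // fetch the whole small result in one RPC
  scan.setCacheBlocks(true);  // keep these hot blocks in the block cache
  ResultScanner rs = table.getScanner(scan);
  try {
    for (Result r : rs) {
      // process the row
    }
  } finally {
    rs.close();
  }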
Hidey Ho,
I went to a talk last week on HBase Do's and Don'ts and discovered the
Java client I used to populate my HBase tables is a "don't". I spent the
weekend trying to come up with a better way to populate the table but couldn't,
so I throw the question to the group.
Conditions:
Rec
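The slides aren't quoted here, so this may or may not be the exact "don't" from
the talk, but a common fix in that era for a slow populating client was to turn
off per-Put autoflush and batch writes through the client-side write buffer. A
sketch, with Record/records, the table and the column names standing in as
placeholders:

  Configuration conf = HBaseConfiguration.create();
  HTable table = new HTable(conf, "mytable");
  table.setAutoFlush(false);                   // don't send one RPC per Put
  table.setWriteBufferSize(12 * 1024 * 1024);  // ~12 MB client-side buffer
  for (Record rec : records) {
    Put p = new Put(Bytes.toBytes(rec.key));
    p.add(Bytes.toBytes("data"), Bytes.toBytes("content"),
        Bytes.toBytes(rec.value));
    table.put(p);                              // buffered until the buffer fills
  }
  table.flushCommits();                        // push whatever is still buffered
  table.close();

For really large loads the other usual suggestion was to generate HFiles with a
MapReduce job and bulk load them, but the buffered-put change alone is often
enough.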
I put a script up in https://issues.apache.org/jira/browse/HBASE-3752.
I did some basic testing. Try it out. If it works for you, add a
comment to the issue.
St.Ack
On Mon, Apr 18, 2011 at 10:49 PM, Jack Levin wrote:
> In some cases its important to bring hbase up after hdfs crash without
> rec
That configuration is more like what 2357 would be used for.
You wrote: "that you could route all requests for X to the place
where X is when you don't want to have X cached"
And it's for that case that I say you should not go through the nodes
and talk directly to the RS.
J-D
On Tue, Apr 19, 2
To make Configuration 4 possible (last slide in
http://www.edwardcapriolo.com/roller/edwardcapriolo/resource/memcache.odp ) --
Big Request Load, not so Big Data.
Otis
--
We're hiring HBase hackers for Data Mining and Analytics
http://blog.sematext.com/2011/04/18/hiring-data-mining-analytics-ma
I don't know why you would want to serve from other region servers if
all they did was transfer data; the current situation would be
better.
J-D
On Tue, Apr 19, 2011 at 2:26 PM, Otis Gospodnetic
wrote:
> Thanks J-D!
>
> Yeah, what you describe below is also something that I think Edward poin
Thanks J-D!
Yeah, what you describe below is also something that I think Edward pointed out
in some of his slides - that you could route all requests for X to the place
where X is when you don't want to have X cached (in app-level caches and/or
OS-level caches) on multiple servers, but that som
On Tue, Apr 19, 2011 at 4:09 PM, Ted Dunning wrote:
> This is kind of true.
>
> There is only one regionserver to handle the reads, but there are
> multiple copies of the data to handle fail-over.
>
> On Tue, Apr 19, 2011 at 12:33 PM, Otis Gospodnetic
> wrote:
>> My question has to do with one of
We have something on the menu:
https://issues.apache.org/jira/browse/HBASE-2357 Coprocessors: Add
read-only region replicas (slaves) for availability and fast region
recovery
Something to keep in mind is that you have to cache the data for each
replica, so a row could be in 3 different caches (whi
This is kind of true.
There is only one regionserver to handle the reads, but there are
multiple copies of the data to handle fail-over.
On Tue, Apr 19, 2011 at 12:33 PM, Otis Gospodnetic
wrote:
> My question has to do with one of the good comments from Edward Capriolo, who
> pointed out that s
On Tue, Apr 12, 2011 at 11:01 AM, Stack wrote:
> On Tue, Apr 12, 2011 at 7:28 AM, 茅旭峰 wrote:
> > Hi,
> >
> > I've noticed that Cloudera has announced the CDH3 release, but the apache
> > hbase 0.90.2 is also just released.
>
>
> All should upgrade to the CDH3 release. It includes hdfs-1520,
> h
Hi,
I imagine lots of HBase folks have read or will want to read
http://blog.milford.io/2011/04/why-i-am-very-excited-about-datastaxs-brisk/ ,
including comments.
My question has to do with one of the good comments from Edward Capriolo, who
pointed out that some of the Configurations he descr
Hi all,
In this chapter of our 0.89 to 0.90 migration saga, we are seeing what we
suspect might be latency related artifacts.
The setting:
- Our EC2 dev environment running our CI builds
- CDH3 U0 (both hadoop and hbase) setup in pseudo-clustered mode
We have several unit tests that have
I was hoping that too..
I don't have scripts to generate # requests from shell..I will try that..
I didn't pre-create regions in 0.20.6 & it handled the same load fine..
I'll try performance in 0.90.2 by precreating regions..
Would sharing a single HBaseConfiguration object for all threads hur
Some more digging, the reason it stays stuck is that the
DaughterOpener thread uses the region server's CatalogTracker which
has a default timeout of Integer.MAX_VALUE and it was stuck in this
code:
while(!stopped && !metaAvailable.get() &&
(timeout == 0 || System.currentTimeMillis
On Tue, Apr 12, 2011 at 9:14 AM, Robert Gonzalez
wrote:
> Seems that Hbase is just too flaky to depend on for a serious system, we've
> not had this type of problem to this degree with conventional DB systems.
I'm sorry to hear that you ran into those issues. While I agree that
running and opera
On Wed, Apr 6, 2011 at 2:39 PM, Jean-Daniel Cryans wrote:
> Look for how Facebook is using HBase for messages. Also look for how
> we have been using HBase at StumbleUpon for 2 years now and for both
> live and batch queries. Numbers are usually included in the decks.
In addition to this, one of
So you have your special lucene region that's opened on some region
server and when the master starts shutting down, it doesn't seem to
see it because while closing regions it says:
2011-04-18 21:35:09,221 INFO [IPC Server handler 4 on 32141]
master.ServerManager(283): Only catalog regions remain
0.90.2 should be faster.
Running the same query from the shell, do you get the same lag?
St.Ack
On Tue, Apr 19, 2011 at 10:35 AM, Venkatesh wrote:
>
>
> Just upgraded to 0.90.2 from 0.20.6..Doing a simple put to table (< 100
> bytes per put)..
> Only code change was to retrofit the HTable API to work w
Just upgraded to 0.90.2 from 0.20.6..Doing a simple put to table (< 100 bytes
per put)..
Only code change was to retrofit the HTable API to work with 0.90.2
Initializing HBaseConfiguration in servlet.init()...& reusing that config for
HTable constructor & doing put
Performance is very slow
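Whether or not it is the cause of the slowness here, HTable is not thread-safe
and constructing a new one per request is costly in 0.90, so the usual advice
was to share a single Configuration and hand out tables from an HTablePool. A
hedged sketch of that pattern in a servlet, with the table/column names,
request parameters and pool size as placeholders:

  import java.io.IOException;
  import javax.servlet.ServletException;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTableInterface;
  import org.apache.hadoop.hbase.client.HTablePool;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PutServlet extends HttpServlet {
    private Configuration conf;
    private HTablePool pool;

    public void init() throws ServletException {
      conf = HBaseConfiguration.create();  // created once, shared by all threads
      pool = new HTablePool(conf, 10);     // pool size is arbitrary here
    }

    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {
      HTableInterface table = pool.getTable("mytable");
      try {
        Put p = new Put(Bytes.toBytes(req.getParameter("row")));
        p.add(Bytes.toBytes("data"), Bytes.toBytes("content"),
            Bytes.toBytes(req.getParameter("value")));
        table.put(p);
      } finally {
        pool.putTable(table);              // return the table to the pool
      }
    }
  }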
What was the issue (so the rest of us can learn from your experience)?
Thanks Vadim,
St.Ack
On Tue, Apr 19, 2011 at 10:20 AM, Vadim Keylis wrote:
> Thanks so much. Figure the problem that caused lzo not to work.
>
> Thanks again.
>
> On Tue, Apr 19, 2011 at 9:50 AM, Vadim Keylis wrote:
>>
>> Go
On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau wrote:
> Hello guys,
>
And girls!
Thanks for making this addition Alex (and posting the list).
Good stuff,
St.Ack
Hello guys,
I'd like to introduce a new small Java project/lib around HBase: HBaseWD. It
is aimed at helping distribute the load (across regionservers) when
writing records with sequential row keys. It implements
the solution which was discussed several times on this mailing list.
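For readers who haven't seen those earlier threads, the underlying idea is to
salt a monotonically increasing key with a small bucket value so that writes
spread over all regions instead of always hitting the one hosting the newest
key; reads then fan out into one scan per bucket prefix. A hand-rolled sketch
of the technique (an illustration only, not HBaseWD's API; an existing HTable
named table is assumed, and nextSequentialId() stands in for whatever produces
the naturally sequential key):

  int bucketsCount = 8;
  long seqId = nextSequentialId();
  byte bucket = (byte) (seqId % bucketsCount);           // deterministic salt
  byte[] row = Bytes.add(new byte[] { bucket }, Bytes.toBytes(seqId));
  Put put = new Put(row);
  put.add(Bytes.toBytes("data"), Bytes.toBytes("value"), Bytes.toBytes("..."));
  table.put(put);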
Thanks so much. Figured out the problem that caused lzo not to work.
Thanks again.
On Tue, Apr 19, 2011 at 9:50 AM, Vadim Keylis wrote:
> Good morning. No it seems test was successful. Native libraries located in
> /home/
> hbase/hadoop-lzo-0.4.10/lib/native/Linux-amd64-64/ that were compiled after
Good morning. No, it seems the test was successful. The native libraries are
located in /home/hbase/hadoop-lzo-0.4.10/lib/native/Linux-amd64-64/ and were
compiled after building the project.
[hbase@dhbasetest01 shell]$ ../hbase/bin/hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://dhbasetest01.tag-dev.com:
Mind making an issue and a patch? We can apply it for 0.90.3 which
should be out soon. Thank you Gaojinchao.
St.Ack
2011/4/19 Gaojinchao :
> I think it need fix. Because Hmaster can't start up when DN is up.
>
> Can It recover the code ?
>
> Hmaster logs.
>
> 2011-04-19 16:49:09,208 DEBUG
> org
I don't know the details but I believe they had a good idea of the key
space since versions of the applications now running on hbase were
migrated from elsewhere.
In conversations, they've said that they have disabled splitting and
run splits manually "on Tuesdays" from which I understand, someone
I think it needs a fix, because HMaster can't start up until the DN is up.
Can it recover the code?
Hmaster logs.
2011-04-19 16:49:09,208 DEBUG
org.apache.hadoop.hbase.master.ActiveMasterManager: A master is now available
2011-04-19 16:49:09,400 WARN org.apache.hadoop.hbase.util.FSUtils: Version file
It reproduces when HMaster is started for the first time and NN is started
without starting DN.
So, it may be nothing.
Hbase version 0.90.1 :
public static void waitOnSafeMode(final Configuration conf,
    final long wait)
    throws IOException {
  FileSystem fs = FileSystem.get(conf);