Did you pre-split your table, or did you let the balancer assign regions to
regionservers for you?
Did your regionserver(s) fail ?
On Thu, Aug 2, 2012 at 8:31 AM, Bryan Keller wrote:
> I have an 8 node cluster and a table that is pretty well balanced with on
> average 36 regions/node. When I run a
I have an 8 node cluster and a table that is pretty well balanced with on
average 36 regions/node. When I run a mapreduce job on the cluster against this
table, the data locality of the mappers is poor, e.g. 100 rack-local mappers and
only 188 data local mappers. I would expect nearly all of the
I have a table on a 4 node test cluster. I also have some other tables on the
cluster. The table in question has a total of 12 regions. I noticed that 1 node
has 6 regions, another has zero, and the remaining two nodes have the expected
3 regions. I'm a little confused how this can happen.
The
As I said, I have not used this myself... So take this with a grain of salt :)
I imagine the advantage would be no additional servers/processes that would
need to be monitored and managed, as well as a (slight) reduction in overall
resource consumption.
On the downside any resource leak in the
Lars,
Thanks for the pointer, it's indeed an interesting approach. Two follow-up
questions:
1. The author states "Rather than a separate process, it can be *advantageous*
in some situations for each RegionServer to embed their own ThriftServer".
Do you happen to have insights on what are those
+1. Anyway, all Mutations extend OperationsWithAttributes as well.
Regards
Ram
> -----Original Message-----
> From: Anoop Sam John [mailto:anoo...@huawei.com]
> Sent: Thursday, August 02, 2012 10:13 AM
> To: user@hbase.apache.org
> Subject: RE: Retrieve Put timestamp
>
> Currently in Append there i
Currently in Append there is a setter to specify whether to return the result
or not. Could we use a similar approach for Put? Only in specific use cases
would the returned TS be needed.
Maybe, in a generic way, we could return the attributes of the Mutation? Then
anything which the client needs back can be
There is a little-documented feature that Jonathan Gray added a while back:
Running a thrift server as a thread as part of each region server.
This is enabled by setting hbase.regionserver.export.thrift to true in your
configuration.
While I have not personally tried it, it looks like a fairly
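For reference, that setting would go in hbase-site.xml; a minimal fragment
(assuming the property name exactly as given above):

```xml
<property>
  <name>hbase.regionserver.export.thrift</name>
  <value>true</value>
</property>
```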
The Filter is initialized per Region as part of a RegionScannerImpl.
So as long as all the rows you are interested in are co-located in the same region
you can keep that state in the Filter instance.
You can use a custom RegionSplitPolicy to control (to some extent at least) how
the rows are coloc
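The idea of keeping state in a per-region Filter instance can be sketched in
plain Java (no HBase dependency; class and method names here are made up for
illustration, not the actual Filter API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a stateful row predicate, like a custom HBase Filter
// subclass that remembers the previous row while scanning one region.
public class StatefulRowFilter {
    private Integer previous = null;  // state carried across rows of one region

    // Keep a row only if its value is strictly greater than the previous row's.
    boolean keep(int rowValue) {
        boolean keep = (previous == null) || rowValue > previous;
        previous = rowValue;
        return keep;
    }

    // One filter instance per region, mirroring how the Filter is
    // initialized per Region as part of a RegionScannerImpl.
    public static List<Integer> scanRegion(List<Integer> regionRows) {
        StatefulRowFilter filter = new StatefulRowFilter();
        List<Integer> kept = new ArrayList<>();
        for (int row : regionRows) {
            if (filter.keep(row)) kept.add(row);
        }
        return kept;
    }
}
```

As Lars notes, this only works while all the rows involved stay in the same
region; a split or server crash resets the state.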
On Wed, Aug 1, 2012 at 12:52 PM, Mohammad Tariq wrote:
> Hello Mohit,
>
> If replication factor is set to some value > 1, then the data is
> still present on some other node(perhaps within the same rack or a
> different one). And, as far as this post is concerned it tells us
> about Write Ah
Hi Lars,
I understand that it is more difficult to carry state across regions/servers,
but how about within a single region? Knowing that the rows in a single region
have dependencies, can we have a filter with state? If a filter doesn't provide
this ability, is there another mechanism in HBase to offer thi
The issue here is that different rows can be located in different regions or
even different region servers, so no local state will carry over all rows.
- Original Message -
From: Jerry Lam
To: "user@hbase.apache.org"
Cc: "user@hbase.apache.org"
Sent: Wednesday, August 1, 2012 5:48 PM
Hi St.Ack:
Schema cannot be changed to a single row.
The API describes "Do not rely on filters carrying state across rows; its not
reliable in current hbase as we have no handlers in place for when regions
split, close or server crashes." If we manage region splitting ourselves, so
the split is
In case anyone is interested in hbase and disaster recovery, here is a
writeup I just posted:
http://bruteforcedata.blogspot.com/2012/08/hbase-disaster-recovery-and-whisky.html
Feedback appreciated.
Thanks,
Paul
On Wed, Aug 1, 2012 at 10:44 PM, Jerry Lam wrote:
> Hi HBase guru:
>
> From Lars George talk, he mentions that filter has no state. What if I need
> to scan rows in which the decision to filter one row or not is based on the
> previous row's column values? Any idea how one can implement this type
On Wed, Aug 1, 2012 at 7:12 PM, Wei Tan wrote:
> We have a similar requirement and here is the solution in our mind:
> add a coprocessor, in prePut() get the current ms and set it to put ---
> the current implementation get the current ms and set it in put()
> return the ms generated to prePut() t
Hi,
We have a Linux instance for HBase. Now I am trying to connect to HBase
using Java. When I tried a simple program it connects to HBase and does
operations like creating tables and getting table rows. But when I used the
same code in my application, deployed as an EAR file, it's not
Thanks Suraj. I looked at the code, but it looks like the logic is not
self-contained, particularly in the way HBase searches for a specific
version using a TimeRange.
Best Regards,
Jerry
On Mon, Jul 30, 2012 at 12:53 PM, Suraj Varma wrote:
> You may need to setup your Eclipse workspace
Hello Mohit,
If replication factor is set to some value > 1, then the data is
still present on some other node(perhaps within the same rack or a
different one). And, as far as this post is concerned it tells us
about Write Ahead Logs, i.e data that is still not written onto the
disk. This is
We have a similar requirement, and here is the solution we have in mind:
add a coprocessor; in prePut(), get the current ms and set it on the Put ---
the current implementation gets the current ms and sets it in put() ---
and return the generated ms from prePut() to the client. For now put() does not
return any value. we
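The pattern proposed above (the server side assigns the ms timestamp and the
caller gets that value back, instead of put() returning nothing) can be
sketched in plain Java; this is a conceptual illustration, not the HBase
coprocessor API, and all names are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of the proposal: the "server" hook picks the current
// ms timestamp on write and hands it back to the caller.
public class TimestampedStore {
    private final Map<String, Long> timestamps = new HashMap<>();

    // Mimics what a prePut() coprocessor would do: assign the server-side
    // timestamp, then return it to the client rather than returning void.
    public long put(String row, byte[] value) {
        long ts = System.currentTimeMillis();  // what prePut() would assign
        timestamps.put(row, ts);
        return ts;                             // returned to the caller
    }

    public Long getTimestamp(String row) {
        return timestamps.get(row);
    }
}
```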
On Wed, Aug 1, 2012 at 9:29 AM, lars hofhansl wrote:
> "sync" is a fluffy term in HDFS. HDFS has hsync and hflush.
> hflush forces all current changes at a DFSClient to all replica nodes (but
> not to disk).
>
> Until HDFS-744 hsync would be identical to hflush. After HDFS-744 hsync
> can be used
There is no HBase API for this.
However, this could be useful in some scenarios, so maybe we could add an API
for this.
It's not entirely trivial, though.
From: Pablo Musa
To: "user@hbase.apache.org"
Sent: Monday, July 30, 2012 3:13 PM
Subject: Retrieve Put timestam
"sync" is a fluffy term in HDFS. HDFS has hsync and hflush.
hflush forces all current changes at a DFSClient to all replica nodes (but not
to disk).
Until HDFS-744 hsync would be identical to hflush. After HDFS-744 hsync can be
used to force data to disk at the replicas.
When HBase refers to "
I believe you are talking about enabling the dfs.support.append feature? I
benchmarked the difference (disabled/enabled) previously and I didn't find
much difference. It would be great if someone else could confirm this.
Best Regards,
Jerry
On Wednesday, August 1, 2012, Alex Baranau wrote:
> I belie
These questions were raised many times in this ML and in other sources
(blogs, etc.). You can find them with a little effort.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Wed, Aug 1, 2012 at 1:33 AM, Mohammad Tariq wrote:
> Hello Mohit,
>
> Is there a way to execute multiple scans in parallel like get?
I guess the Q is: can we (and does it make sense to) execute multiple scans
in parallel, e.g. in multiple threads inside the client? The answer is yes,
you can do it and it makes sense: HBase is likely to be able to process
much more
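The client-side parallel-scan idea can be sketched in plain Java, with a stub
standing in for the actual HBase range scan (names and the partitioning scheme
are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScans {
    // Stub standing in for one HBase range scan over [start, stop).
    static List<Integer> scanRange(int start, int stop) {
        List<Integer> rows = new ArrayList<>();
        for (int r = start; r < stop; r++) rows.add(r);
        return rows;
    }

    // Split the key space into chunks and scan each chunk in its own thread,
    // the way one would issue several Scan objects from client threads.
    public static List<Integer> parallelScan(int start, int stop, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (stop - start + threads - 1) / threads;
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (int s = start; s < stop; s += chunk) {
                final int lo = s, hi = Math.min(s + chunk, stop);
                futures.add(pool.submit(() -> scanRange(lo, hi)));
            }
            List<Integer> all = new ArrayList<>();
            for (Future<List<Integer>> f : futures) all.addAll(f.get());
            return all;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Results come back in key order here because the futures are collected in
submission order; with real Scans you would pick split points from the
table's region boundaries.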
I believe that this is *not the default*, but the *current* implementation of
sync(). I.e. (please correct me if I'm wrong) the n-way write approach is not
available yet.
You might be confusing it with the fact that, by default, sync() is called on
every edit. You can change that by using "deferred log flushing".
Actually, with coprocessors you can create a secondary index in short order.
Then your cost is going to be 2 fetches. Trying to do a partial table scan will
be more expensive.
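The two-fetch pattern being described (index lookup, then data lookup, instead
of a partial table scan) can be sketched with plain maps standing in for the
two tables; this is an illustration only, and all names are invented:

```java
import java.util.HashMap;
import java.util.Map;

// Two-fetch secondary-index pattern: fetch 1 resolves the indexed value to a
// row key in the index table, fetch 2 reads the row from the data table.
public class SecondaryIndex {
    private final Map<String, String> dataTable = new HashMap<>();   // rowKey -> record
    private final Map<String, String> indexTable = new HashMap<>();  // email  -> rowKey

    // What a coprocessor hook on the write path would maintain:
    // every data-table write also updates the index table.
    public void put(String rowKey, String email, String record) {
        dataTable.put(rowKey, record);
        indexTable.put(email, rowKey);
    }

    // Fetch 1: index lookup; fetch 2: data lookup.
    public String getByEmail(String email) {
        String rowKey = indexTable.get(email);
        return rowKey == null ? null : dataTable.get(rowKey);
    }
}
```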
On Jul 31, 2012, at 12:41 PM, Matt Corgan wrote:
> When deciding between a table scan vs secondary index, you should try
Thanks Matt & Jerry for your replies.
The data for each row is small (a few hundred bytes).
So I will try the parallel table scan first, as you suggested...
Before organizing that myself, wouldn't it be a better idea to create a
MapReduce job for that?
I'm not so keen on implementing seco
Running the thrift server on the client is better: you get to cut out one
network hop.
On Tue, Jul 31, 2012 at 2:22 PM, Stack wrote:
> On Tue, Jul 31, 2012 at 12:32 PM, Eric wrote:
> > I'm currently running thrift on all region server nodes. The reasoning is
> > that you can run jobs on this clu
Hi,
If the row for your key is not present, then get() will return an empty
Result (a Result with no KeyValues in it), so you should call
result.isEmpty() first.
Igal.
On Wed, Aug 1, 2012 at 3:20 AM, Mohit Anchlia wrote:
> Not sure how but I am getting one null row per 9 writes when I do a GET in
>