Minor addition to what Lars G said.
In trunk, the load balancer is able to utilize block location information when
it chooses the region server that receives a region.
See the following in RegionLocationFinder:
* Returns an ordered list of hosts that are hosting the blocks for this
region. The weight o
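For illustration only (this is not the actual RegionLocationFinder code), here is a rough sketch of how per-host block weights could be computed from HDFS for the store files of a region using the plain FileSystem API; the class and method names are made up, and a real implementation would also walk the column-family subdirectories:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StoreFileBlockWeights {
  // Tallies, per host, how many HDFS block bytes of the store files under
  // storeDir live on that host. Sorting the entries by weight (descending)
  // gives the "ordered list of hosts" idea: the heaviest host is the best
  // candidate to serve the region locally.
  public static Map<String, Long> weightsByHost(FileSystem fs, Path storeDir)
      throws IOException {
    Map<String, Long> weights = new HashMap<String, Long>();
    for (FileStatus file : fs.listStatus(storeDir)) {
      if (file.isDir()) continue;                       // only plain store files
      BlockLocation[] blocks = fs.getFileBlockLocations(file, 0, file.getLen());
      for (BlockLocation block : blocks) {
        for (String host : block.getHosts()) {
          Long soFar = weights.get(host);
          weights.put(host, (soFar == null ? 0L : soFar) + block.getLength());
        }
      }
    }
    return weights;
  }
}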
As I mentioned earlier, prepareDeleteTimestamps() performs one get
operation per column qualifier:
get.addColumn(family, qual);
List<KeyValue> result = get(get, false);
This is too costly in your case.
I think you can group some configurable number of qualifiers in each get
and perform c
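As a client-side sketch of that grouping idea (the real change would live inside prepareDeleteTimestamps(); fetchInBatches and batchSize are made-up names for illustration), something along these lines groups the qualifiers into Gets of a configurable size and fetches them in one batched call:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;

public class BatchedQualifierGets {
  // Instead of one Get (and one round trip) per qualifier, group batchSize
  // qualifiers into each Get and send them all in one batched call.
  public static Result[] fetchInBatches(HTableInterface table, byte[] row,
      byte[] family, List<byte[]> qualifiers, int batchSize) throws IOException {
    List<Get> gets = new ArrayList<Get>();
    Get current = null;
    for (int i = 0; i < qualifiers.size(); i++) {
      if (i % batchSize == 0) {              // start a new Get every batchSize qualifiers
        current = new Get(row);
        gets.add(current);
      }
      current.addColumn(family, qualifiers.get(i));
    }
    return table.get(gets);                  // one batched call instead of N single gets
  }
}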
Hi Lars,
Thanks a lot for your reply.
As you said, a region server writes its HFiles so that all data blocks are
located on the same physical machine, unless the region server fails.
I ran the following hadoop command to see the location of an HFile:
hadoop fsck
/hbase/testtable/9488ef7fbd23b62b9bf85b7
> Do your 100s of thousands cell deletes overlap (in terms of column family)
> across rows ?
Our schema contains only one column family per table. So, each Delete contains
cells from a single column family. I hope this answers your question.
Ted T:
Do your 100s of thousands cell deletes overlap (in terms of column family)
across rows ?
In HRegionServer:
public MultiResponse multi(MultiAction multi) throws IOException {
  ...
  for (Action a : actionsForRegion) {
    action = a.getAction();
    ...
    if (action instanceof
Looking at the stack trace, I found the following hot spot:
1. org.apache.hadoop.hbase.regionserver.StoreFileScanner.realSeekDone(StoreFileScanner.java:340)
2. org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:331)
3. org.apache.hadoop.hbase.regions
First off, J-D, thanks for helping me work through this. You've
inspired some different angles and I think I've finally made it bleed in
a controlled way.
> - That data you are deleting needs to be read when you scan, like I
> said earlier a delete is in fact an insert in HBase and this isn't
> c
What you are describing here seems very different from what was shown earlier.
In any case, a few remarks:
- You have major compactions running during the time of that log
trace, which usually sucks up a lot of IO; see
http://hbase.apache.org/book.html#managed.compactions (and the sketch below)
- That data you are deletin
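If the major compactions turn out to be the culprit, a minimal sketch of the managed-compaction approach from that book section is to set hbase.hregion.majorcompaction to 0 in hbase-site.xml and kick compactions off yourself during off-peak hours, e.g. (the table name is just an example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ManualMajorCompaction {
  public static void main(String[] args) throws Exception {
    // With hbase.hregion.majorcompaction=0 in hbase-site.xml, the periodic
    // major compactions stop and you schedule this yourself off-peak.
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      admin.majorCompact("testtable");   // table name is just an example
    } finally {
      admin.close();
    }
  }
}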
> Like Stack said in his reply, have you thread dumped the slow region
> servers when this happens?
I've been having difficulty reproducing this behavior in a controlled
manner. While I haven't been able to get my client to hang up while
doing deletes, I have found a query that when issued after a
Yeah, I've overlooked the versions issue.
What I usually recommend is that if the timestamp is part of your data
model, it should be in the row key, a qualifier or a value. Since you
seem to rely on the timestamp for querying, it should definitely be
part of the row key but not at the beginning lik
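A minimal sketch of that kind of key, assuming a leading entity id (the id and method names here are just placeholders), puts a reversed timestamp after the id so that scans for an entity return newest-first while writes stay spread across regions:

import org.apache.hadoop.hbase.util.Bytes;

public class CompositeRowKey {
  // Row key layout: <entityId><Long.MAX_VALUE - timestamp>
  // Leading with the id avoids piling all writes onto one region (the usual
  // monotonically-increasing-key hotspot), while the reversed timestamp sorts
  // the newest cells first within each entity.
  public static byte[] rowKey(String entityId, long timestampMillis) {
    byte[] idPart = Bytes.toBytes(entityId);
    byte[] tsPart = Bytes.toBytes(Long.MAX_VALUE - timestampMillis);
    return Bytes.add(idPart, tsPart);
  }
}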
I'd also remove the DN and RS from the node running ZK, NN, etc. as you
don't want heavyweight processes on that node.
- Dave
On Wed, Jun 20, 2012 at 9:31 AM, Elliott Clark wrote:
> Basically without metrics on what's going on it's tough to know for sure.
>
> I would turn on GC logging and make s
Basically without metrics on what's going on it's tough to know for sure.
I would turn on GC logging and make sure that is not playing a part, get
metrics on IO while this is going on, and look through the logs to see what
is happening when you notice the pause.
On Wed, Jun 20, 2012 at 6:39 AM, M
Hi,
Ok...
Just a couple of nits...
1) Please don't write your Mapper and Reducer classes as inner classes.
I don't know who started this ... maybe it's easier as example code, but it
really makes it harder to learn M/R code. (Also harder to teach, but that's
another story... ;-) See the standalone sketch below.
2) Looking a
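Back to point 1: as a minimal sketch (the class name and output types are just examples), a Mapper in its own top-level file rather than nested inside the job class could look like:

// StandaloneRowCounterMapper.java -- a top-level class, not an inner class.
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class StandaloneRowCounterMapper extends TableMapper<Text, LongWritable> {

  private static final LongWritable ONE = new LongWritable(1);

  @Override
  protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
      throws IOException, InterruptedException {
    // One count per scanned row; a (likewise top-level) reducer sums them up.
    context.write(new Text("rows"), ONE);
  }
}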
Hi
I'm doing some evaluations with HBase. The workload I'm facing is mainly
insert-only.
Currently I'm inserting 1KB rows, where 100 bytes go into one column.
I have the following cluster machines at my disposal:
Intel Xeon L5520 2.26 GHz (Nehalem, with HT enabled)
24 GiB Memory
1 GigE
2x 15k RPM Sa
Well, I changed my previous solution a bit; it works, but it is very slow!!!
I think it is because I pass a SINGLE Delete object and not a LIST of Deletes.
Is it possible to pass a List of Deletes through map instead of a single Delete?
import org.apache.commons.cli.*;
import org.apache.hadoop
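Yes — assuming your mapper gets the rows to delete from a table scan (the class name, the "target.table" configuration key and the batch size below are made up for this sketch), you can buffer Deletes and hand them to HTable.delete(List<Delete>) in batches instead of one at a time:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;

public class BatchedDeleteMapper extends TableMapper<NullWritable, NullWritable> {

  private static final int BATCH_SIZE = 1000;   // example batch size

  private HTable targetTable;
  private List<Delete> buffer = new ArrayList<Delete>();

  @Override
  protected void setup(Context context) throws IOException {
    // "target.table" is a made-up configuration key for this sketch.
    targetTable = new HTable(context.getConfiguration(),
        context.getConfiguration().get("target.table"));
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result result, Context context)
      throws IOException, InterruptedException {
    buffer.add(new Delete(row.get()));
    if (buffer.size() >= BATCH_SIZE) {
      targetTable.delete(buffer);   // one batched call instead of one RPC per Delete
      buffer.clear();
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    if (!buffer.isEmpty()) {
      targetTable.delete(buffer);   // flush the final partial batch
    }
    targetTable.close();
  }
}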
Hi,
The simple way to do this as a map/reduce is the following:
Use the HTable input and scan the records you want to delete.
Inside Mapper.setup(), create a connection to the HTable where you want to
delete the records.
Inside Mapper.map(), for each iteration you will get a row which match
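A rough sketch of the driver side of that recipe, reusing the BatchedDeleteMapper sketched above (the table names are assumptions, and your selection criteria would go onto the Scan or a Filter on it):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class DeleteJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set("target.table", "testtable");       // read by the mapper's setup()

    Job job = new Job(conf, "scan-and-delete");
    job.setJarByClass(DeleteJobDriver.class);

    Scan scan = new Scan();
    scan.setCaching(500);                        // fewer scanner RPCs
    scan.setCacheBlocks(false);                  // don't churn the block cache from MR
    // Selection criteria would go here, e.g. scan.setStartRow()/setFilter().

    TableMapReduceUtil.initTableMapperJob("testtable", scan,
        BatchedDeleteMapper.class, NullWritable.class, NullWritable.class, job);

    job.setNumReduceTasks(0);                    // map-only: deletes happen in the mapper
    job.setOutputFormatClass(NullOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}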
Hi
Has anyone tried the possibility of an Endpoint implementation with which
the delete can be done directly with the scan on the server side?
In the samples below I can see
Client -> Server - Scan for certain rows (we want the rowkeys satisfying our
criteria)
Client <- Server - returns
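For what it's worth, here is a rough sketch of what the client-facing contract of such an Endpoint could look like with the pre-0.96 coprocessor API; the protocol name and method are hypothetical, and the server-side class is only outlined in the comment:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;

// Hypothetical contract for a "delete by scan" Endpoint (pre-0.96 coprocessor API).
// A server-side class would implement this, extend BaseEndpointCoprocessor,
// open a scanner over the region for the given Scan, and issue the Deletes
// locally, so the matching row keys never have to travel back to the client.
public interface BulkDeleteProtocol extends CoprocessorProtocol {
  // Returns how many rows were deleted in the region that served the call.
  long deleteMatchingRows(Scan scan) throws IOException;
}

The client would then invoke it over a key range with HTable.coprocessorExec(), so only the Scan criteria go to the servers and only counts come back.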