Re: Data locality in HBase

2012-06-20 Thread Ted Yu
Minor addition to what Lars G said. In trunk, the load balancer is able to utilize block location information when it chooses the region server receiving a region. See the following in RegionLocationFinder: * Returns an ordered list of hosts that are hosting the blocks for this region. The weight o...
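
The idea behind that javadoc, as a minimal sketch: weight each host by how many bytes of the region's HFile blocks it stores, then order hosts by descending weight. The class and method names here (RegionHostWeights, topHostsForRegion) are illustrative, not the actual trunk API, and the directory is assumed to contain the region's HFiles directly.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RegionHostWeights {

  // Sum, per host, the bytes of every block of every file under the
  // given directory, then return the hosts ordered heaviest-first.
  public static List<String> topHostsForRegion(Configuration conf,
      Path hfileDir) throws IOException {
    FileSystem fs = hfileDir.getFileSystem(conf);
    final Map<String, Long> weight = new HashMap<String, Long>();
    for (FileStatus file : fs.listStatus(hfileDir)) {
      if (file.isDir()) continue; // isDirectory() on newer Hadoop
      for (BlockLocation block :
          fs.getFileBlockLocations(file, 0, file.getLen())) {
        for (String host : block.getHosts()) {
          Long old = weight.get(host);
          weight.put(host, (old == null ? 0L : old) + block.getLength());
        }
      }
    }
    List<String> hosts = new ArrayList<String>(weight.keySet());
    Collections.sort(hosts, new Comparator<String>() {
      public int compare(String a, String b) {
        return weight.get(b).compareTo(weight.get(a)); // descending
      }
    });
    return hosts;
  }
}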

Re: RS unresponsive after series of deletes

2012-06-20 Thread Ted Yu
As I mentioned earlier, prepareDeleteTimestamps() performs one get operation per column qualifier: get.addColumn(family, qual); List<KeyValue> result = get(get, false); This is too costly in your case. I think you can group some configurable number of qualifiers in each get and perform c...
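
A rough illustration of that grouping, assuming a caller-tuned batchSize (the helper name getInBatches is made up for the example):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;

public class BatchedGets {

  // One Get per batch of qualifiers instead of one Get per qualifier,
  // cutting the number of round trips by roughly a factor of batchSize.
  public static List<Result> getInBatches(HTableInterface table,
      byte[] row, byte[] family, List<byte[]> qualifiers, int batchSize)
      throws IOException {
    List<Result> results = new ArrayList<Result>();
    for (int i = 0; i < qualifiers.size(); i += batchSize) {
      Get get = new Get(row);
      int end = Math.min(i + batchSize, qualifiers.size());
      for (byte[] qualifier : qualifiers.subList(i, end)) {
        get.addColumn(family, qualifier);
      }
      results.add(table.get(get));
    }
    return results;
  }
}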

Re: Data locality in HBase

2012-06-20 Thread Ben Kim
Hi Lars, Thanks a lot for your reply. As you said, a region server processes HFiles so that all data blocks are located on the same physical machine unless the region server fails. I ran the following hadoop command to see the location of an HFile *hadoop fsck /hbase/testtable/9488ef7fbd23b62b9bf85b7...
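
The same block-location information can also be pulled programmatically; a minimal sketch (the HFile path is passed in as an argument, since the one in the thread is truncated):

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HFileBlockLocations {
  public static void main(String[] args) throws Exception {
    Path hfile = new Path(args[0]); // e.g. the path fsck was run on
    FileSystem fs = hfile.getFileSystem(new Configuration());
    FileStatus status = fs.getFileStatus(hfile);
    for (BlockLocation block :
        fs.getFileBlockLocations(status, 0, status.getLen())) {
      System.out.println("offset=" + block.getOffset()
          + " length=" + block.getLength()
          + " hosts=" + Arrays.toString(block.getHosts()));
    }
  }
}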

Re: RS unresponsive after series of deletes

2012-06-20 Thread Ted Tuttle
> Do your hundreds of thousands of cell deletes overlap (in terms of column family)
> across rows?

Our schema contains only one column family per table. So, each Delete contains cells from a single column family. I hope this answers your question.

Re: RS unresponsive after series of deletes

2012-06-20 Thread Ted Yu
Ted T: Do your hundreds of thousands of cell deletes overlap (in terms of column family) across rows? In HRegionServer:

public MultiResponse multi(MultiAction multi) throws IOException {
  ...
  for (Action a : actionsForRegion) {
    action = a.getAction();
    ...
    if (action instanceof ...

Re: RS unresponsive after series of deletes

2012-06-20 Thread Ted Yu
Looking at the stack trace, I found the following hot spot:
1. org.apache.hadoop.hbase.regionserver.StoreFileScanner.realSeekDone(StoreFileScanner.java:340)
2. org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:331)
3. org.apache.hadoop.hbase.regions...

RE: RS unresponsive after series of deletes

2012-06-20 Thread Ted Tuttle
First off, J-D, thanks for helping me work through this. You've inspired some different angles and I think I've finally made it bleed in a controlled way.

> - That data you are deleting needs to be read when you scan, like I
> said earlier a delete is in fact an insert in HBase and this isn't
> c...

Re: RS unresponsive after series of deletes

2012-06-20 Thread Jean-Daniel Cryans
What you are describing here seems very different from what you showed earlier. In any case, a few remarks:
- You have major compactions running during the time of that log trace; this usually sucks up a lot of IO. See http://hbase.apache.org/book.html#managed.compactions
- That data you are deletin...
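
The managed-compactions recipe linked above boils down to disabling the time-based trigger and issuing major compactions yourself during a quiet window. A sketch, assuming the property would normally be set in hbase-site.xml (shown here via code for brevity) and with "testtable" as a placeholder table name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ManagedCompaction {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // 0 disables periodic major compactions; they then run only on demand.
    conf.setLong("hbase.hregion.majorcompaction", 0);

    // Trigger one explicitly, e.g. from a nightly cron job.
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.majorCompact("testtable"); // the request is asynchronous
    admin.close();
  }
}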

RE: RS unresponsive after series of deletes

2012-06-20 Thread Ted Tuttle
> Like Stack said in his reply, have you thread dumped the slow region
> servers when this happens?

I've been having difficulty reproducing this behavior in a controlled manner. While I haven't been able to get my client to hang while doing deletes, I have found a query that, when issued after a...

Re: performance of Get from MR Job

2012-06-20 Thread Jean-Daniel Cryans
Yeah, I've overlooked the versions issue. What I usually recommend is that if the timestamp is part of your data model, it should be in the row key, a qualifier, or a value. Since you seem to rely on the timestamp for querying, it should definitely be part of the row key, but not at the beginning lik...
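
The key shape J-D is describing, sketched with assumed names (entityId, rowKey): leading with an id keeps monotonically increasing keys from hammering a single region, and reversing the timestamp sorts each entity's rows newest-first.

import org.apache.hadoop.hbase.util.Bytes;

public class TimestampedRowKeys {
  // Composite key: entity id first, timestamp second. A scan over one
  // entity's time range is then just a start/stop row pair.
  public static byte[] rowKey(String entityId, long timestampMillis) {
    return Bytes.add(Bytes.toBytes(entityId),
        Bytes.toBytes(Long.MAX_VALUE - timestampMillis)); // newest first
  }
}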

Re: Blocking Inserts

2012-06-20 Thread Dave Wang
I'd also remove the DN and RS from the node running ZK, NN, etc., as you don't want heavyweight processes on that node. - Dave

On Wed, Jun 20, 2012 at 9:31 AM, Elliott Clark wrote:
> Basically without metrics on what's going on it's tough to know for sure.
>
> I would turn on GC logging and make s...

Re: Blocking Inserts

2012-06-20 Thread Elliott Clark
Basically, without metrics on what's going on it's tough to know for sure. I would turn on GC logging and make sure that it is not playing a part, get metrics on IO while this is going on, and look through the logs to see what is happening when you notice the pause. On Wed, Jun 20, 2012 at 6:39 AM, M...

Re: delete rows from hbase

2012-06-20 Thread Michael Segel
Hi, Ok... Just a couple of nits...
1) Please don't write your Mapper and Reducer classes as inner classes. I don't know who started this... maybe it's easier as example code. But it really makes it harder to learn M/R code. (Also harder to teach, but that's another story... ;-)
2) Looking a...
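
What the first nit looks like in practice — a minimal, self-contained sketch of a Mapper as its own top-level class (the class name and the trivial logic are made up for illustration):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Lives in its own file rather than nested inside the job driver:
// easier to read, unit test, and reuse.
public class LineCountMapper
    extends Mapper<LongWritable, Text, Text, LongWritable> {

  private static final LongWritable ONE = new LongWritable(1);
  private static final Text LINES = new Text("lines");

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    context.write(LINES, ONE);
  }
}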

Blocking Inserts

2012-06-20 Thread Martin Alig
Hi, I'm doing some evaluations with HBase. The workload I'm facing is mainly insert-only. Currently I'm inserting 1KB rows, where 100 bytes go into one column. I have the following cluster machines at my disposal: Intel Xeon L5520 2.26 GHz (Nehalem, with HT enabled), 24 GiB memory, 1 GigE, 2x 15k RPM SA...

Re: delete rows from hbase

2012-06-20 Thread Oleg Ruchovets
Well, I changed my previous solution a bit; it works but it is very slow!!! I think it is because I pass a SINGLE DELETE object and not a LIST of DELETES. Is it possible to pass a List of Deletes through the map instead of a single delete?

import org.apache.commons.cli.*;
import org.apache.hadoop...
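
One way to do what Oleg is asking, sketched under assumptions (a made-up BUFFER_SIZE, the table opened in setup() as in Michael's outline below): buffer Delete objects in the mapper and hand HTable a whole list at once instead of one Delete per row.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;

public class BufferedDeleteMapper
    extends TableMapper<ImmutableBytesWritable, ImmutableBytesWritable> {

  private static final int BUFFER_SIZE = 1000; // assumed batch size
  private HTable table;
  private List<Delete> buffer = new ArrayList<Delete>();

  @Override
  protected void setup(Context context) throws IOException {
    // "testtable" is a placeholder for the target table name.
    table = new HTable(context.getConfiguration(), "testtable");
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value,
      Context context) throws IOException {
    buffer.add(new Delete(row.get()));
    if (buffer.size() >= BUFFER_SIZE) {
      table.delete(buffer); // one batched RPC instead of one per row
      buffer.clear();
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    if (!buffer.isEmpty()) table.delete(buffer); // flush the tail
    table.close();
  }
}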

Re: delete rows from hbase

2012-06-20 Thread Michael Segel
Hi, The simple way to do this as a map/reduce job is the following: use the HTable input format and scan the records you want to delete. Inside Mapper.setup(), create a connection to the HTable where you want to delete the records. Inside Mapper.map(), for each iteration you will get a row which match...
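
A sketch of the driver side of that outline, wiring the scan to a mapper like the BufferedDeleteMapper above via TableMapReduceUtil ("testtable" and the scan filter are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class DeleteRowsJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "delete-rows");
    job.setJarByClass(DeleteRowsJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fetch rows in chunks
    scan.setCacheBlocks(false);  // don't pollute the block cache
    // Add a filter here to restrict the scan to the rows to delete.

    TableMapReduceUtil.initTableMapperJob("testtable", scan,
        BufferedDeleteMapper.class, ImmutableBytesWritable.class,
        ImmutableBytesWritable.class, job);
    job.setNumReduceTasks(0); // map-only: all deletes happen in map()
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}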

RE: delete rows from hbase

2012-06-20 Thread Anoop Sam John
Hi, Has anyone tried the possibility of an Endpoint implementation with which the delete can be done directly with the scan on the server side? In the samples below I can see:
Client -> Server - Scan for certain rows (we want the rowkeys satisfying our criteria)
Client <- Server - returns...
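
A rough sketch of that Endpoint idea against the 0.92/0.94 dynamic-protocol coprocessor API; the protocol name, method, and the exact HRegion.delete signature are assumptions to illustrate the shape, not a tested implementation:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

// Dynamic protocol the client codes against.
interface BulkDeleteProtocol extends CoprocessorProtocol {
  long deleteMatching(Scan scan) throws IOException;
}

// Runs inside each region server: scan locally, delete locally, and
// only ship a count back -- no row keys cross the wire.
public class BulkDeleteEndpoint extends BaseEndpointCoprocessor
    implements BulkDeleteProtocol {

  public long deleteMatching(Scan scan) throws IOException {
    RegionCoprocessorEnvironment env =
        (RegionCoprocessorEnvironment) getEnvironment();
    HRegion region = env.getRegion();
    InternalScanner scanner = region.getScanner(scan);
    long deleted = 0;
    try {
      List<KeyValue> kvs = new ArrayList<KeyValue>();
      boolean more;
      do {
        kvs.clear();
        more = scanner.next(kvs);
        if (!kvs.isEmpty()) {
          region.delete(new Delete(kvs.get(0).getRow()), null, true);
          deleted++;
        }
      } while (more);
    } finally {
      scanner.close();
    }
    return deleted;
  }
}

The client would then invoke it per region range with table.coprocessorExec(BulkDeleteProtocol.class, startRow, stopRow, callable), so only a per-region count travels back instead of the matching row keys.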