Re: timeouts with lots of coprocessor puts on single row

2013-08-26 Thread Olle Mårtensson
Thank you for the link, Anil, it was a good explanation indeed. > It's not recommended to do put/deletes across region servers like this. That was not my intention; I want to keep the region for the aggregates and the aggregated values on the same server. I read in the link that you gave me that I

Re: timeouts with lots of coprocessor puts on single row

2013-08-26 Thread Asaf Mesika
We did the same, but on the client side, without any issue. On Monday, August 26, 2013, Olle Mårtensson wrote: > Hi, > > I have developed a coprocessor that extends BaseRegionObserver and > implements the > postPut method. The postPut method scans the columns of the row that the > put was issu

Newbie in hbase Trying to run an example

2013-08-26 Thread jamal sasha
Hi, I am new to hbase, so a few noob questions. So, I created a table in hbase; a quick scan gives me the following: hbase(main):001:0> scan 'test' ROW COLUMN+CELL row1 column=cf:word, timestamp=1377
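For readers following along, the scan above is HBase shell output; a minimal client-side equivalent is sketched below. The table name 'test' and column cf:word are taken from the shell output; connection settings are assumed to come from hbase-site.xml on the classpath (0.94-era API).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test");
    ResultScanner scanner = table.getScanner(new Scan());
    try {
      for (Result row : scanner) {
        // Prints the value stored under cf:word for each row, as in the shell output.
        System.out.println(Bytes.toString(row.getRow()) + " => "
            + Bytes.toString(row.getValue(Bytes.toBytes("cf"), Bytes.toBytes("word"))));
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}
```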

Re: timeouts with lots of coprocessor puts on single row

2013-08-26 Thread anil gupta
On Mon, Aug 26, 2013 at 7:27 AM, Olle Mårtensson wrote: > Hi, > > I have developed a coprocessor that extends BaseRegionObserver and > implements the > postPut method. The postPut method scans the columns of the row that the > put was issued on and calculates an aggregate based on these valu

timeouts with lots of coprocessor puts on single row

2013-08-26 Thread Olle Mårtensson
Hi, I have developed a coprocessor that extends BaseRegionObserver and implements the postPut method. The postPut method scans the columns of the row that the put was issued on and calculates an aggregate based on these values; when this is done, a row in another table is updated with the agg
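The original poster's code is not shown in the preview, so the following is only a rough sketch of the pattern described: a BaseRegionObserver whose postPut re-reads the row and writes an aggregate into a second table. Class, table, and column names are hypothetical, and the 0.94-era API is assumed. The thread's subject, timeouts, is a known risk when a coprocessor issues puts against another table from inside postPut.

```java
import java.io.IOException;
import java.util.NavigableMap;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical observer: after each Put, re-read the row, sum its cf:* values
// (assumed to be stored as 8-byte longs) and write the total into a separate
// "aggregates" table.
public class AggregatingObserver extends BaseRegionObserver {

  private static final byte[] CF = Bytes.toBytes("cf");

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx, Put put,
      WALEdit edit, boolean writeToWAL) throws IOException {
    byte[] row = put.getRow();

    // Read the full row back from the region that just served the Put.
    Get get = new Get(row);
    get.addFamily(CF);
    Result r = ctx.getEnvironment().getRegion().get(get, null);

    long sum = 0;
    NavigableMap<byte[], byte[]> cells = r.getFamilyMap(CF);
    if (cells != null) {
      for (byte[] value : cells.values()) {
        sum += Bytes.toLong(value);
      }
    }

    // Update the aggregate row in another table via the coprocessor environment.
    // If that table's region lives on another server, every Put turns into a
    // cross-server RPC, which is where timeouts can creep in.
    HTableInterface aggTable = ctx.getEnvironment().getTable(Bytes.toBytes("aggregates"));
    try {
      Put aggPut = new Put(row);
      aggPut.add(CF, Bytes.toBytes("sum"), Bytes.toBytes(sum));
      aggTable.put(aggPut);
    } finally {
      aggTable.close();
    }
  }
}
```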

Re: Store specific rows on specific region server

2013-08-26 Thread Ted Yu
bq. store particular row on a particular region server Can you let us know your use case? Any single region server may go down, for various reasons. How do you plan to maintain row key distribution after that? Thanks On Mon, Aug 26, 2013 at 3:52 AM, Vamshi Krishna wrote: > Hi all, >

Re: regions are not getting distributed

2013-08-26 Thread Vamshi Krishna
Hi all, The problem got solved by changing the value of the property below from a local directory path to an hdfs:// path AND running Hadoop before I start running my HBase. hbase.rootdir /home/biginfolabs/BILSftwrs/hbase-0.94.10/hbstmp/ Now, I see the data gets distributed acros
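For context, hbase.rootdir is set in hbase-site.xml; a hedged example of the distributed form described above (namenode host, port, and path are placeholders):

```xml
<!-- hbase-site.xml: point hbase.rootdir at HDFS instead of a local directory,
     and make sure HDFS is running before starting HBase. -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode-host:9000/hbase</value>
</property>
```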

Store specific rows on specific region server

2013-08-26 Thread Vamshi Krishna
Hi all, Is there any facility in hbase such that, in a task of storing 1000 rows on a cluster of 10 machines, we can specify that the Nth row should be stored on the N%1000 th region server? In essence, how can we store a particular row on a particular region server? (Can we specify which ro
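As the replies in this thread suggest, HBase does not let a client pin a row to a chosen region server; the balancer decides region placement. The closest standard mechanism is to design row keys and pre-split the table so that key ranges land in separate regions, which the balancer then spreads over the servers. A sketch under those assumptions (table, family, and key format are hypothetical; 0.94-era API):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Hypothetical table and family names. Nine split points give ten regions
    // that the balancer spreads across the region servers; which server hosts
    // which region is still decided by HBase, not by the client.
    HTableDescriptor desc = new HTableDescriptor("mytable");
    desc.addFamily(new HColumnDescriptor("cf"));

    byte[][] splits = new byte[9][];
    for (int i = 1; i <= 9; i++) {
      splits[i - 1] = Bytes.toBytes(String.format("row%03d", i * 100));
    }
    admin.createTable(desc, splits);
    admin.close();
  }
}
```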

Re: Input split for a HBase of 80,000 rows?

2013-08-26 Thread Pavan Sudheendra
Awesome.. Thanks :) Now my map and reduce tasks are super fast.. Although, the table I'll eventually be using has a region split of 25.. 4 on 5 machines and 5 on the master region node.. I don't know if that's enough though.. But I'll look into this.. On Mon, Aug 26, 2013 at 2:55 PM, Ashwanth Kum

Re: Can I make use of TableSplit across Regions to make my MR job faster?

2013-08-26 Thread Michael Segel
A 'table split' is a region split; as you split regions and balance them, you should see some parallelism in your M/R jobs. Of course, depending on your choice of row keys... YMMV. HTH -Mike On Aug 26, 2013, at 2:16 AM, Pavan Sudheendra wrote: > Hi all, > > How to make use of a Table

Re: regions are not getting distributed

2013-08-26 Thread Vamshi Krishna
Ted, I guessed the problem could be due to having only a single ZooKeeper server in hbase.zookeeper.quorum. So, I have added the region server machine as well, apart from the master. Now, I don't see any such FAIL cases as mentioned below (which was the case earlier): Handling transition=RS_ZK_REGION_FAILE
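The property in question lives in hbase-site.xml; a sketch of listing more than one quorum host (hostnames are placeholders):

```xml
<!-- hbase-site.xml: every host running a ZooKeeper quorum member is listed here.
     With only one entry, that machine is a single point of failure. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master-host,regionserver-host</value>
</property>
```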

Re: Input split for a HBase of 80,000 rows?

2013-08-26 Thread Ashwanth Kumar
Just click on "Split"; that should be fine. It will pick a key in the middle of each region and split them. Splitting happens like 1 -> 2 -> 4 -> 8 regions and so on. The # of regions for a table is something you should be able to come up with given the # of region servers and the size of data that you are
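Besides the web UI button, the same split can be triggered programmatically; a minimal sketch using the 0.94-era admin API (the table name and split key are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class ManualSplit {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Same effect as the web UI "Split" button: with no explicit split point,
    // HBase picks a key near the middle of each region of the table.
    admin.split("mytable");

    // Or force a specific split point (hypothetical key):
    admin.split(Bytes.toBytes("mytable"), Bytes.toBytes("row40000"));

    admin.close();
  }
}
```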

Re: Input split for a HBase of 80,000 rows?

2013-08-26 Thread Pavan Sudheendra
Furthermore, what can we do if a table has 25 online regions? Can we safely set caching to a bigger number? Is a split necessary as well? On Mon, Aug 26, 2013 at 2:42 PM, Pavan Sudheendra wrote: > Hi Ashwanth, thanks for the reply.. > > I went to the HBase Web UI and saw that my table had 1 Onl

Re: Input split for a HBase of 80,000 rows?

2013-08-26 Thread Pavan Sudheendra
Hi Ashwanth, thanks for the reply.. I went to the HBase Web UI and saw that my table had 1 online region.. Can you please guide me on how to do the split on this table? I see the UI asking for a region key and a split button... How many splits can I make exactly? Can I give two different 'keys

Re: FuzzyRowFilter question

2013-08-26 Thread Kiru Pakkirisamy
Thanks Ted, it is explicitly mentioned in the limitations section but I seem to have missed it.. oh well. It is an awesome filter.. great work by Alex, you and the team. Thanks to you all. Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com From

Re: Input split for a HBase of 80,000 rows?

2013-08-26 Thread Ashwanth Kumar
setCaching sets the value via the API; the other way is to set it in the job configuration using the key "hbase.client.scanner.caching". I just realized that, given you have just 1 region, caching wouldn't help much in reducing the time. Splitting might be an ideal solution. Based on your heap space
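To make the two options concrete, a small sketch of both ways to raise scanner caching (the value 1500 is the one used elsewhere in this thread; 0.94-era API assumed):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

public class CachingSetup {
  public static void main(String[] args) {
    // Way 1: job-wide default via the configuration key.
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.client.scanner.caching", 1500);

    // Way 2: per-scan via the client API (overrides the configured default).
    Scan scan = new Scan();
    scan.setCaching(1500);       // rows returned per RPC to the region server
    scan.setCacheBlocks(false);  // don't flood the block cache during a full scan
  }
}
```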

Re: Input split for a HBase of 80,000 rows?

2013-08-26 Thread Pavan Sudheendra
Hi Ashwanth, My caching is set to 1500.. scan.setCaching(1500); scan.setCacheBlocks(false); Can I set the number of splits via an API? On Mon, Aug 26, 2013 at 2:22 PM, Ashwanth Kumar <ashwanthku...@googlemail.com> wrote: > To answer your question - Go to the HBase Web UI and you can initiate a m

Re: Input split for a HBase of 80,000 rows?

2013-08-26 Thread Ashwanth Kumar
To answer your question - go to the HBase Web UI and you can initiate a manual split on the table. But before you do that, maybe you can try increasing your client caching value (hbase.client.scanner.caching) in your job. On Mon, Aug 26, 2013 at 2:09 PM, Pavan Sudheendra wrote: > What is the inpu

Re: regions are not getting distributed

2013-08-26 Thread Andrew Purtell
Two nodes is insufficient. Default DFS replication is 3. That would be the bare minimum just for kicking the tires IMO but is still a degenerate case. In my opinion 5 is the lowest you should go. You shouldn't draw conclusions from inadequate deploys. On Friday, August 23, 2013, Vamshi Krishna wro
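The replication factor Andrew refers to is an HDFS setting, dfs.replication, whose shipped default is 3; for reference:

```xml
<!-- hdfs-site.xml: default block replication. A 2-node cluster cannot hold
     3 replicas of each block, so such a deploy stays under-replicated. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```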

Input split for a HBase of 80,000 rows?

2013-08-26 Thread Pavan Sudheendra
What is the input split of the HBase Table in this job status? map() completion: 0.0 reduce() completion: 0.0 Counters: 24 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=216030 FILE: Number of read operations=

Re: FuzzyRowFilter question

2013-08-26 Thread Ted Yu
That is right. See http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/ On Aug 25, 2013, at 10:56 PM, Kiru Pakkirisamy wrote: > I am using FuzzyRowFilter with my coprocessors as it seems to give the best > performance (even though I
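For readers who have not used the filter, a small sketch of the fixed-width-key pattern the linked post describes. The 4-byte userId + 2-byte actionId layout is hypothetical; in the fuzzy mask, byte 0 means "must match" and byte 1 means "any value".

```java
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
import org.apache.hadoop.hbase.util.Pair;

public class FuzzyScanExample {
  public static Scan buildScan() {
    // Hypothetical fixed-width key layout: 4-byte userId + 2-byte actionId.
    // Fuzzy over userId (mask byte 1 = "don't care"), fixed on actionId (mask byte 0).
    byte[] keyTemplate = new byte[] { 0, 0, 0, 0,   // userId: ignored
                                      0, 99 };      // actionId = 99
    byte[] fuzzyMask   = new byte[] { 1, 1, 1, 1,   // fuzzy positions
                                      0, 0 };       // fixed positions

    List<Pair<byte[], byte[]>> fuzzyKeys =
        Arrays.asList(new Pair<byte[], byte[]>(keyTemplate, fuzzyMask));

    Scan scan = new Scan();
    scan.setFilter(new FuzzyRowFilter(fuzzyKeys));
    return scan;
  }
}
```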

Can I make use of TableSplit across Regions to make my MR job faster?

2013-08-26 Thread Pavan Sudheendra
Hi all, How do I make use of a TableSplit or a region split? How is it used in TableInputFormatBase#getSplits()? I have 6 region servers across the cluster for the map-reduce task which I am using. How can I leverage this so that the table is split across the cluster and the map-reduce application
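For background on the question: TableInputFormatBase#getSplits() returns one input split per region of the scanned table, so map-task parallelism is bounded by the region count rather than the number of region servers. A sketch of the usual job setup (table and class names are hypothetical; 0.94-era API):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class RegionParallelScanJob {

  // Placeholder mapper: one map task is launched per region of the input table.
  static class RowCountMapper extends TableMapper<ImmutableBytesWritable, IntWritable> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
        throws IOException, InterruptedException {
      context.write(rowKey, new IntWritable(1));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "region-parallel-scan");
    job.setJarByClass(RegionParallelScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(1500);
    scan.setCacheBlocks(false);

    // TableInputFormat(Base).getSplits() yields one split per region of "mytable",
    // so the number of concurrent mappers is bounded by the region count,
    // not by the number of region servers.
    TableMapReduceUtil.initTableMapperJob("mytable", scan, RowCountMapper.class,
        ImmutableBytesWritable.class, IntWritable.class, job);

    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```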