Re: How to get table by HTablePool in HBaseTestingUtility

2014-08-07 Thread Arun Allamsetty
And please also tell us the versions of Hadoop and HBase you are using, because HTablePool is deprecated in the newer versions of HBase. Thanks, Arun Sent from a mobile device. Please don't mind the typos. On Aug 7, 2014 1:16 AM, "Dai, Kevin" wrote: > Hi, all > > I am doing unit testing with HB

Re: Why hbase need manual split?

2014-08-05 Thread Arun Allamsetty
Hi Ming, The reason we have it is that the user can decide where each key goes. I can think of multiple scenarios off the top of my head where it would be useful, and others can correct me if I am wrong. 1. Cases where you cannot have row keys which are equally lexically distributed, leading i

Re: Hbase Mapreduce API - Reduce to a file is not working properly.

2014-08-01 Thread Arun Allamsetty
Hi, The @Override annotation worked because, without it, the reduce method in the superclass (Reducer) was being invoked, which basically writes the input from the mapper class to the context object. Try to look up the source code for the Reducer class online and you'll realize that. Hope that cle
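A minimal sketch of the effect described in this reply, assuming the original reduce method had a signature that did not match the superclass's. The annotation itself does not change dispatch; it makes the compiler reject a method that fails to override anything. BaseReducer below is a hypothetical stand-in written for illustration, not Hadoop's Reducer class, and all class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverrideSketch {
    // Hypothetical stand-in for a framework base class whose default
    // reduce() passes every value through unchanged.
    static class BaseReducer {
        protected void reduce(String key, Iterable<String> values, List<String> out) {
            for (String v : values) {
                out.add(key + "=" + v);
            }
        }

        // The framework always calls reduce(String, Iterable, List).
        public final void run(String key, Iterable<String> values, List<String> out) {
            reduce(key, values, out);
        }
    }

    // The second parameter is List, not Iterable, so this is an overload.
    // Putting @Override on it would be a compile error, which exposes the bug.
    static class AccidentalOverload extends BaseReducer {
        protected void reduce(String key, List<String> values, List<String> out) {
            out.add(key + "=" + values.size());
        }
    }

    // Same signature as the base method, so run() dispatches here.
    static class RealOverride extends BaseReducer {
        @Override
        protected void reduce(String key, Iterable<String> values, List<String> out) {
            int count = 0;
            for (String ignored : values) {
                count++;
            }
            out.add(key + "=" + count);
        }
    }

    static List<String> runWith(BaseReducer reducer) {
        List<String> out = new ArrayList<>();
        reducer.run("k", Arrays.asList("a", "b"), out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runWith(new AccidentalOverload())); // [k=a, k=b]
        System.out.println(runWith(new RealOverride()));       // [k=2]
    }
}
```

With the accidental overload, the pass-through behaviour of the base class is what runs, which matches the symptom of the reducer output looking like the mapper output.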

Re: Best practice for writing to HFileOutputFormat(2) with multiple Column Families

2014-08-01 Thread Arun Allamsetty
Hi Jianshi, Do you mean that you want to sort the row keys? If yes, then you don't have to worry about it because HBase sorts the row keys on its own, though lexicographically. Cheers, Arun Sent from a mobile device. Please don't mind the typos. On Jul 30, 2014 9:02 PM, "Jianshi Huang" wrote: > I
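To illustrate what "lexicographically" means for row keys, here is a small sketch that sorts keys by unsigned byte-by-byte comparison. It does not use the HBase API; compareKeys and sortAsRowKeys are hypothetical helpers written for this example, and the sample keys are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowKeyOrderSketch {
    // Compare two keys byte by byte as unsigned values; when one key is a
    // prefix of the other, the shorter key sorts first.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Encode the keys as UTF-8 bytes, sort them, and decode them again.
    static List<String> sortAsRowKeys(List<String> keys) {
        List<byte[]> raw = new ArrayList<>();
        for (String k : keys) {
            raw.add(k.getBytes(StandardCharsets.UTF_8));
        }
        raw.sort(RowKeyOrderSketch::compareKeys);
        List<String> out = new ArrayList<>();
        for (byte[] r : raw) {
            out.add(new String(r, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        // Unpadded numbers do not sort numerically.
        System.out.println(sortAsRowKeys(Arrays.asList("row2", "row10", "row1")));
        // [row1, row10, row2]

        // Zero-padding restores the expected order.
        System.out.println(sortAsRowKeys(Arrays.asList("row02", "row10", "row01")));
        // [row01, row02, row10]
    }
}
```

The practical consequence is that numeric parts of a row key usually need fixed-width padding if numeric order matters.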

Re: Hbase Mapreduce API - Reduce to a file is not working properly.

2014-08-01 Thread Arun Allamsetty
Hi Parkirat, I don't think that HBase is causing the problems. You might already know this, but you need to add the reducer class to the job just as you add the mapper. Also, if you want to read from an HBase table in a MapReduce job, you need to extend TableMapper for the mapper and if you want to wr

Re: Using HBase to store a directory structure

2014-07-22 Thread Arun Allamsetty
Hi Varun, I am still learning HBase here, so the experts can point out the mistakes I make. Your problem seems to be something which can be easily mapped to an HBase table structure. Firstly, never ever store references in HBase. It doesn't serve any purpose and will just make your queries slower.

Re: HBase appends

2014-07-22 Thread Arun Allamsetty
bine when reading > > The combining process applies to the multi-version approach as well. > > Cheers > > > On Tue, Jul 22, 2014 at 12:32 PM, Arun Allamsetty < > arun.allamse...@gmail.com > > wrote: > > > Hi, > > > > Isn't versioning used for a

Re: HBase appends

2014-07-22 Thread Arun Allamsetty
> > > > Another way to solve this problem is write a new column for each appended > > list and read all the columns and combine when reading. (I prefer this > > approach since the way Append is implemented internally, it can lead to > > high memstore usage). > >

HBase appends

2014-07-21 Thread Arun Allamsetty
Hi, If I have a one-to-many relationship in a SQL database (an author might have written many books), and I want to denormalize it for writing in HBase, I'll have a table with the Author as the row key and a *list* of books as values. Now my question is how do I create a *list* such that I could

Re: How to limit columns returned by a single row in HBase

2014-07-19 Thread Arun Allamsetty
Hi, I have an idea which might be just baloney, but people learn from mistakes and this is my attempt to learn. So if I properly understand your use case, you want to get the first 500 records pertaining to a file based on its file name. Since you want to limit the number of records written, I won

Re: incremental cluster backup using snapshots

2014-07-11 Thread Arun Allamsetty
Hi, People who are more experienced than I am, correct me if I am wrong, but I am positive that we can export snapshots to local FS. I did exactly that today and it's just a hadoop get command once we have the snapshots on HDFS. Though when I directly tried to export them to local FS it failed wit

Re: Using HBase in standalone mode in production

2014-07-07 Thread Arun Allamsetty
cluster to a distributed one, unless > I'm mistaken, you should have no problem doing so. HDFS is quite good with > scaling, whether it's from 10 machines to 20 or 1 to 10 and I don't know of > any reason that HBase would cause any problems in this regard. > > -Dima >

Re: Using HBase in standalone mode in production

2014-07-07 Thread Arun Allamsetty
t using such a setup for any production > use. > > -Dima > > > On Mon, Jul 7, 2014 at 4:25 PM, Arun Allamsetty > > wrote: > > > Hi Ted, > > > > I have. So the book says there are two types of distributed modes. One is > > pseudo distributed, which is

Re: Using HBase in standalone mode in production

2014-07-07 Thread Arun Allamsetty
> Cheers > > > On Mon, Jul 7, 2014 at 3:55 PM, Arun Allamsetty > > wrote: > > > Hi all, > > > > So this question might be a stupid one, but it has been bugging > me > > for a while and I cannot think of a better place to ask this. I am really >

Using HBase in standalone mode in production

2014-07-07 Thread Arun Allamsetty
Hi all, So this question might be a stupid one, but it has been bugging me for a while and I cannot think of a better place to ask this. I am really impressed with the way HBase works (as a key-value store). Since it stores everything as a byte array, I find it really convenient to store

Re: HBase chain MapReduce job with broadcasting smaller tables to all Mappers

2014-07-07 Thread Arun Allamsetty
> > Cheers > > > On Thu, Jul 3, 2014 at 9:14 AM, Arun Allamsetty > > wrote: > > > Hi, > > > > I am trying to write a chained MapReduce job on data present in HBase > > tables and need some help with the concept. I am not expecting people to >

HBase chain MapReduce job with broadcasting smaller tables to all Mappers

2014-07-03 Thread Arun Allamsetty
Hi, I am trying to write a chained MapReduce job on data present in HBase tables and need some help with the concept. I am not expecting people to provide code, but pseudo code for this based on HBase's Java API would be nice. In a nutshell, what I am trying to do is, MapReduce Job 1: Read data fr