How many regions does your table have?

On Mon, Jun 6, 2011 at 4:48 AM, Andreas Reiter <[email protected]> wrote:
> hello everybody
>
> i'm trying to scan my hbase table for reporting purposes
> the cluster has 4 servers:
> - server1: namenode, secondary namenode, jobtracker, hbase master, zookeeper1
> - server2: datanode, tasktracker, hbase regionserver, zookeeper2
> - server3: datanode, tasktracker, hbase regionserver, zookeeper3
> - server4: datanode, tasktracker, hbase regionserver
> everything seems to work properly
> versions:
> - hadoop-0.20.2-CDH3B4
> - hbase-0.90.1-CDH3B4
> - zookeeper-3.3.2-CDH3B4
>
> at the moment our hbase table has 300000 entries
>
> if i do a table scan over the hbase api (at the moment without a filter)
> ResultScanner scanner = table.getScanner(...);
>
> it takes about 60 seconds to process, which is actually okay, because all
> records are processed sequentially by only one thread
> BUT it takes approximately the same time if i do the scan with a Map&Reduce
> job using TableInputFormat
>
> i'm definitely doing something wrong, because the processing time is going
> up in direct proportion to the number of rows.
> in my understanding, the big advantage of hadoop/hbase is that huge numbers
> of entries can be processed in parallel and very fast
>
> 300k entries are not much, we are expecting this number to be added hourly
> to our cluster, but the processing time keeps increasing, which is not
> acceptable
>
> does anyone have an idea what i'm doing wrong?
>
> best regards
> andre
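
For context on why the region count matters: TableInputFormat creates one input split, and therefore one map task, per region, so a table that still sits in a single region is scanned by a single mapper. Below is a rough back-of-the-envelope sketch in plain Java (no HBase dependency). The class and method names are hypothetical, the 6 map slots are an assumed figure for three tasktrackers, and the model assumes rows are spread evenly across regions and ignores job startup overhead.

```java
public class RegionParallelismSketch {

    // Hypothetical helper: TableInputFormat yields one map task per region,
    // so concurrency is bounded by both the region count and the map slots.
    static int concurrentMappers(int regions, int mapSlots) {
        if (regions <= 0 || mapSlots <= 0) {
            throw new IllegalArgumentException("regions and mapSlots must be positive");
        }
        return Math.min(regions, mapSlots);
    }

    // Idealized estimate: each region holds an equal share of the rows,
    // and map tasks run in "waves" of at most concurrentMappers at a time.
    static double estimatedScanSeconds(double singleThreadSeconds, int regions, int mapSlots) {
        int concurrent = concurrentMappers(regions, mapSlots);
        int waves = (regions + concurrent - 1) / concurrent; // ceiling division
        double secondsPerRegion = singleThreadSeconds / regions;
        return waves * secondsPerRegion;
    }

    public static void main(String[] args) {
        double singleThreadSeconds = 60.0; // figure reported in the quoted message
        int mapSlots = 6;                  // assumption: 3 tasktrackers x 2 slots each
        for (int regions : new int[] {1, 3, 12}) {
            System.out.printf("regions=%d mappers=%d estimated=%.1fs%n",
                    regions,
                    concurrentMappers(regions, mapSlots),
                    estimatedScanSeconds(singleThreadSeconds, regions, mapSlots));
        }
    }
}
```

With one region the estimate equals the single-threaded time, which would match what you are seeing; the job only speeds up once the table has split into several regions.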
-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
