See http://hbase.apache.org/book/performance.html
St.Ack
On Tue, Jun 7, 2011 at 1:08 AM, Andre Reiter <[email protected]> wrote:
> Now I have found out that there are three regions, each on a separate region
> server (server2, server3, server4).
> The processing time is still >= 60 sec, which is not very impressive...
>
> What can I do to speed up the table scan?
>
> best regards
> andre
>
>
> Andreas Reiter wrote:
>>
>> hello everybody
>>
>> I'm trying to scan my HBase table for reporting purposes.
>> The cluster has 4 servers:
>> - server1: namenode, secondary namenode, jobtracker, hbase master, zookeeper1
>> - server2: datanode, tasktracker, hbase regionserver, zookeeper2
>> - server3: datanode, tasktracker, hbase regionserver, zookeeper3
>> - server4: datanode, tasktracker, hbase regionserver
>> Everything seems to work properly.
>> Versions:
>> - hadoop-0.20.2-CDH3B4
>> - hbase-0.90.1-CDH3B4
>> - zookeeper-3.3.2-CDH3B4
>>
>> At the moment our HBase table has 300000 entries.
>>
>> If I do a table scan over the HBase API (at the moment without a filter)
>> ResultScanner scanner = table.getScanner(...);
>>
>> it takes about 60 seconds to process, which is actually okay, because all
>> records are processed by only one thread sequentially.
>> BUT it takes approximately the same time if I do the scan in a MapReduce
>> job using TableInputFormat.
>>
>> I'm definitely doing something wrong, because the processing time goes up
>> in direct proportion to the number of rows.
>> In my understanding, the big advantage of Hadoop/HBase is that huge
>> numbers of entries can be processed in parallel and very fast.
>>
>> 300k entries is not much; we expect this many to be added to our cluster
>> every hour, so a processing time that keeps increasing is not acceptable.
>>
>> Does anyone have an idea what I'm doing wrong?
>>
>> best regards
>> andre
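
As a rough illustration of the scan-caching advice on the performance page linked above: the performance guide notes that the default scanner caching is 1, meaning one next() round trip to a region server per row. TableInputFormat creates one input split, and therefore one map task, per region, so a three-region table gets at most three concurrent mappers. The sketch below is a back-of-the-envelope model under exactly those assumptions. It ignores scanner open/close calls, per-region batch boundaries and row sizes, and the class and method names (ScanCostEstimate, rpcRoundTrips, rowsPerMapTask) are hypothetical. In real client code the knob is Scan.setCaching(int), set on the Scan passed to getScanner(...) or to TableMapReduceUtil.initTableMapperJob(...); the guide uses 500 as an example value.

```java
// Hypothetical back-of-the-envelope model of a full-table HBase scan.
// Assumptions (not measured on any cluster): one scanner next() RPC per
// `caching` rows, and one TableInputFormat map task per region.
public final class ScanCostEstimate {

    private ScanCostEstimate() {}

    /** Approximate scanner next() round trips needed to read `rows` rows. */
    public static long rpcRoundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows >= 0 and caching >= 1 required");
        }
        // Ceiling division: a partially filled last batch still costs one RPC.
        return (rows + caching - 1) / caching;
    }

    /** Rows per map task if rows are spread evenly over `regions` regions. */
    public static long rowsPerMapTask(long rows, int regions) {
        if (rows < 0 || regions < 1) {
            throw new IllegalArgumentException("rows >= 0 and regions >= 1 required");
        }
        return (rows + regions - 1) / regions;
    }

    public static void main(String[] args) {
        long rows = 300_000L; // table size from the original post
        int regions = 3;      // region count from the follow-up
        for (int caching : new int[] {1, 100, 500}) {
            System.out.printf("caching=%d -> %d RPCs for the whole table%n",
                    caching, rpcRoundTrips(rows, caching));
        }
        System.out.printf("%d regions -> at most %d concurrent map tasks, ~%d rows each%n",
                regions, regions, rowsPerMapTask(rows, regions));
    }
}
```

Running main prints 300000, 3000 and 600 round trips for caching values of 1, 100 and 500, and about 100000 rows per map task for three regions. Under this model, adding mappers cannot help beyond the region count, while raising the caching value cuts the per-row round trips that dominate both the single-threaded scan and each map task.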
