Re: full table scan

Stack Tue, 07 Jun 2011 10:29:49 -0700

See http://hbase.apache.org/book/performance.html
St.Ack
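
One item on that page that fits the numbers quoted below is scan caching. In HBase 0.90 the client-side default (hbase.client.scanner.caching) is 1, so each call to next() on the scanner is a separate round trip to the region server unless Scan.setCaching(int) is raised. The following is only a rough sketch of the arithmetic, not a benchmark: the class name ScanRoundTrips and the caching values tried are made up for illustration, and the row count is the 300000 from the original mail.

```java
// Rough estimate of how many scanner round trips a full scan needs.
// Assumption: each round trip returns up to `caching` rows; the final
// "no more rows" call and region boundaries are ignored.
public class ScanRoundTrips {

    /** Round trips needed to fetch `rows` rows, `caching` rows per trip. */
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        // Ceiling division: a partially filled last batch still costs one trip.
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 300_000L; // table size mentioned in the thread
        // 1 is the 0.90 default; the other values are illustrative only.
        for (int caching : new int[] {1, 100, 500, 1000}) {
            System.out.println("caching=" + caching + " -> "
                    + roundTrips(rows, caching) + " round trips");
        }
    }
}
```

With TableInputFormat the same setting applies to the Scan handed to the job. Since TableInputFormat creates one input split per region, a table with three regions gets at most three map tasks, so per-row round trips can still dominate the runtime.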


On Tue, Jun 7, 2011 at 1:08 AM, Andre Reiter <[email protected]> wrote:
> now i found out that there are three regions, each on a different region
> server (server2, server3, server4)
> the processing time is still >=60sec, which is not very impressive...
>
> what can i do to speed up the table scan?
>
> best regards
> andre
>
>
> Andreas Reiter wrote:
>>
>> hello everybody
>>
>> i'm trying to scan my hbase table for reporting purposes
>> the cluster has 4 servers:
>> - server1: namenode, secondary namenode, jobtracker, hbase master,
>> zookeeper1
>> - server2: datanode, tasktracker, hbase regionserver, zookeeper2
>> - server3: datanode, tasktracker, hbase regionserver, zookeeper3
>> - server4: datanode, tasktracker, hbase regionserver
>> everything seems to work properly
>> versions:
>> - hadoop-0.20.2-CDH3B4
>> - hbase-0.90.1-CDH3B4
>> - zookeeper-3.3.2-CDH3B4
>>
>>
>> at the moment our hbase table has 300000 entries
>>
>> if i do a table scan via the hbase api (at the moment without a filter)
>> ResultScanner scanner = table.getScanner(...);
>>
>> it takes about 60 seconds to process, which is actually okay, because all
>> records are processed by only one thread sequentially
>> BUT it takes approximately the same time if i do a scan with a MapReduce
>> job using TableInputFormat
>>
>> i'm definitely doing something wrong, because the processing time is going
>> up directly proportional to the number of rows.
>> in my understanding, the big advantage of hadoop/hbase is, that huge
>> numbers of entries can be processed in parallel and very fast
>>
>> 300k entries are not much; we are expecting this number to be added hourly to
>> our cluster, but the processing time keeps increasing, which is not
>> acceptable
>>
>> does anyone have an idea what i'm doing wrong?
>>
>> best regards
>> andre
>>
>>
>
>
>
