Re: speeding up rowcount

Himanshu Vashishtha Sun, 09 Oct 2011 18:05:56 -0700

MapReduce support in HBase is inherently parallel: each region is
assigned to its own mapper, so the regions are scanned concurrently
rather than by one ordered scan.
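
To illustrate the idea, here is a minimal Python sketch that simulates
it. It does not use the HBase API: the table is just a sorted list of
row keys, and region_ranges, count_range and parallel_row_count are
hypothetical names made up for this example. Each [start, stop) range
stands in for one region, and each worker plays the role of one mapper
counting its own range. Since only the total matters, the partial
counts can be summed in any order.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def region_ranges(split_keys):
    """Turn region split keys into [start, stop) ranges covering the table.

    None marks an open boundary (start of first region, end of last one).
    """
    bounds = [None] + sorted(split_keys) + [None]
    return list(zip(bounds[:-1], bounds[1:]))


def count_range(sorted_keys, start, stop):
    """Count keys in [start, stop); stands in for one scan over one region."""
    lo = 0 if start is None else bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if stop is None else bisect_left(sorted_keys, stop)
    return hi - lo


def parallel_row_count(sorted_keys, split_keys, workers=4):
    """Count each range in its own worker, then sum the partial counts."""
    ranges = region_ranges(split_keys)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda r: count_range(sorted_keys, *r), ranges)
        return sum(partial)
```

For example, with 1000 keys named row00000 through row00999 and split
keys row00250, row00500 and row00750, parallel_row_count counts four
ranges of 250 rows each and returns 1000. In a real job the speedup
depends on how many regions the table has and how many map slots the
cluster can run at once.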


Himanshu

On Sun, Oct 9, 2011 at 6:44 PM, lars hofhansl <[email protected]> wrote:
> Be aware that the contract for a scan is to return all rows sorted by rowkey,
> hence it cannot scan regions in parallel by default. I have not played much
> with HBase and MapReduce, but if order is not important you can split the scan
> into multiple scans.
>
>
> ----- Original Message -----
> From: Tom Goren <[email protected]>
> To: [email protected]
> Cc:
> Sent: Sunday, October 9, 2011 8:07 AM
> Subject: Re: speeding up rowcount
>
> lol - i just ran a rowcount via mapreduce, and it took 6 hours for 7.5
> million rows...
>
> On Sun, Oct 9, 2011 at 7:50 AM, Rita <[email protected]> wrote:
>
>> Hi,
>>
>> I have been doing a rowcount via mapreduce and it's taking about 4-5 hours
>> to count 500 million rows in a table. I was wondering if there are any
>> MapReduce tunings I can do so it will go much faster.
>>
>> I have a 10-node cluster, each node with 8 CPUs and 64GB of memory. Any
>> tuning advice would be much appreciated.
>>
>>
>> --
>> --- Get your facts first, then you can distort them as you please.--
>>
>
>
