MapReduce support in HBase already provides this parallelism: each region is handed to its own mapper.
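To illustrate the per-region idea (and the "split the scan into multiple scans" suggestion quoted below), here is a minimal sketch, assuming a sorted in-memory list of row keys stands in for the table and a hypothetical list of region start keys stands in for the real region boundaries. Each `[start, stop)` range plays the role of one region-bounded sub-scan. Because a row count does not care about order, the ranges can be counted concurrently and summed. The helper names (`region_ranges`, `count_range`, `parallel_row_count`) are made up for this example and are not HBase APIs.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def region_ranges(start_keys):
    """Turn sorted region start keys into [start, stop) ranges.

    The last region has no upper bound, represented here as None.
    """
    ranges = []
    for i, start in enumerate(start_keys):
        stop = start_keys[i + 1] if i + 1 < len(start_keys) else None
        ranges.append((start, stop))
    return ranges


def count_range(sorted_rows, start, stop):
    """Count row keys in [start, stop); stands in for one bounded sub-scan."""
    lo = bisect_left(sorted_rows, start)
    hi = len(sorted_rows) if stop is None else bisect_left(sorted_rows, stop)
    return hi - lo


def parallel_row_count(sorted_rows, start_keys, workers=4):
    """Count each region range concurrently, then sum (order is irrelevant)."""
    ranges = region_ranges(start_keys)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda r: count_range(sorted_rows, *r), ranges)
        return sum(counts)


# Hypothetical table of 1000 rows split into 4 regions.
rows = sorted(f"row{i:05d}".encode() for i in range(1000))
start_keys = [b"", b"row00250", b"row00500", b"row00750"]
print(parallel_row_count(rows, start_keys))
```

With a real cluster, each range would instead be an HBase scan bounded by a start and stop row. Per-range results come back in no particular order, which is why this only works when sorted output is not required.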
Himanshu

On Sun, Oct 9, 2011 at 6:44 PM, lars hofhansl <[email protected]> wrote:
> Be aware that the contract for a scan is to return all rows sorted by rowkey,
> hence it cannot scan regions in parallel by default. I have not played much
> with HBase and MapReduce, but if order is not important you can split the scan
> into multiple scans.
>
>
> ----- Original Message -----
> From: Tom Goren <[email protected]>
> To: [email protected]
> Cc:
> Sent: Sunday, October 9, 2011 8:07 AM
> Subject: Re: speeding up rowcount
>
> lol - I just ran a rowcount via mapreduce, and it took 6 hours for 7.5
> million rows...
>
> On Sun, Oct 9, 2011 at 7:50 AM, Rita <[email protected]> wrote:
>
>> Hi,
>>
>> I have been doing a rowcount via mapreduce and it's taking about 4-5 hours
>> to count 500 million rows in a table. I was wondering if there are any
>> MapReduce tunings I can do so it will go much faster.
>>
>> I have a 10-node cluster, each node with 8 CPUs and 64GB of memory. Any
>> tuning advice would be much appreciated.
>>
>>
>> --
>> --- Get your facts first, then you can distort them as you please.--
>>
>
>
