Re: speeding up rowcount

2011-10-29 Thread Ted Yu
> ... The contract of a scan is to return all rows sorted by rowkey, hence it cannot scan regions in parallel by default. I have not played much HBase with MapReduce, but if order is not important you can try to split the scan into multiple scans.

Re: speeding up rowcount

2011-10-29 Thread Rita

Re: speeding up rowcount

2011-10-29 Thread Ted Yu

Re: speeding up rowcount

2011-10-29 Thread Rita

Re: speeding up rowcount

2011-10-09 Thread Himanshu Vashishtha

Re: speeding up rowcount

2011-10-09 Thread lars hofhansl
The contract of a scan is to return all rows sorted by rowkey, hence it cannot scan regions in parallel by default. I have not played much HBase with MapReduce, but if order is not important you can try to split the scan into multiple scans. ----- Original Message ----- From: Tom Goren To: user@hbase.apache.org Sent: Sunday, October 9, 2011 8:07 AM Subject: Re: speeding up rowcount > lol - i just ran a rowcount via mapreduce, and it took 6 hours for 7.5 million rows...

Re: speeding up rowcount

2011-10-09 Thread Ryan Rawson
Are you sure the job is running on the cluster and not in single-node (local) mode? This happens a lot... On Oct 9, 2011 7:50 AM, "Rita" wrote: > Hi, > I have been doing a rowcount via mapreduce and it's taking about 4-5 hours to count 500 million rows in a table. I was wondering if there are any...
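One quick way to check Ryan's suspicion on a Hadoop 1.x-era setup (which matches this 2011 thread): if `mapred.job.tracker` is unset or `local`, Hadoop runs the job in the single-process LocalJobRunner no matter how large the cluster is. A minimal sketch of the relevant mapred-site.xml entry; the host and port below are hypothetical:

```xml
<!-- mapred-site.xml: if this property is missing or set to "local",
     jobs run inside the single-process LocalJobRunner, not on the cluster. -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker-host:9001</value> <!-- hypothetical host:port -->
</property>
```

A 6-hour rowcount over 7.5 million rows is slow enough that local-mode execution is worth ruling out before any other tuning.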

Re: speeding up rowcount

2011-10-09 Thread Ted Yu
That is fine. We should also allow users to override the cache value. On Sun, Oct 9, 2011 at 9:26 AM, Himanshu Vashishtha wrote: > Since RowCounter uses FirstKeyOnlyFilter, we can have a default Scan cache value of 500 or so? > Himanshu

Re: speeding up rowcount

2011-10-09 Thread Himanshu Vashishtha
Since RowCounter uses FirstKeyOnlyFilter, we can have a default Scan cache value of 500 or so? Himanshu On Sun, Oct 9, 2011 at 9:44 AM, Ted Yu wrote: > Excellent question. There seems to be a bug for RowCounter. > In TableInputFormat: > if (conf.get(SCAN_CACHEDROWS) != null) { ...

Re: speeding up rowcount

2011-10-09 Thread Ted Yu
Excellent question. There seems to be a bug for RowCounter. In TableInputFormat:

    if (conf.get(SCAN_CACHEDROWS) != null) {
      scan.setCaching(Integer.parseInt(conf.get(SCAN_CACHEDROWS)));
    }

But I don't see SCAN_CACHEDROWS set in either TableMapReduceUtil or RowCounter. Mind filing a JIRA?
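To make the bug concrete: the null-check Ted quotes means caching is only raised when someone has put SCAN_CACHEDROWS into the job configuration, and his point is that nothing in RowCounter or TableMapReduceUtil does. A plain-Java sketch of that pattern (a HashMap stands in for the Hadoop Configuration, and the key string mirrors what TableInputFormat checks, so treat the names as illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class CachedRowsSketch {
    // Mirrors TableInputFormat.SCAN_CACHEDROWS; the string is illustrative.
    static final String SCAN_CACHEDROWS = "hbase.mapreduce.scan.cachedrows";

    // Stand-in for Scan.setCaching: returns the caching the scan ends up with.
    static int effectiveCaching(Map<String, String> conf, int defaultCaching) {
        if (conf.get(SCAN_CACHEDROWS) != null) {
            return Integer.parseInt(conf.get(SCAN_CACHEDROWS));
        }
        return defaultCaching; // nothing sets the key, so the default wins
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // RowCounter never puts SCAN_CACHEDROWS, so the low default sticks:
        System.out.println(effectiveCaching(conf, 1));
        // A caller who does set the key gets the larger batch:
        conf.put(SCAN_CACHEDROWS, "500");
        System.out.println(effectiveCaching(conf, 1));
    }
}
```

So the JIRA Ted asks for amounts to: have RowCounter (or TableMapReduceUtil) actually set this key, or a sensible default, before the job runs.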

Re: speeding up rowcount

2011-10-09 Thread Rita
Thanks for the responses. Where do I set the high Scan cache values? On Sun, Oct 9, 2011 at 11:19 AM, Himanshu Vashishtha <hvash...@cs.ualberta.ca> wrote: > Since a MapReduce job is a separate process, try with a high Scan cache value. > http://hbase.apache.org/book.html#perf.hbase.client.caching

Re: speeding up rowcount

2011-10-09 Thread Himanshu Vashishtha
Since a MapReduce job is a separate process, try with a high Scan cache value: http://hbase.apache.org/book.html#perf.hbase.client.caching Himanshu On Sun, Oct 9, 2011 at 9:09 AM, Ted Yu wrote: > I guess your hbase.hregion.max.filesize is quite high. > If possible, lower its value so that you have smaller regions.
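The book section Himanshu links covers `hbase.client.scanner.caching`. Because the MapReduce job is a separate process, one way to raise it for such jobs is cluster-wide in hbase-site.xml; a hedged sketch, with the value only an example:

```xml
<!-- hbase-site.xml: rows fetched per scanner next() RPC (example value) -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>500</value>
</property>
```

Setting it programmatically on the Scan passed to the job (as the SCAN_CACHEDROWS discussion upthread describes) is the more targeted alternative, once that path actually works.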

Re: speeding up rowcount

2011-10-09 Thread Ted Yu
I guess your hbase.hregion.max.filesize is quite high. If possible, lower its value so that you have smaller regions. On Sun, Oct 9, 2011 at 7:50 AM, Rita wrote: > Hi, > I have been doing a rowcount via mapreduce and it's taking about 4-5 hours to count 500 million rows in a table. I was wondering if there are any map reduce tunings I can do so it will go much faster.
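Context for Ted's suggestion: TableInputFormat creates one map task per region, so smaller regions mean more regions and therefore more parallel mappers. A hedged hbase-site.xml sketch (the value is only an example, and existing regions split to the new size only as they are flushed/compacted past the threshold):

```xml
<!-- hbase-site.xml: max store file size before a region splits (example: 256 MB) -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>268435456</value>
</property>
```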

Re: speeding up rowcount

2011-10-09 Thread Tom Goren
lol - i just ran a rowcount via mapreduce, and it took 6 hours for 7.5 million rows... On Sun, Oct 9, 2011 at 7:50 AM, Rita wrote: > Hi, > I have been doing a rowcount via mapreduce and it's taking about 4-5 hours to count 500 million rows in a table. I was wondering if there are any map reduce tunings I can do so it will go much faster.

speeding up rowcount

2011-10-09 Thread Rita
Hi, I have been doing a rowcount via mapreduce and it's taking about 4-5 hours to count 500 million rows in a table. I was wondering if there are any map reduce tunings I can do so it will go much faster. I have a 10 node cluster, each node with 8 CPUs and 64GB of memory. Any tuning advice would be appreciated.
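A back-of-the-envelope for why the scanner-caching advice upthread dominates this workload (assuming a caching value of 1 row per RPC when nothing sets it, as the thread discusses): at 500 million rows, caching of 1 means 500 million scanner round-trips, while 500 rows per RPC cuts that to 1 million. A small arithmetic sketch:

```java
public class RowCountRpcMath {
    // Ceiling division: scanner next() round-trips needed to pull all rows.
    static long rpcs(long rows, long caching) {
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 500_000_000L;
        System.out.println(rpcs(rows, 1));   // caching 1:   500,000,000 RPCs
        System.out.println(rpcs(rows, 500)); // caching 500:   1,000,000 RPCs
    }
}
```

A 500x drop in round-trips is a far larger lever than map-side tuning on a 10-node cluster, which is why the thread converges on caching rather than mapper counts.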