A scan is to return all rows sorted by rowkey, hence it cannot scan regions in parallel by default. I have not played much with HBase and MapReduce, but if order is not important you can split the scan into multiple scans.
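The "split one scan into multiple scans" idea can be sketched without any HBase APIs. Everything below is a hypothetical illustration (class and method names are my own), modeling row keys as integers for simplicity; real HBase row keys are byte arrays, but the range arithmetic is the same:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "split one scan into multiple scans" idea:
// divide a single row-key range into N contiguous sub-ranges, one per scan.
// Row keys are modeled as integers here; real HBase keys are byte arrays.
public class ScanSplitter {

    // Returns N [start, stop) pairs that exactly tile [start, stop).
    static List<BigInteger[]> split(BigInteger start, BigInteger stop, int n) {
        List<BigInteger[]> ranges = new ArrayList<>();
        BigInteger span = stop.subtract(start);
        BigInteger parts = BigInteger.valueOf(n);
        for (int i = 0; i < n; i++) {
            BigInteger lo = start.add(span.multiply(BigInteger.valueOf(i)).divide(parts));
            BigInteger hi = (i == n - 1)
                ? stop  // last range ends exactly at the overall stop key
                : start.add(span.multiply(BigInteger.valueOf(i + 1)).divide(parts));
            ranges.add(new BigInteger[] { lo, hi });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Four sub-ranges of [0, 1000) that could each back an independent scan.
        for (BigInteger[] r : split(BigInteger.ZERO, BigInteger.valueOf(1000), 4)) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

Each sub-range would then become its own scan (start row, stop row), and the scans can run concurrently since row order across them no longer matters.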
----- Original Message -----
From: Tom Goren
To: user@hbase.apache.org
Cc:
Sent: Sunday, October 9, 2011 8:07 AM
Subject: Re: speeding up rowcount
lol - i just ran a rowcount via mapreduce, and it took 6 hours for 7.5
million rows...
On Sun, Oct 9, 2011 at 7:50 AM, Rita wrote:
> Hi,
>
> I have been doing a rowcount via mapreduce and it's taking about 4-5 hours
> to count 500 million rows in a table.
Are you sure the job is running on the cluster and not running in single
node mode? This happens a lot...
On Oct 9, 2011 7:50 AM, "Rita" wrote:
> Hi,
>
> I have been doing a rowcount via mapreduce and it's taking about 4-5 hours
> to count 500 million rows in a table. I was wondering if there are any map
> reduce tunings I can do so it will go much faster.
That is fine.
We should also allow users to override the cache value.
On Sun, Oct 9, 2011 at 9:26 AM, Himanshu Vashishtha wrote:
> Since a RowCounter uses FirstKeyOnlyFilter, we can have a default Scan
> cache value of 500 or so?
>
> Himanshu
>
> On Sun, Oct 9, 2011 at 9:44 AM, Ted Yu wrote:
> > Excellent question.
Since a RowCounter uses FirstKeyOnlyFilter, we can have a default Scan
cache value of 500 or so?
Himanshu
On Sun, Oct 9, 2011 at 9:44 AM, Ted Yu wrote:
> Excellent question.
> There seems to be a bug for RowCounter.
>
> In TableInputFormat:
> if (conf.get(SCAN_CACHEDROWS) != null) {
>   scan.setCaching(Integer.parseInt(conf.get(SCAN_CACHEDROWS)));
> }
Excellent question.
There seems to be a bug for RowCounter.
In TableInputFormat:
if (conf.get(SCAN_CACHEDROWS) != null) {
  scan.setCaching(Integer.parseInt(conf.get(SCAN_CACHEDROWS)));
}
But I don't see SCAN_CACHEDROWS in either TableMapReduceUtil or RowCounter.
Mind filing a JIRA?
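The pattern in the TableInputFormat snippet quoted above, applying a caching override only when the property is actually set, can be mimicked with plain java.util types. The property name and helper below are illustrative assumptions for this sketch, not the HBase API:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the pattern in the TableInputFormat snippet quoted above:
// apply a Scan caching override only when the property is actually present.
// The property name and this helper are illustrative, not HBase API.
public class CachingConfig {
    static final String SCAN_CACHEDROWS = "hbase.mapreduce.scan.cachedrows";

    // Returns the configured caching value, or the default when unset.
    static int resolveCaching(Map<String, String> conf, int defaultCaching) {
        String v = conf.get(SCAN_CACHEDROWS);
        return (v != null) ? Integer.parseInt(v) : defaultCaching;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveCaching(conf, 1));   // unset: falls back to 1
        conf.put(SCAN_CACHEDROWS, "500");
        System.out.println(resolveCaching(conf, 1));   // set: prints 500
    }
}
```

The bug Ted describes is then simply that nothing in the job-setup path ever sets the property, so the override branch never fires and the scan keeps the low default.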
Thanks for the responses.
Where do I set the high Scan cache values?
On Sun, Oct 9, 2011 at 11:19 AM, Himanshu Vashishtha <
hvash...@cs.ualberta.ca> wrote:
> Since a MapReduce is a separate process, try with a high Scan cache value.
>
> http://hbase.apache.org/book.html#perf.hbase.client.caching
Since a MapReduce is a separate process, try with a high Scan cache value.
http://hbase.apache.org/book.html#perf.hbase.client.caching
Himanshu
On Sun, Oct 9, 2011 at 9:09 AM, Ted Yu wrote:
> I guess your hbase.hregion.max.filesize is quite high.
> If possible, lower its value so that you have smaller regions.
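Concretely, the scanner caching value discussed above can be supplied when launching the bundled RowCounter job. This is a hedged sketch: whether -D properties are picked up this way depends on the HBase/Hadoop version, and the table name is a placeholder:

```shell
# Sketch: run the bundled RowCounter with a larger scanner cache.
# hbase.client.scanner.caching controls how many rows each scanner RPC
# fetches; 'mytable' is a placeholder table name.
hbase org.apache.hadoop.hbase.mapreduce.RowCounter \
  -Dhbase.client.scanner.caching=500 \
  mytable
```

With caching left at the old default of 1, each mapper issues one round trip per row, which by itself can explain multi-hour counts.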
I guess your hbase.hregion.max.filesize is quite high.
If possible, lower its value so that you have smaller regions.
On Sun, Oct 9, 2011 at 7:50 AM, Rita wrote:
> Hi,
>
> I have been doing a rowcount via mapreduce and it's taking about 4-5 hours
> to count 500 million rows in a table. I was wondering if there are any map
> reduce tunings I can do so it will go much faster.
Hi,
I have been doing a rowcount via mapreduce and it's taking about 4-5 hours to
count 500 million rows in a table. I was wondering if there are any map
reduce tunings I can do so it will go much faster.
I have a 10-node cluster, each node with 8 CPUs and 64GB of memory. Any tuning
advice would be appreciated.