Hi Xiyun,
My guess is that with the small output, you are fitting each map output in
one spill. When you double the output size, it doesn't fit in one spill, and
you incur an extra penalty to re-read and merge the output.
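To make the spill reasoning above concrete, here is a rough back-of-the-envelope sketch. It is not Hadoop code: the class and method names are hypothetical. It assumes the Hadoop 0.20-era defaults of io.sort.mb=100 and io.sort.spill.percent=0.80, and it ignores the share of the buffer reserved for record metadata, so real tasks may spill somewhat sooner than this estimate suggests.

```java
// Hypothetical helper, not part of Hadoop or HBase: estimates how many
// spills one map task performs for a given amount of map output.
public class SpillEstimate {

    /**
     * Rough spill count: map output divided by the spill threshold, rounded up.
     * The threshold is modeled as io.sort.mb * io.sort.spill.percent, which is
     * a simplification (record metadata space is ignored).
     */
    static long estimateSpills(long mapOutputBytes, int ioSortMb, double spillPercent) {
        if (mapOutputBytes <= 0) {
            return 0;
        }
        long thresholdBytes = (long) (ioSortMb * 1024L * 1024L * spillPercent);
        // Ceiling division: any remainder needs one more spill.
        return (mapOutputBytes + thresholdBytes - 1) / thresholdBytes;
    }

    public static void main(String[] args) {
        int ioSortMb = 100;          // assumed default for io.sort.mb
        double spillPercent = 0.80;  // assumed default for io.sort.spill.percent

        long small = 60L * 1024 * 1024; // illustrative 60 MB of map output
        long doubled = 2 * small;       // same task emitting twice as much

        System.out.println("small output spills:   " + estimateSpills(small, ioSortMb, spillPercent));
        System.out.println("doubled output spills: " + estimateSpills(doubled, ioSortMb, spillPercent));
    }
}
```

With these illustrative numbers the small output fits in one spill and the doubled output needs two, which is the point where the extra re-read and merge pass comes in.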
If you can spare the memory, bump mapred.child.java.opts so that each map
t
On Wed, Apr 6, 2011 at 9:10 PM, Gan, Xiyun wrote:
> A 12-node cluster, HBase version is 0.89.20100924.
Please upgrade to 0.90.1 at least.
> The inputs are the same, about 15 million lines of text. I'm sure the time
> cost of parsing a line is low.
How much difference in the size of the output
A 12-node cluster, HBase version is 0.89.20100924.
The inputs are the same, about 15 million lines of text. I'm sure the time
cost of parsing a line is low.
The k/v pair added in the map() function is very simple; the added code is just:
String strKey = "key";
ImmutableBytesWritable r
Tell us more about how you are doing the measurement. Are you
profiling with ten inputs or one million? Is this on a single node or
a thousand-node cluster? What version of HBase?
Thank you,
St.Ack
On Wed, Apr 6, 2011 at 7:54 PM, Gan, Xiyun wrote:
> Hi,
> I need to use bulk load functionali
Hi,
I need to use the bulk load functionality in HBase. I have read the
documentation on the HBase wiki page, but the ImportTsv tool does not meet my
needs, so I added some code to the map() function in ImportTsv.java.
Originally, that map() function writes only one key/value pair to the
context. In my m