Re: ImportTsv usage

2011-04-08 Thread Todd Lipcon
Hi Xiyun, My guess is that with the small output, you are fitting each map output in one spill. When you double the output size, it doesn't fit in one spill, and you incur an extra penalty to re-read and merge the output. If you can spare the memory, bump mapred.child.java.opts so that each map t
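
The spill reasoning above can be sketched as a rough back-of-the-envelope estimate. This is only an illustration, not something from the thread: it assumes the Hadoop settings io.sort.mb (default 100) and io.sort.spill.percent (default 0.80) govern the map-side sort buffer, ignores the share of the buffer reserved for record accounting, and uses hypothetical output sizes of 60 MB and 120 MB per map task. The class and method names are made up for the example.

```java
public class SpillEstimate {

    /**
     * Rough estimate of how many spill files one map task writes.
     * Simplified model (an assumption, not Hadoop's exact accounting):
     * a spill starts once the sort buffer is spillPercent full, so each
     * spill holds about ioSortMb * spillPercent megabytes of map output.
     */
    public static long estimateSpills(long mapOutputBytes, int ioSortMb, double spillPercent) {
        if (mapOutputBytes <= 0) {
            return 0;
        }
        double thresholdBytes = ioSortMb * 1024.0 * 1024.0 * spillPercent;
        return (long) Math.ceil(mapOutputBytes / thresholdBytes);
    }

    public static void main(String[] args) {
        int ioSortMb = 100;          // Hadoop default for io.sort.mb
        double spillPercent = 0.80;  // Hadoop default for io.sort.spill.percent

        long smallOutput = 60L * 1024 * 1024;  // hypothetical per-map output
        long doubledOutput = 2 * smallOutput;  // same job with twice the output

        // One spill needs no merge pass; two or more must be re-read and merged.
        System.out.println("small output spills:   "
                + estimateSpills(smallOutput, ioSortMb, spillPercent));
        System.out.println("doubled output spills: "
                + estimateSpills(doubledOutput, ioSortMb, spillPercent));
    }
}
```

Under these assumptions the smaller output stays within a single spill while the doubled output crosses the threshold and needs a merge, which is the kind of non-linear slowdown described above. Raising the buffer size only helps if the task JVM heap set through mapred.child.java.opts is large enough to hold it.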

Re: ImportTsv usage

2011-04-06 Thread Stack
On Wed, Apr 6, 2011 at 9:10 PM, Gan, Xiyun wrote:
> A 12-node cluster, HBase version is 0.89.20100924.

Please upgrade to 0.90.1 at least.

> The inputs are the same, about 15 million lines of text. I'm sure the time
> cost of parsing a line is low.

How much difference in the size of the output

Re: ImportTsv usage

2011-04-06 Thread Gan, Xiyun
A 12-node cluster, HBase version is 0.89.20100924. The inputs are the same, about 15 million lines of text. I'm sure the time cost of parsing a line is low. The k/v pair added in the map() function is very simple; the added code is just String strKey = "key"; ImmutableBytesWritable r

Re: ImportTsv usage

2011-04-06 Thread Stack
Tell us more about how you are doing the measurement. Are you profiling with ten inputs or one million? Is this on a single node or a thousand node cluster? What version of HBase?

Thank you,
St.Ack

On Wed, Apr 6, 2011 at 7:54 PM, Gan, Xiyun wrote:
> Hi,
>   I need to use bulk load functionali

ImportTsv usage

2011-04-06 Thread Gan, Xiyun
Hi, I need to use the bulk load functionality in HBase. I have read the documentation on the HBase wiki page, but the ImportTsv tool does not meet my needs, so I added some code to the map() function in ImportTsv.java. Originally, that map() function writes only one key/value pair to the context. In my m