Thank you, Jason! What if I fix the size of each record to that of the largest record by padding the smaller records with dummy characters, and then set setMaxInputSplitSize() and setMinInputSplitSize() of the FileInputFormat class to this value? The mapper would extract the input after ignoring the dummy characters. Do you think this could work? Thanks.
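
Roughly, the driver side of that idea would look like the sketch below (new
mapreduce API; RECORD_SIZE is just a placeholder for the padded length of the
largest record, and the mapper would still be responsible for stripping the
padding):

    // Sketch only: force every input split to cover exactly one padded record.
    // RECORD_SIZE is an assumed value, not something from the real data set.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class OneRecordPerMap {
        public static void main(String[] args) throws Exception {
            final long RECORD_SIZE = 64L * 1024 * 1024; // e.g. 64 MB per padded record
            Job job = new Job(new Configuration(), "one record per map");
            // Pin both the minimum and maximum split size to the record size,
            // so each split covers exactly one padded record.
            FileInputFormat.setMinInputSplitSize(job, RECORD_SIZE);
            FileInputFormat.setMaxInputSplitSize(job, RECORD_SIZE);
            // ... set mapper class, input/output paths, etc., then submit.
        }
    }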

Regards,
Upendra


----- Original Message ----- From: "Jason Venner" <jason.had...@gmail.com>
To: <common-dev@hadoop.apache.org>
Sent: Friday, November 27, 2009 12:06 AM
Subject: Re: how to set one map task for each input key-value pair


The only thing that comes immediately to mind is to write your own custom
input format that knows where the record boundaries are in your data set,
and uses them to specify the beginning and end of the input splits.
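
A minimal sketch of that idea, assuming the new mapreduce API (the class name
BoundaryAwareInputFormat and the helper findRecordBoundaries() are made-up
names here, and extending TextInputFormat is only a convenient way to inherit
a record reader; in practice you would pair this with a reader for your own
format):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class BoundaryAwareInputFormat extends TextInputFormat {

        @Override
        public List<InputSplit> getSplits(JobContext job) throws IOException {
            List<InputSplit> splits = new ArrayList<InputSplit>();
            for (FileStatus status : listStatus(job)) {
                Path path = status.getPath();
                // One split per record: findRecordBoundaries() stands in for
                // whatever index or header scan locates record boundaries.
                for (long[] r : findRecordBoundaries(path, job.getConfiguration())) {
                    splits.add(new FileSplit(path, r[0], r[1] - r[0], null));
                }
            }
            return splits;
        }

        private List<long[]> findRecordBoundaries(Path path, Configuration conf)
                throws IOException {
            // Placeholder: return one {start, end} offset pair per record.
            throw new UnsupportedOperationException("format-specific");
        }
    }

Each split then feeds exactly one record to one mapper.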

You can also tell the framework not to split your individual input files by
setting the minimum input split size (mapred.min.split.size) to
Long.MAX_VALUE.
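
With the old mapred API (which is where the mapred.min.split.size property
name comes from), that is a one-line configuration change; the wrapper class
below is just scaffolding for the snippet:

    import org.apache.hadoop.mapred.JobConf;

    public class NoSplitConf {
        public static JobConf configure() {
            JobConf conf = new JobConf();
            // A minimum split size larger than any input file means the
            // framework never splits a file: one file, one map task.
            conf.setLong("mapred.min.split.size", Long.MAX_VALUE);
            return conf;
        }
    }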

On Thu, Nov 26, 2009 at 4:53 PM, Upendra Dadi <ud...@gmu.edu> wrote:

Hi,
 I am trying to use MapReduce with some scientific data. I have key-value
pairs such that the size of the value can range from a few megabytes to
several hundred megabytes. What happens when the size of the value exceeds
the block size? How do I set it up so that each key-value pair is
associated with a separate map? Could someone please help? Thanks.

Regards,
Upendra




--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

