Thank you, Jason! What if I fix the size of each record to that of the largest record by padding the smaller records with dummy characters, and then set setMaxInputSplitSize() and setMinInputSplitSize() of the FileInputFormat class to this value? The mapper would extract the input after ignoring the dummy characters. Do you think this could work? Thanks.
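
Roughly, the driver side of that idea would look like the sketch below (new
mapreduce API; RECORD_SIZE is just a placeholder for the padded length of the
largest record, and the mapper would still be responsible for stripping the
padding):

    // Sketch only: force every input split to cover exactly one padded record.
    // RECORD_SIZE is an assumed value, not something from the real data set.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class OneRecordPerMap {
        public static void main(String[] args) throws Exception {
            final long RECORD_SIZE = 64L * 1024 * 1024; // e.g. 64 MB per padded record
            Job job = new Job(new Configuration(), "one record per map");
            // Pin both the minimum and maximum split size to the record size,
            // so each split covers exactly one padded record.
            FileInputFormat.setMinInputSplitSize(job, RECORD_SIZE);
            FileInputFormat.setMaxInputSplitSize(job, RECORD_SIZE);
            // ... set mapper class, input/output paths, etc., then submit.
        }
    }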

Regards,
Upendra


----- Original Message ----- From: "Jason Venner" <jason.had...@gmail.com>
To: <common-dev@hadoop.apache.org>
Sent: Friday, November 27, 2009 12:06 AM
Subject: Re: how to set one map task for each input key-value pair


The only thing that comes immediately to mind is to write your own custom
input format that knows where the record boundaries are in your data set,
and uses them to specify the beginning and end of the input splits.
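
A minimal sketch of that idea, assuming the new mapreduce API (the class name
BoundaryAwareInputFormat and the helper findRecordBoundaries() are made-up
names here, and extending TextInputFormat is only a convenient way to inherit
a record reader; in practice you would pair this with a reader for your own
format):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class BoundaryAwareInputFormat extends TextInputFormat {

        @Override
        public List<InputSplit> getSplits(JobContext job) throws IOException {
            List<InputSplit> splits = new ArrayList<InputSplit>();
            for (FileStatus status : listStatus(job)) {
                Path path = status.getPath();
                // One split per record: findRecordBoundaries() stands in for
                // whatever index or header scan locates record boundaries.
                for (long[] r : findRecordBoundaries(path, job.getConfiguration())) {
                    splits.add(new FileSplit(path, r[0], r[1] - r[0], null));
                }
            }
            return splits;
        }

        private List<long[]> findRecordBoundaries(Path path, Configuration conf)
                throws IOException {
            // Placeholder: return one {start, end} offset pair per record.
            throw new UnsupportedOperationException("format-specific");
        }
    }

Each split then feeds exactly one record to one mapper.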

You can also tell the framework not to split your individual input files by
setting the minimum input split size (mapred.min.split.size) to
Long.MAX_VALUE.
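
With the old mapred API (which is where the mapred.min.split.size property
name comes from), that is a one-line configuration change; the wrapper class
below is just scaffolding for the snippet:

    import org.apache.hadoop.mapred.JobConf;

    public class NoSplitConf {
        public static JobConf configure() {
            JobConf conf = new JobConf();
            // A minimum split size larger than any input file means the
            // framework never splits a file: one file, one map task.
            conf.setLong("mapred.min.split.size", Long.MAX_VALUE);
            return conf;
        }
    }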

On Thu, Nov 26, 2009 at 4:53 PM, Upendra Dadi <ud...@gmu.edu> wrote:

Hi,
 I am trying to use MapReduce with some scientific data. I have key-value
pairs such that the size of the value can range from a few megabytes to
several hundred megabytes. What happens when the size of the value exceeds
the block size? How do I set it up so that each key-value pair is
associated with a separate map? Could someone please help? Thanks.

Regards,
Upendra




--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

