The only thing that comes immediately to mind is to write your own custom
input format that knows where the record boundaries are in your data set,
and uses them to define the beginning and end of each input split.
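If the goal is simply to hand each whole file to a single mapper, a minimal
sketch against the old org.apache.hadoop.mapred API is to subclass an
existing input format and override isSplitable (the class name
WholeFileTextInputFormat is just an illustrative choice, not a Hadoop class):

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Illustrative subclass: returning false tells the framework never to
// split a file, so each input file becomes exactly one input split and
// is processed by exactly one map task.
public class WholeFileTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }
}
```

A full "one key-value pair per map" setup would also need a RecordReader
that reads the whole file as one record, but disabling splitting is the
first step either way.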

You can also tell the framework not to split your individual input files by
setting the minimum input split size (mapred.min.split.size) to
Long.MAX_VALUE.
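Assuming a standard JobConf-based job setup (the surrounding driver class
here is illustrative), that setting looks like:

```java
import org.apache.hadoop.mapred.JobConf;

public class SubmitJob {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // With the minimum split size at Long.MAX_VALUE, no input file
    // will be divided: each file yields a single split.
    conf.setLong("mapred.min.split.size", Long.MAX_VALUE);
    // ... set mapper, reducer, input/output paths, then submit via
    // JobClient.runJob(conf)
  }
}
```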

On Thu, Nov 26, 2009 at 4:53 PM, Upendra Dadi <ud...@gmu.edu> wrote:

> Hi,
>  I am trying to use MapReduce with some scientific data. I have key-value
> pairs such that the size of the value can range from few megabytes to
> several hundreds of megabytes. What happens when the size of the value
> exceeds block size? How do I set it up so that each key-value pair is
> associated with a seperate map? Please some one help. Thanks.
>
> Regards,
> Upendra
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
