Thanks Zhengguo for your answer.

I have read the source of LineRecordReader. It seems that the start
and end points are determined only roughly by the FileSplit. I traced the
code back to FileSplit and found that the splits are created by
FileInputFormat's getSplits() function. So the FileSplit boundaries are
"rough", and record integrity is ensured by LineRecordReader.
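
To check my understanding, here is a small Python model of the idea. It
is only a sketch, not the Hadoop code itself: the names make_splits and
read_records are made up for illustration, and the step of looking one
byte before the split start is my assumption about how a line that begins
exactly on a split boundary is kept.

```python
def make_splits(file_len, split_size):
    """Cut [0, file_len) into (start, length) byte ranges.

    Line boundaries are ignored on purpose: the splits are "rough".
    """
    return [(start, min(split_size, file_len - start))
            for start in range(0, file_len, split_size)]


def read_records(data, start, length):
    """Yield the lines owned by the split [start, start + length).

    A line belongs to the split in which its first byte falls.
    """
    end = start + length
    pos = start
    if start != 0:
        # Not the first split: discard everything up to and including the
        # first newline. The search begins one byte early (assumption), so
        # a line that starts exactly at `start` is not thrown away.
        nl = data.find(b"\n", start - 1)
        if nl == -1:
            return  # no line starts in this split
        pos = nl + 1
    while pos < end:
        nl = data.find(b"\n", pos)
        if nl == -1:
            yield data[pos:]  # last line, no trailing newline
            return
        # The line may run past `end`; the reader still finishes it.
        yield data[pos:nl]
        pos = nl + 1


if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"
    for start, length in make_splits(len(data), 7):
        print(start, length, list(read_records(data, start, length)))
```

Each reader needs only its own (start, length) and the file bytes, so no
mapper has to wait for another one, and every line is read exactly once.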


On Thu, Jun 11, 2009 at 11:12 PM, Zhengguo 'Mike'
SUN<[email protected]> wrote:
> Mapper2 doesn't wait for Mapper1. They start at the same time. It knows the
> "real" record by looking at the characters it reads. If it sees a newline,
> then that is the start of a "real" record. It discards all the stuff before
> that newline. Check the source code of LineRecordReader. You will get more
> detailed information from that.
>
> ________________________________
> From: Zhong Wang <[email protected]>
> To: [email protected]
> Sent: Thursday, June 11, 2009 10:47:48 AM
> Subject: Re: Large size Text file split
>
>> Mapper 2 starts reading at byte 10000. It finds the first newline at byte
>> 10020, so the first "real" record it processes starts at byte 10021.
>>
>
> There's one problem: how does Mapper2 know the "real" record starts at
> 10021 before Mapper1 reaches the end of Split1 (9999)? Mappers start at
> the same time.
>
>
> --
> Zhong Wang
>
>
>
>



-- 
Zhong Wang
