Yeah, I thought of that, but I was concerned that even if it worked,
it wasn't guaranteed behavior and might stop working in a future
release.  I'll go ahead and give that a try.
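
For anyone following along, here is a rough sketch of what I understand
the suggestion to be: save the OutputCollector passed to map(), flush
whenever the batch fills up, and flush whatever is left over in close().
To keep it self-contained, it uses a stand-in OutputCollector interface
rather than the real org.apache.hadoop.mapred one, and it leaves out the
Reporter argument that the real map() also receives. The class name
BatchingMapper, the BATCH_SIZE value, and the uppercase "processing"
step are all made up for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.apache.hadoop.mapred.OutputCollector, declared here
// only so the sketch compiles without Hadoop on the classpath.
interface OutputCollector<K, V> {
    void collect(K key, V value) throws IOException;
}

// Hypothetical mapper that buffers input records and releases them in blocks.
class BatchingMapper {
    private static final int BATCH_SIZE = 3; // hypothetical block size

    private final List<String[]> batch = new ArrayList<>();
    // Reference saved from map() so that close() can still emit records.
    private OutputCollector<String, String> savedOutput;

    // The real old-API map() also takes a Reporter; omitted in this sketch.
    public void map(String key, String value,
                    OutputCollector<String, String> output) throws IOException {
        savedOutput = output;
        batch.add(new String[] {key, value});
        if (batch.size() >= BATCH_SIZE) {
            flush();
        }
    }

    // Invoked once after the last record: release the final partial block.
    public void close() throws IOException {
        if (savedOutput != null) { // null if map() was never called
            flush();
        }
    }

    // Process the buffered block and emit one output record per input record.
    private void flush() throws IOException {
        for (String[] record : batch) {
            // Placeholder for the real block-level processing.
            savedOutput.collect(record[0], record[1].toUpperCase());
        }
        batch.clear();
    }
}
```

The null check in close() is there because a mapper that receives no
input records never sees a collector at all. Whether the collector is
guaranteed to remain usable inside close() is exactly the part I'm
unsure about, so treat this as something to test rather than rely on.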

Can anybody provide details on this new API?

Thanks for the response!

On Tue, Mar 17, 2009 at 7:29 AM, Jingkei Ly <jingkei...@detica.com> wrote:
> You should be able to keep a reference to the OutputCollector provided
> to the #map() method, and then use it in the #close() method.
>
> I believe that there's a new API that will actually provide the output
> collector to the close() method via a context object, but in the mean
> time I think the above should work.
>
> -----Original Message-----
> From: Stuart White [mailto:stuart.whi...@gmail.com]
> Sent: 17 March 2009 12:13
> To: core-user@hadoop.apache.org
> Subject: Release batched-up output records at end-of-job?
>
> I have a mapred job that simply performs data transformations in its
> Mapper.  I don't need sorting or reduction, so I don't use a Reducer.
>
> Without getting too detailed, the nature of my processing is such that
> it is much more efficient if I can process blocks of records
> at a time.  So, what I'd like to do is, in my Mapper, in the map()
> function, simply add the incoming record to a list, and once that list
> reaches a certain size, process the batched-up records, and then call
> output.collect() multiple times to release the output records, each
> corresponding to one of the input records.
>
> At the end of the job, my Mappers will have partially full blocks of
> records.  I'd like to go ahead and process these blocks at end-of-job,
> regardless of their sizes, and release the corresponding output
> records.
>
> How can I accomplish this?  In my Mapper#map(), I have no way of
> knowing whether a record is the final record.  The only end-of-job
> hook that I'm aware of is for my Mapper to override
> MapReduceBase#close(), but when in that method, there is no
> OutputCollector available.
>
> Is it possible to batch up records, and at end-of-job, process and
> release any final partial blocks?
>
> Thanks!
