You should be able to keep a reference to the OutputCollector provided to the #map() method, and then use it in the #close() method.
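As a rough illustration of that suggestion, here is a minimal sketch of the pattern. It is not tested against Hadoop itself: `Collector` is a stand-in for Hadoop's `OutputCollector` (whose real `collect()` also declares `IOException`), and `BatchingMapper`, `BATCH_SIZE`, and the upper-casing "processing" step are hypothetical names chosen so the example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Hadoop's OutputCollector so the sketch compiles on its own;
// the real interface likewise exposes collect(key, value).
interface Collector<K, V> {
    void collect(K key, V value);
}

// Hypothetical mapper that buffers records and releases them in blocks.
class BatchingMapper {
    private static final int BATCH_SIZE = 3; // illustrative block size

    private final List<String> batch = new ArrayList<>();
    private Collector<String, String> savedOutput; // captured in map()

    // Mirrors the shape of map(): remember the collector, buffer the record,
    // and flush once a full block has accumulated.
    public void map(String key, String value, Collector<String, String> output) {
        savedOutput = output;
        batch.add(value);
        if (batch.size() >= BATCH_SIZE) {
            flush();
        }
    }

    // Mirrors close(): release whatever partial block is left over.
    public void close() {
        if (savedOutput != null && !batch.isEmpty()) {
            flush();
        }
    }

    // Placeholder block processing: emits each record with an upper-cased value.
    private void flush() {
        for (String record : batch) {
            savedOutput.collect(record, record.toUpperCase());
        }
        batch.clear();
    }
}
```

In a real job the same fields and the `flush()` helper would live in the class that extends MapReduceBase, with map() and close() keeping their Hadoop signatures. Whether the framework keeps the collector usable during close() is the assumption the suggestion rests on; the sketch only shows the bookkeeping.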
I believe that there's a new API that will actually provide the output collector to the close() method via a context object, but in the meantime I think the above should work.

-----Original Message-----
From: Stuart White [mailto:stuart.whi...@gmail.com]
Sent: 17 March 2009 12:13
To: core-user@hadoop.apache.org
Subject: Release batched-up output records at end-of-job?

I have a mapred job that simply performs data transformations in its Mapper. I don't need sorting or reduction, so I don't use a Reducer.

Without getting too detailed, the nature of my processing is such that it is much more efficient if I can process blocks of records at a time. So, what I'd like to do is, in my Mapper, in the map() function, simply add the incoming record to a list, and once that list reaches a certain size, process the batched-up records, and then call output.collect() multiple times to release the output records, each corresponding to one of the input records.

At the end of the job, my Mappers will have partially full blocks of records. I'd like to go ahead and process these blocks at end-of-job, regardless of their sizes, and release the corresponding output records.

How can I accomplish this? In my Mapper#map(), I have no way of knowing whether a record is the final record. The only end-of-job hook that I'm aware of is for my Mapper to override MapReduceBase#close(), but when in that method, there is no OutputCollector available.

Is it possible to batch up records, and at end-of-job, process and release any final partial blocks?

Thanks!