Yeah, I thought of that, but I was concerned that, even if it worked, it isn't guaranteed behavior and might stop working in a future release. I'll go ahead and give that a try.
Can anybody provide details on this new API?

Thanks for the response!

On Tue, Mar 17, 2009 at 7:29 AM, Jingkei Ly <jingkei...@detica.com> wrote:
> You should be able to keep a reference to the OutputCollector provided
> to the #map() method, and then use it in the #close() method.
>
> I believe that there's a new API that will actually provide the output
> collector to the close() method via a context object, but in the mean
> time I think the above should work.
>
> -----Original Message-----
> From: Stuart White [mailto:stuart.whi...@gmail.com]
> Sent: 17 March 2009 12:13
> To: core-user@hadoop.apache.org
> Subject: Release batched-up output records at end-of-job?
>
> I have a mapred job that simply performs data transformations in its
> Mapper. I don't need sorting or reduction, so I don't use a Reducer.
>
> Without getting too detailed, the nature of my processing is such that
> it is much more efficient if I can process blocks of records
> at-a-time. So, what I'd like to do is, in my Mapper, in the map()
> function, simply add the incoming record to a list, and once that list
> reaches a certain size, process the batched-up records, and then call
> output.collect() multiple times to release the output records, each
> corresponding to one of the input records.
>
> At the end of the job, my Mappers will have partially full blocks of
> records. I'd like to go ahead and process these blocks at end-of-job,
> regardless of their sizes, and release the corresponding output
> records.
>
> How can I accomplish this? In my Mapper#map(), I have no way of
> knowing whether a record is the final record. The only end-of-job
> hook that I'm aware of is for my Mapper to override
> MapReduceBase#close(), but when in that method, there is no
> OutputCollector available.
>
> Is it possible to batch-up records, and at end-of-job, process and
> release any final partial blocks?
>
> Thanks!
>
> This message should be regarded as confidential.
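For anyone finding this thread later, the workaround suggested above (save the collector passed to map(), then flush the leftover block in close()) might look roughly like the sketch below. It is only an illustration under stated assumptions: `Collector` is a hypothetical stand-in for Hadoop's OutputCollector so the code compiles without Hadoop on the classpath, `BatchingMapper` and its simplified map()/close() signatures are made up for the example, and the upper-casing step is a placeholder for the real block processing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's OutputCollector (assumption, not the real interface).
interface Collector<K, V> {
    void collect(K key, V value) throws IOException;
}

// Buffers input records and processes them one block at a time.
class BatchingMapper {
    private final int batchSize;
    private final List<String> batch = new ArrayList<>();
    // Saved on every map() call so that close() can still emit records.
    private Collector<String, String> output;

    BatchingMapper(int batchSize) {
        this.batchSize = batchSize;
    }

    // Plays the role of Mapper#map(): remember the collector, buffer, flush when full.
    void map(String record, Collector<String, String> output) throws IOException {
        this.output = output;
        batch.add(record);
        if (batch.size() >= batchSize) {
            flush();
        }
    }

    // Plays the role of MapReduceBase#close(): release the final partial block.
    void close() throws IOException {
        // output is null if map() was never called (empty input split).
        if (output != null && !batch.isEmpty()) {
            flush();
        }
    }

    // Placeholder block processing: emit one output record per buffered input record.
    private void flush() throws IOException {
        for (String record : batch) {
            output.collect(record, record.toUpperCase());
        }
        batch.clear();
    }
}

class BatchingMapperDemo {
    public static void main(String[] args) throws IOException {
        List<String> emitted = new ArrayList<>();
        Collector<String, String> out = (key, value) -> emitted.add(value);
        BatchingMapper mapper = new BatchingMapper(2);
        for (String record : new String[] {"a", "b", "c"}) {
            mapper.map(record, out);
        }
        mapper.close(); // flushes the leftover "c"
        System.out.println(emitted); // prints [A, B, C]
    }
}
```

Whether the old API guarantees that the saved collector is still usable inside close() is exactly the open question raised in this thread, so treat the pattern as a workaround rather than documented behavior. As far as I know, the newer org.apache.hadoop.mapreduce API addresses this by passing a Context to Mapper#cleanup(), which would make the saved reference unnecessary.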