RE: Multiple outputs and getmerge?

Koji Noguchi Tue, 21 Apr 2009 13:55:56 -0700

Something along the lines of

... class MyOutputFormat extends MultipleTextOutputFormat<Text, Text> {
    protected String generateFileNameForKeyValue(Text key,
                                                 Text v, String name) {
        Path outpath = new Path(key.toString(), name);
        return outpath.toString();
    }
}


would create a directory per key.
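
To make the resulting layout concrete, here is a minimal stand-alone sketch of the same naming rule. It uses no Hadoop classes; KeyDirNaming, fileNameFor and the leaf name "part-00000" are made up for illustration, and it assumes the key and leaf name are joined with "/" the way Path(parent, child) joins them.

```java
// Stand-alone illustration of the naming rule; no Hadoop classes are used.
class KeyDirNaming {
    // Mirrors the override above: the key becomes a directory and the
    // default leaf name becomes the file inside it.
    // Assumption: parent and child are joined with "/".
    static String fileNameFor(String key, String name) {
        return key + "/" + name;
    }

    public static void main(String[] args) {
        String leaf = "part-00000"; // hypothetical default leaf name
        // Records with the same key map to the same relative file name.
        for (String key : new String[] {"type1", "type2", "type1"}) {
            System.out.println(key + " -> " + fileNameFor(key, leaf));
        }
    }
}
```

Each distinct key therefore ends up in its own sub-directory under the job's output directory.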

If you just want to keep your side-effect files separate, then
get your working dir from
FileOutputFormat.getWorkOutputPath(...)
or $mapred_work_output_dir,

then dfs -mkdir <workdir>/NewDir and put the secondary files there.

This is explained in

http://hadoop.apache.org/core/docs/r0.18.3/api/org/apache/hadoop/mapred/FileOutputFormat.html#getWorkOutputPath(org.apache.hadoop.mapred.JobConf)
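
A rough local-filesystem analogy of the side-effect file layout, in case it helps. This is only a sketch: it uses java.nio instead of the Hadoop FileSystem API, SideFileLayout and its methods are hypothetical names, the directory names in main are made up, and the promote step merely stands in for what the framework does with the work directory when a task succeeds.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Local-filesystem analogy only; a real job would go through Hadoop's
// FileSystem API and the path returned by getWorkOutputPath(...).
class SideFileLayout {
    // Create <workDir>/<subDir> (like dfs -mkdir) and write one side file there.
    static Path writeSideFile(Path workDir, String subDir, String fileName, String body)
            throws IOException {
        Path dir = Files.createDirectories(workDir.resolve(subDir));
        return Files.write(dir.resolve(fileName), body.getBytes(StandardCharsets.UTF_8));
    }

    // Stand-in for the promotion on task success: move the sub-directory
    // from the work dir to the final output dir.
    static Path promote(Path workDir, Path outputDir, String subDir) throws IOException {
        Files.createDirectories(outputDir);
        return Files.move(workDir.resolve(subDir), outputDir.resolve(subDir),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("job");
        Path work = root.resolve("work");   // plays the role of the work output dir
        Path out = root.resolve("output");  // plays the role of the job output dir
        writeSideFile(work, "NewDir", "side.txt", "secondary data\n");
        promote(work, out, "NewDir");
        System.out.println(Files.exists(out.resolve("NewDir").resolve("side.txt")));
    }
}
```

The point is only that the secondary files sit in their own sub-directory of the work dir, so they stay apart from the regular part files and travel with the task's output.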


Koji


-----Original Message-----
From: Stuart White [mailto:[email protected]] 
Sent: Tuesday, April 21, 2009 11:46 AM
To: [email protected]
Subject: Re: Multiple outputs and getmerge?

On Tue, Apr 21, 2009 at 1:00 PM, Koji Noguchi <[email protected]> wrote:
>
> I once used MultipleOutputFormat and created
>   (mapred.work.output.dir)/type1/part-_____
>   (mapred.work.output.dir)/type2/part-_____
>    ...
>
> And JobTracker took care of the renaming to
>   (mapred.output.dir)/type{1,2}/part-______
>
> Would that work for you?

Can you please explain this in more detail?  It looks like you're
using MultipleOutputFormat for *both* of your outputs?  So, you simply
don't use the OutputCollector passed as a parm to Mapper#map()?
