You can use Pig to do what "hadoop fs -getmerge" does, in a separate Pig script. It will still be one reducer, though.
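A minimal sketch of that idea: run the main job with full parallelism, then run a second, trivial Pig script whose only work is concatenating the part files. The directory name 'xxxx' and the PigStorage delimiter are taken from the original post; 'xxxx_merged' is an illustrative output path.

```pig
-- Second Pig script: merge the part-* files from the first job's
-- output directory into a single file. LOAD on a directory picks up
-- all part files; GROUP ALL funnels everything through one reducer,
-- but only this lightweight copy step pays that cost.
data = LOAD 'xxxx' USING PigStorage('\t');
grouped = GROUP data ALL;
merged = FOREACH grouped GENERATE FLATTEN($1);
STORE merged INTO 'xxxx_merged' USING PigStorage('\t');
```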
On Tue, May 28, 2013 at 8:29 AM, Alan Gates <[email protected]> wrote:
> Nothing that uses MapReduce as an underlying execution engine creates a
> single file when running multiple reducers, because MapReduce doesn't. The
> real question is: if you want to keep the file on Hadoop, why worry about
> whether it's a single file? Most applications on Hadoop will take a
> directory as input and read all the files contained in it.
>
> Alan.
>
> On May 24, 2013, at 12:11 PM, Mix Nin wrote:
>
> > The STORE command produces multiple output files. I want a single output
> > file, and I tried the command below:
> >
> > STORE (foreach (group NoNullData all) generate flatten($1)) into 'xxxx';
> >
> > This produces a single file, but at the same time it forces a single
> > reducer, which kills performance.
> >
> > How do I overcome this?
> >
> > Besides the data files STORE normally produces, I also see a "_SUCCESS"
> > file in the output directory, and I am generating a metadata file there
> > as well (using PigStorage('\t', '-schema')).
> >
> > I thought of using getmerge as follows:
> >
> > hadoop fs -getmerge <dir_of_input_files> <local file>
> >
> > But this requires:
> > 1) eliminating the files other than data files in the HDFS directory;
> > 2) it creates a single file in the local directory, not in HDFS;
> > 3) I need to move the file from the local directory back to HDFS, which
> >    may take additional time, depending on the size of the single file;
> > 4) I need to again place the files that I eliminated in step 1.
> >
> > Is there a more efficient way to solve my problem?
> >
> > Thanks

--
"...:::Aniket:::... Quetzalco@tl"
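The four getmerge steps above can also be collapsed into a single pipe that never touches the local disk, sidestepping points 2 and 3 and leaving the non-data files untouched (points 1 and 4). This is an untested sketch: it assumes a Hadoop 2+ client, where "hadoop fs -put -" reads from stdin, and the paths are illustrative.

```shell
# Merge only the part-* data files (skipping _SUCCESS and the
# .pig_schema metadata file) and stream the result straight back
# into HDFS, with no local copy in between.
hadoop fs -cat 'xxxx/part-*' | hadoop fs -put - /user/me/xxxx_merged
```

Note this still serializes the data through the one machine running the pipe, so for very large outputs the single-reducer Pig merge job may be preferable.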
