My (0.18.2) reduce src looks like this:
write(key);
clientOut_.write('\t');
write(val);
clientOut_.write('\n');
which explains why the trailing tab is unavoidable.
Thanks for your help, though, Jason!
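
P.S. For anyone finding this thread later, here is a rough standalone sketch
(plain Java, no Hadoop classes) of the separator rule as I understand it from
Jason's description of TextOutputFormat: the separator is skipped only when
the value is null. The names TrailingTabSketch and formatLine are made up for
illustration, and it is my assumption that streaming hands over an empty but
non-null value when the script's output line contains no tab.

```java
public class TrailingTabSketch {

    // Mimics a key/separator/value line writer: the separator is written
    // only when both the key and the value are non-null.
    // Hypothetical helper, not part of Hadoop.
    static String formatLine(String key, String value, String separator) {
        if (key == null && value == null) {
            return ""; // nothing is written for an entirely empty record
        }
        StringBuilder line = new StringBuilder();
        if (key != null) {
            line.append(key);
        }
        if (key != null && value != null) {
            line.append(separator);
        }
        if (value != null) {
            line.append(value);
        }
        line.append('\n');
        return line.toString();
    }

    public static void main(String[] args) {
        // Assumed streaming case: the value is "" (empty, not null),
        // so the separator is still written after the key.
        System.out.print(formatLine("42", "", "\t").replace("\t", "\\t"));
        // Java reducer case: output.collect(key, null) skips the separator.
        System.out.print(formatLine("42", null, "\t").replace("\t", "\\t"));
    }
}
```

If that assumption holds, the first call is what a streaming reducer ends up
with, and the second is what a Java reducer gets from output.collect(key, null).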
2009/2/4 jason hadoop <[email protected]>
> For your reduce, the parameter is stream.reduce.input.field.separator, if
> you are supplying a reduce class and I believe the output format is
> TextOutputFormat...
>
> It looks like you have tried the map parameter for the separator, not the
> reduce parameter.
>
> From 0.19.0 PipeReducer:
> configure:
> reduceOutFieldSeparator =
> job_.get("stream.reduce.output.field.separator", "\t").getBytes("UTF-8");
> reduceInputFieldSeparator =
> job_.get("stream.reduce.input.field.separator", "\t").getBytes("UTF-8");
> this.numOfReduceOutputKeyFields =
> job_.getInt("stream.num.reduce.output.key.fields", 1);
>
> getInputSeparator:
> byte[] getInputSeparator() {
> return reduceInputFieldSeparator;
> }
>
> reduce:
> write(key);
> * clientOut_.write(getInputSeparator());*
> write(val);
> clientOut_.write('\n');
> } else {
> // "identity reduce"
> * output.collect(key, val);*
> }
>
>
> On Wed, Feb 4, 2009 at 6:15 AM, Rasit OZDAS <[email protected]> wrote:
>
> > I tried it myself, and it doesn't work.
> > I've also tried the stream.map.output.field.separator and
> > map.output.key.field.separator parameters for this purpose; they
> > don't work either. When Hadoop sees an empty string, it uses the default
> > tab character instead.
> >
> > Rasit
> >
> > 2009/2/4 jason hadoop <[email protected]>
> > >
> > > Oops, you are using streaming, and I am not familiar with it.
> > > As a terrible hack, you could set mapred.textoutputformat.separator to
> > > the empty string in your configuration.
> > >
> > > On Tue, Feb 3, 2009 at 9:26 PM, jason hadoop <[email protected]> wrote:
> > >
> > > > If you are using the standard TextOutputFormat, and the output
> > > > collector is passed a null for the value, there will not be a
> > > > trailing tab character added to the output line.
> > > >
> > > > output.collect( key, null );
> > > > will give you the behavior you are looking for if your configuration
> > > > is as I expect.
> > > >
> > > >
> > > > On Tue, Feb 3, 2009 at 7:49 PM, Jack Stahl <[email protected]> wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> I'm interested in a map-reduce flow where I output only values (no
> > > >> keys) in my reduce step. For example, imagine the canonical
> > > >> word-counting program where I'd like my output to be an unlabeled
> > > >> histogram of counts instead of (word, count) pairs.
> > > >>
> > > >> I'm using HadoopStreaming (specifically, I'm using the dumbo module
> > > >> to run my Python scripts). When I simulate the map-reduce using
> > > >> pipes and sort in bash, it works fine. However, in Hadoop, if I
> > > >> output a value with no tabs, Hadoop appends a trailing "\t",
> > > >> apparently interpreting my output as a (value, "") KV pair. I'd
> > > >> like to avoid outputting this trailing tab if possible.
> > > >>
> > > >> Is there a command line option that could be used to effect this?
> > > >> More generally, is there something wrong with outputting arbitrary
> > > >> strings, instead of key-value pairs, in your reduce step?
> > > >>
> > > >
> > > >
> >
> >
> >
> > --
> > M. Raşit ÖZDAŞ
> >
>