Re: Value-Only Reduce Output

jason hadoop Tue, 03 Feb 2009 21:29:06 -0800

Oops, you are using streaming, and I am not familiar with it.
As a terrible hack, you could set mapred.textoutputformat.separator to the
empty string in your configuration.
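
To show why the hack should work, here is a rough Python model of the two
steps involved. It is only a sketch under assumptions, not the Hadoop source:
I am assuming streaming splits each reducer output line at the first tab (the
whole line becomes the key and the value is empty when there is no tab), and
that TextOutputFormat writes the separator only when both key and value are
non-null. The function names split_streaming_line and write_text_record are
made up for illustration.

```python
def split_streaming_line(line, sep="\t"):
    """Hypothetical model of how streaming turns a reducer line into (key, value).

    Assumption: text before the first tab is the key, the rest is the value;
    with no tab, the whole line is the key and the value is an empty string.
    """
    key, _, value = line.partition(sep)
    return key, value


def write_text_record(key, value, separator="\t"):
    """Hypothetical model of the line TextOutputFormat writes for one record.

    Assumption: the separator appears only when both key and value are
    non-null (None here); an empty string still counts as non-null.
    """
    if key is None and value is None:
        return ""
    if key is None:
        return f"{value}\n"
    if value is None:
        return f"{key}\n"
    return f"{key}{separator}{value}\n"


if __name__ == "__main__":
    key, value = split_streaming_line("42")
    # Default separator: the empty value still triggers a trailing tab.
    print(repr(write_text_record(key, value)))                # '42\t\n'
    # Empty separator (the hack): no trailing tab.
    print(repr(write_text_record(key, value, separator="")))  # '42\n'
    # Null value (possible from Java, not from streaming): no trailing tab.
    print(repr(write_text_record("42", None)))                # '42\n'
```

One caveat under the same assumptions: with an empty separator, a reducer line
that does contain a tab would be written with that first tab dropped, so
"a\tb" would come out as "ab".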


On Tue, Feb 3, 2009 at 9:26 PM, jason hadoop <[email protected]> wrote:

> If you are using the standard TextOutputFormat, and the output collector is
> passed a null for the value, there will not be a trailing tab character
> added to the output line.
>
> output.collect( key, null );
> will give you the behavior you are looking for, if your configuration is
> as I expect.
>
>
> On Tue, Feb 3, 2009 at 7:49 PM, Jack Stahl <[email protected]> wrote:
>
>> Hello,
>>
>> I'm interested in a map-reduce flow where I output only values (no keys)
>> in my reduce step.  For example, imagine the canonical word-counting
>> program where I'd like my output to be an unlabeled histogram of counts
>> instead of (word, count) pairs.
>>
>> I'm using HadoopStreaming (specifically, I'm using the dumbo module to run
>> my Python scripts).  When I simulate the map-reduce using pipes and sort
>> in bash, it works fine.  However, in Hadoop, if I output a value with no
>> tabs, Hadoop appends a trailing "\t", apparently interpreting my output as
>> a (value, "") KV pair.  I'd like to avoid outputting this trailing tab if
>> possible.
>>
>> Is there a command-line option that could be used to effect this?  More
>> generally, is there something wrong with outputting arbitrary strings,
>> instead of key-value pairs, in your reduce step?
>>
>
>
