Oops, you are using streaming, and I am not familiar with it. As a terrible hack, you could set mapred.textoutputformat.separator to the empty string in your configuration.
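
To show why that hack should help, here is a rough Python model of what I believe is happening. It is a sketch under assumptions, not Hadoop's actual code: the function names split_streaming_line and format_record are made up for illustration, and the real logic lives in the Java streaming and TextOutputFormat classes.

```python
def split_streaming_line(line, field_separator="\t"):
    """Model of streaming splitting a reducer output line into (key, value).

    Assumption: a line with no tab becomes (line, ""), i.e. the value is an
    empty string rather than null.
    """
    key, _, value = line.partition(field_separator)
    return key, value


def format_record(key, value, separator="\t"):
    """Simplified model of how a text output format builds one output line.

    The separator is only written when both key and value are non-null.
    """
    if key is None and value is None:
        return ""  # simplification for this sketch: nothing to write
    if key is None:
        return str(value)
    if value is None:
        return str(key)
    return str(key) + separator + str(value)


if __name__ == "__main__":
    key, value = split_streaming_line("42")
    print(repr(format_record(key, value)))                # default separator
    print(repr(format_record(key, value, separator="")))  # the hack
    print(repr(format_record("42", None)))                # null value, Java API
```

In this model the empty (but non-null) value is what drags the separator onto the line, so an empty separator hides it. The obvious cost is that any record that does have both a key and a value would be written with nothing between them.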

On Tue, Feb 3, 2009 at 9:26 PM, jason hadoop <[email protected]> wrote:

> If you are using the standard TextOutputFormat, and the output collector is
> passed a null for the value, there will not be a trailing tab character
> added to the output line.
>
> output.collect( key, null );
>
> Will give you the behavior you are looking for if your configuration is as
> I expect.
>
>
> On Tue, Feb 3, 2009 at 7:49 PM, Jack Stahl <[email protected]> wrote:
>
>> Hello,
>>
>> I'm interested in a map-reduce flow where I output only values (no keys)
>> in my reduce step. For example, imagine the canonical word-counting
>> program where I'd like my output to be an unlabeled histogram of counts
>> instead of (word, count) pairs.
>>
>> I'm using HadoopStreaming (specifically, I'm using the dumbo module to run
>> my python scripts). When I simulate the map reduce using pipes and sort
>> in bash, it works fine. However, in Hadoop, if I output a value with no
>> tabs, Hadoop appends a trailing "\t", apparently interpreting my output
>> as a (value, "") KV pair. I'd like to avoid outputting this trailing tab
>> if possible.
>>
>> Is there a command line option that could be used to effect this? More
>> generally, is there something wrong with outputting arbitrary strings,
>> instead of key-value pairs, in your reduce step?
>>
