Hi,
I developed a simple "custom streaming" tool that allows mappers-only
text processing, without the shuffling of results that key sorting
causes.
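
To illustrate the kind of job I mean, here is a minimal sketch of a
mappers-only line processor. It is not the actual Hadoop_Streams code:
the names process_line and run_mapper are hypothetical, and the
upper-casing step just stands in for real text processing. Each input
line yields one output line, in input order, with no key emitted, so
nothing needs to be sorted or shuffled.

```python
import io


def process_line(line):
    # Hypothetical per-record transformation (placeholder for real processing).
    return line.strip().upper()


def run_mapper(source, sink):
    # Write one output line per input line, preserving input order.
    # No key is emitted, so there is nothing to sort or shuffle.
    for line in source:
        sink.write(process_line(line) + "\n")


# Small in-memory demo; in a streaming job the mapper would presumably
# be called as run_mapper(sys.stdin, sys.stdout) instead.
demo_out = io.StringIO()
run_mapper(io.StringIO("first record\nsecond record\n"), demo_out)
print(demo_out.getvalue(), end="")
```
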
We successfully use it for semantic processing of very large data sets
at Pisa University and, AFAIK, it is also used at the Bruno Kessler
Foundation (http://www.fbk.eu) for similar purposes.
You can find sources and documentation here:
http://medialab.di.unipi.it/wiki/Hadoop_Streams
I'm posting it here for your consideration because it seems to be a
feature that Hadoop lacks, and it could perhaps be an improvement for a
future release.
Best regards,
--francesco