Hadoop Streaming is the simplest way to do this, if your program is set up to take its input on stdin, write its output to stdout, and each record (a "file" in your case) is a single line of text.
You need to be able to make it work with the following shell pipeline:

    hadoop fs -cat <input_file> | head -1 | ./myprocess > output.txt

And ideally, what is stored in output.txt are lines of text whose order can be rearranged without impacting the result. (This is not a requirement unless you want to use a reduce step too, but streaming will still try to parse the output that way.) If that is not the case, there are tricks you can play to make it work, but they are kind of ugly.

--Bobby Evans

On 8/22/11 2:57 PM, "Zhixuan Zhu" <z...@calpont.com> wrote:

Hi All,

I'm using hadoop-0.20.2 to try out some simple tasks. I asked a question about FileInputFormat a few days ago and got some prompt replies from this forum, which helped a lot. Thanks again!

Now I have another question. I'm trying to invoke a C++ process from my mapper for each HDFS file in the input directory, to achieve some parallel processing. But how do I pass the file to the program? I would want to do something like the following in my mapper:

    Process lChldProc = Runtime.getRuntime().exec("myprocess -file $filepath");

How do I pass the HDFS filesystem to an outside process like that? Is Hadoop Streaming the direction I should go?

Thanks very much for any reply in advance.

Best,
Grace
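For reference, a minimal sketch of the direct-invocation approach Grace describes, written against the 0.20.2 Java API. It assumes each input line carries one HDFS path, and that the external binary has been shipped to every worker node (e.g. via the DistributedCache); the names ExternalProcessMapper and ./myprocess are placeholders, not anything from the thread. The mapper opens the file through the HDFS FileSystem API and pipes its bytes into the child's stdin, which is the usual way to hand HDFS data to a program that only knows about local streams:

    // Hypothetical sketch: one HDFS path per input line, piped into an
    // external binary's stdin; the binary's stdout becomes the map output.
    import java.io.*;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ExternalProcessMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path(value.toString());  // one HDFS path per line

            // Launch the external binary. "./myprocess" is a placeholder;
            // it must already be present on the node running this task.
            Process proc = new ProcessBuilder("./myprocess").start();

            // Stream the HDFS file's contents into the child's stdin.
            InputStream in = fs.open(file);
            OutputStream childStdin = proc.getOutputStream();
            try {
                IOUtils.copyBytes(in, childStdin, conf, false);
            } finally {
                in.close();
                childStdin.close();  // close stdin so the child sees EOF
            }

            // Emit each line of the child's stdout as an output record.
            BufferedReader out = new BufferedReader(
                    new InputStreamReader(proc.getInputStream()));
            String line;
            while ((line = out.readLine()) != null) {
                context.write(new Text(line), NullWritable.get());
            }
            out.close();
            proc.waitFor();
        }
    }

One caveat with this sketch: if the child produces a lot of output before its stdin is fully written, the stdout pipe buffer can fill and deadlock both processes, so for large outputs the child's stdout should be drained on a separate thread. Hadoop Streaming handles exactly this plumbing for you, which is part of why it is the simpler route when the stdin/stdout contract fits.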