Greetings, I have written a simple yet handy framework for debugging Hadoop Pipes programs locally. It is called GaDooB, a combination of GDB and Hadoop :) .
It helps with debugging and unit testing C++ Hadoop map-reduce programs built with Hadoop Pipes. It is basically a sequencer that reads input text files, feeds them to a mapper, collects the output, and feeds it to a reducer. It also supports using a combiner, a partitioner, and multiple reducers.

All the code is in header files. There are no libraries to link with, and there is no change to the build process (besides perhaps an extra include path). I kept the dependencies to a bare minimum by using only basic STL collections (map/vector/string) and basic I/O, which Hadoop Pipes uses anyway.

For example, let's say this is the main function of a Pipes map-reduce program:

    int main(int argc, char* argv[]) {
      return HadoopPipes::runTask(
          HadoopPipes::TemplateFactory<MyMapper, MyReducer>());
    }

Then the locally-run version would look like:

    int main(int argc, char* argv[]) {
      if ((argc >= 2) && (strcmp(argv[1], "debugMeLocally") == 0)) {
        std::map<std::string, std::string> confMap;
        SimpleConfReader().readConf("./my_jobconf.xml", confMap);
        confMap["extraParam"] = "extraValue";
        std::string inputFile = "/tmp/mpj.txt";
        std::string outputFile = "/tmp/out1.txt";
        GaDooBSequencer::runTaskLocally<MyMapper, MyReducer>(confMap, inputFile, outputFile);
        return 0;
      }
      return HadoopPipes::runTask(
          HadoopPipes::TemplateFactory<MyMapper, MyReducer>());
    }

I would like to share it with the rest of the Hadoop community. I hope this list is the right place to ask where the best place would be to make it available to the rest of the world, or perhaps to make it part of Hadoop.

Regards,
Erez Katz