Hadoop Streaming map_input_file
-------------------------------

                 Key: HADOOP-7535
                 URL: https://issues.apache.org/jira/browse/HADOOP-7535
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 0.20.203.0
         Environment: Debian Squeeze
            Reporter: Jonathan Poon

I'm trying to use the map_input_file environment variable in a mapper I've written, to determine which file the stdin stream is coming from. I used the following command to print the environment variables and see what value map_input_file was given:

    hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-0.20.203.0.jar -input A -input S -input F -output B -mapper "bash -c \"export\""

I get the following output for map_input_file:

    declare -x map_input_file="hdfs://localhost:54310/user/poonj/A/A.txt"
    declare -x map_input_file="hdfs://localhost:54310/user/poonj/A/A.txt"
    declare -x map_input_file="hdfs://localhost:54310/user/poonj/F/F.txt"
    declare -x map_input_file="hdfs://localhost:54310/user/poonj/S/S.txt"

My assumption is that the variable is only used once and should hold the file currently being processed.
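
For context, here is a minimal sketch of how a mapper might read map_input_file and tag each input line with its source file. The function name tag_lines and the "unknown" fallback are hypothetical, chosen only for illustration, and the sketch assumes the variable is exported under that exact name, as in the output above. It does not explain or work around the behaviour reported in this issue.

```shell
#!/bin/sh
# Hypothetical streaming mapper helper: prefixes each stdin line with the
# name of the input file, taken from the map_input_file environment variable.

tag_lines() {
    # Assumption: fall back to a placeholder when the variable is unset,
    # for example when the script is tested outside Hadoop.
    src="${map_input_file:-unknown}"
    # Keep only the last path component, dropping the hdfs://host:port/dir/ prefix.
    name="${src##*/}"
    # Also emit a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\t%s\n' "$name" "$line"
    done
}

# Local check without Hadoop, using one of the paths from the report.
printf 'first line\nsecond line\n' |
    map_input_file="hdfs://localhost:54310/user/poonj/A/A.txt" tag_lines
```

To use this as an actual mapper, the local check at the end would be replaced by a bare call to tag_lines so that it reads the stdin supplied by the streaming job.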