Sir, actually I want to perform de-duplication on input splits. For this, I have to perform content-based chunking (using the TTTD algorithm) on each input split, drop the chunks that match a previously seen chunk, and send only the new chunks to map.
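To make the idea concrete, here is a minimal sketch in plain Java, with no Hadoop classes involved. It is not TTTD: as a stand-in it uses a simple running hash with one divisor plus minimum and maximum chunk sizes, whereas real TTTD uses a sliding-window fingerprint and a second backup divisor. The class name ChunkDedupSketch, the method names, and the constants MIN_CHUNK, MAX_CHUNK and DIVISOR are all hypothetical and chosen only for illustration. The sketch also assumes that "seen" chunks are tracked in an in-memory set of SHA-256 fingerprints.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: simplified content-defined chunking plus de-duplication.
// This is NOT the TTTD algorithm, only a stand-in with the same overall shape.
final class ChunkDedupSketch {
    // Illustrative values only; real chunkers use sizes in the KB range.
    static final int MIN_CHUNK = 4;
    static final int MAX_CHUNK = 64;
    static final int DIVISOR = 8;

    // Cuts the data where a running hash (reset after every cut) hits a fixed
    // remainder, but never before MIN_CHUNK bytes and never after MAX_CHUNK bytes.
    static List<byte[]> chunk(byte[] data) {
        List<byte[]> chunks = new ArrayList<>();
        int start = 0;
        int hash = 0;
        for (int i = 0; i < data.length; i++) {
            hash = hash * 31 + (data[i] & 0xff);
            int len = i - start + 1;
            boolean atBoundary =
                len >= MIN_CHUNK && Math.floorMod(hash, DIVISOR) == DIVISOR - 1;
            if (atBoundary || len >= MAX_CHUNK) {
                chunks.add(Arrays.copyOfRange(data, start, i + 1));
                start = i + 1;
                hash = 0;
            }
        }
        if (start < data.length) {
            // Trailing bytes that never reached a boundary form the last chunk.
            chunks.add(Arrays.copyOfRange(data, start, data.length));
        }
        return chunks;
    }

    // Hex-encoded SHA-256 digest, used as the identity of a chunk.
    static String fingerprint(byte[] chunk) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(chunk);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Returns only the chunks whose fingerprint is not already in 'seen',
    // and records the new fingerprints in 'seen' as a side effect.
    static List<byte[]> newChunks(byte[] split, Set<String> seen) {
        List<byte[]> fresh = new ArrayList<>();
        for (byte[] c : chunk(split)) {
            if (seen.add(fingerprint(c))) {
                fresh.add(c);
            }
        }
        return fresh;
    }
}
```

Calling newChunks twice with the same bytes and the same seen set should return an empty list the second time, because every fingerprint is already recorded. Only the chunks returned by newChunks would be passed on to map.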
Sir, please tell me in which class I should make the changes.

On Tue, Apr 7, 2015 at 10:43 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
> Hello Shahil,
>
> In the current trunk codebase, the relevant files are
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
> and
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/Mapper.java.
> MapTask manages the execution of the mapper task, and eventually it calls
> Mapper#run, which then calls into the implementation of the map method.
> BTW, you'll also see a corresponding ReduceTask.java and Reducer.java in
> the same directories if you need to look at those too.
>
> Input split calculation is performed by a subclass of InputFormat.
>
> http://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/mapreduce/InputFormat.html
>
> I recommend looking at that. You also can navigate down through those
> JavaDocs to identify subclasses of InputFormat, like FileInputFormat and
> TextInputFormat, which you can then find in the source code.
>
> I hope this helps.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
> On 4/7/15, 6:09 AM, "Shahil Varshney" <shahilvarsh...@gmail.com> wrote:
>
> >Sir,
> >i want to know that which class in hadoop (internal source class) is
> >responsible for calling map function for each key value pair (means calls
> >map() function).
> >
> >and which class actually done the input split job. i want to create my own
> >class for input split so please tell me.