Re: A bizarre problem in reduce method

Farhan Husain Thu, 02 Apr 2009 10:06:27 -0700

Thanks, Rasit, for your suggestion. I should have let the group know earlier
that I solved the problem, and it had nothing to do with the reduce method
itself. I was also using my reducer class as the combiner, which is not
appropriate in this case. I removed the combiner and everything works fine
now. I think the Map/Reduce tutorial on Hadoop's website should say more
about the combiner. In the word count example the reducer can double as a
combiner, but that is not true of every problem. The tutorial should
highlight this more.
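
For anyone who hits the same symptom, here is a minimal plain-Java sketch (no
Hadoop dependency) of why the reduce logic quoted below misbehaves when it is
also registered as the combiner. The class name CombinerDemo and the method
reduceWithCombiner are made up for illustration, and the assumption that the
combiner ran twice is only a guess that happens to match the three trailing
1's in the output; as far as I understand, Hadoop may apply a combiner zero,
one, or several times.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CombinerDemo {
    // Same logic as the reduce method in this thread: join the values
    // with tabs, then append the number of values seen.
    static String reduce(List<String> values) {
        StringBuilder sb = new StringBuilder();
        int count = 0;
        for (String v : values) {
            sb.append('\t').append(v);
            count++;
        }
        sb.append('\t').append(count);
        return sb.toString();
    }

    // Hypothetical simulation: run the same logic as a combiner
    // combinerPasses times, then once more as the real reducer.
    // Each pass collapses the values into one string, so every later
    // pass sees exactly one value and appends another "1".
    static String reduceWithCombiner(List<String> values, int combinerPasses) {
        List<String> current = values;
        for (int i = 0; i < combinerPasses; i++) {
            current = Collections.singletonList(reduce(current));
        }
        return reduce(current);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("lehigh:GraduateStudent");
        // Tabs are printed as "\t" so they are visible.
        // Prints: \tlehigh:GraduateStudent\t1
        System.out.println(reduce(values).replace("\t", "\\t"));
        // Prints: \t\t\tlehigh:GraduateStudent\t1\t1\t1
        System.out.println(reduceWithCombiner(values, 2).replace("\t", "\\t"));
    }
}
```

The same reasoning would explain the empty output file in the second version:
if the combiner already merged a key's two values into one string, the
reducer sees a single value, so iCount == 2 is never true there. A reducer is
only safe to reuse as a combiner when applying it repeatedly to partial
results gives the same final answer, as summing does in word count.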


On Thu, Apr 2, 2009 at 8:50 AM, Rasit OZDAS <[email protected]> wrote:

> Hi, Husain,
>
> 1. You can use a boolean control in your code.
>       boolean hasAlreadyOned = false;
>       int iCount = 0;
>       String sValue;
>       while (values.hasNext()) {
>           sValue = values.next().toString();
>           iCount++;
>           if (sValue.equals("1"))
>               hasAlreadyOned = true;
>
>           if (!hasAlreadyOned)
>               sValues += "\t" + sValue;
>       }
>       ...
>
> 2. You're actually checking for 3 elements, not 2. You should use
>       if (iCount == 1)
>
> 2009/4/1 Farhan Husain <[email protected]>
>
> > Hello All,
> >
> > I am facing some problems with a reduce method I have written which I
> > cannot understand. Here is the method:
> >
> >    @Override
> >    public void reduce(Text key, Iterator<Text> values,
> >            OutputCollector<Text, Text> output, Reporter reporter)
> >            throws IOException {
> >        String sValues = "";
> >        int iCount = 0;
> >        String sValue;
> >        while (values.hasNext()) {
> >            sValue = values.next().toString();
> >            iCount++;
> >            sValues += "\t" + sValue;
> >        }
> >        sValues += "\t" + iCount;
> >        //if (iCount == 2)
> >        output.collect(key, new Text(sValues));
> >    }
> >
> > The output of the code is like the following:
> >
> > D0U0:GraduateStudent0                lehigh:GraduateStudent    1    1    1
> > D0U0:GraduateStudent1                lehigh:GraduateStudent    1    1    1
> > D0U0:GraduateStudent10                lehigh:GraduateStudent    1    1    1
> > D0U0:GraduateStudent100                lehigh:GraduateStudent    1    1    1
> > D0U0:GraduateStudent101                lehigh:GraduateStudent    1
> > D0U0:GraduateCourse0    1    2    1
> > D0U0:GraduateStudent102                lehigh:GraduateStudent    1    1    1
> > D0U0:GraduateStudent103                lehigh:GraduateStudent    1    1    1
> > D0U0:GraduateStudent104                lehigh:GraduateStudent    1    1    1
> > D0U0:GraduateStudent105                lehigh:GraduateStudent    1    1    1
> >
> > The problem is that there cannot be so many 1's in the output value. The
> > output which I expect should be like this:
> >
> > D0U0:GraduateStudent0                lehigh:GraduateStudent    1
> > D0U0:GraduateStudent1                lehigh:GraduateStudent    1
> > D0U0:GraduateStudent10                lehigh:GraduateStudent    1
> > D0U0:GraduateStudent100                lehigh:GraduateStudent    1
> > D0U0:GraduateStudent101                lehigh:GraduateStudent
> > D0U0:GraduateCourse0    2
> > D0U0:GraduateStudent102                lehigh:GraduateStudent    1
> > D0U0:GraduateStudent103                lehigh:GraduateStudent    1
> > D0U0:GraduateStudent104                lehigh:GraduateStudent    1
> > D0U0:GraduateStudent105                lehigh:GraduateStudent    1
> >
> > If I do not append the iCount variable to sValues string, I get the
> > following output:
> >
> > D0U0:GraduateStudent0                lehigh:GraduateStudent
> > D0U0:GraduateStudent1                lehigh:GraduateStudent
> > D0U0:GraduateStudent10                lehigh:GraduateStudent
> > D0U0:GraduateStudent100                lehigh:GraduateStudent
> > D0U0:GraduateStudent101                lehigh:GraduateStudent
> > D0U0:GraduateCourse0
> > D0U0:GraduateStudent102                lehigh:GraduateStudent
> > D0U0:GraduateStudent103                lehigh:GraduateStudent
> > D0U0:GraduateStudent104                lehigh:GraduateStudent
> > D0U0:GraduateStudent105                lehigh:GraduateStudent
> >
> > This confirms that there are no 1's after each of those values (which I
> > already know from the input data). I do not know why the output is
> > distorted like that when I append iCount to sValues (as in the given
> > code). Can anyone help in this regard?
> >
> > Now comes the second problem which is equally perplexing. Actually, the
> > reduce method which I want to run is like the following:
> >
> >    @Override
> >    public void reduce(Text key, Iterator<Text> values,
> >            OutputCollector<Text, Text> output, Reporter reporter)
> >            throws IOException {
> >        String sValues = "";
> >        int iCount = 0;
> >        String sValue;
> >        while (values.hasNext()) {
> >            sValue = values.next().toString();
> >            iCount++;
> >            sValues += "\t" + sValue;
> >        }
> >        sValues += "\t" + iCount;
> >        if (iCount == 2)
> >            output.collect(key, new Text(sValues));
> >    }
> >
> > I want to output only if "values" contains exactly two elements. By
> > looking at the output above you can see that there is at least one
> > key/value pair where "values" has exactly two elements. But when I run
> > the code I get an empty output file. Can anyone solve this?
> >
> > I have tried many versions of the code (e.g. using StringBuffer instead
> > of String, using flags instead of an integer count) but nothing works.
> > Are these problems due to bugs in Hadoop? Please let me know any kind of
> > solution you can think of.
> >
> > Thanks,
> >
> > --
> > Mohammad Farhan Husain
> > Research Assistant
> > Department of Computer Science
> > Erik Jonsson School of Engineering and Computer Science
> > University of Texas at Dallas
> >
>
>
>
> --
> M. Raşit ÖZDAŞ
>



-- 
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
