Hello all,
I am seeing behaviour I cannot explain in a reduce method I have written.
Here is the method:
@Override
public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
    String sValues = "";   // tab-separated values seen for this key
    int iCount = 0;        // number of values seen for this key
    String sValue;
    while (values.hasNext()) {
        sValue = values.next().toString();
        iCount++;
        sValues += "\t" + sValue;
    }
    // Append the count after the last value.
    sValues += "\t" + iCount;
    // if (iCount == 2)   (filter disabled for this run)
    output.collect(key, new Text(sValues));
}
The output of this code looks like the following:
D0U0:GraduateStudent0 lehigh:GraduateStudent 1 1 1
D0U0:GraduateStudent1 lehigh:GraduateStudent 1 1 1
D0U0:GraduateStudent10 lehigh:GraduateStudent 1 1 1
D0U0:GraduateStudent100 lehigh:GraduateStudent 1 1 1
D0U0:GraduateStudent101 lehigh:GraduateStudent 1 D0U0:GraduateCourse0 1 2 1
D0U0:GraduateStudent102 lehigh:GraduateStudent 1 1 1
D0U0:GraduateStudent103 lehigh:GraduateStudent 1 1 1
D0U0:GraduateStudent104 lehigh:GraduateStudent 1 1 1
D0U0:GraduateStudent105 lehigh:GraduateStudent 1 1 1
The problem is that the output value should not contain so many 1's. Each
value should appear once, followed by a single count. The output I expect
is:
D0U0:GraduateStudent0 lehigh:GraduateStudent 1
D0U0:GraduateStudent1 lehigh:GraduateStudent 1
D0U0:GraduateStudent10 lehigh:GraduateStudent 1
D0U0:GraduateStudent100 lehigh:GraduateStudent 1
D0U0:GraduateStudent101 lehigh:GraduateStudent D0U0:GraduateCourse0 2
D0U0:GraduateStudent102 lehigh:GraduateStudent 1
D0U0:GraduateStudent103 lehigh:GraduateStudent 1
D0U0:GraduateStudent104 lehigh:GraduateStudent 1
D0U0:GraduateStudent105 lehigh:GraduateStudent 1
If I do not append iCount to the sValues string, I get the following
output:
D0U0:GraduateStudent0 lehigh:GraduateStudent
D0U0:GraduateStudent1 lehigh:GraduateStudent
D0U0:GraduateStudent10 lehigh:GraduateStudent
D0U0:GraduateStudent100 lehigh:GraduateStudent
D0U0:GraduateStudent101 lehigh:GraduateStudent D0U0:GraduateCourse0
D0U0:GraduateStudent102 lehigh:GraduateStudent
D0U0:GraduateStudent103 lehigh:GraduateStudent
D0U0:GraduateStudent104 lehigh:GraduateStudent
D0U0:GraduateStudent105 lehigh:GraduateStudent
This confirms that there are no 1's after any of those values (which I
already know from the input data). I do not know why the output is
distorted like that when I append iCount to sValues, as in the code given
above. Can anyone help in this regard?
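To make the expectation concrete, here is a minimal standalone sketch of
the same loop with the Hadoop types stripped out. The class name
ReduceSketch and the helper joinWithCount are made up for illustration and
are not part of my job. The sketch only shows what the loop itself produces
for a given list of values, not what Hadoop does around it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReduceSketch {

    // Hypothetical helper: the same loop as the reduce method, but over
    // plain Strings so it can run without a Hadoop cluster.
    static String joinWithCount(Iterator<String> values) {
        StringBuilder sValues = new StringBuilder();
        int iCount = 0;
        while (values.hasNext()) {
            sValues.append('\t').append(values.next());
            iCount++;
        }
        // Append the count once, after the last value.
        sValues.append('\t').append(iCount);
        return sValues.toString();
    }

    public static void main(String[] args) {
        // Sample values taken from the output listed in this mail.
        List<String> one = Arrays.asList("lehigh:GraduateStudent");
        List<String> two = Arrays.asList("lehigh:GraduateStudent",
                                         "D0U0:GraduateCourse0");
        System.out.println("D0U0:GraduateStudent0" + joinWithCount(one.iterator()));
        System.out.println("D0U0:GraduateStudent101" + joinWithCount(two.iterator()));
    }
}
```

Run on its own, the loop appends exactly one count per key, which is the
output I expected from the job.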
Now comes the second problem, which is equally perplexing. The reduce
method I actually want to run is the following:
@Override
public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
    String sValues = "";   // tab-separated values seen for this key
    int iCount = 0;        // number of values seen for this key
    String sValue;
    while (values.hasNext()) {
        sValue = values.next().toString();
        iCount++;
        sValues += "\t" + sValue;
    }
    sValues += "\t" + iCount;
    // Emit the record only when the key had exactly two values.
    if (iCount == 2) {
        output.collect(key, new Text(sValues));
    }
}
I want to emit a record only if "values" contains exactly two elements.
Looking at the output above, you can see that there is at least one such
key/value pair (D0U0:GraduateStudent101) whose values have exactly two
elements. But when I run the code I get an empty output file. Can anyone
explain this?
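For reference, this is the filtering behaviour I am after, written as a
standalone sketch without Hadoop. The class FilterSketch, the helper
keepPairs, and the in-memory map standing in for the grouped reducer input
are all assumptions made for illustration. The sample keys and values are
copied from the output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FilterSketch {

    // Hypothetical helper: builds "key<TAB>values...<TAB>count" for every
    // key and keeps only the keys that have exactly two values.
    static List<String> keepPairs(Map<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            StringBuilder sValues = new StringBuilder();
            int iCount = 0;
            for (String value : entry.getValue()) {
                sValues.append('\t').append(value);
                iCount++;
            }
            sValues.append('\t').append(iCount);
            if (iCount == 2) {
                out.add(entry.getKey() + sValues);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the values Hadoop groups under each key.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        grouped.put("D0U0:GraduateStudent100",
                Arrays.asList("lehigh:GraduateStudent"));
        grouped.put("D0U0:GraduateStudent101",
                Arrays.asList("lehigh:GraduateStudent", "D0U0:GraduateCourse0"));
        for (String line : keepPairs(grouped)) {
            System.out.println(line);
        }
    }
}
```

With this sample input only the D0U0:GraduateStudent101 record is kept,
which is what I expected the job to write instead of an empty file.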
I have tried many versions of the code (e.g. using StringBuffer instead of
String, or using flags instead of an integer count), but nothing works. Are
these problems due to bugs in Hadoop? Please let me know of any solution
you can think of.
Thanks,
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas