Julien Serdaru created HADOOP-9530:
--------------------------------------
Summary: DBInputSplit creates one invalid range on Oracle
Key: HADOOP-9530
URL: https://issues.apache.org/jira/browse/HADOOP-9530
Project: Hadoop Common
Issue Type: Bug
Affects Versions: 1.1.2
Reporter: Julien Serdaru
The DBInputFormat on Oracle does not create valid ranges.
Line 263 of the getSplits method reads:
split = new DBInputSplit(i * chunkSize, (i * chunkSize) + chunkSize);
So the first split will have a start value of 0 (0*chunkSize).
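To make the ranges concrete, here is a minimal sketch of that arithmetic. It is not the Hadoop source: the Range class is a hypothetical stand-in for DBInputSplit, and deriving chunkSize as rowCount / chunks is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SplitRangeSketch {
    // Hypothetical stand-in for DBInputSplit: holds only a start and an end row offset.
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long getStart() { return start; }

        long getLength() { return end - start; }
    }

    // Applies the quoted expression to every chunk index:
    // start = i * chunkSize, end = (i * chunkSize) + chunkSize.
    // Assumption: chunkSize is the row count divided evenly by the number of chunks.
    static List<Range> computeRanges(long rowCount, int chunks) {
        long chunkSize = rowCount / chunks;
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            ranges.add(new Range(i * chunkSize, (i * chunkSize) + chunkSize));
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (Range r : computeRanges(300, 3)) {
            System.out.println("start=" + r.getStart() + " length=" + r.getLength());
        }
    }
}
```

With 300 rows and 3 chunks, the starts are 0, 100 and 200, so the split at index 0 is always the one whose start is 0.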
However, line 84 of OracleDBRecordReader is as follows:
if (split.getLength() > 0 && split.getStart() > 0){
Since the start value of the first range is 0, the condition is false for that
split and we skip the block that partitions the input set. As a result, one of
the map tasks will process the entire data set rather than only its partition.
I'm assuming the fix is trivial and would involve removing the second check
(split.getStart() > 0) from the if condition.
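The difference between the two conditions can be sketched as follows. This is an illustration only, not a patch: the class and method names are hypothetical, and only the boolean expressions come from the report.

```java
class SplitGuardSketch {
    // Condition as quoted from OracleDBRecordReader, line 84.
    static boolean reportedGuard(long start, long length) {
        return length > 0 && start > 0;
    }

    // Proposed condition: the start check is removed, so a split
    // that begins at offset 0 still gets its range applied.
    static boolean proposedGuard(long start, long length) {
        return length > 0;
    }

    public static void main(String[] args) {
        long chunkSize = 100; // hypothetical chunk size
        for (int i = 0; i < 3; i++) {
            long start = i * chunkSize;
            System.out.println("split " + i
                    + " reported=" + reportedGuard(start, chunkSize)
                    + " proposed=" + proposedGuard(start, chunkSize));
        }
    }
}
```

For split 0 the reported condition is false and the proposed one is true. For every later split both are true, which is why only one map task is affected.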