Hi Christopher,
Indeed, I think that noBufferedKeys and valuesDecompressed should be
reset.
Regards
JB
On 06/24/2013 11:20 AM, Christopher Ng wrote:
Cross-posting this from the cdh-users group, where it received little interest:
Is there a bug in SequenceFile.sync()? This is from cdh4.3.0:
  /** Seek to the next sync mark past a given position. */
  public synchronized void sync(long position) throws IOException {
    if (position+SYNC_SIZE >= end) {
      seek(end);
      return;
    }
    if (position < headerEnd) {
      // seek directly to first record
      in.seek(headerEnd);   // <==== should this not call seek (ie this.seek) instead?
      // note the sync marker "seen" in the header
      syncSeen = true;
      return;
    }
The problem is that when you sync to the start of a compressed file,
noBufferedKeys and valuesDecompressed aren't reset, so a block read isn't
triggered. When you subsequently call next(), you may get keys from the
buffer, which still contains keys from the previous position in the file.
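
To make the failure mode concrete, here is a minimal toy model of it. This is not the Hadoop code: ToyBlockReader, rawSeek and firstKeyAfterRewind are hypothetical names made up for illustration, and a list of key lists stands in for the compressed blocks. Only noBufferedKeys is borrowed from the reader, and the sketch simply assumes a "proper" seek is the one that clears it.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy model of a block-buffered reader. NOT the Hadoop SequenceFile.Reader code. */
class ToyBlockReader {
    private final List<List<String>> blocks;   // each inner list plays the role of one compressed block
    private int blockIndex = 0;                // stands in for the position of the underlying stream
    private final Deque<String> keyBuffer = new ArrayDeque<>();
    private int noBufferedKeys = 0;            // keys still buffered from the last block read

    ToyBlockReader(List<List<String>> blocks) {
        this.blocks = blocks;
    }

    /** Moves only the underlying position, analogous to calling in.seek(...) directly. */
    void rawSeek(int block) {
        blockIndex = block;
    }

    /** Moves the position and also drops buffered state (assumed behaviour of a proper seek). */
    void seek(int block) {
        rawSeek(block);
        keyBuffer.clear();
        noBufferedKeys = 0;
    }

    /** Returns the next key, reading a new block only when no keys are buffered. */
    String next() {
        while (noBufferedKeys == 0) {
            if (blockIndex >= blocks.size()) {
                return null;                   // end of data
            }
            keyBuffer.addAll(blocks.get(blockIndex++));
            noBufferedKeys = keyBuffer.size();
        }
        noBufferedKeys--;
        return keyBuffer.poll();
    }

    /** Reads one key from the second block, goes back to the start, and returns the next key. */
    static String firstKeyAfterRewind(boolean resetBuffers) {
        ToyBlockReader r = new ToyBlockReader(Arrays.asList(
                Arrays.asList("a1", "a2"),
                Arrays.asList("b1", "b2")));
        r.seek(1);
        r.next();                              // consumes "b1"; "b2" stays buffered
        if (resetBuffers) {
            r.seek(0);                         // buffer cleared, next() re-reads block 0
        } else {
            r.rawSeek(0);                      // buffer untouched, next() serves the stale key
        }
        return r.next();
    }
}
```

In this model firstKeyAfterRewind(true) returns "a1", while firstKeyAfterRewind(false) returns the stale "b2" even though the position is back at the start, which is the symptom described above.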
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com