Cool, thanks. Is there an ETA on a fix, or a workaround for the case where I want to seek to the start of the file?

On Mon, Jun 24, 2013 at 4:39 PM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:
> Hi Chris,
>
> Thanks for the report. I filed
> https://issues.apache.org/jira/browse/HADOOP-9667 for this.
>
> Colin
> Software Engineer, Cloudera
>
>
> On Mon, Jun 24, 2013 at 2:20 AM, Christopher Ng <cng1...@gmail.com> wrote:
> > cross-posting this from cdh-users group where it received little interest:
> >
> > is there a bug in SequenceFile.sync()? This is from cdh4.3.0:
> >
> >     /** Seek to the next sync mark past a given position.*/
> >     public synchronized void sync(long position) throws IOException {
> >       if (position+SYNC_SIZE >= end) {
> >         seek(end);
> >         return;
> >       }
> >
> >       if (position < headerEnd) {
> >         // seek directly to first record
> >         in.seek(headerEnd);     <==== should this not call seek (ie this.seek) instead?
> >         // note the sync marker "seen" in the header
> >         syncSeen = true;
> >         return;
> >       }
> >
> > the problem is that when you sync to the start of a compressed file, the
> > noBufferedKeys and valuesDecompressed isn't reset so a block read isn't
> > triggered. When you subsequently call next() you're potentially getting
> > keys from the buffer which still contains keys from the previous position
> > of the file.
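
For anyone following along, the behaviour described in the quoted report can be sketched with a toy model. This is not the Hadoop code: ToyBlockReader, rawSeek() and the integer block index are made up for illustration, and only the field name noBufferedKeys is borrowed from the thread. It assumes, as the report suggests, that a seek which resets the buffered-key count forces a fresh block read, while a seek on the underlying stream alone does not.

```java
// Toy model only: NOT org.apache.hadoop.io.SequenceFile.Reader.
// All names except noBufferedKeys are hypothetical.
class ToyBlockReader {
    private final String[][] blocks;         // each block holds a batch of keys
    private int blockIndex = 0;              // stands in for the stream position
    private String[] buffer = new String[0]; // keys of the last block read
    private int noBufferedKeys = 0;          // keys still unread in buffer

    ToyBlockReader(String[][] blocks) {
        this.blocks = blocks;
    }

    // Analogue of this.seek(): moves the stream AND discards buffered keys.
    void seek(int block) {
        blockIndex = block;
        noBufferedKeys = 0;
    }

    // Analogue of calling in.seek() directly: moves the stream only.
    void rawSeek(int block) {
        blockIndex = block;
    }

    // Returns the next key, reading a new block only when the buffer is empty.
    String next() {
        while (noBufferedKeys == 0) {
            if (blockIndex >= blocks.length) {
                return null; // end of data
            }
            buffer = blocks[blockIndex++];
            noBufferedKeys = buffer.length;
        }
        return buffer[buffer.length - noBufferedKeys--];
    }
}
```

With two blocks {"a","b"} and {"c","d"}, reading three keys and then calling rawSeek(0) makes the following next() return the leftover key from the second block, because the buffer was never invalidated. Calling seek(0) instead makes next() start again from the first block.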