Magdoll wrote:
I was trying to map various locations in a file to a dictionary. At
first I read through the file using a for-loop, but tell() gave back
weird results, so I switched to while, then it worked.

The for-loop version was something like:
    d = {}
    for line in f:
        if line.startswith('>'): d[line] = f.tell()

And the while version was:
    d = {}
    while 1:
        line = f.readline()
        if len(line) == 0: break
        if line.startswith('>'): d[line] = f.tell()


In the for-loop version, f.tell() would sometimes return the same
result multiple times consecutively, even though the for-loop
apparently progressed the file descriptor. I don't have a clue why
this happened, but I switched to while loop and then it worked.

Does anyone have any ideas as to why this is so?

I suspect that at least the iterator version uses internal read-ahead buffering, so the tell() call reports how far the buffer has read into the file, not the position of the line the loop just handed you. That would explain the same value coming back several times in a row. I've also had problems with tell() returning bogus results while reading through large text-mode files (in this case a text file of about 530 MB) once the file offset passed some point I wasn't able to identify. It may have to do with newline translation, as this was Python 2.4 on Win32. Switching to binary ("b") mode resolved the issue for me.
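
To make the buffering suspicion concrete, here is a toy model of a read-ahead line iterator. It is only a sketch: read_ahead_lines and its chunk_size parameter are made-up names for illustration, not how Python's file iterator is actually implemented. It pulls a whole chunk from the file first and then hands out lines from memory, so tell() on the underlying file keeps reporting the end of the chunk.

```python
import io

def read_ahead_lines(fp, chunk_size=8192):
    """Toy line iterator (hypothetical): reads chunk_size bytes at a time,
    then yields complete lines out of the in-memory buffer."""
    pending = b""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is a complete line;
        # the tail stays in `pending` until the next chunk arrives.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line + b"\n"
    if pending:
        yield pending  # final line that had no trailing newline

data = b">seq1\nACGT\n>seq2\nTTGA\n"  # 22 bytes, made-up sample data
fp = io.BytesIO(data)

# Ask the underlying file where it is each time a line is handed out.
positions = [fp.tell() for _ in read_ahead_lines(fp)]
print(positions)  # [22, 22, 22, 22]
```

All four lines report the same offset because the whole sample fits in one chunk, which matches the "same result multiple times consecutively" symptom. A readline() loop does not hand out lines ahead of the file position this way.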

I created the following generator to make my life a little easier:

  def offset_iter(fp):
    assert 'b' in fp.mode.lower(), \
      "offset_iter must have a binary file"
    while True:
      addr = fp.tell()
      line = fp.readline()
      if not line: break
      yield (addr, line.rstrip('\n\r'))

That way, I can just use

  d = {}
  f = open('foo.txt', 'rb')  # 'b' alone is not a valid mode; read-binary is 'rb'
  for offset, line in offset_iter(f):
    if line.startswith('>'): d[line] = offset

This bookmarks the *beginning* of each line that starts with ">" (I think your code notes the *end*).
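
For anyone reading this on Python 3, a rough port of the same idea might look like the sketch below. The assumptions are mine, not part of the original code: lines from a binary file are bytes, so the prefixes and strip characters are bytes literals; build_index is a hypothetical helper name; and the sample file contents are invented for the demo.

```python
import os
import tempfile

def offset_iter(fp):
    """Yield (offset, line) pairs, where offset is the start of the line."""
    if "b" not in fp.mode:
        raise ValueError("offset_iter must have a binary file")
    while True:
        addr = fp.tell()          # position before the line is read
        line = fp.readline()
        if not line:
            break
        yield addr, line.rstrip(b"\r\n")

def build_index(path):
    """Map each header line (starting with b'>') to its starting offset."""
    with open(path, "rb") as fp:
        return {line: offset
                for offset, line in offset_iter(fp)
                if line.startswith(b">")}

# Demo on a small throwaway file (made-up contents).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b">seq1\nACGT\n>seq2\nTTGA\n")
    path = tmp.name

index = build_index(path)

# A bookmarked offset can be passed straight to seek() to jump to that line.
with open(path, "rb") as fp:
    fp.seek(index[b">seq2"])
    header = fp.readline()

os.remove(path)
print(index, header)
```

Because the offset is taken before readline(), seeking to a stored value and reading one line gives back the header itself.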

-tkc





--
http://mail.python.org/mailman/listinfo/python-list
