Gotcha. Thanks!

Magdoll

On Nov 19, 2:57 am, Tim Chase <[EMAIL PROTECTED]> wrote:
> Magdoll wrote:
> > I was trying to map various locations in a file to a dictionary. At
> > first I read through the file using a for-loop, but tell() gave back
> > weird results, so I switched to while, then it worked.
>
> > The for-loop version was something like:
> >   d = {}
> >   for line in f:
> >       if line.startswith('>'): d[line] = f.tell()
>
> > And the while version was:
> >   d = {}
> >   while 1:
> >       line = f.readline()
> >       if len(line) == 0: break
> >       if line.startswith('>'): d[line] = f.tell()
>
> > In the for-loop version, f.tell() would sometimes return the same
> > result multiple times consecutively, even though the for-loop
> > apparently progressed the file descriptor. I don't have a clue why
> > this happened, but I switched to while loop and then it worked.
>
> > Does anyone have any ideas as to why this is so?
>
> I suspect that at least the iterator version uses internal
> buffering, so the tell() call returns the current buffer
> read-location, not the current read location. I've also had
> problems with tell() returning bogus results while reading
> through large non-binary files (in this case about a 530 meg
> text-file) once the file-offset passed some point I wasn't able
> to identify. It may have to do with newline translation as this
> was python2.4 on Win32. Switching to "b"inary mode resolved the
> issue for me.
>
> I created the following generator to make my life a little easier:
>
>   def offset_iter(fp):
>       assert 'b' in fp.mode.lower(), \
>           "offset_iter must have a binary file"
>       while True:
>           addr = fp.tell()
>           line = fp.readline()
>           if not line: break
>           yield (addr, line.rstrip('\n\r'))
>
> That way, I can just use
>
>   f = file('foo.txt', 'rb')
>   for offset, line in offset_iter(f):
>       if line.startswith('>'): d[line] = offset
>
> This bookmarks the *beginning* (I think your code notes the
> *end*) of each line that starts with ">"
>
> -tkc
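
For anyone who finds this thread later, here is a rough sketch of the same bookmarking idea written for Python 3, which the thread itself does not cover (it was about python2.4). The names index_headers and marker, and the sample data, are made up for illustration. io.BytesIO stands in for a real file opened with open(path, 'rb'). It is meant as one possible way to do it, not a definitive version.

```python
import io


def offset_iter(fp):
    """Yield (offset, line) pairs; offset is where the line starts.

    fp is assumed to be a binary file object, so offsets are plain
    byte positions and no newline translation gets in the way.
    """
    while True:
        addr = fp.tell()        # position before the read = start of line
        line = fp.readline()
        if not line:            # empty bytes means end of file
            break
        yield addr, line.rstrip(b"\r\n")


def index_headers(fp, marker=b">"):
    """Map each line starting with marker to its starting byte offset.

    Hypothetical helper, not from the thread: it just wraps the
    dictionary-building loop shown in the quoted message.
    """
    return {line: addr for addr, line in offset_iter(fp)
            if line.startswith(marker)}


# Made-up sample data with mixed line endings.
sample = io.BytesIO(b">seq1\nACGT\n>seq2\r\nTTGA\n")
index = index_headers(sample)

# Jump straight back to a bookmarked header.
sample.seek(index[b">seq2"])
header = sample.readline()
```

Because tell() is called before each readline(), the stored offsets point at the beginning of each header line, so seek() followed by readline() returns the header itself.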