Hi, I'm trying to figure out the most Pythonic way to interact with a generator.
The task I'm trying to accomplish is writing a PDF tokenizer, and I want to implement it as a Python generator. Suppose all the ugly details of tokenizing PDF can be handled (such as embedded streams of arbitrary binary content). One problem remains, though: to allow random file access, the tokenizer should not simply spit out a series of tokens read sequentially from the file; it should be possible to point it at arbitrary places in the file.

I can see two ways to do this: either the current file position is read from somewhere (say, a mutable object passed to the generator) after each yield, or a new generator is instantiated every time the tokenizer is pointed at a new file position.

The first approach has two disadvantages: the pointer value is exposed, and because of the complex rules for chopping a PDF into tokens, there will be a lot of yield statements in the generator code, which would make for a lot of pointer assignments. This seems ugly to me.

The second approach is cleaner in that respect, but pointing the tokenizer at some place now has the added semantics of creating a whole new generator instance. The programmer using the tokenizer has to remember to throw away any references to the old generator each time the pointer is reset, which is also ugly.

Does anybody here have a third way of dealing with this? Otherwise, which ugliness is the more Pythonic one?

Thanks a lot for any ideas.

-- Thomas
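
P.S. To make the two options concrete, here is a rough sketch of what I have in mind. It is only an illustration: splitting on whitespace stands in for real PDF lexing, and the names Position, tokens_shared_position and tokens_from are made up for this example.

```python
import io


class Position:
    """Hypothetical mutable holder for the file offset (approach 1)."""

    def __init__(self, offset=0):
        self.offset = offset


def _read_token(stream):
    """Toy stand-in for PDF lexing: skip whitespace, then read one word."""
    token = b""
    while True:
        ch = stream.read(1)
        if not ch:              # end of file
            break
        if ch.isspace():
            if token:           # whitespace after a word ends the token
                break
            continue            # leading whitespace is skipped
        token += ch
    return token


def tokens_shared_position(stream, pos):
    """Approach 1: consult the shared position object after every yield."""
    while True:
        stream.seek(pos.offset)
        token = _read_token(stream)
        if not token:
            return
        pos.offset = stream.tell()  # one pointer assignment per yield
        yield token


def tokens_from(stream, offset):
    """Approach 2: a fresh generator for every starting offset."""
    stream.seek(offset)
    while True:
        token = _read_token(stream)
        if not token:
            return
        yield token


data = b"1 0 obj << /Type /Catalog >> endobj"

# Approach 1: the caller moves the tokenizer by mutating `pos`.
pos = Position()
gen = tokens_shared_position(io.BytesIO(data), pos)
first = next(gen)                    # b"1"
pos.offset = data.index(b"/Type")    # exposed pointer, set from outside
jumped = next(gen)                   # b"/Type"

# Approach 2: the caller must drop the old generator and build a new one.
tail = list(tokens_from(io.BytesIO(data), data.index(b">>")))
```

In the first variant the real tokenizer would need the `pos.offset = ...` line before each of its many yields; in the second, any old generator still lying around keeps reading from the same stream and would disturb the new one.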