I have a mixed binary/text file[0], and the text portions use a radically nonstandard character set. I want to read them easily given information about the character encoding and an offset for the beginning of a string.
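To make the goal concrete, a minimal by-hand sketch of the desired behavior might look like the following. It uses no codec machinery at all: the file is opened in binary mode, so seek() takes a plain byte offset, and bytes are translated through a lookup table. The table contents, the EOS byte value 0xFF, and the name read_string are all made up for illustration; the real ROM's values are different.

```python
import io

# Hypothetical single-byte table: byte value -> character.
# These entries are invented for illustration only.
CHAR_TABLE = {0x01: "A", 0x02: "B", 0x03: "C", 0x10: " "}
EOS = 0xFF  # assumed end-of-string byte


def read_string(f, offset, table=CHAR_TABLE, eos=EOS, chunk_size=64):
    """Read from byte `offset` up to (not including) the first `eos` byte."""
    f.seek(offset)  # binary mode, so this is a byte offset
    raw = bytearray()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # reached end of file without seeing EOS
            break
        end = chunk.find(bytes([eos]))
        if end != -1:
            raw += chunk[:end]
            break
        raw += chunk
    # Bytes missing from the table become U+FFFD instead of raising.
    return "".join(table.get(b, "\ufffd") for b in raw)


# Stand-in for open("some_rom_file", "rb"): two unrelated bytes,
# then the string bytes, then EOS, then one trailing byte.
rom = io.BytesIO(bytes([0x00, 0x00, 0x01, 0x02, 0x10, 0x03, 0xFF, 0x01]))
text = read_string(rom, 2)
```

Reading in chunks and searching each chunk with bytes.find() avoids the one-byte-at-a-time loop, at the cost of reading a little past the terminator.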
The descriptions of the codecs module, and of codecs.register() in particular, seem to suggest that this is already supported in the standard library. However, I can't find any examples of its proper use. Most people who use the module seem to want to read UTF files in Python 2.x.[1] I would like to know how to correctly set up a new codec for reading files that have nonstandard encodings.

I have two other related questions:

How does seek() work on a file opened in text mode? Does it seek to a character offset or to a byte offset? I need the latter behavior. If I can't get it I will have to find a different approach.

The files I'm working with use a nonstandard end-of-string character in the same fashion as C null-terminated strings. Is there a builtin function that will read a file "from seek position until seeing EOS character X"? The methods I see for this online seem to amount to reading one character at a time and checking manually, which seems nonoptimal to me.

[0] The file is an SNES ROM dump, but I don't think that matters.
[1] I'm using Python 3, if it's relevant.

--
Andrew