Paulo da Silva wrote:

> Hi!
>
> What is the best way to read a file that begins with some few text lines
> and whose rest is a binary stream?
>
> As an example ... files .pnm.
>
> Thanks for any comments/help on this.
A mixed text/binary file is really a binary file that contains some binary data which is meant to be interpreted as text. (Just as other binary data is meant to be interpreted as floats, or ints, or pixel colours, or sound samples...)

I would open the file in binary mode, then use readline() to extract the first few lines. If there is any chance that the lines could use Windows line endings, then you'll need to handle that yourself. Chances are you will call line.strip() to remove the trailing newline, and that will also remove the trailing carriage return, so that isn't hard.

Strictly speaking, the lines you read will be *bytes*, not text, but if they are pure ASCII you will barely notice the difference: byte strings in Python are displayed as if they were ASCII, with a b'...' prefix.

If the lines are supposed to be encoded in some encoding, say Latin-1 or UTF-8, you can convert them to text strings:

    line = line.decode('utf-8')

for example. Read the documentation for the file format to learn what encoding you should use. If it isn't documented, the answer is probably ASCII or Latin-1. Remember that the ASCII encoding in Python is strictly 7-bit, so you'll get decoding errors if the strings contain bytes with the high bit set. If you don't mind the risk of getting mojibake, the "no brainer" solution is to use Latin-1 as the encoding, since it can decode any byte value.

http://en.wikipedia.org/wiki/Mojibake

Once you know there are no more lines, just swap to using the read() method instead of readline(). Something like this should work:

    with open(somefile, "rb") as f:
        # The header: three text lines, read as bytes and decoded.
        process_text(f.readline().decode('latin-1'))
        process_text(f.readline().decode('latin-1'))
        process_text(f.readline().decode('latin-1'))
        # The rest is binary: read it in chunks until EOF.
        data = f.read(10000)
        while data:
            process_binary(data)
            data = f.read(10000)

--
Steve
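
For the .pnm case mentioned in the question, the same approach might look roughly like the sketch below. It is only an illustration under some assumptions: the names read_pnm and sample are made up for this example, the file is a binary PGM (P5) or PPM (P6), the header puts the magic number, the "width height" pair and the maximum value on separate lines (the format allows other whitespace layouts, which this does not handle), and the maximum value is below 256 so each sample is one byte.

```python
import io


def read_pnm(f):
    """Read a binary PGM (P5) or PPM (P6) image from a file opened in 'rb' mode.

    Sketch only: assumes one header item per line and maxval < 256.
    Returns (magic, width, height, maxval, pixels), where pixels is bytes.
    """
    def header_line():
        # Return the next non-empty, non-comment header line as text.
        while True:
            line = f.readline()
            if not line:
                raise ValueError("unexpected end of file in header")
            line = line.strip()  # drops b"\n" and any trailing b"\r"
            if line and not line.startswith(b"#"):
                return line.decode("ascii")

    magic = header_line()
    if magic not in ("P5", "P6"):
        raise ValueError("not a binary PGM/PPM file: %r" % magic)

    width, height = (int(n) for n in header_line().split())
    maxval = int(header_line())
    if maxval > 255:
        raise ValueError("two-byte samples are not handled in this sketch")

    # From here on the file is raw binary: switch from readline() to read().
    channels = 3 if magic == "P6" else 1
    pixels = f.read(width * height * channels)
    if len(pixels) != width * height * channels:
        raise ValueError("pixel data is shorter than the header promises")
    return magic, width, height, maxval, pixels


# Usage with an in-memory file standing in for open(somefile, "rb"):
sample = io.BytesIO(b"P5\n# tiny test image\n2 2\n255\n\x00\x40\x80\xff")
magic, width, height, maxval, pixels = read_pnm(sample)
print(magic, width, height, maxval, len(pixels))
```

The header is decoded as ASCII here because the PNM header fields are plain ASCII; for a format with a less strict header, Latin-1 would be the more forgiving choice, as described above.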