Steven D'Aprano wrote:
> On Mon, 13 Jun 2005 11:53:25 +0200, Fredrik Lundh wrote:
>
>><[EMAIL PROTECTED]> wrote:
>>
>>>It means in windows we should use 'wb' to write and 'rb' to read ?
>>>Am I right?
>>
>>no.
>>
>>you should use "wb" to write *binary* files, and "rb" to read *binary*
>>files.
>>
>>if you're working with *text* files (that is, files that contain lines of text
>>separated by line separators), you should use "w" and "r" instead, and
>>treat a single "\n" as the line separator.
>
> I get nervous when I read instructions like this. It sounds too much like
> voodoo: "Do this, because it works, never mind how or under what
> circumstances, just obey or the Things From The Dungeon Dimensions will
> suck out your brain!!!"
>
> Sorry Fredrik :-)
>
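Fredrik's rule can be seen in a few lines. This is a minimal sketch assuming Python 3, where the translation is done by Python's own io layer rather than by the C run-time library, and where text-mode reads accept \r\n, \r or \n by default (newline=None). The sample file is written in binary mode with explicit CRLF terminators so the result is the same on any platform; the temporary location and file name are arbitrary choices.

```python
import os
import tempfile

# Write a "Windows-style" text file byte-for-byte, so the physical
# line terminators are CRLF whatever platform runs this.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "wb") as f:
    f.write(b"first line\r\nsecond line\r\n")

# Binary mode ("rb"): the bytes come back exactly as stored, CR included.
with open(path, "rb") as f:
    raw = f.read()

# Text mode ("r"): each CRLF is translated to the logical newline "\n"
# (Python 3's default newline=None does this on every platform).
with open(path, "r") as f:
    text = f.read()

print(repr(raw))   # b'first line\r\nsecond line\r\n'
print(repr(text))  # 'first line\nsecond line\n'
```

The \r before each LF survives the binary read and is gone from the text read, so code working in text mode only ever has to deal with "\n".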
Many people don't appear to want to know why; they only want a solution to what they perceive to be their current problem.

> When you read a Windows text file using "r" mode, what happens to the \r
> immediately before the newline?

The thing to which you refer is not a "newline". It is an ASCII LF character. The CR and the LF together are the physical representation (in a Windows text file) of the logical "newline" concept. Internally, LF is used (irrespective of platform) to represent that concept.

> Do you have to handle it yourself?

No.

> Or will
> Python cleverly suppress it so you don't have to worry about it?

Suppressed: no; it's a transformation from a physical line-termination representation to a logical one.
Cleverly: a matter of opinion.
By Python: in general, no; the transformation is handled by the underlying C run-time library.

>
> And when you write a text file under Python using "w" mode, will the
> people who come along afterwards to edit the file in Notepad curse your
> name?

If they do, it will not be because anything other than CRLF has been written as a line terminator: on Windows, "w" mode writes each "\n" as CRLF.

> Notepad expects \r\n EOL characters, and gets cranky if the \r is
> missing.

AFAIR, it performs well enough for a text editor presented with a file consisting of one long unterminated line with occasional embedded meaningless-to-the-editor control characters. You can scroll it, edit it, write it out again ... any crankiness is likely to be between the keyboard and the chair :-)

>
> How does this behaviour differ from "universal newlines"?
>

Ordinary behaviour in text mode:

Win:         \r\n -> newline -> \r\n
Mac OS < 10: \r   -> newline -> \r
other box:   \n   -> newline -> \n

Note: TFM does appear a little light on detail in this area. I suppose not all users of Python have acquired this knowledge by osmosis through decades of usage of C on multiple platforms :-)

"Universal newlines":

On *any* box: \r\n or \n or \r (even a mixture) -> \n on reading.
On writing, behaviour is "ordinary", i.e.
the line terminator is what is expected by the current platform.

"Universal newlines" (if used) solves problems like the one where an other-boxer FTPs a Windows text file in binary mode and then posts laments about all those ^M things on the vi screen and :1,$s/^M//g doesn't work :-)

HTH,
John
--
http://mail.python.org/mailman/listinfo/python-list
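The "universal newlines" reading behaviour, and the ^M repair it makes possible, can be sketched in Python 3, where universal newlines are the default for text-mode reads and the `newline` argument to `open()` chooses which physical terminator is written for each "\n". The helper name `normalize_newlines` is hypothetical, and UTF-8 encoding is assumed.

```python
import os
import tempfile


def normalize_newlines(path, newline=os.linesep, encoding="utf-8"):
    r"""Rewrite the text file at `path` so every line ends with `newline`.

    Hypothetical helper. Reading with newline=None ("universal newlines")
    maps \r\n, \r and \n, even mixed in one file, to "\n"; writing with an
    explicit `newline` maps each "\n" back to that physical terminator.
    Returns the logical text that was read.
    """
    with open(path, "r", encoding=encoding, newline=None) as f:
        text = f.read()
    with open(path, "w", encoding=encoding, newline=newline) as f:
        f.write(text)
    return text


# A file with a mixture of terminators: CRLF (Windows), CR (classic Mac OS)
# and LF, as might arrive after a careless binary-mode transfer.
path = os.path.join(tempfile.mkdtemp(), "mixed.txt")
with open(path, "wb") as f:
    f.write(b"alpha\r\nbeta\rgamma\n")

# Every terminator is read as "\n"; rewriting with "\n" drops the CRs.
print(repr(normalize_newlines(path, newline="\n")))  # 'alpha\nbeta\ngamma\n'
with open(path, "rb") as f:
    print(f.read())  # b'alpha\nbeta\ngamma\n'  (no more ^M for vi)

# Rewrite with CRLF, the terminator Notepad expects.
normalize_newlines(path, newline="\r\n")
with open(path, "rb") as f:
    print(f.read())  # b'alpha\r\nbeta\r\ngamma\r\n'
```

Left at its default of os.linesep, the helper writes whatever the current platform expects, which mirrors the "ordinary" writing behaviour described for text mode.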