On Wed, 16 Jul 2014 19:20:14 +0300, Marko Rauhamaa wrote:

> Chris Angelico <ros...@gmail.com>:
>
>> The only thing that might be an issue is that you can't use open(fn) to
>> read your files, but you have to explicitly state the encoding. That
>> would be an understandable problem, especially for someone who develops
>> on a single platform and forgets that the default differs. As long as
>> you always explicitly say encoding="utf-8", and document that you do
>> so, any problems are someone else's.
>
> Yes. I don't like open() guessing the encoding:

It doesn't *guess*. It has a sensible default encoding which, for most
users most of the time, does the right thing. Ultimately though, the
encoding is under your control: you can specify it if you think you know
better.

> The default encoding is platform dependent (whatever
> locale.getpreferredencoding() returns)

Right. Most text files will be written using the preferred encoding,
unless the user explicitly uses something else when writing the file. In
that case it's the user's responsibility. Or if they've got the file from
another system with a different encoding. But even then, the most common
encodings are ASCII-compatible, which means that the lowest common
denominator case (reading and writing ASCII files) will Just Work.

From a purity stand-point, no, open() shouldn't have a default encoding,
and the user should have to specify it. But what makes you imagine that
the user will know the correct encoding better than Python does? The
average coder[1] shouldn't have to care about encodings just to do
file.write("Hello World"), and on the average computer they don't have to
because Python sets a sensible default.

But you know what? From a purity stand-point, *even binary mode* assumes
an encoding of sorts. How do you know that binary files on your platform
use eight-bit bytes? Some DSPs use 9-bit bytes, and historically
computers had as few as 6 or as many as 60 bits per byte. This is why the
C standard requires that a byte is *at least* 8 bits.

But, having said that, the assumption that binary files are based on
8-bit bytes is pretty safe. It would be foolish to force the majority of
people, who don't need to care about these sorts of details, to care
about them just to suit the one in ten-thousand who do.

Likewise with text files. Python makes sensible defaults which will suit
most people, rather than force people to guess the wrong encoding. But
it's only a default, you can explicitly set it if you believe the file in
question uses a different encoding.

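To make the point about defaults and ASCII compatibility concrete, here
is a minimal sketch. The helper name roundtrip() and the sample strings
are my own, purely for illustration, and the encoding that open() picks
will differ from machine to machine:

```python
import os
import tempfile


def roundtrip(text, write_encoding, read_encoding):
    """Write text using one encoding, then read it back using another."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=write_encoding) as f:
            f.write(text)
        with open(path, "r", encoding=read_encoding) as f:
            return f.read()
    finally:
        os.remove(path)


# Ask open() which encoding it chose when none was specified.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, "w") as f:
        chosen = f.encoding  # platform dependent
finally:
    os.remove(path)
print("open() defaulted to:", chosen)

# Pure ASCII survives a mismatch between two ASCII-compatible encodings.
ascii_result = roundtrip("Hello World", "utf-8", "latin-1")

# Non-ASCII text does not: the UTF-8 bytes get misread as Latin-1.
mangled = roundtrip("caf\u00e9", "utf-8", "latin-1")
print(repr(ascii_result), repr(mangled))
```

The ASCII string comes back unchanged, while the accented one comes back
as mojibake. That second case is the one where passing encoding=
explicitly earns its keep.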
[...]
> In each case, it would have been better to default to bytes just like
> subprocess does.

Better for whom? You? Maybe. For the typical programmer that Python is
designed for? Hell no.


[1] Let's be honest, there still is a bias towards English and ASCII in
computing, and probably this will remain the case until English ceases to
be a de facto lingua franca. Most programming languages are written for
J. Random Hacker, not Jランダムハッカー.

-- 
Steven