Nick Coghlan <ncogh...@gmail.com> added the comment:

Usually because the file may contain certain ASCII markers (or you're inserting 
such markers), but beyond that, you only care that it's in a consistent ASCII 
compatible encoding.

Parsing log files from sources that aren't set up correctly often falls into 
this category: you know the markers are ASCII, but the actual message contents 
may not be properly encoded (e.g. they use a locale dependent encoding, but 
not all the log files are from the same machine and not all machines have their 
locale set up properly). For such "read-only" use cases, errors="replace" can 
be a better option.
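
As a rough sketch of the read-only case: the log line layout below (date, 
time, level, message) and the name parse_line are made up for illustration, 
not taken from any real log format. Decoding as ASCII with errors="replace" 
keeps the ASCII markers usable and turns every undecodable byte into U+FFFD.

```python
def parse_line(raw):
    """Split a hypothetical 'DATE TIME LEVEL MESSAGE' log line.

    Bytes outside ASCII are replaced with U+FFFD, so the original
    message bytes can NOT be recovered afterwards (read-only use).
    """
    text = raw.decode("ascii", errors="replace")
    date, time, level, message = text.rstrip("\n").split(" ", 3)
    return date, time, level, message


# Hypothetical sample: ASCII header, Latin-1 encoded message body.
sample = b"2012-02-13 10:00:00 ERROR caf\xe9 is closed\n"
fields = parse_line(sample)
```

This is fine for searching or reporting, but the replacement is lossy, so it 
is not suitable if the data has to be written back out unchanged.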

A use case where you really do need "errors='surrogateescape'" is when you're 
reformatting a log file and you want to preserve the encoding for the messages 
while manipulating the pure ASCII timestamps and message headers. In that case, 
surrogateescape is the right answer, because you can manipulate the ASCII bits 
freely while preserving the log message contents when you write the reformatted 
files back out. The reformatting script offers an API that says "put any ASCII 
compatible encoding in, and you'll get that same encoding back out".
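
A minimal sketch of such a reformatting step, assuming the same made-up 
"DATE TIME LEVEL MESSAGE" layout (the function name and output format are 
hypothetical). The point is only that bytes smuggled in via surrogateescape 
come back out unchanged when the same error handler is used for encoding.

```python
def reformat_line(raw):
    """Rewrite the ASCII header of a log line, leaving message bytes intact."""
    # Non-ASCII bytes become lone surrogates (U+DC80..U+DCFF) in the str.
    text = raw.decode("ascii", errors="surrogateescape")
    date, time, level, message = text.split(" ", 3)
    new_text = "[{}T{}] {}: {}".format(date, time, level.lower(), message)
    # Encoding with the same handler turns the surrogates back into
    # the original bytes, whatever encoding they were in.
    return new_text.encode("ascii", errors="surrogateescape")


latin1_line = b"2012-02-13 10:00:00 ERROR caf\xe9 is closed\n"
utf8_line = b"2012-02-13 10:00:00 ERROR caf\xc3\xa9 is closed\n"
out_latin1 = reformat_line(latin1_line)
out_utf8 = reformat_line(utf8_line)
```

The script never needs to know which ASCII compatible encoding the message 
used; it only touches the parts it knows to be ASCII.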

You'll get weird behaviour (i.e. as you do in Python 2) if the assumption of an 
ASCII compatible encoding is ever violated, but that would be equally true if 
the script tried to process things at the raw bytes level.
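
To illustrate the kind of weirdness meant here, a small sketch using 
UTF-16-LE as an example of an encoding that is not ASCII compatible (the 
choice of UTF-16-LE is mine, purely for illustration). The decode does not 
fail, because NUL bytes are valid ASCII, but the markers no longer match.

```python
# "ERROR" in UTF-16-LE has a NUL byte after every letter.
utf16_bytes = "ERROR".encode("utf-16-le")

# No exception is raised: every byte is below 0x80, so ASCII accepts it.
decoded = utf16_bytes.decode("ascii", errors="surrogateescape")

# The ASCII marker is silently not found.
marker_found = "ERROR" in decoded
```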

The assumption of an ASCII compatible text encoding is a useful one a lot of 
the time. The problem with Python 2 is that it makes that assumption implicitly 
and makes it almost impossible to disable. Python 3, on the other hand, assumes 
very little by default (basically what it returns from 
sys.getfilesystemencoding() and locale.getpreferredencoding()), thus requiring 
that the programmer know how to state their assumptions explicitly.
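
One way of stating the assumption explicitly, sketched with a hypothetical 
helper name (copy_log): pass encoding and errors to open() instead of relying 
on the locale default. newline="" is used so that line endings are not 
translated either.

```python
import locale
import sys

# The defaults Python 3 falls back on when nothing is stated explicitly.
defaults = (sys.getfilesystemencoding(), locale.getpreferredencoding())


def copy_log(src, dst):
    """Copy a text file, assuming only an ASCII compatible encoding."""
    with open(src, encoding="ascii", errors="surrogateescape",
              newline="") as fin:
        with open(dst, "w", encoding="ascii", errors="surrogateescape",
                  newline="") as fout:
            for line in fin:
                fout.write(line)
```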

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13997>
_______________________________________