Dan Dibagh <[EMAIL PROTECTED]> added the comment:

Your reasoning shows a lack of understanding of how Python is actually used from a programmer's point of view.
Why do you think that "noticing" a problem is the same thing as entering it as a Python bug report? In practice there are several steps between noticing a problem in a Python program and entering it as a bug report in the Python development system. It is very difficult to see why any of these steps would happen automatically. Believe me, people have had real problems due to this bug. They have just chosen other solutions than reporting it. You are yourself reluctant to seek out the roots of this problem and fix it. Why should other people behave differently and report it? A not so uncommon "fix" to pickle problems out there is to not use pickle at all. There are Python programmers who give the advice to avoid pickle since "it's too shaky". It is a solution, but is it the solution you desire?

The capability to serialize stuff into ASCII strings isn't just an implementation detail that happens to be nice for human readability. It is a feature people need for technical reasons. If the data is ASCII, it can be dealt with in any ASCII-compatible context, which might be network protocols, file formats and database interfaces. That is the real use. Programs depend on it to work properly. The solution to change the documentation is in practice breaking compatibility (which programming language designers normally try to avoid or do in a very controlled manner). How is a documentation fix going to help all the code out there written with the assumption that pickle protocol 0 is always ASCII? Is there a better solution around than changing pickle to meet actual expectations?

Well, nobody has reported it as a bug in 8 years. How long do you think that code will stay around based on the ASCII assumption? 8 years? 16 years? 24 years? Maybe all the time in the world for this to become an issue again and again and again? It is difficult to grasp why there is "no way to fix it now".
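
The ASCII assumption is easy to check directly. The following is only a minimal sketch, assuming a current Python 3 interpreter rather than the Python 2 this report was written against; the helper `is_ascii` and the variable names are made up for illustration. It pickles a string containing U+0080 with protocol 0 and tests whether every byte of the result is 7-bit.

```python
import pickle


def is_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range (hypothetical helper)."""
    return all(b < 0x80 for b in data)


# Pickle a one-character string with protocol 0, the "text" protocol.
# U+0080 is the same code point used elsewhere in this report.
payload = pickle.dumps("\u0080", protocol=0)
payload_is_ascii = is_ascii(payload)

print(repr(payload), "ASCII-only:", payload_is_ascii)

# The pickle still round-trips; the issue is only the byte range of the stream.
assert pickle.loads(payload) == "\u0080"
```

Code that hands a protocol 0 pickle to an ASCII-only channel could run a check like this before sending, instead of relying on the assumption.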
From a programmer's point of view an obvious "fix" is to ditch pickle and use something that delivers a consistent result rather than hours of debugging. When I try to see it from the Python library developers' point of view, I see code implemented in C which produces a result with reasonable performance. It is perfectly possible to write code which implements the expected result with reasonable performance. What is the problem?

Perhaps it is the raw-unicode-escape encoding that should be fixed? I failed to find exact information about what raw-unicode-escape means. In particular, where is the information which states that raw-unicode-escape is always an 8-bit format? The closest I've come is PEP 100 and PEP 263 (which I notice were written by you guys), which describe how to decode raw unicode escape strings from Python source and how to define encoding formats for Python source code. The sole original purpose of both unicode-escape and raw-unicode-escape appears to be representing unicode strings in Python source code as u' and ur' strings respectively. It is clear that the decoding of a raw unicode escaped or unicode escaped string depends on the actual encoding of the Python source, but what is the logic by which, when something is _encoded_ into a raw unicode string, the target source must be in some 8-bit encoding? Especially considering that the default Python source encoding is ASCII.

For unicode-escape this makes sense:

>>> f = file("test.py", "wb")
>>> f.write('s = u"%s"\n' % u"\u0080".encode("unicode-escape"))
>>> f.close()
>>> ^Z

python test.py
(executes silently without errors)

But for raw-unicode-escape the outcome is a different thing:

>>> f = file("test.py", "wb")
>>> f.write('s = ur"%s"\n' % u"\u0080".encode("raw-unicode-escape"))
>>> f.close()
>>> ^Z

python test.py
  File "test.py", line 1
SyntaxError: Non-ASCII character '\x80' in file test.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Huh?
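
The difference between the two codecs can also be seen without writing a source file. This is a small sketch, assuming Python 3 (where `str` is already Unicode and `encode` returns `bytes`); the helper `ascii_only` is a made-up name. It encodes the same code point with both codecs and compares the resulting bytes.

```python
def ascii_only(data: bytes) -> bool:
    """Return True if every byte is below 0x80 (hypothetical helper)."""
    return all(b < 0x80 for b in data)


ch = "\u0080"

# unicode-escape spells the character out as a backslash escape.
escaped = ch.encode("unicode-escape")

# raw-unicode-escape passes code points below U+0100 through as a single byte.
raw_escaped = ch.encode("raw-unicode-escape")

# Code points from U+0100 upward are escaped by raw-unicode-escape as well.
raw_escaped_high = "\u0100".encode("raw-unicode-escape")

for name, data in [
    ("unicode-escape U+0080", escaped),
    ("raw-unicode-escape U+0080", raw_escaped),
    ("raw-unicode-escape U+0100", raw_escaped_high),
]:
    print(name, repr(data), "ASCII-only:", ascii_only(data))
```

So only the range U+0080 to U+00FF comes out as non-ASCII bytes from raw-unicode-escape, which is exactly the case that trips up the test.py example.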
For someone who trusts the Standard Encodings section of the Python Library Reference, this isn't what one would expect. If the documentation states "Produce a string that is suitable as raw Unicode literal in Python source code", then why isn't it suitable?

----------
nosy: +dddibagh

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue2980>
_______________________________________