[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Opened issue25905 for IDLE-related files. -- resolution: -> fixed stage: -> resolved status: open -> closed

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Chris Angelico
Chris Angelico added the comment: So Lib/idlelib/README.txt would decode wrongly in anything other than a Windows codepage? Seems a good reason to asciify line 3.

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Are there any locale encodings where \x92 isn't an apostrophe? Latin1 and all ISO-8859-*. CP437 and perhaps all DOS codepages. KOI8 family. And of course UTF-8.
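The point above can be checked directly. The following is a minimal sketch, not part of the tracker discussion: it decodes the single byte 0x92 under a few of the encodings named in the comment. The helper name `decode_0x92` is made up for illustration.

```python
# Illustrative sketch: how the byte 0x92 decodes under different encodings.
BYTE = b"\x92"

def decode_0x92(encoding):
    """Return the decoded character, or None if the byte is invalid there."""
    try:
        return BYTE.decode(encoding)
    except UnicodeDecodeError:
        return None

for enc in ("cp1252", "latin-1", "cp437", "koi8-r", "utf-8"):
    ch = decode_0x92(enc)
    # Print the code point rather than the character, so the output is ASCII.
    print(enc, "invalid" if ch is None else "U+%04X" % ord(ch))
```

Only the Windows codepage maps the byte to RIGHT SINGLE QUOTATION MARK (U+2019). Latin-1 yields a C1 control character, and a lone 0x92 is not valid UTF-8 at all.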

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Chris Angelico
Chris Angelico added the comment: Ah, got it. That definitely settles Idle's CREDITS.txt. Are there any locale encodings where \x92 isn't an apostrophe?

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: -- nosy: +terry.reedy

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Roundup Robot
Roundup Robot added the comment: New changeset 505593490f4c by Serhiy Storchaka in branch 'default': Issue #25899: Converted Objects/listsort.txt to UTF-8. https://hg.python.org/cpython/rev/505593490f4c

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Lib/idlelib/CREDITS.txt and Lib/idlelib/README.txt are read by IDLE. The first with the iso-8859-1 encoding, the second with locale encoding. Both are wrong if files are in UTF-8.
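To see why reading a UTF-8 file as iso-8859-1 is wrong, here is a small sketch of the mismatch. It does not touch the IDLE files themselves. The sample string is just a name containing a non-ASCII letter, and the variable names are illustrative.

```python
# Illustrative sketch: UTF-8 bytes misread as ISO-8859-1 produce mojibake.
text = "L\u00f6wis"                      # "Löwis", one non-ASCII letter

utf8_bytes = text.encode("utf-8")        # what a UTF-8 file would contain
as_latin1 = utf8_bytes.decode("iso-8859-1")  # what a Latin-1 reader sees

# Each non-ASCII character becomes two wrong characters; nothing raises,
# because every byte value is valid in ISO-8859-1.
print(ascii(text), "->", ascii(as_latin1))

# The damage is reversible only if the reader knows what went wrong.
repaired = as_latin1.encode("iso-8859-1").decode("utf-8")
```

The failure is silent, which is what makes it easy to miss. Reading with the locale encoding has the same problem whenever the locale is not UTF-8, and can raise instead if the bytes are invalid in that encoding.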

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Roundup Robot
Roundup Robot added the comment: New changeset c87b2f61650f by Serhiy Storchaka in branch '3.5': Issue #25899: Converted non-ASCII characters in docstrings and manpage https://hg.python.org/cpython/rev/c87b2f61650f New changeset 1eeb25f08cfd by Serhiy Storchaka in branch 'default': Issue #25899:

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Chris Angelico
Chris Angelico added the comment: Oops, didn't see Victor's comment. (How do I get notified when someone posts a patch review?) New patch uploaded which does this. Note that Steven D'Aprano has expressed the opposite desire - that non-ASCII text be kept, as it should be acceptable and its pres

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: All in Modules/_ctypes/libffi/ is third-party code. The ACUTE ACCENT character at the start of .bzrignore was added in b635462a5798 by Benjamin. I think this is just a typo. The patch LGTM (if address Victor's comment). -- nosy: +benjamin.peterson

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Chris Angelico
Chris Angelico added the comment: Another version of detection script attached. -- Added file: http://bugs.python.org/file41350/nonascii.py

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Chris Angelico
Chris Angelico added the comment: Misc/NEWS has a UTF-8 BOM. Otherwise, it and Misc/HISTORY look fine (all names and other legit cases). Lib/idlelib/CREDITS.txt and Lib/idlelib/README.txt both have non-UTF8 text in them. I don't understand what's with the first line of .bzrignore, so I'm (pun

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: In .rst files they are good. Add | (LC_ALL=C egrep "$(printf '[\x80-\xff]+')";) in the pipe after your script to highlight non-ASCII characters.
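For readers without egrep at hand, the same filtering idea can be sketched in Python. This is not the attached nonascii.py script, whose contents are not shown here. The function names and the `[[`/`]]` markers are made up for illustration. Working on bytes mirrors the effect of `LC_ALL=C`, which makes the match byte-wise rather than character-wise.

```python
import re

# Same byte range as the egrep pattern: any run of bytes 0x80-0xFF.
NON_ASCII = re.compile(rb"[\x80-\xff]+")

def highlight_non_ascii(line, start=b"[[", end=b"]]"):
    """Wrap each run of non-ASCII bytes in visible markers."""
    return NON_ASCII.sub(lambda m: start + m.group(0) + end, line)

def find_non_ascii(lines):
    """Yield (line_number, highlighted_line) for lines with non-ASCII bytes."""
    for number, line in enumerate(lines, 1):
        if NON_ASCII.search(line):
            yield number, highlight_non_ascii(line)

if __name__ == "__main__":
    sample = [b"plain ASCII\n", b"caf\xc3\xa9\n", b"it\x92s\n"]
    for number, line in find_non_ascii(sample):
        print(number, line)
```

Because it never decodes, this also catches bytes that are invalid in UTF-8, such as a stray \x92.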

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I think yes, if this is not a Windows-specific file. See also 4796dec0a7d0, 7255af1a1c50, a8568ea83599, 652baf23c368.

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Chris Angelico
Chris Angelico added the comment: There are non-ASCII dashes and apostrophes in .rst files; are they worth cleaning up?

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Chris Angelico
Chris Angelico added the comment: As an alternative to checking only *.py, the second version uses the 'file' command to recognize text files. Run from the cpython top-level directory (rather than Lib/), it finds a large number of additional results, many of which appear to have a UTF-8 BOM. A

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Chris Angelico
Chris Angelico added the comment: Whoops! Meant to include that as a second attachment. Now attached. It's a quickly-thrown-together thing and not fully PEP 8 compliant. -- Added file: http://bugs.python.org/file41346/nonascii.py

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Non-ASCII apostrophes and dashes in docstrings can be considered as a bug, since they can lead pydoc or help() to fail. The patch LGTM. Could you please provide your script used to search non-ASCII sources Chris? Are there doubtful non-ASCII inclusions in t

[issue25899] Unnecessary non-ASCII characters in standard library

2015-12-17 Thread Chris Angelico
New submission from Chris Angelico: Discussion on python-list led to searching out unnecessary non-ASCII in the stdlib. While there are places where non-ASCII text is good and worthwhile (eg in comments identifying people such as Łukasz Langa, Peter Åstrand, Martin v. Löwis, and Gerhard Häring