[issue11553] Docs for: import, packages, site.py, .pth files
Graham Wideman added the comment:

Hi Eric,

Thanks for starting to review this, and your responses are encouraging. Some comments inline below.

FWIW, along the way I accumulated my own notes on this topic, on some pages here:

grahamwideman.wikispaces.com (Left navigation panel...) Software development > Python > Organization for common modules

They might be of interest as feedback on the digging process I needed in order to get some clarity on these issues, and they also show my references.

>> Exactly what variants of arguments are possible, and what are their effects?
> Does http://docs.python.org/dev/library/functions#__import__ help? Does
> http://docs.python.org/dev/library/importlib ?

Well, somewhat overkill -- because the matter of interest was args for from... and import, while the docs you mention are for more complicated underlying functions. (Interesting nonetheless.)

>> Current docs are unclear on points such as:
>> -- is __init__.py needed on subpackage directories?
> Yes, it always has. I think there was some discussion about removing them in
> py3k, but this was rejected.

I came to the same conclusion, but have seen it described otherwise (in at least one book), so it would be good to state this explicitly.

>> -- the __all__ variable: Does it act generally to limit visibility of a
>> module or package's attributes, or does pertain only to the
>> 'from...import *' statement?
> Both.

I'm pretty sure that's not correct -- pretty sure that __all__ only specifies what's included in from...import *, and does not prevent access via from...import specific_attrib. But I may have tested incorrectly.

>> Seriously misleading discussion of .pth files. [snip]
> Agreed.

Cool -- I think it's well worth fixing this area for sure!

>> In addsitepackages(), the library directory for Windows (the else clause)
>> is shown as lower-case 'lib' instead of 'Lib'.
> I don’t see any else clause in the 2.7 or 3.3 code. Otherwise you’re right.

Sorry, the lowercase 'lib' issue is in getsitepackages()...

    if sys.platform in(...) ...
    else:...
        sitepackages.append(os.path.join(prefix, "lib", "site-packages"))

>> sys
>> Could helpfully point to a discussion of the typical items to
>> be found in sys.path under normal circumstances
> Hm, this would be very platform-specific. What use cases would that help?

It would demystify how python already knows how to find various things under vanilla circumstances.

>> 'Installing Python Modules' document
>> "Windows has no concept of a user’s home directory, " and so on.
> The author probably meant that there was no $HOME environment variable, ~
> shortcut and all that.

Fair enough, but in actuality there *is* a user-specific location (on Windows) examined by site.py, which is in %APPDATA%\Python\.

>> For Windows suggests 'prefix' (default: C:\Python) as an installation
>> directory.
>> This is indeed one of the possible 'site-package' directories, but surely it
>> is deprecated in favor of C:\Python\Lib\site-packages, which this section does
>> not mention.
> Don’t confuse the prefix and the install dir. The directory for Python
> modules is computed as prefix + Lib/site-packages.

Currently, under "Alternate installation: Windows (the prefix scheme)", it says:

    python setup.py install --prefix="\Temp\Python"

to install modules to the \Temp\Python directory on the current drive.

Does this really mean "install modules to \Temp\Python\Lib\site-packages"? (And as a side point, surely installing under the Temp directory is a strange location to pick for an example?)

> That was my initial feedback; I think I’ve covered all of your points.

Looking forward!

--
___
Python tracker <http://bugs.python.org/issue11553>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
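As a rough illustration of the "typical items on sys.path" point above, the following sketch lists the search path together with the site-packages locations that the site module reports. It assumes a reasonably recent CPython in which site.getsitepackages() and site.getusersitepackages() are available; the helper name describe_search_path is hypothetical, and the actual directories are platform-specific.

```python
import site
import sys


def describe_search_path():
    """Return (sys.path entries, global site-packages dirs, per-user site dir)."""
    # Candidate global site-packages directories computed by site.py.
    global_sites = site.getsitepackages()
    # The per-user location (on Windows this sits under %APPDATA%\Python).
    user_site = site.getusersitepackages()
    return list(sys.path), global_sites, user_site


paths, global_sites, user_site = describe_search_path()
for entry in paths:
    print("sys.path:", entry)
for entry in global_sites:
    print("site-packages candidate:", entry)
print("user site:", user_site)
```

The output differs between machines, which is rather the point: running something like this shows where a vanilla installation already looks before any .pth file or PYTHONPATH setting is involved.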
[issue11553] Docs for: import, packages, site.py, .pth files
Graham Wideman added the comment:

Hi Nick:

Thanks for your additional points. Comments inline:

> __all__ only affects import *, and may also affect documentation tools (e.g.
> pydoc will respect __all__ when deciding what to display). It has no effect
> on attribute retrieval from modules.

That's indeed my understanding. So the doc (6. Simple statements) which says that __all__ determines the list of "public names" is a bit of a red herring. Attributes are accessible (ie: public) regardless of whether they are on the __all__ list. Instead the __all__ list establishes the list of names imported by *, and makes those names reference-able without a module prefix. (Plus it gives hints about intent to doc tools.)

> pkgutil.extend_path() is used to modify pkg.__path__ attributes, *not*
> sys.path.

Understood, and perhaps my point was obtuse. I was pointing out that the doc for extend_path discusses .pkg entries which point to package dirs, and that this, it says, is like .pth files. I claim that an entry in a .pth file should NOT point to a package dir, but rather to one level up: to a dir that *contains* package dirs. (Pointing a .pth entry directly at a package dir will break package behavior by exposing the constituent modules to sys.path.) Hence the doc for extend_path is misleadingly suggesting a wrong idea about .pth files.

--
Python tracker <http://bugs.python.org/issue11553>
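The behavior of __all__ described in this message can be checked with a small sketch. The module name demo_mod and its attributes are hypothetical and are built in memory purely for illustration.

```python
import sys
import types

# Build a throwaway module in memory; "demo_mod" is a made-up name.
demo_mod = types.ModuleType("demo_mod")
exec(
    "__all__ = ['listed']\n"
    "listed = 1\n"
    "unlisted = 2\n",
    demo_mod.__dict__,
)
sys.modules["demo_mod"] = demo_mod

# 'from ... import *' honors __all__ ...
star_ns = {}
exec("from demo_mod import *", star_ns)

# ... but an explicit import of a name outside __all__ still works.
explicit_ns = {}
exec("from demo_mod import unlisted", explicit_ns)

print("listed" in star_ns)      # True
print("unlisted" in star_ns)    # False
print(explicit_ns["unlisted"])  # 2
print(demo_mod.unlisted)        # 2, plain attribute access is unaffected
```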
[issue11553] Docs for: import, packages, site.py, .pth files
Graham Wideman added the comment:

> "Public name" is a term that describes a convention, not anything enforced by
> the interpreter.

And I guess that's really the main point. In other languages Public means accessible, and Private means not so. In Python, Public means "suggested for outside consumption", and Private means not so intended, but nonetheless accessible. If that was reiterated near the discussion of __all__ it would be most helpful.

> Dirs mentioned in .pkg files *should* be added to the [...] pkg.__path__,
> not sys.path.
> That could probably be made clearer, but the docs aren't wrong as they stand.

Again I've not managed to draw attention to the exact point of contention.

1. A dir added to a .pkg file evidently should be an actual package dir.
2. A dir added to a .pth file should NOT be an actual package dir. It should be the dir at the level above.

Thus the entries in .pkg and .pth files point to different kinds of things, yet the doc I pointed to asserts they are the same in this regard.

--
Python tracker <http://bugs.python.org/issue11553>
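A minimal sketch of point 2, assuming the standard site.addsitedir() function is used to stand in for normal site-packages processing. All directory, file and package names here (site_dir, common_libs, demo.pth, demo_pth_pkg) are hypothetical and created in a temporary directory.

```python
import importlib
import os
import site
import sys
import tempfile

root = tempfile.mkdtemp()
site_dir = os.path.join(root, "site_dir")          # stands in for site-packages
libs_dir = os.path.join(root, "common_libs")       # a dir that CONTAINS packages
pkg_dir = os.path.join(libs_dir, "demo_pth_pkg")   # the package dir itself
os.makedirs(site_dir)
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("VALUE = 42\n")

# The .pth entry names the directory one level above the package,
# not the package directory itself.
with open(os.path.join(site_dir, "demo.pth"), "w") as f:
    f.write(libs_dir + "\n")

# Adds site_dir to sys.path and processes the .pth files found in it.
site.addsitedir(site_dir)
importlib.invalidate_caches()

import demo_pth_pkg

print(demo_pth_pkg.VALUE)
print(pkg_dir in sys.path)  # the package dir itself never lands on sys.path
```

Had demo.pth pointed at pkg_dir instead, the modules inside the package would have become importable as top-level names, which is the collision risk described above.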
[issue11426] CSV examples can't close their files
New submission from Graham Wideman:

On the csv doc page (.../library/csv.html) most of the examples show creation of an anonymous file object within the csv.reader or csv.writer function, for example...

    spamWriter = csv.writer(open('eggs.csv', 'w'), delimiter=' ',

This anonymity prevents later closing the file, which seems especially problematic for a writer. It also confuses users as to whether there's some sort of close function on a csv.reader or csv.writer object which should be called, or perhaps some other magic behind the scenes. I'm pretty sure that it's the doc that is incorrect here.

This issue was raised parenthetically here http://bugs.python.org/issue7198#msg124678 by sjmachin, though mysteriously overlooked in his later suggested patch http://bugs.python.org/issue7198#msg126593

I suggest changing all examples to include the complete cycle of opening an explicit file, and later closing it.

--
assignee: docs@python
components: Documentation
messages: 130228
nosy: docs@python, gwideman
priority: normal
severity: normal
status: open
title: CSV examples can't close their files
type: behavior
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3

Python tracker <http://bugs.python.org/issue11426>
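One way the "complete cycle" could look, sketched for Python 3 (where the csv docs recommend opening files with newline=""). The with statement is used here as an assumption about what a revised example might do; the file is written to a temporary directory and the row contents are made up.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "eggs.csv")

# The file object has a name, and the with block closes it deterministically.
with open(path, "w", newline="") as out_file:
    spam_writer = csv.writer(out_file, delimiter=" ")
    spam_writer.writerow(["Spam", "Lovely Spam"])

with open(path, newline="") as in_file:
    rows = list(csv.reader(in_file, delimiter=" "))

print(rows)
print(out_file.closed, in_file.closed)
```

The writer and reader objects themselves have no close method; only the underlying file needs closing, which is the ambiguity the report asks the docs to remove.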
[issue4033] python search path - .pth recursion
Changes by Graham Wideman:

--
nosy: +gwideman

Python tracker <http://bugs.python.org/issue4033>
[issue1271] Raw string parsing fails with backslash as last character
Graham Wideman added the comment:

(Not clear how to reopen this issue. Hopefully my change here does that.)

OK, so as it currently stands, backslash at end of string is prohibited in the interests of allowing backslash to escape quotes that might be embedded within the string. But the embedded quote scenario doesn't work because the backslash remains in the string. So the current state of play is plain broken.

Considering:

(a) We already have the ability to use either single or double quotes around the string, which gives the chance to use the other quote within the string.
(b) The "principle of least surprise" for raw string would be to have raw mean "Never Escape Anything".
(c) Backslash on end of string is a trap waiting to happen for Windows users.

...I think there is strong motivation to abandon the currently broken "backslash escapes quote" behavior and just let raw strings be totally raw. Furthermore, it's hard to imagine that such a move would break anything.

--
nosy: +gwideman
type: -> behavior
versions: +Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3 -Python 2.4

Python tracker <http://bugs.python.org/issue1271>
[issue11451] Raw string parsing fails with backslash as last character
New submission from Graham Wideman:

This is a copy of issue 1271 because I couldn't find a way to reopen it. So, repeating my comment here:

As it currently stands, backslash at end of string is prohibited, apparently in the interests of supposedly allowing backslash to escape quotes that might be embedded within the string. But the supposedly beneficial backslash-escaping-embedded-quote behavior is broken because the backslash remains in the string.

Consider:

(a) We already have the ability to use either single or double quotes around the string, which gives the chance to use the other quote within the string.
(b) The "principle of least surprise" for raw string would be to have raw mean "Never Escape Anything".
(c) Backslash on end of string is currently a trap waiting to happen for Windows paths.

So I think there is strong motivation to abandon the currently broken "backslash escapes quote" behavior and just let raw strings be totally raw. Furthermore, it's hard to imagine that such a move would break anything. (Famous last words, I know... but I challenge anyone to contrive such a scenario!)

--
components: Interpreter Core
messages: 130443
nosy: QuantumTim, facundobatista, georg.brandl, gwideman
priority: normal
severity: normal
status: open
title: Raw string parsing fails with backslash as last character
type: behavior
versions: Python 2.5, Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3

Python tracker <http://bugs.python.org/issue11451>
[issue1271] Raw string parsing fails with backslash as last character
Graham Wideman added the comment:

@Glenn Linderman: I too am usually quick to assume that "innocent fixes" may have serious unforeseen impacts, but in this case I'm not convinced. What would matter is to enumerate the current behavior, and of that what would be changed. You seem to have had experience with other raw-string features/gotchas -- please share! :-)

@David Murray: Excuse denseness on my part, but I'm not following the logic of your first paragraph. I think you are saying that current raw string has to do something special to be able to contain the sequence backslash-quote, and this has the side effect of precluding that sequence appearing last in a string.

But surely a completely-escape-free string could also contain backslash-quote just fine (assuming the string is surrounded by the other kind of quote). So I'm thinking that the case you mention is not the driver here.

It's conceivable there is some more complicated case where backslash-singlequote AND backslash-doublequote MUST appear literally in the same string. However, it seems a little bizarre to worry about that case, but not worry about the simpler case of wanting both a plain singlequote and a plain doublequote in the same string. Maybe there's some popular regular expression that calls for this complexity.

I concur that inspection of the parser (and the history and intent of this design) would be fascinating.

--
Python tracker <http://bugs.python.org/issue1271>
[issue1271] Raw string parsing fails with backslash as last character
Graham Wideman added the comment:

Thanks to all for your patient comments. I think I am resigned to raw-string forever being medium-rare-string :-).

Perhaps it's obvious once you get over the initial shock of non-rawness, but workarounds for the disallowed trailing backslash include (note the final space character):

    mydir = r"C:\somedir\ ".rstrip()

or...

    mydir = r"C:\somedir\ "[:-1]

It might be worth mentioning one of these in the raw string docs to emphasize that there is this gotcha and that it's easy to fix, and to promote this as an idiom that becomes familiar in applications where it's needed.

--
Python tracker <http://bugs.python.org/issue1271>
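The two workarounds from this message, plus one further option (concatenating with an ordinary string literal, which is an addition here and not something the message proposes), can be compared in a short sketch:

```python
# Workarounds from the message: end the raw literal with a space, then drop it.
sliced = r"C:\somedir\ "[:-1]
stripped = r"C:\somedir\ ".rstrip()  # caution: removes ALL trailing whitespace

# Alternative: adjacent literals are joined at compile time, so the final
# backslash can come from a normal (non-raw) literal.
concatenated = r"C:\somedir" "\\"

print(sliced)
print(sliced == stripped == concatenated)
```

The slicing form removes exactly one character, so it is the safer of the first two when the path itself might legitimately end in whitespace.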
[issue11479] Add discussion of trailing slash in raw string to tutorial
Graham Wideman added the comment:

Eli: Excellent and thoughtful point. This would indeed be exactly the place to suggest os.path.join as an alternative.

In addition, there are still occasions where one needs to form a string with trailing backslash. Two examples:

1. When writing the string specifying root directory: r'C:\ '[:-1]
2. Using python to prepare command lines to run other command line programs, where an argument may require a final backslash to explicitly specify a target directory (as opposed to a file).

--
Python tracker <http://bugs.python.org/issue11479>
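A small sketch of the os.path.join alternative. It uses the ntpath module directly (the implementation behind os.path on Windows) so that the result is the same on any platform; the directory name is the one from the earlier examples.

```python
import ntpath

# Joining with an empty final component yields a trailing separator,
# so no raw-string literal has to end in a backslash.
target_dir = ntpath.join(r"C:\somedir", "")

print(target_dir)
```

On Windows itself, os.path.join(r"C:\somedir", "") gives the same result, which covers the second use case above (an argument that must end in a backslash to denote a directory).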
[issue11553] Docs for: import, packages, site.py, .pth files
New submission from Graham Wideman:

The overall scope of this issue is that current Python documentation gives vague, sometimes incorrect, information about the set of Python features involved in modularizing functionality. This issue presents an obstacle to programmers making smooth transitions from a single module, to collections of modules and packages, then on to neatly organized common packages shared between projects.

The problem affects documentation of:

import and from...import statements

The Language Reference is way too complicated for the mainstream case. Exactly what variants of arguments are possible, and what are their effects? What are the interactions with package features, such as whether or not modules have been explicitly imported into package __init__.py?

sys.path
--------
Typical constituents; range of alternatives for adding more dirs

Module site.py
--------------
Multiple serious errors in the file docstring, relating to site-packages directories and .pth files

.pth files
----------
Incorrectly described in site.py, and then vaguely described in other docs. Are .pth files processed everywhere on sys.path? Can they be iterative? (No to both).

package structure
-----------------
Details of package structure have evidently changed over Python versions. Current docs are unclear on points such as:
-- is __init__.py needed on subpackage directories?
-- the __all__ variable: Does it act generally to limit visibility of a module or package's attributes, or does it pertain only to the 'from...import *' statement?

Details:
========

Language Reference
------------------
http://docs.python.org/py3k/reference/simple_stmts.html#the-import-statement

The description of the import statement is extensive, but dauntingly complicated for the reader trying to understand the mainstream case of simply importing modules or packages that are on sys.path. This is because the algorithm for finding modules tries numerous esoteric strategies before falling back on the plain-old-file-system method.

(Even now that I have a good understanding of the plain-old-file variations of import, I reread this and find it hard to comprehend, and disorganized and incomplete in presenting the available variations of the statement.)

Grammar issue: the grammar shown for the import statement shows:

    relative_module ::= "."* module | "."+

... which implies that relative module could have zero leading dots. I believe an actual relative path is required to have at least one dot (PEP 328). Evidently, in this grammar, 'relative_module' really means "relative or absolute path to module or package", so it would be quite helpful to change to:

    relative_path ::= "."+ module | "."+
    from_path ::= (relative_path | module)

etc. (Really 'module' is not quite right here either since it's used to mean module-or-package.)

site.py:

Module site.py implements the site-package related features. The docstring has multiple problems with consequences in other docs.

1. Does not mention user-specific site-package directories (implemented by addusersitepackages() )

2. Seriously misleading discussion of .pth files. In the docstring the example shows using pth files, called "package configuration files" in their comments, to point to actual package directories bar and foo located within the site-packages directory. This is an absolutely incorrect use of pth files: If foo and bar are packages in .../site-packages/, they do not need to be pointed to, they are already on sys.path. If the package dirs ARE pointed to by foo.pth and bar.pth, the modules inside them will be exposed directly to sys.path, possibly precipitating name collisions. Further, programmers following this example will create packages in which import statements will appear to magically perform relative imports without leading dots, leading to confusion over how the import statement is supposed to work. It may be that this discussion is held over from a time when "package" perhaps meant "Just a Bunch of Files in a Directory"?

3. The docstring (or other docs) should make clear that .pth files are ONLY processed within site-package directories (ie: only by site.py).

4. Bug: Minor: In addsitepackages(), the library directory for Windows (the else clause) is shown as lower-case 'lib' instead of 'Lib'. This has some possibility of causing problems when running from a case-sensitive server. In any case, if read as documentation it is misleading.

Tutorial
--------
6. Modules: http://docs.python.org/py3k/tutorial/modules.html

1. Discussion (6.1.2. The Module Search Path) is good as far as it goes, but it doesn'
[issue11669] Clarify Lang Ref "Compound statements" footnote
New submission from Graham Wideman:

In Language Ref section 7 "Compound Statements":
http://docs.python.org/release/3.1.3/reference/compound_stmts.html

there's a footnote regarding what happens to unhandled exceptions in a try-except statement:

    [1] The exception is propagated to the invocation stack only if there is no *finally* clause that negates the exception.

This is very unclearly worded, especially since the reader in need of this footnote is probably familiar with the *except* clause being the one to "negate" an exception, and may well think this footnote is in error.

This footnote could provide a more convincing explanation:

    [1] The exception is propagated to the invocation stack unless there is a finally clause which happens to raise another exception. That new exception causes the old exception to be lost.

--
assignee: docs@python
components: Documentation
messages: 132072
nosy: docs@python, gwideman
priority: normal
severity: normal
status: open
title: Clarify Lang Ref "Compound statements" footnote
type: behavior
versions: Python 2.5, Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4

Python tracker <http://bugs.python.org/issue11669>
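The situation the proposed footnote describes can be sketched as follows for Python 3. The function name and exception messages are made up; note that in Python 3 the original exception is not lost entirely, since it remains reachable through the __context__ attribute of the new one.

```python
def lose_original():
    try:
        raise ValueError("original problem")
    finally:
        # An exception raised here replaces the one already in flight.
        raise KeyError("raised in finally")


try:
    lose_original()
except Exception as exc:
    caught = exc

print(type(caught).__name__)              # KeyError
print(type(caught.__context__).__name__)  # ValueError
```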
[issue31136] raw strings cannot end with a backslash character r'\'
Graham Wideman added the comment:

Let us be clear here that this is NOT a case where the backslash escapes the subsequent quote. If it WAS such a case, then the sequence \' would leave only the quote in the output string. But it doesn't; it leaves the complete 2-character \' in the output string.

So essentially this is a case of the character sequence \' being given a special status that causes that character pair to have a special meaning in preference to the meaning of the individual characters.

So this IS a bug -- it may be "as designed", but that produces the bug in the name of this feature, "raw string", which is patently misleading and in violation of the principle of least surprise.

This is a feature (as the FAQ explains) provided explicitly for developers of regular expression parsers. So at best, these r-strings should be called "regex-oriented" string literals, which can be used elsewhere, at risk of knowing this gotcha.

--
nosy: +gwideman

Python tracker <https://bugs.python.org/issue31136>
[issue31136] raw strings cannot end with a backslash character r'\'
Graham Wideman added the comment:

Demonstration:

    print("x" + r' \' ' + "x")

produces

    x \' x

Where is this behavior _ever_ useful? Or if there is some use case for this, how frequent is it compared to the frequency of users expecting either that backslash does nothing special, or that it would behave like an escape, and not appear in the output?

I'm not here to suggest there's some easy fix for this. I just don't want this issue closed as "not a bug" without registering that this design is flawed.

--
Python tracker <https://bugs.python.org/issue31136>
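To make the demonstration above a little more explicit, this sketch breaks the raw literal into its individual characters. It only restates the behavior shown in the message; the variable names are arbitrary.

```python
raw = r' \' '

# The backslash lets the quote appear without ending the literal,
# yet the backslash itself stays in the resulting string.
chars = list(raw)

print(len(raw))
print(chars)
```

The string has four characters (space, backslash, quote, space), so the backslash neither "does nothing special" nor behaves like an ordinary escape.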
[issue18938] Prepend Is Not A Word
Graham Wideman added the comment:

"Prepend" appears in every online dictionary I consulted. For a dictionary to list it and give the usual meaning for it pretty much demonstrates "prepend" functioning as a real word. That, and its 1.3 million hits on Google.

"Prepend" certainly has a commonly understood meaning, particularly in computing. To the extent that "prepend" has become popular as the appropriate-sounding opposite of "append", that is exactly why it _should_ be used in this context... where one might well need to discuss adding strings before or after, and be clear about the distinction.

--
nosy: +gwideman

Python tracker <http://bugs.python.org/issue18938>
[issue18939] Venv docs regarding original python install
New submission from Graham Wideman:

http://docs.python.org/dev/library/venv.html

More detail needed regarding the original python environment

The article explains how to use venv to create a new python installation with independent libraries etc, and a means to activate one or another virtual python environment. However, there are some points regarding the original python environment which are cloudy.

(1) After pyvenv, what status does the original python installation have? Does pyvenv turn it into just one of now two or more virtual environments? Or is the original one special? Must it be specifically deactivated in order to activate a (different) virtual environment?

(2) The motivation behind venv seems to be to create virtual environments that can be substantially or completely separate from the system site directories or from the original python that pyvenv was run from. Yet elsewhere the doc discusses how pyvenv creates a pyvenv.cfg file with a home key pointing back to the originating Python installation, and "sys.base_prefix and sys.base_exec_prefix point to the non-venv Python installation which was used to create the venv."... which suggests that a venv is _not_ independent of its creating Python installation.

It would be helpful to provide some context for this seemingly contradictory information. Perhaps there are scenarios with differing degrees of independence, in which these pointers back to the originating Python installation may or may not be relevant?

(3) How do you proceed to create virtual environments from scratch when you have no initial python installation, or no python installation of that python version?

-- Hope these suggestions help.

--
assignee: docs@python
components: Documentation
messages: 197030
nosy: docs@python, gwideman
priority: normal
severity: normal
status: open
title: Venv docs regarding original python install
type: behavior
versions: Python 3.3

Python tracker <http://bugs.python.org/issue18939>
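The relationship between a venv and its originating installation, as raised in point (2), can be inspected from inside a running interpreter. This is a minimal sketch assuming Python 3.3 or later, where sys.base_prefix exists; the helper name in_virtual_environment is hypothetical.

```python
import sys


def in_virtual_environment():
    """True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the installation that created it.
    return sys.prefix != sys.base_prefix


print("prefix:     ", sys.prefix)
print("base prefix:", sys.base_prefix)
print("in a venv:  ", in_virtual_environment())
```

When the two values are equal, the interpreter is the original (non-virtual) installation; it does not appear to need any special "deactivation" of its own.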
[issue18939] Venv docs regarding original python install
Graham Wideman added the comment:

Additionally on the subject of venv docs: I would encourage making it clearer how activate changes the user's PATH. Both http://www.python.org/dev/peps/pep-0405/ and http://docs.python.org/3.3/library/venv.html talk about how activate adds the activated python binary to the path, but don't mention which path: The one for the current console session? The system PATH environment variable (Windows) or one of the bash startup scripts (unix)?

This is important, because it determines how far-reaching activation of a particular virtual environment is.

--
Python tracker <http://bugs.python.org/issue18939>
[issue18939] Venv docs regarding original python install
Graham Wideman added the comment:

Thanks R. David for your comments.

> It should also mention that the activation is per-shell-session,

.. which also has implications (or lack of effect) for launching from Windows Explorer, for example.

Seems like in practical use, one would need to set up a batch file or shell script to run a particular venv activate command and launch a command shell with that python environment already set up. "Shell for python 2.7.5 and library XYZ" etc. Advice along these lines would be helpful.

--
Python tracker <http://bugs.python.org/issue18939>
[issue18939] Venv docs regarding original python install
Graham Wideman added the comment:

@Vinay Sajip

Thanks for looking at this issue and adding the link to PEP 405, and your explanation "When working..." with helpful shebang comments.

That said, the combination of PEP 405 and this updated page doesn't clear things up completely.

Vinay remarks "The venv documentation does assume that the reader knows what virtual environments are and how they work." If so, let's have a link to where the reader can get up to speed on how they work. PEP 405 is a help, but doesn't detail the topics I raised in earlier thread messages. Also, different legacy virtual environment schemes work differently, so prior knowledge doesn't necessarily help.

But the article already has a link to "virtual environments"... to a box in the same article, which is at the heart of not bringing clarity to the topics at hand.

One problem is lack of clarity about what "active" and "activate" mean. Here is what I currently believe: In connection with venv, the term "active" is used in two relatively trivial but subtly different ways.

(1) "First on PATH": The phrase "active environment" may be used to simply indicate the python environment which will be found first via the user's shell PATH. Further, each venv-configured python virtual environment installation includes an "activate" script whose main effect is just to add that environment's bin or Scripts directory to the beginning of the user's PATH. This makes the selected python environment the default when the user types the 'python' command. This use of "active" or "activate" might better be termed "default" or "make default".

(2) "Actually running": A second meaning of "active" refers to an actually running instance of a python interpreter and its associated environment, whether or not it is first in the user's PATH. Any installed python (virtual or not) may be launched by explicitly invoking the complete path to its executable (eg: C:\python33\python.exe), whereupon that version of python will run, with its associated sys.path and so on.

These two meanings are obviously related. The particular python environment (virtual or not) that is "active" in the first sense, when invoked by a plain "python" command, will become "active" in the second sense. But a running python ("active" in the second sense) will not necessarily be the "active" one in the first sense.

Implications for installers: A library installer invoked from the command line, unless told otherwise, will presumably install its payload into the python environment found via PATH. Consequently, in preparation, the intended target python should be made "active" in the first sense.

I have not elaborated here on my other concern (since I don't understand the details) -- clarification of different degrees of isolation/autonomy which can be established for each virtual environment. I still believe that's important to understand, and the current article and PEP 405 don't cover it successfully, in my view.

--
resolution: fixed ->
status: closed -> open

Python tracker <http://bugs.python.org/issue18939>
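The two senses of "active" can be compared from a running interpreter. This is only a sketch: describe_active is a hypothetical helper, and looking up the names "python" and "python3" is an assumption about how the command is called on a given system.

```python
import shutil
import sys


def describe_active():
    """Report the interpreter found first on PATH and the one actually running."""
    # Sense (1): what a bare 'python' (or 'python3') command would launch.
    first_on_path = shutil.which("python") or shutil.which("python3")
    # Sense (2): the interpreter executing this code right now.
    running = sys.executable
    return {"first_on_path": first_on_path, "running": running}


info = describe_active()
print("first on PATH:   ", info["first_on_path"])
print("actually running:", info["running"])
```

If the script is launched by an explicit full path to some other interpreter, the two values will typically differ, which matches the distinction drawn in the message.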
[issue20906] Issues in Unicode HOWTO
Graham Wideman added the comment: > Do you want to provide a patch? I would be happy to, but I'm not currently set up to create a patch. Also, I hoped that an author who has more history with this article would supervise, especially where I don't know what the original intent was. > I find use of the word "narrative" intimidating in the context of a technical > documentation. Agreed. How about "In documentation such as the current article..." > In general, I find it disappointing that the Unicode HOWTO only gives > hexadecimal representations of non-ASCII characters and (almost) never > represents them in their true form. This makes things more abstract > than necessary. I concur with reducing unnecessary abstraction. No sure what you mean by "true form". Do you mean show the glyph which the code point represents? Or the sequence of bytes? Or display the code point value in decimal? > > This is a vague claim. Probably what was intended was: "Many > > Internet standards define protocols in which the data must > > contain no zero bytes, or zero bytes have special meaning." > > Is this actually true? Are there "many" such standards? > I think it actually means that Internet protocols assume an ASCII-compatible > encoding (which UTF-8 is, but not UTF-16 or UTF-32 - nor EBCDIC :-)). Ah -- yes that makes sense. > > --> "Non-Unicode code systems usually don't handle all of > > the characters to be found in Unicode." > The term *encoding* is used pervasively when dealing with the transformation > of unicode to/from bytes, so I find it confusing to introduce another term > here > ("code systems"). I prefer the original sentence. I see that my revision missed the target. There is a problem, but it is wider than this sentence. One of the most essential points this article should make clear is the distinction between older schemes with a single mapping: Characters <--> numbers in particular binary format. (eg: ASCII) ... versus Unicode with two levels of mapping... 
Characters <--> code point numbers <--> particular binary format of the number data and sequences thereof. In the older schemes, "encoding" referred to the one mapping: chars <--> numbers in particular binary format. In Unicode, "encoding" refers only to the mapping: code point numbers <--> binary format. It does not refer to the chars <--> code point mapping. (At least, I think that's the case. Regardless, the two mappings need to be rigorously distinguished.) On review, there are many points in the article that muddy this up. For example, "Unicode started out using 16-bit characters instead of 8-bit characters". Saying "so-an-so-bit characters" about Unicode, in the current article, is either wrong, or very confusing. Unicode characters are associated with code points, NOT with any _particular_ bit level representation. If I'm right about the preceding, then it would be good for that to be spelled out more explicitly, and used consistently throughout the article. (I won't try to list all the examples of this problem here -- too messy.) -- ___ Python tracker <http://bugs.python.org/issue20906> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
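A minimal Python 3 sketch of the two mappings described in the comment above, using the Black Chess Knight character that appears elsewhere in this thread. The variable names are illustrative only; nothing here is taken from the HOWTO itself.

```python
import unicodedata

ch = "\u265E"  # BLACK CHESS KNIGHT

# Mapping 1: character <--> code point (an integer; no bit width implied).
code_point = ord(ch)
name = unicodedata.name(ch)

# Mapping 2: code point <--> bytes, which depends on the chosen encoding.
utf8_bytes = ch.encode("utf-8")
utf16_bytes = ch.encode("utf-16-be")
utf32_bytes = ch.encode("utf-32-be")

print("U+%04X %s" % (code_point, name))
print(utf8_bytes.hex(), utf16_bytes.hex(), utf32_bytes.hex())
```

The same code point yields three different byte sequences, which is the sense in which "encoding" refers only to the second mapping.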
[issue20906] Issues in Unicode HOWTO
Graham Wideman added the comment:

A further issue regarding "one-to-one mappings".

Article: "Encodings don’t have to be simple one-to-one mappings like Latin-1. Consider IBM’s EBCDIC, which was used on IBM mainframes."

I don't think this paragraph is about one-to-one mappings per se (ie: one character to one code). It seems to be about whether ranges of characters whose code values are contiguous in one coding system are also contiguous in another coding system. The EBCDIC encoding is still one-to-one, I believe.

The subject of one-character-to-one-code mapping is important (normalization etc), though perhaps beyond the current article. But I think the article should avoid suggesting that many-to-one or one-to-many scenarios are common.
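The contiguity point can be checked with a short sketch. It assumes Python's `cp037` codec as a representative EBCDIC code page (other EBCDIC variants differ in detail), and it only examines the lowercase ASCII letters.

```python
import string

letters = string.ascii_lowercase
latin1 = letters.encode("latin-1")
ebcdic = letters.encode("cp037")  # assumption: cp037 stands in for "EBCDIC"

def is_contiguous(data):
    """True if each byte value is exactly one more than the previous one."""
    return all(b - a == 1 for a, b in zip(data, data[1:]))

latin1_contiguous = is_contiguous(latin1)
ebcdic_contiguous = is_contiguous(ebcdic)

# Each letter still maps to exactly one byte, and back again.
one_to_one = len(set(ebcdic)) == len(letters) and ebcdic.decode("cp037") == letters

print(latin1_contiguous, ebcdic_contiguous, one_to_one)
```

For these letters the mapping is one-to-one in both code pages; what differs is that the EBCDIC byte values have gaps (for example between 'i' and 'j').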
[issue20906] Issues in Unicode HOWTO
Graham Wideman added the comment: Antoine: Thanks for your comments -- this is slippery stuff. > It's better, but how about simply "In this article"? I was hoping to inform the reader that the hex representations are found in many articles, not just special to this one. > [ showing the glyph ] Agreed -- it would be good to show the glyphs mentioned. But in a way that isn't confusing if the user's web browser doesn't show it correctly. > For all intents and purposes, iso-8859-1 and friends *are* encodings > (and this is how Python actually names them). I am still mulling this over. iso-8859-1 is most literally an "encoding" in the old sense of the word (character <--> byte representation), and is not, per se, a unicode-related concept. I think part of the ambiguity problem here is that there are two subtly but importantly different ideas here: 1. Python string (capable of representing any unicode text) --> some full-fidelity and industry recognized unicode byte stream, like utf-8, or utf-32. I think this is legitimately described as an "encoding" of the unicode string. versus: 2. 1. Python string --> some other code system, such as ASCII, cp1250, etc. The destination code system doesn't necessarily have anything to do with unicode, and whole ranges of unicode's characters either result in an exception, or get translated as escape sequences. Ie: This is more usefully seen as a translation operation, than "merely" encoding. In 1, the encoding process results in data that stays within concepts defined within Unicode. In 2, encoding produces data that would be described by some code system outside of Unicode. At the moment I think Python muddles these two ideas together, and I'm not sure how to clarify this. > So it should say "16-bit code points" instead, right? I don't think Unicode code points should ever be described as having a particular number of bits. 
I think this is a core concept: Unicode separates the character <--> code point, and code point <--> bits/bytes mappings. At most, one might want to distinguish different ranges of unicode code points. Even if there is a need to distinguish code points <= 65535, I don't think this should be described as "16-bit", as it muddies the distinction between Unicode's two mappings. -- ___ Python tracker <http://bugs.python.org/issue20906> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
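The two cases distinguished in this comment can be sketched as follows. This is only an illustration of how `str.encode()` behaves in Python 3 with different targets and error handlers, not a proposal for HOWTO wording.

```python
s = "knight \u265E"

# Case 1: a Unicode encoding form. Every code point survives the round trip.
round_trip_ok = s.encode("utf-8").decode("utf-8") == s

# Case 2: a non-Unicode target. The default error handler raises...
try:
    s.encode("ascii")
    raised = False
except UnicodeEncodeError:
    raised = True

# ...while other handlers substitute something for the missing character.
replaced = s.encode("ascii", errors="replace")
escaped = s.encode("ascii", errors="backslashreplace")

print(round_trip_ok, raised, replaced, escaped)
```

In case 2 the result is either an exception or lossy/escaped output, which is why the comment describes it as closer to translation than to "merely" encoding.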
[issue20906] Issues in Unicode HOWTO
Graham Wideman added the comment: Marc-Andre: Thanks for commenting: > > 2. 1. Python string --> some other code system, such as > > ASCII, cp1250, etc. The destination code system doesn't > > necessarily have anything to do with unicode, and whole > > ranges of unicode's characters either result in an > > exception, or get translated as escape sequences. > > Ie: This is more usefully seen as a translation > > operation, than "merely" encoding. > Those are encodings as well. The operation going from Unicode to one of > these encodings is called "encode" in Python. Yes I am certainly aware that in Python parlance these are also called "encode" (and achieved with encode()), which, I am arguing, is one reason we have confusion. These are not encoding into a recognized Unicode-defined byte stream, they entail translation and filtering into the allowed character set of a different code system and encoding into that code system's byte representation (encoding). > > In 1, the encoding process results in data that stays within concepts > > defined within Unicode. In 2, encoding produces data that would be > > described by some code system outside of Unicode. > > At the moment I think Python muddles these two ideas together, > > and I'm not sure how to clarify this. > An encoding is a mapping of characters to ordinals, nothing more or less. In unicode, the mapping from characters to ordinals (code points) is not the encoding. It's the mapping from code points to bytes that's the encoding. While I wish this was a distinction reserved for pedants, unfortunately it's an aspect that's important for users of unicode to understand in order to make sense of how it works, and what the literature and the web says (correct and otherwise). > You are viewing all this from the a Unicode point of view, but please > realize that Unicode is rather new in the business and the many > other encodings Python supports have been around for decades. 
I'm advocating that the concepts be clear enough to understand that Unicode (UTF-whatever) works differently (two mappings) than non-Unicode systems (single mapping), so that users have some hope of understanding what happens in moving from one to the other. > > > So it should say "16-bit code points" instead, right? > > I don't think Unicode code points should ever be described as > > having a particular number of bits. I think this is a > > core concept: Unicode separates the character <--> code point, > > and code point <--> bits/bytes mappings. > You have UCS-2 and UCS-4. UCS-2 representable in 16 bits, UCS-4 > needs 21 bits, but is typically stored in 32-bit. Still, > you're right: it's better to use the correct terms UCS-2 vs. UCS-4 > rather than refer to the number of bits. I think mixing in UCS just adds confusion here. Unicode consortium has declared "UCS" obsolete, and even wants people to stop using that term: http://www.unicode.org/faq/utf_bom.html "UCS-2 is obsolete terminology... the term should now be avoided." (That's a somewhat silly position -- we must still use the term to talk about legacy stuff. But probably not necessary here.) So my point wasn't about UCS. It was about referring to code points as having a particular bit width. Fundamentally, code points are numbers, without regard to some particular computer number format. It is a separate matter that they can be encoded in 8, 16 or 32 bit encoding schemes (utf-8, 16, 32), and that is independent of the magnitude of the code point number. It _is_ the case that some code points are large enough integers that when encoded they _require_, say, 3 bytes in utf-8, or two 16-bit words in utf-16 and so on. But the number of bits used in the encoding does not necessarily correspond to the number of bits that would be required to represent the integer code point number in plain binary. (Only in UTF-32 is the encoded value simply the binary version of the code point value.) 
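To illustrate the point that encoded size is independent of the code point's magnitude as a plain integer, here is a small sketch. The helper name `encoded_sizes` is hypothetical and the sample code points are arbitrary.

```python
def encoded_sizes(cp):
    """Return the number of bytes used for one code point in each UTF."""
    ch = chr(cp)
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

for cp in (0x41, 0xE9, 0x265E, 0x1F600):
    print("U+%04X: %d bits as a plain integer, encoded sizes %s"
          % (cp, cp.bit_length(), encoded_sizes(cp)))

# Only in UTF-32 is the encoded value simply the code point in binary.
utf32_is_plain = int.from_bytes(chr(0x1F600).encode("utf-32-be"), "big") == 0x1F600
```

U+1F600 needs 17 bits as an integer yet occupies four bytes in all three encodings, while U+0041 needs 7 bits and occupies one, two, or four bytes.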
[issue20906] Issues in Unicode HOWTO
Graham Wideman added the comment: Marc-Andre: Thanks for your latest comments. > We could also have called encodings: "character set", "code page", > "character encoding", "transformation", etc. I concur with you that things _could_ be called all sorts of names, and the choices may be arbitrary. However, creating a clear explanation requires figuring out the distinct things of interest in the domain, picking terms for those things that are distinct, and then using those terms rigorously. (Usage in the field may vary, which in itself may warrant comment.) I read your slide deck/time-capsule-from-2002, with interest, on a number of points. (I realize that you were involved in the Python 2.x implementation of Unicode. Not sure about 3.x?) Page 8 "What is a Character?" is lovely, showing very explicitly Unicode's two levels of mapping, and giving names to the separate parts. It strongly suggests this HOWTO page needs a similar figure. That said, there are a few notes to make on that slide, useful in trying to arrive at consistent terms: 1. The figure shows a more precise word for "what users regard as a character", namely "grapheme". I'd forgotten that. 2. It shows e-accent-acute to demonstrate a pair of code points representing a single grapheme. That's important, but should avoid suggesting this as the only way to form e-accent-acute (canonical equivalence, U+00E9). 3. The illustration identifies the series of code points (the middle row) as "the Unicode encoding of the string". Ie: The grapheme-to-code-points mapping is described as an encoding. Not a wrong use of general language. But inconsistent with the mapping that encode() pertains to. (And I don't think that the code-point-to-grapheme transform is ever called "decoding", but I could be wrong.) 4. The illustration of Code Units (in the third row) shows graphemes for the Code Units (byte values). 
This confusingly glosses over the fact that those graphemes correspond to what you would see if you _decoded_ these byte values using CP1252 or ISO 8859-1, suggesting that the result is reasonable or useful. It certainly happens that people do this, deliberately or accidentally, but it is a misuse of the data, and should be warned against, or at least explained as a confusion.

Returning to your most recent message:

> In Python keep it simple: you have Unicode (code points) and
> 8-bit strings or bytes (code units).

I wish it _were_ that simple. And I agree that, in principle, (assuming Python 3+) there should be an "inside your program", where you have the str type which always acts as sequences of Unicode code points, and has string functions. And then there's "outside your program", where text is represented by sequences of bytes that specify or imply some encoding. And your program should use supplied library functions to mostly automatically convert on the way in and on the way out.

But there are enough situations where the Python programmer, having adopted Python 3's string = Unicode approach, sees unexpected results. That prompts reading this page, which is called upon to make the fine distinctions to allow figuring out what's going on.

I'm not sure what you mean by "8-bit strings" but I'm pretty sure that's not an available type in Python 3+. Ie: Some functions (eg: encode()) produce sequences of bytes, but those don't work entirely like strs.

---

This discussion to try to revise the article piecemeal has become pretty diffuse, with perhaps competing notions of purpose, and what level of detail and precision are needed etc. I will try to suggest something productive in a subsequent message.
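Two of the points above (bytes do not behave like str, and decoding UTF-8 bytes as CP1252 yields misleading graphemes) can be shown in a few lines of Python 3. The sample text is an arbitrary choice.

```python
text = "caf\u00e9"            # str: a sequence of code points
data = text.encode("utf-8")   # bytes: a sequence of code units

# bytes and str cannot be concatenated directly in Python 3.
try:
    "Hello " + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Decoding with the wrong codec does not fail here; it silently
# produces different characters (the confusion described in point 4).
wrong = data.decode("cp1252")
right = data.decode("utf-8")

print(mixed_ok, repr(wrong), repr(right))
```

The "wrong" result is a valid str, which is what makes this kind of mistake easy to overlook.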
[issue20906] Issues in Unicode HOWTO
Graham Wideman added the comment:

At the moment I've run out of time to exert much forward push on this. By way of temporary summary/suggestion for regrouping:

Focus on what this page is intending to deliver. What concepts should readers of this page be able to distinguish and understand when they are finished?

To scope out the needed concepts, I suggest identifying representative unicode-related stumbling blocks (possibly from stackoverflow questions). Here's an example case: just trying to get trivial "beyond ASCII" functionality to work on Windows (Win7, Python 3.3):

    s = 'knight \u265E'
    print('Hello ' + s)

... which fails with:

    "UnicodeEncodeError: 'charmap' codec can't encode character '\u265e' in position 13: character maps to undefined".

A naive attempt to fix this by using s.encode() results in the "+" operation failing.

What paths forward do programmers explore in an effort to have this code (a) not throw an exception, and produce at least some output, and (b) make it produce the correct output? And why does it work as intended on linux?

The set of concepts identified and explained in this article needs to be sufficient to underpin an understanding of the distinct data types, encodings, decodings, translations, settings etc relevant to this problem, and how to use them to get a desired result.

There are similar problems that occur at other Python-system boundaries, which would further illuminate the set of necessary concepts.

Thanks for all comments.

-- Graham
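The failure described above depends on the console, so it will not reproduce on every system. The sketch below sidesteps the console by encoding explicitly with `cp437`; that choice is an assumption standing in for whatever encoding a given Windows console reports.

```python
s = "knight \u265E"
message = "Hello " + s

# Stand-in for what print() does when stdout expects cp437.
try:
    message.encode("cp437")
    error_position = None
except UnicodeEncodeError as exc:
    error_position = exc.start

# The naive fix: s.encode() returns bytes, which cannot be added to str.
try:
    "Hello " + s.encode()
    concat_failed = False
except TypeError:
    concat_failed = True

print(error_position, concat_failed)
```

The reported position matches the index of the knight character in the combined string, as in the error message quoted above.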
[issue20906] Issues in Unicode HOWTO
Graham Wideman added the comment:

@Andre: _I_ know more or less the explanations behind all this. I am just putting it forward as an example which touches several concepts which are needed to explain it, and that a programmer might reason with to change a program (or the environment) to produce some output (instead of an exception), and possibly even the intended output.

For example, behind the brief explanation you provide, here are some of the related concepts:

1. print(s) sends output to stdout, which sends data to the Windows console (cmd.exe).
2. In the process, the output function that print --> stdout invokes attempts to encode s according to the encoding that the destination, cmd.exe, reports that it expects.
3. On Windows (in English, or perhaps it's US locale), cmd.exe defaults to expecting encoding cp437.
4. cp437 is an encoding containing only 256 characters. Many Unicode code points obviously have no corresponding character in cp437.
5. The encoding process used by print() is set to raise an exception on characters that have no mapping to the encoding wanted by stdout.
6. Consequently, print() throws an exception on code points outside of those representable in cp437.

Based on that, there are a number of moves the programmer might make, with varying results... possibly involving:

-- s.encode([various choices of options here]) --> s_as_bytes
-- print(s_as_bytes) (noting that 'Hello ' + s_as_bytes doesn't work)
-- Or maybe ascii(s)
-- Or possibly sys.stdout.buffer.write()
-- Pros and cons of the above, which require careful tracking of what the resulting strings or byte sequences "really mean" at each juncture.
-- cmd.exe chcp 65001 --> so print(unicode) won't raise an exception, but still many chars will show as [?]
-- various font choices in cmd.exe which might be able to show the needed graphemes.
-- Automatic font substitution that occurs in some contexts when the selected font doesn't contain a requested code point and its grapheme.

... and probably more concepts that I've missed.

-- Graham
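A hedged sketch of three of the moves listed above. `console_encoding` is an assumed value (a real program might consult `sys.stdout.encoding`), and `emit_utf8` is a hypothetical helper introduced only for this illustration.

```python
console_encoding = "cp437"  # assumption: what the console expects
s = "Hello knight \u265E"

# Move 1: encode with a lenient error handler, accepting lossy output.
lossy = s.encode(console_encoding, errors="replace").decode(console_encoding)

# Move 2: ascii() escapes every non-ASCII character (and adds quotes).
escaped = ascii(s)

# Move 3: bypass the text layer and write UTF-8 bytes to a binary stream.
def emit_utf8(text, binary_stream):
    """Write text as UTF-8 bytes; the receiver must interpret them as UTF-8."""
    binary_stream.write(text.encode("utf-8") + b"\n")

print(lossy)
print(escaped)
```

In a script, `emit_utf8(s, sys.stdout.buffer)` would send the raw bytes to the console; whether they display correctly then depends on the console code page and font, which are the remaining items in the list.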
[issue20906] Issues in Unicode HOWTO
Graham Wideman added the comment: @R David: I agree with you. Thanks for extending the line of thinking I outlined.
[issue19136] CSV, builtin open(), newline arg. Docs broken again.
New submission from Graham Wideman:

The docs appear to be incorrect for CSV at: http://docs.python.org/3.3/library/csv.html.

Per issue http://bugs.python.org/issue7198 , there's a long history of contention between the builtin open() and csv.writer, in which, on Windows, the default result is an unwanted additional '\r'. That was 'fixed' by using the newline='' argument to open(). This is reflected in the docs at the above link:

    with open('eggs.csv', 'w', newline='') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=...)

However, in python 3.3.2 use of the newline argument returns: "TypeError: 'newline' is an invalid keyword argument for this function."

In brief testing, it appears that a correct result can be obtained by calling open as follows:

    with open(somepath, 'wb') as writerfile:
        writer = csv.writer(writerfile, delimiter=...)

Note: binary mode, not text as previously needed and currently documented.

--
assignee: docs@python
components: Documentation
messages: 198752
nosy: docs@python, gwideman
priority: normal
severity: normal
status: open
title: CSV, builtin open(), newline arg. Docs broken again.
type: behavior
versions: Python 3.3, Python 3.4

Python tracker <http://bugs.python.org/issue19136>
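For reference, a self-contained sketch of the documented Python 3 pattern (text mode with `newline=''`). The follow-up message in this thread traces the reported TypeError to a Python 2.7 interpreter being launched, so this sketch assumes it really is run under Python 3. The temporary path and sample rows are illustrative.

```python
import csv
import os
import tempfile

rows = [["spam", "eggs"], ["1", "2"]]
path = os.path.join(tempfile.mkdtemp(), "eggs.csv")

# newline='' stops the text layer from translating the '\r\n' that
# csv.writer emits by default, so no extra '\r' appears on Windows.
with open(path, "w", newline="") as csvfile:
    writer = csv.writer(csvfile, delimiter=",")
    writer.writerows(rows)

# Read the file back as bytes to inspect the actual line endings.
with open(path, "rb") as f:
    raw = f.read()

print(raw)
```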
[issue19136] CSV, builtin open(), newline arg. Docs broken again.
Graham Wideman added the comment:

David: Yes, as it turns out you are absolutely right, in a manner of speaking. I have retested this exhaustively today, and here's the root cause.

It turns out that in testing, I must have activated a particular simplified test script by invoking only scriptname.py rather than invoking 'python scriptname.py'. (And then repeating that mistake by reinvoking via console history... doh!)

The latter reliably invokes python 3.3.2, because that's the only python in my PATH. The former, it turns out, invokes the Windows Python Launcher, which finds a previously installed Python 2.7.1, despite that not being on the PATH. So, in my mind, the possibility of launching any version other than Python 3.3.2 did not enter the picture.

Prior to this, I was only vaguely aware that Windows Python Launcher existed. Ironically, it was probably installed by Python 3.3.2.

Sorry for the bogus bug alert.
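One way to catch this kind of mix-up is to have the test script report which interpreter is actually running it. The sketch below deliberately avoids Python 3-only syntax so that it also runs under a 2.x interpreter; `describe_interpreter` is a hypothetical helper name.

```python
import sys

def describe_interpreter():
    """Return the running interpreter's version and executable path."""
    version = "%d.%d.%d" % tuple(sys.version_info[:3])
    return "Python %s at %s" % (version, sys.executable)

# A single parenthesized argument prints the same way on 2.x and 3.x.
print(describe_interpreter())
```

Running the script once as `scriptname.py` and once as `python scriptname.py` would then show whether the two invocations reach the same interpreter.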
[issue19141] Windows Launcher fails to respect PATH
New submission from Graham Wideman:

Python Launcher for Windows provides some important value for Windows users, but its ability to invoke python versions not on the PATH is a problem.

py.exe chooses a version of Python to invoke, in more or less this order of decreasing priority; it is the *last* one that occurs by default in a new install of python 3.3.x:

1. Shebang line in myscript.py
2. py.exe -n argument (n can be 2, 3, 3.2 etc). Launcher chooses the latest installed version so specified.
3. PY_PYTHON environment variable
4. py.ini in C:\WINDOWS or user's %LOCALAPPDATA% directory
5. Launcher hunts through registry for ALL previously installed pythons, and picks the latest version in the 2.x series. DEFAULT.

The first issue to note is that, to my knowledge, the exact precedence order is not documented... it would greatly help if this were done.

That said, the focus in this report is case 5, which as noted is the default behavior when python 3.3.2 is installed (and py.exe invoked with scripts having no launcher-aware shebang line).

In case 5, py.exe completely ignores the PATH environment variable. So, whereas PATH is used to find py.exe, or when the user invokes 'python' on the command line, py.exe ignores PATH and launches a version of python that is not necessarily in the PATH.

In case 2 where the user supplies a value for 'n', finding a non-PATH version of python is excusable on the basis that the user deliberately requests a version.

However, in case 5, the user is not invoking py explicitly, and is not necessarily aware of py's algorithm for finding all installed versions. The user might reasonably expect that invoking a script or double clicking it would just invoke 'python' the same as the 'python' command, using PATH.
In particular, if the user understands how PATH works (as reviewed in the docs here: http://docs.python.org/3/using/windows.html#finding-the-python-executable), then upon installing 3.3.x, he or she might explicitly *remove* python 2.x from PATH in the expectation that this will disable python 2.x. It is surprising and potentially harmful that py.exe does not abide by that choice. A potential improvement is to interpose an item '4.5' in the above list, in which py.exe looks for a version of python on the PATH before falling back to searching for latest 2.x python ever installed. (It is not clear that py.exe should *ever* fallback to just picking the latest 2.x in the registry (item 5). It is conceivable that a user may have configured one of those pythons to do something destructive or insecure on startup, and it will be a great surprise if py.exe "randomly" invokes it just because it has the highest version number.) -- components: Windows messages: 198812 nosy: gwideman priority: normal severity: normal status: open title: Windows Launcher fails to respect PATH type: behavior versions: Python 3.3, Python 3.4, Python 3.5 ___ Python tracker <http://bugs.python.org/issue19141> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
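The precedence order and the proposed step "4.5" can be modelled with a small sketch. This is not the launcher's actual implementation: `choose_python` and all of its parameters are hypothetical, and each argument simply stands for "the value this source would supply, if any".

```python
def choose_python(shebang=None, cli_version=None, env_version=None,
                  ini_version=None, path_python=None, registry_default=None):
    """Return (source, value) for the first source that supplies a value."""
    candidates = [
        ("shebang", shebang),            # 1. shebang line in the script
        ("command line", cli_version),   # 2. py.exe -n argument
        ("PY_PYTHON", env_version),      # 3. environment variable
        ("py.ini", ini_version),         # 4. ini file
        ("PATH", path_python),           # 4.5 proposed: python found on PATH
        ("registry", registry_default),  # 5. fallback reported in this issue
    ]
    for source, value in candidates:
        if value:
            return source, value
    return None, None

print(choose_python(path_python="C:\\Python33\\python.exe", registry_default="2.7"))
```

In a real check, `path_python` could be obtained with `shutil.which("python")`. With the proposed step in place, the registry fallback is reached only when nothing on PATH is found.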
[issue19141] Windows Launcher fails to respect PATH
Graham Wideman added the comment: Hi Vinay, thanks for commenting. And of course for your efforts on py.exe (and no doubt the debate process.) I am trying to draw attention to the situation where the script has no shebang line, and there is no other explicit configuration info for py.exe. (No py.ini file, no py.exe envt variables, no py.exe-specific command-line args). In that case, the next thing py.exe should check, in my view, is the user's PATH, where they may well have defined which python version they prefer (even if they are unaware of PEP 397 and Launcher). This rationale is parallel to the one in #17903 that you pointed to. Currently, py.exe ignores PATH in that case, and falls back to looking through all installed pythons and picking the latest 2.x if available. > The choosing of 2.x vs. 3.x is also mentioned in the PEP The discussion of that issue would be illuminating, but I couldn't find it. Could you point to where this is mentioned in PEP-0397? Thanks again. -- ___ Python tracker <http://bugs.python.org/issue19141> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue19805] Revise FAQ Dictionary: consistent key order item
New submission from Graham Wideman: FAQ entry: http://docs.python.org/3/faq/programming.html#how-can-i-get-a-dictionary-to-display-its-keys-in-a-consistent-order claims that there's no way for a dictionary to return keys in a consistent order. However, there's OrderedDict which should probably be mentioned here. -- assignee: docs@python components: Documentation messages: 204550 nosy: docs@python, gwideman priority: normal severity: normal status: open title: Revise FAQ Dictionary: consistent key order item type: enhancement versions: Python 3.3 ___ Python tracker <http://bugs.python.org/issue19805> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
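A brief sketch of the two options a revised FAQ entry might mention: `collections.OrderedDict`, which remembers insertion order, and `sorted()`, which gives a consistent order for any dictionary. The keys are arbitrary sample data.

```python
from collections import OrderedDict

# OrderedDict iterates in the order the keys were inserted.
d = OrderedDict()
d["pear"] = 1
d["apple"] = 2
d["fig"] = 3
insertion_order = list(d)

# sorted() gives a repeatable order regardless of how the dict stores keys.
plain = {"pear": 1, "apple": 2, "fig": 3}
sorted_order = sorted(plain)

print(insertion_order, sorted_order)
```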
[issue20906] Unicode HOWTO
New submission from Graham Wideman: The Unicode HOWTO article is an attempt to help users wrap their minds around Unicode. There are some opportunities for improvement. Issues presented in order of the narrative: http://docs.python.org/3.3/howto/unicode.html History of Character Codes --- References to the 1980's are a bit off. "In the mid-1980s an Apple II BASIC program..." Assuming the comment is about the state of play in the mid-80's, then: The Apple II appeared in 1977. By 1985 we already had Macs, and PCs running DOS, which were capable of various character sets (not to mention lowercase letters!) "In the 1980s, almost all personal computers were 8-bit" Both the PC (1983) and Mac (1984) had 16-bit processors. Definitions: "Characters are abstractions": Not helpful unless one already knows what "abstraction" means in this specific context. "the symbol for ohms (Ω) is usually drawn much like the capital letter omega (Ω) in the Greek alphabet [...] but these are two different characters that have different meanings." Omega is a poor example for this concept. Omega is used as the identifier for a unit in the same way as "m" is used for meter, or "A" is used for ampere. Each is a specific use of a character, which, like any specific use, has a particular meaning. However, having a particular meaning doesn't necessarily require a separate character, and in the case of omega, the Unicode standard now says that the separate "ohm" character is deprecated. "The ohm sign is canonically equivalent to the capital omega, and normalization would remove any distinction." http://www.unicode.org/versions/Unicode4.0.0/ch07.pdf#search=%22character%20U%2B2126%20maps%20OR%20map%20OR%20mapping%22 A better example might be the roman numerals, code points U+2160 and subsequent. Definitions "A code point is an integer value, usually denoted in base 16." 
When trying to convey clearly the distinction between character, code point, and byte representation, the topic of "how it's denoted" is a potential distraction for the reader, so I suggest this point be a bit more explicitly parenthetical, and less confusable with "16 bit". Like:

"A code point value is an integer in the range 0 to 0x10FFFF (about 1.1 million, with some 110 thousand assigned so far). In a narrative such as the current article, a code point value is usually written in hexadecimal. The Unicode standard displays code points with the notation U+265E to mean the character with value 0x265e (9822 decimal; "Black Chess Knight" character)."

(Also revise subsequent para to use same example character. I suggest not using "Ethiopic Syllable WI", because it's unfamiliar to most readers, and it muddies the topic by suggesting that Unicode in general captures _syllables_ rather than _characters_.)

Encodings:
---

"This sequence needs to be represented as a set of bytes"
--> "This code point sequence needs to be represented as a sequence of bytes"

"4. Many Internet standards are defined in terms of textual data"
This is a vague claim. Probably what was intended was: "Many Internet standards define protocols in which the data must contain no zero bytes, or zero bytes have special meaning." Is this actually true? Are there "many" such standards?

"Generally people don’t use this encoding,"
Probably "people" per se don't use any encoding, computers do.
--> "Because of these problems, other more efficient and convenient encodings have been devised and are commonly used."

For continuity, directly after that para should come the later paras starting with "UTF-8 is one of the most common".

"2. A Unicode string is turned into a string of bytes..."
--> "2. A Unicode string is turned into a sequence of bytes..."
(Ie: don't overload "string" in an article about strings and encodings.)
Create a new subhead "Converting from Unicode to non-Unicode encodings", and move under it the paras: "Encodings don't have to..." "Latin-1, also known as..." "Encodings don't have to..." But also revise: "Encodings don’t have to handle every possible Unicode character, and most encodings don’t." --> "Non-Unicode code systems usually don't handle all of the characters to be found in Unicode." -- assignee: docs@python components: Documentation messages: 213367 nosy: docs@python, gwideman priority: normal severity: normal status: open title: Unicode HOWTO type: enhancement versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5 ___ Python tracker <http://bugs.python.org/issue20906> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
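Several claims in this submission can be checked with the standard `unicodedata` module. The sketch below covers the ohm/omega canonical equivalence, the suggested roman numeral alternative, and the U+265E notation; it is an illustration, not proposed HOWTO text.

```python
import unicodedata

ohm, omega = "\u2126", "\u03A9"
names = (unicodedata.name(ohm), unicodedata.name(omega))

# Canonical equivalence: NFC normalization turns OHM SIGN into OMEGA.
ohm_becomes_omega = unicodedata.normalize("NFC", ohm) == omega

# Roman numeral one (U+2160) survives NFC; only the compatibility
# form NFKC folds it into the Latin capital letter I.
roman_one = "\u2160"
roman_kept_by_nfc = unicodedata.normalize("NFC", roman_one) == roman_one
roman_nfkc = unicodedata.normalize("NFKC", roman_one)

# The U+XXXX notation is the code point written in hexadecimal.
knight = "\u265E"
notation = "U+%04X" % ord(knight)

print(names, ohm_becomes_omega, roman_kept_by_nfc, roman_nfkc, notation, ord(knight))
```

That the roman numerals stay distinct under canonical normalization is what makes them a workable example of "different characters that look like existing letters".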