Toshio Kuratomi <[EMAIL PROTECTED]> added the comment:

> The bug tracker is maybe not the right place to discuss a new Python3 feature.

It's a bug! But if you guys want it to be a feature, then what mailing list do I need to join? Is there one devoted to Unicode, or is python-dev where I need to go?

>> 1) return mixed unicode and byte types in os.environ
> One goal of Python3 was to avoid mixing bytes and characters (bytes/str).

As stated in my evaluation of the four options, +1 to this; option #1 takes us back to the problems encountered in Python 2.

>> 2) return only byte types in os.environ
> os.environ contains text (characters) and so should decoded as unicode.

This is correct but is not accurate :-) os.environ, the Python variable, contains only unicode because that's the way it's coded. However, the Unix environment that os.environ attempts to give access to contains bytes, which are almost always representable as characters. The two caveats are:

1) There's nothing that constrains it to characters -- putting byte sequences that do not include null in the environment is valid.
2) The characters in the environment may be in mixed encodings, sometimes due to things outside of the user's control.

>> 3) raise an exception if someone attempts to access an environment
>> variable that cannot be decoded to unicode via the system encoding and
>> allow the value to be accessed as a byte string via another method.
>> 4) silently ignore the non-decodable variables when accessing os.environ
>> the normal way but have another method of accessing it that returns all
>> values as byte strings.
>
> Why not for (3).

Do you mean, "I support 3"? Or did you not finish a thought here?

> But what would be the "another method" (4) to access byte
> string? The problem of having two methods is that you need consistent
> objects.

This is exactly the problem I was talking about in my analysis of #4 in the previous comment.
This problem plagues the new os.listdir() method as well: it introduces a construct that programmers can use (os.listdir('.')) that doesn't give all the information, but also doesn't warn the programmer when information is not being shown.

> Imagine that you have os.environ (unicode) and os.environb (bytes).
>
> Example 1:
> os.environb['PATH'] = b'\xff\xff\xff\xff'
> What is the value in os.environ['PATH']?

Since option 4 mimics the os.listdir() method, accessing os.environ['PATH'] would give you a KeyError, i.e., the value would be silently dropped, just as os.listdir('.') silently drops undecodable filenames.

> Example 2:
> os.environb['PATH'] = b'têst'
> What is the value in os.environ['PATH']?

This doesn't work in Python 3, since bytes literals can only contain ASCII characters.

> Example 3:
> os.environ['PATH'] = 'têst'
> What is the value in os.environb['PATH']?

That depends on the default system encoding. Assuming a utf-8 encoding, os.environb['PATH'] == b't\xc3\xaast'

> Example 4:
> should I use os.environ['PATH'] or os.environb['PATH'] to get the current
> PATH?

Should you use os.listdir('.') or os.listdir(b'.') to get the list of files in the current directory? This is where treating pathnames, environment variables, etc. as strings instead of bytes becomes non-simple. Now you have to decide what you really want to know (and possibly keep two slightly different values if you want to know two things).

If you want to keep the path in order to look up commands that the user can run, you want os.environb['PATH'], since this is exactly what the shell will use when the user types a command at the command line.

If you want to display the elements of the PATH for the user, you probably want this::

    import os
    import sys

    # DEFAULT_PATH is an application-defined fallback list of directories.
    try:
        path = os.environ['PATH'].split(':')
    except KeyError:
        # The text view dropped PATH; fall back to the bytes view.
        try:
            temp_path = os.environb['PATH'].split(b':')
        except KeyError:
            path = DEFAULT_PATH
        else:
            path = []
            for directory in temp_path:
                path.append(str(directory, sys.getdefaultencoding(), 'replace'))

> It introduces many new cases (bugs?)
> that have to be prepared and tested.

Those bugs are *already present*. Without taking one of the four options, there's simply no way to code a solution. Take the above code and imagine that there's no way to access the user's PATH variable when a non-default-encoding character is present in the PATH. That means that you're always stuck with the value of DEFAULT_PATH instead of being able to display something reasonable to the user.

(Note, these examples are pretty much the same for option #3 or option #4. The value of option #3 becomes apparent when you use os.getenv('PATH') instead of os.environ['PATH'].)

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue4006>
_______________________________________
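
For anyone following the four options, here is a minimal sketch of how options 3 and 4 would differ in practice. It uses a plain dict of bytes as a stand-in for the real process environment. The helper names (decode_environ, getenv_strict), the LABEL variable, and the fixed utf-8 encoding are assumptions made purely for illustration; none of this is the os module's actual API or a proposed patch.

```python
def decode_environ(raw_env, encoding="utf-8"):
    """Option 4 sketch: build a text view, silently dropping any
    entry whose name or value cannot be decoded."""
    text_env = {}
    for key, value in raw_env.items():
        try:
            text_env[key.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            continue  # the entry disappears without warning
    return text_env


def getenv_strict(raw_env, name, encoding="utf-8"):
    """Option 3 sketch: decode on access and let the error propagate
    instead of hiding the variable."""
    return raw_env[name.encode(encoding)].decode(encoding)


# Hypothetical bytes-level environment, mirroring Examples 1 and 3.
raw_env = {
    b"HOME": b"/home/user",
    b"PATH": b"\xff\xff\xff\xff",        # Example 1: not valid utf-8
    b"LABEL": "têst".encode("utf-8"),    # Example 3: text encoded to bytes
}

text_env = decode_environ(raw_env)
```

With this data the text view simply has no PATH key (the KeyError case from Example 1), whereas getenv_strict(raw_env, "PATH") raises UnicodeDecodeError, so the caller at least learns that a value exists but could not be decoded.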