Re: [Python-Dev] File encodings
Gustavo Niemeyer wrote:
> Given the fact that files have an 'encoding' parameter, and that
> any unicode strings with characters not in the 0-127 range will
> raise an exception if being written to files, isn't it reasonable
> to respect the 'encoding' attribute whenever writing data to a
> file?

In general, files don't have an encoding parameter - sys.stdout is an exception.

The reason why this works for print and not for write is that I considered "print unicodeobject" important, and wanted to implement that. file.write is an entirely different code path, so it doesn't currently consider Unicode objects; instead, it only supports strings (or, more generally, buffers).

> This difference may become a really annoying problem when trying to
> internationalize programs, since it's usual to see third-party code
> dealing with sys.stdout, instead of using 'print'.

Apparently, it isn't important enough that somebody had analysed this, and offered a patch. In any case, it would be quite unreliable to pass unicode strings to .write even *if* .write supported .encoding, since most files don't have .encoding. Even sys.stdout does not always have .encoding - only when it is a terminal, and only if we managed to find out what the encoding of the terminal is.

Regards,
Martin

___
Python-Dev mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
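Martin's caveat, that a stream may or may not carry an .encoding attribute, can be illustrated with a small fallback helper. This is only a sketch written for a current Python interpreter, not code from the thread; pick_encoding and its default parameter are hypothetical names, and the "ascii" fallback is an assumption chosen to mirror the default seen in the traceback elsewhere in this thread.

```python
import io


def pick_encoding(stream, default="ascii"):
    """Return the stream's declared encoding, or `default` if it has none.

    Hypothetical helper for illustration only; it is not part of the
    file API discussed in the thread.
    """
    return getattr(stream, "encoding", None) or default


# A raw byte buffer has no `encoding` attribute, so the default is used.
raw = io.BytesIO()

# A text wrapper declares one, much like sys.stdout attached to a terminal.
text = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

print(pick_encoding(raw))
print(pick_encoding(text))
```

Code that writes to arbitrary file-like objects would need some fallback of this kind, which is the unreliability Martin is pointing at.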
Re: [Python-Dev] File encodings
Gustavo Niemeyer wrote:
> Greetings,
>
> Today, while trying to internationalize a program I'm working on,
> I found an interesting side-effect of how we're dealing with
> encoding of unicode strings while being written to files.
>
> Suppose the following example:
>
>   # -*- encoding: iso-8859-1 -*-
>   print u"á"
>
> This will correctly print the string 'á', as expected. Now, what
> surprises me, is that the following code won't work in an equivalent
> way (unless using sys.setdefaultencoding()):
>
>   # -*- encoding: iso-8859-1 -*-
>   import sys
>   sys.stdout.write(u"á\n")
>
> This will raise the following error:
>
>   Traceback (most recent call last):
>     File "asd.py", line 3, in ?
>       sys.stdout.write(u"á")
>   UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1'
>   in position 0: ordinal not in range(128)
>
> This difference may become a really annoying problem when trying to
> internationalize programs, since it's usual to see third-party code
> dealing with sys.stdout, instead of using 'print'. The standard
> optparse module, for instance, has a reference to sys.stdout which
> is used in the default --help handling mechanism.

You are mixing things here:

The source encoding is meant for the parser and defines the way Unicode literals are converted into Unicode objects.

The encoding used on the stdout stream doesn't have anything to do with the source code encoding and has to be handled differently.

The idiom presented by Bob is the right way to go: wrap sys.stdout with a StreamEncoder.

Using sys.setdefaultencoding() is *not* the right solution to the problem.

In general when writing programs that are targeted for i18n, you should use Unicode for all text data and convert from Unicode to 8-bit only at the IO/UI layer.

The various wrappers in the codecs module make this rather easy.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Nov 30 2004)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
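The wrapping idiom recommended above can be sketched with codecs.getwriter. The sketch below is written to run on a current Python interpreter rather than on 2.4, so an io.BytesIO buffer stands in for the byte-oriented sys.stdout of the thread; the choice of iso-8859-1 is an assumption taken from Gustavo's example, not a recommendation.

```python
import codecs
import io

# Stand-in for a byte-oriented output stream such as sys.stdout in Python 2.
byte_stream = io.BytesIO()

# codecs.getwriter returns the StreamWriter class for the named codec;
# calling it wraps the underlying byte stream.
writer = codecs.getwriter("iso-8859-1")(byte_stream)

# Text goes in, encoded bytes come out on the wrapped stream.
writer.write("\u00e1\n")

encoded = byte_stream.getvalue()
print(repr(encoded))
```

In the Python 2 setting of the thread, the same wrapper would presumably be applied to sys.stdout itself, so that code calling sys.stdout.write() with unicode objects gets them encoded at the I/O boundary.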
[Python-Dev] RELEASED Python 2.4 (final)
On behalf of the Python development team and the Python community, I'm happy to announce the release of Python 2.4.

Python 2.4 is a final, stable release, and we can recommend that Python users upgrade to this version.

Python 2.4 is the result of almost 18 months' worth of work on top of Python 2.3 and represents another stage in the careful evolution of Python. New language features have been kept to a minimum, many bugs have been fixed and a wide variety of improvements have been made.

Notable changes in Python 2.4 include improvements to the importing of modules, generator expressions, function decorators, a number of new modules (including subprocess, decimal and cookielib) and countless numbers of fixed bugs and smaller enhancements. For more, see the (subjective) highlights, the release notes, or Andrew Kuchling's What's New In Python, all available from the 2.4 web page.

    http://www.python.org/2.4/

Please log any problems you have with this release in the SourceForge bug tracker (noting that you're using Python 2.4):

    http://sourceforge.net/bugs/?group_id=5470

Enjoy the new (stable!) release,
Anthony

Anthony Baxter [EMAIL PROTECTED]
Python Release Manager
(on behalf of the entire python-dev team)
Re: [Python-Dev] File encodings
Hello Bob,

[...]
> >Given the fact that files have an 'encoding' parameter, and that
> >any unicode strings with characters not in the 0-127 range will
> >raise an exception if being written to files, isn't it reasonable
> >to respect the 'encoding' attribute whenever writing data to a
> >file?
>
> No, because you don't know it's a file. You're calling a function with
> a unicode object. The function doesn't know that the object was some
> unicode object that came from a source file of some particular
> encoding.

I don't understand what you're saying here. The file knows it is a file. The write function knows the parameter is unicode.

> >The workaround for that problem is to either use the evil-considered
> >sys.setdefaultencoding(), or to wrap sys.stdout. IMO, both options
> >seem unreasonable for such a common idiom.
>
> There's no guaranteed correlation whatsoever between the claimed
> encoding of your source document and the encoding of the user's
> terminal, why do you want there to be? What if you have some source

I don't. I want the write() function of file objects to respect the encoding attribute of these objects. This is already being done when print is used. I'm proposing to extend that behavior to the write function. That's all.

> files with 'foo' encoding and others with 'bar' encoding? What about
> ascii encoded source documents that use escape sequences to represent
> non-ascii characters? What you want doesn't make any sense so long as
> python strings and file objects deal in bytes not characters :)

Please, take a long breath, and read my message again. :-)

> Wrapping sys.stdout is the ONLY reasonable solution.
[...]

No, it's not. But I'm glad to know other people are also doing workarounds for that problem.

--
Gustavo Niemeyer
http://niemeyer.net
[Python-Dev] TRUNK UNFROZEN; release24-maint branch has been cut
I've cut the release24-maint branch, and updated the Include/patchlevel.h on trunk and branch (trunk is now 2.5a0, branch is 2.4+).

The trunk and the branch are now both unfrozen and suitable for checkins. The feature freeze on the trunk is lifted. Remember - if you're checking bugfixes into the trunk, either backport them to the branch, or else mark the commit message with 'bugfix candidate' or 'backport candidate' or the like.

Next up will be a 2.3.5 release. I'm going to be travelling for a large chunk of December (at very short notice) so it's likely that this will happen at the start of January. If someone else wants to cut a 2.3.5 sooner than that, please feel free to volunteer! 2.3.5 will be the last 2.3.x release, barring some almighty cockup - the next scheduled release will be 2.4.1, which will probably happen around May 2005.

Anthony
and yes, I'm drinking.
Re: [Python-Dev] File encodings
> Gustavo Niemeyer wrote:
> >Given the fact that files have an 'encoding' parameter, and that
> >any unicode strings with characters not in the 0-127 range will
> >raise an exception if being written to files, isn't it reasonable
> >to respect the 'encoding' attribute whenever writing data to a
> >file?
>
> In general, files don't have an encoding parameter - sys.stdout
> is an exception.

That's the only case I'd like to solve.

If there are platforms that don't know how to set it, we could make the encoding attribute writable, and that would allow people to easily set it to the encoding which is deemed correct in their systems.

> The reason why this works for print and not for write is that
> I considered "print unicodeobject" important, and wanted to
> implement that. file.write is an entirely different code path,
> so it doesn't currently consider Unicode objects; instead, it
> only supports strings (or, more generally, buffers).

I understand your reasoning behind it, and would like to extend your idea to the write function, allowing anyone to use the common sys.stdout idiom to implement print-like functionality (like optparse and many others). For normal files, the absence of the encoding parameter would ensure the current behavior.

> > This difference may become a really annoying problem when trying to
> > internationalize programs, since it's usual to see third-party code
> > dealing with sys.stdout, instead of using 'print'.
>
> Apparently, it isn't important enough that somebody had analysed this,
> and offered a patch. In any case, it would be quite unreliable to

That's what I'm doing here! :-)

> pass unicode strings to .write even *if* .write supported .encoding,
> since most files don't have .encoding. Even sys.stdout does not always
> have .encoding - only when it is a terminal, and only if we managed to
> find out what the encoding of the terminal is.

I think that's acceptable. The encoding parameter is meant for output streams, and Python does its best to try to find a reasonable value for showing output strings.

Thanks for your answer and clarifications,

--
Gustavo Niemeyer
http://niemeyer.net
Re: [Python-Dev] File encodings
[...]
> You are mixing things here:
>
> The source encoding is meant for the
> parser and defines the way Unicode literals are converted
> into Unicode objects.
>
> The encoding used on the stdout stream doesn't have anything
> to do with the source code encoding and has to be handled
> differently.

Sorry. I probably wasn't clear enough in my message. I understand the issue, and I'm not discussing source encoding at all. The only problem I'd like to solve is that of output streams not being able to have unicode strings written.

> The idiom presented by Bob is the right way to go: wrap
> sys.stdout with a StreamEncoder.

I don't see that as a good solution, since every Python software that is internationalized will have to figure out this wrapping, introducing extra overhead unnecessarily.

> Using sys.setdefaultencoding() is *not* the right solution
> to the problem.

I understand.

> In general when writing programs that are targeted for
> i18n, you should use Unicode for all text data and
> convert from Unicode to 8-bit only at the IO/UI layer.

That's what I think as well. I just would expect that Python was kind enough to allow me to tell which output encoding I want, instead of wrapping the sys.stdout object with a non-native-file.

IOW, being widely necessary, handling internationalization without wrapping sys.stdout every time seems like a good step for a language like Python.

> The various wrappers in the codecs module make this
> rather easy.

Thanks for the suggestion!

--
Gustavo Niemeyer
http://niemeyer.net
Re: [Python-Dev] File encodings
Gustavo Niemeyer wrote:
> [...]
> > You are mixing things here:
> >
> > The source encoding is meant for the parser and defines the
> > way Unicode literals are converted into Unicode objects.
> >
> > The encoding used on the stdout stream doesn't have anything
> > to do with the source code encoding and has to be handled
> > differently.
>
> Sorry. I probably wasn't clear enough in my message. I understand
> the issue, and I'm not discussing source encoding at all. The only
> problem I'd like to solve is that of output streams not being able
> to have unicode strings written.
>
> > The idiom presented by Bob is the right way to go: wrap
> > sys.stdout with a StreamEncoder.
>
> I don't see that as a good solution, since every Python software
> that is internationalized will have to figure out this wrapping,
> introducing extra overhead unnecessarily.

This wrapping is probably necessary for stateful encodings. If you had a sys.stdout.encoding=="utf-16", print would probably add the BOM every time a unicode object is printed. This doesn't happen if you wrap sys.stdout in a StreamWriter.

> [...]
> That's what I think as well. I just would expect that Python was
> kind enough to allow me to tell which output encoding I want,
> instead of wrapping the sys.stdout object with a non-native-file.
>
> IOW, being widely necessary, handling internationalization without
> wrapping sys.stdout every time seems like a good step for a language
> like Python.

You can't have stateful encodings without something that keeps state. The only thing that does keep state in Python is a StreamReader/StreamWriter.

Bye,
   Walter Dörwald
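Walter's point about stateful encodings can be seen by comparing per-call encoding with a StreamWriter. The sketch below is an illustration for a current Python interpreter, with an io.BytesIO buffer standing in for the output stream; it does not reproduce the print implementation of 2.4, only the difference between encoding each string independently and keeping codec state across writes.

```python
import codecs
import io

# Stateless: every encode() call starts from scratch, so each chunk
# carries its own byte order mark.
stateless = "a".encode("utf-16") + "b".encode("utf-16")

# Stateful: the StreamWriter remembers that the BOM was already written.
buffer = io.BytesIO()
writer = codecs.getwriter("utf-16")(buffer)
writer.write("a")
writer.write("b")
stateful = buffer.getvalue()

# Count how many byte order marks ended up in each result.
print(stateless.count(codecs.BOM_UTF16))
print(stateful.count(codecs.BOM_UTF16))
```

The stateless result contains one BOM per chunk, while the wrapped stream emits it only once at the start.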
[Python-Dev] Trouble installing 2.4
I'm using Windows XP SP2.

Uninstalled 2.3, installed 2.4 (running as me, not as administrator). No problems so far.

Tried installing pywin32-203.win32-py2.4.exe

When I try to install it as me, it gets as far as "ready to install." When I click Next, it says

    Can't load Python for pre-install script

and quits, even though earlier it said it had found Python 2.4 in the registry.

When I try to install it as Administrator, it quits immediately, saying that it couldn't locate a Python 2.4 installation.

My hypothesis: When I install 2.4 as me, it puts it in my user registry, not the system-wide registry, and then pywin32 can't find it. I'm going to uninstall and try again as Administrator.
Re: [Python-Dev] File encodings
Hello Walter,

> >I don't see that as a good solution, since every Python software
> >that is internationalized will have to figure out this wrapping,
> >introducing extra overhead unnecessarily.
>
> This wrapping is probably necessary for stateful encodings. If you
> had a sys.stdout.encoding=="utf-16", print would probably add the
> BOM every time a unicode object is printed. This doesn't happen if
> you wrap sys.stdout in a StreamWriter.

I'm not sure this is an issue for a terminal output stream, which is the case I'm trying to find a solution for. Otherwise, Python would already be in trouble for using this scheme in the print statement. Can you show an example of the print statement not working?

--
Gustavo Niemeyer
http://niemeyer.net
RE: [Python-Dev] Trouble installing 2.4
Follow-up:

When I install Python as Administrator, all is well. In that case (but not when installing it as me), it asks whether I want to install it for all users or for myself only. I then install pywin32 and it works.

So it may be that a caveat is in order to people who do not install 2.4 as Administrator.
Re: [Python-Dev] File encodings
Gustavo Niemeyer wrote:
> Hello Walter,
>
> > > I don't see that as a good solution, since every Python software
> > > that is internationalized will have to figure out this wrapping,
> > > introducing extra overhead unnecessarily.
> >
> > This wrapping is probably necessary for stateful encodings. If you
> > had a sys.stdout.encoding=="utf-16", print would probably add the
> > BOM every time a unicode object is printed. This doesn't happen if
> > you wrap sys.stdout in a StreamWriter.
>
> I'm not sure this is an issue for a terminal output stream, which is
> the case I'm trying to find a solution for. Otherwise, Python would
> already be in trouble for using this scheme in the print statement.
> Can you show an example of the print statement not working?

No, I can't. Python doesn't accept UTF-16 as encoding. This works:

  > LANG=de_DE.UTF-8 python2.4
  Python 2.4 (#1, Nov 30 2004, 14:16:24)
  [GCC 2.96 2731 (Red Hat Linux 7.3 2.96-113)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import sys
  >>> sys.stdout.encoding
  'UTF-8'

This doesn't:

  > LANG=de_DE.UTF-16 python2.4
  Python 2.4 (#1, Nov 30 2004, 14:16:24)
  [GCC 2.96 2731 (Red Hat Linux 7.3 2.96-113)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import sys
  >>> sys.stdout.encoding
  'ANSI_X3.4-1968'

Bye,
   Walter Dörwald
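The 'ANSI_X3.4-1968' name in the second session is worth a brief illustration: Python's codec registry treats it as an alias for plain ASCII, so a stream reporting that encoding cannot represent 'á'. The sketch below runs on a current Python interpreter and only checks the codec alias; it does not reproduce the locale lookup from the session above.

```python
import codecs

# The name reported in the second session resolves to the ASCII codec.
info = codecs.lookup("ANSI_X3.4-1968")
print(info.name)

# Encoding a non-ASCII character with it fails, just as in the
# traceback quoted earlier in the thread.
try:
    "\u00e1".encode("ANSI_X3.4-1968")
except UnicodeEncodeError as exc:
    failed = True
    print(exc.reason)
else:
    failed = False
```

In other words, the unsupported UTF-16 locale appears to end up with an ASCII-only output encoding rather than with UTF-16.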
[Python-Dev] Python.org current docs
http://www.python.org/doc/current/
and
http://docs.python.org/

still point to 2.3.4 docs.

Thomas
[Python-Dev] Small subprocess patch
I'm planning to change the signature for subprocess.call slightly:

-def call(*args, **kwargs):
+def call(*popenargs, **kwargs):

The purpose is to make it clearer that "args" in this context is not the same as the "args" argument to the Popen constructor. Two questions:

1) Is it OK to commit changes like this on the 2.4 branch, in addition to trunk?

2) Anyone that thinks that "kwargs" should be changed into "popenkwargs"?

/Peter Åstrand <[EMAIL PROTECTED]>
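The naming issue can be seen in a minimal sketch of such a forwarding function. The name call_sketch is hypothetical and stands in for subprocess.call so as not to claim this is the module's exact source; the point is only that every positional argument is handed on to Popen, whose own first parameter is named args.

```python
import subprocess
import sys


def call_sketch(*popenargs, **kwargs):
    """Run a command, wait for it to finish, and return its exit code.

    Hypothetical stand-in for subprocess.call: all arguments are
    forwarded untouched to subprocess.Popen, so `popenargs` is the
    tuple of Popen's positional arguments, not Popen's `args` parameter.
    """
    return subprocess.Popen(*popenargs, **kwargs).wait()


# The command list below becomes Popen's `args` parameter.
exit_code = call_sketch([sys.executable, "-c", "raise SystemExit(3)"])
print(exit_code)
```

With the old parameter name, the tuple called args would have contained the argument that Popen also calls args, which is the confusion the rename is meant to avoid.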
Re: [Python-Dev] Trouble installing 2.4
Andrew Koenig wrote:
> So it may be that a caveat is in order to people who do not install
> 2.4 as Administrator.

I think the trouble is not with 2.4, here - the trouble is with installing pywin32. As you said, the installation of Python itself went fine.

> My hypothesis: When I install 2.4 as me, it puts it in my user
> registry, not the system-wide registry,

I can confirm this hypothesis. In a per-user installation, the registry settings are deliberately changed for the user, not for the entire system. Otherwise, it wouldn't be per-user. Also, the user might not be able to write to the machine registry (unless he is a member of the Power Users group).

> and then pywin32 can't find it.

That sounds likely, but I cannot confirm it. If it is, it is a bug in pywin32 (and, in turn, possibly in distutils).

Regards,
Martin
Re: [Python-Dev] Python.org current docs
On Tuesday 30 November 2004 02:46 pm, Thomas Heller wrote:
> http://www.python.org/doc/current/
> and
> http://docs.python.org/
>
> still point to 2.3.4 docs.

I'll be fixing that up tonight.

  -Fred

--
Fred L. Drake, Jr.
[Python-Dev] Re: Small subprocess patch
On Tue, 30 Nov 2004, Peter Åstrand wrote:

> 1) Is it OK to commit changes like this on the 2.4 branch, in addition to
> trunk?

I'm also wondering if patch 1071755 and 1071764 should go into release24-maint:

* 1071755 makes subprocess raise TypeError if Popen is called with a bufsize that is not an integer.

* 1071764 adds a new, small utility function.

/Peter Åstrand <[EMAIL PROTECTED]>
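The kind of check that patch 1071755 describes might look like the following. The patch itself is not shown in this thread, so this is only a sketch under that assumption; check_bufsize and its error message are hypothetical and not taken from the patch.

```python
def check_bufsize(bufsize):
    """Reject a non-integer bufsize early, before any process is started.

    Hypothetical helper illustrating the described behaviour; the real
    patch may implement the check differently.
    """
    if not isinstance(bufsize, int):
        raise TypeError("bufsize must be an integer")
    return bufsize


# An integer passes through unchanged.
print(check_bufsize(0))

# A string, for example a command argument passed by mistake, is rejected.
try:
    check_bufsize("4096")
except TypeError as exc:
    rejected = True
    print(exc)
else:
    rejected = False
```

A check like this turns a mistake such as passing a second command-line argument in the bufsize position into an immediate, readable error.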
[Python-Dev] Roster Deadline
Hi Larry,

FYI: I asked EB about the roster deadline and she says that she doesn't know when it is either. Checking on the Lei Out web page didn't help much either.

So, you are no wiser now than at the start of this message.

-tim
Re: [Python-Dev] Python.org current docs
On Tuesday 30 November 2004 02:46 pm, Thomas Heller wrote:
> http://www.python.org/doc/current/
> and
> http://docs.python.org/
>
> still point to 2.3.4 docs.

I think everything is properly updated now. Please let me know if I've missed anything.

  -Fred

--
Fred L. Drake, Jr.
