Re: [Python-Dev] Tests and unicode
Martin v. Löwis wrote:
> Reinhold Birkenfeld wrote:
>> One problem is that no Unicode escapes can be used since compiling
>> the file raises ValueErrors for them. Such strings would have to
>> be produced using unichr().
>
> You mean, in Unicode literals? There are various approaches, depending
> on context:
> - you could encode the literals as UTF-8, and decode them when the
>   module/test case is imported. See test_support.TESTFN_UNICODE
>   for an example.
> - you could use unichr
> - you could use eval, see test_re for an example

Okay. I can fix this, but several library modules must be fixed too
(mostly simple fixes), e.g. pickletools, gettext, doctest or encodings.

>> Is this the right way? Or is disabling Unicode not supported any more?
>
> There are certainly tests that cannot be executed when Unicode is not
> available. It would be good if such tests got skipped instead of
> failing, and it would be good if all tests that do not require Unicode
> support ran even when Unicode support is missing.

That's my approach too.

> Whether "it is supported" is a tricky question: your message indicates
> that, right now, it is *not* supported (or else you wouldn't have
> noticed a problem).

Well, the core builds without Unicode, and any code that doesn't use
Unicode should run fine too. But the tests fail at the moment.

> Whether we think it should be supported depends
> on who "we" is, as with all these minor features: some think it is
> a waste of time, some think it should be supported if reasonably
> possible, and some think this is a conditio sine qua non. It certainly
> isn't a release-critical feature.

Correct. I'll see if I have the time.

Reinhold

--
Mail address is perfectly valid!
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
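Martin's three workarounds can be sketched in modern Python 3, where `chr()` takes over the role of `unichr()`. The sample string, the module-level names and the `HAVE_UNICODE` flag are hypothetical; the flag only illustrates his request that Unicode-dependent tests be skipped rather than fail.

```python
import unittest

# Three ways to build the same non-ASCII string without writing a \u escape
# in a Unicode literal (illustrative value: "a", a-grave, o-grave).
FROM_UTF8 = b"a\xc3\xa0\xc3\xb2".decode("utf-8")  # UTF-8 bytes, decoded at import time
FROM_CHR = "a" + chr(0xE0) + chr(0xF2)             # chr() is Python 3's unichr()
FROM_EVAL = eval(r"'a\xe0\xf2'")                   # escapes only evaluated at runtime

# Hypothetical flag: a build without Unicode support would set this to False.
HAVE_UNICODE = True


class NonAsciiTest(unittest.TestCase):
    # Skipped, not failed, when the feature is unavailable.
    @unittest.skipUnless(HAVE_UNICODE, "requires Unicode support")
    def test_spellings_agree(self):
        self.assertEqual(FROM_UTF8, FROM_CHR)
        self.assertEqual(FROM_CHR, FROM_EVAL)
```

The UTF-8 route keeps the source file pure ASCII, which is why it suits test modules that must still compile on a build without Unicode literals.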
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Martin Blais <[EMAIL PROTECTED]> writes:
> What if we could completely disable the implicit conversions between
> unicode and str? In other words, if you would ALWAYS be forced to
> call either .encode() or .decode() to convert between one and the
> other... wouldn't that help a lot deal with that issue?
I don't know. I've made one or two apps safe against this and it's
mostly just annoying.
> How hard would that be to implement?
import sys
reload(sys)
sys.setdefaultencoding('undefined')
> Would it break a lot of code? Would some people want that? (I know
> I would, at least for some of my code.) It seems to me that this
> would make the code more explicit and force the programmer to become
> more aware of those conversions. Any opinions welcome.
I'm not sure it's a sensible default.
Cheers,
mwh
--
It is never worth a first class man's time to express a majority
opinion. By definition, there are plenty of others to do that.
-- G. H. Hardy
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Michael Hudson wrote:
> Martin Blais <[EMAIL PROTECTED]> writes:
>
>
>>What if we could completely disable the implicit conversions between
>>unicode and str? In other words, if you would ALWAYS be forced to
>>call either .encode() or .decode() to convert between one and the
>>other... wouldn't that help a lot deal with that issue?
>
>
> I don't know. I've made one or two apps safe against this and it's
> mostly just annoying.
>
>>How hard would that be to implement?
>
> import sys
> reload(sys)
> sys.setdefaultencoding('undefined')
You shouldn't post tricks like these :-)
The correct way to change the default encoding is by
providing a sitecustomize.py module which then calls
sys.setdefaultencoding("undefined").
Note that the codec "undefined" was added for just this
reason.
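Python 3 still ships the "undefined" codec in its `encodings` package, so its behaviour is easy to check, although `sys.setdefaultencoding()` itself is gone there. A minimal sketch, assuming only that any use of the codec raises `UnicodeError` (a subclass of `ValueError`); the helper name is hypothetical.

```python
def conversion_is_disabled(encoding="undefined"):
    """Return True if both encoding and decoding with `encoding` fail."""
    attempts = (
        lambda: "text".encode(encoding),   # text -> bytes
        lambda: b"bytes".decode(encoding), # bytes -> text
    )
    for attempt in attempts:
        try:
            attempt()
        except ValueError:  # UnicodeError subclasses ValueError
            continue
        return False        # this direction converted successfully
    return True
```

With the default argument the function should report that conversion is disabled, while a real codec such as "ascii" converts the ASCII sample data without complaint.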
>>Would it break a lot of code? Would some people want that? (I know
>>I would, at least for some of my code.) It seems to me that this
>>would make the code more explicit and force the programmer to become
>>more aware of those conversions. Any opinions welcome.
>
> I'm not sure it's a sensible default.
Me neither, especially since this would make it impossible
to write polymorphic code - e.g. ', '.join(list) wouldn't
work anymore if list contains Unicode; ditto for u', '.join(list)
with list containing a string.
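Python 3 later dropped the implicit coercion, and it behaves as MAL predicts: mixing text and bytes in `join()` fails regardless of the data, so polymorphic code has to normalize by hand. A sketch; `join_text` and its UTF-8 default are assumptions, not an existing API.

```python
def join_text(sep, items, encoding="utf-8"):
    """Join a mix of str and bytes as text, decoding bytes explicitly.

    Hypothetical helper: without implicit coercion, the caller picks one
    type (text here) and converts everything else to it.
    """
    def as_text(x):
        return x.decode(encoding) if isinstance(x, bytes) else x

    return as_text(sep).join(as_text(x) for x in items)


try:
    ", ".join(["abc", b"def"])  # str.join() refuses a bytes item
except TypeError:
    pass
```

The explicit version works for either separator type, at the cost of naming an encoding at every call site.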
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Sep 30 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/
::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
Re: [Python-Dev] --disable-unicode (Tests and unicode)
Reinhold Birkenfeld wrote:
> Martin v. Löwis wrote:
>> Whether we think it should be supported depends
>> on who "we" is, as with all these minor features: some think it is
>> a waste of time, some think it should be supported if reasonably
>> possible, and some think this a conditio sine qua non. It certainly
>> isn't a release-critical feature.
>
> Correct. I'll see if I have the time.

Is the added complexity needed to support not having Unicode support
compiled into Python really worth it?

I know that Martin introduced this feature a long time ago, so he
will have had a reason for it.

Today, I think the situation has changed: computers have more memory,
are faster and the need to integrate (e.g. via XML) is stronger than
ever - and maybe we should consider removing the option to get a
cleaner code base with fewer #ifdefs and SyntaxErrors from the
standard lib.

What do you think?

--
Marc-Andre Lemburg
eGenix.com
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
On Monday 3 October 2005 at 02:09 -0400, Martin Blais wrote:
>
> What if we could completely disable the implicit conversions between
> unicode and str?

This would be very annoying when dealing with some modules or libraries
where the type (str / unicode) returned by a function depends on the
context, build, or platform.

A good rule of thumb is to convert to unicode everything that is
semantically textual, and to only use str for what is to be semantically
treated as a string of bytes (network packets, identifiers...). This is
also, AFAIU, the semantic model which is favoured for a hypothetical
future version of Python.

This is what I'm using to do safe conversion to a given type without
worrying about the type of the argument:

DEFAULT_CHARSET = 'utf-8'

def safe_unicode(s, charset=None):
    """
    Forced conversion of a string to unicode, does nothing
    if the argument is already a unicode object.
    This function is useful because the .decode method
    on a unicode object, instead of being a no-op, tries to do
    a double conversion back and forth (which often fails because
    'ascii' is the default codec).
    """
    if isinstance(s, str):
        return s.decode(charset or DEFAULT_CHARSET)
    else:
        return s

def safe_str(s, charset=None):
    """
    Forced conversion of a unicode to string, does nothing
    if the argument is already a plain str object.
    This function is useful because the .encode method
    on a str object, instead of being a no-op, tries to do
    a double conversion back and forth (which often fails because
    'ascii' is the default codec).
    """
    if isinstance(s, unicode):
        return s.encode(charset or DEFAULT_CHARSET)
    else:
        return s

Good luck

Antoine.
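Under Python 3, which later adopted the split Antoine describes (str for text, bytes for byte strings), the same two helpers reduce to plain type checks. A sketch with hypothetical names; unlike the originals above, it rejects other types instead of silently passing them through.

```python
DEFAULT_CHARSET = "utf-8"


def to_text(s, charset=None):
    """Return s as str: bytes are decoded, str passes through unchanged."""
    if isinstance(s, bytes):
        return s.decode(charset or DEFAULT_CHARSET)
    if isinstance(s, str):
        return s
    raise TypeError("expected str or bytes, got %s" % type(s).__name__)


def to_bytes(s, charset=None):
    """Return s as bytes: str is encoded, bytes pass through unchanged."""
    if isinstance(s, str):
        return s.encode(charset or DEFAULT_CHARSET)
    if isinstance(s, bytes):
        return s
    raise TypeError("expected str or bytes, got %s" % type(s).__name__)
```

Failing loudly on unexpected types moves the error to the boundary where the wrong value enters, rather than to some later concatenation.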
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Antoine Pitrou wrote:
> A good rule of thumb is to convert to unicode everything that is
> semantically textual

and isn't pure ASCII.

(anyone who is tempted to argue otherwise should benchmark their
applications, both speed- and memorywise, and be prepared to come up
with very strong arguments for why python programs shouldn't be allowed
to be fast and memory-efficient whenever they can...)
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
On Monday 3 October 2005 at 14:59 +0200, Fredrik Lundh wrote:
> Antoine Pitrou wrote:
>
> > A good rule of thumb is to convert to unicode everything that is
> > semantically textual
>
> and isn't pure ASCII.

How can you be sure that something that is /semantically textual/ will
always remain "pure ASCII"? That's contradictory, unless your software
never goes out of the Anglo-Saxon world (and even then...).

> (anyone who is tempted to argue otherwise should benchmark their
> applications, both speed- and memorywise, and be prepared to come
> up with very strong arguments for why python programs shouldn't be
> allowed to be fast and memory-efficient whenever they can...)

I think most applications don't critically depend on text processing
performance. OTOH, international adaptability is the kind of thing that
/will/ bite you one day if you don't prepare for it at the beginning.

Also, if necessary, the distinction could be an implementation detail
and the conversion could be transparent (like int vs. long): the text
would be coded in an 8-bit charset as long as possible and converted to
a wide encoding only when necessary. The important thing is that these
optimisations, if they are necessary, should be transparently handled
by the Python runtime.

(it seems to me - I may be mistaken - that modern Windows versions treat
every string as 16-bit unicode internally. Why are they doing it if it
is that inefficient?)

Regards

Antoine.
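Antoine's int-vs-long analogy (8-bit storage until a wider character shows up) can be sketched as a small class. `CompactText` is hypothetical, and Latin-1 stands in for "an 8-bit charset"; a real runtime would do this in C rather than re-encoding on every operation.

```python
from array import array


class CompactText:
    """Hypothetical text type: narrow (1 byte per character) while every
    code point fits in Latin-1, widened transparently otherwise, much as
    Python 2 promotes int to long on overflow."""

    def __init__(self, text=""):
        try:
            self._data = text.encode("latin-1")  # narrow: 1 byte per char
            self.wide = False
        except UnicodeEncodeError:
            # wide: one unsigned long (at least 32 bits) per code point
            self._data = array("L", map(ord, text))
            self.wide = True

    def __str__(self):
        if self.wide:
            return "".join(map(chr, self._data))
        return self._data.decode("latin-1")

    def __len__(self):
        return len(self._data)  # characters, in either representation

    def __add__(self, other):
        # Concatenation re-chooses the representation; callers never see it.
        return CompactText(str(self) + str(other))
```

Callers only ever see text, which is the transparency Antoine asks for. CPython 3.3 later adopted a comparable flexible string representation (PEP 393).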
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
On 10/3/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> > I'm not sure it's a sensible default.
>
> Me neither, especially since this would make it impossible
> to write polymorphic code - e.g. ', '.join(list) wouldn't
> work anymore if list contains Unicode; ditto for u', '.join(list)
> with list containing a string.

Sounds like what you want is exactly what I want to avoid (for those
two types anyway).

cheers,
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Antoine Pitrou wrote:
> > > A good rule of thumb is to convert to unicode everything that is
> > > semantically textual
> >
> > and isn't pure ASCII.
>
> How can you be sure that something that is /semantically textual/ will
> always remain "pure ASCII"?

"is" != "will always remain"
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Martin Blais wrote:
> Hi.
>
> Like a lot of people (or so I hear in the blogosphere...), I've been
> experiencing some friction in my code with unicode conversion
> problems. Even when being super extra careful with the types of str's
> or unicode objects that my variables can contain, there is always some
> case or oversight where something unexpected happens which results in
> a conversion which triggers a decode error. str.join() of a list of
> strs, where one unicode object appears unexpectedly, and voilà!
> exception galore. Sometimes the problem shows up late because your
> test code doesn't always contain accented characters. I'm sure many
> of you experienced that or some variant at some point.
>
> I came to realize recently that this problem shares strong similarity
> with the problem of implicit type conversions in C++, or at least it
> feels the same: Stuff just happens implicitly, and it's hard to track
> down where and when it happens by just looking at the code. Part of
> the problem is that the unicode object acts a lot like a str, which is
> convenient, but...

I agree. I think it was a mistake to implicitly convert mixed string
expressions to unicode.

> What if we could completely disable the implicit conversions between
> unicode and str? In other words, if you would ALWAYS be forced to
> call either .encode() or .decode() to convert between one and the
> other... wouldn't that help a lot deal with that issue?

Perhaps.

> How hard would that be to implement?

Not hard. We considered doing it for Zope 3, but ...

> Would it break a lot of code?

Yes.

> Would some people want that?

No, I wouldn't want lots of code to break. ;)

> (I know I would, at least for some of my
> code.) It seems to me that this would make the code more explicit and
> force the programmer to become more aware of those conversions. Any
> opinions welcome.

I think it's too late to change this. I wish it had been done differently.
(OTOH, I'm very happy we have Unicode support, so I'm not really
complaining. :)

I'll note that this hasn't been that much of a problem for us in Zope.
We follow the strategy:

Antoine Pitrou wrote:
...
> A good rule of thumb is to convert to unicode everything that is
> semantically textual, and to only use str for what is to be semantically
> treated as a string of bytes (network packets, identifiers...). This is
> also, AFAIU, the semantic model which is favoured for a hypothetical
> future version of Python.

This approach has worked pretty well for us. Still, when there is a
problem, it's a real pain to debug because the error occurs too late,
as you point out.

Jim

--
Jim Fulton           mailto:[EMAIL PROTECTED]       Python Powered!
CTO                  (540) 361-1714                 http://www.python.org
Zope Corporation     http://www.zope.com            http://www.zope.org
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
M.-A. Lemburg wrote:
> Michael Hudson wrote:
>
>>Martin Blais <[EMAIL PROTECTED]> writes:
>>
>>
>>
>>>What if we could completely disable the implicit conversions between
>>>unicode and str? In other words, if you would ALWAYS be forced to
>>>call either .encode() or .decode() to convert between one and the
>>>other... wouldn't that help a lot deal with that issue?
>>
>>
>>I don't know. I've made one or two apps safe against this and it's
>>mostly just annoying.
>>
>>
>>>How hard would that be to implement?
>>
>>import sys
>>reload(sys)
>>sys.setdefaultencoding('undefined')
>
>
> You shouldn't post tricks like these :-)
>
> The correct way to change the default encoding is by
> providing a sitecustomize.py module which then calls
> sys.setdefaultencoding("undefined").
This is a much more evil trick IMO, as it affects all Python code,
rather than a single program.
I would argue that it's evil to change the default encoding
in the first place, except in this case to disable implicit
encoding or decoding.
Jim
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Jim Fulton wrote:
> I would argue that it's evil to change the default encoding
> in the first place, except in this case to disable implicit
> encoding or decoding.

absolutely. unfortunately, all attempts to add such information to the
sys module documentation seem to have failed...

(last time I tried, I seem to remember that someone argued that "it's
there, so it should be documented in a neutral fashion")
[Python-Dev] Proposal for 2.5: Returning values from PEP 342 enhanced generators
PEP 255 ("Simple Generators") closes with:
> Q. Then why not allow an expression on "return" too?
>
> A. Perhaps we will someday. In Icon, "return expr" means both "I'm
>    done", and "but I have one final useful value to return too, and
>    this is it". At the start, and in the absence of compelling uses
>    for "return expr", it's simply cleaner to use "yield" exclusively
>    for delivering values.
Now that Python 2.5 has gained enhanced generators (multitudes rejoice!), I think
there is a compelling use for valued return statements in cooperative
multitasking code, of the kind:
    def foo():
        Data = yield Client.read()
        [...]
        MoreData = yield Client.read()
        [...]
        return FinalResult

    def bar():
        Result = yield foo()
For generators written in this style, "yield" means "suspend execution of the
current call until the requested result/resource can be provided", and
"return" regains its full conventional meaning of "terminate the current call
with a given result".
The simplest / most straightforward implementation would be for "return Foo"
to translate to "raise StopIteration, Foo". This is consistent with "return"
translating to "raise StopIteration", and does not break any existing
generator code.
(Another way to think about this change is that if a plain StopIteration means
"the iterator terminated", then a valued StopIteration, by extension, means
"the iterator terminated with the given value".)
Motivation by real-world example:
One system that could benefit from this change is Christopher Armstrong's
defgen.py[1] for Twisted, which he recently reincarnated (as newdefgen.py) to
use enhanced generators. The resulting code is much cleaner than before, and
closer to the conventional synchronous style of writing.
[1] the saga of which is summarized here:
http://radix.twistedmatrix.com/archives/000114.html
However, because enhanced generators have no way to differentiate their
intermediate results from their "real" result, the current solution is a
somewhat confusing compromise: the last value yielded by the generator
implicitly becomes the result returned by the call. Thus, to return
something, in general, requires the idiom "yield Foo; return". If valued
returns are allowed, this would become "return Foo" (and the code implementing
defgen itself would probably end up simpler, as well).
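The proposed semantics can be tried out in Python 3.3 and later, where `return value` inside a generator already raises `StopIteration` carrying `.value` (PEP 380). The trampoline below is a hypothetical sketch, not defgen's code: yielding a generator means "call it and send me its result", and any other yielded value stands in for an I/O request that `resolve` answers immediately.

```python
import types


def run(gen, resolve=lambda request: request):
    """Drive generator-based coroutines to completion (hypothetical trampoline).

    Yielding a generator means "call it and send me its return value";
    yielding anything else is a request answered by `resolve`.
    """
    stack = [gen]
    value = None
    while stack:
        try:
            yielded = stack[-1].send(value)
        except StopIteration as stop:
            stack.pop()
            value = stop.value        # the callee's `return` value
            continue
        if isinstance(yielded, types.GeneratorType):
            stack.append(yielded)     # nested call: run the callee first
            value = None              # a fresh generator must be primed with None
        else:
            value = resolve(yielded)  # stand-in for real I/O
    return value


def foo():
    data = yield "chunk-1"
    more = yield "chunk-2"
    return data + "+" + more          # valued return, as proposed


def bar():
    result = yield foo()              # receives foo()'s return value
    return "bar got " + result
```

Here `run(bar())` produces `"bar got chunk-1+chunk-2"` without the "yield Foo; return" idiom. Error propagation into callers via `throw()` is left out for brevity. Since Python 3.7 the generator must write `return Foo`; an explicit `raise StopIteration(Foo)` inside a generator is turned into a RuntimeError (PEP 479).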
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Antoine Pitrou <[EMAIL PROTECTED]> wrote:
>
> On Monday 3 October 2005 at 14:59 +0200, Fredrik Lundh wrote:
> > Antoine Pitrou wrote:
> >
> > > A good rule of thumb is to convert to unicode everything that is
> > > semantically textual
> >
> > and isn't pure ASCII.
>
> How can you be sure that something that is /semantically textual/ will
> always remain "pure ASCII"? That's contradictory, unless your software
> never goes out of the Anglo-Saxon world (and even...).

Non-unicode text input widgets. Works great. Can be had with the ANSI
wxPython installation.

> (it seems to me - I may be mistaken - that modern Windows versions treat
> every string as 16-bit unicode internally. Why are they doing it if it
> is that inefficient?)

Because modern Windows supports all sorts of symbols which are necessary
for certain special English uses (Greek symbols for math, etc.), and
trying to have all of them without just using the unicode backend that
is used for all of the international "builds" (isn't it just a language
definition?) anyways, would be a waste of time/effort.

 - Josiah
[Python-Dev] PEP 343 and __with__
I'm -1 on PEP 343. It seems ...complex. And even with all the
complexity, I *still* won't be able to type

    with self.lock: ...

which I submit is perfectly reasonable, clean, and clear. Instead I
have to type

    with locking(self.lock): ...

where locking() is apparently either a new builtin, a standard library
function, or some 6-line contextmanager I have to write myself.

So I have two suggestions.

1. I didn't find any suggestion of a __with__() method in the
archives. So I feel I should suggest it. It would work just like
__iter__().

    class RLock:
        @contextmanager
        def __with__(self):
            self.acquire()
            try:
                yield
            finally:
                self.release()

__with__() always returns a new context manager object. Just as with
iterators, a context manager object has "cm.__with__() is cm".

The 'with' statement would call __with__(), of course.

Optionally, the type constructor could magically apply @contextmanager
to __with__() if it's a generator, which is the usual case. It looks
like it already does similar magic with __new__(). Perhaps this is
too cute though.

2. More radical: Let's get rid of __enter__() and __exit__(). The
only example in PEP 343 that uses them is Example 4, which exists only
to show that "there's more than one way to do it". It all seems fishy
to me. Why not get rid of them and use only __with__()? In this
scenario, Python would expect __with__() to return a coroutine (not to
say "iterator") that yields exactly once.

Then the "@contextmanager" decorator wouldn't be needed on __with__(),
and neither would any type constructor magic.

The only drawback I see is that context manager methods implemented in
C will work differently from those implemented in Python. Since C
doesn't have coroutines, I imagine there would have to be enter() and
exit() slots. Maybe this is a major design concern; I don't know.

My apologies if this is redundant or unwelcome at this date.
-j
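Jason's `__with__()` idea can be emulated in today's Python to see how it would read. This is a hypothetical sketch: `__with__` is not a real protocol (the `with` statement ignores it, and dunder names are formally reserved for the language), so the `using()` helper calls it explicitly where the proposed statement would.

```python
import threading
from contextlib import contextmanager


class RLock:
    """Hypothetical lock exposing the proposed __with__() method."""

    def __init__(self):
        self._lock = threading.RLock()
        self.held = 0  # how many times the lock is currently held

    def acquire(self):
        self._lock.acquire()
        self.held += 1

    def release(self):
        self.held -= 1
        self._lock.release()

    @contextmanager
    def __with__(self):
        # Like __iter__(), each call returns a *new* context manager.
        self.acquire()
        try:
            yield self
        finally:
            self.release()


def using(obj):
    """Stand-in for `with obj:` under the proposal: call obj.__with__()."""
    return obj.__with__()
```

The sketch does not model the "cm.__with__() is cm" rule for the returned object; it only shows the per-call context manager that Jason compares to `__iter__()`.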
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Josiah Carlson wrote: > > > and isn't pure ASCII. > > > > How can you be sure that something that is /semantically textual/ will > > always remain "pure ASCII" ? That's contradictory, unless your software > > never goes out of the anglo-saxon world (and even...). > > Non-unicode text input widgets. Works great. Can be had with the ANSI > wxPython installation. You're both missing that Python is dynamically typed. A single string source doesn't have to return the same type of strings, as long as the objects it returns are compatible with Python's string model and with each other. Under the default encoding (and quite a few other encodings), that's true for plain ascii strings and Unicode strings. This is a good thing. ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 343 and __with__
At 12:37 PM 10/3/2005 -0400, Jason Orendorff wrote:
>I'm -1 on PEP 343. It seems ...complex. And even with all the
>complexity, I *still* won't be able to type
>
>     with self.lock: ...
>
>which I submit is perfectly reasonable, clean, and clear.

Which is why it's proposed to add __enter__/__exit__ to locks, and
somewhat more controversially, file objects. (Guido objected on the
basis that people might reuse the file object, but reusing a closed
file object results in a sensible error message and so doesn't seem
like a problem to me.)

>[snip]
>__with__() always returns a new context manager object. Just as with
>iterators, a context manager object has "cm.__with__() is cm".
>
>The 'with' statement would call __with__(), of course.

You didn't offer any reasons why this would be useful and/or good.

>2. More radical: Let's get rid of __enter__() and __exit__(). The
>only example in PEP 343 that uses them is Example 4, which exists only
>to show that "there's more than one way to do it". It all seems fishy
>to me. Why not get rid of them and use only __with__()? In this
>scenario, Python would expect __with__() to return a coroutine (not to
>say "iterator") that yields exactly once.

Because this multiplies the difficulty of implementing context managers
in C. It's easy to define a pair of C methods for __enter__ and
__exit__, but an iterator requires creating another class in C. The
yield-based syntax is just syntax sugar, not the essence of the
proposal.

>The only drawback I see is that context manager methods implemented in
>C will work differently from those implemented in Python. Since C
>doesn't have coroutines, I imagine there would have to be enter() and
>exit() slots. Maybe this is a major design concern; I don't know.

Considering your argument that locks should be context managers, it
would seem like a good idea for C implementations to be easy. :)

>My apologies if this is redundant or unwelcome at this date.
Since the PEP is accepted and has patches for both its implementation
and a good part of its documentation, a major change like this would
certainly need a better rationale. If your idea was that __with__
would somehow make it easier for locks to be context managers, it's
based on a flawed premise. All that's required now is to have
__enter__ and __exit__ call acquire() and release().

At this point, it's simply an open issue as to which stdlib objects
will be context managers, and which will have helper functions or
classes to serve as context managers. The actual API used to
implement them has little or no bearing on that issue.
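Phillip's point that `__enter__` and `__exit__` only need to call `acquire()` and `release()` can be sketched with a wrapper around `threading.Lock`. `ManagedLock` and `locking()` are hypothetical names (modern `threading.Lock` already supports `with` directly); the helper shows the "helper function" alternative he mentions for objects that are not context managers themselves.

```python
import threading
from contextlib import contextmanager


class ManagedLock:
    """Hypothetical wrapper: __enter__/__exit__ just call acquire()/release()."""

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self):
        return self._lock.acquire()

    def release(self):
        self._lock.release()

    def locked(self):
        return self._lock.locked()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.release()
        return False  # never swallow exceptions raised in the body


@contextmanager
def locking(lock):
    """Helper-function style: works with any object offering acquire()/release()."""
    lock.acquire()
    try:
        yield lock
    finally:
        lock.release()
```

Both spellings release the lock when the body raises; the class form is what a C type would provide through two slots.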
Re: [Python-Dev] unifying str and unicode
Hi,

Josiah:
> > How can you be sure that something that is /semantically textual/ will
> > always remain "pure ASCII"? That's contradictory, unless your software
> > never goes out of the Anglo-Saxon world (and even...).
>
> Non-unicode text input widgets.

You didn't understand my statement.
I didn't mean:
- how can you /technically enforce/ no unicode text at all
but:
- how can you be sure that your users will never /want/ to enter some
  text that can't be represented with the current 8-bit charset?

Of course the answer to the latter is: you can't.

Fredrik:
> Under the default encoding (and quite a few other encodings), that's true for
> plain ascii strings and Unicode strings.

If I have a unicode string containing legal characters greater than
0x7F, and I pass it to a function which converts it to str, the
conversion fails.

If I have an 8-bit string containing legal non-ascii characters in it
(for example the name of a file as returned by the filesystem, which I
of course have no prior control over), and I give it to a function
which does an implicit conversion to unicode, the conversion fails.

Here is an example so that you really understand.
I am under a French locale (iso-8859-15), let's just try to enter a
French word and see what happens when converting to unicode:

-> As a string constant:

>>> s = "été"
>>> s
'\xe9t\xe9'
>>> u = unicode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

-> By asking for input:

>>> s = raw_input()
été
>>> s
'\xe9t\xe9'
>>> unicode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

It should work, but it fails miserably.
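The failure in this session is data-dependent, which is why it shows up late. A hypothetical emulation in Python 3, where bytes stands in for the Python 2 str type and str for unicode, reproduces the implicit coercion with the 'ascii' default encoding: pure-ASCII data mixes fine (Fredrik's point), while the same call with "été" in iso-8859-15 fails (Antoine's point).

```python
def py2_coerce_concat(a, b, default_encoding="ascii"):
    """Hypothetical emulation of Python 2's implicit str/unicode coercion.

    When bytes and str are mixed, the bytes operand is silently decoded
    with the default encoding, so success depends on the data, not the code.
    """
    if isinstance(a, bytes) and isinstance(b, str):
        a = a.decode(default_encoding)
    elif isinstance(a, str) and isinstance(b, bytes):
        b = b.decode(default_encoding)
    return a + b


print(py2_coerce_concat(b"ete", "!"))      # ete!  (pure ASCII: works)
try:
    py2_coerce_concat(b"\xe9t\xe9", "!")   # "été" encoded in iso-8859-15
except UnicodeDecodeError as exc:
    print(exc.reason)                      # ordinal not in range(128)
```

Passing the right charset explicitly, e.g. `default_encoding="iso-8859-15"`, makes the same call succeed, which is what the explicit-conversion camp in this thread wants every program to do.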
In the current situation, if the programmer doesn't carefully plan for
these cases by manually managing conversions (which of course he can
do - but it's boring and bothersome - not to mention that many
programmers do not even understand the issue!), some users will see
the program die with a nasty exception, just because they happen to
need a bit more than the plain Latin alphabet without diacritics.

(even the standard Python library is bitten: witness the weird
getcwd() / getcwdu() pair...)

I find it surprising that you claim there is no difficulty when
everything points to the contrary. See for example how often confused
developers ask for help on mailing-lists...

Regards

Antoine.
Re: [Python-Dev] PEP 343 and __with__
"Phillip J. Eby" <[EMAIL PROTECTED]> writes:

> Since the PEP is accepted and has patches for both its implementation and a
> good part of its documentation, a major change like this would certainly
> need a better rationale.

Though given the amount of interest said patch has attracted (none at
all) perhaps no one cares very much and the proposal should be dropped.
Which would be a shame given the time I spent on it and all the hot
air here on python-dev...

Cheers,
mwh
(who still likes PEP 343 and doesn't particularly like Jason's
suggested changes).

--
  Gevalia is undrinkable low-octane see-through only slightly
  roasted bilge water. Compared to .us coffee it is quite
  drinkable.                                   -- Måns Nilsson, asr
Re: [Python-Dev] PEP 343 and __with__
For the record, I very much want PEPs 342 and 343 implemented. I
haven't had the time to look at the patch and don't expect to find the
time any time soon, but it's not for lack of desire to see this
feature implemented.

I don't like Jason's __with__ proposal and even less like his idea to
drop __enter__ and __exit__ (I think this would just make it harder to
provide efficient implementations in C).

I'm all for adding __enter__ and __exit__ to locks. I'm even
considering that it might be a good idea to add them to files.

For the record, here at Elemental we write a lot of Java code that
uses database connections in a pattern that would have greatly
benefited from a similar construct in Java. :)

--Guido

On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote:
> "Phillip J. Eby" <[EMAIL PROTECTED]> writes:
>
> > Since the PEP is accepted and has patches for both its implementation and a
> > good part of its documentation, a major change like this would certainly
> > need a better rationale.
>
> Though given the amount of interest said patch has attracted (none at
> all) perhaps no one cares very much and the proposal should be dropped.
> Which would be a shame given the time I spent on it and all the hot
> air here on python-dev...
>
> Cheers,
> mwh
> (who still likes PEP 343 and doesn't particularly like Jason's
> suggested changes).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
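Guido's database-connection remark refers to the acquire / try / finally-release pattern that `with` packages up. A sketch using the standard `sqlite3` module with an in-memory database; the `transaction()` helper is hypothetical. `sqlite3.Connection`'s own context manager only commits or rolls back and does not close the connection, which is why the sketch closes it explicitly.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(path=":memory:"):
    """Hypothetical helper: open a connection, commit on success,
    roll back on error, and always close."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()
    except BaseException:
        conn.rollback()
        raise
    finally:
        conn.close()
```

Every call site then shrinks to `with transaction() as conn: ...`, instead of repeating the try/except/finally scaffolding by hand.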
Re: [Python-Dev] unifying str and unicode
Antoine Pitrou wrote: > > Under the default encoding (and quite a few other encodings), that's true > > for > > plain ascii strings and Unicode strings. > > If I have a unicode string containing legal characters greater than > 0x7F, and I pass it to a function which converts it to str, the > conversion fails. so? if it does that, it's not unicode safe. what does that have to do with my argument (which is that you can safely mix ascii strings and unicode strings, because that's how things were designed). > Here is an example so that you really understand. I wrote the unicode type. I do understand how it works.
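Python 2 performed this coercion implicitly with the default ASCII codec. Python 3 refuses to mix bytes and str at all, so the sketch below emulates the old rule explicitly. `coerce_concat` is a hypothetical helper, not a real API. It shows why pure-ASCII byte strings mix safely with text while non-ASCII bytes make the conversion fail.

```python
def coerce_concat(raw: bytes, text: str) -> str:
    """Hypothetical: emulate Python 2's implicit str + unicode coercion (ASCII codec)."""
    return raw.decode("ascii") + text


# Pure-ASCII byte strings mix with text without trouble...
ok = coerce_concat(b"spam ", "\u00e9clair")

# ...but a non-ASCII byte (0xE9, 'é' in Latin-1) makes the implicit decode fail.
try:
    coerce_concat(b"caf\xe9 ", "au lait")
    failed = False
except UnicodeDecodeError:
    failed = True

print(ok, failed)
```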
Re: [Python-Dev] PEP 343 and __with__
At 07:02 PM 10/3/2005 +0100, Michael Hudson wrote: >"Phillip J. Eby" <[EMAIL PROTECTED]> writes: > > > Since the PEP is accepted and has patches for both its implementation > and a > > good part of its documentation, a major change like this would certainly > > need a better rationale. > >Though given the amount of interest said patch has attracted (none at >all) Actually, I have been reading the patch and meant to comment on it. I was perplexed by the odd stack behavior of the new opcode until I realized that it's try/finally that's weird. :) I was planning to look into whether that could be cleaned up as well, when I got distracted and didn't go back to it. > perhaps no one cares very much and the proposal should be dropped. I care an awful lot, as 'with' is another framework-dissolving tool that makes it possible to do more things in library form, without needing to resort to template methods. It also enables more context-sensitive programming, in that "global" states can be set and restored in a structured fashion. It may take a while to feel the effects, but it's going to be a big improvement to Python, maybe as big as new-style classes, and certainly bigger than decorators.
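The decimal module later provided this kind of structured set-and-restore of "global" state as decimal.localcontext(), which is used with the with statement. The short Python 3 sketch below illustrates it. The precision of 5 is arbitrary.

```python
import decimal

before = decimal.getcontext().prec

with decimal.localcontext() as ctx:
    ctx.prec = 5  # affects only this block (in this thread)
    inside = decimal.Decimal(1) / decimal.Decimal(3)

after = decimal.getcontext().prec  # the previous context has been restored
print(inside, before == after)  # 0.33333 True
```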
Re: [Python-Dev] unifying str and unicode
Hi, On Monday 03 October 2005 at 20:37 +0200, Fredrik Lundh wrote: > > If I have a unicode string containing legal characters greater than > > 0x7F, and I pass it to a function which converts it to str, the > > conversion fails. > > so? if it does that, it's not unicode safe. [...] > what does that have to do with > my argument (which is that you can safely mix ascii strings and unicode > strings, because that's how things were designed). If that's how things were designed, then Python's entire standard library (not to mention third-party libraries) is not "unicode safe" - to quote your own words - since many functions may return 8-bit strings containing non-ascii characters. There lies the problem for many people, until the stdlib is fixed - or until the string types are changed. That's why you very regularly see people complaining about how conversions sometimes break their code in various ways. Anyway, I don't think we will reach an agreement here. We have different expectations w.r.t. how the programming language may/should handle general text. I propose we end the discussion. Regards Antoine.
Re: [Python-Dev] unifying str and unicode
Antoine Pitrou wrote: > > > If I have a unicode string containing legal characters greater than > > > 0x7F, and I pass it to a function which converts it to str, the > > > conversion fails. > > > > so? if it does that, it's not unicode safe. > [...] > > what does that have to do with > > my argument (which is that you can safely mix ascii strings and unicode > > strings, because that's how things were designed). > > If that's how things were designed, then Python's entire standard > library (not to mention third-party libraries) is not "unicode safe" - > to quote your own words - since many functions may return 8-bit strings > containing non-ascii characters. huh? first you talk about functions that convert unicode strings to 8-bit strings, now you talk about functions that return raw 8-bit strings? and all this in response to a post that argues that it's in fact a good idea to use plain strings to hold textual data that happens to contain ASCII only, because 1) it works, by design, and 2) it's almost always more efficient. if you don't know what your own argument is, you cannot expect anyone to understand it.
Re: [Python-Dev] --disable-unicode (Tests and unicode)
M.-A. Lemburg wrote: > Is the added complexity needed to support not having Unicode support > compiled into Python really worth it ? If there are volunteers willing to maintain it, and the other volunteers are not affected: certainly. > I know that Martin introduced this feature a long time ago, > so he will have had a reason for it. I added it because users requested it. I personally never use it. > Today, I think the situation has changed: computers have more > memory, are faster and the need to integrate (e.g. via XML) > is stronger than ever - and maybe we should consider removing > the option to get a cleaner code base with fewer #ifdefs > and SyntaxErrors from the standard lib. > > What do you think ? -0 for just ripping it out. +0 if PEP 5 is followed, at least in spirit (i.e. give users advance warning to let them protest). I guess users in embedded builds (either in embedded systems, or embedding Python into some other application) might still be interested in the feature. Of course, these users could either recreate the feature if we remove it, or just stay with Python 2.4. Regards, Martin
Re: [Python-Dev] unifying str and unicode
> > If that's how things were designed, then Python's entire standard > > library (not to mention third-party libraries) is not "unicode safe" - > > to quote your own words - since many functions may return 8-bit strings > > containing non-ascii characters. > > huh? first you talk about functions that convert unicode strings to 8-bit > strings, now you talk about functions that return raw 8-bit strings? Are you deliberately missing the argument? And can't you understand that conversions are problematic in both directions (str -> unicode /and/ unicode -> str)? If a stdlib function returns an 8-bit string containing non-ascii data, then this string used in unicode context incurs an implicit conversion, which fails. How's that for "unicode safety" of stdlib functions? Will you argue that this gives no difficulties to anyone ? > all this in response to a post that argues that it's in fact a good idea to > use plain strings to hold textual data that happens to contain ASCII only, To which you apparently didn't read my answer, that is: you can never be sure that a variable containing something which is /semantically/ textual (*) will never contain anything other than ASCII text. For example raw_input() won't tell you that its 8-bit string result contains some chars > 0x7F. Same for many other library functions. How do you cope with (more or less occasional) non-ascii data coming in as 8-bit strings? (*) that is, contains some natural language Either you carefully plan for non-ascii text coming into your application (including workarounds against Python's ascii-by-default conversion policy), or you deliberately cripple your application by deciding that non-ASCII text is forbidden in (some or all) places. Choose the latter and you'll be hostile to users. And this thread began with a poster who found difficult the way implicit conversions happen in Python. So it's very funny that you deny the existence of a problem for certain developers. Antoine.
Re: [Python-Dev] --disable-unicode (Tests and unicode)
Martin v. Löwis wrote: > M.-A. Lemburg wrote: > >>Is the added complexity needed to support not having Unicode support >>compiled into Python really worth it ? > > If there are volunteers willing to maintain it, and the other volunteers > are not affected: certainly. No objections there. I only see that --disable-unicode has already been broken a couple of times in the past and no-one (except those running test suites regularly) really noticed - at least not AFAIK. >>I know that Martin introduced this feature a long time ago, >>so he will have had a reason for it. > > I added it because users requested it. I personally never use it. > >>Today, I think the situation has changed: computers have more >>memory, are faster and the need to integrate (e.g. via XML) >>is stronger than ever - and maybe we should consider removing >>the option to get a cleaner code base with fewer #ifdefs >>and SyntaxErrors from the standard lib. >> >>What do you think ? > > -0 for just ripping it out. +0 if PEP 5 is followed, at least > in spirit (i.e. give users advance warning to let them protest). > > I guess users in embedded builds (either in embedded systems, > or embedding Python into some other application) might still > be interested in the feature. Of course, these users could either > recreate the feature if we remove it, or just stay with > Python 2.4. If embedded build users rely on it, I'd suggest that these users take over maintenance of the patch set. Let's add a note to the configure switch that the feature will be removed in 2.6 and see what happens. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 30 2005) >>> Python/Zope Consulting and Support ...http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
Re: [Python-Dev] unifying str and unicode
At 10:38 PM 10/3/2005 +0200, Antoine Pitrou wrote: >To which you apparently didn't read my answer, that is: >you can never be sure that a variable containing something which >is /semantically/ textual (*) will never contain anything other than >ASCII text. For example raw_input() won't tell you that its 8-bit string >result contains some chars > 0x7F. Same for many other library >functions. How do you cope with (more or less occasional) non-ascii data >coming in as 8-bit strings? Presumably in Python 3.0, opening a file in "text" mode will require an encoding to be specified, and opening it in "binary" mode will cause it to produce or consume byte arrays, not strings. This should apply to sockets too, and really any I/O facility, including GUI frameworks, DBAPI objects, os.listdir(), etc. Of course, to get there we really need to add a convenient bytes type, perhaps by enhancing the current 'array' module. It'd be nice to have a way to get this in 2.x versions so people can start fixing stuff to work the right way. With no 8-bit strings coming in, there should be no unicode/str problems except those you create yourself.
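Python 3's io module ended up close to this model. The minimal sketch below uses in-memory streams so that it runs without a real file. The sample text is made up. The binary layer yields bytes, and text appears only once an encoding is supplied.

```python
import io

payload = "na\u00efve caf\u00e9\n".encode("utf-8")  # bytes as they would sit on disk

# Binary mode: reads produce bytes, never str.
binary = io.BytesIO(payload)
raw = binary.read()

# Text mode: an explicit encoding turns the same bytes into str.
text_stream = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")
text = text_stream.read()

print(type(raw).__name__, type(text).__name__, text.strip())  # bytes str naïve café
```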
Re: [Python-Dev] bytes type
> Presumably in Python 3.0, opening a file in "text" mode will require an > encoding to be specified, and opening it in "binary" mode will cause it to > produce or consume byte arrays, not strings. This should apply to sockets > too, and really any I/O facility, including GUI frameworks, DBAPI objects, > os.listdir(), etc. Great :) > Of course, to get there we really need to add a convenient bytes type, > perhaps by enhancing the current 'array' module. It'd be nice to have a > way to get this in 2.x versions so people can start fixing stuff to work > the right way. Could the "bytes" type be just the same as the current "str" type but without the implicit unicode conversion ? Or am I missing some desired functionality ?
Re: [Python-Dev] bytes type
On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > Could the "bytes" type be just the same as the current "str" type but > without the implicit unicode conversion ? Or am I missing some desired > functionality ? No. It will be a mutable array of bytes. It will intentionally resemble strings as little as possible. There won't be a literal for it. But you will be able to convert between bytes and strings quite easily by specifying an encoding. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
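In the Python 3 that eventually shipped, this mutable array of bytes exists as bytearray. Python 3 also added an immutable bytes type with a b"..." literal. The sketch below shows that conversion in either direction takes an explicit encoding.

```python
buf = bytearray("h\u00e9llo", "utf-8")  # text -> bytes requires an encoding
buf[0] = ord("H")                       # mutable, element by element
text = buf.decode("utf-8")              # bytes -> text requires one too

print(len(buf), text)  # 6 Héllo
```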
Re: [Python-Dev] PEP 343 and __with__
Phillip J. Eby writes:
> You didn't offer any reasons why this would be useful and/or good.
It makes it dramatically easier to write Python classes that correctly
support 'with'. I don't see any simple way to do this under PEP 343;
the only sane thing to do is write a separate @contextmanager
generator, as all of the examples do.
Consider:
    # decimal.py
    class Context:
        ...
        def __enter__(self):
            ???
        def __exit__(self, t, v, tb):
            ???

    DefaultContext = Context(...)
Kindly implement __enter__() and __exit__(). Make sure your
implementation is thread-safe (not easy, even though
decimal.getcontext/.setcontext are thread-safe!). Also make sure it
supports nested 'with DefaultContext:' blocks (I don't mean lexically
nested, of course; I mean nested at runtime.)
The answer requires thread-local storage and a separate stack of saved
context objects per thread. It seems a little ridiculous to me.
Whereas:
    class Context:
        ...
        def __with__(self):
            old = decimal.getcontext()
            decimal.setcontext(self)
            try:
                yield
            finally:
                decimal.setcontext(old)
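For comparison, here is the __enter__/__exit__ version Jason asks for above, with the per-thread stack of saved contexts he says it needs. This is a minimal Python 3 sketch. It assumes hypothetical module-level getcontext()/setcontext() helpers backed by threading.local. It is not the real decimal module.

```python
import threading

# Hypothetical stand-ins for decimal's getcontext()/setcontext(); not the real module.
_state = threading.local()


class Context:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        # One stack of saved contexts per thread, so nested (and concurrent)
        # 'with' blocks on the same Context object do not clash.
        if not hasattr(_state, "saved"):
            _state.saved = []
        _state.saved.append(getcontext())
        setcontext(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        setcontext(_state.saved.pop())
        return False  # do not swallow exceptions


DefaultContext = Context("default")


def getcontext():
    return getattr(_state, "current", DefaultContext)


def setcontext(ctx):
    _state.current = ctx


high = Context("high")
with high:
    with DefaultContext:
        with high:  # the same object, nested at runtime
            innermost = getcontext().name
        middle = getcontext().name
    outer = getcontext().name
print(innermost, middle, outer, getcontext().name)  # high default high default
```

The explicit stack is exactly the bookkeeping that the generator-based __with__ form keeps implicitly in its frame.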
As for the second proposal, I was thinking we'd have one mental model
for context managers (block template generators), rather than two
(generators vs. enter/exit methods). Enter/exit seemed superfluous,
given the examples in the PEP.
> [T]his multiplies the difficulty of implementing context managers in C.
Nonsense.
    static PyObject *
    lock_with()
    {
        return PyContextManager_FromCFunctions(self, lock_acquire,
                                               lock_release);
    }
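A Python-level analogue of the proposed helper may make the idea clearer. FunctionPairContext is a hypothetical name. PyContextManager_FromCFunctions itself is only proposed in this message and is not an existing C API. The sketch wraps an acquire/release pair into an object that speaks the __enter__/__exit__ protocol.

```python
import threading


class FunctionPairContext:
    """Hypothetical: build a context manager from an acquire/release pair."""

    def __init__(self, acquire, release):
        self._acquire = acquire
        self._release = release

    def __enter__(self):
        self._acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._release()
        return False  # let exceptions propagate


lock = threading.Lock()
guard = FunctionPairContext(lock.acquire, lock.release)
with guard:
    held = lock.locked()
print(held, lock.locked())  # True False
```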
There probably ought to be such an API even if my suggestion is in
fact garbage (as, admittedly, still seems the most likely thing).
Cheers,
-j
Re: [Python-Dev] unifying str and unicode
Antoine Pitrou wrote: > To which you apparently didn't read my answer, that is: > you can never be sure that a variable containing something which > is /semantically/ textual (*) will never contain anything other than > ASCII text. That is simply not true. There are variables that are semantically textual, yet I can be sure that this is a byte string only if it consists just of ASCII. For example, if you invoke a Tkinter function, it will return a byte string if the result is purely ASCII, else return a Unicode string. This is an interface guarantee, hence I can be sure. Regards, Martin
Re: [Python-Dev] unifying str and unicode
On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > > > > If that's how things were designed, then Python's entire standard > > > library (not to mention third-party libraries) is not "unicode safe" - > > > to quote your own words - since many functions may return 8-bit strings > > > containing non-ascii characters. > > > > huh? first you talk about functions that convert unicode strings to 8-bit > > strings, now you talk about functions that return raw 8-bit strings? > > Are you deliberately missing the argument? > And can't you understand that conversions are problematic in both > directions (str -> unicode /and/ unicode -> str)? Both directions are a problem. Just a note: it's not so much the conversions that I find problematic, but rather the implicit nature of the conversions (combined with the fact that they may fail). In addition to being difficult to track down, these implicit conversions may be costing processing time as well. cheers,
Re: [Python-Dev] PEP 343 and __with__
At 05:15 PM 10/3/2005 -0400, Jason Orendorff wrote: >Phillip J. Eby writes: > > You didn't offer any reasons why this would be useful and/or good. > >It makes it dramatically easier to write Python classes that correctly >support 'with'. I don't see any simple way to do this under PEP 343; >the only sane thing to do is write a separate @contextmanager >generator, as all of the examples do. Wha? For locks (the example you originally gave), this is trivial. >Consider: > > # decimal.py > class Context: > ... > def __enter__(self): > ??? > def __exit__(self, t, v, tb): > ??? > > DefaultContext = Context(...) > >Kindly implement __enter__() and __exit__(). Make sure your >implementation is thread-safe (not easy, even though >decimal.getcontext/.setcontext are thread-safe!). Also make sure it >supports nested 'with DefaultContext:' blocks (I don't mean lexically >nested, of course; I mean nested at runtime.) > >The answer requires thread-local storage and a separate stack of saved >context objects per thread. It seems a little ridiculous to me. Okay, it was completely non-obvious from your post that this was the problem you're trying to solve. >Whereas: > > class Context: > ... > def __with__(self): > old = decimal.getcontext() > decimal.setcontext(self) > try: > yield > finally: > decimal.setcontext(old) This could also be done with a Context.replace() @contextmanager method. On the whole, I'm torn. I definitely like the additional flexibility this gives. On the other hand, it seems to me that __with__ and the additional C baggage violates the "if the implementation is hard to explain" rule. Also, people have already put a lot of effort into implementation and documentation patches based on an accepted PEP. 
That's not enough to override "the right thing to do", especially if it comes with a volunteer willing to update the work, but in this case the amount of additional goodness seems small, and it's not immediately apparent that you're volunteering to help change this even if Guido blessed it.
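Phillip's alternative, a generator-based method decorated with @contextmanager, might look roughly like the sketch below. The replace() name comes from the message above. The getcontext()/setcontext() stand-ins are hypothetical and simply use threading.local.

```python
import threading
from contextlib import contextmanager

_local = threading.local()  # hypothetical stand-in for decimal's per-thread state


def getcontext():
    return getattr(_local, "ctx", None)


def setcontext(ctx):
    _local.ctx = ctx


class Context:
    def __init__(self, prec):
        self.prec = prec

    @contextmanager
    def replace(self):
        # The saved context lives in this generator's frame, so each 'with'
        # gets its own copy and no explicit per-thread stack is needed.
        old = getcontext()
        setcontext(self)
        try:
            yield self
        finally:
            setcontext(old)


fine = Context(prec=50)
with fine.replace():
    with Context(prec=5).replace():
        inner = getcontext().prec
    outer = getcontext().prec
print(inner, outer, getcontext())  # 5 50 None
```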
Re: [Python-Dev] unifying str and unicode
Martin Blais wrote: > On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > If that's how things were designed, then Python's entire standard library (not to mention third-party libraries) is not "unicode safe" - to quote your own words - since many functions may return 8-bit strings containing non-ascii characters. >>> >>>huh? first you talk about functions that convert unicode strings to 8-bit >>>strings, now you talk about functions that return raw 8-bit strings? >> >>Are you deliberately missing the argument? >>And can't you understand that conversions are problematic in both >>directions (str -> unicode /and/ unicode -> str)? > > > Both directions are a problem. > > Just a note: it's not so much the conversions that I find problematic, > but rather the implicit nature of the conversions (combined with the > fact that they may fail). In addition to being difficult to track > down, these implicit conversions may be costing processing time as > well. We've already pointed you to a solution which you might want to use. Why don't you just try it ? BTW, if you want to read up on all the reasons why Unicode was done the way it was, have a look at: http://www.python.org/peps/pep-0100.html and read up in the python-dev archives: http://mail.python.org/pipermail/python-dev/2000-March/thread.html and the next months after the initial checkin. From what I've read on the web about the Python Unicode implementation we have one of the better ones compared to other languages' implementations and their choices and design decisions. None of them is perfect, but that seems to be an inherent problem with Unicode no matter how you try to approach it - even more so, if you are trying to add it to a language that has used ordinary C strings for text from day 1. -- Marc-Andre Lemburg eGenix.com
Re: [Python-Dev] bytes type
On Monday 03 October 2005 at 14:02 -0700, Guido van Rossum wrote: > On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > > Could the "bytes" type be just the same as the current "str" type but > > without the implicit unicode conversion ? Or am I missing some desired > > functionality ? > > No. It will be a mutable array of bytes. It will intentionally > resemble strings as little as possible. There won't be a literal for > it. Thinking about it, it may have to offer the search and replace facilities offered by strings (including regular expressions). Here is a use case: say I'm reading an HTML file (or receiving it over the network). Since the character encoding can be specified in the HTML file itself (in the ...), I must first receive it as a bytes object. But then I must fetch the encoding information from the HTML header: therefore I must use some string ops on the bytes object to parse this information. Only after I have discovered the encoding can I finally convert the bytes object to a text string. Or would there be another way to do it?
Re: [Python-Dev] bytes type
This would presumably support the (read-only part of the) buffer API so search would be covered. I don't see a use case for replace. Alternatively, you could always specify Latin-1 as the encoding and convert it that way -- I don't think there's any input that can cause Latin-1 decoding to fail. On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > On Monday 03 October 2005 at 14:02 -0700, Guido van Rossum wrote: > > On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > > > Could the "bytes" type be just the same as the current "str" type but > > > without the implicit unicode conversion ? Or am I missing some desired > > > functionality ? > > > > No. It will be a mutable array of bytes. It will intentionally > > resemble strings as little as possible. There won't be a literal for > > it. > > Thinking about it, it may have to offer the search and replace > facilities offered by strings (including regular expressions). > > Here is a use case: say I'm reading an HTML file (or receiving it over > the network). Since the character encoding can be specified in the HTML > file itself (in the ...), I must first receive it as a > bytes object. But then I must fetch the encoding information from the > HTML header: therefore I must use some string ops on the bytes object to > parse this information. Only after I have discovered the encoding can I > finally convert the bytes object to a text string. > > Or would there be another way to do it? -- --Guido van Rossum (home page: http://www.python.org/~guido/)
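Guido's two suggestions can be combined. The sketch below searches the raw bytes for a declared charset and falls back to Latin-1, which cannot fail. It is written in Python 3 terms. decode_html and the regular expression are illustrative assumptions, and real HTML encoding detection follows more rules than this.

```python
import re

# Illustrative pattern only; real HTML encoding sniffing is more involved.
_CHARSET = re.compile(rb'charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def decode_html(raw: bytes) -> str:
    """Hypothetical helper: decode using a declared charset, else Latin-1."""
    match = _CHARSET.search(raw[:1024])  # regex search works directly on bytes
    encoding = match.group(1).decode("ascii") if match else "latin-1"
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        return raw.decode("latin-1")  # every byte value maps to a character


page = '<meta charset="utf-8"><p>d\u00e9j\u00e0 vu</p>'.encode("utf-8")
print(decode_html(page))
```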
Re: [Python-Dev] bytes type
On Monday 03 October 2005 at 17:42 -0700, Guido van Rossum wrote: > I don't see a use case for replace. Agreed. > Alternatively, you could always specify Latin-1 as the encoding and > convert it that way -- I don't think there's any input that can cause > Latin-1 decoding to fail. You seem to be right. « In 1992, the IANA registered the character map ISO-8859-1 (note the extra hyphen), a superset of ISO/IEC 8859-1, for use on the Internet. This map assigns control characters to the code values 00-1F, 7F, and 80-9F. It thus provides for 256 characters via every possible 8-bit value. » http://en.wikipedia.org/wiki/ISO_8859-1#ISO-8859-1 Regards Antoine.
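This conclusion is easy to check in Python 3. All 256 byte values decode under Latin-1, each one maps to the code point of the same number, and encoding back yields the original bytes, so the round trip is lossless.

```python
all_bytes = bytes(range(256))

text = all_bytes.decode("latin-1")  # never raises
round_trip = text.encode("latin-1")

print(len(text), round_trip == all_bytes)  # 256 True
```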
Re: [Python-Dev] 64-bit bytecode compatibility (was Re: [PEAK] ez_setup on 64-bit linux problem)
Phillip J. Eby wrote: > At 09:49 AM 9/29/2005 -0400, Viren Shah wrote: > >> [I sent this earlier without being a subscriber and it was sent to the >> moderation queue so I'm resending it after subscribing] >> >> Hi, >> I'm running a 64-bit Fedora Core 3 with python 2.3.4. I'm trying to >> install setuptools to use with Trac, and get the following error: >> >> [EMAIL PROTECTED] ~]$ python ez_setup.py >> Downloading >> http://cheeseshop.python.org/packages/2.3/s/setuptools/setuptools-0.6a4-py2.3.egg >> >> >> Traceback (most recent call last): >> File "ez_setup.py", line 206, in ? >> main(sys.argv[1:]) >> File "ez_setup.py", line 141, in main >> from setuptools.command.easy_install import main >> OverflowError: signed integer is greater than maximum >> >> >> I get the same type of error if I try installing setuptools manually. >> I figure this has to do with the 64-bit nature of the OS and python, >> but not being a python person, don't know what a workaround would be. >> >> Any ideas? > > > Hm. It sounds like perhaps the 64-bit Python in question isn't able to > read bytecode for Python from a 32-bit Python version. You'll need to > download the setuptools source archive from PyPI and install it using > "python setup.py install" instead. 
> [Thanks for the quick response] I tried downloading and installing setuptools-0.6a4.zip with the same type of result: [EMAIL PROTECTED] setuptools-0.6a4]# python setup.py install running install running bdist_egg running egg_info writing ./setuptools.egg-info/PKG-INFO writing top-level names to ./setuptools.egg-info/top_level.txt writing entry points to ./setuptools.egg-info/entry_points.txt installing library code to build/bdist.linux-x86_64/egg running install_lib running build_py creating build creating build/lib copying pkg_resources.py -> build/lib copying easy_install.py -> build/lib creating build/lib/setuptools copying setuptools/depends.py -> build/lib/setuptools copying setuptools/archive_util.py -> build/lib/setuptools copying setuptools/dist.py -> build/lib/setuptools copying setuptools/__init__.py -> build/lib/setuptools copying setuptools/extension.py -> build/lib/setuptools copying setuptools/sandbox.py -> build/lib/setuptools copying setuptools/package_index.py -> build/lib/setuptools creating build/lib/setuptools/tests copying setuptools/tests/doctest.py -> build/lib/setuptools/tests copying setuptools/tests/__init__.py -> build/lib/setuptools/tests copying setuptools/tests/test_resources.py -> build/lib/setuptools/tests creating build/lib/setuptools/command copying setuptools/command/test.py -> build/lib/setuptools/command copying setuptools/command/saveopts.py -> build/lib/setuptools/command copying setuptools/command/easy_install.py -> build/lib/setuptools/command copying setuptools/command/build_ext.py -> build/lib/setuptools/command copying setuptools/command/egg_info.py -> build/lib/setuptools/command copying setuptools/command/install_lib.py -> build/lib/setuptools/command copying setuptools/command/develop.py -> build/lib/setuptools/command copying setuptools/command/alias.py -> build/lib/setuptools/command copying setuptools/command/sdist.py -> build/lib/setuptools/command copying setuptools/command/bdist_egg.py -> 
build/lib/setuptools/command copying setuptools/command/bdist_rpm.py -> build/lib/setuptools/command copying setuptools/command/rotate.py -> build/lib/setuptools/command copying setuptools/command/build_py.py -> build/lib/setuptools/command copying setuptools/command/upload.py -> build/lib/setuptools/command copying setuptools/command/setopt.py -> build/lib/setuptools/command copying setuptools/command/__init__.py -> build/lib/setuptools/command copying setuptools/command/install.py -> build/lib/setuptools/command creating build/bdist.linux-x86_64 creating build/bdist.linux-x86_64/egg copying build/lib/pkg_resources.py -> build/bdist.linux-x86_64/egg copying build/lib/easy_install.py -> build/bdist.linux-x86_64/egg creating build/bdist.linux-x86_64/egg/setuptools copying build/lib/setuptools/depends.py -> build/bdist.linux-x86_64/egg/setuptools creating build/bdist.linux-x86_64/egg/setuptools/tests copying build/lib/setuptools/tests/doctest.py -> build/bdist.linux-x86_64/egg/setuptools/tests copying build/lib/setuptools/tests/__init__.py -> build/bdist.linux-x86_64/egg/setuptools/tests copying build/lib/setuptools/tests/test_resources.py -> build/bdist.linux-x86_64/egg/setuptools/tests copying build/lib/setuptools/archive_util.py -> build/bdist.linux-x86_64/egg/setuptools copying build/lib/setuptools/dist.py -> build/bdist.linux-x86_64/egg/setuptools copying build/lib/setuptools/__init__.py -> build/bdist.linux-x86_64/egg/setuptools copying build/lib/setuptools/extension.py -> build/bdist.linux-x86_64/egg/setuptools copying build/lib/setuptools/sandbox.py -> build/bdist.linux-x86_64/egg/setuptools creating build/bdist.linux-x86_64/egg/setuptools/command copying build/lib/setuptools/command/tes
Re: [Python-Dev] 64-bit bytecode compatibility (was Re: [PEAK] ez_setup on 64-bit linux problem)
Phillip J. Eby wrote: > At 12:14 PM 9/29/2005 -0400, Viren Shah wrote: > >> File "/root/svn-install-apps/setuptools-0.6a4/pkg_resources.py", >> line 949, in _get >> return self.loader.get_data(path) >> OverflowError: signed integer is greater than maximum > > > Interesting. That looks like it might be a bug in the Python zipimport > module, which is what implements get_data(). Apparently it happens upon > importing as well; I assumed that it was a bytecode incompatibility. > > Checking the revision log, I find that there's a 64-bit fix for > zipimport.c in Python 2.4 that looks like it would fix this issue, but > it has not been backported to any revision of Python 2.3. You're going > to either have to backport the fix yourself and rebuild Python 2.3, or > upgrade to Python 2.4. Sorry. :( Cool! Thanks for the solution. I'll upgrade to python 2.4 and hope it works :-) Thanks for all your help Viren -- Viren R Shah Sr. Technical Advisor Virtual Technology Corporation [EMAIL PROTECTED] P: 703-333-6246
[Python-Dev] Proposal for 2.5: Returning values from PEP 342 enhanced generators
PEP 255 ("Simple Generators") closes with:
> Q. Then why not allow an expression on "return" too?
>
> A. Perhaps we will someday. In Icon, "return expr" means both "I'm
>    done", and "but I have one final useful value to return too, and
>    this is it". At the start, and in the absence of compelling uses
>    for "return expr", it's simply cleaner to use "yield" exclusively
>    for delivering values.
Now that Python 2.5 has gained enhanced generators (multitudes rejoice!), I think
there is a compelling use for valued return statements in cooperative
multitasking code, of the kind:
    def foo():
        Data = yield Client.read()
        [...]
        MoreData = yield Client.read()
        [...]
        return FinalResult

    def bar():
        Result = yield foo()
For generators written in this style, "yield" means "suspend execution of the
current call until the requested result/resource can be provided", and
"return" regains its full conventional meaning of "terminate the current call
with a given result".
The simplest / most straightforward implementation would be for "return Foo"
to translate to "raise StopIteration, Foo". This is consistent with "return"
translating to "raise StopIteration", and does not break any existing
generator code.
(Another way to think about this change is that if a plain StopIteration means
"the iterator terminated", then a valued StopIteration, by extension, means
"the iterator terminated with the given value".)
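The proposed semantics can be sketched as a tiny trampoline. This is a minimal illustration written for Python 3.3 and later, where "return value" inside a generator does raise StopIteration carrying that value in its .value attribute (rather than the 2.5-era "raise StopIteration, Foo" spelling used above). The run() driver and the toy foo()/bar() generators are hypothetical; each non-generator yield is simply echoed back, standing in for Client.read() completing immediately.

```python
# Sketch for Python 3.3+, where "return value" inside a generator raises
# StopIteration(value). run(), foo() and bar() are hypothetical examples.
import types


def run(gen):
    """Drive a coroutine-style generator to completion; return its result.

    A yielded generator is run recursively and its return value is sent
    back in; any other yielded value is sent straight back, standing in
    for an I/O request such as Client.read() completing immediately.
    """
    value = None
    while True:
        try:
            request = gen.send(value)
        except StopIteration as stop:
            # A valued "return" surfaces here as StopIteration.value.
            return stop.value
        if isinstance(request, types.GeneratorType):
            value = run(request)
        else:
            value = request


def foo():
    data = yield "first chunk"
    more_data = yield "second chunk"
    return data + " + " + more_data


def bar():
    result = yield foo()
    return result.upper()


print(run(bar()))  # FIRST CHUNK + SECOND CHUNK
```

Intermediate requests and the final result travel on separate channels here (yield versus StopIteration), which is the distinction the proposal asks for.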
Motivation by real-world example:
One system that could benefit from this change is Christopher Armstrong's
defgen.py[1] for Twisted, which he recently reincarnated (as newdefgen.py) to
use enhanced generators. The resulting code is much cleaner than before, and
closer to the conventional synchronous style of writing.
[1] the saga of which is summarized here:
http://radix.twistedmatrix.com/archives/000114.html
However, because enhanced generators have no way to differentiate their
intermediate results from their "real" result, the current solution is a
somewhat confusing compromise: the last value yielded by the generator
implicitly becomes the result returned by the call. Thus, to return
something, in general, requires the idiom "yield Foo; return". If valued
returns are allowed, this would become "return Foo" (and the code implementing
defgen itself would probably end up simpler, as well).
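For contrast, here is a toy driver for the compromise described above, in which the last value yielded becomes the call's result. It is a hypothetical Python 3 sketch, not newdefgen.py's actual code, and it assumes every yielded request completes immediately with the yielded value itself. The second generator shows how forgetting the final yield silently hands the caller an intermediate value.

```python
# Hypothetical driver for the "last yielded value is the result" convention.
# Assumes each yielded request completes immediately with its own value.
def run_last_yield(gen):
    last = None
    value = None
    while True:
        try:
            last = gen.send(value)
        except StopIteration:
            # The generator finished: whatever it yielded last is the result.
            return last
        value = last


def compute():
    x = yield 2       # an intermediate request
    yield x * 10      # the "yield Foo; return" idiom: this is the result
    return


def forgetful():
    x = yield 2       # an intermediate request...
    result = x * 10   # ...computed, but never yielded back to the caller


print(run_last_yield(compute()))    # 20
print(run_last_yield(forgetful()))  # 2
```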
[Python-Dev] Unicode charmap decoders slow
Is there a faster way to transcode from 8-bit chars (charmaps) to utf-8
than going through unicode()?
I'm writing a small card-file program. As a test, I use a 53 MB MBox file,
in mac-roman encoding. My program reads and parses the file into messages
in about 3 to 5 seconds (Wow! Go Python!), but takes about 14 seconds to
iterate over the cards and convert them to utf-8:
    for i in xrange(len(cards)):
        u = unicode(cards[i], encoding)
        cards[i] = u.encode('utf-8')
The time is nearly all in the unicode() call. It's not so much how much
time it takes, but that it takes 4 times as long as the real work, just to
do table lookups.
Looking at the source (which, if I have it right, is
PyUnicode_DecodeCharmap() in unicodeobject.c), I think it is doing a
dictionary lookup for each character. I would have thought that it would
make and cache a lookup table (LUT) the size of the charmap (and hook the relevant
dictionary stuff to delete the cached LUT if the dictionary is changed).
(You may consider this a request for enhancement. ;)
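The lookup-table idea can be illustrated in pure Python. This is only a sketch of the data structure (a 256-entry table indexed by byte value, built once per encoding and then cached); written in Python it will not outrun the C decoder. make_lut and decode_with_lut are hypothetical helper names, not part of unicodeobject.c or the codecs module, and bytes that a charmap leaves undefined are assumed to map to U+FFFD here.

```python
# Sketch of a cached lookup table (LUT) for an 8-bit charmap, Python 3.
# make_lut() and decode_with_lut() are hypothetical helpers.
from functools import lru_cache


@lru_cache(maxsize=None)
def make_lut(encoding):
    """Build a 256-entry tuple: index = byte value, entry = decoded char."""
    lut = []
    for b in range(256):
        try:
            lut.append(bytes([b]).decode(encoding))
        except UnicodeDecodeError:
            lut.append("\ufffd")  # byte undefined in this charmap
    return tuple(lut)


def decode_with_lut(data, encoding):
    lut = make_lut(encoding)  # built once per encoding, then cached
    # Iterating over bytes yields ints, so each one indexes the table directly.
    return "".join([lut[b] for b in data])


raw = "Café naïve".encode("mac_roman")
print(decode_with_lut(raw, "mac_roman"))  # Café naïve
```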
I thought of using U"".translate(), but the unicode version is defined to
be slow, and anyway I can't find any way to just shove my 8-bit data into a
unicode string without translation. Is there some similar approach? I'm
almost (but not quite) ready to try it in Pyrex.
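One way to "shove" 8-bit data into a Unicode string unchanged, sketched here in Python 3 rather than the 2.x of the original post: the latin-1 codec maps each byte value to the code point with the same number, so decoding as latin-1 is effectively a copy, and a translation table can then remap code points 128-255 to what the target charmap assigns them. The build_table helper is hypothetical, the approach assumes the charmap agrees with ASCII for bytes 0-127 (true of mac-roman), and whether it beats a direct decode depends on the Python version.

```python
# Python 3 sketch: copy bytes into a str via latin-1 (byte b -> code point b),
# then remap code points 0x80-0xFF with str.translate().
# Assumes the target charmap is ASCII for bytes 0-127.
def build_table(encoding):
    """Map code points 128-255 to the characters `encoding` assigns them."""
    table = {}
    for b in range(128, 256):
        try:
            table[b] = bytes([b]).decode(encoding)
        except UnicodeDecodeError:
            table[b] = "\ufffd"  # undefined in this charmap
    return table


MAC_ROMAN_TABLE = build_table("mac_roman")

raw = "Café naïve".encode("mac_roman")
as_latin1 = raw.decode("latin-1")  # no real translation happens here
text = as_latin1.translate(MAC_ROMAN_TABLE)
print(text)  # Café naïve
```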
I'm new to Python. Googling python.org and the groups didn't turn up anything
relevant. I posted this in comp.lang.python yesterday and got a couple of
responses, but I think this may be too sophisticated a question for that
group.
I'm not a member of this list, so please copy me on replies so I don't have
to hunt them down in the archive.
TonyN.
Re: [Python-Dev] Proposal for 2.5: Returning values from PEP 342 enhanced generators
On 10/4/05, Piet Delport <[EMAIL PROTECTED]> wrote:
> One system that could benefit from this change is Christopher Armstrong's
> defgen.py[1] for Twisted, which he recently reincarnated (as newdefgen.py) to
> use enhanced generators. The resulting code is much cleaner than before, and
> closer to the conventional synchronous style of writing.
>
> [1] the saga of which is summarized here:
> http://radix.twistedmatrix.com/archives/000114.html
>
> However, because enhanced generators have no way to differentiate their
> intermediate results from their "real" result, the current solution is a
> somewhat confusing compromise: the last value yielded by the generator
> implicitly becomes the result returned by the call. Thus, to return
> something, in general, requires the idiom "yield Foo; return". If valued
> returns are allowed, this would become "return Foo" (and the code implementing
> defgen itself would probably end up simpler, as well).

Hey, that would be nice. I've found people confused by the way defgen
handles return values before, getting seemingly meaningless values out of
their defgens (if the defgen didn't specifically yield some meaningful
value at the end).

At first I thought "return foo" in a generator ought to be equivalent to
"yield foo; return", but at least for defgen, it turns out raising
StopIteration(foo) would be better, as I would have a very explicit way to
specify and find the return value of the generator.

--
  Twisted   |  Christopher Armstrong: International Man of Twistery
   Radix    |  -- http://radix.twistedmatrix.com
            |  Release Manager, Twisted Project
  \\\V///   |  -- http://twistedmatrix.com
   |o O|    |
wvw-+
Re: [Python-Dev] Unicode charmap decoders slow
As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is very
slow compared to encoding or decoding with utf-8. Here I'm working with 53k of
data instead of 53 megs. (Note: this is a laptop, so it's possible that
thermal or battery management features affected these numbers a bit, but by a
factor of 3 at most.)
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "u.encode('utf-8')"
1000 loops, best of 3: 591 usec per loop
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('utf-8')"
1000 loops, best of 3: 1.25 msec per loop
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('mac-roman')"
100 loops, best of 3: 13.5 msec per loop
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('iso8859-1')"
100 loops, best of 3: 13.6 msec per loop
With utf-8 encoding as the baseline, we have:
    decode('utf-8')       2.1x as long
    decode('mac-roman')  22.8x as long
    decode('iso8859-1')  23.0x as long
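On a current Python 3 the same comparison can be rerun with the timeit module instead of the old timeit.py script. This is a sketch: the 53 KiB all-'a' input mirrors the commands above, ratios are taken relative to the utf-8 encode baseline, and the numbers will differ by machine and Python version, so no expected output is shown. The names SETUP, CASES and bench are illustrative.

```python
# Rerun the codec comparison above with the timeit module (Python 3).
import timeit

SETUP = "s = b'a' * 53 * 1024; u = s.decode('ascii')"
CASES = {
    "encode('utf-8')": "u.encode('utf-8')",
    "decode('utf-8')": "s.decode('utf-8')",
    "decode('mac-roman')": "s.decode('mac-roman')",
    "decode('iso8859-1')": "s.decode('iso8859-1')",
}


def bench(number=1000, repeat=3):
    """Return seconds per loop (best of `repeat` runs) for each case."""
    return {
        name: min(timeit.repeat(stmt, SETUP, repeat=repeat, number=number)) / number
        for name, stmt in CASES.items()
    }


if __name__ == "__main__":
    times = bench()
    baseline = times["encode('utf-8')"]
    for name, secs in times.items():
        print(f"{name:22} {secs * 1e6:8.1f} usec  {secs / baseline:5.1f}x")
```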
Perhaps this is an area that is ripe for optimization.
Jeff
Re: [Python-Dev] unifying str and unicode
Antoine> If an stdlib function returns an 8-bit string containing
Antoine> non-ascii data, then this string used in unicode context incurs
Antoine> an implicit conversion, which fails.

Such strings should be converted to Unicode at the point where they enter
the application. That's likely the only place where you have a good chance
of knowing the data encoding. Files generally have no encoding information
associated with them. Some databases don't handle Unicode transparently.
If you hang onto the input from such devices as plain strings until you
need them as Unicode, you will almost certainly not know how the string
was encoded. The state of the outside Unicode world being as miserable as
it is (think web input forms), you often don't know the encoding at the
interface and have to guess anyway. Even so, isolating that guesswork to
the interface is better than recovering somewhere further downstream.

Skip
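A minimal Python 3 sketch of that advice: decode bytes once, at the point where they enter the program, and keep any guessing in that one place. The function name and the candidate list (UTF-8 first, then latin-1 as a last resort that accepts any byte sequence) are assumptions for illustration; note that a latin-1 fallback never fails, so a wrong guess yields mojibake rather than an error.

```python
# Decode external bytes at the boundary; downstream code sees only str.
# decode_input() and its default candidate list are hypothetical.
def decode_input(raw, candidates=("utf-8", "latin-1")):
    """Return `raw` decoded with the first candidate encoding that works."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # the guesswork stays here, at the interface
    raise ValueError("none of %r could decode the input" % (candidates,))


# The same word arriving from two sources with different encodings:
print(decode_input("naïve".encode("utf-8")))    # naïve
print(decode_input("naïve".encode("latin-1")))  # naïve
```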
Re: [Python-Dev] unifying str and unicode
On Oct 3, 2005, at 3:47 PM, Fredrik Lundh wrote:
> Antoine Pitrou wrote:
>
>>>> If I have an unicode string containing legal characters greater
>>>> than 0x7F, and I pass it to a function which converts it to str, the
>>>> conversion fails.
>>>
>>> so? if it does that, it's not unicode safe.
>>>
>> [...]
>>
>>> what's that has to do with
>>> my argument (which is that you can safely mix ascii strings and
>>> unicode strings, because that's how things were designed).
>>
>> If that's how things were designed, then Python's entire standard
>> library (not to mention third-party libraries) is not "unicode safe" -
>> to quote your own words - since many functions may return 8-bit
>> strings containing non-ascii characters.
>
> huh? first you talk about functions that convert unicode strings to 8-bit
> strings, now you talk about functions that return raw 8-bit strings? and
> all this in response to a post that argues that it's in fact a good idea to
> use plain strings to hold textual data that happens to contain ASCII only,
> because 1) it works, by design, and 2) it's almost always more efficient.
>
> if you don't know what your own argument is, you cannot expect anyone
> to understand it.
Your point would be much easier to stomach if the "str" type could
*only* hold 7-bit ASCII. Perhaps that can be done when Python gets an
actual bytes type in 3.0. There indeed are a multitude of uses for
the efficient storage/processing of ASCII-only data. However,
currently, there are problems because it's so easy to screw yourself
without noticing when mixing unicode and str objects. If, on the
other hand, you have a 7-bit ASCII string type and a 16/32-bit
unicode string type, both can be used interchangeably and there is no
possibility for any en/de-coding issues. And
asciiOnlyStringType.encode('utf-8') can become _ultra_ efficient, as
a bonus. :)
Seems win-win to me.
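The invariant described above can be illustrated with a toy str subclass that refuses non-ASCII content: mixing it with Unicode text can never hit an encoding error, and its UTF-8 encoding is byte-for-byte its ASCII encoding. This is a hypothetical Python 3 sketch (str.isascii needs 3.7 or later), not the bytes type that Python 3.0 actually introduced.

```python
# Toy 7-bit ASCII string type; hypothetical, requires Python 3.7+ (str.isascii).
class AsciiStr(str):
    """A str subclass that can only ever hold 7-bit ASCII characters."""

    def __new__(cls, value):
        if not value.isascii():
            raise ValueError("AsciiStr only accepts 7-bit ASCII: %r" % (value,))
        return super().__new__(cls, value)


label = AsciiStr("price: ")
mixed = label + "12 \u20ac"  # mixing with arbitrary Unicode text cannot fail
print(mixed)                  # price: 12 €

# For ASCII-only content, UTF-8 and ASCII produce identical bytes.
print(label.encode("utf-8") == label.encode("ascii"))  # True

try:
    AsciiStr("caf\u00e9")
except ValueError as exc:
    print("rejected:", exc)
```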
James
