Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote:
> On May 4, 2005, at 6:20 PM, Shane Hathaway wrote:
>>> Nicholas Bastin wrote:
>>>
>>> "This type represents the storage type which is used by Python
>>> internally as the basis for holding Unicode ordinals. Extension module
>>> developers should make no assumptions about the size of this type on
>>> any given platform."
>>>
>>> But people want to know "Is Python's Unicode 16-bit or 32-bit?"
>>> So the documentation should explicitly say "it depends".
>>
>> On a related note, it would help if the documentation provided a
>> little more background on Unicode encoding. Specifically, that UCS-2 is
>> not the same as UTF-16, even though they're both two bytes wide and most
>> of the characters are the same. UTF-16 can encode 4-byte characters,
>> while UCS-2 can't. A Py_UNICODE is either UCS-2 or UCS-4. It took me
>
> I'm not sure the Python documentation is the place to teach someone
> about Unicode. ISO 10646 pretty clearly defines UCS-2 as only
> containing characters in the BMP (plane zero). On the other hand, I
> don't know why Python lets you choose UCS-2 anyhow, since it's almost
> always not what you want.

You've got that wrong: Python lets you choose UCS-4 - UCS-2 is the default.

Note that Python's UTF-8 and UTF-16 codecs are surrogate aware and thus support non-BMP code points regardless of the build type: a UCS-2 build of Python will store a non-BMP code point as a UTF-16 surrogate pair in the Py_UNICODE buffer, while a UCS-4 build will store it as a single value. Decoding is surrogate aware too, so a UTF-16 surrogate pair in a UCS-2 build will get treated as a single Unicode code point.

Ideally, the Python programmer should not really need to know all this, and I think we've achieved that up to a certain point (Unicode can be complicated - there's nothing to hide there). However, the C programmer using the Python C API to interface to some other Unicode implementation will need to know these details.
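[Archive note: the build-dependent behaviour described above can be observed from Python itself. A small sketch, written in modern Python syntax where str is always Unicode, rather than the 2.x of this thread:]

```python
import sys

# sys.maxunicode reveals the build width: 0xFFFF on a narrow
# (UCS-2-style) build, 0x10FFFF on a wide (UCS-4) build. Since
# Python 3.3's flexible string representation it is always 0x10FFFF.
assert sys.maxunicode in (0xFFFF, 0x10FFFF)

# The UTF-16 codec is surrogate aware: a non-BMP code point such as
# U+10000 round-trips through the surrogate pair D800 DC00.
encoded = "\U00010000".encode("utf-16-be")
assert encoded == b"\xd8\x00\xdc\x00"
assert encoded.decode("utf-16-be") == "\U00010000"
```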
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 06 2005)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New Py_UNICODE doc
Fredrik Lundh wrote:
> Thomas Heller wrote:
>> AFAIK, you can configure Python to use 16-bit or 32-bit Unicode chars,
>> independent of the size of wchar_t. The HAVE_USABLE_WCHAR_T macro
>> can be used by extension writers to determine if Py_UNICODE is the
>> same as wchar_t.
>
> note that "usable" is more than just "same size"; it also implies that
> widechar predicates (iswalnum etc.) work properly with Unicode
> characters, under all locales.

Only if you intend to use --with-wctypes; a configure option which will go away soon (for exactly the reason you are referring to: the widechar predicates don't work properly under all locales).

-- Marc-Andre Lemburg, eGenix.com
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote:
> On May 4, 2005, at 6:03 PM, Martin v. Löwis wrote:
>>> Nicholas Bastin wrote:
>>>
>>> "This type represents the storage type which is used by Python
>>> internally as the basis for holding Unicode ordinals. Extension module
>>> developers should make no assumptions about the size of this type on
>>> any given platform."
>>
>> But people want to know "Is Python's Unicode 16-bit or 32-bit?"
>> So the documentation should explicitly say "it depends".
>
> The important piece of information is that it is not guaranteed to be a
> particular one of those sizes. Once you can't guarantee the size, no
> one really cares what size it is. The documentation should discourage
> developers from attempting to manipulate Py_UNICODE directly, which,
> other than trivia, is the only reason why someone would care what size
> the internal representation is.

I don't see why you shouldn't use a Py_UNICODE buffer directly. After all, the reason why we have that typedef is to make it possible to program against an abstract type - regardless of its size on the given platform. In that respect it is similar to wchar_t (and all the other *_t typedefs in C).

-- Marc-Andre Lemburg, eGenix.com
Re: [Python-Dev] PEP 340: Breaking out.
On 5/5/05, Steven Bethard <[EMAIL PROTECTED]> wrote:
> On 5/5/05, Paul Moore <[EMAIL PROTECTED]> wrote:
>> And does your proposal allow for "continue EXPR" as supported by PEP
>> 340? I can't see that it could, given that your proposal treats block
>> statements as not being loops.
>
> Read PEP 340 again -- the "continue EXPR" syntax is orthogonal to the
> discussion -- PEP 340 adds it for *all* for loops, so for loops with
> the non-looping block statements would also be able to use it.

I know this. But we're talking here about Nick's new proposal for a non-looping block. All I am saying is that the new proposal needs to include this orthogonal feature. If it's a modification to PEP 340, that will come naturally. If it's a modification to PEP 310, it won't. A new PEP needs to include it.

I am very much against picking bits out of a number of PEPs - that was implicit in my earlier post - sorry, I should have made it explicit. Specifically, PEP 340 should be accepted (possibly with modifications) as a whole, or rejected outright - no "rejected, but can we have continue EXPR in any case, as it's orthogonal" status exists...

>> The looping behaviour is a (fairly nasty) wart, but I'm not sure I
>> would insist on removing it at the cost of damaging other features I
>> like.
>
> I don't think it "damages" any features. Are there features you still
> think the non-looping proposal removes? (I'm not counting orthogonal
> features like "continue EXPR" which could easily be added as an
> entirely separate PEP.)

I *am* specifically referring to these "orthogonal" features. Removal of looping by modification of PEP 340 will do no such "damage", I agree - but removal by accepting an updated PEP 310, or a new PEP, *will* (unless the "entirely separate PEP" you mention is written and accepted along with the non-looping PEP - and I don't think that will happen).

Thanks for making me clarify what I meant. I left a little too much implicit in my previous post.

Paul.
Re: [Python-Dev] PEP 340: Examples as class's.
Ron Adam wrote:
A minor correction to the Block class due to re-editing.
> def __call__(self, *args):
>     self.block(*args)
>     self.__del__()
This should have been.
    def __call__(self, *args):
        try:
            self.block(*args)
        except Exception, self.__err__:
            pass
        self.__del__()
Which catches the error in the overridden "block" method (I need to change
that to say "body"), so it can be re-raised after the "final" method is
run. The "final" method can handle it if it chooses.
Thanks to Jim Jewett for noticing. It should make more sense now. :-)
In example (1.), Lock_It lost a carriage return. It should be.
    class Lock_It(Get_Lock):
        def block(self):
            print "Do stuff while locked"

    Lock_It(mylock())()
And example (3.) should be, although it may not run as is...
## 3. A template for committing or rolling back a database:
    class Transactional(Block):
        def start(self, db):
            self.db = db
            self.cursor = self.db.cursor()
        def final(self):
            if self.__err__:
                self.db.rollback()
                print "db rolled back due to err:", self.__err__
                self.__err__ = None
            else:
                self.db.commit()
        def block(self, batch):
            for statement in batch:
                self.cursor.execute(statement)
    statement_batch = [
        "insert into PEP340 values ('Guido','BDFL')",
        "insert into PEP340 values ('More examples are needed')"]

    db = pgdb.connect(dsn = 'localhost:pythonpeps')
    Transactional(db)(statement_batch)
    disconnect(db)
Another Block class could be used for connecting and disconnecting.
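[Archive note: the catch-then-re-raise shape described above can be fleshed out into a self-contained sketch. Modern syntax is used; the method names start/block/final mirror the quoted code, the error-holding attribute is simplified to a plain `_err`, and the Rollback subclass is a hypothetical illustration, not part of the original posting:]

```python
class Block(object):
    """Run the body, remember any exception, always run final(),
    which may re-raise or swallow the remembered exception."""
    def __init__(self, *args):
        self._err = None
        self.start(*args)

    def start(self, *args):
        pass  # acquire resources; overridden by subclasses

    def block(self, *args):
        pass  # the "body"; overridden by subclasses

    def final(self):
        # default policy: re-raise anything the body raised
        if self._err is not None:
            raise self._err

    def __call__(self, *args):
        try:
            self.block(*args)
        except Exception as err:
            self._err = err
        self.final()


class Rollback(Block):
    """Hypothetical subclass in the spirit of the Transactional example."""
    def start(self, log):
        self.log = log

    def block(self):
        raise ValueError("boom")

    def final(self):
        if self._err is not None:
            self.log.append("rolled back")
            self._err = None  # handled: swallow it
        else:
            self.log.append("committed")


log = []
Rollback(log)()          # the body raises; final() handles it
assert log == ["rolled back"]
```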
Cheers, Ron_Adam
Re: [Python-Dev] PEP 340: Breaking out.
On 5/6/05, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Seems to me it should be up to the block iterator whether
> a break statement gets caught or propagated, since it's
> up to the block iterator whether the construct behaves
> like a loop or not.
>
> This could be achieved by having a separate exception
> for breaks, as originally proposed.
>
> If the iterator propagates the Break exception back out,
> the block statement should break any enclosing loop.
> If the iterator wants to behave like a loop, it can
> catch the Break exception and raise StopIteration
> instead.

Yes, that's exactly what I was trying to say! I don't know if it's achievable in practice, but the fact that it was in the original proposal (something I'd forgotten, if indeed I ever realised) makes it seem more likely to me.

Paul.
Re: [Python-Dev] PEP 340: Breaking out.
Paul Moore wrote:
> On 5/5/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
>> 2. Manual protocol implementations are _significantly_ easier to write
>
> Hmm, I've not tried so I'll have to take your word for this. But I
> don't imagine writing manual implementations much - one of the key
> features I like about Guido's proposal is that generators can be used,
> and the implementation is a clear template, with "yield" acting as a
> "put the block here" marker (yes, I know that's an
> oversimplification!).

If using a generator is easier to code (but I tend to agree with Nick), a new type, a one-shot generator (not really a generator, but some type of continuation), as suggested by Steven Bethard with his "stmt" proposal, could be created:

    def opening(filename, mode="r"):
        f = open(filename, mode)
        try:
            yield break f
        finally:
            f.close()

I prefer Nick's proposal however, since it simplifies non-looping constructs (no generator template, break of parent loop supported), while leaving looping constructs (a minority, IMO) possible using a for, making things even clearer to me (but harder to implement). I'm still not convinced at all that using generators to implement an acquire/release pattern is a good idea...

Regards,
Nicolas
Re: [Python-Dev] PEP 340: Breaking out.
Paul Moore <[EMAIL PROTECTED]> writes:
> On 5/5/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
>> Well, Michael Hudson and Paul Moore are the current authors of PEP 310,
>> so updating it with any of my ideas would be their call.
>
> I'm willing to consider an update - I don't know Michael's view.

I'd slightly prefer PEP 310 to remain a very simple proposal, but don't really have the energy to argue with someone who thinks rewriting it makes more sense than creating a new PEP.

Cheers,
mwh

--
Solaris: Shire horse that dreams of being a race horse, blissfully
unaware that its owners don't quite know whether to put it out to
grass, to stud, or to the knackers yard.
  -- Jim's pedigree of operating systems, asr
Re: [Python-Dev] PEP 340: Non-looping version (aka PEP 310 redux)
On Thursday 05 May 2005 16:03, Nick Coghlan wrote:
> The discussion on the meaning of break when nesting a PEP 340 block
> statement inside a for loop has given me some real reasons to prefer
> PEP 310's single pass semantics for user defined statements

That also solves a problem with resource acquisition block generators that I hadn't been able to articulate until now. What about resources whose lifetimes are more complex than a lexical block, where you can't use a block statement? It seems quite natural for code that wants to manage its own resources to call __enter__ and __exit__ directly. That's not true of the block generator API.

--
Toby Dickenson
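[Archive note: calling the protocol methods directly is exactly what a non-lexical lifetime requires. A sketch using the __enter__/__exit__ names (which PEP 343 later standardized); the Resource class is a hypothetical stand-in:]

```python
class Resource:
    """Hypothetical resource with explicit acquire/release hooks."""
    def __init__(self):
        self.open = False

    def __enter__(self):
        self.open = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self.open = False
        return False  # never swallow exceptions


r = Resource()
r.__enter__()                 # acquire now, in one piece of code...
assert r.open
# ... arbitrary, non-lexical amount of work happens here ...
r.__exit__(None, None, None)  # ...release much later, possibly elsewhere
assert not r.open
```

The same object still works with a lexical block (`with Resource() as r: ...` in later Pythons), which is the symmetry Toby is pointing at: the method-call API subsumes the block-statement API, but not vice versa.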
Re: [Python-Dev] my first post: asking about a "decorator" module
On 5/5/05, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> Yes, there has been quite a bit of interest including several ASPN
> recipes and a wiki:
>
> http://www.python.org/moin/PythonDecoratorLibrary

Thanks, I didn't know about that page. BTW, I notice that all the decorators on that page are improper, in the sense that they change the signature of the function they decorate. So, all those recipes would need some help from my decorator module, to make them proper ;-)

http://www.phyast.pitt.edu/~micheles/python/decorator.zip
Re: [Python-Dev] PEP 340: Breaking out.
Guido van Rossum wrote:
>> Maybe generators are not the way to go, but could be
>> supported natively by providing a __block__ function, very similarly to
>> sequences providing an __iter__ function for for-loops?
>
> Sorry, I have no idea what you are proposing here.

I was suggesting that the feature could be a PEP 310-like object and that a __block__ function (or whatever) of the generator could return such an object. But at this point, Nick's proposal is what I prefer. I find the use of generators very elegant, but I'm still unconvinced it is a good idea to use them to implement an acquire/release pattern. Even if another continuation mechanism were used (like Steven's idea), it would still be a lot of concepts used to implement acquire/release.

Regards,
Nicolas
Re: [Python-Dev] my first post: asking about a "decorator" module
>> Yes, there has been quite a bit of interest including several ASPN
>> recipes and a wiki:
>>
>> http://www.python.org/moin/PythonDecoratorLibrary
>
> Thanks, I didn't know about that page. BTW, I notice that all the
> decorators on that page are improper, in the sense that they change
> the signature of the function they decorate.

Signature-changing and signature-preserving are probably better classifications than proper and improper. Even then, some decorators like atexit() and classmethod() may warrant their own special categories.

Raymond
Re: [Python-Dev] The decorator module
On 5/6/05, Jim Jewett <[EMAIL PROTECTED]> wrote:
> Thank you; this is very good.
>
> I added a link to it from http://www.python.org/moin/PythonDecoratorLibrary;
> please also consider adding a version number and publishing via PyPI.

Yes, this was in my plans. For the moment, however, this is just version 0.1; I want to wait a bit before making an official release.

> Incidentally, would the resulting functions be a bit faster if you
> compiled the lambda instead of repeatedly eval'ing it, or does the eval
> overhead still apply?
>
> -jJ

Honestly, I don't care, since "eval" happens only once, at decoration time. There is no "eval" overhead at calling time, so I do not expect to have problems. I am waiting for volunteers to perform profiling and performance analysis ;)

Michele Simionato
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
[Nick Coghlan]
>>> What does a try statement with neither an except clause nor a finally
>>> clause mean?

[Greg Ewing]
> I guess it would mean the same as
>
>     if 1:
>         ...
>
> Not particularly useful, but maybe it's not worth complexifying
> the grammar just for the sake of disallowing it.
>
> Also, some people might find it useful for indenting a block
> of code for cosmetic reasons, although that could easily
> be seen as an abuse...

I strongly disagree with this. It should be this:

    try_stmt: 'try' ':' suite
              ( (except_clause ':' suite)+
                ['else' ':' suite] ['finally' ':' suite]
              | 'finally' ':' suite )

There is no real complexity in this grammar, it's unambiguous, it's an easy enough job for the code generator, and it catches a certain class of mistakes (like mis-indenting some code).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
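[Archive note: this unified form, where one statement carries except, else, and finally clauses together, was later adopted in Python 2.5 via PEP 341. A small illustration of the clause ordering under that grammar:]

```python
log = []
try:
    log.append("try")
    raise ValueError("boom")
except ValueError:
    log.append("except")
else:
    log.append("else")     # skipped: the try suite raised
finally:
    log.append("finally")  # always runs, after except/else

assert log == ["try", "except", "finally"]
```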
Re: [Python-Dev] The decorator module
[jJ]
>> Incidentally, would the resulting functions be a bit faster if you
>> compiled the lambda instead of repeatedly eval'ing it, or does the eval
>> overhead still apply?

[Michele]
> Honestly, I don't care, since "eval" happens only once, at decoration
> time. There is no "eval" overhead at calling time, so I do not expect
> to have problems. I am waiting for volunteers to perform profiling and
> performance analysis ;)

Watch out. I didn't see the code referred to, but realize that eval is *very* expensive on some other implementations of Python (Jython and IronPython). Eval should only be used if there is actual user-provided input that you don't know yet when your module is compiled; not to get around some limitation in the language (there are usually ways around that, and occasionally we add one, e.g. getattr()).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] The decorator module
On 5/6/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> [Michele]
>> Honestly, I don't care, since "eval" happens only once at decoration
>> time. There is no "eval" overhead at calling time, so I do not expect
>> to have problems. I am waiting for volunteers to perform profiling and
>> performance analysis ;)
>
> Watch out. I didn't see the code referred to, but realize that eval is
> *very* expensive on some other implementations of Python (Jython and
> IronPython). Eval should only be used if there is actual user-provided
> input that you don't know yet when your module is compiled; not to get
> around some limitation in the language (there are usually ways around
> that, and occasionally we add one, e.g. getattr()).

I actually posted the code on c.l.p. one month ago asking if there was a way to avoid "eval", but I had no answer. So, let me repost the code here and see if somebody comes up with a good solution. It is only ~30 lines long (+ ~30 of comments & docstrings).

## I suggest you uncomment the 'print lambda_src' statement in _decorate
## to understand what is going on.
import inspect

def _signature_gen(func, rm_defaults=False):
    argnames, varargs, varkwargs, defaults = inspect.getargspec(func)
    argdefs = defaults or ()
    n_args = func.func_code.co_argcount
    n_default_args = len(argdefs)
    n_non_default_args = n_args - n_default_args
    non_default_names = argnames[:n_non_default_args]
    default_names = argnames[n_non_default_args:]
    for name in non_default_names:
        yield "%s" % name
    for i, name in enumerate(default_names):
        if rm_defaults:
            yield name
        else:
            yield "%s = arg[%s]" % (name, i)
    if varargs:
        yield "*%s" % varargs
    if varkwargs:
        yield "**%s" % varkwargs

def _decorate(func, caller):
    signature = ", ".join(_signature_gen(func))
    variables = ", ".join(_signature_gen(func, rm_defaults=True))
    lambda_src = "lambda %s: call(func, %s)" % (signature, variables)
    # print lambda_src # for debugging
    evaldict = dict(func=func, call=caller, arg=func.func_defaults or ())
    dec_func = eval(lambda_src, evaldict)
    dec_func.__name__ = func.__name__
    dec_func.__doc__ = func.__doc__
    dec_func.__dict__ = func.__dict__ # copy if you want to avoid sharing
    return dec_func

class decorator(object):
    """General purpose decorator factory: takes a caller function as
    input and returns a decorator. A caller function is any function
    like this:

    def caller(func, *args, **kw):
        # do something
        return func(*args, **kw)

    Here is an example of usage:

    >>> @decorator
    ... def chatty(f, *args, **kw):
    ...     print "Calling %r" % f.__name__
    ...     return f(*args, **kw)

    >>> @chatty
    ... def f(): pass

    >>> f()
    Calling 'f'
    """
    def __init__(self, caller):
        self.caller = caller
    def __call__(self, func):
        return _decorate(func, self.caller)

Michele Simionato
Re: [Python-Dev] PEP 340 - Remaining issues - keyword
[Greg Ewing]
> How about 'do'?
>
>     do opening(filename) as f:
>         ...
>
>     do locking(obj):
>         ...
>
>     do carefully():  # :-)
>         ...

I've been thinking of that too. It's short, and in a nostalgic way conveys that it's a loop, without making it too obvious. (Those too young to get that should Google for do-loop. :-) I wonder how many folks call their action methods do() though.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] PEP 340 - Remaining issues - keyword
[Guido]
> ...
> I wonder how many folks call their action methods do() though.

A little Google(tm)-ing suggests it's not all that common, although it would break Zope on NetBSD:

    http://www.zope.org/Members/tino/ZopeNetBSD

I can live with that.
Re: [Python-Dev] PEP 340 -- Clayton's keyword?
[Greg Ewing]
> How about user-defined keywords?
>
> Suppose you could write
>
>     statement opening
>
>     def opening(path, mode):
>         f = open(path, mode)
>         try:
>             yield
>         finally:
>             close(f)
>
> which would then allow
>
>     opening "myfile", "w" as f:
>         do_something_with(f)
[etc.]

This one is easy to reject outright:

- I have no idea how that would be implemented, especially since you propose allowing the newly minted keyword to be used as the target of an import. I'm sure it can be done, but it would be a major departure from the current parser/lexer separation and would undoubtedly be an extra headache for Jython and IronPython, which use standard components for their parsing.

- It doesn't seem to buy you much -- just dropping two parentheses.

- I don't see how it would handle the case where the block controller is a method call or something else beyond a simple identifier.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
[Guido van Rossum]
> [Nick Coghlan]
>>> What does a try statement with neither an except clause nor a
>>> finally clause mean?
> [Greg Ewing]
>> I guess it would mean the same as
>>
>>     if 1:
>>         ...
>
> I strongly disagree with this. [...]

Allow me a quick comment on this issue.

It happens once in a while that I want to comment out the except clauses of a try statement, when I want the traceback of the inner raising, for debugging purposes. Syntax forces me to also comment the `try:' line, and indent out the lines following the `try:' line. And of course, the converse operation once debugging is done. This is slightly heavy.

At a few places, Python is helpful for such editorial things, for example, allowing a spurious trailing comma at the end of lists, dicts, tuples. `pass' is also useful as a place holder for commented code. At least, the new proposed syntax would allow for some

    finally:
        pass

addendum when commenting out except clauses, simplifying the editing job for the `try:' line and those following.

P.S. - Another detail, while on this subject. In the first message I've read on this topic, the original poster wrote something like:

    f = None
    try:
        f = action1(...)
        ...
    finally:
        if f is not None:
            action2(f)

The proposed syntax did not repeat this little part about "None", quoted above, so suggesting an over-good feeling about syntax efficiency. While nice, the syntax still does not solve this detail, which occurs frequently in my experience. Oh, I do not have solutions to offer, but it might be worth a thought from the mighty thinkers of this list :-)

--
François Pinard   http://pinard.progiciels-bpi.ca
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
[François Pinard]
> It happens once in a while that I want to comment out the except clauses
> of a try statement, when I want the traceback of the inner raising, for
> debugging purposes. Syntax forces me to also comment the `try:' line,
> and indent out the lines following the `try:' line. And of course, the
> converse operation once debugging is done. This is slightly heavy.

I tend to address this by substituting a different exception. I don't see the use case as common enough to want to allow dangling try-suites.

> P.S. - Another detail, while on this subject. On the first message I've
> read on this topic, the original poster wrote something like:
>
>     f = None
>     try:
>         f = action1(...)
>         ...
>     finally:
>         if f is not None:
>             action2(f)
>
> The proposed syntax did not repeat this little part about "None", quoted
> above, so suggesting an over-good feeling about syntax efficiency.
> While nice, the syntax still does not solve this detail, which occurs
> frequently in my experience. Oh, I do not have solutions to offer, but
> it might be worth a thought from the mighty thinkers of this list :-)

I don't understand your issue here. What is the problem with that code? Perhaps it ought to be rewritten as

    f = action1()
    try:
        ...
    finally:
        action2(f)

I can't see how this would ever do something different from your version.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
[Guido van Rossum]
> [François Pinard]
>> It happens once in a while that I want to comment out the except
>> clauses of a try statement, when I want the traceback of the inner
>> raising, for debugging purposes. Syntax forces me to also comment
>> the `try:' line, and indent out the lines following the `try:' line.
>> And of course, the converse operation once debugging is done. This
>> is slightly heavy.
>
> I tend to address this by substituting a different exception. I don't
> see the use case common enough to want to allow dangling try-suites.

Quite agreed. I just wanted to tell there was a need.

>> P.S. - Another detail, while on this subject. On the first message
>> I've read on this topic, the original poster wrote something like:
>>
>>     f = None
>>     try:
>>         f = action1(...)
>>         ...
>>     finally:
>>         if f is not None:
>>             action2(f)
>>
>> The proposed syntax did not repeat this little part about "None",
>> quoted above, so suggesting an over-good feeling about syntax
>> efficiency. While nice, the syntax still does not solve this detail,
>> which occurs frequently in my experience. Oh, I do not have solutions
>> to offer, but it might be worth a thought from the mighty thinkers of
>> this list :-)
>
> I don't understand your issue here. What is the problem with that
> code? Perhaps it ought to be rewritten as
>
>     f = action1()
>     try:
>         ...
>     finally:
>         action2(f)
>
> I can't see how this would ever do something different than your version.

Oh, the problem is that if `action1()' raises an exception (and this is why it has to be within the `try', not before), `f' will not receive a value, and so may not be initialised in all cases. The (frequent) stunt is a guard so this never becomes a problem.

--
François Pinard   http://pinard.progiciels-bpi.ca
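[Archive note: a small demonstration of the failure mode the guard protects against. The action1/action2/Resource names are hypothetical stand-ins for the code quoted in the thread:]

```python
class Resource(object):
    def __init__(self):
        self.released = False

    def release(self):
        self.released = True


def action1(fail):
    """Acquisition step that may itself raise."""
    if fail:
        raise IOError("acquisition failed")
    return Resource()


def run(fail):
    f = None                     # the guard: f is bound even if action1 raises
    try:
        try:
            f = action1(fail)
        finally:
            if f is not None:    # without the guard this would be a NameError
                f.release()      # when action1 raised, masking the real error
    except IOError:
        return "failed cleanly"
    return "released" if f.released else "leaked"


assert run(True) == "failed cleanly"
assert run(False) == "released"
```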
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
Guido van Rossum wrote:
> [François Pinard]
>> It happens once in a while that I want to comment out the except clauses
>> of a try statement, when I want the traceback of the inner raising, for
>> debugging purposes. Syntax forces me to also comment the `try:' line,
>> and indent out the lines following the `try:' line. And of course, the
>> converse operation once debugging is done. This is slightly heavy.
>
> I tend to address this by substituting a different exception. I don't
> see the use case common enough to want to allow dangling try-suites.

Easy enough: adding "raise" at the top of the except clause also solves the problem.

>> P.S. - Another detail, while on this subject. On the first message
>> I've read on this topic, the original poster wrote something like:
>>
>>     f = None
>>     try:
>>         f = action1(...)
>>         ...
>>     finally:
>>         if f is not None:
>>             action2(f)
>
> I don't understand your issue here. What is the problem with that
> code? Perhaps it ought to be rewritten as
>
>     f = action1()
>     try:
>         ...
>     finally:
>         action2(f)
>
> I can't see how this would ever do something different than your version.

Well, in the original the call to action1 was wrapped in an additional try-except block:

    f = None
    try:
        try:
            f = action1()
        except:
            print "error"
    finally:
        if f is not None:
            action2(f)

Reinhold

--
Mail address is perfectly valid!
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
François Pinard wrote:
> It happens once in a while that I want to comment out the except clauses
> of a try statement, when I want the traceback of the inner raising, for
> debugging purposes.  Syntax forces me to also comment the `try:' line,
> and indent out the lines following the `try:' line.  And of course, the
> converse operation once debugging is done.  This is slightly heavy.

the standard pydiom for this is to change

    try:
        blabla
    except IOError:
        blabla

to

    try:
        blabla
    except "debug": # IOError:
        blabla

(to save typing, you can use an empty string or even put quotes around
the exception name, but that may make it harder to spot the change)
Re: [Python-Dev] The decorator module
At 07:55 AM 5/6/2005 -0700, Guido van Rossum wrote:
>[jJ]
> > > Incidentally, would the resulting functions be a bit faster if you compiled
> > > the lambda instead of repeatedly eval'ing it, or does the eval overhead
> > > still apply?
>
>[Michele]
> > Honestly, I don't care, since "eval" happens only once at decoration time.
> > There is no "eval" overhead at calling time, so I do not expect to have
> > problems.  I am waiting for volunteers to perform profiling and
> > performance analysis ;)
>
>Watch out. I didn't see the code referred to, but realize that eval is
>*very* expensive on some other implementations of Python (Jython and
>IronPython).  Eval should only be used if there is actual user-provided
>input that you don't know yet when your module is compiled; not to get
>around some limitation in the language (there are usually ways around
>that, and occasionally we add one, e.g. getattr()).

In this case, the informally-discussed proposal is to add a mutable
__signature__ to functions, and have it be used by inspect.getargspec(),
so that decorators can copy __signature__ from the decoratee to the
decorated function.
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
François Pinard wrote:

> > >     f = None
> > >     try:
> > >         f = action1(...)
> > >         ...
> > >     finally:
> > >         if f is not None:
> > >             action2(f)

> >     f = action1()
> >     try:
> >         ...
> >     finally:
> >         action2(f)
> >
> > I can't see how this would ever do something different than your version.

> Oh, the problem is that if `action1()' raises an exception (and this is
> why it has to be within the `try', not before), `f' will not receive
> a value, and so, may not be initialised in all cases.  The (frequent)
> stunt is a guard so this never becomes a problem.

in Guido's solution, the "finally" clause won't be called at all if
action1 raises an exception.
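[The point above is easy to check with a runnable sketch.  action1/action2 are stand-ins for the acquisition and cleanup in the thread, not any real API; with the call hoisted out of the `try`, a failure in action1() skips the cleanup entirely, exactly as the guard pattern arranges too:]

```python
calls = []

def action1(fail):
    # stand-in for the resource acquisition under discussion
    if fail:
        raise RuntimeError("acquisition failed")
    calls.append("open")
    return "handle"

def action2(f):
    # stand-in for the cleanup
    calls.append("close")

def guarded(fail):
    # the guard pattern: acquisition inside the try, cleanup only if it ran
    f = None
    try:
        f = action1(fail)
    finally:
        if f is not None:
            action2(f)

def hoisted(fail):
    # Guido's rewrite: acquisition before the try; if it raises,
    # the finally clause is never entered at all
    f = action1(fail)
    try:
        pass  # "..." body
    finally:
        action2(f)

for func in (guarded, hoisted):
    calls.clear()
    try:
        func(fail=True)
    except RuntimeError:
        pass
    assert "close" not in calls  # neither version cleans up a failed open
```

[Both behave identically on success as well, which is why the guard mostly buys defensiveness, not different semantics.]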
Re: [Python-Dev] PEP 340 - For loop cleanup, and feature separation
At 01:58 PM 5/6/2005 +1000, Delaney, Timothy C (Timothy) wrote: >Personally, I'm of the opinion that we should make a significant break >(no pun intended ;) and have for-loops attempt to ensure that iterators >are exhausted. This is simply not backward compatible with existing, perfectly valid and sensible code. Therefore, this can't happen till Py3K. The only way I could see to allow this is if: 1. Calling __iter__ on the target of the for loop returns the same object 2. The for loop owns the only reference to that iterator. However, #2 is problematic for non-CPython implementations, and in any case the whole thing seems terribly fragile. So how about this: calling __exit__(StopIteration) on a generator that doesn't have any active blocks could simply *not* exhaust the iterator. This would ensure that any iterator whose purpose is just iteration (i.e. all generators written to date) still behave in a resumable fashion. Ugh. It's still fragile, though, as adding a block to an iterator will then make it behave differently. It seems likely to provoke subtle errors, arguing again in favor of a complete separation between iteration and block protocols. ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
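[The backward-compatibility problem is concrete: plenty of working code breaks out of a for-loop and then keeps using the same iterator, relying on the loop *not* exhausting it:]

```python
it = iter(range(5))
for x in it:
    if x == 2:
        break  # stop early; the iterator must survive the loop

# today this picks up right where the break left off
rest = list(it)
assert rest == [3, 4]
```

[Any change that made the for-loop drain `it` on exit would silently turn `rest` into an empty list.]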
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
[me] > > I can't see how this would ever do something different than your version. [Reinhold] > Well, in the original the call to action1 was wrapped in an additional > try-except > block. Ah. Francois was misquoting it. -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
[Fredrik]
> the standard pydiom for this is to change
>
>     try:
>         blabla
>     except IOError:
>         blabla
>
> to
>
>     try:
>         blabla
>     except "debug": # IOError:
>         blabla
>
> (to save typing, you can use an empty string or even
> put quotes around the exception name, but that may
> make it harder to spot the change)

Yeah, but that will stop working in Python 3.0.

I like the solution that puts a bare "raise" at the top of the except
clause.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
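[The bare-raise idiom keeps the handler in place but re-raises with the original traceback intact; deleting one line restores normal handling.  A sketch — risky() and the commented-out handler are placeholders, and the outer except stands in for the interpreter's traceback printer:]

```python
def risky():
    raise IOError("disk on fire")

try:
    try:
        risky()
    except IOError:
        raise  # temporary, for debugging: propagate the original traceback
        # handle_io_error()   # normal handling, skipped while debugging
except IOError as exc:
    # in real debugging this would be the uncaught-exception traceback
    assert "disk on fire" in str(exc)
```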
[Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
On 5/6/05, Paul Moore <[EMAIL PROTECTED]> wrote:
> > I don't think it "damages" any features.  Are there features you still
> > think the non-looping proposal removes?  (I'm not counting orthogonal
> > features like "continue EXPR" which could easily be added as an
> > entirely separate PEP.)
>
> I *am* specifically referring to these "orthogonal" features.  Removal
> of looping by modification of PEP 340 will do no such "damage", I
> agree - but removal by accepting an updated PEP 310, or a new PEP,
> *will* (unless the "entirely separate PEP" you mention is written and
> accepted along with the non-looping PEP - and I don't think that will
> happen).

So, just to make sure, if we had another PEP that contained from PEP 340[1]:

* Specification: the __next__() Method
* Specification: the next() Built-in Function
* Specification: a Change to the 'for' Loop
* Specification: the Extended 'continue' Statement
* the yield-expression part of Specification: Generator Exit Handling

would that cover all the pieces you're concerned about?

I'd be willing to break these off into a separate PEP if people think
it's a good idea.  I've seen very few complaints about any of these
pieces of the proposal.  If possible, I'd like to see these things
approved now, so that the discussion could focus more directly on the
block-statement issues.

STeVe

[1] http://www.python.org/peps/pep-0340.html

--
You can wordify anything if you just verb it. --- Bucky Katt, Get Fuzzy
Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
[Steven Bethard] > So, just to make sure, if we had another PEP that contained from PEP 340[1]: > * Specification: the __next__() Method > * Specification: the next() Built-in Function > * Specification: a Change to the 'for' Loop > * Specification: the Extended 'continue' Statement > * the yield-expression part of Specification: Generator Exit Handling > would that cover all the pieces you're concerned about? > > I'd be willing to break these off into a separate PEP if people think > it's a good idea. I've seen very few complaints about any of these > pieces of the proposal. If possible, I'd like to see these things > approved now, so that the discussion could focus more directly on the > block-statement issues. I don't think it's necessary to separate this out into a separate PEP; that just seems busy-work. I agree these parts are orthogonal and uncontroversial; a counter-PEP can suffice by stating that it's not countering those items nor repeating them. -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
> > I'd be willing to break these off into a separate PEP if people think > > it's a good idea. I've seen very few complaints about any of these > > pieces of the proposal. If possible, I'd like to see these things > > approved now, so that the discussion could focus more directly on the > > block-statement issues. > > I don't think it's necessary to separate this out into a separate PEP; > that just seems busy-work. I agree these parts are orthogonal and > uncontroversial; a counter-PEP can suffice by stating that it's not > countering those items nor repeating them. If someone volunteers to split it out for you, I think it would be worthwhile. Right now, the PEP is hard to swallow in one bite. Improving its digestibility would be a big help when the PEP is offered up to the tender mercies to comp.lang.python. Raymond ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 340 - For loop cleanup, and feature separation
Phillip J. Eby wrote:
> At 01:58 PM 5/6/2005 +1000, Delaney, Timothy C (Timothy) wrote:
>
>> Personally, I'm of the opinion that we should make a significant break
>> (no pun intended ;) and have for-loops attempt to ensure that iterators
>> are exhausted.
>
> This is simply not backward compatible with existing, perfectly valid and
> sensible code.  Therefore, this can't happen till Py3K.
>
> The only way I could see to allow this is if:
>
> 1. Calling __iter__ on the target of the for loop returns the same object
> 2. The for loop owns the only reference to that iterator.
>
> However, #2 is problematic for non-CPython implementations, and in any case
> the whole thing seems terribly fragile.

Is it better to have:

1. A single looping construct that does everything, or
2. several more specialized loops that are distinct?

I think the second may be better for performance reasons.  So it may be
better to just add a third loop construct, just for iterators:

a. for-loop   --> iterable sequences and lists only
b. while-loop --> bool evaluations only
c. do-loop    --> iterators only

Choice c. could mimic a. and b. with an iterator when the situation
requires a for-loop or while-loop with special handling.

Ron_Adam
Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
[me] > > I don't think it's necessary to separate this out into a separate PEP; > > that just seems busy-work. I agree these parts are orthogonal and > > uncontroversial; a counter-PEP can suffice by stating that it's not > > countering those items nor repeating them. [Raymond] > If someone volunteers to split it out for you, I think it would be > worthwhile. Right now, the PEP is hard to swallow in one bite. > Improving its digestibility would be a big help when the PEP is offered > up to the tender mercies to comp.lang.python. Well, I don't care so much about their tender mercies right now. I'm not even sure that if we reach agreement on python-dev there's any point in repeating the agony on c.l.py. -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
[Guido]
> I don't think it's necessary to separate this out into a separate PEP;
> that just seems busy-work. I agree these parts are orthogonal and
> uncontroversial; a counter-PEP can suffice by stating that it's not
> countering those items nor repeating them.
[Raymond]
> If someone volunteers to split it out for you, I think it would be
> worthwhile. Right now, the PEP is hard to swallow in one bite.
> Improving its digestibility would be a big help when the PEP is offered
> up to the tender mercies to comp.lang.python.
Well, busy-work or not, I took the 20 minutes to split them up, so I
figured I might as well make them available. It was actually really
easy to split them apart, and I think they both read better this way,
but I'm not sure my opinion counts for much here anyway. ;-) (The
Enhanced Iterators PEP is first, the remainder of PEP 340 follows it.)
--
PEP: XXX
Title: Enhanced Iterators
Version:
Last-Modified:
Author: Guido van Rossum
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 6-May-2005
Post-History:
Introduction
This PEP proposes a new iterator API that allows values to be
passed into an iterator using "continue EXPR". These values are
received in the iterator as an argument to the new __next__
method, and can be accessed in a generator with a
yield-expression.
The content of this PEP is derived from the original content of
PEP 340, broken off into its own PEP as the new iterator API is
pretty much orthogonal from the anonymous block statement
discussion.
Motivation and Summary
...
Use Cases
See the Examples section near the end.
Specification: the __next__() Method
A new method for iterators is proposed, called __next__(). It
takes one optional argument, which defaults to None. Calling the
__next__() method without argument or with None is equivalent to
using the old iterator API, next(). For backwards compatibility,
it is recommended that iterators also implement a next() method as
an alias for calling the __next__() method without an argument.
The argument to the __next__() method may be used by the iterator
as a hint on what to do next.
Specification: the next() Built-in Function
This is a built-in function defined as follows:
    def next(itr, arg=None):
        nxt = getattr(itr, "__next__", None)
        if nxt is not None:
            return nxt(arg)
        if arg is None:
            return itr.next()
        raise TypeError("next() with arg for old-style iterator")
This function is proposed because there is often a need to call
the next() method outside a for-loop; the new API, and the
backwards compatibility code, is too ugly to have to repeat in
user code.
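[How the proposed built-in hides that compatibility dance can be shown with toy iterators.  The name pep_next and the two classes below are illustrative only, not part of the PEP; the function body is the PEP's own definition transcribed:]

```python
def pep_next(itr, arg=None):
    # the PEP's proposed next() built-in, transcribed
    nxt = getattr(itr, "__next__", None)
    if nxt is not None:
        return nxt(arg)
    if arg is None:
        return itr.next()
    raise TypeError("next() with arg for old-style iterator")

class OldStyle:
    # pre-PEP iterator: argument-less next() only
    def __init__(self):
        self.i = 0
    def next(self):
        self.i += 1
        return self.i

class NewStyle:
    # post-PEP iterator: __next__() accepts an optional hint
    def __next__(self, arg=None):
        return ("got", arg)

assert pep_next(OldStyle()) == 1                      # falls back to next()
assert pep_next(NewStyle(), "hint") == ("got", "hint")  # hint passed through
try:
    pep_next(OldStyle(), "hint")
except TypeError:
    pass
else:
    raise AssertionError("old-style iterator must reject an argument")
```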
Specification: a Change to the 'for' Loop
A small change in the translation of the for-loop is proposed.
The statement
    for VAR1 in EXPR1:
        BLOCK1
    else:
        BLOCK2
will be translated as follows:
    itr = iter(EXPR1)
    arg = None      # Set by "continue EXPR2", see below
    brk = False
    while True:
        try:
            VAR1 = next(itr, arg)
        except StopIteration:
            brk = True
            break
        arg = None
        BLOCK1
    if brk:
        BLOCK2
(However, the variables 'itr' etc. are not user-visible and the
built-in names used cannot be overridden by the user.)
Specification: the Extended 'continue' Statement
In the translation of the for-loop, inside BLOCK1, the new syntax
continue EXPR2
is legal and is translated into
arg = EXPR2
continue
(Where 'arg' references the corresponding hidden variable from the
previous section.)
This is also the case in the body of the block-statement proposed
below.
EXPR2 may contain commas; "continue 1, 2, 3" is equivalent to
"continue (1, 2, 3)".
Specification: Generators and Yield-Expressions
Generators will implement the new __next__() method API, as well
as the old argument-less next() method which becomes an alias for
calling __next__() without an argument.
The yield-statement will be allowed to be used on the right-hand
side of an assignment; in that case it is referred to as
yield-expression. The value of this yield-expression is None
unless __next__() was called with an argument; see below.
A yield-expression must always be parenthesized except when it
occurs at the top-level expression on the right-hand side of an
assignment. So
x = yield 42
x = yield
x = 12 + (yield 42)
x = 12 + (yield)
foo(yield 42)
foo(yield)
are all legal, but
x = 12 + yield 42
x = 12 + yield
foo(yield 42, 12)
foo(yield, 12)
are all illegal.
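[This half of the proposal did eventually land, via PEP 342, in Python 2.5: yield became an expression, though values are passed in with a send() method rather than an argument to next().  A sketch in that eventual form:]

```python
def accumulator():
    total = 0
    while True:
        value = (yield total)  # yield-expression: receives the sent value
        if value is not None:
            total += value

acc = accumulator()
assert next(acc) == 0   # first next() advances to the yield ("priming")
assert acc.send(5) == 5
assert acc.send(3) == 8
```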
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 3:17 AM, M.-A. Lemburg wrote: > You've got that wrong: Python let's you choose UCS-4 - > UCS-2 is the default. > > Note that Python's Unicode codecs UTF-8 and UTF-16 > are surrogate aware and thus support non-BMP code points > regardless of the build type: A UCS2-build of Python will > store a non-BMP code point as UTF-16 surrogate pair in the > Py_UNICODE buffer while a UCS4 build will store it as a > single value. Decoding is surrogate aware too, so a UTF-16 > surrogate pair in a UCS2 build will get treated as single > Unicode code point. If this is the case, then we're clearly misleading users. If the configure script says UCS-2, then as a user I would assume that surrogate pairs would *not* be encoded, because I chose UCS-2, and it doesn't support that. I would assume that any UTF-16 string I would read would be transcoded into the internal type (UCS-2), and information would be lost. If this is not the case, then what does the configure option mean? -- Nick ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 3:25 AM, M.-A. Lemburg wrote: > I don't see why you shouldn't use Py_UNICODE buffer directly. > After all, the reason why we have that typedef is to make it > possible to program against an abstract type - regardless of > its size on the given platform. Because the encoding of that buffer appears to be different depending on the configure options. If that isn't true, then someone needs to change the doc, and the configure options. Right now, it seems *very* clear that Py_UNICODE may either be UCS-2 or UCS-4 encoded if you read the configure help, and you can't use the buffer directly if the encoding is variable. However, you seem to be saying that this isn't true. -- Nick ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 3:17 AM, M.-A. Lemburg wrote: > You've got that wrong: Python let's you choose UCS-4 - > UCS-2 is the default. No, that's not true. Python lets you choose UCS-4 or UCS-2. What the default is depends on your platform. If you run raw configure, some systems will choose UCS-4, and some will choose UCS-2. This is how the conversation came about in the first place - running ./configure on RHL9 gives you UCS-4. -- Nick ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
[Guido van Rossum] > I like the solution that puts a bare "raise" at the top of the except > clause. Yes. Clean and simple enough. Thanks all! :-) -- François Pinard http://pinard.progiciels-bpi.ca ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 2:49 PM, Nicholas Bastin wrote: > If this is the case, then we're clearly misleading users. If the > configure script says UCS-2, then as a user I would assume that > surrogate pairs would *not* be encoded, because I chose UCS-2, and it > doesn't support that. I would assume that any UTF-16 string I would > read would be transcoded into the internal type (UCS-2), and > information would be lost. If this is not the case, then what does the > configure option mean? It means all the string operations treat strings as if they were UCS-2, but that in actuality, they are UTF-16. Same as the case in the windows APIs and Java. That is, all string operations are essentially broken, because they're operating on encoded bytes, not characters, but claim to be operating on characters. James ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
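[The code-unit versus code-point distinction described above is visible in the codec itself.  Narrow builds are long gone (PEP 393 made Python strings flexibly sized in 3.3), so this only illustrates the encoding, but UTF-16 still spends two 16-bit units — a surrogate pair — on anything outside the BMP:]

```python
ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, a non-BMP code point
data = ch.encode("utf-16-be")
assert len(ch) == 1    # one code point...
assert len(data) == 4  # ...but two 16-bit code units in UTF-16

hi, lo = (int.from_bytes(data[i:i + 2], "big") for i in (0, 2))
assert 0xD800 <= hi <= 0xDBFF  # high surrogate
assert 0xDC00 <= lo <= 0xDFFF  # low surrogate
assert data.decode("utf-16-be") == ch  # decoding rejoins the pair
```

[A string type that indexed those units directly would report a length of 2 and expose the surrogates to slicing — which is exactly the "operating on encoded bytes" complaint.]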
Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
On 5/6/05, Steven Bethard <[EMAIL PROTECTED]> wrote: > Well, busy-work or not, I took the 20 minutes to split them up, so I > figured I might as well make them available. It was actually really > easy to split them apart, and I think they both read better this way, > but I'm not sure my opinion counts for much here anyway. ;-) (The > Enhanced Iterators PEP is first, the remainder of PEP 340 follows it.) Thanks for doing this. I think you may well be right - the two pieces feel more orthogonal like this (I haven't checked for dependencies, I'm trusting your editing and Guido's original assertion that the parts are independent). > -- > PEP: XXX > Title: Enhanced Iterators Strawman question - as this is the "uncontroversial" bit, can this part be accepted as it stands? :-) Paul. ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote:
> On May 6, 2005, at 3:17 AM, M.-A. Lemburg wrote:
>
>> You've got that wrong: Python let's you choose UCS-4 -
>> UCS-2 is the default.
>
> No, that's not true.  Python lets you choose UCS-4 or UCS-2.  What the
> default is depends on your platform.  If you run raw configure, some
> systems will choose UCS-4, and some will choose UCS-2.  This is how the
> conversation came about in the first place - running ./configure on
> RHL9 gives you UCS-4.

Hmm, looking at the configure.in script, it seems you're right.  I wonder
why this weird dependency on TCL was added.  This was certainly not
intended (see the comment):

    if test $enable_unicode = yes
    then
        # Without any arguments, Py_UNICODE defaults to two-byte mode
        case "$have_ucs4_tcl" in
        yes) enable_unicode="ucs4"
             ;;
        *)   enable_unicode="ucs2"
             ;;
        esac
    fi

The annotation suggests that Martin added this.

Martin, could you please explain why the whole *Python system* should
depend on what Unicode type some installed *TCL system* is using?
I fail to see the connection.

Thanks,

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 06 2005) >>> Python/Zope Consulting and Support ...http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
On 5/6/05, Paul Moore <[EMAIL PROTECTED]> wrote:
> On 5/6/05, Steven Bethard <[EMAIL PROTECTED]> wrote:
> > PEP: XXX
> > Title: Enhanced Iterators
>
> Strawman question - as this is the "uncontroversial" bit, can this
> part be accepted as it stands? :-)

FWIW, I'm +1 on this.  Enhanced Iterators

* updates the iterator protocol to use .__next__() instead of .next()
* introduces a new builtin next()
* allows continue-statements to pass values to iterators
* allows generators to receive values with a yield-expression

The first two are, I believe, how the iterator protocol probably
should have been in the first place.  The second two provide a simple
way of passing values to generators, something I got the impression
that the co-routiney people would like a lot.

STeVe

--
You can wordify anything if you just verb it. --- Bucky Katt, Get Fuzzy
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
Guido van Rossum wrote: > try_stmt: 'try' ':' suite > ( > except_clause ':' suite)+ > ['else' ':' suite] ['finally' ':' suite] > | > 'finally' ':' suite > ) > > There is no real complexity in this grammar, it's unambiguous, it's an > easy enough job for the code generator, and it catches a certain class > of mistakes (like mis-indenting some code). Fair enough. Always nice to have some assistence from the system. --eric ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
Enhanced Iterators: ... > When the *initial* call to __next__() receives an argument > that is not None, TypeError is raised; this is likely caused > by some logic error. This made sense when the (Block) Iterators were Resources, and the first __next__() was just to trigger the setup. It makes less sense for general iterators. It is true that the first call in a generic for-loop couldn't pass a value (as it isn't continued), but I don't see anything wrong with explicit calls to __next__. Example: An agent which responds to the environment; the agent can execute multi-stage plans, or change its mind part way through. action = scheduler.__next__(current_sensory_input) -jJ ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
On 5/6/05, Steven Bethard <[EMAIL PROTECTED]> wrote: > On 5/6/05, Paul Moore <[EMAIL PROTECTED]> wrote: > > On 5/6/05, Steven Bethard <[EMAIL PROTECTED]> wrote: > > > PEP: XXX > > > Title: Enhanced Iterators > > > > Strawman question - as this is the "uncontroversial" bit, can this > > part be accepted as it stands? :-) > > FWIW, I'm +1 on this. Enhanced Iterators > * updates the iterator protocol to use .__next__() instead of .next() > * introduces a new builtin next() > * allows continue-statements to pass values to iterators > * allows generators to receive values with a yield-expression > The first two are, I believe, how the iterator protocol probably > should have been in the first place. The second two provide a simple > way of passing values to generators, something I got the impression > that the co-routiney people would like a lot. At the same time it pretty much affects *only* the co-routiney people, so there's no hurry. I'd be happy with PEP 340 without all this too. I think one reason it ended up in that PEP is that an earlier version of the PEP called __next__() with an exception argument instead of having a separate__exit__() API. There's one alternative possible (still orthogonal to PEP 340): instead of __next__(), we could add an optional argument to the next() method, and forget about the next() built-in. This is more compatible (if less future-proof). Old iterators would raise an exception when their next() is called with an argument, and this would be a reasonable way to find out that you're using "continue EXPR" with an iterator that doesn't support it. (The C level API would be a bit hairier but it can all be done in a compatible way.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New Py_UNICODE doc (Another Attempt)
After reading through the code and the comments in this thread, I propose
the following in the documentation as the definition of Py_UNICODE:

"This type represents the storage type which is used by Python internally
as the basis for holding Unicode ordinals.  Extension module developers
should make no assumptions about the size or native encoding of this type
on any given platform."

The main point here is that extension developers can not safely slam
Py_UNICODE (which it appeared was true when the documentation stated that
it was always 16-bits).  I don't propose that we put this information in
the doc, but the possible internal representations are:

* 2-byte wchar_t or unsigned short, encoded as UTF-16
* 4-byte wchar_t, encoded as UTF-32 (UCS-4)

If you do not explicitly set the configure option, you cannot guarantee
which you will get.  Python also does not normalize the byte order of
unicode strings passed into it from C (via PyUnicode_EncodeUTF16, for
example), so it is possible to have UTF-16LE and UTF-16BE strings in the
system at the same time, which is a bit confusing.  This may or may not
be worth a mention in the doc (or a patch).

-- Nick
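[The byte-order point can be illustrated with the current codecs: the two fixed-endian UTF-16 variants produce different bytes for the same string, and the endian-unspecified "utf-16" codec disambiguates by prepending a byte-order mark.  This shows present-day codec behavior, not a claim about the 2005 C API:]

```python
s = "abc"
le = s.encode("utf-16-le")
be = s.encode("utf-16-be")
assert le != be                      # same text, opposite byte orders
assert le == b"a\x00b\x00c\x00"
assert be == b"\x00a\x00b\x00c"

# the plain "utf-16" codec marks its byte order with a BOM instead
with_bom = s.encode("utf-16")
assert with_bom[:2] in (b"\xff\xfe", b"\xfe\xff")
```

[Mixing LE and BE buffers without tracking which is which is exactly the confusion Nick describes.]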
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 3:42 PM, James Y Knight wrote: > On May 6, 2005, at 2:49 PM, Nicholas Bastin wrote: >> If this is the case, then we're clearly misleading users. If the >> configure script says UCS-2, then as a user I would assume that >> surrogate pairs would *not* be encoded, because I chose UCS-2, and it >> doesn't support that. I would assume that any UTF-16 string I would >> read would be transcoded into the internal type (UCS-2), and >> information would be lost. If this is not the case, then what does >> the >> configure option mean? > > It means all the string operations treat strings as if they were > UCS-2, but that in actuality, they are UTF-16. Same as the case in the > windows APIs and Java. That is, all string operations are essentially > broken, because they're operating on encoded bytes, not characters, > but claim to be operating on characters. Well, this is a completely separate issue/problem. The internal representation is UTF-16, and should be stated as such. If the built-in methods actually don't work with surrogate pairs, then that should be fixed. -- Nick ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
> Enhanced Iterators: > > ... > > When the *initial* call to __next__() receives an argument > > that is not None, TypeError is raised; this is likely caused > > by some logic error. [Jim Jewett] > This made sense when the (Block) Iterators were Resources, > and the first __next__() was just to trigger the setup. > > It makes less sense for general iterators. > > It is true that the first call in a generic for-loop couldn't > pass a value (as it isn't continued), but I don't see anything > wrong with explicit calls to __next__. > > Example: An agent which responds to the environment; > the agent can execute multi-stage plans, or change its mind > part way through. > >action = scheduler.__next__(current_sensory_input) Good point. I'd be happy if the requirement that the first __next__() call doesn't have an argument (or that it's None) only applies to generators, and not to iterators in general. -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote: > On May 6, 2005, at 3:42 PM, James Y Knight wrote: >>It means all the string operations treat strings as if they were >>UCS-2, but that in actuality, they are UTF-16. Same as the case in the >>windows APIs and Java. That is, all string operations are essentially >>broken, because they're operating on encoded bytes, not characters, >>but claim to be operating on characters. > > > Well, this is a completely separate issue/problem. The internal > representation is UTF-16, and should be stated as such. If the > built-in methods actually don't work with surrogate pairs, then that > should be fixed. Wait... are you saying a Py_UNICODE array contains either UTF-16 or UTF-32 characters, but never UCS-2? That's a big surprise to me. I may need to change my PyXPCOM patch to fit this new understanding. I tried hard to not care how Python encodes unicode characters, but details like this are important when combining two frameworks with different unicode APIs. Shane
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
Guido van Rossum wrote: > > (to save typing, you can use an empty string or even > > put quotes around the exception name, but that may > > make it harder to spot the change) > > Yeah, but that will stop working in Python 3.0. well, I tend to remove my debugging hacks once I've fixed the bug. I definitely don't expect them to be compatible with hypothetical future releases...
Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
At 01:18 PM 5/6/2005 -0700, Guido van Rossum wrote: >There's one alternative possible (still orthogonal to PEP 340): >instead of __next__(), we could add an optional argument to the next() >method, and forget about the next() built-in. This is more compatible >(if less future-proof). Old iterators would raise an exception when >their next() is called with an argument, and this would be a >reasonable way to find out that you're using "continue EXPR" with an >iterator that doesn't support it. (The C level API would be a bit >hairier but it can all be done in a compatible way.) +1.
Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
On Fri, 6 May 2005, Guido van Rossum wrote: > There's one alternative possible (still orthogonal to PEP 340): > instead of __next__(), we could add an optional argument to the next() > method, and forget about the next() built-in. I prefer your original proposal. I think this is a good time to switch to next(). If we are going to change the protocol, let's do it right. -- ?!ng
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 5:21 PM, Shane Hathaway wrote: > Nicholas Bastin wrote: >> On May 6, 2005, at 3:42 PM, James Y Knight wrote: >>> It means all the string operations treat strings as if they were >>> UCS-2, but that in actuality, they are UTF-16. Same as the case in >>> the >>> windows APIs and Java. That is, all string operations are essentially >>> broken, because they're operating on encoded bytes, not characters, >>> but claim to be operating on characters. >> >> >> Well, this is a completely separate issue/problem. The internal >> representation is UTF-16, and should be stated as such. If the >> built-in methods actually don't work with surrogate pairs, then that >> should be fixed. > > Wait... are you saying a Py_UNICODE array contains either UTF-16 or > UTF-32 characters, but never UCS-2? That's a big surprise to me. I > may > need to change my PyXPCOM patch to fit this new understanding. I tried > hard to not care how Python encodes unicode characters, but details > like > this are important when combining two frameworks with different unicode > APIs. Yes. Well, in as much as a large part of UTF-16 directly overlaps UCS-2, then sometimes unicode strings contain UCS-2 characters. However, characters which would not be legal in UCS-2 are still encoded properly in python, in UTF-16. And yes, I feel your pain, that's how I *got* into this position. Mapping from external unicode types is an important aspect of writing extension modules, and the documentation does not help people trying to do this. The fact that python's internal encoding is variable is a huge problem in and of itself, even if that was documented properly. This is why tools like Xerces and ICU will be happy to give you whatever form of unicode strings you want, but internally they always use UTF-16 - to avoid having to write two internal implementations of the same functionality. 
If you look up and down Objects/unicodeobject.c you'll see a fair amount of code written a couple of different ways (using #ifdef's) because of the variability in the internal representation. -- Nick
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote: > > On May 6, 2005, at 5:21 PM, Shane Hathaway wrote: >> Wait... are you saying a Py_UNICODE array contains either UTF-16 or >> UTF-32 characters, but never UCS-2? That's a big surprise to me. I may >> need to change my PyXPCOM patch to fit this new understanding. I tried >> hard to not care how Python encodes unicode characters, but details like >> this are important when combining two frameworks with different unicode >> APIs. > > > Yes. Well, in as much as a large part of UTF-16 directly overlaps > UCS-2, then sometimes unicode strings contain UCS-2 characters. > However, characters which would not be legal in UCS-2 are still encoded > properly in python, in UTF-16. > > And yes, I feel your pain, that's how I *got* into this position. > Mapping from external unicode types is an important aspect of writing > extension modules, and the documentation does not help people trying to > do this. The fact that python's internal encoding is variable is a huge > problem in and of itself, even if that was documented properly. This is > why tools like Xerces and ICU will be happy to give you whatever form of > unicode strings you want, but internally they always use UTF-16 - to > avoid having to write two internal implementations of the same > functionality. If you look up and down Objects/unicodeobject.c you'll > see a fair amount of code written a couple of different ways (using > #ifdef's) because of the variability in the internal representation. Ok. Thanks for helping me understand where Python is WRT unicode. I can work around the issues (or maybe try to help solve them) now that I know the current state of affairs. If Python correctly handled UTF-16 strings internally, we wouldn't need the UCS-4 configuration switch, would we? Shane
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote: > The important piece of information is that it is not guaranteed to be a > particular one of those sizes. Once you can't guarantee the size, no > one really cares what size it is. Please trust many years of experience: This is just not true. People do care, and they want to know. If we tell them "it depends", they ask "how can I find out". > The documentation should discourage > developers from attempting to manipulate Py_UNICODE directly, which, > other than trivia, is the only reason why someone would care what size > the internal representation is. Why is that? Of *course* people will have to manipulate Py_UNICODE* buffers directly. What else can they use? Regards, Martin
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote: > I'm not sure the Python documentation is the place to teach someone > about unicode. The ISO 10646 pretty clearly defines UCS-2 as only > containing characters in the BMP (plane zero). On the other hand, I > don't know why python lets you choose UCS-2 anyhow, since it's almost > always not what you want. It certainly is, in most cases. On Windows, it is the only way to get reasonable interoperability with the platform's WCHAR (i.e. just cast a Py_UNICODE* into a WCHAR*). To a limited degree, in UCS-2 mode, Python has support for surrogate characters (e.g. in UTF-8 codec), so it is not "pure" UCS-2, but this is a minor issue. Regards, Martin
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 7:05 PM, Shane Hathaway wrote: > Nicholas Bastin wrote: > >> On May 6, 2005, at 5:21 PM, Shane Hathaway wrote: >> >>> Wait... are you saying a Py_UNICODE array contains either UTF-16 or >>> UTF-32 characters, but never UCS-2? That's a big surprise to >>> me. I may >>> need to change my PyXPCOM patch to fit this new understanding. I >>> tried >>> hard to not care how Python encodes unicode characters, but >>> details like >>> this are important when combining two frameworks with different >>> unicode >>> APIs. >> >> Yes. Well, in as much as a large part of UTF-16 directly overlaps >> UCS-2, then sometimes unicode strings contain UCS-2 characters. >> However, characters which would not be legal in UCS-2 are still >> encoded >> properly in python, in UTF-16. >> >> And yes, I feel your pain, that's how I *got* into this position. >> Mapping from external unicode types is an important aspect of writing >> extension modules, and the documentation does not help people >> trying to >> do this. The fact that python's internal encoding is variable is >> a huge >> problem in and of itself, even if that was documented properly. >> This is >> why tools like Xerces and ICU will be happy to give you whatever >> form of >> unicode strings you want, but internally they always use UTF-16 - to >> avoid having to write two internal implementations of the same >> functionality. If you look up and down Objects/unicodeobject.c >> you'll >> see a fair amount of code written a couple of different ways (using >> #ifdef's) because of the variability in the internal representation. >> > > Ok. Thanks for helping me understand where Python is WRT unicode. I > can work around the issues (or maybe try to help solve them) now > that I > know the current state of affairs. If Python correctly handled UTF-16 > strings internally, we wouldn't need the UCS-4 configuration switch, > would we? 
Personally I would rather see Python (3000) grow a new way to represent strings, more along the lines of the way it's typically done in Objective-C. I wrote a little bit about how that works here: http://bob.pythonmac.org/archives/2005/04/04/pyobjc-and-unicode/ Effectively, instead of having One And Only One Way To Store Text, you would have one and only one base class (say basestring) that has some "virtual" methods that know how to deal with text. Then, you have several concrete implementations that implement those functions for their particular backing store (and possibly encoding, but that might be implicit with the backing store.. i.e. if it's an ASCII, UCS-2 or UCS-4 backing store). Currently we more or less have this at the Python level, between str and unicode, but certainly not at the C API. -bob
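A rough sketch of Bob's idea, with purely hypothetical names: one abstract text type whose "virtual" methods are implemented by concrete subclasses that differ only in backing store.

```python
from abc import ABC, abstractmethod

class TextBase(ABC):
    """Hypothetical abstract string: subclasses choose the backing store."""
    @abstractmethod
    def code_point_at(self, i): ...
    @abstractmethod
    def __len__(self): ...

class AsciiText(TextBase):
    def __init__(self, text):
        self._data = text.encode("ascii")      # backing store: 8-bit bytes
    def code_point_at(self, i):
        return self._data[i]
    def __len__(self):
        return len(self._data)

class Ucs4Text(TextBase):
    def __init__(self, text):
        self._points = [ord(c) for c in text]  # backing store: one int per code point
    def code_point_at(self, i):
        return self._points[i]
    def __len__(self):
        return len(self._points)
```

Both stores expose the same interface, so callers never see which representation they got — which is the design point being made:

```python
assert AsciiText("abc").code_point_at(0) == 0x61
assert Ucs4Text("\U00012345").code_point_at(0) == 0x12345
```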
Re: [Python-Dev] New Py_UNICODE doc
Shane Hathaway wrote: > Then something in the Python docs ought to say why UCS-2 is not what you > want. I still don't know; I've heard differing opinions on the subject. > Some say you'll never need more than what UCS-2 provides. Is that > incorrect? That clearly depends on who "you" is. > More generally, how should a non-unicode-expert writing Python extension > code find out the minimum they need to know about unicode to use the > Python unicode API? The API reference [1] ought to at least have a list > of background links. I had to hunt everywhere. That, of course, depends on what your background is. Did you know what Latin-1 is, when you started? How it relates to code page 1252? What UTF-8 is? What an abstract character is, as opposed to a byte sequence on the one hand, and to a glyph on the other hand? Different people need different background, especially if they are writing different applications. Regards, Martin
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote: > If this is the case, then we're clearly misleading users. If the > configure script says UCS-2, then as a user I would assume that > surrogate pairs would *not* be encoded, because I chose UCS-2, and it > doesn't support that. What do you mean by that? That the interpreter crashes if you try to store a low surrogate into a Py_UNICODE? > I would assume that any UTF-16 string I would > read would be transcoded into the internal type (UCS-2), and information > would be lost. If this is not the case, then what does the configure > option mean? It tells you whether you have the two-octet form of the Universal Character Set, or the four-octet form. Regards, Martin
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote: > Because the encoding of that buffer appears to be different depending on > the configure options. What makes it appear so? sizeof(Py_UNICODE) changes when you change the option - does that, in your mind, mean that the encoding changes? > If that isn't true, then someone needs to change > the doc, and the configure options. Right now, it seems *very* clear > that Py_UNICODE may either be UCS-2 or UCS-4 encoded if you read the > configure help, and you can't use the buffer directly if the encoding is > variable. However, you seem to be saying that this isn't true. It's a compile-time option (as all configure options). So at run-time, it isn't variable. Regards, Martin
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote: > No, that's not true. Python lets you choose UCS-4 or UCS-2. What the > default is depends on your platform. The truth is more complicated. If your Tcl is built for UCS-4, then Python will also be built for UCS-4 (unless overridden by command line). Otherwise, Python will default to UCS-2. Regards, Martin
Re: [Python-Dev] New Py_UNICODE doc
M.-A. Lemburg wrote: > Hmm, looking at the configure.in script, it seems you're right. > I wonder why this weird dependency on TCL was added. If Python is configured for UCS-2, and Tcl for UCS-4, then Tkinter would not work out of the box. Hence the weird dependency. Regards, Martin
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 7:43 PM, Martin v. Löwis wrote: > Nicholas Bastin wrote: >> If this is the case, then we're clearly misleading users. If the >> configure script says UCS-2, then as a user I would assume that >> surrogate pairs would *not* be encoded, because I chose UCS-2, and it >> doesn't support that. > > What do you mean by that? That the interpreter crashes if you try > to store a low surrogate into a Py_UNICODE? What I mean is pretty clear. UCS-2 does *NOT* support surrogate pairs. If it did, it would be called UTF-16. If Python really supported UCS-2, then surrogate pairs from UTF-16 inputs would either get turned into two garbage characters, or the "I couldn't transcode this" UCS-2 code point (I don't remember which one that is off the top of my head). >> I would assume that any UTF-16 string I would >> read would be transcoded into the internal type (UCS-2), and >> information >> would be lost. If this is not the case, then what does the configure >> option mean? > > It tells you whether you have the two-octet form of the Universal > Character Set, or the four-octet form. It would, if that were the case, but it's not. Setting UCS-2 in the configure script really means UTF-16, and as such, the documentation should reflect that. -- Nick
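In today's terms, the surrogate-awareness being debated looks like this (a sketch, modern Python 3 spelling): a surrogate-aware UTF-16 decoder reassembles a pair into one code point, whereas a strict UCS-2 decoder would have no rule for it at all.

```python
pair = b"\xd8\x08\xdf\x45"            # UTF-16-BE surrogate pair for U+12345
decoded = pair.decode("utf-16-be")
assert decoded == "\U00012345"        # one code point...
assert len(decoded) == 1              # ...not two 16-bit "characters"
```

This is the behaviour the configure option labelled "ucs2" actually produced, which is Nick's complaint about the name.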
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 7:45 PM, Martin v. Löwis wrote: > Nicholas Bastin wrote: >> Because the encoding of that buffer appears to be different depending >> on >> the configure options. > > What makes it appear so? sizeof(Py_UNICODE) changes when you change > the option - does that, in your mind, mean that the encoding changes? Yes. Not only in my mind, but in the Python source code. If Py_UNICODE is 4 bytes wide, then the encoding is UTF-32 (UCS-4), otherwise the encoding is UTF-16 (*not* UCS-2). >> If that isn't true, then someone needs to change >> the doc, and the configure options. Right now, it seems *very* clear >> that Py_UNICODE may either be UCS-2 or UCS-4 encoded if you read the >> configure help, and you can't use the buffer directly if the encoding >> is >> variable. However, you seem to be saying that this isn't true. > > It's a compile-time option (as all configure options). So at run-time, > it isn't variable. What I mean by 'variable' is that you can't make any assumption as to what the size will be in any given python when you're writing (and building) an extension module. This breaks binary compatibility of extension modules on the same platform and same version of python across interpreters which may have been built with different configure options. -- Nick
Re: [Python-Dev] New Py_UNICODE doc
Shane Hathaway wrote: > Ok. Thanks for helping me understand where Python is WRT unicode. I > can work around the issues (or maybe try to help solve them) now that I > know the current state of affairs. If Python correctly handled UTF-16 > strings internally, we wouldn't need the UCS-4 configuration switch, > would we? Define correctly. Python, in ucs2 mode, will allow you to address individual surrogate codes, e.g. in indexing. So you get >>> u"\U00012345"[0] u'\ud808' This will never work "correctly", and never should, because an efficient implementation isn't possible. If you want "safe" indexing and slicing, you need ucs4. Regards, Martin
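The u'\ud808' Martin shows is just the high half of the surrogate pair; the split is pure arithmetic. A small sketch:

```python
def utf16_surrogates(cp):
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    assert 0x10000 <= cp <= 0x10FFFF
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

# On a narrow (ucs2) build, u"\U00012345"[0] exposed the first of these:
assert utf16_surrogates(0x12345) == (0xD808, 0xDF45)
```

Indexing a narrow build's buffer returned these code units directly, which is why position 0 yields a lone surrogate rather than the character.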
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote: > What I mean is pretty clear. UCS-2 does *NOT* support surrogate pairs. > If it did, it would be called UTF-16. If Python really supported > UCS-2, then surrogate pairs from UTF-16 inputs would either get turned > into two garbage characters, or the "I couldn't transcode this" UCS-2 > code point (I don't remember which one that is off the top of my head). OTOH, if Python really supported UTF-16, then unichr(0x10000) would work, and len(u"\U00010000") would be 1. It is primarily just the UTF-8 codec which supports UTF-16. Regards, Martin
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote: > Yes. Not only in my mind, but in the Python source code. If > Py_UNICODE is 4 bytes wide, then the encoding is UTF-32 (UCS-4), > otherwise the encoding is UTF-16 (*not* UCS-2). I see. Some people equate "encoding" with "encoding scheme"; neither UTF-32 nor UTF-16 is an encoding scheme. You were apparently talking about encoding forms. > What I mean by 'variable' is that you can't make any assumption as to > what the size will be in any given python when you're writing (and > building) an extension module. This breaks binary compatibility of > extensions modules on the same platform and same version of python > across interpreters which may have been built with different configure > options. True. The breakage will be quite obvious, in most cases: the module fails to load because not only sizeof(Py_UNICODE) changes, but also the names of all symbols change. Regards, Martin
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote: > Well, this is a completely separate issue/problem. The internal > representation is UTF-16, and should be stated as such. If the > built-in methods actually don't work with surrogate pairs, then that > should be fixed. Yes to the former, no to the latter. PEP 261 specifies what should and shouldn't work. Regards, Martin
[Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)
PEP 340 contains several different ideas. This rewrite separates them into five
major areas:
- passing data into an iterator
- finalising iterators
- integrating finalisation into for loops
- the new non-looping finalising statement
- integrating all of these with generators.
The first area has nothing to do with finalisation, so it is not included in
this rewrite (Steven Bethard wrote an Enhanced Iterators pre-PEP which covers
only that area, though).
The whole PEP draft can be found here:
http://members.iinet.net.au/~ncoghlan/public/pep-3XX.html
But I've inlined some examples that differ from or aren't in PEP 340 for those
that don't have time to read the whole thing (example numbers are from the PEP):
4. A template that tries something up to n times::
def auto_retry(n=3, exc=Exception):
    for i in range(n):
        try:
            yield
            return  # success
        except exc, err:
            # perhaps log exception here
            pass
    raise # re-raise the exception we caught earlier
Used as follows::
for del auto_retry(3, IOError):
    f = urllib.urlopen("http://python.org/")
    print f.read()
6. It is easy to write a regular class with the semantics of example 1::
class locking:
    def __init__(self, lock):
        self.lock = lock
    def __enter__(self):
        self.lock.acquire()
    def __exit__(self, type, value=None, traceback=None):
        self.lock.release()
        if type is not None:
            raise type, value, traceback
(This example is easily modified to implement the other examples; it shows that
generators are not always the simplest way to do things.)
8. Find the first file with a specific header::
for name in filenames:
    stmt opening(name) as f:
        if f.read(2) == 0xFEB0: break
9. Find the first item you can handle, holding a lock for the entire loop, or
just for each iteration::
stmt locking(lock):
    for item in items:
        if handle(item): break

for item in items:
    stmt locking(lock):
        if handle(item): break
10. Hold a lock while inside a generator, but release it when returning control
to the outer scope::
stmt locking(lock):
    for item in items:
        stmt unlocking(lock):
            yield item
Cheers,
Nick.
--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://boredomandlaziness.blogspot.com
Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
Guido van Rossum wrote: [SNIP] > There's one alternative possible (still orthogonal to PEP 340): > instead of __next__(), we could add an optional argument to the next() > method, and forget about the next() built-in. This is more compatible > (if less future-proof). Old iterators would raise an exception when > their next() is called with an argument, and this would be a > reasonable way to find out that you're using "continue EXPR" with an > iterator that doesn't support it. (The C level API would be a bit > hairier but it can all be done in a compatible way.) > I prefer the original proposal. -Brett
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 8:25 PM, Martin v. Löwis wrote: > Nicholas Bastin wrote: >> Yes. Not only in my mind, but in the Python source code. If >> Py_UNICODE is 4 bytes wide, then the encoding is UTF-32 (UCS-4), >> otherwise the encoding is UTF-16 (*not* UCS-2). > > I see. Some people equate "encoding" with "encoding scheme"; > neither UTF-32 nor UTF-16 is an encoding scheme. You were That's not true. UTF-16 and UTF-32 are both CES and CEF (although this is not true of UTF-16LE and BE). UTF-32 is a fixed-width encoding form within a code space of (0..10FFFF) and UTF-16 is a variable-width encoding form which uses one or two 16-bit code units in the code space of (0..10FFFF). However, you are perhaps right to point out that people should be more explicit as to which they are referring to. UCS-2, however, is only a CEF, and thus I thought it was obvious that I was referring to UTF-16 as a CEF. I would point anyone who is confused at this point to Unicode Technical Report #17 on the Character Encoding Model, which is much more clear than trying to piece together the relevant parts out of the entire standard. In any event, Python's use of the term UCS-2 is incorrect. I quote from the TR: "The UCS-2 encoding form, which is associated with ISO/IEC 10646 and can only express characters in the BMP, is a fixed-width encoding form." immediately followed by: "In contrast, UTF-16 uses either one or two code units and is able to cover the entire code space of Unicode." If Python is capable of representing the entire code space of Unicode when you choose --unicode=ucs2, then that is a bug. It either should not be called UCS-2, or the interpreter should be bound by the limitations of the UCS-2 CEF. >> What I mean by 'variable' is that you can't make any assumption as to >> what the size will be in any given python when you're writing (and >> building) an extension module.
This breaks binary compatibility of >> extensions modules on the same platform and same version of python >> across interpreters which may have been built with different configure >> options. > > True. The breakage will be quite obvious, in most cases: the module > fails to load because not only sizeof(Py_UNICODE) changes, but also > the names of all symbols change. Yes, but the important question here is why would we want that? Why doesn't Python just have *one* internal representation of a Unicode character? Having more than one possible definition just creates problems, and provides no value. -- Nick
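The encoding-form distinction being argued here is easy to make concrete (a sketch, modern Python 3 spelling): UTF-32 always spends one code unit per code point, while UTF-16 spends one or two.

```python
def utf16_code_units(s):
    # Number of 16-bit code units needed for s in the UTF-16 encoding form.
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s)

s = "A\U00012345"
assert len(s) == 2                      # two code points (two UTF-32 code units)
assert utf16_code_units(s) == 3         # but three UTF-16 code units
assert len(s.encode("utf-16-be")) == 6  # 3 code units * 2 bytes each
```

A true UCS-2 store, limited to the BMP, would be forced to reject the second character outright.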
Re: [Python-Dev] New Py_UNICODE doc
On May 6, 2005, at 8:11 PM, Martin v. Löwis wrote: > Nicholas Bastin wrote: >> Well, this is a completely separate issue/problem. The internal >> representation is UTF-16, and should be stated as such. If the >> built-in methods actually don't work with surrogate pairs, then that >> should be fixed. > > Yes to the former, no to the latter. PEP 261 specifies what should > and shouldn't work. This PEP has several textual errors and ambiguities (which, admittedly, may have been a necessary state given the unicode standard in 2001). However, putting that aside, I would recommend that: --enable-unicode=ucs2 be replaced with: --enable-unicode=utf16 and the docs be updated to reflect more accurately the variance of the internal storage type. I would also like the community to strongly consider standardizing on a single internal representation, but I will leave that fight for another day. -- Nick
Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
On 5/6/05, Steven Bethard <[EMAIL PROTECTED]> wrote:
> FWIW, I'm +1 on this. Enhanced Iterators
> * updates the iterator protocol to use .__next__() instead of .next()
> * introduces a new builtin next()
> * allows continue-statements to pass values to iterators
> * allows generators to receive values with a yield-expression
> The first two are, I believe, how the iterator protocol probably
> should have been in the first place. The second two provide a simple
> way of passing values to generators, something I got the impression
> that the co-routiney people would like a lot.
Thank you for splitting the PEP. Conceptually, the "coroutine" part
has nothing to do with blocks and it stands on its own, it is right
to discuss it separately from the block syntax.
Personally, I do not see an urgent need for the block syntax (most of
the use cases can be managed with decorators) nor for the "coroutine"
syntax (you can already use Armin Rigo's greenlets for that).
Anyway, the idea of passing arguments to generators is pretty cool,
here is some code I have, adapted from Armin's presentation at the
ACCU conference:
from py.magic import greenlet
def yield_(*args):
    return greenlet.getcurrent().parent.switch(*args)

def send(key):
    return process_commands.switch(key)

@greenlet
def process_commands():
    while True:
        line = ''
        while not line.endswith('\n'):
            line += yield_()
        print line,
        if line == 'quit\n':
            print "are you sure?"
            if yield_() == 'y':
                break
process_commands.switch() # start the greenlet
send("h")
send("e")
send("l")
send("l")
send("o")
send("\n")
send("q")
send("u")
send("i")
send("t")
send("\n")
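A simplified version of the same feed-one-character-at-a-time loop can be sketched with plain generators once they accept values (what PEP 342 later standardized as send()); the confirmation step is omitted here for brevity, so this is a sketch rather than a faithful port.

```python
received = []

def process_commands():
    # Consumer: assembles characters into lines until 'quit' arrives.
    while True:
        line = ''
        while not line.endswith('\n'):
            line += yield
        received.append(line)
        if line == 'quit\n':
            break

p = process_commands()
next(p)                   # start the generator (run to the first yield)
try:
    for ch in "hello\nquit\n":
        p.send(ch)
except StopIteration:     # raised once the generator breaks out of its loop
    pass

assert received == ['hello\n', 'quit\n']
```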
Michele Simionato