Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread M.-A. Lemburg
Nicholas Bastin wrote:
> On May 4, 2005, at 6:20 PM, Shane Hathaway wrote:
> 
>>>Nicholas Bastin wrote:
>>>
>>>
"This type represents the storage type which is used by Python
internally as the basis for holding Unicode ordinals.  Extension module
developers should make no assumptions about the size of this type on
any given platform."
>>>
>>>
>>>But people want to know "Is Python's Unicode 16-bit or 32-bit?"
>>>So the documentation should explicitly say "it depends".
>>
>>On a related note, it would be helpful if the documentation provided a
>>little more background on Unicode encoding.  Specifically, that UCS-2
>>is not the same as UTF-16, even though they're both two bytes wide and
>>most of the characters are the same.  UTF-16 can encode 4-byte
>>characters, while UCS-2 can't.  A Py_UNICODE is either UCS-2 or
>>UCS-4.  It took me
> 
> I'm not sure the Python documentation is the place to teach someone 
> about Unicode.  ISO 10646 pretty clearly defines UCS-2 as only 
> containing characters in the BMP (plane zero).  On the other hand, I 
> don't know why Python lets you choose UCS-2 anyhow, since it's almost 
> always not what you want.

You've got that wrong: Python lets you choose UCS-4 -
UCS-2 is the default.

Note that Python's Unicode codecs UTF-8 and UTF-16
are surrogate aware and thus support non-BMP code points
regardless of the build type: a UCS-2 build of Python will
store a non-BMP code point as a UTF-16 surrogate pair in the
Py_UNICODE buffer, while a UCS-4 build will store it as a
single value. Decoding is surrogate aware too, so a UTF-16
surrogate pair in a UCS-2 build will get treated as a single
Unicode code point.

Ideally, the Python programmer should not really need to
know all this, and I think we've achieved that up to a certain
point (Unicode can be complicated - there's nothing to hide there).
However, the C programmer using the Python C API to interface
to some other Unicode implementation will need to know these
details.
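For illustration, the surrogate-pair arithmetic that a UCS-2 build
applies to non-BMP code points can be sketched in a few lines (the
helper names here are hypothetical, not part of the Python C API):

```python
def to_surrogate_pair(cp):
    # Split a non-BMP code point (>= 0x10000) into the UTF-16
    # surrogate pair that a UCS-2 build stores in two Py_UNICODE slots.
    assert cp >= 0x10000
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # lead (high) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # trail (low) surrogate
    return high, low

def from_surrogate_pair(high, low):
    # Recombine a surrogate pair into the original code point,
    # as a surrogate-aware decoder does.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# U+1D11E (MUSICAL SYMBOL G CLEF) round-trips through the pair:
pair = to_surrogate_pair(0x1D11E)
assert pair == (0xD834, 0xDD1E)
assert from_surrogate_pair(*pair) == 0x1D11E
```

A UCS-4 build skips this entirely and stores 0x1D11E as a single
Py_UNICODE value.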

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 06 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread M.-A. Lemburg
Fredrik Lundh wrote:
> Thomas Heller wrote:
> 
> 
>>AFAIK, you can configure Python to use 16-bit or 32-bit Unicode chars,
>>independent of the size of wchar_t.  The HAVE_USABLE_WCHAR_T macro
>>can be used by extension writers to determine if Py_UNICODE is the same as
>>wchar_t.
> 
> 
> note that "usable" is more than just "same size"; it also implies that the
> widechar predicates (iswalnum etc.) work properly with Unicode characters,
> under all locales.

Only if you intend to use --with-wctypes; a configure option which
will go away soon (for exactly the reason you are referring to: the
widechar predicates don't work properly under all locales).

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread M.-A. Lemburg
Nicholas Bastin wrote:
> On May 4, 2005, at 6:03 PM, Martin v. Löwis wrote:
> 
> 
>>Nicholas Bastin wrote:
>>
>>>"This type represents the storage type which is used by Python
>>>internally as the basis for holding Unicode ordinals.  Extension module
>>>developers should make no assumptions about the size of this type on
>>>any given platform."
>>
>>But people want to know "Is Python's Unicode 16-bit or 32-bit?"
>>So the documentation should explicitly say "it depends".
> 
> 
> The important piece of information is that it is not guaranteed to be a 
> particular one of those sizes.  Once you can't guarantee the size, no 
> one really cares what size it is.  The documentation should discourage 
> developers from attempting to manipulate Py_UNICODE directly, which, 
> other than trivia, is the only reason why someone would care what size 
> the internal representation is.

I don't see why you shouldn't use the Py_UNICODE buffer directly.
After all, the reason why we have that typedef is to make it
possible to program against an abstract type - regardless of
its size on the given platform.

In that respect it is similar to wchar_t (and all the other
*_t typedefs in C).

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 340: Breaking out.

2005-05-06 Thread Paul Moore
On 5/5/05, Steven Bethard <[EMAIL PROTECTED]> wrote:
> On 5/5/05, Paul Moore <[EMAIL PROTECTED]> wrote:
> > And does your proposal allow for "continue EXPR" as supported by PEP
> > 340? I can't see that it could, given that your proposal treats block
> > statements as not being loops.
> 
> Read PEP 340 again -- the "continue EXPR" syntax is orthogonal to the
> discussion -- PEP 340 adds it for *all* for loops, so for loops with
> the non-looping block statements would also be able to use it.

I know this. But we're talking here about Nick's new proposal for a
non-looping block. All I am saying is that the new proposal needs to
include this orthogonal feature. If it's a modification to PEP 340,
that will come naturally. If it's a modification to PEP 310, it won't.
A new PEP needs to include it.

I am very much against picking bits out of a number of PEPs - that was
implicit in my earlier post - sorry, I should have made it explicit.
Specifically, PEP 340 should be accepted (possibly with modifications)
as a whole, or rejected outright - no "rejected, but can we have
continue EXPR in any case, as it's orthogonal" status exists...

> > The looping behaviour is a (fairly nasty) wart, but I'm not sure I
> > would insist on removing it at the cost of damaging other features I
> > like.
> 
> I don't think it "damages" any features.  Are there features you still
> think the non-looping proposal removes?  (I'm not counting orthogonal
> features like "continue EXPR" which could easily be added as an
> entirely separate PEP.)

I *am* specifically referring to these "orthogonal" features. Removal
of looping by modification of PEP 340 will do no such "damage", I
agree - but removal by accepting an updated PEP 310, or a new PEP,
*will* (unless the "entirely separate PEP" you mention is written and
accepted along with the non-looping PEP - and I don't think that will
happen).

Thanks for making me clarify what I meant. I left a little too much
implicit in my previous post.

Paul.


Re: [Python-Dev] PEP 340: Examples as class's.

2005-05-06 Thread Ron Adam
Ron Adam wrote:


A minor correction to the Block class due to re-editing.

> def __call__(self, *args):
> self.block(*args)
> self.__del__()

This should have been:

 def __call__(self, *args):
 try:
 self.block(*args)
 except Exception, self.__err__:
 pass
 self.__del__()

Which catches the error in the overridden "block" method (I need to 
change that to say "body"), so it can be re-raised after the "final" 
method is run. The "final" method can handle it if it chooses.

Thanks to Jim Jewett for noticing. It should make more sense now.  :-)
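Putting the corrected __call__ together with the start/final hooks
referenced above, a minimal Block base class consistent with this
design might look like the following. This is a sketch in modern
spelling, not the original posted code; the re-raise after "final"
matches the behaviour described above:

```python
class Block(object):
    def __init__(self, *args):
        self.__err__ = None
        self.start(*args)            # acquire / set up
    def __call__(self, *args):
        try:
            self.block(*args)        # the overridable body
        except Exception as err:     # modern form of "except Exception, self.__err__"
            self.__err__ = err
        self.final()                 # cleanup; may clear self.__err__
        if self.__err__ is not None:
            raise self.__err__       # re-raise unless final() handled it
    def start(self, *args): pass
    def block(self, *args): pass
    def final(self): pass

class Demo(Block):
    def start(self, name):
        self.log = ["start:" + name]
    def block(self):
        self.log.append("body")
    def final(self):
        self.log.append("final")

d = Demo("lock")
d()   # runs the body, then final
assert d.log == ["start:lock", "body", "final"]
```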


In example (1.), Lock_It lost a carriage return. It should be:

 class Lock_It(Get_Lock):
 def block(self):
 print "Do stuff while locked"

 Lock_It(mylock())()


And example (3.) should be, although it may not run as is...

## 3. A template for committing or rolling back a database:
class Transactional(Block):
    def start(self, db):
        self.db = db
        self.cursor = self.db.cursor()
    def final(self):
        if self.__err__:
            self.db.rollback()
            print "db rolled back due to err:", self.__err__
            self.__err__ = None
        else:
            self.db.commit()
    def block(self, batch):
        for statement in batch:
            self.cursor.execute(statement)

statement_batch = [
    "insert into PEP340 values ('Guido','BDFL')",
    "insert into PEP340 values ('More examples are needed')"]
db = pgdb.connect(dsn='localhost:pythonpeps')
Transactional(db)(statement_batch)
disconnect(db)

Another Block class could be used for connecting and disconnecting.


Cheers, Ron_Adam











Re: [Python-Dev] PEP 340: Breaking out.

2005-05-06 Thread Paul Moore
On 5/6/05, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Seems to me it should be up to the block iterator whether
> a break statement gets caught or propagated, since it's
> up to the block iterator whether the construct behaves
> like a loop or not.
> 
> This could be achieved by having a separate exception
> for breaks, as originally proposed.
> 
> If the iterator propagates the Break exception back out,
> the block statement should break any enclosing loop.
> If the iterator wants to behave like a loop, it can
> catch the Break exception and raise StopIteration
> instead.

Yes, that's exactly what I was trying to say! I don't know if it's
achievable in practice, but the fact that it was in the original
proposal (something I'd forgotten, if indeed I ever realised) makes it
seem more likely to me.

Paul.


Re: [Python-Dev] PEP 340: Breaking out.

2005-05-06 Thread Nicolas Fleury
Paul Moore wrote:
> On 5/5/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
>>   2. Manual protocol implementations are _significantly_ easier to write
> 
> Hmm, I've not tried so I'll have to take your word for this. But I
> don't imagine writing manual implementations much - one of the key
> features I like about Guido's proposal is that generators can be used,
> and the implementation is a clear template, with "yield" acting as a
> "put the block here" marker (yes, I know that's an
> oversimplification!).

If using a generator is easier to code (though I tend to agree with Nick), 
a new type, a one-shot generator (not really a generator, but some type 
of continuation), as suggested by Steven Bethard with stmt, could be created:

 def opening(filename, mode="r"):
 f = open(filename, mode)
 try:
 yield break f
 finally:
 f.close()

I prefer Nick's proposal however, since it simplifies non-looping 
constructs (no generator template, break of parent loop supported), 
while leaving looping constructs (a minority, IMO) possible using a 
for, making things even clearer to me (but harder to implement).  I'm 
still not convinced at all that using generators to implement an 
acquire/release pattern is a good idea...

Regards,
Nicolas



Re: [Python-Dev] PEP 340: Breaking out.

2005-05-06 Thread Michael Hudson
Paul Moore <[EMAIL PROTECTED]> writes:

> On 5/5/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
>> Well, Michael Hudson and Paul Moore are the current authors of PEP 310, so
>> updating it with any of my ideas would be their call.
>
> I'm willing to consider an update - I don't know Michael's view. 

I'd slightly prefer PEP 310 to remain a very simple proposal, but
don't really have the energy to argue with someone who thinks
rewriting it makes more sense than creating a new PEP.

Cheers,
mwh

-- 
  Solaris: Shire horse that dreams of being a race horse,
  blissfully unaware that its owners don't quite know whether
  to put it out to grass, to stud, or to the knackers yard.
   -- Jim's pedigree of operating systems, asr


Re: [Python-Dev] PEP 340: Non-looping version (aka PEP 310 redux)

2005-05-06 Thread Toby Dickenson
On Thursday 05 May 2005 16:03, Nick Coghlan wrote:
> The discussion on the meaning of break when nesting a PEP 340 block
> statement inside a for loop has given me some real reasons to prefer PEP
> 310's single pass  semantics for user defined statements 

That also solves a problem with resource acquisition block generators that I 
hadn't been able to articulate until now. What about resources whose lifetimes 
are more complex than a lexical block, where you can't use a block statement? 
It seems quite natural for code that wants to manage its own resources to call 
__enter__ and __exit__ directly. That's not true of the block generator API.
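For example, with a PEP 310-style object the protocol can be driven by
hand across method calls (a hypothetical sketch; the class names and
the exact __exit__ signature are illustrative, not from the PEP):

```python
class Resource(object):
    # A PEP 310-style resource exposing the protocol directly.
    def __init__(self):
        self.state = "new"
    def __enter__(self):
        self.state = "acquired"
        return self
    def __exit__(self, *exc_info):
        self.state = "released"

class Connection(object):
    # Holds the resource across method calls -- a lifetime that is
    # not a single lexical block, so no block statement applies.
    def open(self):
        self.res = Resource()
        self.res.__enter__()
    def close(self):
        self.res.__exit__(None, None, None)

conn = Connection()
conn.open()                             # acquire in one method...
assert conn.res.state == "acquired"
conn.close()                            # ...release in another
assert conn.res.state == "released"
```

A generator-based block controller offers no equivalent pair of calls
to invoke separately.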



-- 
Toby Dickenson


Re: [Python-Dev] my first post: asking about a "decorator" module

2005-05-06 Thread Michele Simionato
On 5/5/05, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> 
> Yes, there has been quite a bit of interest including several ASPN
> recipes and a wiki:
> 
>http://www.python.org/moin/PythonDecoratorLibrary

Thanks, I didn't know about that page. BTW, I notice that all the decorators
on that page are improper, in the sense that they change the signature of
the function they decorate. So, all those recipes would need some help
from my decorator module to make them proper ;-)

http://www.phyast.pitt.edu/~micheles/python/decorator.zip
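What "improper" means here can be seen with a typical wrapper-style
recipe (a hypothetical example; it uses the modern inspect.signature
API, where 2005-era code would have used inspect.getargspec):

```python
import inspect

def logged(func):
    # Typical wiki-recipe decorator: a generic *args/**kw wrapper.
    def wrapper(*args, **kw):
        return func(*args, **kw)
    return wrapper

@logged
def greet(name, punct="!"):
    return "hi " + name + punct

# Behaviour is preserved...
assert greet("guido") == "hi guido!"
# ...but the wrapper has replaced the original signature:
assert str(inspect.signature(greet)) == "(*args, **kw)"
```

A signature-preserving decorator would instead expose `(name, punct='!')`
to introspection, which is what the decorator module arranges.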


Re: [Python-Dev] PEP 340: Breaking out.

2005-05-06 Thread Nicolas Fleury
Guido van Rossum wrote:
>>Maybe generators are not the way to go, but could be
>>supported natively by providing a __block__ function, very similarly to
>>sequences providing an __iter__ function for for-loops?
> 
> Sorry, I have no idea what you are proposing here.

I was suggesting that the feature could be a PEP310-like object and that 
a __block__ function (or whatever) of the generator could return such an 
object.  But at this point, Nick's proposition is what I prefer.  I find 
the use of generators very elegant, but I'm still unconvinced it is a 
good idea to use them to implement an acquire/release pattern.  Even if 
another continuation mechanism would be used (like Steven's idea), it 
would still be a lot of concepts used to implement acquire/release.

Regards,
Nicolas



Re: [Python-Dev] my first post: asking about a "decorator" module

2005-05-06 Thread Raymond Hettinger
> > Yes, there has been quite a bit of interest including several ASPN
> > recipes and a wiki:
> >
> >http://www.python.org/moin/PythonDecoratorLibrary
> 
> Thanks, I didn't know about that page. BTW, I notice that all the
> decorators on that page are improper, in the sense that they change
> the signature of the function they decorate. 

Signature changing and signature preserving are probably better
classifications than proper and improper.  Even then, some decorators
like atexit() and classmethod() may warrant their own special
categories.


Raymond



Re: [Python-Dev] The decorator module

2005-05-06 Thread Michele Simionato
On 5/6/05, Jim Jewett <[EMAIL PROTECTED]> wrote:
> Thank you; this is very good.
> 
> I added a link to it from http://www.python.org/moin/PythonDecoratorLibrary;
> please also consider adding a version number and publishing via PyPI.

Yes, this was in my plans. For the moment, however, this is just version 0.1;
I want to wait a bit before making an official release.

> Incidentally, would the resulting functions be a bit faster if you compiled
> the lambda instead of repeatedly eval-ing it, or does the eval overhead
> still apply?
> 
> -jJ
> 

Honestly, I don't care, since "eval" happens only once at decoration time.
There is no "eval" overhead at calling time, so I do not expect to have
problems. I am waiting for volunteers to perform profiling and
performance analysis ;)

Michele Simionato


Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally

2005-05-06 Thread Guido van Rossum
[Nick Coghlan]
> 
> > What does a try statement with neither an except clause nor a finally 
> > clause mean?

[Greg Ewing]
> I guess it would mean the same as
> 
>if 1:
>  ...
> 
> Not particularly useful, but maybe it's not worth complexifying
> the grammar just for the sake of disallowing it.
> 
> Also, some people might find it useful for indenting a block
> of code for cosmetic reasons, although that could easily
> be seen as an abuse...

I strongly disagree with this. It should be this:

try_stmt: 'try' ':' suite
          (
           (except_clause ':' suite)+ ['else' ':' suite] ['finally' ':' suite]
          |
           'finally' ':' suite
          )

There is no real complexity in this grammar, it's unambiguous, it's an
easy enough job for the code generator, and it catches a certain class
of mistakes (like mis-indenting some code).
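The class of mistake caught by the stricter grammar can be demonstrated
directly: a try-suite followed by neither an except nor a finally clause
is a compile-time error.

```python
# A "try" whose handler was lost (e.g. by mis-indenting the except
# clause) is rejected at compile time, not silently accepted.
bad = "try:\n    x = 1\n"   # no except/finally follows
try:
    compile(bad, "<example>", "exec")
    caught = False
except SyntaxError:
    caught = True
assert caught
```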

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] The decorator module

2005-05-06 Thread Guido van Rossum
[jJ]
> > Incidentally, would the resulting functions be a bit faster if you compiled
> > the lambda instead of repeatedly eval-ing it, or does the eval overhead
> > still apply?

[Michele]
> Honestly, I don't care, since "eval" happens only once at decoration time.
> There is no "eval" overhead at calling time, so I do not expect to have
> problems. I am waiting for volunteers to perform profiling and
> performance analysis ;)

Watch out. I didn't see the code referred to, but realize that eval is
*very* expensive on some other implementations of Python (Jython and
IronPython). Eval should only be used if there is actual user-provided
input that you don't know yet when your module is compiled; not to get
around some limitation in the language (there are usually ways around
that, and occasionally we add one, e.g. getattr()).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] The decorator module

2005-05-06 Thread Michele Simionato
On 5/6/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> [Michele]
> > Honestly, I don't care, since "eval" happens only once at decoration time.
> > There is no "eval" overhead at calling time, so I do not expect to have
> > problems. I am waiting for volunteers to perform profiling and
> > performance analysis ;)
> 
> Watch out. I didn't see the code referred to, but realize that eval is
> *very* expensive on some other implementations of Python (Jython and
> IronPython). Eval should only be used if there is actual user-provided
> input that you don't know yet when your module is compiled; not to get
> around some limitation in the language (there are usually ways around
> that, and occasionally we add one, e.g. getattr()).

I actually posted the code on c.l.p. one month ago asking if there was
a way to avoid "eval", but I had no answer. So, let me repost the code
here and see if somebody comes out with a good solution.
It is only ~30 lines long (+ ~30 of comments & docstrings)

## I suggest you uncomment the 'print lambda_src' statement in _decorate
## to understand what is going on.

import inspect

def _signature_gen(func, rm_defaults=False):
    argnames, varargs, varkwargs, defaults = inspect.getargspec(func)
    argdefs = defaults or ()
    n_args = func.func_code.co_argcount
    n_default_args = len(argdefs)
    n_non_default_args = n_args - n_default_args
    non_default_names = argnames[:n_non_default_args]
    default_names = argnames[n_non_default_args:]
    for name in non_default_names:
        yield "%s" % name
    for i, name in enumerate(default_names):
        if rm_defaults:
            yield name
        else:
            yield "%s = arg[%s]" % (name, i)
    if varargs:
        yield "*%s" % varargs
    if varkwargs:
        yield "**%s" % varkwargs

def _decorate(func, caller):
    signature = ", ".join(_signature_gen(func))
    variables = ", ".join(_signature_gen(func, rm_defaults=True))
    lambda_src = "lambda %s: call(func, %s)" % (signature, variables)
    # print lambda_src # for debugging
    evaldict = dict(func=func, call=caller, arg=func.func_defaults or ())
    dec_func = eval(lambda_src, evaldict)
    dec_func.__name__ = func.__name__
    dec_func.__doc__ = func.__doc__
    dec_func.__dict__ = func.__dict__ # copy if you want to avoid sharing
    return dec_func

class decorator(object):
    """General purpose decorator factory: takes a caller function as
    input and returns a decorator. A caller function is any function like this:

    def caller(func, *args, **kw):
        # do something
        return func(*args, **kw)

    Here is an example of usage:

    >>> @decorator
    ... def chatty(f, *args, **kw):
    ...     print "Calling %r" % f.__name__
    ...     return f(*args, **kw)
    >>> @chatty
    ... def f(): pass
    >>> f()
    Calling 'f'
    """
    def __init__(self, caller):
        self.caller = caller
    def __call__(self, func):
        return _decorate(func, self.caller)
Michele Simionato


Re: [Python-Dev] PEP 340 - Remaining issues - keyword

2005-05-06 Thread Guido van Rossum
[Greg Ewing]
> How about 'do'?
> 
>do opening(filename) as f:
>  ...
> 
>do locking(obj):
>  ...
> 
>do carefully(): # :-)
>  ...

I've been thinking of that too. It's short, and in a nostalgic way
conveys that it's a loop, without making it too obvious. (Those too
young to get that should Google for do-loop. :-)

I wonder how many folks call their action methods do() though.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] PEP 340 - Remaining issues - keyword

2005-05-06 Thread Tim Peters
[Guido]
> ...
> I wonder how many folks call their action methods do() though.

A little Google(tm)-ing suggests it's not all that common, although it
would break Zope on NetBSD:

http://www.zope.org/Members/tino/ZopeNetBSD

I can live with that.


Re: [Python-Dev] PEP 340 -- Clayton's keyword?

2005-05-06 Thread Guido van Rossum
[Greg Ewing]
> How about user-defined keywords?
> 
> Suppose you could write
> 
>    statement opening
> 
>    def opening(path, mode):
>        f = open(path, mode)
>        try:
>            yield
>        finally:
>            f.close()
> 
> which would then allow
> 
>    opening "myfile", "w" as f:
>        do_something_with(f)
[etc.]

This one is easy to reject outright:

- I have no idea how that would be implemented, especially since you
propose allowing the newly minted keyword to be used as the target of
an import. I'm sure it can be done, but it would be a major departure
from the current parser/lexer separation and would undoubtedly be an
extra headache for Jython and IronPython, which use standard
components for their parsing.

- It doesn't seem to buy you much -- just dropping two parentheses.

- I don't see how it would handle the case where the block-controller
is a method call or something else beyond a simple identifier.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally

2005-05-06 Thread François Pinard
[Guido van Rossum]
> [Nick Coghlan]

> > > What does a try statement with neither an except clause nor a
> > > finally clause mean?

> [Greg Ewing]
>
> > I guess it would mean the same as

> >if 1:
> >  ...

> I strongly disagree with this.  [...]

Allow me a quick comment on this issue.

It happens once in a while that I want to comment out the except clauses
of a try statement, when I want the traceback of the inner raising, for
debugging purposes.  Syntax forces me to also comment the `try:' line,
and indent out the lines following the `try:' line.  And of course, the
converse operation once debugging is done.  This is slightly heavy.

At a few places, Python is helpful for such editorial things, for
example, allowing a spurious trailing comma at end of lists, dicts,
tuples. `pass' is also useful as a place holder for commented code.

At least, the new proposed syntax would allow for some:

 finally:
 pass

addendum when commenting except clauses, simplifying the editing job for
the `try:' line and those following.


P.S. - Another detail, while on this subject.  On the first message I've read
on this topic, the original poster wrote something like:

f = None
try:
f = action1(...)
...
finally:
if f is not None:
action2(f)

The proposed syntax did not repeat this little part about "None", quoted
above, so suggesting an over-good feeling about syntax efficiency.
While nice, the syntax still does not solve this detail, which occurs
frequently in my experience.  Oh, I do not have solutions to offer, but
it might be worth a thought from the mighty thinkers of this list :-)

-- 
François Pinard   http://pinard.progiciels-bpi.ca


Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally

2005-05-06 Thread Guido van Rossum
[François Pinard]
> It happens once in a while that I want to comment out the except clauses
> of a try statement, when I want the traceback of the inner raising, for
> debugging purposes.  Syntax forces me to also comment the `try:' line,
> and indent out the lines following the `try:' line.  And of course, the
> converse operation once debugging is done.  This is slightly heavy.

I tend to address this by substituting a different exception. I don't
see the use case common enough to want to allow dangling try-suites.

> P.S. - Another detail, while on this subject.  On the first message I've read
> on this topic, the original poster wrote something like:
> 
> f = None
> try:
> f = action1(...)
> ...
> finally:
> if f is not None:
> action2(f)
> 
> The proposed syntax did not repeat this little part about "None", quoted
> above, so suggesting an over-good feeling about syntax efficiency.
> While nice, the syntax still does not solve this detail, which occurs
> frequently in my experience.  Oh, I do not have solutions to offer, but
> it might be worth a thought from the mighty thinkers of this list :-)

I don't understand your issue here. What is the problem with that
code? Perhaps it ought to be rewritten as

f = action1()
try:
...
finally:
action2(f)

I can't see how this would ever do something different than your version.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally

2005-05-06 Thread François Pinard
[Guido van Rossum]

> [François Pinard]
>
> > It happens once in a while that I want to comment out the except
> > clauses of a try statement, when I want the traceback of the inner
> > raising, for debugging purposes.  Syntax forces me to also comment
> > the `try:' line, and indent out the lines following the `try:' line.
> > And of course, the converse operation once debugging is done.  This
> > is slightly heavy.

> I tend to address this by substituting a different exception. I don't
> see the use case common enough to want to allow dangling try-suites.

Quite agreed.  I just wanted to tell there was a need.

> > P.S. - Another detail, while on this subject.  On the first message
> > I've read on this topic, the original poster wrote something like:

> > f = None
> > try:
> > f = action1(...)
> > ...
> > finally:
> > if f is not None:
> > action2(f)

> > The proposed syntax did not repeat this little part about "None",
> > quoted above, so suggesting an over-good feeling about syntax
> > efficiency.  While nice, the syntax still does not solve this
> > detail, which occurs frequently in my experience.  Oh, I do not have
> > solutions to offer, but it might be worth a thought from the mighty
> > thinkers of this list :-)

> I don't understand your issue here. What is the problem with that
> code? Perhaps it ought to be rewritten as

> f = action1()
> try:
> ...
> finally:
> action2(f)

> I can't see how this would ever do something different than your version.

Oh, the problem is that if `action1()' raises an exception (and this is
why it has to be within the `try', not before), `f' will not receive
a value, and so, may not be initialised in all cases.  The (frequent)
stunt is a guard so this never becomes a problem.
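The guard pattern being described can be shown in a few lines (action1
and action2 are stand-ins for the acquisition and release steps):

```python
cleaned_up = []

def action1():
    # Acquisition that fails: f is never bound to a resource.
    raise RuntimeError("acquisition failed")

def action2(res):
    cleaned_up.append(res)

f = None
try:
    try:
        f = action1()
    finally:
        if f is not None:       # guard: skip cleanup if never acquired
            action2(f)
except RuntimeError:
    pass

# action1 raised before assigning f, so the guard prevented a
# spurious action2(None) call -- and a NameError on f.
assert f is None
assert cleaned_up == []
```

Without the `f = None` initialisation and the guard, the finally clause
would call action2 on an unbound (or None) f whenever action1 raises.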

-- 
François Pinard   http://pinard.progiciels-bpi.ca


Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally

2005-05-06 Thread Reinhold Birkenfeld
Guido van Rossum wrote:
> [François Pinard]
>> It happens once in a while that I want to comment out the except clauses
>> of a try statement, when I want the traceback of the inner raising, for
>> debugging purposes.  Syntax forces me to also comment the `try:' line,
>> and indent out the lines following the `try:' line.  And of course, the
>> converse operation once debugging is done.  This is slightly heavy.
> 
> I tend to address this by substituting a different exception. I don't
> see the use case common enough to want to allow dangling try-suites.

Easy enough: adding a bare "raise" at the top of the except clause also
solves the problem.
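[Editorial note: a minimal sketch of the bare-"raise" idiom discussed here; the function name, the exception, and the DEBUG flag are illustrative, not from the thread. A bare raise re-raises the active exception with its original traceback, so no re-indenting of the try-suite is needed.]

```python
DEBUG = False  # flip to True while debugging to see the original traceback

def risky():
    raise IOError("disk on fire")

try:
    risky()
except IOError:
    if DEBUG:
        raise  # bare raise: re-raises the active exception, traceback intact
    result = "handled"

print(result)
```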

>> P.S. - Another detail, while on this subject.  On the first message I've read
>> on this topic, the original poster wrote something like:
>> 
>> f = None
>> try:
>>     f = action1(...)
>>     ...
>> finally:
>>     if f is not None:
>>         action2(f)
>> 
>> The proposed syntax did not repeat this little part about "None", quoted
>> above, so suggesting an over-good feeling about syntax efficiency.
>> While nice, the syntax still does not solve this detail, which occurs
>> frequently in my experience.  Oh, I do not have solutions to offer, but
>> it might be worth a thought from the mighty thinkers of this list :-)
> 
> I don't understand your issue here. What is the problem with that
> code? Perhaps it ought to be rewritten as
> 
> f = action1()
> try:
>     ...
> finally:
>     action2(f)
> 
> I can't see how this would ever do something different than your version.

Well, in the original the call to action1 was wrapped in an additional
try-except block.

f = None
try:
    try:
        f = action1()
    except:
        print "error"
finally:
    if f is not None:
        action2(f)


Reinhold


-- 
Mail address is perfectly valid!



Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally

2005-05-06 Thread Fredrik Lundh
François Pinard wrote:

> It happens once in a while that I want to comment out the except clauses
> of a try statement, when I want the traceback of the inner raising, for
> debugging purposes.  Syntax forces me to also comment the `try:' line,
> and indent out the lines following the `try:' line.  And of course, the
> converse operation once debugging is done.  This is slightly heavy.

the standard pydiom for this is to change

try:
    blabla
except IOError:
    blabla

to

try:
    blabla
except "debug": # IOError:
    blabla

(to save typing, you can use an empty string or even
put quotes around the exception name, but that may
make it harder to spot the change)







Re: [Python-Dev] The decorator module

2005-05-06 Thread Phillip J. Eby
At 07:55 AM 5/6/2005 -0700, Guido van Rossum wrote:
>[jJ]
> > > Incidentally, would the resulting functions be a bit faster if you
> > > compiled the lambda instead of repeatedly eval'ing it, or does the
> > > eval overhead still apply?
>
>[Michele]
> > Honestly, I don't care, since "eval" happens only once at decoration time.
> > There is no "eval" overhead at calling time, so I do not expect to have
> > problems. I am waiting for volunteers to perform profiling and
> > performance analysis ;)
>
>Watch out. I didn't see the code referred to, but realize that eval is
>*very* expensive on some other implementations of Python (Jython and
>IronPython). Eval should only be used if there is actual user-provided
>input that you don't know yet when your module is compiled; not to get
>around some limitation in the language (there are usually ways around
>that, and occasionally we add one, e.g. getattr()).

In this case, the informally-discussed proposal is to add a mutable 
__signature__ to functions, and have it be used by inspect.getargspec(), so 
that decorators can copy __signature__ from the decoratee to the decorated 
function.
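[Editorial note: the idea discussed here eventually landed in later Pythons — PEP 362 added signature objects, and functools.wraps sets __wrapped__, which inspect.signature follows. The sketch below shows the effect in modern terms; the decorator and function names are illustrative, not the 2005 proposal itself.]

```python
import functools
import inspect

def logged(func):
    @functools.wraps(func)  # copies __name__, __doc__ and sets __wrapped__
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@logged
def area(width, height=1):
    """Return width * height."""
    return width * height

# inspect.signature follows __wrapped__, so the decorated function
# still reports the decoratee's argument spec:
assert str(inspect.signature(area)) == "(width, height=1)"
assert area.__name__ == "area"
```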



Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally

2005-05-06 Thread Fredrik Lundh
François Pinard wrote:

> > > f = None
> > > try:
> > >     f = action1(...)
> > >     ...
> > > finally:
> > >     if f is not None:
> > >         action2(f)

> > f = action1()
> > try:
> >     ...
> > finally:
> >     action2(f)

> > I can't see how this would ever do something different than your version.

> Oh, the problem is that if `action1()' raises an exception (and this is
> why it has to be within the `try', not before), `f' will not receive
> a value, and so, may not be initialised in all cases.  The (frequent)
> stunt is a guard so this never becomes a problem.

in Guido's solution, the "finally" clause won't be called at all if
action1 raises an exception.
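[Editorial note: a small runnable demonstration of this point; the function names mirror the thread's illustrative action1/action2, and the fail flag is an assumption added for the demo. When acquisition fails before the try-finally is entered, the finally clause never runs.]

```python
calls = []

def action1(fail):
    if fail:
        raise RuntimeError("acquisition failed")
    return "resource"

def action2(f):
    calls.append("released " + f)

try:
    f = action1(fail=True)   # fails *before* the try-finally below
    try:
        pass                 # ... use f ...
    finally:
        action2(f)
except RuntimeError:
    pass

assert calls == []  # the finally clause was never entered
```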







Re: [Python-Dev] PEP 340 - For loop cleanup, and feature separation

2005-05-06 Thread Phillip J. Eby
At 01:58 PM 5/6/2005 +1000, Delaney, Timothy C (Timothy) wrote:
>Personally, I'm of the opinion that we should make a significant break
>(no pun intended ;) and have for-loops attempt to ensure that iterators
>are exhausted.

This is simply not backward compatible with existing, perfectly valid and 
sensible code.
Therefore, this can't happen till Py3K.

The only way I could see to allow this is if:

1. Calling __iter__ on the target of the for loop returns the same object
2. The for loop owns the only reference to that iterator.

However, #2 is problematic for non-CPython implementations, and in any case 
the whole thing seems terribly fragile.

So how about this: calling __exit__(StopIteration) on a generator that 
doesn't have any active blocks could simply *not* exhaust the 
iterator.  This would ensure that any iterator whose purpose is just 
iteration (i.e. all generators written to date) still behave in a resumable 
fashion.

Ugh.  It's still fragile, though, as adding a block to an iterator will 
then make it behave differently.  It seems likely to provoke subtle errors, 
arguing again in favor of a complete separation between iteration and block 
protocols.



Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally

2005-05-06 Thread Guido van Rossum
[me]
> > I can't see how this would ever do something different than your version.

[Reinhold]
> Well, in the original the call to action1 was wrapped in an additional
> try-except block.

Ah. Francois was misquoting it.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally

2005-05-06 Thread Guido van Rossum
[Fredrik]
> the standard pydiom for this is to change
> 
> try:
>     blabla
> except IOError:
>     blabla
> 
> to
> 
> try:
>     blabla
> except "debug": # IOError:
>     blabla
> 
> (to save typing, you can use an empty string or even
> put quotes around the exception name, but that may
> make it harder to spot the change)

Yeah, but that will stop working in Python 3.0. I like the solution
that puts a bare "raise" at the top of the except clause.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


[Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Steven Bethard
On 5/6/05, Paul Moore <[EMAIL PROTECTED]> wrote:
> > I don't think it "damages" any features.  Are there features you still
> > think the non-looping proposal removes?  (I'm not counting orthogonal
> > features like "continue EXPR" which could easily be added as an
> > entirely separate PEP.)
> 
> I *am* specifically referring to these "orthogonal" features. Removal
> of looping by modification of PEP 340 will do no such "damage", I
> agree - but removal by accepting an updated PEP 310, or a new PEP,
> *will* (unless the "entirely separate PEP" you mention is written and
> accepted along with the non-looping PEP - and I don't think that will
> happen).

So, just to make sure, if we had another PEP that contained from PEP 340[1]:
 * Specification: the __next__() Method
 * Specification: the next() Built-in Function
 * Specification: a Change to the 'for' Loop
 * Specification: the Extended 'continue' Statement
 * the yield-expression part of Specification: Generator Exit Handling
would that cover all the pieces you're concerned about?

I'd be willing to break these off into a separate PEP if people think
it's a good idea.  I've seen very few complaints about any of these
pieces of the proposal.  If possible, I'd like to see these things
approved now, so that the discussion could focus more directly on the
block-statement issues.

STeVe

[1] http://www.python.org/peps/pep-0340.html
-- 
You can wordify anything if you just verb it.
--- Bucky Katt, Get Fuzzy


Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Guido van Rossum
[Steven Bethard]
> So, just to make sure, if we had another PEP that contained from PEP 340[1]:
>  * Specification: the __next__() Method
>  * Specification: the next() Built-in Function
>  * Specification: a Change to the 'for' Loop
>  * Specification: the Extended 'continue' Statement
>  * the yield-expression part of Specification: Generator Exit Handling
> would that cover all the pieces you're concerned about?
> 
> I'd be willing to break these off into a separate PEP if people think
> it's a good idea.  I've seen very few complaints about any of these
> pieces of the proposal.  If possible, I'd like to see these things
> approved now, so that the discussion could focus more directly on the
> block-statement issues.

I don't think it's necessary to separate this out into a separate PEP;
that just seems busy-work. I agree these parts are orthogonal and
uncontroversial; a counter-PEP can suffice by stating that it's not
countering those items nor repeating them.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Raymond Hettinger
> > I'd be willing to break these off into a separate PEP if people think
> > it's a good idea.  I've seen very few complaints about any of these
> > pieces of the proposal.  If possible, I'd like to see these things
> > approved now, so that the discussion could focus more directly on the
> > block-statement issues.
> 
> I don't think it's necessary to separate this out into a separate PEP;
> that just seems busy-work. I agree these parts are orthogonal and
> uncontroversial; a counter-PEP can suffice by stating that it's not
> countering those items nor repeating them.

If someone volunteers to split it out for you, I think it would be
worthwhile.  Right now, the PEP is hard to swallow in one bite.
Improving its digestibility would be a big help when the PEP is offered
up to the tender mercies of comp.lang.python.


Raymond


Re: [Python-Dev] PEP 340 - For loop cleanup, and feature separation

2005-05-06 Thread Ron Adam
Phillip J. Eby wrote:
> At 01:58 PM 5/6/2005 +1000, Delaney, Timothy C (Timothy) wrote:
> 
>>Personally, I'm of the opinion that we should make a significant break
>>(no pun intended ;) and have for-loops attempt to ensure that iterators
>>are exhausted.
> 
> 
> This is simply not backward compatible with existing, perfectly valid and 
> sensible code.
> Therefore, this can't happen till Py3K.
> 
> The only way I could see to allow this is if:
> 
> 1. Calling __iter__ on the target of the for loop returns the same object
> 2. The for loop owns the only reference to that iterator.
> 
> However, #2 is problematic for non-CPython implementations, and in any case 
> the whole thing seems terribly fragile.

Is it better to have:
1. A single looping construct that does everything,
2. or several more specialized loops that are distinct?

I think the second may be better for performance reasons, so it may be
better to just add a third loop construct just for iterators.

 a.  for-loop   --> iterable sequences and lists only
 b.  while-loop --> bool evaluations only
 c.  do-loop    --> iterators only

Choice c. could mimic a. and b. with an iterator when the situation 
requires a for-loop or while-loop with special handling.

Ron_Adam


Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Guido van Rossum
[me]
> > I don't think it's necessary to separate this out into a separate PEP;
> > that just seems busy-work. I agree these parts are orthogonal and
> > uncontroversial; a counter-PEP can suffice by stating that it's not
> > countering those items nor repeating them.

[Raymond]
> If someone volunteers to split it out for you, I think it would be
> worthwhile.  Right now, the PEP is hard to swallow in one bite.
> Improving its digestibility would be a big help when the PEP is offered
> up to the tender mercies of comp.lang.python.

Well, I don't care so much about their tender mercies right now. I'm
not even sure that if we reach agreement on python-dev there's any
point in repeating the agony on c.l.py.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Steven Bethard
[Guido]
> I don't think it's necessary to separate this out into a separate PEP;
> that just seems busy-work. I agree these parts are orthogonal and
> uncontroversial; a counter-PEP can suffice by stating that it's not
> countering those items nor repeating them.

[Raymond]
> If someone volunteers to split it out for you, I think it would be
> worthwhile.  Right now, the PEP is hard to swallow in one bite.
> Improving its digestibility would be a big help when the PEP is offered
> up to the tender mercies of comp.lang.python.

Well, busy-work or not, I took the 20 minutes to split them up, so I
figured I might as well make them available.  It was actually really
easy to split them apart, and I think they both read better this way,
but I'm not sure my opinion counts for much here anyway. ;-)  (The
Enhanced Iterators PEP is first, the remainder of PEP 340 follows it.)

--
PEP: XXX
Title: Enhanced Iterators
Version: 
Last-Modified: 
Author: Guido van Rossum
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 6-May-2005
Post-History:

Introduction

This PEP proposes a new iterator API that allows values to be
passed into an iterator using "continue EXPR". These values are
received in the iterator as an argument to the new __next__
method, and can be accessed in a generator with a
yield-expression.

The content of this PEP is derived from the original content of
PEP 340, broken off into its own PEP as the new iterator API is
pretty much orthogonal from the anonymous block statement
discussion.

Motivation and Summary

...

Use Cases

See the Examples section near the end.

Specification: the __next__() Method

A new method for iterators is proposed, called __next__().  It
takes one optional argument, which defaults to None.  Calling the
__next__() method without argument or with None is equivalent to
using the old iterator API, next().  For backwards compatibility,
it is recommended that iterators also implement a next() method as
an alias for calling the __next__() method without an argument.

The argument to the __next__() method may be used by the iterator
as a hint on what to do next.

Specification: the next() Built-in Function

This is a built-in function defined as follows:

    def next(itr, arg=None):
        nxt = getattr(itr, "__next__", None)
        if nxt is not None:
            return nxt(arg)
        if arg is None:
            return itr.next()
        raise TypeError("next() with arg for old-style iterator")

This function is proposed because there is often a need to call
the next() method outside a for-loop; the new API, and the
backwards compatibility code, is too ugly to have to repeat in
user code.
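[Editorial note: a sketch of how the proposed built-in would behave, using the PEP's own definition. The shadowing of Python's built-in next() and the __next__(arg) signature are the PEP's proposal, not current Python; the Hinted and Legacy classes are illustrative.]

```python
def next(itr, arg=None):  # the PEP's proposed built-in (shadows Python's own)
    nxt = getattr(itr, "__next__", None)
    if nxt is not None:
        return nxt(arg)
    if arg is None:
        return itr.next()
    raise TypeError("next() with arg for old-style iterator")

class Hinted:             # new-style iterator: __next__ accepts a hint
    def __next__(self, arg=None):
        return "default" if arg is None else arg

class Legacy:             # old-style iterator: next() only
    def next(self):
        return "legacy"

assert next(Hinted()) == "default"     # no hint: old behaviour
assert next(Hinted(), "hint") == "hint"  # hint is passed through
assert next(Legacy()) == "legacy"      # falls back to the old protocol
```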

Specification: a Change to the 'for' Loop

A small change in the translation of the for-loop is proposed.
The statement

    for VAR1 in EXPR1:
        BLOCK1
    else:
        BLOCK2

will be translated as follows:

    itr = iter(EXPR1)
    arg = None      # Set by "continue EXPR2", see below
    brk = False
    while True:
        try:
            VAR1 = next(itr, arg)
        except StopIteration:
            brk = True
            break
        arg = None
        BLOCK1
    if brk:
        BLOCK2

(However, the variables 'itr' etc. are not user-visible and the
built-in names used cannot be overridden by the user.)

Specification: the Extended 'continue' Statement

In the translation of the for-loop, inside BLOCK1, the new syntax

    continue EXPR2

is legal and is translated into

    arg = EXPR2
    continue

(Where 'arg' references the corresponding hidden variable from the
previous section.)

This is also the case in the body of the block-statement proposed
below.

EXPR2 may contain commas; "continue 1, 2, 3" is equivalent to
"continue (1, 2, 3)".

Specification: Generators and Yield-Expressions

Generators will implement the new __next__() method API, as well
as the old argument-less next() method which becomes an alias for
calling __next__() without an argument.

The yield-statement will be allowed to be used on the right-hand
side of an assignment; in that case it is referred to as
yield-expression.  The value of this yield-expression is None
unless __next__() was called with an argument; see below.

A yield-expression must always be parenthesized except when it
occurs at the top-level expression on the right-hand side of an
assignment.  So

    x = yield 42
    x = yield
    x = 12 + (yield 42)
    x = 12 + (yield)
    foo(yield 42)
    foo(yield)

are all legal, but

    x = 12 + yield 42
    x = 12 + yield
    foo(yield 42, 12)
    foo(yield, 12)

are all illegal.
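[Editorial note: these semantics eventually entered Python 2.5 via PEP 342, with send(value) playing the role of __next__(arg). The accumulator generator below is an illustrative example of the machinery described above, written against today's API rather than the draft's.]

```python
def accumulator():
    total = 0
    while True:
        value = yield total   # yield-expression: None unless a value was sent
        if value is not None:
            total += value

acc = accumulator()
assert next(acc) == 0      # the first advance cannot carry a value
assert acc.send(10) == 10  # send() is PEP 342's spelling of __next__(arg)
assert acc.send(5) == 15
assert next(acc) == 15     # plain next() delivers None into the generator
```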

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Nicholas Bastin

On May 6, 2005, at 3:17 AM, M.-A. Lemburg wrote:

> You've got that wrong: Python lets you choose UCS-4 -
> UCS-2 is the default.
>
> Note that Python's Unicode codecs UTF-8 and UTF-16
> are surrogate aware and thus support non-BMP code points
> regardless of the build type: A UCS2-build of Python will
> store a non-BMP code point as UTF-16 surrogate pair in the
> Py_UNICODE buffer while a UCS4 build will store it as a
> single value. Decoding is surrogate aware too, so a UTF-16
> surrogate pair in a UCS2 build will get treated as single
> Unicode code point.

If this is the case, then we're clearly misleading users.  If the 
configure script says UCS-2, then as a user I would assume that 
surrogate pairs would *not* be encoded, because I chose UCS-2, and it 
doesn't support that.  I would assume that any UTF-16 string I would 
read would be transcoded into the internal type (UCS-2), and 
information would be lost.  If this is not the case, then what does the 
configure option mean?

--
Nick



Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Nicholas Bastin

On May 6, 2005, at 3:25 AM, M.-A. Lemburg wrote:

> I don't see why you shouldn't use Py_UNICODE buffer directly.
> After all, the reason why we have that typedef is to make it
> possible to program against an abstract type - regardless of
> its size on the given platform.

Because the encoding of that buffer appears to be different depending 
on the configure options.  If that isn't true, then someone needs to 
change the doc, and the configure options.  Right now, it seems *very* 
clear that Py_UNICODE may either be UCS-2 or UCS-4 encoded if you read 
the configure help, and you can't use the buffer directly if the 
encoding is variable.  However, you seem to be saying that this isn't 
true.

--
Nick



Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Nicholas Bastin

On May 6, 2005, at 3:17 AM, M.-A. Lemburg wrote:

> You've got that wrong: Python lets you choose UCS-4 -
> UCS-2 is the default.

No, that's not true.  Python lets you choose UCS-4 or UCS-2.  What the 
default is depends on your platform.  If you run raw configure, some 
systems will choose UCS-4, and some will choose UCS-2.  This is how the 
conversation came about in the first place - running ./configure on 
RHL9 gives you UCS-4.

--
Nick



Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally

2005-05-06 Thread François Pinard
[Guido van Rossum]

> I like the solution that puts a bare "raise" at the top of the except
> clause.

Yes.  Clean and simple enough.  Thanks all! :-)

-- 
François Pinard   http://pinard.progiciels-bpi.ca


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread James Y Knight
On May 6, 2005, at 2:49 PM, Nicholas Bastin wrote:
> If this is the case, then we're clearly misleading users.  If the
> configure script says UCS-2, then as a user I would assume that
> surrogate pairs would *not* be encoded, because I chose UCS-2, and it
> doesn't support that.  I would assume that any UTF-16 string I would
> read would be transcoded into the internal type (UCS-2), and
> information would be lost.  If this is not the case, then what does the
> configure option mean?

It means all the string operations treat strings as if they were UCS-2, 
but that in actuality, they are UTF-16. Same as the case in the windows 
APIs and Java. That is, all string operations are essentially broken, 
because they're operating on encoded bytes, not characters, but claim 
to be operating on characters.
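[Editorial note: the surrogate-pair mechanics can be seen from the codec side in any Python. On modern Python 3 the internal representation is flexible (PEP 393), so this only illustrates the UTF-16 encoding, not a narrow build; the chosen character is just an example of a non-BMP code point.]

```python
ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

utf16 = ch.encode("utf-16-be")
assert len(utf16) == 4  # one code point becomes two 16-bit code units

hi = int.from_bytes(utf16[:2], "big")
lo = int.from_bytes(utf16[2:], "big")
assert 0xD800 <= hi <= 0xDBFF  # high surrogate
assert 0xDC00 <= lo <= 0xDFFF  # low surrogate

# The codec is surrogate aware: decoding restores a single code point.
assert utf16.decode("utf-16-be") == ch
assert len(ch) == 1  # on a wide (or modern PEP 393) build
```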

James



Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Paul Moore
On 5/6/05, Steven Bethard <[EMAIL PROTECTED]> wrote:
> Well, busy-work or not, I took the 20 minutes to split them up, so I
> figured I might as well make them available.  It was actually really
> easy to split them apart, and I think they both read better this way,
> but I'm not sure my opinion counts for much here anyway. ;-)  (The
> Enhanced Iterators PEP is first, the remainder of PEP 340 follows it.)

Thanks for doing this. I think you may well be right - the two pieces
feel more orthogonal like this (I haven't checked for dependencies,
I'm trusting your editing and Guido's original assertion that the
parts are independent).

> --
> PEP: XXX
> Title: Enhanced Iterators

Strawman question - as this is the "uncontroversial" bit, can this
part be accepted as it stands? :-)

Paul.


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread M.-A. Lemburg
Nicholas Bastin wrote:
> On May 6, 2005, at 3:17 AM, M.-A. Lemburg wrote:
> 
> 
>>You've got that wrong: Python lets you choose UCS-4 -
>>UCS-2 is the default.
> 
> 
> No, that's not true.  Python lets you choose UCS-4 or UCS-2.  What the 
> default is depends on your platform.  If you run raw configure, some 
> systems will choose UCS-4, and some will choose UCS-2.  This is how the 
> conversation came about in the first place - running ./configure on 
> RHL9 gives you UCS-4.

Hmm, looking at the configure.in script, it seems you're right.
I wonder why this weird dependency on TCL was added. This was
certainly not intended (see the comment):

if test $enable_unicode = yes
then
  # Without any arguments, Py_UNICODE defaults to two-byte mode
  case "$have_ucs4_tcl" in
  yes) enable_unicode="ucs4"
       ;;
  *)   enable_unicode="ucs2"
       ;;
  esac
fi

The annotation suggests that Martin added this.

Martin, could you please explain why the whole *Python system*
should depend on what Unicode type some installed *TCL system*
is using ? I fail to see the connection.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 06 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Steven Bethard
On 5/6/05, Paul Moore <[EMAIL PROTECTED]> wrote:
> On 5/6/05, Steven Bethard <[EMAIL PROTECTED]> wrote:
> > PEP: XXX
> > Title: Enhanced Iterators
> 
> Strawman question - as this is the "uncontroversial" bit, can this
> part be accepted as it stands? :-)

FWIW, I'm +1 on this.  Enhanced Iterators
 * updates the iterator protocol to use .__next__() instead of .next()
 * introduces a new builtin next()
 * allows continue-statements to pass values to iterators
 * allows generators to receive values with a yield-expression
The first two are, I believe, how the iterator protocol probably
should have been in the first place.  The second two provide a simple
way of passing values to generators, something I got the impression
that the co-routiney people would like a lot.

STeVe
-- 
You can wordify anything if you just verb it.
--- Bucky Katt, Get Fuzzy


Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally

2005-05-06 Thread Eric Nieuwland
Guido van Rossum wrote:
> try_stmt: 'try' ':' suite
>           ( (except_clause ':' suite)+
>             ['else' ':' suite] ['finally' ':' suite]
>           |
>             'finally' ':' suite
>           )
>
> There is no real complexity in this grammar, it's unambiguous, it's an
> easy enough job for the code generator, and it catches a certain class
> of mistakes (like mis-indenting some code).

Fair enough. Always nice to have some assistence from the system.

--eric



[Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Jim Jewett
Enhanced Iterators: 

...
> When the *initial* call to __next__() receives an argument 
> that is not None, TypeError is raised; this is likely caused
> by some logic error. 

This made sense when the (Block) Iterators were Resources,
and the first __next__() was just to trigger the setup.

It makes less sense for general iterators.

It is true that the first call in a generic for-loop couldn't 
pass a value (as it isn't continued), but I don't see anything
wrong with explicit calls to __next__.

Example:  An agent which responds to the environment;
the agent can execute multi-stage plans, or change its mind 
part way through.  

   action = scheduler.__next__(current_sensory_input)

-jJ


Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Guido van Rossum
On 5/6/05, Steven Bethard <[EMAIL PROTECTED]> wrote:
> On 5/6/05, Paul Moore <[EMAIL PROTECTED]> wrote:
> > On 5/6/05, Steven Bethard <[EMAIL PROTECTED]> wrote:
> > > PEP: XXX
> > > Title: Enhanced Iterators
> >
> > Strawman question - as this is the "uncontroversial" bit, can this
> > part be accepted as it stands? :-)
> 
> FWIW, I'm +1 on this.  Enhanced Iterators
>  * updates the iterator protocol to use .__next__() instead of .next()
>  * introduces a new builtin next()
>  * allows continue-statements to pass values to iterators
>  * allows generators to receive values with a yield-expression
> The first two are, I believe, how the iterator protocol probably
> should have been in the first place.  The second two provide a simple
> way of passing values to generators, something I got the impression
> that the co-routiney people would like a lot.

At the same time it pretty much affects *only* the co-routiney people,
so there's no hurry. I'd be happy with PEP 340 without all this too. I
think one reason it ended up in that PEP is that an earlier version of
the PEP called __next__() with an exception argument instead of having
a separate __exit__() API.

There's one alternative possible (still orthogonal to PEP 340):
instead of __next__(), we could add an optional argument to the next()
method, and forget about the next() built-in. This is more compatible
(if less future-proof). Old iterators would raise an exception when
their next() is called with an argument, and this would be a
reasonable way to find out that you're using "continue EXPR" with an
iterator that doesn't support it. (The C level API would be a bit
hairier but it can all be done in a compatible way.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] New Py_UNICODE doc (Another Attempt)

2005-05-06 Thread Nicholas Bastin
After reading through the code and the comments in this thread, I 
propose the following in the documentation as the definition of 
Py_UNICODE:

"This type represents the storage type which is used by Python 
internally as the basis for holding Unicode ordinals.  Extension module 
developers should make no assumptions about the size or native encoding 
of this type on any given platform."

The main point here is that extension developers cannot safely make
assumptions about the size of Py_UNICODE (which it appeared they could
when the documentation stated that it was always 16 bits).

I don't propose that we put this information in the doc, but the 
possible internal representations are:

2-byte wchar_t or unsigned short encoded as UTF-16
4-byte wchar_t encoded as UTF-32 (UCS-4)

If you do not explicitly set the configure option, you cannot guarantee 
which you will get.  Python also does not normalize the byte order of 
unicode strings passed into it from C (via PyUnicode_EncodeUTF16, for 
example), so it is possible to have UTF-16LE and UTF-16BE strings in 
the system at the same time, which is a bit confusing.  This may or may 
not be worth a mention in the doc (or a patch).
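From Python code, one way for developers to discover which build they are dealing with is `sys.maxunicode` — a sketch, noting that since Python 3.3 (PEP 393) the value is always 0x10FFFF:

```python
import sys

# On a narrow (UCS-2/UTF-16) build of Python 2.x, sys.maxunicode is 0xFFFF;
# on a wide (UCS-4) build it is 0x10FFFF.  Python 3.3+ always reports 0x10FFFF.
if sys.maxunicode == 0xFFFF:
    build = "narrow build: non-BMP code points stored as surrogate pairs"
else:
    build = "wide build: one storage unit per code point"
print(build)
```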

--
Nick



Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Nicholas Bastin

On May 6, 2005, at 3:42 PM, James Y Knight wrote:

> On May 6, 2005, at 2:49 PM, Nicholas Bastin wrote:
>> If this is the case, then we're clearly misleading users.  If the
>> configure script says UCS-2, then as a user I would assume that
>> surrogate pairs would *not* be encoded, because I chose UCS-2, and it
>> doesn't support that.  I would assume that any UTF-16 string I would
>> read would be transcoded into the internal type (UCS-2), and
>> information would be lost.  If this is not the case, then what does 
>> the
>> configure option mean?
>
> It means all the string operations treat strings as if they were 
> UCS-2, but that in actuality, they are UTF-16. Same as the case in the 
> windows APIs and Java. That is, all string operations are essentially 
> broken, because they're operating on encoded bytes, not characters, 
> but claim to be operating on characters.

Well, this is a completely separate issue/problem. The internal 
representation is UTF-16, and should be stated as such.  If the 
built-in methods actually don't work with surrogate pairs, then that 
should be fixed.

--
Nick



Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Guido van Rossum
> Enhanced Iterators:
> 
> ...
> > When the *initial* call to __next__() receives an argument
> > that is not None, TypeError is raised; this is likely caused
> > by some logic error.

[Jim Jewett]
> This made sense when the (Block) Iterators were Resources,
> and the first __next__() was just to trigger the setup.
> 
> It makes less sense for general iterators.
> 
> It is true that the first call in a generic for-loop couldn't
> pass a value (as it isn't continued), but I don't see anything
> wrong with explicit calls to __next__.
> 
> Example:  An agent which responds to the environment;
> the agent can execute multi-stage plans, or change its mind
> part way through.
> 
>    action = scheduler.__next__(current_sensory_input)

Good point. I'd be happy if the requirement that the first __next__()
call doesn't have an argument (or that it's None) only applies to
generators, and not to iterators in general.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Shane Hathaway
Nicholas Bastin wrote:
> On May 6, 2005, at 3:42 PM, James Y Knight wrote:
>>It means all the string operations treat strings as if they were 
>>UCS-2, but that in actuality, they are UTF-16. Same as the case in the 
>>windows APIs and Java. That is, all string operations are essentially 
>>broken, because they're operating on encoded bytes, not characters, 
>>but claim to be operating on characters.
> 
> 
> Well, this is a completely separate issue/problem. The internal 
> representation is UTF-16, and should be stated as such.  If the 
> built-in methods actually don't work with surrogate pairs, then that 
> should be fixed.

Wait... are you saying a Py_UNICODE array contains either UTF-16 or
UTF-32 characters, but never UCS-2?  That's a big surprise to me.  I may
need to change my PyXPCOM patch to fit this new understanding.  I tried
hard to not care how Python encodes unicode characters, but details like
this are important when combining two frameworks with different unicode
APIs.

Shane


Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally

2005-05-06 Thread Fredrik Lundh
Guido van Rossum wrote:

> > (to save typing, you can use an empty string or even
> > put quotes around the exception name, but that may
> > make it harder to spot the change)
>
> Yeah, but that will stop working in Python 3.0.

well, I tend to remove my debugging hacks once I've fixed
the bug.  I definitely don't expect them to be compatible with
hypothetical future releases...







Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Phillip J. Eby
At 01:18 PM 5/6/2005 -0700, Guido van Rossum wrote:
>There's one alternative possible (still orthogonal to PEP 340):
>instead of __next__(), we could add an optional argument to the next()
>method, and forget about the next() built-in. This is more compatible
>(if less future-proof). Old iterators would raise an exception when
>their next() is called with an argument, and this would be a
>reasonable way to find out that you're using "continue EXPR" with an
>iterator that doesn't support it. (The C level API would be a bit
>hairier but it can all be done in a compatible way.)

+1.



Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Ka-Ping Yee
On Fri, 6 May 2005, Guido van Rossum wrote:
> There's one alternative possible (still orthogonal to PEP 340):
> instead of __next__(), we could add an optional argument to the next()
> method, and forget about the next() built-in.

I prefer your original proposal.  I think this is a good time to switch
to next().  If we are going to change the protocol, let's do it right.


-- ?!ng


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Nicholas Bastin

On May 6, 2005, at 5:21 PM, Shane Hathaway wrote:

> Nicholas Bastin wrote:
>> On May 6, 2005, at 3:42 PM, James Y Knight wrote:
>>> It means all the string operations treat strings as if they were
>>> UCS-2, but that in actuality, they are UTF-16. Same as the case in 
>>> the
>>> windows APIs and Java. That is, all string operations are essentially
>>> broken, because they're operating on encoded bytes, not characters,
>>> but claim to be operating on characters.
>>
>>
>> Well, this is a completely separate issue/problem. The internal
>> representation is UTF-16, and should be stated as such.  If the
>> built-in methods actually don't work with surrogate pairs, then that
>> should be fixed.
>
> Wait... are you saying a Py_UNICODE array contains either UTF-16 or
> UTF-32 characters, but never UCS-2?  That's a big surprise to me.  I 
> may
> need to change my PyXPCOM patch to fit this new understanding.  I tried
> hard to not care how Python encodes unicode characters, but details 
> like
> this are important when combining two frameworks with different unicode
> APIs.

Yes.  Well, in as much as a large part of UTF-16 directly overlaps 
UCS-2, then sometimes unicode strings contain UCS-2 characters.  
However, characters which would not be legal in UCS-2 are still encoded 
properly in python, in UTF-16.
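The surrogate-pair encoding Nick describes is simple arithmetic; this sketch (illustration only, not CPython source) shows how a non-BMP code point is split into the two 16-bit units a "UCS-2" build actually stores:

```python
def to_surrogate_pair(cp):
    """Split a non-BMP code point into a UTF-16 surrogate pair."""
    assert 0xFFFF < cp <= 0x10FFFF
    cp -= 0x10000
    high = 0xD800 + (cp >> 10)     # lead (high) surrogate
    low = 0xDC00 + (cp & 0x3FF)    # trail (low) surrogate
    return high, low

# U+10000 is the first non-BMP code point
print([hex(u) for u in to_surrogate_pair(0x10000)])  # ['0xd800', '0xdc00']
```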

And yes, I feel your pain, that's how I *got* into this position.  
Mapping from external unicode types is an important aspect of writing 
extension modules, and the documentation does not help people trying to 
do this.  The fact that python's internal encoding is variable is a 
huge problem in and of itself, even if that was documented properly.  
This is why tools like Xerces and ICU will be happy to give you 
whatever form of unicode strings you want, but internally they always 
use UTF-16 - to avoid having to write two internal implementations of 
the same functionality.  If you look up and down 
Objects/unicodeobject.c you'll see a fair amount of code written a 
couple of different ways (using #ifdef's) because of the variability in 
the internal representation.

--
Nick



Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Shane Hathaway
Nicholas Bastin wrote:
> 
> On May 6, 2005, at 5:21 PM, Shane Hathaway wrote:
>> Wait... are you saying a Py_UNICODE array contains either UTF-16 or
>> UTF-32 characters, but never UCS-2?  That's a big surprise to me.  I may
>> need to change my PyXPCOM patch to fit this new understanding.  I tried
>> hard to not care how Python encodes unicode characters, but details like
>> this are important when combining two frameworks with different unicode
>> APIs.
> 
> 
> Yes.  Well, in as much as a large part of UTF-16 directly overlaps
> UCS-2, then sometimes unicode strings contain UCS-2 characters. 
> However, characters which would not be legal in UCS-2 are still encoded
> properly in python, in UTF-16.
> 
> And yes, I feel your pain, that's how I *got* into this position. 
> Mapping from external unicode types is an important aspect of writing
> extension modules, and the documentation does not help people trying to
> do this.  The fact that python's internal encoding is variable is a huge
> problem in and of itself, even if that was documented properly.  This is
> why tools like Xerces and ICU will be happy to give you whatever form of
> unicode strings you want, but internally they always use UTF-16 - to
> avoid having to write two internal implementations of the same
> functionality.  If you look up and down Objects/unicodeobject.c you'll
> see a fair amount of code written a couple of different ways (using
> #ifdef's) because of the variability in the internal representation.

Ok.  Thanks for helping me understand where Python is WRT unicode.  I
can work around the issues (or maybe try to help solve them) now that I
know the current state of affairs.  If Python correctly handled UTF-16
strings internally, we wouldn't need the UCS-4 configuration switch,
would we?

Shane


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Martin v. Löwis
Nicholas Bastin wrote:
> The important piece of information is that it is not guaranteed to be a
> particular one of those sizes.  Once you can't guarantee the size, no
> one really cares what size it is.

Please trust many years of experience: This is just not true. People
do care, and they want to know. If we tell them "it depends", they
ask "how can I find out".

> The documentation should discourage
> developers from attempting to manipulate Py_UNICODE directly, which,
> other than trivia, is the only reason why someone would care what size
> the internal representation is.

Why is that? Of *course* people will have to manipulate Py_UNICODE*
buffers directly. What else can they use?

Regards,
Martin


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Martin v. Löwis
Nicholas Bastin wrote:
> I'm not sure the Python documentation is the place to teach someone
> about unicode.  The ISO 10646 pretty clearly defines UCS-2 as only
> containing characters in the BMP (plane zero).  On the other hand, I
> don't know why python lets you choose UCS-2 anyhow, since it's almost
> always not what you want.

It certainly is, in most cases. On Windows, it is the only way to
get reasonable interoperability with the platform's WCHAR (i.e.
just cast a Py_UNICODE* into a WCHAR*).

To a limited degree, in UCS-2 mode, Python has support for surrogate
characters (e.g. in UTF-8 codec), so it is not "pure" UCS-2, but
this is a minor issue.

Regards,
Martin


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Bob Ippolito
On May 6, 2005, at 7:05 PM, Shane Hathaway wrote:

> Nicholas Bastin wrote:
>
>> On May 6, 2005, at 5:21 PM, Shane Hathaway wrote:
>>
>>> Wait... are you saying a Py_UNICODE array contains either UTF-16 or
>>> UTF-32 characters, but never UCS-2?  That's a big surprise to  
>>> me.  I may
>>> need to change my PyXPCOM patch to fit this new understanding.  I  
>>> tried
>>> hard to not care how Python encodes unicode characters, but  
>>> details like
>>> this are important when combining two frameworks with different  
>>> unicode
>>> APIs.
>>
>> Yes.  Well, in as much as a large part of UTF-16 directly overlaps
>> UCS-2, then sometimes unicode strings contain UCS-2 characters.
>> However, characters which would not be legal in UCS-2 are still  
>> encoded
>> properly in python, in UTF-16.
>>
>> And yes, I feel your pain, that's how I *got* into this position.
>> Mapping from external unicode types is an important aspect of writing
>> extension modules, and the documentation does not help people  
>> trying to
>> do this.  The fact that python's internal encoding is variable is  
>> a huge
>> problem in and of itself, even if that was documented properly.   
>> This is
>> why tools like Xerces and ICU will be happy to give you whatever  
>> form of
>> unicode strings you want, but internally they always use UTF-16 - to
>> avoid having to write two internal implementations of the same
>> functionality.  If you look up and down Objects/unicodeobject.c  
>> you'll
>> see a fair amount of code written a couple of different ways (using
>> #ifdef's) because of the variability in the internal representation.
>>
>
> Ok.  Thanks for helping me understand where Python is WRT unicode.  I
> can work around the issues (or maybe try to help solve them) now  
> that I
> know the current state of affairs.  If Python correctly handled UTF-16
> strings internally, we wouldn't need the UCS-4 configuration switch,
> would we?

Personally I would rather see Python (3000) grow a new way to  
represent strings, more along the lines of the way it's typically  
done in Objective-C.  I wrote a little bit about how that works here:

http://bob.pythonmac.org/archives/2005/04/04/pyobjc-and-unicode/

Effectively, instead of having One And Only One Way To Store Text,  
you would have one and only one base class (say basestring) that has  
some "virtual" methods that know how to deal with text.  Then, you  
have several concrete implementations that implements those functions  
for its particular backing store (and possibly encoding, but that  
might be implicit with the backing store.. i.e. if its an ASCII,  
UCS-2 or UCS-4 backing store).  Currently we more or less have this  
at the Python level, between str and unicode, but certainly not at  
the C API.
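Bob's idea can be sketched roughly as follows — one abstract text type with "virtual" methods, plus a concrete class per backing store. All names here are invented for illustration; this is not a real Python API:

```python
class TextBase(object):
    """Abstract text type: subclasses supply the backing store."""
    def length(self):
        raise NotImplementedError
    def code_point_at(self, i):
        raise NotImplementedError

class AsciiText(TextBase):
    """Backing store: one byte per code point."""
    def __init__(self, data):
        self._data = data
    def length(self):
        return len(self._data)
    def code_point_at(self, i):
        return ord(self._data[i:i+1])

class Ucs4Text(TextBase):
    """Backing store: a sequence of full code points."""
    def __init__(self, points):
        self._points = list(points)
    def length(self):
        return len(self._points)
    def code_point_at(self, i):
        return self._points[i]

# Callers see one interface regardless of the backing store:
for t in (AsciiText(b"hi"), Ucs4Text([0x68, 0x69])):
    assert t.length() == 2 and t.code_point_at(0) == 0x68
```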

-bob



Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Martin v. Löwis
Shane Hathaway wrote:
> Then something in the Python docs ought to say why UCS-2 is not what you
> want.  I still don't know; I've heard differing opinions on the subject.
>  Some say you'll never need more than what UCS-2 provides.  Is that
> incorrect?

That clearly depends on who "you" is.

> More generally, how should a non-unicode-expert writing Python extension
> code find out the minimum they need to know about unicode to use the
> Python unicode API?  The API reference [1] ought to at least have a list
> of background links.  I had to hunt everywhere.

That, of course, depends on what your background is. Did you know what
Latin-1 is, when you started? How it relates to code page 1252? What
UTF-8 is? What an abstract character is, as opposed to a byte sequence
on the one hand, and to a glyph on the other hand?

Different people need different background, especially if they are
writing different applications.

Regards,
Martin


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Martin v. Löwis
Nicholas Bastin wrote:
> If this is the case, then we're clearly misleading users.  If the
> configure script says UCS-2, then as a user I would assume that
> surrogate pairs would *not* be encoded, because I chose UCS-2, and it
> doesn't support that.

What do you mean by that? That the interpreter crashes if you try
to store a low surrogate into a Py_UNICODE?

> I would assume that any UTF-16 string I would
> read would be transcoded into the internal type (UCS-2), and information
> would be lost.  If this is not the case, then what does the configure
> option mean?

It tells you whether you have the two-octet form of the Universal
Character Set, or the four-octet form.

Regards,
Martin


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Martin v. Löwis
Nicholas Bastin wrote:
> Because the encoding of that buffer appears to be different depending on
> the configure options.

What makes it appear so? sizeof(Py_UNICODE) changes when you change
the option - does that, in your mind, mean that the encoding changes?

> If that isn't true, then someone needs to change
> the doc, and the configure options.  Right now, it seems *very* clear
> that Py_UNICODE may either be UCS-2 or UCS-4 encoded if you read the
> configure help, and you can't use the buffer directly if the encoding is
> variable.  However, you seem to be saying that this isn't true.

It's a compile-time option (as all configure options). So at run-time,
it isn't variable.

Regards,
Martin


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Martin v. Löwis
Nicholas Bastin wrote:
> No, that's not true.  Python lets you choose UCS-4 or UCS-2.  What the
> default is depends on your platform.

The truth is more complicated. If your Tcl is built for UCS-4, then
Python will also be built for UCS-4 (unless overridden by command line).
Otherwise, Python will default to UCS-2.

Regards,
Martin


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Martin v. Löwis
M.-A. Lemburg wrote:
> Hmm, looking at the configure.in script, it seems you're right.
> I wonder why this weird dependency on TCL was added.

If Python is configured for UCS-2, and Tcl for UCS-4, then
Tkinter would not work out of the box. Hence the weird dependency.

Regards,
Martin


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Nicholas Bastin

On May 6, 2005, at 7:43 PM, Martin v. Löwis wrote:

> Nicholas Bastin wrote:
>> If this is the case, then we're clearly misleading users.  If the
>> configure script says UCS-2, then as a user I would assume that
>> surrogate pairs would *not* be encoded, because I chose UCS-2, and it
>> doesn't support that.
>
> What do you mean by that? That the interpreter crashes if you try
> to store a low surrogate into a Py_UNICODE?

What I mean is pretty clear.  UCS-2 does *NOT* support surrogate pairs. 
  If it did, it would be called UTF-16.  If Python really supported 
UCS-2, then surrogate pairs from UTF-16 inputs would either get turned 
into two garbage characters, or the "I couldn't transcode this" UCS-2 
code point (I don't remember which one it is off the top of my head).
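For reference, the "I couldn't transcode this" code point Nick means is U+FFFD, the Unicode REPLACEMENT CHARACTER; a decoder configured to substitute rather than raise produces it, as this sketch shows:

```python
bad = b"\xff\xfe"   # 0xFF and 0xFE are never valid lead bytes in UTF-8
text = bad.decode("utf-8", "replace")
# the "replace" error handler emits one U+FFFD per undecodable byte
assert text == u"\ufffd\ufffd"
```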

>> I would assume that any UTF-16 string I would
>> read would be transcoded into the internal type (UCS-2), and 
>> information
>> would be lost.  If this is not the case, then what does the configure
>> option mean?
>
> It tells you whether you have the two-octet form of the Universal
> Character Set, or the four-octet form.

It would, if that were the case, but it's not.  Setting UCS-2 in the 
configure script really means UTF-16, and as such, the documentation 
should reflect that.

--
Nick



Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Nicholas Bastin

On May 6, 2005, at 7:45 PM, Martin v. Löwis wrote:

> Nicholas Bastin wrote:
>> Because the encoding of that buffer appears to be different depending 
>> on
>> the configure options.
>
> What makes it appear so? sizeof(Py_UNICODE) changes when you change
> the option - does that, in your mind, mean that the encoding changes?

Yes.  Not only in my mind, but in the Python source code.  If 
Py_UNICODE is 4 bytes wide, then the encoding is UTF-32 (UCS-4), 
otherwise the encoding is UTF-16 (*not* UCS-2).

>> If that isn't true, then someone needs to change
>> the doc, and the configure options.  Right now, it seems *very* clear
>> that Py_UNICODE may either be UCS-2 or UCS-4 encoded if you read the
>> configure help, and you can't use the buffer directly if the encoding 
>> is
>> variable.  However, you seem to be saying that this isn't true.
>
> It's a compile-time option (as all configure options). So at run-time,
> it isn't variable.

What I mean by 'variable' is that you can't make any assumption as to 
what the size will be in any given python when you're writing (and 
building) an extension module.  This breaks binary compatibility of 
extensions modules on the same platform and same version of python 
across interpreters which may have been built with different configure 
options.

--
Nick



Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Martin v. Löwis
Shane Hathaway wrote:
> Ok.  Thanks for helping me understand where Python is WRT unicode.  I
> can work around the issues (or maybe try to help solve them) now that I
> know the current state of affairs.  If Python correctly handled UTF-16
> strings internally, we wouldn't need the UCS-4 configuration switch,
> would we?

Define correctly. Python, in ucs2 mode, allows you to address individual
surrogate codes, e.g. in indexing. So you get

>>> u"\U00012345"[0]
u'\ud808'

This will never work "correctly", and never should, because an efficient
implementation isn't possible. If you want "safe" indexing and slicing,
you need ucs4.
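Martin's point — that narrow-build indexing addresses storage units, not characters — can be stated as a small helper. This is a sketch for illustration, not CPython code:

```python
def storage_units(cp, wide_build):
    """Py_UNICODE units needed to store one code point."""
    if wide_build or cp <= 0xFFFF:
        return 1
    return 2   # narrow builds store non-BMP points as surrogate pairs

# Martin's example, U+12345:
assert storage_units(0x12345, wide_build=True) == 1
assert storage_units(0x12345, wide_build=False) == 2
```

Because the per-character unit count varies on a narrow build, O(1) indexing by character is impossible there — which is exactly why "safe" indexing and slicing need ucs4.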

Regards,
Martin


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Martin v. Löwis
Nicholas Bastin wrote:
> What I mean is pretty clear.  UCS-2 does *NOT* support surrogate pairs. 
>   If it did, it would be called UTF-16.  If Python really supported 
> UCS-2, then surrogate pairs from UTF-16 inputs would either get turned 
> into two garbage characters, or the "I couldn't transcode this" UCS-2 
> code point (I don't remember which on that is off the top of my head).

OTOH, if Python really supported UTF-16, then unichr(0x10000) would
work, and len(u"\U00010000") would be 1.

It is primarily just the UTF-8 codec which supports UTF-16.

Regards,
Martin


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Martin v. Löwis
Nicholas Bastin wrote:
> Yes.  Not only in my mind, but in the Python source code.  If 
> Py_UNICODE is 4 bytes wide, then the encoding is UTF-32 (UCS-4), 
> otherwise the encoding is UTF-16 (*not* UCS-2).

I see. Some people equate "encoding" with "encoding scheme";
neither UTF-32 nor UTF-16 is an encoding scheme. You were
apparently talking about encoding forms.

> What I mean by 'variable' is that you can't make any assumption as to 
> what the size will be in any given python when you're writing (and 
> building) an extension module.  This breaks binary compatibility of 
> extensions modules on the same platform and same version of python 
> across interpreters which may have been built with different configure 
> options.

True. The breakage will be quite obvious, in most cases: the module
fails to load because not only sizeof(Py_UNICODE) changes, but also
the names of all symbols change.

Regards,
Martin


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Martin v. Löwis
Nicholas Bastin wrote:
> Well, this is a completely separate issue/problem. The internal 
> representation is UTF-16, and should be stated as such.  If the 
> built-in methods actually don't work with surrogate pairs, then that 
> should be fixed.

Yes to the former, no to the latter. PEP 261 specifies what should
and shouldn't work.

Regards,
Martin


[Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)

2005-05-06 Thread Nick Coghlan
PEP 340 contains several different ideas. This rewrite separates them into five 
major areas:
  - passing data into an iterator
  - finalising iterators
  - integrating finalisation into for loops
  - the new non-looping finalising statement
  - integrating all of these with generators.

The first area has nothing to do with finalisation, so it is not included in 
this rewrite (Steven Bethard wrote an Enhanced Iterators pre-PEP which covers 
only that area, though).

The whole PEP draft can be found here:
http://members.iinet.net.au/~ncoghlan/public/pep-3XX.html

But I've inlined some examples that differ from or aren't in PEP 340 for those 
that don't have time to read the whole thing (example numbers are from the PEP):

4. A template that tries something up to n times::

 def auto_retry(n=3, exc=Exception):
     for i in range(n):
         try:
             yield
         except exc, err:
             # perhaps log exception here
             yield
             raise # re-raise the exception we caught earlier

Used as follows::

 for del auto_retry(3, IOError):
     f = urllib.urlopen("http://python.org/")
     print f.read()

6. It is easy to write a regular class with the semantics of example 1::

 class locking:
     def __init__(self, lock):
         self.lock = lock
     def __enter__(self):
         self.lock.acquire()
     def __exit__(self, type, value=None, traceback=None):
         self.lock.release()
         if type is not None:
             raise type, value, traceback

(This example is easily modified to implement the other examples; it shows that 
generators are not always the simplest way to do things.)

8. Find the first file with a specific header::

 for name in filenames:
     stmt opening(name) as f:
         if f.read(2) == 0xFEB0: break

9. Find the first item you can handle, holding a lock for the entire loop, or 
just for each iteration::

 stmt locking(lock):
     for item in items:
         if handle(item): break

 for item in items:
     stmt locking(lock):
         if handle(item): break

10. Hold a lock while inside a generator, but release it when returning control 
to the outer scope::

 stmt locking(lock):
     for item in items:
         stmt unlocking(lock):
             yield item


Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com


Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Brett C.
Guido van Rossum wrote:
[SNIP]
> There's one alternative possible (still orthogonal to PEP 340):
> instead of __next__(), we could add an optional argument to the next()
> method, and forget about the next() built-in. This is more compatible
> (if less future-proof). Old iterators would raise an exception when
> their next() is called with an argument, and this would be a
> reasonable way to find out that you're using "continue EXPR" with an
> iterator that doesn't support it. (The C level API would be a bit
> hairier but it can all be done in a compatible way.)
> 

I prefer the original proposal.

-Brett


Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Nicholas Bastin

On May 6, 2005, at 8:25 PM, Martin v. Löwis wrote:

> Nicholas Bastin wrote:
>> Yes.  Not only in my mind, but in the Python source code.  If
>> Py_UNICODE is 4 bytes wide, then the encoding is UTF-32 (UCS-4),
>> otherwise the encoding is UTF-16 (*not* UCS-2).
>
> I see. Some people equate "encoding" with "encoding scheme";
> neither UTF-32 nor UTF-16 is an encoding scheme. You were

That's not true.  UTF-16 and UTF-32 are both CES and CEF (although this 
is not true of UTF-16LE and BE).  UTF-32 is a fixed-width encoding form 
covering the code space (0..10FFFF), and UTF-16 is a variable-width 
encoding form which uses one or two 16-bit code units to cover that 
same code space (0..10FFFF).  However, you are perhaps right to point 
out that people should be more explicit as to which they are referring 
to.  UCS-2, however, is only a CEF, and thus I thought it was obvious 
that I was referring to UTF-16 as a CEF.  I would point anyone who is 
confused at this point to Unicode Technical Report #17 on the Character 
Encoding Model, which is much more clear than trying to piece together 
the relevant parts out of the entire standard.

In any event, Python's use of the term UCS-2 is incorrect.  I quote 
from the TR:

"The UCS-2 encoding form, which is associated with ISO/IEC 10646 and 
can only express characters in the BMP, is a fixed-width encoding 
form."

immediately followed by:

"In contrast, UTF-16 uses either one or two code units and is able to 
cover the entire code space of Unicode."

If Python is capable of representing the entire code space of Unicode 
when you choose --enable-unicode=ucs2, then that is a bug.  Either it 
should not be called UCS-2, or the interpreter should be bound by the 
limitations of the UCS-2 CEF.
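(The difference is concrete: a non-BMP character needs two 16-bit code units 
under UTF-16, which UCS-2 by definition cannot express.  A small sketch in 
modern Python syntax, with U+10400 chosen arbitrarily as a non-BMP code point:)

```python
# U+10400 lies outside the BMP, so UTF-16 must encode it as a
# surrogate pair; UCS-2, being a fixed-width 16-bit form, simply
# has no representation for it.
ch = '\U00010400'
data = ch.encode('utf-16-be')

# High surrogate U+D801 followed by low surrogate U+DC00:
assert data == b'\xd8\x01\xdc\x00'
```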


>> What I mean by 'variable' is that you can't make any assumption as to
>> what the size will be in any given python when you're writing (and
>> building) an extension module.  This breaks binary compatibility of
>> extensions modules on the same platform and same version of python
>> across interpreters which may have been built with different configure
>> options.
>
> True. The breakage will be quite obvious, in most cases: the module
> fails to load because not only sizeof(Py_UNICODE) changes, but also
> the names of all symbols change.

Yes, but the important question here is why would we want that?  Why 
doesn't Python just have *one* internal representation of a Unicode 
character?  Having more than one possible definition just creates 
problems, and provides no value.
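(For what it's worth, an extension author can at least detect which build they 
are facing: at the C level via the Py_UNICODE_SIZE macro, and from Python via 
sys.maxunicode.  A minimal sketch of the latter:)

```python
import sys

# On a narrow ("UCS-2") build sys.maxunicode is 0xFFFF; on a wide
# (UCS-4) build it is 0x10FFFF.
if sys.maxunicode == 0xFFFF:
    build = 'narrow'
else:
    build = 'wide'
```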

--
Nick



Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread Nicholas Bastin

On May 6, 2005, at 8:11 PM, Martin v. Löwis wrote:

> Nicholas Bastin wrote:
>> Well, this is a completely separate issue/problem. The internal
>> representation is UTF-16, and should be stated as such.  If the
>> built-in methods actually don't work with surrogate pairs, then that
>> should be fixed.
>
> Yes to the former, no to the latter. PEP 261 specifies what should
> and shouldn't work.

This PEP has several textual errors and ambiguities (which, admittedly, 
may have been unavoidable given the state of the Unicode standard in 2001).  
However, putting that aside, I would recommend that:

--enable-unicode=ucs2

be replaced with:

--enable-unicode=utf16

and the docs be updated to reflect more accurately the variance of the 
internal storage type.

I would also like the community to strongly consider standardizing on a 
single internal representation, but I will leave that fight for another 
day.

--
Nick



Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-06 Thread Michele Simionato
On 5/6/05, Steven Bethard <[EMAIL PROTECTED]> wrote:
> FWIW, I'm +1 on this.  Enhanced Iterators
>  * updates the iterator protocol to use .__next__() instead of .next()
>  * introduces a new builtin next()
>  * allows continue-statements to pass values to iterators
>  * allows generators to receive values with a yield-expression
> The first two are, I believe, how the iterator protocol probably
> should have been in the first place.  The second two provide a simple
> way of passing values to generators, something I got the impression
> that the co-routiney people would like a lot.

Thank you for splitting the PEP. Conceptually, the "coroutine" part  
has nothing to do with blocks and stands on its own; it is right
to discuss it separately from the block syntax.

Personally, I do not see an urgent need for the block syntax (most of
the use cases can be managed with decorators) nor for the "coroutine"
syntax (you can already use Armin Rigo's greenlets for that).

Anyway, the idea of passing arguments to generators is pretty cool,
here is some code I have, adapted from Armin's presentation at the
ACCU conference:

from py.magic import greenlet

def yield_(*args):
    return greenlet.getcurrent().parent.switch(*args)

def send(key):
    return process_commands.switch(key)

@greenlet
def process_commands():
    while True:
        line = ''
        while not line.endswith('\n'):
            line += yield_()
        print line,
        if line == 'quit\n':
            print "are you sure?"
            if yield_() == 'y':
                break

process_commands.switch() # start the greenlet

send("h")
send("e")
send("l")
send("l")
send("o")
send("\n")

send("q")
send("u")
send("i")
send("t")
send("\n")
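(For comparison, here is a sketch of the same command processor written with 
the generator send() syntax under discussion, in the form it was later 
standardized; modern print syntax is used:)

```python
def process_commands():
    line = ''
    while True:
        key = yield            # receive one keystroke from the caller
        line += key
        if line.endswith('\n'):
            print(line, end='')
            if line == 'quit\n':
                print("are you sure?")
                if (yield) == 'y':
                    break      # terminates the generator
            line = ''

g = process_commands()
next(g)                        # advance to the first yield
for key in "hello\n":
    g.send(key)                # prints "hello" once the newline arrives
```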
  

Michele Simionato