date:20051003

Re: [Python-Dev] Tests and unicode

2005-10-03 Thread Reinhold Birkenfeld

Martin v. Löwis wrote:
> Reinhold Birkenfeld wrote:
>> One problem is that no Unicode escapes can be used since compiling
>> the file raises ValueErrors for them. Such strings would have to
>> be produced using unichr().
> 
> You mean, in Unicode literals? There are various approaches, depending
> on context:
> - you could encode the literals as UTF-8, and decode it when the
>   module/test case is imported. See test_support.TESTFN_UNICODE
>   for an example.
> - you could use unichr
> - you could use eval, see test_re for an example
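
The first two approaches can be sketched as follows. This is only an illustration in Python 3 spelling, which is an assumption made here: in the Python 2 of this thread the equivalent calls were str.decode('utf-8') and unichr(). The constant and function names are invented for the example.

```python
# Sketch only: Python 3 spelling of two of the approaches listed above.
# In Python 2 the equivalents were str.decode('utf-8') and unichr().

# Approach 1: keep the source file pure ASCII by storing the UTF-8 bytes
# of the text and decoding them once, when the module is imported.
E_ACUTE_FROM_BYTES = b"\xc3\xa9".decode("utf-8")

# Approach 2: build the character from its code point at run time.
E_ACUTE_FROM_CODE = chr(0xE9)


def same_text():
    """Both constructions yield the same one-character string."""
    return E_ACUTE_FROM_BYTES == E_ACUTE_FROM_CODE
```

Either way the source file contains no non-ASCII characters and no Unicode escapes, which is the property the test files need.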

Okay. I can fix this, but several library modules must be fixed too (mostly
simple fixes), e.g. pickletools, gettext, doctest or encodings.

>> Is this the right way? Or is disabling Unicode not supported any more?
> 
> There are certainly tests that cannot be executed when Unicode is not
> available. It would be good if such tests get skipped instead of
> failing, and it would be good if all tests that do not require Unicode
> support run even when Unicode support is missing.
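
A minimal sketch of "skip instead of fail", assuming the unittest skip decorators of later Python versions (they did not exist in 2005) and a hypothetical feature probe named HAVE_UNICODE_TYPE. On Python 3 the builtin `unicode` is absent, so the second test is skipped while the first still runs.

```python
import builtins
import unittest

# Hypothetical feature probe: does this interpreter have a builtin
# called `unicode`? (True on a normal Python 2 build, False on Python 3.)
HAVE_UNICODE_TYPE = hasattr(builtins, "unicode")


class JoinTests(unittest.TestCase):
    def test_plain_join(self):
        # Needs no optional feature, so it always runs.
        self.assertEqual(", ".join(["a", "b"]), "a, b")

    @unittest.skipUnless(HAVE_UNICODE_TYPE, "requires the unicode type")
    def test_unicode_join(self):
        # Only executed when the probe above succeeded.
        unicode_type = getattr(builtins, "unicode")
        self.assertEqual(unicode_type(", ").join(["a", "b"]), "a, b")


def run_suite():
    """Run JoinTests and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(JoinTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The skipped test is reported as skipped rather than as a failure, so the run as a whole still counts as successful.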

That's my approach too.

> Whether "it is supported" is a tricky question: your message indicates
> that, right now, it is *not* supported (or else you wouldn't have
> noticed a problem).

Well, the core builds without Unicode, and any code that doesn't use unicode
should run fine too. But the tests fail at the moment.

> Whether we think it should be supported depends
> on who "we" is, as with all these minor features: some think it is
> a waste of time, some think it should be supported if reasonably
> possible, and some think this a conditio sine qua non. It certainly
> isn't a release-critical feature.

Correct. I'll see if I have the time.

Reinhold

-- 
Mail address is perfectly valid!

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Michael Hudson

Martin Blais <[EMAIL PROTECTED]> writes:

> What if we could completely disable the implicit conversions between
> unicode and str?  In other words, if you would ALWAYS be forced to
> call either .encode() or .decode() to convert between one and the
> other... wouldn't that help a lot deal with that issue?

I don't know.  I've made one or two apps safe against this and it's
mostly just annoying.

> How hard would that be to implement?

import sys
reload(sys)
sys.setdefaultencoding('undefined')

> Would it break a lot of code?  Would some people want that?  (I know
> I would, at least for some of my code.)  It seems to me that this
> would make the code more explicit and force the programmer to become
> more aware of those conversions.  Any opinions welcome.

I'm not sure it's a sensible default.

Cheers,
mwh

-- 
  It is never worth a first class man's time to express a majority
  opinion.  By definition, there are plenty of others to do that.
-- G. H. Hardy

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread M.-A. Lemburg

Michael Hudson wrote:
> Martin Blais <[EMAIL PROTECTED]> writes:
> 
> 
>>What if we could completely disable the implicit conversions between
>>unicode and str?  In other words, if you would ALWAYS be forced to
>>call either .encode() or .decode() to convert between one and the
>>other... wouldn't that help a lot deal with that issue?
> 
> 
> I don't know.  I've made one or two apps safe against this and it's
> mostly just annoying.
>
>>How hard would that be to implement?
> 
> import sys
> reload(sys)
> sys.setdefaultencoding('undefined')

You shouldn't post tricks like these :-)

The correct way to change the default encoding is by
providing a sitecustomize.py module which then calls
sys.setdefaultencoding("undefined").

Note that the codec "undefined" was added for just this
reason.
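
What the "undefined" codec does can be seen directly, without touching the default encoding. The sketch below uses Python 3 syntax (an assumption; sys.setdefaultencoding itself is a Python 2 facility) and two helper names invented for the example.

```python
import codecs


def codec_is_registered():
    """Return True if the standard 'undefined' codec can be looked up."""
    return codecs.lookup("undefined").name == "undefined"


def conversion_is_blocked(text):
    """Return True if encoding `text` with the 'undefined' codec fails."""
    try:
        text.encode("undefined")
    except UnicodeError:
        # The codec raises on every use; that is its whole purpose.
        return True
    return False
```

With this codec installed as the default, any implicit str/unicode conversion would raise instead of silently guessing an encoding.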

>>Would it break a lot of code?  Would some people want that?  (I know
>>I would, at least for some of my code.)  It seems to me that this
>>would make the code more explicit and force the programmer to become
>>more aware of those conversions.  Any opinions welcome.
> 
> I'm not sure it's a sensible default.

Me neither, especially since this would make it impossible
to write polymorphic code - e.g. ', '.join(list) wouldn't
work anymore if list contains Unicode; ditto for u', '.join(list)
with list containing a string.
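
For illustration, this is roughly what the strict behaviour looks like in Python 3, where text (str) and bytes never mix implicitly. It is a sketch under that assumption rather than a description of the 2005 interpreter; strict_join and lenient_join are hypothetical helpers.

```python
def strict_join(sep, parts):
    """Join `parts` with `sep`; every item must already be text."""
    return sep.join(parts)


def lenient_join(sep, parts, charset="utf-8"):
    """Join after explicitly decoding any bytes items (hypothetical helper)."""
    return sep.join(
        p.decode(charset) if isinstance(p, bytes) else p for p in parts
    )
```

strict_join(", ", ["a", b"b"]) raises TypeError, while lenient_join makes the conversion explicit and succeeds.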

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 30 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 

Re: [Python-Dev] --disable-unicode (Tests and unicode)

2005-10-03 Thread M.-A. Lemburg

Reinhold Birkenfeld wrote:
> Martin v. Löwis wrote:
>>Whether we think it should be supported depends
>>on who "we" is, as with all these minor features: some think it is
>>a waste of time, some think it should be supported if reasonably
>>possible, and some think this a conditio sine qua non. It certainly
>>isn't a release-critical feature.
> 
> Correct. I'll see if I have the time.

Is the added complexity needed to support not having Unicode support
compiled into Python really worth it ?

I know that Martin introduced this feature a long time ago,
so he will have had a reason for it.

Today, I think the situation has changed: computers have more
memory, are faster and the need to integrate (e.g. via XML)
is stronger than ever - and maybe we should consider removing
the option to get a cleaner code base with fewer #ifdefs
and SyntaxErrors from the standard lib.

What do you think ?

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Antoine Pitrou

On Monday, 3 October 2005 at 02:09 -0400, Martin Blais wrote:
> 
> What if we could completely disable the implicit conversions between
> unicode and str?

This would be very annoying when dealing with some modules or libraries
where the type (str / unicode) returned by a function depends on the
context, build, or platform.

A good rule of thumb is to convert to unicode everything that is
semantically textual, and to only use str for what is to be semantically
treated as a string of bytes (network packets, identifiers...). This is
also, AFAIU, the semantic model which is favoured for a hypothetical
future version of Python.

This is what I'm using to do safe conversion to a given type without
worrying about the type of the argument:


DEFAULT_CHARSET = 'utf-8'

def safe_unicode(s, charset=None):
    """
    Forced conversion of a string to unicode, does nothing
    if the argument is already a unicode object.
    This function is useful because the .decode method
    on a unicode object, instead of being a no-op, tries to
    do a double conversion back and forth (which often fails
    because 'ascii' is the default codec).
    """
    if isinstance(s, str):
        return s.decode(charset or DEFAULT_CHARSET)
    else:
        return s

def safe_str(s, charset=None):
    """
    Forced conversion of a unicode object to str, does nothing
    if the argument is already a plain str object.
    This function is useful because the .encode method
    on a str object, instead of being a no-op, tries to
    do a double conversion back and forth (which often fails
    because 'ascii' is the default codec).
    """
    if isinstance(s, unicode):
        return s.encode(charset or DEFAULT_CHARSET)
    else:
        return s
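
For readers on Python 3, a rough equivalent of the two helpers above might look like this. It is a sketch under the assumption that str plays the role of unicode and bytes the role of the old 8-bit str; the names to_text and to_bytes are invented here.

```python
DEFAULT_CHARSET = "utf-8"


def to_text(value, charset=None):
    """Decode bytes to text; return text unchanged."""
    if isinstance(value, bytes):
        return value.decode(charset or DEFAULT_CHARSET)
    return value


def to_bytes(value, charset=None):
    """Encode text to bytes; return bytes unchanged."""
    if isinstance(value, str):
        return value.encode(charset or DEFAULT_CHARSET)
    return value
```

Both functions are idempotent, so they can be applied at an API boundary without knowing which type the caller passed in.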


Good luck

Antoine.




Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh

Antoine Pitrou wrote:

> A good rule of thumb is to convert to unicode everything that is
> semantically textual

and isn't pure ASCII.

(anyone who is tempted to argue otherwise should benchmark their
applications, both speed- and memorywise, and be prepared to come
up with very strong arguments for why python programs shouldn't be
allowed to be fast and memory-efficient whenever they can...)

 




Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Antoine Pitrou

On Monday, 3 October 2005 at 14:59 +0200, Fredrik Lundh wrote:
> Antoine Pitrou wrote:
> 
> > A good rule of thumb is to convert to unicode everything that is
> > semantically textual
> 
> and isn't pure ASCII.

How can you be sure that something that is /semantically textual/ will
always remain "pure ASCII" ? That's contradictory, unless your software
never goes out of the anglo-saxon world (and even...).

> (anyone who is tempted to argue otherwise should benchmark their
> applications, both speed- and memorywise, and be prepared to come
> up with very strong arguments for why python programs shouldn't be
> allowed to be fast and memory-efficient whenever they can...)

I think most applications don't critically depend on text processing
performance. OTOH, international adaptability is the kind of thing
that /will/ bite you one day if you don't prepare for it at the
beginning.

Also, if necessary, the distinction could be an implementation detail
and the conversion be transparent (like int vs. long): the text would be
coded in an 8-bit charset as long as possible and converted to a wide
encoding only when necessary. The important thing is that these
optimisations, if they are necessary, should be transparently handled by
the Python runtime.
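
The "narrow as long as possible, widen only when necessary" idea can be sketched in a few lines. This is purely illustrative and does not describe any actual runtime; narrowest_width and pack are hypothetical names, and the choice of latin-1, utf-16-le and utf-32-le as the three fixed-width forms is an assumption of the example.

```python
def narrowest_width(text):
    """Return the smallest number of bytes per character (1, 2 or 4)
    able to hold every code point in `text`."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1
    if highest < 0x10000:
        return 2
    return 4


def pack(text):
    """Encode `text` in the narrowest fixed-width form it fits in."""
    width = narrowest_width(text)
    codec = {1: "latin-1", 2: "utf-16-le", 4: "utf-32-le"}[width]
    return width, text.encode(codec)
```

Pure ASCII and Latin-1 text stays at one byte per character, and only strings that actually contain wider characters pay for the wider storage.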

(it seems to me - I may be mistaken - that modern Windows versions treat
every string as 16-bit unicode internally. Why are they doing it if it
is that inefficient?)

Regards

Antoine.


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Martin Blais

On 10/3/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> >
> > I'm not sure it's a sensible default.
>
> Me neither, especially since this would make it impossible
> to write polymorphic code - e.g. ', '.join(list) wouldn't
> work anymore if list contains Unicode; ditto for u', '.join(list)
> with list containing a string.

Sounds like what you want is exactly what I want to avoid (for those
two types anyway).

cheers,

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh

Antoine Pitrou wrote:

> > > A good rule of thumb is to convert to unicode everything that is
> > > semantically textual
> >
> > and isn't pure ASCII.
>
> How can you be sure that something that is /semantically textual/ will
> always remain "pure ASCII" ?

"is" != "will always remain"

 




Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Jim Fulton

Martin Blais wrote:
> Hi.
> 
> Like a lot of people (or so I hear in the blogosphere...), I've been
> experiencing some friction in my code with unicode conversion
> problems.  Even when being super extra careful with the types of str's
> or unicode objects that my variables can contain, there is always some
> case or oversight where something unexpected happens which results in
> a conversion which triggers a decode error.  str.join() of a list of
> strs, where one unicode object appears unexpectedly, and voila!
> exception galore.  Sometimes the problem shows up late because your
> test code doesn't always contain accented characters.  I'm sure many
> of you experienced that or some variant at some point.
> 
> I came to realize recently that this problem shares strong similarity
> with the problem of implicit type conversions in C++, or at least it
> feels the same:  Stuff just happens implicitly, and it's hard to track
> down where and when it happens by just looking at the code.  Part of
> the problem is that the unicode object acts a lot like a str, which is
> convenient, but...

I agree.  I think it was a mistake to implicitly convert mixed string
expressions to unicode.


> What if we could completely disable the implicit conversions between
> unicode and str?  In other words, if you would ALWAYS be forced to
> call either .encode() or .decode() to convert between one and the
> other... wouldn't that help a lot deal with that issue?

Perhaps.

> How hard would that be to implement? 

Not hard. We considered doing it for Zope 3, but ...

 > Would it break a lot of code?

Yes.

> Would some people want that? 

No, I wouldn't want lots of code to break. ;)

 > (I know I would, at least for some of my
> code.)  It seems to me that this would make the code more explicit and
> force the programmer to become more aware of those conversions.  Any
> opinions welcome.

I think it's too late to change this.  I wish it had been done
differently.  (OTOH, I'm very happy we have Unicode support, so
I'm not really complaining. :)

I'll note that this hasn't been that much of a problem for us in Zope.
We follow the strategy:

Antoine Pitrou wrote:
...
 > A good rule of thumb is to convert to unicode everything that is
 > semantically textual, and to only use str for what is to be semantically
 > treated as a string of bytes (network packets, identifiers...). This is
 > also, AFAIU, the semantic model which is favoured for a hypothetical
 > future version of Python.

This approach has worked pretty well for us.  Still, when there is a problem,
it's a real pain to debug because the error occurs too late, as you point
out.

Jim

-- 
Jim Fulton   mailto:[EMAIL PROTECTED]   Python Powered!
CTO  (540) 361-1714   http://www.python.org
Zope Corporation http://www.zope.com   http://www.zope.org

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Jim Fulton

M.-A. Lemburg wrote:
> Michael Hudson wrote:
> 
>>Martin Blais <[EMAIL PROTECTED]> writes:
>>
>>
>>
>>>What if we could completely disable the implicit conversions between
>>>unicode and str?  In other words, if you would ALWAYS be forced to
>>>call either .encode() or .decode() to convert between one and the
>>>other... wouldn't that help a lot deal with that issue?
>>
>>
>>I don't know.  I've made one or two apps safe against this and it's
>>mostly just annoying.
>>
>>
>>>How hard would that be to implement?
>>
>>import sys
>>reload(sys)
>>sys.setdefaultencoding('undefined')
> 
> 
> You shouldn't post tricks like these :-)
> 
> The correct way to change the default encoding is by
> providing a sitecustomize.py module which then calls
> sys.setdefaultencoding("undefined").

This is a much more evil trick IMO, as it affects all Python code,
rather than a single program.

I would argue that it's evil to change the default encoding
in the first place, except in this case to disable implicit
encoding or decoding.

Jim

-- 
Jim Fulton   mailto:[EMAIL PROTECTED]   Python Powered!
CTO  (540) 361-1714   http://www.python.org
Zope Corporation http://www.zope.com   http://www.zope.org

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh

Jim Fulton wrote:

> I would argue that it's evil to change the default encoding
> in the first place, except in this case to disable implicit
> encoding or decoding.

absolutely.  unfortunately, all attempts to add such information to the
sys module documentation seem to have failed...

(last time I tried, I seem to remember that someone argued that "it's
there, so it should be documented in a neutral fashion")

 




[Python-Dev] Proposal for 2.5: Returning values from PEP 342 enhanced generators

2005-10-03 Thread Piet Delport

PEP 255 ("Simple Generators") closes with:

> Q. Then why not allow an expression on "return" too?
>
> A. Perhaps we will someday.  In Icon, "return expr" means both "I'm
>done", and "but I have one final useful value to return too, and
>this is it".  At the start, and in the absence of compelling uses
>for "return expr", it's simply cleaner to use "yield" exclusively
>for delivering values.

Now that Python 2.5 has gained enhanced generators (multitudes rejoice!), I think
there is a compelling use for valued return statements in cooperative
multitasking code, of the kind:

    def foo():
        Data = yield Client.read()
        [...]
        MoreData = yield Client.read()
        [...]
        return FinalResult

    def bar():
        Result = yield foo()

For generators written in this style, "yield" means "suspend execution of the
current call until the requested result/resource can be provided", and
"return" regains its full conventional meaning of "terminate the current call
with a given result".

The simplest / most straightforward implementation would be for "return Foo"
to translate to "raise StopIteration, Foo". This is consistent with "return"
translating to "raise StopIteration", and does not break any existing
generator code.
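
The proposed semantics can be tried out on Python 3.3 and later, where a generator's return value is carried on the StopIteration exception as its .value attribute. The sketch below assumes that later behaviour; it was not available in Python 2.5, and the strings sent in are invented for the example.

```python
def foo():
    # Each yield suspends the call until the caller sends a result back.
    data = yield "first request"
    more_data = yield "second request"
    # A valued return ends the generator and carries the final result.
    return data + more_data


def drive():
    """Run foo() by hand and collect its requests and its final result."""
    gen = foo()
    requests = [next(gen)]              # run up to the first yield
    requests.append(gen.send("spam "))  # answer it, run to the second
    try:
        gen.send("eggs")                # answer it, generator returns
    except StopIteration as stop:
        return requests, stop.value
```

Here the intermediate yields are requests, and the value attached to StopIteration is the call's real result, which is the distinction the proposal is after.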

(Another way to think about this change is that if a plain StopIteration means
"the iterator terminated", then a valued StopIteration, by extension, means
"the iterator terminated with the given value".)

Motivation by real-world example:

One system that could benefit from this change is Christopher Armstrong's
defgen.py[1] for Twisted, which he recently reincarnated (as newdefgen.py) to
use enhanced generators. The resulting code is much cleaner than before, and
closer to the conventional synchronous style of writing.

[1] the saga of which is summarized here:
http://radix.twistedmatrix.com/archives/000114.html

However, because enhanced generators have no way to differentiate their
intermediate results from their "real" result, the current solution is a
somewhat confusing compromise: the last value yielded by the generator
implicitly becomes the result returned by the call. Thus, to return
something, in general, requires the idiom "yield Foo; return". If valued
returns are allowed, this would become "return Foo" (and the code implementing
defgen itself would probably end up simpler, as well).
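
To make the cooperative style concrete, here is a toy trampoline that drives nested generators and uses valued returns as call results. It is a sketch only, written for Python 3.3+ and unrelated to the actual defgen code; run, foo and bar are illustrative, and non-generator requests are simply echoed back as if the resource were immediately available.

```python
import types


def run(task):
    """Drive `task`, treating a yielded generator as a nested call."""
    stack = [task]
    value = None
    while stack:
        try:
            request = stack[-1].send(value)
        except StopIteration as stop:
            # The current call returned: hand its result to the caller.
            stack.pop()
            value = stop.value
            continue
        if isinstance(request, types.GeneratorType):
            # Nested call: start it (a fresh generator must be sent None).
            stack.append(request)
            value = None
        else:
            # Toy "resource": pretend the request is satisfied at once.
            value = request
    return value


def foo():
    a = yield 1
    b = yield 2
    return a + b


def bar():
    result = yield foo()
    return result * 10
```

With valued returns, bar() receives foo()'s result directly, with no need for the "yield Foo; return" idiom.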

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Josiah Carlson

Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> 
> On Monday, 3 October 2005 at 14:59 +0200, Fredrik Lundh wrote:
> > Antoine Pitrou wrote:
> > 
> > > A good rule of thumb is to convert to unicode everything that is
> > > semantically textual
> > 
> > and isn't pure ASCII.
> 
> How can you be sure that something that is /semantically textual/ will
> always remain "pure ASCII" ? That's contradictory, unless your software
> never goes out of the anglo-saxon world (and even...).

Non-unicode text input widgets.  Works great.  Can be had with the ANSI
wxPython installation.

> (it seems to me - I may be mistaken - that modern Windows versions treat
> every string as 16-bit unicode internally. Why are they doing it if it
> is that inefficient?)

Because modern Windows supports all sorts of symbols which are necessary
for certain special English uses (greek symbols for math, etc.), and
trying to have all of them without just using the unicode backend that
is used for all of the international "builds" (isn't it just a language
definition?) anyways, would be a waste of time/effort.

 - Josiah


[Python-Dev] PEP 343 and with

2005-10-03 Thread Jason Orendorff

I'm -1 on PEP 343.  It seems ...complex.  And even with all the
complexity, I *still* won't be able to type

with self.lock: ...

which I submit is perfectly reasonable, clean, and clear.  Instead I
have to type

with locking(self.lock): ...

where locking() is apparently either a new builtin, a standard library
function, or some 6-line contextmanager I have to write myself.
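
The short helper in question might look like the sketch below. It assumes the contextlib.contextmanager decorator as it eventually shipped; locking() is the hypothetical helper name used above, not a standard function.

```python
import threading
from contextlib import contextmanager


@contextmanager
def locking(lock):
    """Hold `lock` for the duration of a with-block."""
    lock.acquire()
    try:
        yield lock
    finally:
        # Released on normal exit and when the block raises.
        lock.release()


def demo():
    """Return the lock state inside and after a with-block."""
    lock = threading.Lock()
    with locking(lock):
        inside = lock.locked()
    return inside, lock.locked()
```

The try/finally around the yield is what guarantees the release even when the body of the with-block raises.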

So I have two suggestions.

1.  I didn't find any suggestion of a __with__() method in the
archives.  So I feel I should suggest it.  It would work just like
__iter__().

    class RLock:
        @contextmanager
        def __with__(self):
            self.acquire()
            try:
                yield
            finally:
                self.release()

__with__() always returns a new context manager object.  Just as with
iterators, a context manager object has "cm.__with__() is cm".

The 'with' statement would call __with__(), of course.

Optionally, the type constructor could magically apply @contextmanager
to __with__() if it's a generator, which is the usual case.  It looks
like it already does similar magic with __new__().  Perhaps this is
too cute though.

2.  More radical:  Let's get rid of __enter__() and __exit__().  The
only example in PEP 343 that uses them is Example 4, which exists only
to show that "there's more than one way to do it". It all seems fishy
to me.  Why not get rid of them and use only __with__()?  In this
scenario, Python would expect __with__() to return a coroutine (not to
say "iterator") that yields exactly once.

Then the "@contextmanager" decorator wouldn't be needed on __with__(),
and neither would any type constructor magic.

The only drawback I see is that context manager methods implemented in
C will work differently from those implemented in Python.  Since C
doesn't have coroutines, I imagine there would have to be enter() and
exit() slots.  Maybe this is a major design concern; I don't know.

My apologies if this is redundant or unwelcome at this date.

-j

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh

Josiah Carlson wrote:

> > > and isn't pure ASCII.
> >
> > How can you be sure that something that is /semantically textual/ will
> > always remain "pure ASCII" ? That's contradictory, unless your software
> > never goes out of the anglo-saxon world (and even...).
>
> Non-unicode text input widgets.  Works great.  Can be had with the ANSI
> wxPython installation.

You're both missing that Python is dynamically typed.  A single string source
doesn't have to return the same type of strings, as long as the objects it
returns are compatible with Python's string model and with each other.

Under the default encoding (and quite a few other encodings), that's true for
plain ascii strings and Unicode strings.  This is a good thing.


Re: [Python-Dev] PEP 343 and with

2005-10-03 Thread Phillip J. Eby

At 12:37 PM 10/3/2005 -0400, Jason Orendorff wrote:
>I'm -1 on PEP 343.  It seems ...complex.  And even with all the
>complexity, I *still* won't be able to type
>
> with self.lock: ...
>
>which I submit is perfectly reasonable, clean, and clear.

Which is why it's proposed to add __enter__/__exit__ to locks, and somewhat 
more controversially, file objects.  (Guido objected on the basis that 
people might reuse the file object, but reusing a closed file object 
results in a sensible error message and so doesn't seem like a problem to me.)

>[snip]
>__with__() always returns a new context manager object.  Just as with
>iterators, a context manager object has "cm.__with__() is cm".
>
>The 'with' statement would call __with__(), of course.

You didn't offer any reasons why this would be useful and/or good.

>2.  More radical:  Let's get rid of __enter__() and __exit__().  The
>only example in PEP 343 that uses them is Example 4, which exists only
>to show that "there's more than one way to do it". It all seems fishy
>to me.  Why not get rid of them and use only __with__()?  In this
>scenario, Python would expect __with__() to return a coroutine (not to
>say "iterator") that yields exactly once.

Because this multiplies the difficulty of implementing context managers in 
C.  It's easy to define a pair of C methods for __enter__ and __exit__, but 
an iterator requires creating another class in C.  The yield-based syntax 
is just syntax sugar, not the essence of the proposal.

>The only drawback I see is that context manager methods implemented in
>C will work differently from those implemented in Python.  Since C
>doesn't have coroutines, I imagine there would have to be enter() and
>exit() slots.  Maybe this is a major design concern; I don't know.

Considering your argument that locks should be contextmanagers, it would 
seem like a good idea for C implementations to be easy.  :)

>My apologies if this is redundant or unwelcome at this date.

Since the PEP is accepted and has patches for both its implementation and a 
good part of its documentation, a major change like this would certainly 
need a better rationale.  If your idea was that __with__ would somehow make 
it easier for locks to be context managers, it's based on a flawed 
premise.  All that's required now is to have __enter__ and __exit__ call 
acquire() and release().  At this point, it's simply an open issue as to 
which stdlib objects will be context managers, and which will have helper 
functions or classes to serve as context managers.  The actual API used to 
implement them has little or no bearing on that issue.
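
As a sketch of the point about __enter__ and __exit__, here is a small wrapper class whose two methods do nothing more than call acquire() and release(). ContextLock is a hypothetical name, and the three-argument __exit__ signature is assumed from the protocol as finally implemented.

```python
import threading


class ContextLock:
    """A lock usable directly in a with-statement."""

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self):
        return self._lock.acquire()

    def release(self):
        self._lock.release()

    def locked(self):
        return self._lock.locked()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.release()
        return False  # do not swallow exceptions from the block
```

With this in place "with self.lock: ..." works as written, and the same two methods are straightforward to provide from C.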


Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Antoine Pitrou


Hi,

Josiah:
> > How can you be sure that something that is /semantically textual/ will
> > always remain "pure ASCII" ? That's contradictory, unless your software
> > never goes out of the anglo-saxon world (and even...).
> 
> Non-unicode text input widgets.

You didn't understand my statement.
I didn't mean:
  - how can you /technically enforce/ no unicode text at all
but:
  - how can you be sure that your users will never /want/ to enter some
    text that can't be represented with the current 8-bit charset?

Of course the answer to the latter is: you can't.


Fredrik:
> Under the default encoding (and quite a few other encodings), that's true for
> plain ascii strings and Unicode strings.

If I have a unicode string containing legal characters greater than
0x7F, and I pass it to a function which converts it to str, the
conversion fails.

If I have an 8-bit string containing legal non-ascii characters in it
(for example the name of a file as returned by the filesystem, which I
of course have no prior control on), and I give it to a function which
does an implicit conversion to unicode, the conversion fails.

Here is an example so that you really understand. I am under a French
locale (iso-8859-15), let's just try to enter a French word and see what
happens when converting to unicode:

-> As a string constant:

>>> s = "été"
>>> s
'\xe9t\xe9'
>>> u = unicode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

-> By asking for input:

>>> s = raw_input()
été
>>> s
'\xe9t\xe9'
>>> unicode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)


It should work, but it fails miserably.
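
The failure comes from the implicit ASCII default, not from the bytes themselves: naming the charset explicitly decodes the same data without trouble. The sketch below uses Python 3 syntax (an assumption; the sessions above are Python 2), and decode_or_none is a helper invented for the example.

```python
RAW = b"\xe9t\xe9"  # the same three bytes as in the sessions above


def decode_or_none(data, charset):
    """Decode `data` with `charset`, or return None if that fails."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None
```

decode_or_none(RAW, "ascii") gives None, whereas decode_or_none(RAW, "iso-8859-15") gives the expected three-character word.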

In the current situation, if the programmer doesn't carefully plan for
these cases by manually managing conversions (which of course he can do
- but it's boring and bothersome - not to mention that many programmers
do not even understand the issue!), some users will see the program die
with a nasty exception, just because they happen to need a bit more than
the plain latin alphabet without diacritics.

(even the standard Python library is bitten: witness the weird
getcwd() / getcwdu() pair...)


I find it surprising that you claim there is no difficulty when
everything points to the contrary. See for example how often confused
developers ask for help on mailing-lists...

Regards

Antoine.



Re: [Python-Dev] PEP 343 and with

2005-10-03 Thread Michael Hudson

"Phillip J. Eby" <[EMAIL PROTECTED]> writes:

> Since the PEP is accepted and has patches for both its implementation and a 
> good part of its documentation, a major change like this would certainly 
> need a better rationale.

Though given the amount of interest said patch has attracted (none at
all) perhaps no one cares very much and the proposal should be dropped.
Which would be a shame given the time I spent on it and all the hot
air here on python-dev...

Cheers,
mwh
(who still likes PEP 343 and doesn't particularly like Jason's
suggested changes).

-- 
  Gevalia is undrinkable low-octane see-through only slightly
  roasted bilge water. Compared to .us coffee it is quite
  drinkable.  -- Måns Nilsson, asr

Re: [Python-Dev] PEP 343 and with

2005-10-03 Thread Guido van Rossum

For the record, I very much want PEPs 342 and 343 implemented. I
haven't had the time to look at the patch and don't expect to find the
time any time soon, but it's not for lack of desire to see this
feature implemented.

I don't like Jason's __with__ proposal and even less like his idea to
drop __enter__ and __exit__ (I think this would just make it harder to
provide efficient implementations in C).

I'm all for adding __enter__ and __exit__ to locks.

I'm even considering that it might be a good idea to add them to files.
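
For illustration, a minimal sketch of what `__enter__` and `__exit__` on locks and files look like in use, assuming a Python version in which both types support the `with` statement (later releases do). The names `counter_lock`, `bump` and `write_and_read` are invented for this example.

```python
import os
import tempfile
import threading

counter_lock = threading.Lock()  # hypothetical shared lock
counter = 0


def bump():
    """Increment the shared counter while holding the lock."""
    global counter
    with counter_lock:  # __enter__ acquires, __exit__ releases
        counter += 1
        return counter


def write_and_read(text):
    """Write text to a temporary file and read it back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:  # closed on block exit
            f.write(text)
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)
```

The lock is released and the files are closed even if the body of the block raises.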

For the record, here at Elemental we write a lot of Java code that
uses database connections in a pattern that would have greatly
benefited from a similar construct in Java. :)

--Guido

On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote:
> "Phillip J. Eby" <[EMAIL PROTECTED]> writes:
>
> > Since the PEP is accepted and has patches for both its implementation and a
> > good part of its documentation, a major change like this would certainly
> > need a better rationale.
>
> Though given the amount of interest said patch has attracted (none at
> all) perhaps no one cares very much and the proposal should be dropped.
> Which would be a shame given the time I spent on it and all the hot
> air here on python-dev...
>
> Cheers,
> mwh
> (who still likes PEP 343 and doesn't particularly like Jason's
> suggested changes).
>
> --
>   Gevalia is undrinkable low-octane see-through only slightly
>   roasted bilge water. Compared to .us coffee it is quite
>   drinkable.  -- Måns Nilsson, asr

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Fredrik Lundh

Antoine Pitrou wrote:

> > Under the default encoding (and quite a few other encodings), that's true
> > for plain ascii strings and Unicode strings.
>
> If I have a unicode string containing legal characters greater than
> 0x7F, and I pass it to a function which converts it to str, the
> conversion fails.

so?  if it does that, it's not unicode safe.  what does that have to do with
my argument (which is that you can safely mix ascii strings and unicode
strings, because that's how things were designed)?

> Here is an example so that you really understand.

I wrote the unicode type.  I do understand how it works.


Re: [Python-Dev] PEP 343 and with

2005-10-03 Thread Phillip J. Eby

At 07:02 PM 10/3/2005 +0100, Michael Hudson wrote:
>"Phillip J. Eby" <[EMAIL PROTECTED]> writes:
>
> > Since the PEP is accepted and has patches for both its implementation and a
> > good part of its documentation, a major change like this would certainly
> > need a better rationale.
>
>Though given the amount of interest said patch has attracted (none at
>all)

Actually, I have been reading the patch and meant to comment on it.  I was 
perplexed by the odd stack behavior of the new opcode until I realized that 
it's try/finally that's weird.  :)  I was planning to look into whether 
that could be cleaned up as well, when I got distracted and didn't go back 
to it.

>  perhaps no one cares very much and the proposal should be dropped.

I care an awful lot, as 'with' is another framework-dissolving tool that 
makes it possible to do more things in library form, without needing to 
resort to template methods.  It also enables more context-sensitive 
programming, in that "global" states can be set and restored in a 
structured fashion.  It may take a while to feel the effects, but it's 
going to be a big improvement to Python, maybe as big as new-style classes, 
and certainly bigger than decorators.


Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Antoine Pitrou


Hi,

Le lundi 03 octobre 2005 à 20:37 +0200, Fredrik Lundh a écrit :
> > If I have a unicode string containing legal characters greater than
> > 0x7F, and I pass it to a function which converts it to str, the
> > conversion fails.
> 
> so?  if it does that, it's not unicode safe.  
[...]
> what does that have to do with
> my argument (which is that you can safely mix ascii strings and unicode
> strings, because that's how things were designed)?

If that's how things were designed, then Python's entire standard
library (not to mention third-party libraries) is not "unicode safe" -
to quote your own words - since many functions may return 8-bit strings
containing non-ascii characters.

There lies the problem for many people, until the stdlib is fixed - or
until the string types are changed. That's why you very regularly see
people complaining about how conversions sometimes break their code in
various ways.

Anyway, I don't think we will reach an agreement here. We have different
expectations w.r.t. how the programming language may/should handle
general text. I propose we end the discussion.

Regards

Antoine.



Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Fredrik Lundh

Antoine Pitrou wrote:

> > > If I have a unicode string containing legal characters greater than
> > > 0x7F, and I pass it to a function which converts it to str, the
> > > conversion fails.
> >
> > so?  if it does that, it's not unicode safe.
> [...]
> > what does that have to do with
> > my argument (which is that you can safely mix ascii strings and unicode
> > strings, because that's how things were designed)?
>
> If that's how things were designed, then Python's entire standard
> library (not to mention third-party libraries) is not "unicode safe" -
> to quote your own words - since many functions may return 8-bit strings
> containing non-ascii characters.

huh?  first you talk about functions that convert unicode strings to 8-bit
strings, now you talk about functions that return raw 8-bit strings?  and
all this in response to a post that argues that it's in fact a good idea to
use plain strings to hold textual data that happens to contain ASCII only,
because 1) it works, by design, and 2) it's almost always more efficient.

if you don't know what your own argument is, you cannot expect anyone
to understand it.


Re: [Python-Dev] --disable-unicode (Tests and unicode)

2005-10-03 Thread Martin v. Löwis

M.-A. Lemburg wrote:
> Is the added complexity needed to support not having Unicode support
> compiled into Python really worth it ?

If there are volunteers willing to maintain it, and the other volunteers
are not affected: certainly.

> I know that Martin introduced this feature a long time ago,
> so he will have had a reason for it.

I added it because users requested it. I personally never use it.

> Today, I think the situation has changed: computers have more
> memory, are faster and the need to integrate (e.g. via XML)
> is stronger than ever - and maybe we should consider removing
> the option to get a cleaner code base with fewer #ifdefs
> and SyntaxErrors from the standard lib.
> 
> What do you think ?

-0 for just ripping it out. +0 if PEP 5 is followed, at least
in spirit (i.e. give users advance warning to let them protest).

I guess users in embedded builds (either in embedded systems,
or embedding Python into some other application) might still
be interested in the feature. Of course, these users could either
recreate the feature if we remove it, or just stay with
Python 2.4.

Regards,
Martin

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Antoine Pitrou


> > If that's how things were designed, then Python's entire standard
> > library (not to mention third-party libraries) is not "unicode safe" -
> > to quote your own words - since many functions may return 8-bit strings
> > containing non-ascii characters.
> 
> huh?  first you talk about functions that convert unicode strings to 8-bit
> strings, now you talk about functions that return raw 8-bit strings?

Are you deliberately missing the argument?
And can't you understand that conversions are problematic in both
directions (str -> unicode /and/ unicode -> str)?

If a stdlib function returns an 8-bit string containing non-ascii data,
then this string used in unicode context incurs an implicit conversion,
which fails. How's that for "unicode safety" of stdlib functions? Will
you argue that this gives no difficulties to anyone ?
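
The failure mode described here can be reproduced explicitly. The sketch below assumes Python 3 syntax, where the conversion has to be spelled out; decoding with the `ascii` codec stands in for the implicit str -> unicode conversion that Python 2 performs under its default encoding. The sample data and the helper names `to_text_ascii` and `to_text` are invented for illustration.

```python
# An 8-bit string holding non-ASCII data (UTF-8 encoded "café").
raw = "caf\u00e9".encode("utf-8")


def to_text_ascii(data):
    """Mimic the ascii-by-default conversion; return None when it fails."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return None


def to_text(data, encoding):
    """Explicit conversion with an encoding the caller knows."""
    return data.decode(encoding)
```

Pure-ASCII input passes through `to_text_ascii` unchanged, which is why the problem only shows up once non-ASCII data arrives.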


> all this in response to a post that argues that it's in fact a good idea to
> use plain strings to hold textual data that happens to contain ASCII only,

To which you apparently didn't read my answer, that is:
you can never be sure that a variable containing something which
is /semantically/ textual (*) will never contain anything other than
ASCII text. For example raw_input() won't tell you that its 8-bit string
result contains some chars > 0x7F. Same for many other library
functions. How do you cope with (more or less occasional) non-ascii data
coming in as 8-bit strings?

(*) that is, contains some natural language

Either you carefully plan for non-ascii text coming in your application
(including workarounds against Python's ascii-by-default conversion
policy), or you deliberately cripple your application by deciding that
non-ASCII text is forbidden in (some or all) places. Choose the latter
and you'll be hostile to users.

And this thread began with a poster who had difficulty with the way implicit
conversions happen in Python. So it's very funny that you deny the
existence of a problem for certain developers.


Antoine.



Re: [Python-Dev] --disable-unicode (Tests and unicode)

2005-10-03 Thread M.-A. Lemburg

Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
> 
>>Is the added complexity needed to support not having Unicode support
>>compiled into Python really worth it ?
> 
> If there are volunteers willing to maintain it, and the other volunteers
> are not affected: certainly.

No objections there. I only see that --disable-unicode
has already been broken a couple of times in the past
and no-one (except those running test suites regularly)
really noticed - at least not AFAIK.

>>I know that Martin introduced this feature a long time ago,
>>so he will have had a reason for it.
> 
> I added it because users requested it. I personally never use it.
> 
>>Today, I think the situation has changed: computers have more
>>memory, are faster and the need to integrate (e.g. via XML)
>>is stronger than ever - and maybe we should consider removing
>>the option to get a cleaner code base with fewer #ifdefs
>>and SyntaxErrors from the standard lib.
>>
>>What do you think ?
> 
> -0 for just ripping it out. +0 if PEP 5 is followed, at least
> in spirit (i.e. give users advance warning to let them protest).
> 
> I guess users in embedded builds (either in embedded systems,
> or embedding Python into some other application) might still
> be interested in the feature. Of course, these users could either
> recreate the feature if we remove it, or just stay with
> Python 2.4.

If embedded build users rely on it, I'd suggest that these
users take over maintenance of the patch set.

Let's add a note to the configure switch that the feature will
be removed in 2.6 and see what happens.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 30 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Phillip J. Eby

At 10:38 PM 10/3/2005 +0200, Antoine Pitrou wrote:
>To which you apparently didn't read my answer, that is:
>you can never be sure that a variable containing something which
>is /semantically/ textual (*) will never contain anything other than
>ASCII text. For example raw_input() won't tell you that its 8-bit string
>result contains some chars > 0x7F. Same for many other library
>functions. How do you cope with (more or less occasional) non-ascii data
>coming in as 8-bit strings?

Presumably in Python 3.0, opening a file in "text" mode will require an 
encoding to be specified, and opening it in "binary" mode will cause it to 
produce or consume byte arrays, not strings.  This should apply to sockets 
too, and really any I/O facility, including GUI frameworks, DBAPI objects, 
os.listdir(), etc.
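
A minimal sketch of the text/binary split described above, assuming Python 3's `open()`. As released, Python 3 accepts an `encoding` argument in text mode but does not require one; the helper name `roundtrip` and the temporary file are illustrative only.

```python
import os
import tempfile


def roundtrip(text, encoding="utf-8"):
    """Write text with an explicit encoding, then read it back both ways."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "rb") as f:  # binary mode: raw bytes, no decoding
            as_bytes = f.read()
        with open(path, "r", encoding=encoding) as f:  # text mode: str
            as_text = f.read()
        return as_bytes, as_text
    finally:
        os.remove(path)
```

Binary mode hands back exactly what is on disk, so any decoding step is visible in the calling code.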

Of course, to get there we really need to add a convenient bytes type, 
perhaps by enhancing the current 'array' module.  It'd be nice to have a 
way to get this in 2.x versions so people can start fixing stuff to work 
the right way.  With no 8-bit strings coming in, there should be no 
unicode/str problems except those you create yourself.


Re: [Python-Dev] bytes type

2005-10-03 Thread Antoine Pitrou


> Presumably in Python 3.0, opening a file in "text" mode will require an 
> encoding to be specified, and opening it in "binary" mode will cause it to 
> produce or consume byte arrays, not strings.  This should apply to sockets 
> too, and really any I/O facility, including GUI frameworks, DBAPI objects, 
> os.listdir(), etc.

Great :)

> Of course, to get there we really need to add a convenient bytes type, 
> perhaps by enhancing the current 'array' module.  It'd be nice to have a 
> way to get this in 2.x versions so people can start fixing stuff to work 
> the right way.

Could the "bytes" type be just the same as the current "str" type but
without the implicit unicode conversion ? Or am I missing some desired
functionality ?




Re: [Python-Dev] bytes type

2005-10-03 Thread Guido van Rossum

On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> Could the "bytes" type be just the same as the current "str" type but
> without the implicit unicode conversion ? Or am I missing some desired
> functionality ?

No. It will be a mutable array of bytes. It will intentionally
resemble strings as little as possible. There won't be a literal for
it.

But you will be able to convert between bytes and strings quite easily
by specifying an encoding.
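
For illustration, a sketch of a mutable byte array and encoding-based conversion, assuming Python 3. Note that the design as eventually released differs from the description above: Python 3 has an immutable `bytes` type with a `b'...'` literal, plus a separate mutable `bytearray`, which is the type used here.

```python
buf = bytearray(b"abc")  # mutable sequence of integers 0..255
buf[0] = ord("x")        # item assignment works, unlike str
buf.extend("\u00e9".encode("utf-8"))

text = buf.decode("utf-8")   # bytes -> text: an encoding is needed
back = text.encode("utf-8")  # text -> bytes: likewise
```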

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] PEP 343 and with

2005-10-03 Thread Jason Orendorff

Phillip J. Eby writes:
> You didn't offer any reasons why this would be useful and/or good.

It makes it dramatically easier to write Python classes that correctly
support 'with'.  I don't see any simple way to do this under PEP 343;
the only sane thing to do is write a separate @contextmanager
generator, as all of the examples do.

Consider:

    # decimal.py
    class Context:
        ...
        def __enter__(self):
            ???
        def __exit__(self, t, v, tb):
            ???

    DefaultContext = Context(...)

Kindly implement __enter__() and __exit__().  Make sure your
implementation is thread-safe (not easy, even though
decimal.getcontext/.setcontext are thread-safe!).  Also make sure it
supports nested 'with DefaultContext:' blocks (I don't mean lexically
nested, of course; I mean nested at runtime.)

The answer requires thread-local storage and a separate stack of saved
context objects per thread.  It seems a little ridiculous to me.
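
For illustration, a minimal sketch of the thread-local approach described above, built on `decimal.getcontext()` and `decimal.setcontext()`. `ContextSwitcher` is a hypothetical wrapper invented for this example and is not part of the `decimal` module; the syntax assumes a Python version in which the `with` statement exists.

```python
import decimal
import threading


class ContextSwitcher:
    """Hypothetical wrapper giving a decimal.Context enter/exit methods."""

    _saved = threading.local()  # one stack of saved contexts per thread

    def __init__(self, ctx):
        self.ctx = ctx

    def __enter__(self):
        stack = getattr(self._saved, "stack", None)
        if stack is None:
            stack = self._saved.stack = []
        stack.append(decimal.getcontext())  # remember what to restore
        decimal.setcontext(self.ctx)
        return self.ctx

    def __exit__(self, t, v, tb):
        decimal.setcontext(self._saved.stack.pop())
        return False  # never swallow exceptions
```

The per-thread stack is what lets the same object be entered again at runtime while an outer block using it is still active.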

Whereas:

    class Context:
        ...
        def __with__(self):
            old = decimal.getcontext()
            decimal.setcontext(self)
            try:
                yield
            finally:
                decimal.setcontext(old)

As for the second proposal, I was thinking we'd have one mental model
for context managers (block template generators), rather than two
(generators vs. enter/exit methods).  Enter/exit seemed superfluous,
given the examples in the PEP.

> [T]his multiplies the difficulty of implementing context managers in C.

Nonsense.

    static PyObject *
    lock_with()
    {
        return PyContextManager_FromCFunctions(self, lock_acquire,
                                               lock_release);
    }

There probably ought to be such an API even if my suggestion is in
fact garbage (as, admittedly, still seems the most likely thing).

Cheers,
-j

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Martin v. Löwis

Antoine Pitrou wrote:
> To which you apparently didn't read my answer, that is:
> you can never be sure that a variable containing something which
> is /semantically/ textual (*) will never contain anything other than
> ASCII text.

That is simply not true. There are variables that are semantically
textual, yet I can be sure that they hold a byte string only if it
consists just of ASCII.

For example, if you invoke a Tkinter function, it will return a byte
string if the result is purely ASCII, else return a Unicode string.
This is an interface guarantee, hence I can be sure.

Regards,
Martin

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Martin Blais

On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
>
> > > If that's how things were designed, then Python's entire standard
> > > library (not to mention third-party libraries) is not "unicode safe" -
> > > to quote your own words - since many functions may return 8-bit strings
> > > containing non-ascii characters.
> >
> > huh?  first you talk about functions that convert unicode strings to 8-bit
> > strings, now you talk about functions that return raw 8-bit strings?
>
> Are you deliberately missing the argument?
> And can't you understand that conversions are problematic in both
> directions (str -> unicode /and/ unicode -> str)?

Both directions are a problem.

Just a note: it's not so much the conversions that I find problematic,
but rather the implicit nature of the conversions (combined with the
fact that they may fail).  In addition to being difficult to track
down, these implicit conversions may be costing processing time as
well.

cheers,

Re: [Python-Dev] PEP 343 and with

2005-10-03 Thread Phillip J. Eby

At 05:15 PM 10/3/2005 -0400, Jason Orendorff wrote:
>Phillip J. Eby writes:
> > You didn't offer any reasons why this would be useful and/or good.
>
>It makes it dramatically easier to write Python classes that correctly
>support 'with'.  I don't see any simple way to do this under PEP 343;
>the only sane thing to do is write a separate @contextmanager
>generator, as all of the examples do.

Wha?  For locks (the example you originally gave), this is trivial.

>Consider:
>
>    # decimal.py
>    class Context:
>        ...
>        def __enter__(self):
>            ???
>        def __exit__(self, t, v, tb):
>            ???
>
>    DefaultContext = Context(...)
>
>Kindly implement __enter__() and __exit__().  Make sure your
>implementation is thread-safe (not easy, even though
>decimal.getcontext/.setcontext are thread-safe!).  Also make sure it
>supports nested 'with DefaultContext:' blocks (I don't mean lexically
>nested, of course; I mean nested at runtime.)
>
>The answer requires thread-local storage and a separate stack of saved
>context objects per thread.  It seems a little ridiculous to me.

Okay, it was completely non-obvious from your post that this was the 
problem you're trying to solve.

>Whereas:
>
>    class Context:
>        ...
>        def __with__(self):
>            old = decimal.getcontext()
>            decimal.setcontext(self)
>            try:
>                yield
>            finally:
>                decimal.setcontext(old)

This could also be done with a Context.replace() @contextmanager method.
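
A minimal sketch of that alternative, assuming `contextlib.contextmanager` is available. It is written as a module-level function named `replace` because `decimal.Context` has no such method; the name is hypothetical and only mirrors the suggestion above.

```python
import decimal
from contextlib import contextmanager


@contextmanager
def replace(ctx):
    """Hypothetical stand-in for a Context.replace() method."""
    old = decimal.getcontext()
    decimal.setcontext(ctx)
    try:
        yield ctx
    finally:
        decimal.setcontext(old)  # runs even if the block raises
```

Each `with replace(...)` call gets its own generator frame, so the saved context lives in a local variable and no explicit per-thread stack is needed. Later versions of the standard library provide `decimal.localcontext()` for the same purpose.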

On the whole, I'm torn.  I definitely like the additional flexibility this 
gives.  On the other hand, it seems to me that __with__ and the additional 
C baggage violates the "if the implementation is hard to explain" 
rule.  Also, people have already put a lot of effort into implementation 
and documentation patches based on an accepted PEP.  That's not enough to 
override "the right thing to do", especially if it comes with a volunteer 
willing to update the work, but in this case the amount of additional 
goodness seems small, and it's not immediately apparent that you're 
volunteering to help change this even if Guido blessed it.


Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread M.-A. Lemburg

Martin Blais wrote:
> On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> 
>>>>If that's how things were designed, then Python's entire standard
>>>>library (not to mention third-party libraries) is not "unicode safe" -
>>>>to quote your own words - since many functions may return 8-bit strings
>>>>containing non-ascii characters.
>>>
>>>huh?  first you talk about functions that convert unicode strings to 8-bit
>>>strings, now you talk about functions that return raw 8-bit strings?
>>
>>Are you deliberately missing the argument?
>>And can't you understand that conversions are problematic in both
>>directions (str -> unicode /and/ unicode -> str)?
> 
> 
> Both directions are a problem.
> 
> Just a note: it's not so much the conversions that I find problematic,
> but rather the implicit nature of the conversions (combined with the
> fact that they may fail).  In addition to being difficult to track
> down, these implicit conversions may be costing processing time as
> well.

We've already pointed you to a solution which you might want
to use. Why don't you just try it ?

BTW, if you want to read up on all the reasons why Unicode
was done the way it was, have a look at:

http://www.python.org/peps/pep-0100.html

and read up in the python-dev archives:

http://mail.python.org/pipermail/python-dev/2000-March/thread.html

and the next months after the initial checkin.

From what I've read on the web about the Python Unicode
implementation, we have one of the better ones compared
to other languages' implementations and their choices and
design decisions.

None of them is perfect, but that seems to be an inherent
problem with Unicode no matter how you try to approach it -
even more so, if you are trying to add it to a language that
has used ordinary C strings for text from day 1.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 30 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 

Re: [Python-Dev] bytes type

2005-10-03 Thread Antoine Pitrou

Le lundi 03 octobre 2005 à 14:02 -0700, Guido van Rossum a écrit :
> On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> > Could the "bytes" type be just the same as the current "str" type but
> > without the implicit unicode conversion ? Or am I missing some desired
> > functionality ?
> 
> No. It will be a mutable array of bytes. It will intentionally
> resemble strings as little as possible. There won't be a literal for
> it.

Thinking about it, it may have to offer the search and replace
facilities offered by strings (including regular expressions).

Here is a use case: say I'm reading an HTML file (or receiving it over
the network). Since the character encoding can be specified in the HTML
file itself (in the ...), I must first receive it as a
bytes object. But then I must fetch the encoding information from the
HTML header: therefore I must use some string ops on the bytes object to
parse this information. Only after I have discovered the encoding, can I
finally convert the bytes object to a text string.

Or would there be another way to do it?
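
One way this use case can be handled is to search the raw bytes directly. The sketch below assumes Python 3, whose `re` module accepts bytes patterns; the pattern `CHARSET_RE`, the helper `sniff_and_decode` and the 1024-byte search window are assumptions made for this example, not a complete HTML encoding detector.

```python
import re

# Hypothetical pattern: looks for charset=... near the start of the document.
CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def sniff_and_decode(raw, default="utf-8"):
    """Find a charset declaration in the raw bytes, then decode with it.

    Raises LookupError if the declared encoding is unknown to Python.
    """
    match = CHARSET_RE.search(raw[:1024])
    encoding = match.group(1).decode("ascii") if match else default
    return encoding, raw.decode(encoding)
```

Only searching is needed on the bytes side; all further string processing happens after the conversion to text.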


Re: [Python-Dev] bytes type

2005-10-03 Thread Guido van Rossum

This would presumably support the (read-only part of the) buffer API so
search would be covered.

I don't see a use case for replace.

Alternatively, you could always specify Latin-1 as the encoding and
convert it that way -- I don't think there's any input that can cause
Latin-1 decoding to fail.
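
The Latin-1 claim can be checked directly. A small sketch, assuming Python 3 and its built-in `latin-1` codec: every one of the 256 possible byte values decodes to the code point with the same number, and the result encodes back to the original bytes.

```python
all_bytes = bytes(range(256))  # every possible 8-bit value

as_text = all_bytes.decode("latin-1")  # does not raise for any input byte
code_points = [ord(ch) for ch in as_text]
round_trip = as_text.encode("latin-1")
```

The decoded text is only meaningful if the data really is Latin-1, but the round trip is lossless, so the bytes can always be recovered and decoded again once the real encoding is known.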

On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> Le lundi 03 octobre 2005 à 14:02 -0700, Guido van Rossum a écrit :
> > On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> > > Could the "bytes" type be just the same as the current "str" type but
> > > without the implicit unicode conversion ? Or am I missing some desired
> > > functionality ?
> >
> > No. It will be a mutable array of bytes. It will intentionally
> > resemble strings as little as possible. There won't be a literal for
> > it.
>
> Thinking about it, it may have to offer the search and replace
> facilities offered by strings (including regular expressions).
>
> Here is a use case: say I'm reading an HTML file (or receiving it over
> the network). Since the character encoding can be specified in the HTML
> file itself (in the ...), I must first receive it as a
> bytes object. But then I must fetch the encoding information from the
> HTML header: therefore I must use some string ops on the bytes object to
> parse this information. Only after I have discovered the encoding, can I
> finally convert the bytes object to a text string.
>
> Or would there be another way to do it?


--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] bytes type

2005-10-03 Thread Antoine Pitrou


Le lundi 03 octobre 2005 à 17:42 -0700, Guido van Rossum a écrit :
> I don't see a use case for replace.

Agreed.

> Alternatively, you could always specify Latin-1 as the encoding and
> convert it that way -- I don't think there's any input that can cause
> Latin-1 decoding to fail.

You seem to be right.
« In 1992, the IANA registered the character map ISO-8859-1 (note the
extra hyphen), a superset of ISO/IEC 8859-1, for use on the Internet.
This map assigns control characters to the code values 00-1F, 7F, and
80-9F. It thus provides for 256 characters via every possible 8-bit
value. »
http://en.wikipedia.org/wiki/ISO_8859-1#ISO-8859-1

Regards

Antoine.



Re: [Python-Dev] 64-bit bytecode compatibility (was Re: [PEAK] ez_setup on 64-bit linux problem)

2005-10-03 Thread Viren Shah

Phillip J. Eby wrote:
> At 09:49 AM 9/29/2005 -0400, Viren Shah wrote:
> 
>> [I sent this earlier without being a subscriber and it was sent to the 
>> moderation queue so I'm resending it after subscribing]
>>
>> Hi,
>>   I'm running a 64-bit Fedora Core 3 with python 2.3.4. I'm trying to 
>> install setuptools to use with Trac, and get the following error:
>>
>>  [EMAIL PROTECTED] ~]$ python ez_setup.py
>> Downloading 
>> http://cheeseshop.python.org/packages/2.3/s/setuptools/setuptools-0.6a4-py2.3.egg
>>  
>>
>> Traceback (most recent call last):
>>   File "ez_setup.py", line 206, in ?
>> main(sys.argv[1:])
>>   File "ez_setup.py", line 141, in main
>> from setuptools.command.easy_install import main
>> OverflowError: signed integer is greater than maximum
>>
>>
>> I get the same type of error if I try installing setuptools manually. 
>> I figure this has to do with the 64-bit nature of the OS and python, 
>> but not being a python person, don't know what a workaround would be.
>>
>> Any ideas?
> 
> 
> Hm.  It sounds like perhaps the 64-bit Python in question isn't able to 
> read bytecode for Python from a 32-bit Python version.  You'll need to 
> download the setuptools source archive from PyPI and install it using 
> "python setup.py install" instead.
> 

[Thanks for the quick response]
I tried downloading and installing setuptools-0.6a4.zip with the same 
type of result:


[EMAIL PROTECTED] setuptools-0.6a4]# python setup.py install
running install
running bdist_egg
running egg_info
writing ./setuptools.egg-info/PKG-INFO
writing top-level names to ./setuptools.egg-info/top_level.txt
writing entry points to ./setuptools.egg-info/entry_points.txt
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
copying pkg_resources.py -> build/lib
copying easy_install.py -> build/lib
creating build/lib/setuptools
copying setuptools/depends.py -> build/lib/setuptools
copying setuptools/archive_util.py -> build/lib/setuptools
copying setuptools/dist.py -> build/lib/setuptools
copying setuptools/__init__.py -> build/lib/setuptools
copying setuptools/extension.py -> build/lib/setuptools
copying setuptools/sandbox.py -> build/lib/setuptools
copying setuptools/package_index.py -> build/lib/setuptools
creating build/lib/setuptools/tests
copying setuptools/tests/doctest.py -> build/lib/setuptools/tests
copying setuptools/tests/__init__.py -> build/lib/setuptools/tests
copying setuptools/tests/test_resources.py -> build/lib/setuptools/tests
creating build/lib/setuptools/command
copying setuptools/command/test.py -> build/lib/setuptools/command
copying setuptools/command/saveopts.py -> build/lib/setuptools/command
copying setuptools/command/easy_install.py -> build/lib/setuptools/command
copying setuptools/command/build_ext.py -> build/lib/setuptools/command
copying setuptools/command/egg_info.py -> build/lib/setuptools/command
copying setuptools/command/install_lib.py -> build/lib/setuptools/command
copying setuptools/command/develop.py -> build/lib/setuptools/command
copying setuptools/command/alias.py -> build/lib/setuptools/command
copying setuptools/command/sdist.py -> build/lib/setuptools/command
copying setuptools/command/bdist_egg.py -> build/lib/setuptools/command
copying setuptools/command/bdist_rpm.py -> build/lib/setuptools/command
copying setuptools/command/rotate.py -> build/lib/setuptools/command
copying setuptools/command/build_py.py -> build/lib/setuptools/command
copying setuptools/command/upload.py -> build/lib/setuptools/command
copying setuptools/command/setopt.py -> build/lib/setuptools/command
copying setuptools/command/__init__.py -> build/lib/setuptools/command
copying setuptools/command/install.py -> build/lib/setuptools/command
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
copying build/lib/pkg_resources.py -> build/bdist.linux-x86_64/egg
copying build/lib/easy_install.py -> build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/setuptools
copying build/lib/setuptools/depends.py -> build/bdist.linux-x86_64/egg/setuptools
creating build/bdist.linux-x86_64/egg/setuptools/tests
copying build/lib/setuptools/tests/doctest.py -> build/bdist.linux-x86_64/egg/setuptools/tests
copying build/lib/setuptools/tests/__init__.py -> build/bdist.linux-x86_64/egg/setuptools/tests
copying build/lib/setuptools/tests/test_resources.py -> build/bdist.linux-x86_64/egg/setuptools/tests
copying build/lib/setuptools/archive_util.py -> build/bdist.linux-x86_64/egg/setuptools
copying build/lib/setuptools/dist.py -> build/bdist.linux-x86_64/egg/setuptools
copying build/lib/setuptools/__init__.py -> build/bdist.linux-x86_64/egg/setuptools
copying build/lib/setuptools/extension.py -> build/bdist.linux-x86_64/egg/setuptools
copying build/lib/setuptools/sandbox.py -> build/bdist.linux-x86_64/egg/setuptools
creating build/bdist.linux-x86_64/egg/setuptools/command
copying build/lib/setuptools/command/tes

Re: [Python-Dev] 64-bit bytecode compatibility (was Re: [PEAK] ez_setup on 64-bit linux problem)

2005-10-03 Thread Viren Shah

Phillip J. Eby wrote:
> At 12:14 PM 9/29/2005 -0400, Viren Shah wrote:
> 
>>   File "/root/svn-install-apps/setuptools-0.6a4/pkg_resources.py", 
>> line 949, in _get
>> return self.loader.get_data(path)
>> OverflowError: signed integer is greater than maximum
> 
> 
> Interesting.  That looks like it might be a bug in the Python zipimport 
> module, which is what implements get_data().  Apparently it happens upon 
> importing as well; I assumed that it was a bytecode incompatibility.
> 
> Checking the revision log, I find that there's a 64-bit fix for 
> zipimport.c in Python 2.4 that looks like it would fix this issue, but 
> it has not been backported to any revision of Python 2.3.  You're going 
> to either have to backport the fix yourself and rebuild Python 2.3, or 
> upgrade to Python 2.4.  Sorry.  :(

Cool! Thanks for the solution. I'll upgrade to python 2.4 and hope it 
works :-)


Thanks for all your help
Viren
-- 
Viren R Shah
Sr. Technical Advisor
Virtual Technology Corporation
[EMAIL PROTECTED]
P: 703-333-6246


[Python-Dev] Proposal for 2.5: Returning values from PEP 342 enhanced generators

2005-10-03 Thread Piet Delport

PEP 255 ("Simple Generators") closes with:

> Q. Then why not allow an expression on "return" too?
>
> A. Perhaps we will someday.  In Icon, "return expr" means both "I'm
>    done", and "but I have one final useful value to return too, and
>    this is it".  At the start, and in the absence of compelling uses
>    for "return expr", it's simply cleaner to use "yield" exclusively
>    for delivering values.

Now that Python 2.5 has gained enhanced generators (multitudes rejoice!), I think
there is a compelling use for valued return statements in cooperative
multitasking code, of the kind:

    def foo():
        Data = yield Client.read()
        [...]
        MoreData = yield Client.read()
        [...]
        return FinalResult

    def bar():
        Result = yield foo()

For generators written in this style, "yield" means "suspend execution of the
current call until the requested result/resource can be provided", and
"return" regains its full conventional meaning of "terminate the current call
with a given result".

The simplest / most straightforward implementation would be for "return Foo"
to translate to "raise StopIteration, Foo". This is consistent with "return"
translating to "raise StopIteration", and does not break any existing
generator code.
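
For readers on later interpreters: Python 3.3 and newer (PEP 380) accept a valued return inside a generator and expose the value as the "value" attribute of the StopIteration that ends it, which matches the translation described above. A minimal sketch under that assumption follows. The run() trampoline is hypothetical, and plain strings stand in for Client.read(); it is not defgen's code.

```python
import types


def run(task):
    """Drive a generator-based task, following nested generators.

    Hypothetical trampoline: a yielded generator is treated as a sub-call,
    and any other yielded object is simply handed straight back as the
    "resource" that was requested.
    """
    stack = [task]
    value = None
    while stack:
        try:
            yielded = stack[-1].send(value)
        except StopIteration as stop:
            # "return X" in a generator surfaces here as stop.value.
            stack.pop()
            value = stop.value
            continue
        if isinstance(yielded, types.GeneratorType):
            stack.append(yielded)   # descend into the sub-call
            value = None            # a fresh generator must be sent None first
        else:
            value = yielded         # pretend the request was satisfied
    return value


def foo():
    data = yield "first"
    more_data = yield "second"
    return data + "+" + more_data


def bar():
    result = yield foo()
    return "bar got " + result


print(run(bar()))
```

Here "yield" suspends the current call until a result is available, and "return" ends the call with a result, which is the division of labour the proposal asks for.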

(Another way to think about this change is that if a plain StopIteration means
"the iterator terminated", then a valued StopIteration, by extension, means
"the iterator terminated with the given value".)

Motivation by real-world example:

One system that could benefit from this change is Christopher Armstrong's
defgen.py[1] for Twisted, which he recently reincarnated (as newdefgen.py) to
use enhanced generators. The resulting code is much cleaner than before, and
closer to the conventional synchronous style of writing.

[1] the saga of which is summarized here:
http://radix.twistedmatrix.com/archives/000114.html

However, because enhanced generators have no way to differentiate their
intermediate results from their "real" result, the current solution is a
somewhat confusing compromise: the last value yielded by the generator
implicitly becomes the result returned by the call. Thus, to return
something, in general, requires the idiom "yield Foo; return". If valued
returns are allowed, this would become "return Foo" (and the code implementing
defgen itself would probably end up simpler, as well).

[Python-Dev] Unicode charmap decoders slow

2005-10-03 Thread Tony Nelson

Is there a faster way to transcode from 8-bit chars (charmaps) to utf-8
than going through unicode()?

I'm writing a small card-file program. As a test, I use a 53 MB MBox file,
in mac-roman encoding.  My program reads and parses the file into messages
in about 3 to 5 seconds (Wow! Go Python!), but takes about 14 seconds to
iterate over the cards and convert them to utf-8:

    for i in xrange(len(cards)):
        u = unicode(cards[i], encoding)
        cards[i] = u.encode('utf-8')

The time is nearly all in the unicode() call.  It's not so much how much
time it takes, but that it takes 4 times as long as the real work, just to
do table lookups.

Looking at the source (which, if I have it right, is
PyUnicode_DecodeCharmap() in unicodeobject.c), I think it is doing a
dictionary lookup for each character.  I would have thought that it would
make and cache a LUT the size of the charmap (and hook the relevant
dictionary stuff to delete the cached LUT if the dictionary is changed).
(You may consider this a request for enhancement. ;)

I thought of using U"".translate(), but the unicode version is defined to
be slow, and anyway I can't find any way to just shove my 8-bit data into a
unicode string without translation.  Is there some similar approach?  I'm
almost (but not quite) ready to try it in Pyrex.
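
One possible shape for the lookup-table idea, sketched in Python 3 syntax rather than the 2.x code above. It relies on two things: decoding as latin-1 maps byte n to code point n, which is a way to put 8-bit data into a unicode string without translation, and str.translate() accepts any object indexable by code point, including a 256-character string. The helper names build_lut and transcode_to_utf8 are made up for this sketch, and whether it beats the codec on a given interpreter has to be measured.

```python
def build_lut(encoding):
    """Return a 256-character string: lut[b] is the character for byte b.

    Bytes the codec does not define become U+FFFD because of
    errors="replace" (an assumption of this sketch, not codec behaviour
    the original post relies on).
    """
    return bytes(range(256)).decode(encoding, errors="replace")


def transcode_to_utf8(data, lut):
    """Transcode 8-bit data to UTF-8 through the lookup table."""
    # latin-1 turns each byte into the code point with the same number,
    # so translate() can index the table directly.
    return data.decode("latin-1").translate(lut).encode("utf-8")


lut = build_lut("mac_roman")          # build once, reuse for every card
cards = [b"caf\x8e", b"plain ascii"]
cards = [transcode_to_utf8(card, lut) for card in cards]
```

The table is built once per encoding, so the per-card work is one latin-1 decode, one translate and one UTF-8 encode.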

I'm new to Python.  I didn't google anything relevant on python.org or in
groups.  I posted this in comp.lang.python yesterday, got a couple of
responses, but I think this may be too sophisticated a question for that
group.

I'm not a member of this list, so please copy me on replies so I don't have
to hunt them down in the archive.

TonyN.

Re: [Python-Dev] Proposal for 2.5: Returning values from PEP 342 enhanced generators

2005-10-03 Thread Christopher Armstrong

On 10/4/05, Piet Delport <[EMAIL PROTECTED]> wrote:
> One system that could benefit from this change is Christopher Armstrong's
> defgen.py[1] for Twisted, which he recently reincarnated (as newdefgen.py) to
> use enhanced generators. The resulting code is much cleaner than before, and
> closer to the conventional synchronous style of writing.
>
> [1] the saga of which is summarized here:
> http://radix.twistedmatrix.com/archives/000114.html
>
> However, because enhanced generators have no way to differentiate their
> intermediate results from their "real" result, the current solution is a
> somewhat confusing compromise: the last value yielded by the generator
> implicitly becomes the result returned by the call. Thus, to return
> something, in general, requires the idiom "yield Foo; return". If valued
> returns are allowed, this would become "return Foo" (and the code implementing
> defgen itself would probably end up simpler, as well).

Hey, that would be nice. I've found people confused by the way defgen
handles return values before, getting seemingly meaningless values out
of their defgens (if the defgen didn't specifically yield some
meaningful value at the end).

At first I thought "return foo" in a generator ought to be equivalent
to "yield foo; return", but at least for defgen, it turns out raising
StopIteration(foo) would be better, as I would have a very explicit
way to specify and find the return value of the generator.
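
The difference between the two conventions can be sketched as follows, in Python 3.3+ syntax where a generator's return value travels on StopIteration.value. Both driver functions are hypothetical stand-ins written for this illustration; neither is defgen's actual code.

```python
def result_by_last_yield(gen):
    """Convention 1: whatever was yielded last counts as the result."""
    last = None
    for item in gen:
        last = item
    return last


def result_by_stop_iteration(gen):
    """Convention 2: the result is the value carried by StopIteration."""
    while True:
        try:
            next(gen)
        except StopIteration as stop:
            return stop.value


def no_meaningful_result():
    # Only waits on things; the author never meant to return anything.
    yield "waiting on socket"
    yield "waiting on timer"


def explicit_result():
    yield "waiting on socket"
    return 42


print(result_by_last_yield(no_meaningful_result()))
print(result_by_stop_iteration(no_meaningful_result()))
print(result_by_stop_iteration(explicit_result()))
```

Under the first convention the caller receives "waiting on timer", an intermediate value that was never intended as a result. Under the second it receives None unless the generator explicitly returns something.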


--
  Twisted   |  Christopher Armstrong: International Man of Twistery
   Radix|-- http://radix.twistedmatrix.com
|  Release Manager, Twisted Project
  \\\V///   |-- http://twistedmatrix.com
   |o O||
wvw-+

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-03 Thread jepler

As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is very
slow compared to encoding or decoding with utf-8.  Here I'm working with 53k of
data instead of 53 megs.  (Note: this is a laptop, so it's possible that
thermal or battery management features affected these numbers a bit, but by a
factor of 3 at most)

$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "u.encode('utf-8')"
1000 loops, best of 3: 591 usec per loop
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('utf-8')"
1000 loops, best of 3: 1.25 msec per loop
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('mac-roman')"
100 loops, best of 3: 13.5 msec per loop
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('iso8859-1')"
100 loops, best of 3: 13.6 msec per loop

With utf-8 encoding as the baseline, we have
    decode('utf-8')       2.1x as long
    decode('mac-roman')  22.8x as long
    decode('iso8859-1')  23.0x as long

Perhaps this is an area that is ripe for optimization.
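
The measurements above can be repeated from a script with the timeit module. This is a rough sketch in Python 3 syntax; time_call is a helper name invented here, the loop counts are arbitrary, and the ratios on a current interpreter should not be expected to match the 2005 numbers.

```python
import timeit


def time_call(func, repeats=3, number=200):
    """Return the best-of-`repeats` time in seconds for one call of func."""
    timer = timeit.Timer(func)
    return min(timer.repeat(repeat=repeats, number=number)) / number


payload = b"a" * 53 * 1024          # same 53k of ASCII as in the runs above
text = payload.decode("ascii")

# Baseline: encoding to utf-8, as in the comparison above.
baseline = time_call(lambda: text.encode("utf-8"))
print("%-22s %9.1f usec" % ("encode('utf-8')", baseline * 1e6))

for encoding in ("utf-8", "mac-roman", "iso8859-1"):
    seconds = time_call(lambda: payload.decode(encoding))
    label = "decode(%r)" % encoding
    print("%-22s %9.1f usec  %5.1fx as long"
          % (label, seconds * 1e6, seconds / baseline))
```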

Jeff



Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread skip


Antoine> If an stdlib function returns an 8-bit string containing
Antoine> non-ascii data, then this string used in unicode context incurs
Antoine> an implicit conversion, which fails. 

Such strings should be converted to Unicode at the point where they enter
the application.  That's likely the only place where you have a good chance
of knowing the data encoding.  Files generally have no encoding information
associated with them.  Some databases don't handle Unicode transparently.
If you hang onto the input from such devices as plain strings until you need
them as Unicode, you will almost certainly not know how the string was
encoded.  The state of the outside Unicode world being as miserable as it is
(think web input forms), you often don't know the encoding at the interface
and have to guess anyway.  Even so, isolating that guesswork to the
interface is better than recovering somewhere further downstream.
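
A small sketch of keeping that guesswork at the interface, written in Python 3 syntax where bytes and text are separate types. The function name decode_at_boundary and the candidate list are assumptions made for this illustration, not an existing API.

```python
def decode_at_boundary(raw, candidates=("utf-8", "latin-1")):
    """Turn incoming bytes into text once, where they enter the program.

    The candidate encodings are tried in order; the first one that decodes
    without error wins. This is exactly the kind of guess that is better
    made here than somewhere downstream.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of %r could decode the input" % (candidates,))


# Everything past this point works with text only.
name_from_form = decode_at_boundary(b"caf\xc3\xa9")   # valid UTF-8
name_from_file = decode_at_boundary(b"caf\xe9")       # falls back to latin-1
assert name_from_form == name_from_file
```

Because latin-1 accepts every byte sequence, putting it last means the function always returns something, at the cost of occasionally guessing wrong.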

Skip

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread James Y Knight

On Oct 3, 2005, at 3:47 PM, Fredrik Lundh wrote:
> Antoine Pitrou wrote:
>
>
>>>> If I have an unicode string containing legal characters greater than
>>>> 0x7F, and I pass it to a function which converts it to str, the
>>>> conversion fails.
>>>>
>>>
>>> so?  if it does that, it's not unicode safe.
>>>
>> [...]
>>
>>> what's that has to do with
>>> my argument (which is that you can safely mix ascii strings and unicode
>>> strings, because that's how things were designed).
>>>
>>
>> If that's how things were designed, then Python's entire standard
>> library (not to mention third-party libraries) is not "unicode safe" -
>> to quote your own words - since many functions may return 8-bit strings
>> containing non-ascii characters.
>>
>
> huh?  first you talk about functions that convert unicode strings to 8-bit
> strings, now you talk about functions that return raw 8-bit strings?  and
> all this in response to a post that argues that it's in fact a good idea to
> use plain strings to hold textual data that happens to contain ASCII only,
> because 1) it works, by design, and 2) it's almost always more efficient.
>
> if you don't know what your own argument is, you cannot expect anyone
> to understand it.

Your point would be much easier to stomach if the "str" type could  
*only* hold 7-bit ASCII. Perhaps that can be done when Python gets an  
actual bytes type in 3.0. There indeed are a multitude of uses for  
the efficient storage/processing of ASCII-only data. However,  
currently, there are problems because it's so easy to screw yourself  
without noticing when mixing unicode and str objects. If, on the  
other hand, you have a 7bit ascii string type, and a 16/32-bit  
unicode string type, both can be used interchangeably and there is no  
possibility for any en/de-coding issues. And  
asciiOnlyStringType.encode('utf-8') can become _ultra_ efficient, as  
a bonus. :)
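
To make the idea concrete, here is a toy sketch of such a type in Python 3 syntax. The class name AsciiOnly and its methods are invented for illustration; nothing like it is being claimed to exist in the standard library.

```python
class AsciiOnly:
    """Hypothetical string type that can hold nothing but 7-bit ASCII."""

    # Encodings whose output for pure ASCII text is the same bytes.
    _ASCII_COMPATIBLE = ("ascii", "utf-8", "utf8", "latin-1")

    def __init__(self, data):
        data = bytes(data)
        if any(byte > 0x7F for byte in data):
            raise ValueError("AsciiOnly cannot hold non-ASCII bytes")
        self._data = data

    def __str__(self):
        # Decoding can never fail: every byte is below 0x80.
        return self._data.decode("ascii")

    def encode(self, encoding="utf-8"):
        # ASCII is a subset of UTF-8, so no per-character work is needed.
        if encoding.lower().replace("_", "-") in self._ASCII_COMPATIBLE:
            return self._data
        return str(self).encode(encoding)


greeting = AsciiOnly(b"hello")
mixed = str(greeting) + " w\xf6rld"   # mixing with unicode text is always safe
```

Rejecting non-ASCII bytes at construction time is what removes the en/de-coding surprises: the failure happens where the bad data enters, not later when the string meets unicode text.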

Seems win-win to me.

James

