Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Martin v. Löwis
Adam Olsen wrote:
> My assumption is these would become errors in 3.x.  bytes(str) is only
> needed so you can do bytes(u"abc".encode('utf-8')) and have it work in
> 2.x and 3.x.

I think the proposal for bytes(seq) to mean bytes(map(ord, seq))
was meant to be valid for both 2.x and 3.x, on the grounds that
you should be able to write byte string constants in the same
way in all versions.
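
As a rough illustration only, here is how the `bytes(map(ord, seq))` spelling behaves in today's Python 3, which postdates this thread. The helper name `bytes_from_text` is made up for the sketch and is not part of any proposal.

```python
def bytes_from_text(seq):
    # Hypothetical helper: take ord() of each character, as in bytes(map(ord, seq)).
    # bytes() raises ValueError if any ordinal falls outside range(256).
    return bytes(map(ord, seq))


data = bytes_from_text("abc")    # plain ASCII characters
high = bytes_from_text("\xf6")   # a character with ordinal 246
```

Characters above U+00FF cannot be expressed this way, which is one reason an explicit `.encode(...)` is usually the safer spelling.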

> (I wonder if maybe they should be an error in 2.x as well.  Source
> encoding is for unicode literals, not str literals.)

Source encoding applies to the entire source code, including (byte)
string literals, comments, identifiers, and keywords. IOW, if you
declare your source encoding is utf-8, the keyword "print" must
be represented with the bytes that represent the Unicode letters
for "p","r","i","n", and "t" in UTF-8.
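
A small sketch of this point, assuming modern Python 3 rather than the 2.x interpreters under discussion: when `compile()` is given raw bytes, the declared source encoding governs how the whole file is read. The helper `run_source` and the two sample sources are invented for the example.

```python
def run_source(source_bytes):
    # compile() on bytes honours the PEP 263 coding declaration.
    namespace = {}
    exec(compile(source_bytes, "<sketch>", "exec"), namespace)
    return namespace


# The same character, stored as different bytes under different declarations.
latin1_src = b"# -*- coding: latin-1 -*-\ns = '\xf6'\n"
utf8_src = b"# -*- coding: utf-8 -*-\ns = '\xc3\xb6'\n"

latin1_value = run_source(latin1_src)["s"]
utf8_value = run_source(utf8_src)["s"]

# Keywords are encoded too; in UTF-8 they happen to match their ASCII bytes.
keyword_bytes = "print".encode("utf-8")
```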

Regards,
Martin
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Martin v. Löwis
Greg Ewing wrote:
> If the protocol has been sensibly designed, that shouldn't
> happen, since everything up to the coding marker should
> be ascii (or some other protocol-defined initial coding).

XML, for one protocol, requires you to start over. The
initial sequence could be UTF-16, or it could be EBCDIC.
You read a few bytes (up to four), then know which of
these it is. Then you start over, reading further if
it looks like an ASCII superset, to find out the real
encoding. You normally then start over, although switching
at that point could also work.
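
The multi-pass procedure described above can be sketched roughly as follows, assuming modern Python 3. The byte patterns loosely follow the autodetection appendix of the XML 1.0 specification; UTF-32 and several other cases are deliberately left out, and the function names are invented for this example.

```python
import re

# Leading byte patterns (simplified; UTF-32 and others omitted for brevity).
_SIGNATURES = [
    (b"\xef\xbb\xbf", "utf-8-sig"),      # UTF-8 byte order mark
    (b"\xff\xfe", "utf-16-le"),          # UTF-16 little-endian BOM
    (b"\xfe\xff", "utf-16-be"),          # UTF-16 big-endian BOM
    (b"\x00<\x00?", "utf-16-be"),        # "<?" in UTF-16BE, no BOM
    (b"<\x00?\x00", "utf-16-le"),        # "<?" in UTF-16LE, no BOM
    (b"\x4c\x6f\xa7\x94", "cp037"),      # "<?xm" in EBCDIC
    (b"<?xm", "ascii"),                  # "<?xm" in an ASCII superset
]

_DECLARATION = re.compile(
    r'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_family(head):
    """First pass: guess the encoding family from at most four bytes."""
    for signature, codec in _SIGNATURES:
        if head.startswith(signature):
            return codec
    return "utf-8"  # XML's default when nothing else is known


def detect_xml_encoding(raw):
    """Second pass: start over and read the declaration in that family."""
    family = sniff_family(raw[:4])
    if family in ("ascii", "cp037"):
        prefix = raw[:200].decode(family, errors="replace")
        match = _DECLARATION.search(prefix)
        if match:
            return match.group(1)
        return "utf-8" if family == "ascii" else family
    return family


def decode_xml(raw):
    """Final pass: decode the whole document from the first byte again."""
    return raw.decode(detect_xml_encoding(raw))
```

Note that every pass begins again at byte zero of `raw`, which is only possible because the original bytes were kept.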

> For protocols that are not sensibly designed (or if you're
> just trying to guess) what you suggest may be needed. But
> it would be good to have a nicer way of going about it
> for when the protocol is sensible.

There might be buffering of decoded strings already,
(ie. beyond the point to which you have read), so
you would need to unbuffer these, and reinterpret
them. To support that, you really need to buffer
both the original bytes, and the decoded ones, since
the encoding might not roundtrip.
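
A minimal sketch of why the raw bytes have to be retained, assuming modern Python 3. The class `RestartableDecoder` is hypothetical and ignores incremental decoding and performance entirely.

```python
class RestartableDecoder:
    """Keep the raw bytes so already-decoded text can be reinterpreted."""

    def __init__(self, encoding):
        self.encoding = encoding
        self.raw = bytearray()

    def feed(self, chunk):
        # Always retain the original bytes, not just the decoded text.
        self.raw.extend(chunk)

    def text(self):
        return bytes(self.raw).decode(self.encoding, errors="replace")

    def switch(self, encoding):
        # Re-decode everything seen so far under the newly learned encoding.
        self.encoding = encoding
        return self.text()


buf = RestartableDecoder("utf-8")
buf.feed("caf\xe9".encode("latin-1"))
guessed = buf.text()             # lossy: the last byte is not valid UTF-8
recovered = buf.switch("latin-1")
```

Re-encoding `guessed` would not reproduce the input bytes, so the decoded buffer alone is not enough to recover from a wrong first guess.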

Regards,
Martin


Re: [Python-Dev] 2.5 release schedule

2006-02-15 Thread Martin v. Löwis
Neal Norwitz wrote:
> What do people think about that?  There are still a lot of features we
> want to add.  Is this ok with everyone?  Do you think it's realistic?

My view on schedules is that they need to exist, whether they are
followed or not. So having one is orders of magnitude better than
having none. This specific one "looks right" also.

> We still need a release manager.  No one has heard from Anthony.  If
> he isn't interested is someone else interested in trying their hand at
> it?

He might be on vacation, no need to worry yet. If he doesn't want to
do it, I would.

Regards,
Martin


Re: [Python-Dev] 2.5 PEP

2006-02-15 Thread Alain Poirier
Hi,

2 questions:

  - is (c)ElementTree still planned for inclusion ?
  - isn't the current implementation of itertools.tee (cache of previous
generated values) incompatible with the new possibility to feed a
generator (PEP 342) ?

Regards

Neal Norwitz wrote:
> Attached is the 2.5 release PEP 356.  It's also available from:
> http://www.python.org/peps/pep-0356.html
>
> Does anyone have any comments?  Is this good or bad?  Feel free to
> send me comments.
>
> We need to ensure that PEPs 308, 328, and 343 are implemented.  We
> have possible volunteers for 308 and 343, but not 328.  Brett is doing
> 352 and Martin is doing 353.
>
> We also need to resolve a bunch of other implementation details about
> providing the C AST to Python, bdist_* issues and a few more possible
> stdlib modules.  Don't be shy, tell the world what you think about
> these.
>
> Can someone go through PEP 4 and 11 and determine what work needs to be
> done?
>
> The more we distribute the work, the easier it will be on everyone.
> You don't really want to listen to me whine any more do you? ;-)
>
> Thank you,



Re: [Python-Dev] bdist_* to stdlib?

2006-02-15 Thread Nick Coghlan
Bob Ippolito wrote:
> ** The exception is scripts.  Scripts go wherever --install-scripts=  
> point to, and AFAIK there is no means to ensure that the scripts from  
> one egg do not interfere with the scripts for another egg or anything  
> else on the PATH.  I'm also not sure what the uninstallation story  
> with scripts is.

Hopefully PEP 338 will go some way towards fixing that - in Python 2.5, the 
'-m' switch should be able to run modules inside eggs as scripts, reducing the 
need to install them directly into the filesystem.
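
For illustration, the standard-library `runpy` module (which came out of PEP 338) can run a module that lives inside a zip archive. This is a sketch assuming modern Python 3; a plain zip file stands in for an egg, and the archive name, module name and helper function are all invented.

```python
import os
import runpy
import sys
import tempfile
import zipfile


def run_module_from_archive(archive_path, module_name):
    """Run module_name from a zip archive roughly as `python -m` would."""
    sys.path.insert(0, archive_path)
    try:
        # run_name="__main__" makes the module believe it is the script.
        return runpy.run_module(module_name, run_name="__main__")
    finally:
        sys.path.remove(archive_path)


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo_archive.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("hello_tool.py", "RESULT = 'ran as ' + __name__\n")
    result = run_module_from_archive(archive, "hello_tool")
```

Nothing is copied onto the filesystem beyond the archive itself, which is the property being hoped for above.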

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


[Python-Dev] C AST to Python discussion

2006-02-15 Thread Brett Cannon
As per Neal's prodding email, here is a thread to discuss where we
want to go with the C AST to Python stuff and what I think are the
core issues at the moment.

First issue is the ast-objects branch.  Work is being done on it, but
it still leaks some references (Neal or Martin can correct me if I am
wrong).  We really should choose either this branch or the current
solution before really diving into coding stuff for exposing the AST
so as to not waste too much time.  Basically the issues are that the
current solution will require using a serialization form to go from C
to Python and back again.  The PyObjects solution in the branch won't
need this.  One protects us from ending up with an unusable AST since
the serialization can keep the original AST around and if the version
passed back in from Python code is junk it can be tossed and the
original version used.  The PyObjects branch most likely won't have
this since the actual AST will most likely be passed to Python code. 
But there are performance issues with all of this serialization compared
to a simple PyObject pointer into Pythonland.  Jeremy supports the
serialization option.  I am personally indifferent while leaning
towards the serialization.

Then there is the API.  First we need to decide if AST modification is
allowed or not.  It has been argued on my blog by someone (see
http://sayspy.blogspot.com/2006/02/possibilities-of-ast.html for the
entry on this whole topic which highly mirrors this email) that Guido
won't okay AST transformations since it can lead to control flow
changes behind the scenes.  I say that is fine as long as it is
sufficiently obvious that AST transformations are occurring.  I say
allow transformations.

Once that is settled, I see three places for possible access to the
AST.  One is the command line like -m.  Totally obvious to the user as
long as they are not just working off of the .pyc files.  Next is
something like sys.ast_transformations that is a list of functions
that are passed in the AST (and return a new version if modifications
are allowed).  This could allow chaining of AST transformations by
feeding the output of one function to the next.  Next is per-object AST
access.  This could get expensive since if we don't keep a copy of the
AST with the code objects (which we probably shouldn't since that is
wasted memory if the AST is not used a lot) we will need to read the
code a second time to get the AST regenerated.
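
A rough sketch of the chaining idea, written against the `ast` module as it exists in modern Python 3 (not the API under discussion here). `sys.ast_transformations` was only a proposal, so a module-level list named `ast_transformations` stands in for it; the transformer and helper names are likewise hypothetical.

```python
import ast

# Hypothetical stand-in for the proposed sys.ast_transformations list.
ast_transformations = []


class DoubleIntegers(ast.NodeTransformer):
    """Toy transformation: double every integer constant."""

    def visit_Constant(self, node):
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(node.value * 2), node)
        return node


def double_integers(tree):
    return DoubleIntegers().visit(tree)


def compile_with_transformations(source, filename="<sketch>"):
    tree = ast.parse(source, filename)
    # Chain: each function receives the tree returned by the previous one.
    for transform in ast_transformations:
        tree = transform(tree)
    ast.fix_missing_locations(tree)
    return compile(tree, filename, "exec")


ast_transformations.append(double_integers)
namespace = {}
exec(compile_with_transformations("x = 21"), namespace)
```

A read-only tool in the style of PyChecker would simply return the tree it was given.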

I personally think we should choose an initial global access API to
the AST as a starting API.  I like the sys.ast_transformations idea
since it is simple and gives enough access that whether read-only or
read-write is allowed something like PyChecker can get the access it
needs.  It also allows for simple Python scripts that can install the
desired functions and then compile or check the passed-in files. 
Obviously write access would be needed for optimization stuff (such
as if the peepholer was rewritten in Python and used by default), but
we can also expose this later if we want.

In terms of 2.5, I think we really need to settle on the fate of the
ast-objects branch.  If we can get the very basic API for exposing the
AST to Python code in 2.5 that would be great, but I don't view that
as being as critical as settling on the final AST implementation style since
wasting work on a version that will disappear would just plain suck. 
It would be great to resolve this before the PyCon sprints since a
good chunk of the AST-caring folk will be there for at least part of
the time.

-Brett


Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Adam Olsen
On 2/15/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Adam Olsen wrote:
> > (I wonder if maybe they should be an error in 2.x as well.  Source
> > encoding is for unicode literals, not str literals.)
>
> Source encoding applies to the entire source code, including (byte)
> string literals, comments, identifiers, and keywords. IOW, if you
> declare your source encoding is utf-8, the keyword "print" must
> be represented with the bytes that represent the Unicode letters
> for "p","r","i","n", and "t" in UTF-8.

Although it does apply to the entire source file, I think this is more
for convenience (try telling an editor that only a single line is
Shift_JIS!) than to allow 8-bit (or 16-bit?!) str literals.  Indeed,
you could have arbitrary 8-bit str literals long before the source
encoding was added.  Keywords and identifiers continue to be limited
to ascii characters (even if they make a roundtrip through other
encodings), and comments continue to be ignored.

Source encoding exists so that you can write u"123" with the encoding
stated once at the top of the file, rather than "123".decode('utf-8')
with the encoding repeated everywhere.

Making it an error to have 8-bit str literals in 2.x would help
educate the user that they will change behavior in 3.0 and not be
8-bit str literals anymore.

--
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-15 Thread Stephen J. Turnbull
> "M" == "M.-A. Lemburg" <[EMAIL PROTECTED]> writes:

M> James Y Knight wrote:

>> Nice and simple.

M> Albeit, too simple.

M> The above approach would basically remove the possibility to
M> easily create bytes() from literals in Py3k, since literals in
M> Py3k create Unicode objects, e.g. bytes("123") would not work
M> in Py3k.

No, it just rules out a builtin easy way to create bytes() from
literals.

But who needs to do that?  codec writers and people implementing wire
protocols with bytes() that look like character strings but aren't.
OK, so this makes life hard on codec writers.  But those implementing
wire protocols can use existing codecs, presumably 'ascii' will do 99%
of the time:

    def make_wire_token(unicode_string, encoding='ascii'):
        # Encode explicitly; 'ascii' covers most wire-protocol tokens.
        return bytes(unicode_string.encode(encoding))

Everybody else is just asking for trouble by using bytes() for
character strings.  It would really be desirable to have "string" be a
Unicode literal in Py3k, and u"string" a syntax error.

M> To prevent [people from learning to write "bytes('string')" in
M> 2.x and expecting that to work in Py3k], you'd have to rule out
M> bytes() construction from strings altogether, which doesn't
M> look like a viable option either.

Why not?  Either bytes() are the same as strings, in which case why
change the name? or they're not, in which case we ask people to jump
through the required hoops to create them.  Maybe I'm missing some
huge use case, of course, but it looks to me like the use cases are
pretty specialized, and are likely to involve explicit coding anyway.

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
   Ask not how you can "do" free software business;
  ask what your business can "do for" free software.


Re: [Python-Dev] Please comment on PEP 357 -- adding nb_index slot to PyNumberMethods

2006-02-15 Thread Nick Coghlan
Travis E. Oliphant wrote:
> 3) A new C-API function PyNumber_Index will be added with signature
> 
>Py_ssize_t PyNumber_index (PyObject *obj)
> 

There's a typo in the function name here. Other than that, the PEP looks 
pretty much fine to me.

About the only other quibble is that it could arguably do with a link to the 
thread where we discussed (and discarded) 'discrete' and 'ordinal' as 
alternative names (you mention the discussion, but don't give a reference).
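
For readers unfamiliar with the PEP, the slot is reached from Python code through the `__index__` special method. The sketch below assumes modern Python 3; the class `Ordinal` is made up for the example.

```python
import operator


class Ordinal:
    """Toy integer-like object usable wherever an index is required."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Called by indexing, slicing and operator.index().
        return self.value


letters = ["a", "b", "c", "d"]
picked = letters[Ordinal(2)]
sliced = letters[Ordinal(1):Ordinal(3)]
as_int = operator.index(Ordinal(3))
```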

Cheers,
Nick.


-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] 2.5 release schedule

2006-02-15 Thread Tony Meyer
> We still need a release manager.  No one has heard from Anthony.

It is the peak of the summer down here.  Perhaps he is lucky enough  
to be enjoying it away from computers for a while?

=Tony.Meyer


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-15 Thread Just van Rossum
Guido van Rossum wrote:

> If bytes support the buffer interface, we get another interesting
> issue -- regular expressions over bytes. Brr.

We already have that:

  >>> import re, array
  >>> re.search('\2', array.array('B', [1, 2, 3, 4])).group()
  array('B', [2])
  >>> 

Not sure whether to blame array or re, though...
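
For comparison, a sketch of the same kind of search in modern Python 3, where a bytes pattern can be applied to any object exposing the buffer interface. Only match positions are recorded here, since the type returned by `group()` is not something this sketch relies on.

```python
import array
import re

pattern = re.compile(rb"\x02")

buffers = (
    bytes([1, 2, 3, 4]),
    bytearray([1, 2, 3, 4]),
    array.array("B", [1, 2, 3, 4]),
)

# Each buffer contains the byte 0x02 at offset 1.
spans = [pattern.search(buf).span() for buf in buffers]
```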

Just


Re: [Python-Dev] C AST to Python discussion

2006-02-15 Thread Greg Ewing
Brett Cannon wrote:
> One protects us from ending up with an unusable AST since
> the serialization can keep the original AST around and if the version
> passed back in from Python code is junk it can be tossed and the
> original version used.

I don't understand why this is an issue. If Python code
produces junk and tries to use it as an AST, then it's
buggy and deserves what it gets. All the AST compiler
should be responsible for is to try not to crash the
interpreter under those conditions. But that's true
whatever method is used for passing ASTs from Python
to the compiler.





Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Nick Coghlan
Bob Ippolito wrote:
> On Feb 14, 2006, at 4:17 PM, Guido van Rossum wrote:
>> (Why would you even think about views here? They are evil.)
> 
> I mention views because that's what numpy/Numeric/numarray/etc.  
> do...  It's certainly convenient at times to have that functionality,  
> for example, to work with only the alpha channel in an RGBA image.   
> Probably too magical for the bytes type.

The key difference between numpy arrays and normal sequences is that the 
length of a sequence can change, but the shape of a numpy array is essentially 
fixed.

So view behaviour can be reserved for a dimensioned array type (if the numpy 
folks ever find the time to finish writing their PEP. . .)
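
To make the alpha-channel example concrete, here is a sketch using `memoryview` from modern Python 3 as a stand-in for a numpy-style view; it is not the bytes type being designed in this thread. It also shows the fixed-shape constraint: the underlying buffer refuses to change length while a view is exported.

```python
# Two RGBA pixels stored as a flat, mutable byte sequence.
pixels = bytearray([10, 20, 30, 255, 40, 50, 60, 128])

# A strided view over every fourth byte, i.e. the alpha channel.
alpha = memoryview(pixels)[3::4]
before = alpha.tolist()

# Writing through the view changes the original buffer.
alpha[1] = 0

# While the view exists, the buffer may not be resized.
try:
    pixels.append(0)
    resized = True
except BufferError:
    resized = False
```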

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-15 Thread Bengt Richter
On Tue, 14 Feb 2006 12:31:07 -0700, Neil Schemenauer <[EMAIL PROTECTED]> wrote:

>On Mon, Feb 13, 2006 at 08:07:49PM -0800, Guido van Rossum wrote:
>> On 2/13/06, Neil Schemenauer <[EMAIL PROTECTED]> wrote:
>> > "\x80".encode('latin-1')
>> 
>> But in 2.5 we can't change that to return a bytes object without
>> creating HUGE incompatibilities.
>
>People could spell it bytes(s.encode('latin-1')) in order to make it
>work in 2.X.  That spelling would provide a way of ensuring the type
>of the return value.
UIAM spelling it
    bytes(map(ord, s))
or
    bytes(s)  # (bytes would do above internally)

would work for str or unicode and would be forward compatible.
or
    bytes(s, encoding_name) # if standard mapping is not desired

BTW, ord(u'x') has the effect of u'x'.encode('latin-1')
Note:
 >>> s256 = ''.join(chr(i) for i in xrange(256))
 >>> assert s256.decode('latin-1') == u''.join(unichr(ord(c)) for c in s256)
 >>> assert map(ord, s256.decode('latin-1')) == map(ord, s256) == range(256)

But this does *not* mean bytes has an implicit encoding!! It just means
there is a useful 1:1 mapping between the possible bytes values and the
first 256 unicode *characters*, remembering that the latter are *characters*
quite apart from whatever encoding the code source may have.

This is a nice safe 1:1 abstract correspondence ISTM.
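
The same correspondence can be checked in modern Python 3, where the `bytes` type now exists; this is a sketch of the 1:1 mapping only, not of the API proposed in this message.

```python
all_bytes = bytes(range(256))

# Decoding as latin-1 maps byte value n to the character with ordinal n.
text = all_bytes.decode("latin-1")
ordinals = [ord(c) for c in text]

# Both directions round-trip without loss.
via_encode = text.encode("latin-1")
via_ord = bytes(map(ord, text))
```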
>
>> You missed the part where I said that introducing the bytes type
>> *without* a literal seems to be a good first step. A new type, even
>> built-in, is much less drastic than a new literal (which requires
>> lexer and parser support in addition to everything else).
>
>Are you concerned about the implementation effort?  If so, I don't
>think that's justified since adding a new string prefix should be
>pretty straightforward (relative to rest of the effort involved).
>Are you comfortable with the proposed syntax?
>

I'm -1 on special literal at this point. I think a special text-like literal
would be misleading, because it suggests that bytes is somehow in the
string family of types, which IMO it really isn't.
IMO it's semantically more of a builtin array.array('B').

If we adopt the ord/unichr mappings for strings to/from bytes, and
of course init also from a suitable integer sequence, we AGNI, I think.

Using non-ascii non-escaped characters in string literals for specifying
str ord values (as opposed to characters) is bad practice, but escaped
ascii-in-whatever-source-encoding and 
native_literal_in_source_encoding.decode(source_encoding)
seem to work:

 >>> for enc in 'cp437 latin-1 utf-8'.split():
 ... print '\n< %s >'%enc
 ... print mkretesc(enc, 0xf6)[1].decode(enc)
 ... print repr(mkretesc(enc, 0xf6)[1])
 ... print mkretesc(enc, 0xf6)[0]()
 ... t = mkretesc(enc, 0xf6)[0]()
 ... print t[0], t[1], t[2]
 ... print
 ...
 
 < cp437 >
 # -*- coding: cp437 -*-
 def foof6(): return '\xf6', 'ö', 'ö'.decode('cp437')

 "# -*- coding: cp437 -*-\ndef foof6(): return '\\xf6', '\x94', '\x94'.decode('cp437')\n"
 ('\xf6', '\x94', u'\xf6')
 ÷ ö ö
 
 
 < latin-1 >
 # -*- coding: latin-1 -*-
 def foof6(): return '\xf6', 'ö', 'ö'.decode('latin-1')

 "# -*- coding: latin-1 -*-\ndef foof6(): return '\\xf6', '\xf6', '\xf6'.decode('latin-1')\n"
 ('\xf6', '\xf6', u'\xf6')
 ÷ ÷ ö
 
 
 < utf-8 >
 # -*- coding: utf-8 -*-
 def foof6(): return '\xf6', 'ö', 'ö'.decode('utf-8')

 "# -*- coding: utf-8 -*-\ndef foof6(): return '\\xf6', '\xc3\xb6', '\xc3\xb6'.decode('utf-8')\n"
 
 ('\xf6', '\xc3\xb6', u'\xf6')
 ÷ +¦ ö
 
The source looks the same viewed as characters, but you can see the differences
in the repr values.
But the consequence of source-encoding ord values determining str values is
that if e.g. you imported this foo function from variously encoded sources,
only the escaped and unicode values have the proper ord value.
The middle one comes from the native literal source encoding.

So until str becomes unicode, ascii or ascii escapes are a must for
ord-specifying. After str becomes unicode, escapes will still work, but the
unichr/ord symmetry will allow using the full first 256 unicode characters
to specify byte type values if desired. (This happens to correspond to
latin-1, but don't mention it ;-)

It would make possible a round-trippable repr as bytes('...')
using ascii+escaped ascii, and full-256 unicode string literals
backwards-compatibly after py3k.
Have I missed a pitfall? Hope the output got through to your screen. The
first and last in the 3-character lines should always be division sign and
umlaut o. The problematical middle ones should be cp437 translations of the
middle hex values, since that is the screen I copied from (umlaut o,
division sign, and plus, vertical_bar for the translation of the utf-8
encoding pair). That last one illustrates the problem of returning a
"character" encoded in utf-8 while thinking of a single-byte ord value.

BTW, should bytes be freezable?

Regards,
Bengt Richter


Re: [Python-Dev] how to upload new MacPython web page?

2006-02-15 Thread Thomas Wouters
On Tue, Feb 14, 2006 at 09:32:09PM -0800, Bill Janssen wrote:
> We (the pythonmac-sig mailing list) seem to have converged (almost --
> still talking about the logo) on a new download page for MacPython, to
> replace the page currently at
> http://www.python.org/download/download_mac.html.  The strawman can be
> seen at http://bill.janssen.org/mac/new-macpython-page.html.
> 
> How do I get the bits changed on python.org (when we're finished)?

[EMAIL PROTECTED] is probably the right email address (although most of
them are on here as well.)

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


Re: [Python-Dev] C AST to Python discussion

2006-02-15 Thread Nick Coghlan
Greg Ewing wrote:
> Brett Cannon wrote:
>> One protects us from ending up with an unusable AST since
>> the serialization can keep the original AST around and if the version
>> passed back in from Python code is junk it can be tossed and the
>> original version used.
> 
> I don't understand why this is an issue. If Python code
> produces junk and tries to use it as an AST, then it's
> buggy and deserves what it gets. All the AST compiler
> should be responsible for is to try not to crash the
> interpreter under those conditions. But that's true
> whatever method is used for passing ASTs from Python
> to the compiler.

I'd prefer the AST nodes be real Python objects. The arena approach seems to be
working reasonably well, but I still don't see a good reason for using a
specialised memory allocation scheme when it really isn't necessary and we
have a perfectly good memory management system for PyObjects.

On the 'unusable AST' front, if AST transformation code creates illegal 
output, then the main thing is to raise an exception complaining about what's 
wrong with it. I believe that may need a change to the compiler whether the 
modified AST was serialised or not.

In terms of reverting back to the untransformed AST if the transformation 
fails, then that option is up to the code doing the transformation. Instead of 
serialising all the time (even for cases where the AST is just being inspected 
instead of transformed), we can either let the AST objects support the 
copy/deepcopy protocol, or else provide a method to clone a tree before trying 
to transform it.
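
The clone-then-transform idea can be sketched with today's `ast` module, whose nodes do support `copy.deepcopy`. This assumes modern Python 3; the helper names are hypothetical, and the caller rather than the compiler decides to fall back to the untouched tree.

```python
import ast
import copy


def compile_with_fallback(source, transform, filename="<sketch>"):
    """Return (code, used_transform); fall back to the original AST."""
    original = ast.parse(source, filename)
    candidate = copy.deepcopy(original)  # transform a clone, keep the original
    try:
        candidate = transform(candidate)
        ast.fix_missing_locations(candidate)
        return compile(candidate, filename, "exec"), True
    except (TypeError, ValueError):
        # The compiler rejected the transformed tree with an exception.
        return compile(original, filename, "exec"), False


def broken_transform(tree):
    tree.body.append("not an AST node")  # deliberately produces junk
    return tree


def identity_transform(tree):
    return tree


code, used = compile_with_fallback("x = 1", broken_transform)
namespace = {}
exec(code, namespace)
```

No serialisation is involved; the cost of the copy is paid only by code that actually intends to modify the tree.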

A unified representation means we only have one API to learn, that is 
accessible from both Python and C. It also eliminates any need to either 
implement features twice (once in Python and once in C) or else let the Python 
and C APIs diverge to the point where what you can do with one differs from
what you can do with the other.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] 2.5 PEP

2006-02-15 Thread Thomas Wouters
On Tue, Feb 14, 2006 at 09:58:46PM -0800, Neal Norwitz wrote:

> We need to ensure that PEPs 308, 328, and 343 are implemented.  We
> have possible volunteers for 308 and 343, but not 328.  Brett is doing
> 352 and Martin is doing 353.

I can volunteer for 328 if no one else wants it, I've messed with the import
mechanism before (and besides, it's fun.) I've also written an unfinished
308 implementation to get myself acquainted with the AST code more.
'Unfinished' means that it works completely, except for some cases of
ambiguous syntax. I can fix that in a few days if the deadline nears and
there's no working patch.

(Naively adding if/else expressions broke list comprehensions with an 'if'
clause, and fixing that broke list comprehensions with 'for x in lambda:0,
lambda:1', and fixing that broke list comprehensions altogether... I added
"clean up Grammar file" to the PyCon core sprint topics for that reason. I
guess 308 wasn't as much a trainer implementation as people thought ;) The
syntax part of 328 is probably easier, but the rest isn't.)

> Access to C AST from Python

If this still needs work when I finish grokking the AST code and the PyObj
branch of it, I can help.

I should have more than enough spare time to finish these things before
alpha 1.

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


Re: [Python-Dev] C AST to Python discussion

2006-02-15 Thread Thomas Wouters
On Wed, Feb 15, 2006 at 07:28:36PM +1000, Nick Coghlan wrote:

> On the 'unusable AST' front, if AST transformation code creates illegal
> output, then the main thing is to raise an exception complaining about
> what's wrong with it. I believe that may need a change to the compiler
> whether the modified AST was serialised or not.

I would personally prefer the AST validation to be a separate part of the
compiler. It means the one or the other can be out of sync, but it also
means it can be accessed directly (validating AST before sending it to the
compiler) and the compiler (or CFG generator, or something between AST and
CFG) can decide not to validate internally generated AST for non-debug
builds, for instance.

I like both those reasons.

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Nick Coghlan
Guido van Rossum wrote:
> But somehow I still like the 'open' verb. It has a long and rich
> tradition. And it also nicely conveys that it is a factory function
> which may return objects of different types (though similar in API)
> based upon either additional arguments (e.g. buffering) or the
> environment (e.g. encodings) or even inspection of the file being
> opened.

If we went with longer names, a slight variation on the opentext/openbinary 
idea would be to use opentext and opendata.

That is, "give me something that looks like a text file (it contains 
characters)", or "give me something that looks like a data file (it contains 
bytes)".

"opentext" would map to "codecs.open" (that is, accepting an encoding argument)

"opendata" would map to the standard "open", but with the 'b' in the mode 
string added automatically.

So the mode choices common to both would be:

   'r'/'w'/'a'   - read/write/append (default 'r')
   ''/'+'        - update (IOError if file does not already exist) (default '')

opentext would allow the additional option:
   ''/'U'        - universal newlines (default '')

Neither of them would accept a 'b' in the mode string.
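
The mapping described above can be sketched on top of the built-in open() of current Python 3. The names opentext and opendata come from this message; the helper `_check_mode`, the UTF-8 default encoding and the use of the `newline` argument to model the 'U' flag are assumptions made only for this illustration.

```python
import os
import tempfile


def _check_mode(mode, extra=""):
    """Accept r/w/a, an optional '+', and any flags listed in `extra`."""
    allowed = set("rwa+" + extra)
    if not mode or mode[0] not in "rwa" or not set(mode) <= allowed:
        raise ValueError("invalid mode: %r" % (mode,))


def opendata(path, mode="r"):
    """Hypothetical: a file of bytes; the 'b' is added for the caller."""
    _check_mode(mode)
    return open(path, mode + "b")


def opentext(path, mode="r", encoding="utf-8"):
    """Hypothetical: a file of characters, decoded with `encoding`."""
    _check_mode(mode, extra="U")
    universal = "U" in mode
    # newline=None enables newline translation; "" leaves line ends alone.
    return open(path, mode.replace("U", ""), encoding=encoding,
                newline=None if universal else "")


path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with opentext(path, "w") as f:
    f.write("caf\u00e9\r\n")
with opendata(path) as f:
    raw = f.read()          # bytes, exactly as stored on disk
with opentext(path, "rU") as f:
    text = f.read()         # characters, with "\r\n" folded to "\n"
```

Passing a 'b' to either function raises ValueError, which matches the rule that neither accepts it in the mode string.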

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] http://www.python.org/dev/doc/devel still available

2006-02-15 Thread Tim Parkin
Guido van Rossum wrote:

> (Now that I work for Google I realize more than ever before the
> importance of keeping URLs stable; PageRank(tm) numbers don't get
> transferred as quickly as contents. I have this worry too in the
> context of the python.org redesign; 301 permanent redirect is *not*
> going to help PageRank of the new page.)
Hi Guido,

Could you expand on why 301 redirects won't help with the transfer of
page rank (if you're allowed)? We've done exactly this on many sites and
the pagerank (or more relevantly the search rankings on specific terms)
has transferred almost overnight. The bigger pagerank updates (both
algorithm changes and overhauls in approach) seem to only happen every
few months and these also seem to take notice of 301 redirects (they
generally clear up any supplemental results).

The addition of docs.python.org was also intended (I thought) to be
used in the Google customised search (the Google page you go to when you
search from python.org). I'm not sure if that got lost in implementation,
but the idea was that the Google box would have a radio button for
docs.python.org.

I agree that docs.python.org should only hold the current documentation,
but what about the large number of people who use 2.3 as standard?
Perhaps something like docs23.python.org makes sense.

In terms of pagerank for the different versions of the docs, would it
make sense to 'hide' the older versions of the docs with a noindex so
that general Google searches only return the current docs?

(Google seems to have a policy of giving 'long standing' links a higher
pagerank weighting, hence older versions of the Python docs ranking
higher.) Keeping a single 'current' set of docs and having all inbound
links point to it (e.g. docs.python.org) will therefore gradually build
up the search ranking.

+1 on docs.python.org only containing current (with the caveat that
there be an equivalent for users of specific versions, e.g. 2.3 users)

Tim Parkin

p.s. All my knowledge of how Google works is gained through personal
research, so the terminology, techniques and results may be completely
wrong (and may also vary from time to time) - however they do reflect
direct experience.

p.p.s regarding 'site:', 'allinurl:' and other google modifiers; It
would seem a good idea to create a single page that helped site users
make such searches without having to learn how the modifiers work.

It should perhaps be noted that you can also add 'temporary redirects'
(302s), which Google takes to mean "leave the original search
results in place". This has also worked for us (old urls remain the same
as far as Google is concerned).


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-15 Thread Ron Adam
Greg Ewing wrote:
> Ron Adam wrote:
> 
>> My first impression and thoughts were:  (and seems incorrect now)
>>
>>  bytes(object) ->  byte sequence of objects value
>>
>> Basically a "memory dump" of objects value.
> 
> As I understand the current intentions, this is correct.
> The bytes constructor would have two different signatures:
> 
> (1)   bytes(seq) --> interprets seq as a sequence of
>  integers in the range 0..255,
>  exception otherwise
> 
> (2a)  bytes(str, encoding) --> encodes the characters of
> (2b)  bytes(unicode, encoding) the string using the specified
>encoding
> 
> In (2a) the string would be interpreted as containing
> ascii characters, with an exception otherwise. In 3.0,
> (2a) will disappear leaving only (1) and (2b).
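
For comparison, signatures (1) and (2b) can be tried against the bytes type as it later shipped in Python 3. This shows the eventual behaviour, not the 2006 proposal itself, and the helper `try_bytes` exists only for this illustration.

```python
def try_bytes(*args):
    """Return the bytes object, or the name of the exception raised."""
    try:
        return bytes(*args)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


from_ints = try_bytes([104, 105])       # (1) sequence of integers 0..255
from_text = try_bytes("hi", "ascii")    # (2b) text plus an explicit encoding
out_of_range = try_bytes([256])         # integer outside 0..255 is rejected
no_encoding = try_bytes("hi")           # text without an encoding is rejected
```

The two failure cases are the ones the quoted signatures call out: an out-of-range integer and a string with no encoding to interpret it by.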

I was presuming it would be done in C code and would just need a 
pointer to the first byte, memchr(), and then read n bytes directly into 
a new memory range via memcpy(). But I don't know if that's possible 
with Python's object model.  (My C skills are a bit rusty as well.)

However, if it's done with a Python iterator and then each item is 
translated to bytes in a sequence (much slower), an encoding will need 
to be known for it to work correctly.  Unfortunately Unicode strings 
don't carry an attribute indicating their own encoding, so bytes() can't 
just do encoding = s.encoding to find out; it would need to be specified 
in this case.

And that should give you a byte object that is equivalent to the bytes 
in memory, providing Python doesn't compress data internally to save 
space. (?, I don't think it does)

I'd prefer the first version *if possible* because of the performance.

>> And I was thinking a bytes argument of more than one item would indicate 
>> a byte sequence.
>>
>>  bytes(1,2,3)  ->  bytes([1,2,3])
> 
> But then you have to test the argument in the one-argument
> case and try to guess whether it should be interpreted as
> a sequence or an integer. Best to avoid having to do that.

Yes, I agree.

>> Which is fine... so ???
>>
>> b = bytes(0L) ->  bytes([0,0,0,0])
> 
> No, bytes(0L) --> TypeError because 0L doesn't implement
> the iterator protocol or the buffer interface.

It wouldn't need it if it was a direct C memory copy.

> I suppose long integers might be enhanced to support the
> buffer interface in 3.0, but that doesn't seem like a good
> idea, because the bytes you got that way would depend on
> the internal representation of long integers. In particular,

Since some longs will be of different length, yes a bytes(0L) could give 
differing results on different platforms, but it will always give the 
same result on the platform it is run on. I actually think this is a 
plus and not a problem. If you are using Python to implement a byte 
interface you need to *know* it is different, not have it hidden.

 bytesize = len(bytes(0L))  # find how long a long is


Cheers,
   Ronald Adam




Re: [Python-Dev] C AST to Python discussion

2006-02-15 Thread Nick Coghlan
Thomas Wouters wrote:
> On Wed, Feb 15, 2006 at 07:28:36PM +1000, Nick Coghlan wrote:
> 
>> On the 'unusable AST' front, if AST transformation code creates illegal
>> output, then the main thing is to raise an exception complaining about
>> what's wrong with it. I believe that may need a change to the compiler
>> whether the modified AST was serialised or not.
> 
> I would personally prefer the AST validation to be a separate part of the
> compiler. It means the one or the other can be out of sync, but it also
> means it can be accessed directly (validating AST before sending it to the
> compiler) and the compiler (or CFG generator, or something between AST and
> CFG) can decide not to validate internally generated AST for non-debug
> builds, for instance.
> 
> I like both those reasons.

Aye, I was thinking much the same thing.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


[Python-Dev] Generalizing *args and **kwargs

2006-02-15 Thread Thomas Wouters

I've been thinking about generalization of the *args/**kwargs syntax for
quite a while, and even though I'm pretty sure Guido (and many people) will
consider it overgeneralization, I am finally going to suggest it. This whole
idea is not something dear to my heart, although I obviously would like to
see it happen. If the general vote is 'no', I'll write a small PEP or add it
to PEP 13 and be done with it.

The grand total of the generalization would be something like this:

Allow 'unpacking' of arbitrary iterables in sequences:
>>> iterable = (1, 2)
>>> ['a', 'b', *iterable, 'c']
['a', 'b', 1, 2, 'c']
>>> ('a', 'b', *iterable, 'c')
('a', 'b', 1, 2, 'c')

Possibly also allow 'unpacking' in list comprehensions and genexps:
>>> [ *subseq for subseq in [(1, 2), (3, 4)] ]
[1, 2, 3, 4]

(You can already do this by adding an extra 'for' loop inside the LC)
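
The extra 'for' loop mentioned above looks like this in a list comprehension that runs today; the variable names are illustrative only.

```python
pairs = [(1, 2), (3, 4)]

# The inner "for" walks each subsequence, flattening one level.
flat = [item for subseq in pairs for item in subseq]
```

The loops read left to right in the same order as the equivalent nested 'for' statements.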

Allow 'unpacking' of mapping types (anything supporting 'items' or
'iteritems') in dictionaries:
>>> args = {'verbose': 1}
>>> defaults = {'verbose': 0}
>>> {**defaults, **args, 'fixedopt': 1}
{'verbose': 1, 'fixedopt': 1}

Allow 'packing' in assignment, stuffing left-over items in a list.
>>> a, b, *rest = range(5)
>>> a, b, rest
(0, 1, [2, 3, 4])
>>> a, b, *rest = range(2)
>>> a, b, rest
(0, 1, [])

(A list because you can't always take the type of the RHS and it's the right
Python type for 'an arbitrary length homogeneous sequence'.)

While generalizing that, it may also make sense to allow:

>>> def spam(*args, **kwargs):
... return args, kwargs
... 
>>> args = (1, 2); kwargs = {'eggs': 'no'}
>>> spam(*args, 3)
((1, 2, 3), {})
>>> spam(*args, 3, **kwargs, spam='extra', eggs='yes')
((1, 2, 3), {'spam': 'extra', 'eggs': 'yes'})

(In spite of the fact that both are already possible by fiddling args/kwargs
beforehand or doing '*(args + (3,))'.)
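
For reference, a small sketch of the "fiddling beforehand" route, which needs no new syntax. The function spam is the one from the session above; the name `merged` is just for this illustration.

```python
def spam(*args, **kwargs):
    return args, kwargs


args = (1, 2)
kwargs = {"eggs": "no"}

# Positional: build the final tuple before the call.
positional = spam(*(args + (3,)))

# Keyword: copy the dict and update it before the call.
merged = dict(kwargs)
merged.update(spam="extra", eggs="yes")
both = spam(*(args + (3,)), **merged)
```

The results match what the proposed call syntax is meant to produce, at the cost of the extra setup lines.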

Maybe it also makes sense on the defining side, particularly for keyword
arguments to indicate 'keyword-only arguments'. Maybe with a '**' without a
name attached:

>>> def spam(pos1, pos2, **, kwarg1=.., kwarg2=..)

But I dunno yet.

Although I've made it look like I have a working implementation, I haven't.
I know exactly how to do it, though, except for the AST part ;) Once I
figure out how to properly work with the AST code I'll probably write this
patch whether it's a definite 'no' or not, just to see if I can. I wouldn't
mind if people gave their opinion, though.

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-15 Thread Adam Olsen
On 2/15/06, Ron Adam <[EMAIL PROTECTED]> wrote:
> Greg Ewing wrote:
> > Ron Adam wrote:
> >> b = bytes(0L) ->  bytes([0,0,0,0])
> >
> > No, bytes(0L) --> TypeError because 0L doesn't implement
> > the iterator protocol or the buffer interface.
>
> It wouldn't need it if it was a direct C memory copy.
>
> > I suppose long integers might be enhanced to support the
> > buffer interface in 3.0, but that doesn't seem like a good
> > idea, because the bytes you got that way would depend on
> > the internal representation of long integers. In particular,
>
> Since some longs will be of different length, yes a bytes(0L) could give
> differing results on different platforms, but it will always give the
> same result on the platform it is run on. I actually think this is a
> plus and not a problem. If you are using Python to implement a byte
> interface you need to *know* it is different, not have it hidden.
>
>  bytesize = len(bytes(0L))  # find how long a long is

I believe you're confusing a C long with a Python long.  A Python long
is implemented as an array and has variable size.

In any case we already have the struct module:

>>> import struct
>>> struct.calcsize('l')
4
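
A short sketch of the distinction, run on current Python 3, where the old Python long is simply int. The value 4 shown above is for one platform; only the standard-size format "<l" is fixed at 4 bytes everywhere. Variable names are illustrative.

```python
import struct

native_long = struct.calcsize("l")     # size of a C long on this platform
standard_long = struct.calcsize("<l")  # standard size, independent of platform

# A Python integer has no fixed width; it grows with the value.
small_bits = (0).bit_length()
large = 2 ** 100
large_bits = large.bit_length()

packed = struct.pack("<l", 0)          # explicit, fixed-width byte layout
as_bytes = large.to_bytes(13, "little")  # caller picks length and byte order
```

Both struct.pack and int.to_bytes make the width and byte order explicit choices of the caller rather than properties of the interpreter's internal representation.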

--
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] C AST to Python discussion

2006-02-15 Thread Simon Burton
On Wed, 15 Feb 2006 00:34:35 -0800
Brett Cannon <[EMAIL PROTECTED]> wrote:

> As per Neal's prodding email, here is a thread to discuss where we
> want to go with the C AST to Python stuff and what I think are the
> core issues at the moment.
> 
> First issue is the ast-objects branch.  Work is being done on it, but
> it still leaks some references (Neal or Martin can correct me if I am
> wrong).  

I've been doing the heavy lifting on ast-objects the last few weeks.
Today it finally passed the python test suite. The last thing to do is
the addition of XDECREF's, so yes, it is leaking a lot of references.

I won't make it to PyCon (it's a long way for me to come), but gee I've left
all the fun stuff for you to do !
:)

Even if AST transforms are not allowed, I see it as the strongest form of
code reflection, and long over-due in python.

Simon.


-- 
Simon Burton, B.Sc.
Licensed PO Box 8066
ANU Canberra 2601
Australia
Ph. 61 02 6249 6940
http://arrowtheory.com 


Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Stephen J. Turnbull
> "Fred" == Fred L Drake, <[EMAIL PROTECTED]> writes:

Fred> On Tuesday 14 February 2006 22:34, Greg Ewing wrote:

>> Seems to me this is a case where you want to be able to change
>> encodings in the middle of reading the stream.  You start off
>> reading the data as ascii, and once you've figured out the
>> encoding, you switch to that and carry on reading.

Fred> Not quite.  The proper response in this case is often to
Fred> re-start decoding with the correct encoding, since some of
Fred> the data extracted so far may have been decoded incorrectly.
Fred> A very carefully constructed application may be able to go
Fred> back and re-decode any data saved from the stream with the
Fred> previous encoding, but that seems like it would be pretty
Fred> fragile in practice.

I believe GNU Emacs is currently doing this.  AIUI, they save
annotations where the codec is known to be non-invertible (eg, two
charset-changing escape sequences in a row).  I do think this is
fragile, and a robust application really should buffer everything it's
not sure of decoding correctly.

Fred> There may be cases where switching encoding on the fly makes
Fred> sense, but I'm not aware of any actual examples of where
Fred> that approach would be required.

This is exactly what ISO 2022 formalizes: switching encodings on the
fly.
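
A small illustration using Python's bundled iso2022_jp codec (current Python 3; the sample text is arbitrary). The escape sequences in the encoded bytes are the in-stream switches between character sets.

```python
text = "abc \u65e5\u672c\u8a9e xyz"
encoded = text.encode("iso2022_jp")

ESC = b"\x1b"
to_jis = ESC + b"$B"    # switch to JIS X 0208 (two bytes per character)
to_ascii = ESC + b"(B"  # switch back to ASCII

decoded = encoded.decode("iso2022_jp")
```

A decoder therefore has to track which character set is currently selected, which is the kind of mid-stream state change being discussed here.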

mboxes of Japanese mail often contain random and unsignaled encoding
changes.

A terminal emulator may need to switch when logging in to a remote
system.

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of TsukubaTennodai 1-1-1 Tsukuba 305-8573 JAPAN
   Ask not how you can "do" free software business;
  ask what your business can "do for" free software.


Re: [Python-Dev] nice()

2006-02-15 Thread Smith
I am reluctantly posting here since this is of less intense interest than other 
things being discussed right now, but this is related to the areclose proposal 
that was discussed here recently. 

The following discussion ends with things that python-dev might want to 
consider in terms of adding a function that allows something other than the 
default 12- and 17-digit precision representations of numbers that str() and 
repr() give. Such a function (like nice(), perhaps named trim()?) would provide 
a way to convert fp numbers that are being used in comparisons into a precision 
that reflects the user's preference.  

Everyone knows that fp numbers must be compared with caution, but there is a 
void in the relative-error department for exercising such caution, thus the 
proposal for something like 'areclose'. The problem with areclose(), however, 
is that it only solves one part of the problem that needs to be solved if two 
fp's *are* going to be compared: if you are going to check if a < b you would 
need to do something like 

not areclose(a,b) and a < b

With something like trim() (a.k.a nice()) you could do

trim(a) < trim(b)

to get the comparison to 12-digit default precision or arbitrary precision with 
optional arguments, e.g. to 3 digits of precision:

trim(a,3) < trim(b,3)
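
A minimal sketch of what such a function could look like, assuming "precision" means significant digits and borrowing the 12-digit default from str(). The name trim and this particular implementation are hypothetical; nothing like it exists in the standard library.

```python
def trim(x, digits=12):
    """Hypothetical helper: round x to `digits` significant digits."""
    return float("%.*g" % (digits, x))


a = 0.1 * 3
b = 0.3
raw_equal = (a == b)                  # the two floats differ in the last bit
trimmed_equal = (trim(a) == trim(b))  # both round to the same 12-digit value
```

Rounding of this kind only relocates the difficulty: two values that straddle a rounding boundary at the chosen precision still compare unequal.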

From a search of the documentation, I don't see that the name trim() is taken 
yet.

OK, comments responding to Greg follow.


| From: Greg Ewing [EMAIL PROTECTED]
| Smith wrote:
| 
|| computing the bin boundaries for a histogram
|| where bins are a width of 0.1:
|| 
|| >>> for i in range(20):
|| ...  if (i*.1==i/10.)<>(nice(i*.1)==nice(i/10.)):
|| ...   print i,repr(i*.1),repr(i/10.),i*.1,i/10.
| 
| I don't see how that has any relevance to the way bin boundaries
| would be used in practice, which is to say something like
| 
|   i = int(value / 0.1)
|   bin[i] += 1 # modulo appropriate range checks

This is just masking the issue by converting numbers to integers. The fact 
remains that two mathematically equal numbers can have two different internal 
representations with one being slightly larger than the exact integer value and 
one smaller:

>>> a=(23*.1)*10;a
23.000000000000004
>>> b=2.3/.1;b
22.999999999999996
>>> int(a/.1),int(b/.1)
(230, 229)

Part of the answer in this context is to use round() rather than int so you are 
getting to the closest integer.
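
The difference between the two conversions, using the same a and b as in the session above (run on Python 3, where round() returns an int; the tuple names are illustrative):

```python
a = (23 * .1) * 10
b = 2.3 / .1

# int() truncates toward zero, so a value just under 230 becomes 229.
truncated = (int(a / .1), int(b / .1))

# round() goes to the nearest integer, so both land in the same bin.
nearest = (round(a / .1), round(b / .1))
```

This removes the disagreement for values that are meant to sit on a bin boundary, although it also changes which bin values in the middle of an interval are assigned to.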

 
|| For, say, garden variety numbers that aren't full of garbage digits
|| resulting from fp computation, the boundaries computed as 0.1*i are
|| not going to agree with such simple numbers as 1.4 and 0.7.
| 
| Because the arithmetic is binary rather than decimal. But even using
| decimal, you get the same sort of problems using a bin width of
| 1.0/3.0. The solution is to use an algorithm that isn't sensitive
| to those problems, then it doesn't matter what base your arithmetic
| is done in.

Agreed.

| 
|| I understand that the above really is just a patch over the problem,
|| but I'm wondering if it moves the problem far enough away that most
|| users wouldn't have to worry about it.
| 
| No, it doesn't. The problems are not conveniently grouped together
| in some place you can get away from; they're scattered all over the
| place where you can stumble upon one at any time.
| 

Yes, even a simple computation of the wrong type can lead to unexpected 
results. I agree.

|| So perhaps this brings us back to the original comment that "fp
|| issues are a learning opportunity." They are. The question I have is
|| "how soon do they need to run into them?" Is decreasing the likelihood
|| that they will see the problem (but not eliminate it) a good thing
|| for the python community or not?
| 
| I don't think you're doing anyone any favours by trying to protect
| them from having to know about these things, because they *need* to
| know about them if they're not to write algorithms that seem to
| work fine on tests but mysteriously start producing garbage when
| run on real data, possibly without it even being obvious that it is
| garbage.

Mostly I agree, but if you go to the extreme then why don't we just drop 
floating point comparisons altogether and force the programmer to convert 
everything to integers and make their own bias evident (like converting to int 
rather than nearest int)? Or we could drop the fp comparison operators and 
introduce fp comparison functions that require tolerance terms, again making 
the assumptions transparent: 

def lt(x, y, rel_err = 1e-5, abs_err = 1e-8):
    return not areclose(x,y,rel_err,abs_err) and int(x-y)<=0
print lt(a,b,0,1e-10) --> False (they are equal to that tolerance)
print lt(a,b,0,1e-20) --> True (a is less than b at that tolerance)
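
A self-contained sketch of a tolerance-based "less than", under two stated assumptions: areclose() is not defined in this message, so the definition below (within abs_err, or within rel_err of the larger magnitude) is a guess at its intent, and the final test is written as a plain x < y instead of the int(x-y)<=0 form used above. The values of a and b are picked here so that a is the smaller of the two.

```python
def areclose(x, y, rel_err=1e-5, abs_err=1e-8):
    """Assumed definition: close in absolute or in relative terms."""
    diff = abs(x - y)
    return diff <= abs_err or diff <= rel_err * max(abs(x), abs(y))


def lt(x, y, rel_err=1e-5, abs_err=1e-8):
    """x is less than y and the two are not within tolerance."""
    return not areclose(x, y, rel_err, abs_err) and x < y


a = 22.999999999999996
b = 23.000000000000004
loose = lt(a, b, 0, 1e-10)   # difference is below 1e-10: treated as equal
tight = lt(a, b, 0, 1e-20)   # difference exceeds 1e-20: a counts as less
```

With the tolerances spelled out at the call site, the reader can see which notion of "equal" a comparison relies on.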

The fact is, we make things easier and let the programmer shoot themselves in 
the foot if they want to by providing things like fp comparisons and even 
functions like sum that do dumb-sums (though Raymond Hettinger's Python Recipe 
at ASPN provides a smart-sum).

I 

Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Fuzzyman




Adam Olsen wrote:
> On 2/14/06, Just van Rossum <[EMAIL PROTECTED]> wrote:
>> +1 for two functions.
>>
>> My choice would be open() for binary and opentext() for text. I don't
>> find that backwards at all: the text function is going to be more
>> different from the current open() function than the binary function
>> would be since in many ways the str type is closer to bytes than to
>> unicode.
>>
>> Maybe it's even better to use opentext() AND openbinary(), and deprecate
>> plain open(). We could even introduce them at the same time as bytes()
>> (and leave the open() deprecation for 3.0).
>
> Thus providing us with a transition period, even with warnings on use
> of the old function.
  

[snip..]

I personally like the move towards all unicode strings: basically, any
text where you don't know the encoding used is 'random binary data'.
This works fine, so long as you are in control of the text source.
*However*, it leaves the following problem:

The current situation (treating byte-sequences as text and assuming
they are an ascii-superset encoded text-string) *works* (albeit with
many breakages), simply because this assumption is usually correct.

Forcing the programmer to be aware of encodings, also pushes the same
requirement onto the user (who is often the source of the text in
question).

Currently you can read a text file and process it - making sure that
any changes/requirements only use ascii characters. It therefore
doesn't matter what 8 bit ascii-superset encoding is used in the
original. If you force the programmer to specify the encoding in order
to read the file, they would have to pass that requirement onto their
user. Their user is even less likely to be encoding aware than the
programmer.
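
One way to keep this "don't care which 8-bit encoding" style of processing with separate text and bytes types is to decode through latin-1, which maps every byte value to exactly one character and back. This is a sketch on current Python 3; the function name `replace_ascii` and the sample data are made up for the illustration, and the approach assumes an ASCII-superset encoding in which ASCII bytes never appear inside other characters.

```python
def replace_ascii(raw, old, new):
    """Swap one ASCII token for another without knowing the encoding."""
    text = raw.decode("latin-1")   # cannot fail: 256 bytes <-> 256 chars
    return text.replace(old, new).encode("latin-1")


# The caller never says that the data happens to be cp1252.
original = "na\u00efve caf\u00e9: colour".encode("cp1252")
updated = replace_ascii(original, "colour", "color")
```

The non-ASCII bytes pass through untouched, so the file can be written back in whatever encoding it arrived in.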

What this means, is that for simple programs where the programmer
doesn't want to have to worry about encoding, or can't force the user
to be aware, they will read in the file as bytes. Modules will quickly
and inevitably be created implementing all the 'string methods' for
bytes. New programmers will gravitate to these and the old mess will
continue, but with a more awkward hybrid than before. (String
manipulations of byte sequences will no longer be a core part of the
language - and so be harder to use.)

Not sure what we can do to obviate this of course... but is this change
actually going to improve the situation or make it worse?

All the best,

Michael Foord




Re: [Python-Dev] nice()

2006-02-15 Thread Raymond Hettinger
[Smith]
> The following discussion ends with things that python-dev might want to 
> consider in terms of adding a function that allows something other than the 
> default 12- and 17-digit precision representations of numbers that str() and 
> repr() give. Such a function (like nice(), perhaps named trim()?) would 
> provide a way to convert fp numbers that are being used in comparisons into a 
> precision that reflects the user's preference.

-1  See posts by Greg, Terry, and myself which recommend against trim(), 
nice(), or other variants.  For the purpose of precision sensitive 
comparisons, these constructs are unfit for their intended purpose -- they 
are error-prone and do not belong in Python.  They may have some legitimate 
uses, but those tend to be dominated by the existing round() function.

If anything, then some variant of is_close() can go in the math module.  BUT, 
the justification should not be for newbies to ignore issues with 
floating-point equality comparisons.  The justification would have to be that 
folks with some numerical sophistication have a recurring need for the 
function (with sophistication meaning that they know how to come up with 
relative and absolute tolerances that make their application succeed over the 
full domain of possible inputs).


Raymond


 relevant posts from Greg and Terry 

[Greg Ewing]
>> I don't think you're doing anyone any favours by trying to protect
>> them from having to know about these things, because they *need* to
>> know about them if they're not to write algorithms that seem to
>> work fine on tests but mysteriously start producing garbage when
>> run on real data,

[Terry Reedy]
> I agree.  Here was my 'kick-in-the-butt' lesson (from 20+ years ago):  the
> 'simplified for computation' formula for standard deviation, found in too
> many statistics books without a warning as to its danger, and specialized
> for three data points, is sqrt( ((a*a+b*b+c*c)-(a+b+c)**2/3.0) /2.0).
> After 1000s of ok calculations, the data were something like a,b,c =
> 10005,10006,10007.  The correct answer is 1.0 but with numbers rounded to 7
> digits, the computed answer is sqrt(-.5) == CRASH.  I was aware that
> subtraction lost precision but not how rounding could make a theoretically
> guaranteed non-negative difference negative.
>
> Of course, Python floats being C doubles makes such glitches much rarer.
> Not exposing C floats is a major newbie (and journeyman) protection
> feature.
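
The failure described above can be reproduced with the decimal module, under the assumption that "rounded to 7 digits" is modelled as 7 significant decimal digits on every operation. The function names are illustrative, and the exact negative value obtained depends on that rounding model.

```python
from decimal import Decimal, localcontext


def naive_variance(a, b, c):
    """'Simplified for computation' formula: sum of squares minus a term."""
    total = a + b + c
    return ((a * a + b * b + c * c) - total * total / 3) / 2


def two_pass_variance(a, b, c):
    """Subtract the mean first, then square the (small) deviations."""
    mean = (a + b + c) / 3
    da, db, dc = a - mean, b - mean, c - mean
    return (da * da + db * db + dc * dc) / 2


data = [Decimal(n) for n in (10005, 10006, 10007)]
with localcontext() as ctx:
    ctx.prec = 7                        # every operation rounds to 7 digits
    naive = naive_variance(*data)       # negative: sqrt() would fail
    stable = two_pass_variance(*data)   # the correct variance
```

The two large terms in the naive formula agree in their leading digits, so the rounding error in each is larger than their true difference.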




I recommend rejecting trim(), nice(), areclose(), and all variants.



Greg, Terry, and myself have



>
> OK, comments responding to Greg follow.
>
>
> | From: Greg Ewing [EMAIL PROTECTED]
> | Smith wrote:
> |
> || computing the bin boundaries for a histogram
> || where bins are a width of 0.1:
> ||
> || >>> for i in range(20):
> || ...  if (i*.1==i/10.)<>(nice(i*.1)==nice(i/10.)):
> || ...   print i,repr(i*.1),repr(i/10.),i*.1,i/10.
> |
> | I don't see how that has any relevance to the way bin boundaries
> | would be used in practice, which is to say something like
> |
> |   i = int(value / 0.1)
> |   bin[i] += 1 # modulo appropriate range checks
>
> This is just masking the issue by converting numbers to integers. The fact 
> remains that two mathematically equal numbers can have two different internal 
> representations with one being slightly larger than the exact integer value 
> and one smaller:
>
> >>> a=(23*.1)*10;a
> 23.000000000000004
> >>> b=2.3/.1;b
> 22.999999999999996
> >>> int(a/.1),int(b/.1)
> (230, 229)
>
> Part of the answer in this context is to use round() rather than int so you 
> are getting to the closest integer.
>
>
> || For, say, garden variety numbers that aren't full of garbage digits
> || resulting from fp computation, the boundaries computed as 0.1*i are
> || not going to agree with such simple numbers as 1.4 and 0.7.
> |
> | Because the arithmetic is binary rather than decimal. But even using
> | decimal, you get the same sort of problems using a bin width of
> | 1.0/3.0. The solution is to use an algorithm that isn't sensitive
> | to those problems, then it doesn't matter what base your arithmetic
> | is done in.
>
> Agreed.
>
> |
> || I understand that the above really is just a patch over the problem,
> || but I'm wondering if it moves the problem far enough away that most
> || users wouldn't have to worry about it.
> |
> | No, it doesn't. The problems are not conveniently grouped together
> | in some place you can get away from; they're scattered all over the
> | place where you can stumble upon one at any time.
> |
>
> Yes, even a simple computation of the wrong type can lead to unexpected 
> results. I agree.
>
> || So perha

Re: [Python-Dev] Generalizing *args and **kwargs

2006-02-15 Thread Gustavo Niemeyer
> I've been thinking about generalization of the *args/**kwargs syntax for
> quite a while, and even though I'm pretty sure Guido (and many people) will
> consider it overgeneralization, I am finally going to suggest it. This whole
> idea is not something dear to my heart, although I obviously would like to
> see it happen. If the general vote is 'no', I'll write a small PEP or add it
> to PEP 13 and be done with it.

A PEP would be great, even if not accepted. At least we'll have it discussed
in a single place and avoid rediscussing it every time someone figures
out it's a nice idea. Have a look for the subject "Extending tuple unpacking"
in the mailing list for a recent discussion on the topic.

-- 
Gustavo Niemeyer
http://niemeyer.net


Re: [Python-Dev] bdist_* to stdlib?

2006-02-15 Thread Jan Claeys
Op wo, 15-02-2006 te 14:00 +1300, schreef Greg Ewing:
> I'm disappointed that the various Linux distributions
> still don't seem to have caught onto the very simple
> idea of *not* scattering files all over the place when
> installing something.
> 
> MacOSX seems to be the only system so far that has got
> this right -- organising the system so that everything
> related to a given application or library can be kept
> under a single directory, clearly labelled with a
> version number. 

Those directories might be mounted on entirely different hardware (even
over a network), often with different characteristics (access speed,
writeability, etc.).


-- 
Jan Claeys



Re: [Python-Dev] how to upload new MacPython web page?

2006-02-15 Thread Tim Parkin
Thomas Wouters wrote:

>On Tue, Feb 14, 2006 at 09:32:09PM -0800, Bill Janssen wrote:
>  
>
>>We (the pythonmac-sig mailing list) seem to have converged (almost --
>>still talking about the logo) on a new download page for MacPython, to
>>replace the page currently at
>>http://www.python.org/download/download_mac.html.  The strawman can be
>>seen at http://bill.janssen.org/mac/new-macpython-page.html.
>>
>>How do I get the bits changed on python.org (when we're finished)?
>>
>>
>
>[EMAIL PROTECTED] is probably the right email address (although most of
>them are on here as well.)
>
>  
>
I'm happy to upload the pages when you're ready.

Tim


Re: [Python-Dev] C AST to Python discussion

2006-02-15 Thread Jeremy Hylton
I am still -1 on the ast-objects branch.  It adds a lot of boilerplate
code and it makes complicated what is now simple.  I'll see if I can
get a rough cut of the marshal code ready today, so there will be a
complete implementation of my original plan.

I also think we should keep the transformation api simple.  If we
provide an extension module, along the lines of the parser module,
users can write transformations with that module.  They can also write
their own wrapper script that runs a script after applying
transformations.
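
As a sketch of that workflow, here is how a user-written transformation plus wrapper can look with the `ast` module of current Python 3, which postdates this thread. The transformer `DoubleInts` and the helper `run_transformed` are hypothetical names chosen for the example.

```python
import ast


class DoubleInts(ast.NodeTransformer):
    """Toy transformation: replace every integer literal n with 2*n."""

    def visit_Constant(self, node):
        if type(node.value) is int:   # leave bools, strings, etc. alone
            doubled = ast.Constant(value=node.value * 2)
            return ast.copy_location(doubled, node)
        return node


def run_transformed(source, transformer, filename="<transformed>"):
    """Parse, transform, compile and run; return the resulting namespace."""
    tree = transformer.visit(ast.parse(source, filename))
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, filename, "exec"), namespace)
    return namespace


namespace = run_transformed("x = 21\ny = 'text'", DoubleInts())
```

Nothing here touches the saved bytecode of the original module, so the caching and filename questions raised in this message remain open for this approach.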

I agree that the question of saved bytecode files still needs to be
resolved.  I'm not sure that extending the bytecode format to record
modifications is enough, since you also have a filename problem:  How
do you manage two versions of a module, one compiled with
transformation and one compiled without?

How about we arrange for some open space time at PyCon to discuss? 
Unfortunately, the compiler talk isn't until the last day and I can't
stay for sprints.  It would be better to have the talk, then the open
space, then the sprint.

Jeremy

On 2/15/06, Simon Burton <[EMAIL PROTECTED]> wrote:
> On Wed, 15 Feb 2006 00:34:35 -0800
> Brett Cannon <[EMAIL PROTECTED]> wrote:
>
> > As per Neal's prodding email, here is a thread to discuss where we
> > want to go with the C AST to Python stuff and what I think are the
> > core issues at the moment.
> >
> > First issue is the ast-objects branch.  Work is being done on it, but
> > it still leaks some references (Neal or Martin can correct me if I am
> > wrong).
>
> I've been doing the heavy lifting on ast-objects the last few weeks.
> Today it finally passed the python test suite. The last thing to do is
> the addition of XDECREF's, so yes, it is leaking a lot of references.
>
> I won't make it to PyCon (it's a long way for me to come), but gee I've left
> all the fun stuff for you to do !
> :)
>
> Even if AST transforms are not allowed, I see it as the strongest form of
> code reflection, and long over-due in python.
>
> Simon.
>
>
> --
> Simon Burton, B.Sc.
> Licensed PO Box 8066
> ANU Canberra 2601
> Australia
> Ph. 61 02 6249 6940
> http://arrowtheory.com


Re: [Python-Dev] 2.5 PEP

2006-02-15 Thread Aahz
On Wed, Feb 15, 2006, Thomas Wouters wrote:
>
> I can volunteer for 328 if no one else wants it, I've messed with the import
> mechanism before (and besides, it's fun.) I've also written an unfinished
> 308 implementation to get myself acquainted with the AST code more.
> 'Unfinished' means that it works completely, except for some cases of
> ambiguous syntax. I can fix that in a few days if the deadline nears and
> there's no working patch.

If you want to also take over the PEP328 editing, please be my guest.  I
keep making time for it that gets overridden by other things.
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis


Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread James Y Knight
On Feb 15, 2006, at 7:19 AM, Fuzzyman wrote:
> [snip..]
>
> I personally like the move towards all unicode strings, basically  
> any text where you don't know the encoding used is 'random binary  
> data'. This works fine, so long as you are in control of the text  
> source. *However*, it leaves the following problem :
>
> The current situation (treating byte-sequences as text and assuming  
> they are an ascii-superset encoded text-string) *works* (albeit  
> with many breakages), simply because this assumption is usually  
> correct.
>
> Forcing the programmer to be aware of encodings, also pushes the  
> same requirement onto the user (who is often the source of the text  
> in question).
>
> Currently you can read a text file and process it - making sure  
> that any changes/requirements only use ascii characters. It  
> therefore doesn't matter what 8 bit ascii-superset encoding is used  
> in the original. If you force the programmer to specify the  
> encoding in order to read the file, they would have to pass that  
> requirement onto their user. Their user is even less likely to be  
> encoding aware than the programmer.

Or the programmer can just use "iso-8859-1" and call it done. That  
will get you the same "I don't care" behavior as now.
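
A minimal sketch of why this works, assuming Python 3 style str/bytes semantics: ISO-8859-1 maps every byte value 0-255 to the code point with the same number, so decoding never fails and re-encoding gives back the original bytes. The variable names are only for illustration.

```python
# Every possible byte value, including sequences that are invalid UTF-8.
raw = bytes(range(256))

# Decoding as ISO-8859-1 cannot fail: byte N becomes code point N.
text = raw.decode("iso-8859-1")

# An ASCII-only edit leaves all the non-ASCII characters untouched.
edited = text.replace("A", "a")

# Encoding back restores the original bytes, apart from the edit.
out = edited.encode("iso-8859-1")
print(out == raw.replace(b"A", b"a"))
```

The catch is that the non-ASCII characters in `text` are only correct if the file really was Latin-1; for any other encoding they are placeholders that happen to round-trip.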

James


Re: [Python-Dev] math.areclose ...?

2006-02-15 Thread Smith
A problem that I pointed out with the proposed areclose() function is that it 
has within it a fp comparison. If such a function is to have greater utility, 
it should allow the user to specify how significant to consider the computed 
error. A natural extension of being able to tell if 2 fp numbers are close is 
to make a more general comparison. For that purpose, a proposed fpcmp function 
is appended. From that, fp boolean comparison operators (le, gt, ...) are 
easily constructed.

Python allows fp comparison. This is a significant source of surprises and 
learning experiences. Are any of these proposals of interest for providing 
tools to more intelligently make the fp comparisons?

###
#new proposal for the areclose() function
def areclose(x,y,atol=1e-8,rtol=1e-5,prec=12):
    """Return False if |x-y| is greater than atol and also greater
    than rtol times the absolute value of the larger of x and y,
    otherwise True. The comparison is made by computing a
    difference that should be 0 if the two numbers satisfy
    either condition; prec controls the precision of the
    value that is obtained, e.g. 8.3__e-17 is obtained
    for (2.1-2)-.1. But rounding to the 12th digit (the default
    precision) the value of 0.0 is returned indicating that for
    that precision there is no (significant) error."""

    diff = abs(x-y)
    return round(diff-atol,prec)<=0 or \
           round(diff-rtol*max(abs(x),abs(y)),prec)<=0

#fp cmp
def fpcmp(x,y,atol=1e-8,rtol=1e-5,prec=12):
    """Return 0 if x and y are close in the absolute or
    relative sense. If not, then return -1 if x < y or +1 if x > y.
    Note: prec controls how many digits of the error are retained
    when checking for closeness."""

    if areclose(x,y,atol,rtol,prec):
        return 0
    else:
        # same result as cmp(x,y), which Python 3 no longer has
        return (x > y) - (x < y)

# fp comparisons functions
def lt(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec)==-1
def le(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec) in (-1,0)
def eq(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec)==0
def gt(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec)==1
def ge(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec) in (0,1)
def ne(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec)!=0
###
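
To make the rounding argument concrete, here is a small sketch of the (2.1-2)-.1 case from the docstring. It assumes IEEE 754 double-precision floats, and it uses `math.isclose()` from Python 3.5 and later as a point of comparison; that function is not part of this proposal.

```python
import math

# The "exact" difference is tiny but not zero in binary floating point.
diff = (2.1 - 2) - 0.1
print(diff == 0.0)             # False: a naive comparison is surprised

# Rounding the error to 12 digits, as prec=12 does, hides the noise.
print(round(diff, 12) == 0.0)  # True

# For comparison only: the closeness test later added to the stdlib.
close = math.isclose(2.1 - 2, 0.1, rel_tol=1e-5, abs_tol=1e-8)
print(close)                   # True
```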


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-15 Thread Bengt Richter
On Tue, 14 Feb 2006 15:14:07 -0800, Guido van Rossum <[EMAIL PROTECTED]> wrote:

>On 2/14/06, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
>> Guido van Rossum wrote:
>> > As Phillip guessed, I was indeed thinking about introducing bytes()
>> > sooner than that, perhaps even in 2.5 (though I don't want anything
>> > rushed).
>>
>> Hmm, that is probably going to be too early. As the thread shows
>> there are lots of things to take into account, esp. since if you
>> plan to introduce bytes() in 2.x, the upgrade path to 3.x would
>> have to be carefully planned. Otherwise, we end up introducing
>> a feature which is meant to prepare for 3.x and then we end up
>> causing breakage when the move is finally implemented.
>
>You make a good point. Someone probably needs to write up a new PEP
>summarizing this discussion (or rather, consolidating the agreement
>that is slowly emerging, where there is agreement, and summarizing the
>key open questions).
>
>> > Even in Py3k though, the encoding issue stands -- what if the file
>> > encoding is Unicode? Then using Latin-1 to encode bytes by default
>> > might not be what the user expected. Or what if the file encoding is
>> > something totally different? (Cyrillic, Greek, Japanese, Klingon.)
>> > Anything default but ASCII isn't going to work as expected. ASCII
>> > isn't going to work as expected either, but it will complain loudly
>> > (by throwing a UnicodeError) whenever you try it, rather than causing
>> > subtle bugs later.
>>
>> I think there's a misunderstanding here: in Py3k, all "string"
>> literals will be converted from the source code encoding to
>> Unicode. There are no ambiguities - a Klingon character will still
>> map to the same ordinal used to create the byte content regardless
>> of whether the source file is encoded in UTF-8, UTF-16 or
>> some Klingon charset (are there any ?).
>
>OK, so a string (literal or otherwise) containing a Klingon character
>won't be acceptable to the bytes() constructor in 3.0. It shouldn't be
>in 2.x either then.
>
>I still think that someone who types a file in Latin-1 and enters
>non-ASCII Latin-1 characters in a string literal and then passes it to
>the bytes() constructor might expect to get bytes encoded in Latin-1,
>and someone who types a file in UTF-8 and enters non-ASCII Unicode
>characters might expect to get UTF-8-encoded bytes. Since they can't
>both get what they want, we should disallow both, and only allow
>ASCII.
ISTM this is a good rule for backwards compatibility for the
'...' => u'...' py3k transition. I don't know if you saw my other post,
but I was suggesting that bytes(s_or_u) should be mapped to the integer
values by the current definition of ord for either str or unicode.
UIAM this works when you convert ASCII and will work if you convert
the ASCII string to unicode.

It will also let you use unicode _currently_ to get past the ASCII restriction,
since ord(u) works for all of the first 256 unicode characters.
Using those characters in bytes(u'...') works even if your source encoding
is utf-8 and contains ascii escapes, e.g.

 >>> utfsrc = """\
 ... # -*- coding: utf-8 -*-
 ... umlaut_os, values = u'\xf6\\xf6', map(ord, u'\xf6\\xf6')
 ... """.decode('latin-1').encode('utf-8')

Hopefully showing on your screen properly:

 >>> print utfsrc.decode('utf-8')
 # -*- coding: utf-8 -*-
 umlaut_os, values = u'ö\xf6', map(ord, u'ö\xf6')

And the repr, where you can see the utf-8 double chars for utf-8 and the \\xf6 
ascii escape:

 >>> print repr(utfsrc)
 "# -*- coding: utf-8 -*-\numlaut_os, values = u'\xc3\xb6\\xf6', map(ord, 
u'\xc3\xb6\\xf6')\n"

compiling the utf-8 source and executing it:

 >>> exec compile(utfsrc,'','exec')

Good results:

 >>> umlaut_os, map(hex, values)
 (u'\xf6\xf6', ['0xf6', '0xf6'])
 >>> print umlaut_os
 öö

So map(ord, s_or_u) works predictably now, and will not break after py3k
unless you use non-ascii in _plain_ str strings now. But in unicode it
should be ok even now.

I think ord is a consistent and handy mapping of characters to bytes,
and the fact that it works for unicode for all 256 characters seems to me
a boon. (So long as no one gets upset that ord(u) _happens_
to match ord(u.encode('latin-1')) ;-)
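
A minimal sketch of the ord mapping, written for Python 3 semantics where str is always unicode. The helper name `bytes_from_ord` is hypothetical; it only illustrates the proposed bytes(map(ord, seq)) behaviour and its coincidence with Latin-1 encoding.

```python
def bytes_from_ord(text):
    """Hypothetical helper: turn each character into one byte via ord()."""
    return bytes(map(ord, text))


# Characters below 256 map straight to byte values ...
umlaut_os = "\xf6\xf6"
as_bytes = bytes_from_ord(umlaut_os)
print(as_bytes == b"\xf6\xf6")

# ... which happens to be exactly what the Latin-1 codec produces.
print(as_bytes == umlaut_os.encode("latin-1"))

# Anything at or above 256 does not fit in a byte and is rejected.
try:
    bytes_from_ord("\u20ac")  # EURO SIGN, code point 0x20AC
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```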

I didn't see yet where you had ruled against ord mapping of unicode to bytes,
so I am hopeful that you will consider it.

>> Furthermore, by restricting to ASCII you'd also outrule hex escapes
>> which seem to be the natural choice for presenting binary data in
>> literals - the Unicode representation would then only be an
>> implementation detail of the way Python treats "string" literals
>> and a user would certainly expect to find e.g. \x88 in the bytes object
>> if she writes bytes('\x88').
>
>I guess we'll just have to disappoint her. Too bad for the person who
>wrote bytes("\x12\x34\x56\x78\x9a\xbc\xde\xf0") -- they'll have to
>write bytes([0x12,0x34,0x56,0x78,0x9a,0xbc,0xde,0xf0]). Not so bad IMO
>and certainly easier than a *mixture* of hex and ASCII like
>'\xabc\xdef'.
>
>> 

Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Guido van Rossum
On 2/15/06, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> If we went with longer names, a slight variation on the opentext/openbinary
> idea would be to use opentext and opendata.

After some thinking I don't like opendata any more -- often data is
text, so the term is wrong. openbinary is fine but long. So how about
openbytes? This clearly links the resulting object with the bytes
type, which is mutually reassuring.

Regarding open vs. opentext, I'm still not sure. I don't want to
generalize from the openbytes precedent to openstr or openunicode
(especially since the former is wrong in 2.x and the latter is wrong
in 3.0). I'm tempted to hold out for open() since it's most
compatible.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Guido van Rossum
On 2/15/06, Fuzzyman <[EMAIL PROTECTED]> wrote:
>  Forcing the programmer to be aware of encodings, also pushes the same
> requirement onto the user (who is often the source of the text in question).

The programmer shouldn't have to be aware of encodings most of the
time -- it's the job of the I/O library to determine the end user's
(as opposed to the language's) default encoding dynamically and act
accordingly. Users who use non-ASCII characters without informing the
OS of their encoding are in a world of pain, *unless* they use the OS
default encoding (which may vary per locale). If the OS can figure out
the default encoding, so can the Python I/O library. Many apps won't
have to go beyond this at all.

Note that I don't want to use this OS/user default encoding as the
default encoding between bytes and strings; once you are reading bytes
you are writing "grown-up" code and you will have to be explicit. It's
only the I/O library that should automatically encode on write and
decode on read.
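
A rough sketch of that division of labour, using the Python 3 `open()` call as it eventually turned out; that is an assumption relative to this thread, where the API was still open. `roundtrip` is a hypothetical helper name. Text mode encodes on write and decodes on read, while binary mode shows the bytes that actually hit the disk.

```python
import locale
import os
import tempfile

# The user's default encoding, as reported by the OS/locale.
default_enc = locale.getpreferredencoding(False)


def roundtrip(text, encoding):
    """Write text through the I/O layer, then read it back both ways."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: the library encodes on write ...
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        # Binary mode: no conversion, these are the stored bytes.
        with open(path, "rb") as f:
            raw = f.read()
        # ... and decodes on read.
        with open(path, "r", encoding=encoding, newline="") as f:
            decoded = f.read()
    finally:
        os.remove(path)
    return raw, decoded


# With the user's default encoding, ASCII text survives unchanged.
ascii_raw, ascii_text = roundtrip("plain ascii", default_enc)

# With an explicit encoding, the byte-level view differs from the text.
raw, decoded = roundtrip("caf\u00e9", "utf-8")
print(raw, decoded)
```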

>  Currently you can read a text file and process it - making sure that any
> changes/requirements only use ascii characters. It therefore doesn't matter
> what 8 bit ascii-superset encoding is used in the original. If you force the
> programmer to specify the encoding in order to read the file, they would
> have to pass that requirement onto their user. Their user is even less
> likely to be encoding aware than the programmer.

I disagree -- the user most likely has set or received a default
encoding when they first got the computer, and that's all they are
using. If other tools (notepad, wordpad, emacs, vi etc.) can figure
out the encoding, so can Python's I/O library.

>  What this means, is that for simple programs where the programmer doesn't
> want to have to worry about encoding, or can't force the user to be aware,
> they will read in the file as bytes.

Of course not!

> Modules will quickly and inevitably be
> created implementing all the 'string methods' for bytes. New programmers
> will gravitate to these and the old mess will continue, but with a more
> awkward hybrid than before. (String manipulations of byte sequences will no
> longer be a core part of the language - and so be harder to use.)

This seems an unlikely development if we do the conversions in the I/O library.

>  Not sure what we can do to obviate this of course... but is this change
> actually going to improve the situation or make it worse ?

I'm not worried about this scenario. "What if all the programmers in
the world suddenly became dumb?"

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread M.-A. Lemburg
Guido van Rossum wrote:
> On 2/15/06, Nick Coghlan <[EMAIL PROTECTED]> wrote:
>> If we went with longer names, a slight variation on the opentext/openbinary
>> idea would be to use opentext and opendata.
> 
> After some thinking I don't like opendata any more -- often data is
> text, so the term is wrong. openbinary is fine but long. So how about
> openbytes? This clearly links the resulting object with the bytes
> type, which is mutually reassuring.
> 
> Regarding open vs. opentext, I'm still not sure. I don't want to
> generalize from the openbytes precedent to openstr or openunicode
> (especially since the former is wrong in 2.x and the latter is wrong
> in 3.0). I'm tempted to hold out for open() since it's most
> compatible.

Maybe a weird idea, but why not use static methods on the
bytes and str type objects for this ?!

E.g. bytes.openfile(...) and unicode.openfile(...) (in 3.0
renamed to str.openfile())

After all, you are in a certain way constructing objects
of the given types - only that the inputs to these
constructors happen to be files in the file system.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 15 2006)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Barry Warsaw
On Wed, 2006-02-15 at 09:17 -0800, Guido van Rossum wrote:

> Regarding open vs. opentext, I'm still not sure. I don't want to
> generalize from the openbytes precedent to openstr or openunicode
> (especially since the former is wrong in 2.x and the latter is wrong
> in 3.0). I'm tempted to hold out for open() since it's most
> compatible.

If we go with two functions, I'd much rather hang them off of the file
type object than add two new builtins.  I really do think file.bytes()
and file.text() (a.k.a. open.bytes() and open.text()) is better than
opentext() or openbytes().

-Barry





Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Barry Warsaw
On Wed, 2006-02-15 at 18:29 +0100, M.-A. Lemburg wrote:

> Maybe a weird idea, but why not use static methods on the
> bytes and str type objects for this ?!
> 
> E.g. bytes.openfile(...) and unicode.openfile(...) (in 3.0
> renamed to str.openfile())

That's also not a bad idea, but I'd leave off one or the other of the
redundant "open" and "file" parts.  E.g. bytes.open() and unicode.open()
seem fine to me (we all know what 'open' means, right? :).

-Barry





Re: [Python-Dev] C AST to Python discussion

2006-02-15 Thread A.M. Kuchling
On Wed, Feb 15, 2006 at 10:29:38AM -0500, Jeremy Hylton wrote:
> Unfortunately, the compiler talk isn't until the last day and I can't
> stay for sprints.  It would be better to have the talk, then the open
> space, then the sprint.

If you mean "Implementation of the Python Bytecode Compiler", that's
on Saturday at 10:50, so you have a whole day in which to fit an open
space event.  Unfortunately there are already a lot of open space
events on that day, and the next open slot is at 3:15PM.  But if you
don't need a room to talk in, I'm sure you can find a comfortable
place for 5 or 6 people to chat.

--amk



Re: [Python-Dev] C AST to Python discussion

2006-02-15 Thread Barry Warsaw
On Wed, 2006-02-15 at 00:34 -0800, Brett Cannon wrote:

> I personally think we should choose an initial global access API to
> the AST as a starting API.  I like the sys.ast_transformations idea
> since it is simple and gives enough access that whether read-only or
> read-write is allowed something like PyChecker can get the access it
> needs.

I haven't been following the AST stuff closely enough, but I'm not crazy
about putting access to this in the sys module.  It seems like it
clutters that up with a name that will be rarely used by the average
Python programmer.

-Barry





Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread M.-A. Lemburg
Barry Warsaw wrote:
> On Wed, 2006-02-15 at 18:29 +0100, M.-A. Lemburg wrote:
> 
>> Maybe a weird idea, but why not use static methods on the
>> bytes and str type objects for this ?!
>>
>> E.g. bytes.openfile(...) and unicode.openfile(...) (in 3.0
>> renamed to str.openfile())
> 
> That's also not a bad idea, but I'd leave off one or the other of the
> redundant "open" and "file" parts.  E.g. bytes.open() and unicode.open()
> seem fine to me (we all know what 'open' means, right? :).

Thinking about it, I like your idea better (file.bytes()
and file.text()).

Anyway, as long as we don't start adding openthis() and openthat()
I guess I'm happy ;-)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] 2.5 release schedule

2006-02-15 Thread Barry Warsaw
On Tue, 2006-02-14 at 21:24 -0800, Neal Norwitz wrote:

> We still need a release manager.  No one has heard from Anthony.  If
> he isn't interested is someone else interested in trying their hand at
> it?  There are many changes necessary in PEP 101 because since the
> last release both python and pydotorg have transitioned from CVS to
> SVN.  Creosote also moved.

I would definitely like to see a PEP 101 update as part of the 2.5 RM's
responsibilities, and I think it could be done while spinning the first
alpha release.  I know others have volunteered, but in a pinch I'd be
happy to dust off my RM hat and help out too.

-Barry





Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Barry Warsaw
On Wed, 2006-02-15 at 19:02 +0100, M.-A. Lemburg wrote:

> Anyway, as long as we don't start adding openthis() and openthat()
> I guess I'm happy ;-)

Me too! :)

-Barry





Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Guido van Rossum
On 2/15/06, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Barry Warsaw wrote:
> > On Wed, 2006-02-15 at 18:29 +0100, M.-A. Lemburg wrote:
> >
> >> Maybe a weird idea, but why not use static methods on the
> >> bytes and str type objects for this ?!
> >>
> >> E.g. bytes.openfile(...) and unicode.openfile(...) (in 3.0
> >> renamed to str.openfile())
> >
> > That's also not a bad idea, but I'd leave off one or the other of the
> > redundant "open" and "file" parts.  E.g. bytes.open() and unicode.open()
> > seem fine to me (we all know what 'open' means, right? :).
>
> Thinking about it, I like your idea better (file.bytes()
> and file.text()).

This is better than making it a static/class method on file (which has
the problem that it might return something that's not a file at all --
file is a particular stream implementation, there may be others) but I
don't like the tight coupling it creates between a data type and an
I/O library. I still think that having global (i.e. built-in) factory
functions for creating various stream types makes the most sense.
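
For concreteness, a sketch of what two such factory functions could look like if layered on the Python 3 `open()`. The names `opentext` and `openbytes` come from this thread, but the signatures are purely hypothetical, and nothing here is an existing built-in.

```python
import os
import tempfile


def openbytes(path, mode="r"):
    """Hypothetical factory: always returns a binary stream (bytes)."""
    return open(path, mode + "b")


def opentext(path, mode="r", encoding=None):
    """Hypothetical factory: always returns a text stream (str).

    encoding=None defers to the user's default encoding.
    """
    return open(path, mode, encoding=encoding)


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with openbytes(path, "w") as f:
        f.write(b"caf\xc3\xa9")      # raw UTF-8 bytes
    with opentext(path, encoding="utf-8") as f:
        as_text = f.read()           # decoded by the I/O layer
    with openbytes(path) as f:
        as_bytes = f.read()          # untouched bytes
finally:
    os.remove(path)

print(type(as_text).__name__, type(as_bytes).__name__)
```

Either factory is free to return whatever stream implementation suits the platform, which is the flexibility a static method on file would not have.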

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-15 Thread Jim Jewett
On 2/14/06, Neil Schemenauer wrote:
> People could spell it bytes(s.encode('latin-1'))

Guido wrote:
> At the cost of an extra copying step.

I asked:
> ... why not just add some smarts to the bytes constructor?

Guido wrote:

> ... the VM usually keeps an extra reference
> on the stack so the refcount is never 1. But
> you can't rely on that

I did miss this, but _PyString_Resize seems to
work around it, and I'm not sure that the bytes
object can't be just as intimate.

Even if that is insurmountable, bytes objects
could recognize two states -- one normal, and
one for "I'm delegating to a string, and have to
copy to my own buffer before I actually mutate
anything."

Then a new bytes object would still need its
own header, but the data copying could often
be avoided.
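
The two-state idea can be sketched in pure Python, as a rough model only; the real thing would live in C inside the bytes object. `CowBytes` and its methods are hypothetical names. The object borrows the immutable source until the first mutation, then copies to its own buffer.

```python
class CowBytes:
    """Hypothetical copy-on-write byte buffer (illustrative model only)."""

    def __init__(self, source):
        self._shared = source  # borrowed immutable data, not copied
        self._own = None       # private mutable copy, made lazily

    def _data(self):
        return self._own if self._own is not None else self._shared

    def is_shared(self):
        """True while we are still delegating to the source object."""
        return self._own is None

    def __len__(self):
        return len(self._data())

    def __getitem__(self, index):
        return self._data()[index]

    def __setitem__(self, index, value):
        # First mutation: copy to our own buffer, drop the borrowed one.
        if self._own is None:
            self._own = bytearray(self._shared)
            self._shared = None
        self._own[index] = value

    def tobytes(self):
        return bytes(self._data())


source = b"abc"
buf = CowBytes(source)
shared_before = buf.is_shared()  # reads need no copy
buf[0] = 0x41                    # the first write triggers the copy
shared_after = buf.is_shared()
print(shared_before, shared_after, buf.tobytes(), source)
```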

But back to the possibility of not creating
even a new object header...
> the str's underlying array is allocated inline
> with the str header, this require str and
> bytes to have the same object layout. But
> since bytes are mutable, they can't.

Looking at the arraymodule, the only extra
fields in an array are weakrefs, description
(which will no longer be needed) and tracking
for the indirection.  There are even a few extra
bytes leftover that could be used to indicate
that ob_item was redirected later, the way
tables do with small_table.

-jJ


Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Bill Janssen
> If we go with two functions, I'd much rather hang them off of the file
> type object than add two new builtins.  I really do think file.bytes()
> and file.text() (a.k.a. open.bytes() and open.text()) is better than
> opentext() or openbytes().

+1.

The default behavior of the current open() in opening files as text is
particularly grating.  This would make things much clearer.

Bill


[Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Jason Orendorff
Instead of byte literals, how about a classmethod bytes.from_hex(), which works like this:

  # two equivalent things
  expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')

  expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83, 227, 131, 79, 229, 201, 46, 106])

It's just a nicety; the former fits my brain a little better.  This would work fine both in 2.5 and in 3.0.
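
For reference, Python 3 ended up providing this under the spelling `bytes.fromhex()`. A minimal sketch, assuming a Python 3 interpreter, showing that the two forms above are equivalent there:

```python
# Hex-string constructor, spelled fromhex() in Python 3.
from_hex = bytes.fromhex('5c535024cac5199153e3834fe5c92e6a')

# The same sixteen bytes given as a list of integers.
from_ints = bytes([92, 83, 80, 36, 202, 197, 25, 145,
                   83, 227, 131, 79, 229, 201, 46, 106])

print(from_hex == from_ints)

# Spaces between byte pairs are accepted as well.
spaced = bytes.fromhex('5c 53 50 24 ca c5 19 91 53 e3 83 4f e5 c9 2e 6a')
print(spaced == from_hex)
```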

I thought about unicode.encode('hex'), but obviously it will continue
to return a str in 2.x, not bytes.  Also the pseudo-encodings
('hex', 'rot13', 'zip', 'uu', etc.) generally scare me.  And now
that bytes and text are going to be two very different types, they're
even weirder than before.  Consider:

  text.encode('utf-8') ==> bytes
  text.encode('rot13') ==> text
  bytes.encode('zip') ==> bytes
  bytes.encode('uu') ==> text (?)

This state of affairs seems kind of crazy to me.

Actually users trying to figure out Unicode would probably be better served if bytes.encode() and text.decode() did not exist.
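
A short sketch of how the type confusion looks in practice, assuming Python 3.4 or later. There the str and bytes methods refuse the pseudo-encodings, and the transforms are only reachable through `codecs.encode()`. The codec names `rot_13` and `hex_codec` are the stdlib's own.

```python
import codecs

# text -> bytes: a real character encoding.
utf8 = "Hello".encode("utf-8")

# text -> text: rot13 is a transform, not an encoding.
rot = codecs.encode("Hello", "rot_13")

# bytes -> bytes: hex is a binary transform.
hexed = codecs.encode(b"abc", "hex_codec")

# str.encode() rejects codecs that do not produce bytes from text.
try:
    "Hello".encode("rot_13")
    refused = False
except LookupError:
    refused = True

print(utf8, rot, hexed, refused)
```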

-j



Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Martin v. Löwis
Adam Olsen wrote:
> Making it an error to have 8-bit str literals in 2.x would help
> educate the user that they will change behavior in 3.0 and not be
> 8-bit str literals anymore.

You would like to ban string literals from the language? Remember:
all string literals are currently 8-bit (byte) strings.

Regards,
Martin


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Guido van Rossum
On 2/15/06, Jason Orendorff <[EMAIL PROTECTED]> wrote:
> Instead of byte literals, how about a classmethod bytes.from_hex(), which
> works like this:
>
>   # two equivalent things
>   expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')
>   expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83, 227, 131, 79, 229, 201, 46, 106])
>
>  It's just a nicety; the former fits my brain a little better.  This would
> work fine both in 2.5 and in 3.0.

Yes, this looks nice.

>  I thought about unicode.encode('hex'), but obviously it will continue to
> return a str in 2.x, not bytes.  Also the pseudo-encodings ('hex', 'rot13',
> 'zip', 'uu', etc.) generally scare me.  And now that bytes and text are
> going to be two very different types, they're even weirder than before.
> Consider:
>
>   text.encode('utf-8') ==> bytes
>   text.encode('rot13') ==> text
>   bytes.encode('zip') ==> bytes
>   bytes.encode('uu') ==> text (?)
>
>  This state of affairs seems kind of crazy to me.
>
>  Actually users trying to figure out Unicode would probably be better served
> if bytes.encode() and text.decode() did not exist.

Yeah, the pseudogeneralizations seem to be a mistake -- they are
almost universally frowned upon. I'll happily send them to their
grave in Py3k.

It would be better if the signature of text.encode() always returned a
bytes object. But why deny the bytes object a decode() method if text
objects have an encode() method?

I'd say there are two "symmetric" API flavors possible (t and b are
text and bytes objects, respectively, where text is a string type,
either str or unicode; enc is an encoding name):

- b.decode(enc) -> t; t.encode(enc) -> b
- b = bytes(t, enc); t = text(b, enc)

I'm not sure why one flavor would be preferred over the other,
although having both would probably be a mistake.
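
The two flavors, side by side, in a minimal sketch. It assumes Python 3, where `str` plays the role of the text type, and where both spellings happen to be available.

```python
t = "na\u00efve"   # text containing one non-ASCII character
enc = "utf-8"

# Flavor 1: methods on the objects.
b1 = t.encode(enc)
t1 = b1.decode(enc)

# Flavor 2: constructors that take an encoding argument.
b2 = bytes(t, enc)
t2 = str(b2, enc)

print(b1 == b2, t1 == t2 == t)
```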

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] bdist_* to stdlib?

2006-02-15 Thread Trent Mick
[Bob Ippolito wrote]
>...
> >/Library/Frameworks/Python.framework/...
> >/Applications/MacPython-2.4/...  # just MacPython does this
> 
> ActivePython doesn't install app bundles for IDLE or anything?

It does, but puts them under here instead:
/Library/Frameworks/Python.framework/Versions/X.Y/Resources/

>...
> >Also, a receipt of the installation ends up here:
> >
> >/Library/Receipts/$package_name/...
> >
> >though Apple does not provide tools for uninstallation using those
> >receipts.
> 
> That stuff is really behind the scenes stuff that's wholly managed by  
> Installer.app and is pretty much irrelevant.

Sure.

> Single apps are better than OK.  Download them by whatever means you  
> want, put them wherever you want, and run them.  You can run any well- 
> behaved application from a DMG (or a CD, or a USB key, or any other  
> readable media).

For naive or new-to-mac users it is a confusing process to get the .app
bundle to an appropriate place and then start running it. Why else have
various app distributors out there come up with myriad slick background
images for their DMG's trying to instruct users what to do with the
icons in the mounted DMG's Finder window?

On Windows you download an MSI (it ends up in your browser downloads
folder), it starts the installation, and at the end of the installation it
starts the app for you. The app is nicely in Program Files. No need to
eject something. No need to find somewhere to drag the icon.

I'll grant that having the whole thing in one bundle is cool/handy/cute.

...anyway this is getting seriously OT for python-dev. :)

Trent

-- 
Trent Mick
[EMAIL PROTECTED]


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Thomas Heller
Jason Orendorff wrote:
> Instead of byte literals, how about a classmethod bytes.from_hex(), which
> works like this:
> 
>   # two equivalent things
>   expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')

I hope this will also be equivalent:
>   expected_md5_hash = bytes.from_hex('5c 53 50 24 ca c5 19 91 53 e3 83 4f e5 c9 2e 6a')
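
As a point of comparison, Python 3 has a bytes.fromhex() classmethod (spelled without the underscore, so not the from_hex name proposed here) that skips spaces between byte pairs. A small sketch, assuming Python 3:

```python
# Assumes Python 3's bytes.fromhex(); the thread's bytes.from_hex() is a proposal.
compact = bytes.fromhex('5c535024cac5199153e3834fe5c92e6a')
spaced = bytes.fromhex('5c 53 50 24 ca c5 19 91 53 e3 83 4f e5 c9 2e 6a')

print(compact == spaced)   # the spaced form yields the same 16 bytes
print(len(compact))
```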

Thomas



Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-15 Thread Josiah Carlson

Ron Adam <[EMAIL PROTECTED]> wrote:
> Greg Ewing wrote:
> > Ron Adam wrote:
> >> b = bytes(0L) ->  bytes([0,0,0,0])
> > 
> > No, bytes(0L) --> TypeError because 0L doesn't implement
> > the iterator protocol or the buffer interface.
> 
> It wouldn't need it if it was a direct C memory copy.

Yes it would.  Python long integers are stored as arrays of signed
16-bit short ints.  See longintrepr.h from the source.
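
Whatever the internal digit layout is, a portable conversion has to state the width and the byte order explicitly. A sketch assuming Python 3, where int.to_bytes() and int.from_bytes() do this; the 4-byte width mirrors Ron's example and is otherwise arbitrary:

```python
# Assumes Python 3. Width and byte order are chosen by the caller,
# not taken from the interpreter's internal representation.
zero = (0).to_bytes(4, byteorder='little')
print(zero == bytes([0, 0, 0, 0]))

n = 0x01020304
big = n.to_bytes(4, byteorder='big')
little = n.to_bytes(4, byteorder='little')
print(list(big))
print(list(little))

# Round trip back to the integer
print(int.from_bytes(big, byteorder='big') == n)
```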


 - Josiah



Re: [Python-Dev] bdist_* to stdlib?

2006-02-15 Thread Bob Ippolito

On Feb 15, 2006, at 4:49 AM, Jan Claeys wrote:

> Op wo, 15-02-2006 te 14:00 +1300, schreef Greg Ewing:
>> I'm disappointed that the various Linux distributions
>> still don't seem to have caught onto the very simple
>> idea of *not* scattering files all over the place when
>> installing something.
>>
>> MacOSX seems to be the only system so far that has got
>> this right -- organising the system so that everything
>> related to a given application or library can be kept
>> under a single directory, clearly labelled with a
>> version number.
>
> Those directories might be mounted on entirely different hardware  
> (even
> over a network), often with different characteristics (access speed,
> writeability, etc.).

Huh?  What does that have to do with anything?  I've never seen a  
system where /usr/include, /usr/lib, /usr/bin, etc. are not all on  
the same mount.  It's not really any different with OS X either.

-bob



Re: [Python-Dev] 2.5 PEP

2006-02-15 Thread Martin v. Löwis
Alain Poirier wrote:
>   - is (c)ElementTree still planned for inclusion ?

It is included already.

Regards,
Martin


Re: [Python-Dev] C AST to Python discussion

2006-02-15 Thread Martin v. Löwis
Thomas Wouters wrote:
> I would personally prefer the AST validation to be a separate part of the
> compiler. It means the one or the other can be out of sync, but it also
> means it can be accessed directly (validating AST before sending it to the
> compiler) and the compiler (or CFG generator, or something between AST and
> CFG) can decide not to validate internally generated AST for non-debug
> builds, for instance.

That's how the ast-objects branch currently works. There is a method
checking that the tree actually conforms to the grammar.

Regards,
Martin


Re: [Python-Dev] bdist_* to stdlib?

2006-02-15 Thread Trent Mick
[Greg Ewing wrote]
> It's not perfect, but it's still a lot better than the
> situation on any other unix I've seen so far.

Better than Unix, sure. But you *can* (and ActivePython does do) install
everything under:

/opt/$app_name/...

> > open DMG, don't run the app from here, drag it to your
> > Applications folder, then eject this window/disk, then run it from
> > /Applications,
> 
> A decently-designed application should be runnable from
> anywhere, including a dmg, if the user wants to do that.
> If an app refuses to run from a dmg, I consider that a
> bug in the application.

Yes, but the typical user probably *wants* to run the app from their
/Applications folder (or somewhere else on their harddrive). When they
start running from the mounted DMG, they can't then unmount the DMG to
clean up. Actually the typical non-geek user doesn't care where they run
the app from. They don't want to worry about those details.

Trent

-- 
Trent Mick
[EMAIL PROTECTED]


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Martin v. Löwis
Jason Orendorff wrote:
>   expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')

This looks good, although it duplicates

expected_md5_hash = binascii.unhexlify('5c535024cac5199153e3834fe5c92e6a')
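
The overlap can be checked directly on an interpreter that has both spellings. A sketch assuming Python 3, with bytes.fromhex() standing in for the proposed from_hex():

```python
import binascii

hexdigest = '5c535024cac5199153e3834fe5c92e6a'

via_binascii = binascii.unhexlify(hexdigest)
via_classmethod = bytes.fromhex(hexdigest)   # Python 3 spelling, an assumption here
print(via_binascii == via_classmethod)

# unhexlify() is strict about its input: embedded spaces are rejected.
try:
    binascii.unhexlify('5c 53 50')
except binascii.Error:
    spaces_rejected = True
else:
    spaces_rejected = False
print(spaces_rejected)
```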

Regards,
Martin


Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Bengt Richter
On Tue, 14 Feb 2006 15:13:25 -0800, Guido van Rossum <[EMAIL PROTECTED]> wrote:

>I'm about to send 6 or 8 replies to various salient messages in the
>PEP 332 revival thread. That's probably a sign that there's still a
>lot to be sorted out. In the mean time, to save you reading through
>all those responses, here's a summary of where I believe I stand.
>Let's continue the discussion in this new thread unless there are
>specific hairs to be split in the other thread that aren't addressed
>below or by later posts.
>
>Non-controversial (or almost):
>
>- we need a new PEP; PEP 332 won't cut it
>
>- no b"..." literal
>
>- bytes objects are mutable
>
>- bytes objects are composed of ints in range(256)
>
>- you can pass any iterable of ints to the bytes constructor, as long
>as they are in range(256)
>
>- longs or anything with an __index__ method should do, too
>
>- when you index a bytes object, you get a plain int
>
>- repr(bytes([10, 20, 30])) == 'bytes([10, 20, 30])'
>
>Somewhat controversial:
>
>- it's probably too big to attempt to rush this into 2.5
>
>- bytes("abc") == bytes(map(ord, "abc"))
>
>- bytes("\x80\xff") == bytes(map(ord, "\x80\xff")) == bytes([128, 255])
>
>Very controversial:
>
Given that ord/unichr and ord/chr work as encoding-agnostic function pairs
symmetrically mapping between unicode and int or str and int, please consider
the effect of this API as illustrated by how it works with the examples:

 >>> def bytes(arg, encoding=None):
 ...     if isinstance(arg, str):
 ...         if encoding: b = map(ord, arg.decode(encoding))
 ...         else: b = map(ord, arg)
 ...     elif isinstance(arg, unicode):
 ...         if encoding: raise ValueError(
 ...             'Use bytes(%r.encode(%r)) to avoid PY 3000 breakage'%(arg, encoding))
 ...         b = map(ord, arg)
 ...     else:
 ...         b = map(int, arg)
 ...     if sum(1 for x in b if x<0 or x>255) > 0:
 ...         raise ValueError('byte out of range')
 ...     return 'bytes(%r)'%b
 ...

 
Then

>- bytes("abc", "encoding") == bytes("abc") # ignores the "encoding" argument
(Use the encoding; the only requirement is that all the resulting ord values
be in range(0,256).)
 >>> bytes("abc\xf6", 'latin-1')
 'bytes([97, 98, 99, 246])'
 >>> print unichr(246)
 ö
 >>> bytes("abc\xf6", 'cp437')
 'bytes([97, 98, 99, 247])'
 >>> print unichr(247)
 ÷

>
>- bytes(u"abc") == bytes("abc") # for ASCII at least
 >>> bytes(u"abc")
 'bytes([97, 98, 99])'

>
>- bytes(u"\x80\xff") raises UnicodeError
 >>> bytes(u"\x80\xff")
 'bytes([128, 255])'

>
>- bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
 >>> bytes(u"\x80\xff", "latin-1")
 Traceback (most recent call last):
   File "", line 1, in ?
   File "", line 6, in bytes
 ValueError: Use bytes(u'\x80\xff'.encode('latin-1')) to avoid PY 3000 breakage
 >>> bytes(u'\x80\xff'.encode('latin-1'))
 'bytes([128, 255])'

(If the characters exist in the encoding specified, it will work, otherwise
raises exception. Assumes PY 3000 string encode results in bytes, so it should
work there too ;-)

of course,
 >>> bytes(u'\u1234')
 Traceback (most recent call last):
   File "", line 1, in ?
   File "", line 12, in bytes
 ValueError: byte out of range
and
 >>> bytes([1,2])
 'bytes([1, 2])'
 >>> bytes([1,-1])
 Traceback (most recent call last):
   File "", line 1, in ?
   File "", line 12, in bytes
 ValueError: byte out of range
 >>> bytes([1,256])
 Traceback (most recent call last):
   File "", line 1, in ?
   File "", line 12, in bytes
 ValueError: byte out of range

Interestingly, the internal map int on a sequence permits
 >>> bytes(["1", 2, 3L, True, 5.6])
 'bytes([1, 2, 3, 1, 5])'

IOW, any sequence of objects that will convert themselves
to int in range(0,256) will do.
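
For comparison, the bytes constructor in Python 3 can be probed with the same inputs. This is a sketch assuming Python 3; the helper name try_bytes is hypothetical and only records the outcome of each call.

```python
# Assumes Python 3. try_bytes is a made-up helper for this illustration.
def try_bytes(seq):
    """Return bytes(seq), or the name of the exception it raises."""
    try:
        return bytes(seq)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__

print(try_bytes([1, 2]))      # accepted
print(try_bytes([1, -1]))     # out of range
print(try_bytes([1, 256]))    # out of range
print(try_bytes([True, 2]))   # bool counts as an int
print(try_bytes(["1", 2]))    # strings are not converted
print(try_bytes([5.6]))       # floats are not converted
```

Unlike the map(int, ...) prototype above, Python 3 does not coerce strings or floats; it only accepts objects usable as integer indexes.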

>
>Martin von Loewis's alternative for the "very controversial" set is to
>disallow an encoding argument and (I believe) also to disallow Unicode
>arguments. In 3.0 this would leave us with s.encode() as the
>only way to convert a string (which is always unicode) to bytes. The
>problem with this is that there's no code that works in both 2.x and
>3.0.
>
I hope Martin will reconsider, considering ord/unichr as a symmetric
pair of functions mapping 1:1 to unicode (and ignoring the fact that
this also happens to be the latin-1 mapping ;-)

A test class should be easy, except deciding on appropriate methods
and how the type should be defined. It's the same peculiar problem
as str, i.e., length one would be compatible with int, but not other lengths.
How do we do that?

Regards,
Bengt Richter



Re: [Python-Dev] http://www.python.org/dev/doc/devel still available

2006-02-15 Thread Guido van Rossum
On 2/15/06, Tim Parkin <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
>
> > (Now that I work for Google I realize more than ever before the
> > importance of keeping URLs stable; PageRank(tm) numbers don't get
> > transferred as quickly as contents. I have this worry too in the
> > context of the python.org redesign; 301 permanent redirect is *not*
> > going to help PageRank of the new page.)

> Could you expand on why 301 redirects won't help with the transfer of
> page rank (if you're allowed)? We've done exactly this on many sites and
> the pagerank (or more relevantly the search rankings on specific terms)
> has transferred almost overnight. The bigger pagerank updates (both
> algorithm changes and overhauls in approach) seem to only happen every
> few months and these also seem to take notice of 301 redirects (they
> generally clear up any supplemental results).

OK, perhaps I stand corrected. I don't actually know that much about PageRank!

I still don't like docs.python.org, and adding more like it seems a
mistake; but it's possible that this is because of a poor execution of
the idea (there's no "search docs" button near the search button on
the old python.org).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Generalizing *args and **kwargs

2006-02-15 Thread Guido van Rossum
On 2/15/06, Thomas Wouters <[EMAIL PROTECTED]> wrote:
> I've been thinking about generalization of the *args/**kwargs syntax for
> quite a while, and even though I'm pretty sure Guido (and many people) will
> consider it overgeneralization, I am finally going to suggest it. This whole
> idea is not something dear to my heart, although I obviously would like to
> see it happen. If the general vote is 'no', I'll write a small PEP or add it
> to PEP 13 and be done with it.

Feel free to write a PEP so that at least we have a concrete proposal
where all the nuts and bolts have been thought through.

I'm currently not able to give much thought to any more new proposals,
so don't expect me to look at it any time soon. Unless a miracle
occurs it's off the table for 2.5 so there's no hurry.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Guido van Rossum
On 2/15/06, Bill Janssen <[EMAIL PROTECTED]> wrote:
> The default behavior of the current open() in opening files as text is
> particularly grating.

Why? Are you perhaps one of those rare folks who read more binary data
than text?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] http://www.python.org/dev/doc/devel still available

2006-02-15 Thread Tim Parkin
Guido van Rossum wrote:
> On 2/15/06, Tim Parkin <[EMAIL PROTECTED]> wrote:
> 
>>Guido van Rossum wrote:
>>
>>>I have this worry too in the
>>>context of the python.org redesign; 301 permanent redirect is *not*
>>>going to help PageRank of the new page.)
>>Could you expand on why 301 redirects won't help with the transfer of
>>page rank (if you're allowed)? We've done exactly this on many sites and
>>the pagerank (or more relevantly the search rankings on specific terms)
>>has transferred almost overnight. The bigger pagerank updates (both
>>algorithm changes and overhauls in approach) seem to only happen every
>>few months and these also seem to take notice of 301 redirects (they
>>generally clear up any supplemental results).
> 
> OK, perhaps I stand corrected. I don't actually know that much about PageRank!
> 
No problem, I don't think that many people do. The general consensus
seems to be that, although the calculations behind pagerank may be one
of the core parts of the google algorithm, there are so many additional
algorithms* that affect searches on a case by case and day by day basis
that the pagerank value is almost meaningless (apart from possibly: 0-2 may
be a problem, 3-5 is normal, 6-9 is generally good, and 10 I've not seen).

* (for instance, patents on working out the value of inbound links based
on their age, how many other inbound links appeared around the same
time, the status of the originating site as an 'authority' site, the
text contained in the inbound link and title attributes, etc and the
general relation between the inbound links and the 'theme' of the target
site ['theme' == the distribution of important keywords across the site])

> I still don't like docs.python.org, and adding more like it seems a
> mistake; but it's possible that this is because of a poor execution of
> the idea (there's no "search docs" button near the search button on
> the old python.org).
I'll try and make a more functional/usable google search page on the new
site.

Tim Parkin

p.s. I hope you didn't think I was digging for 'insider info'..


Re: [Python-Dev] http://www.python.org/dev/doc/devel still available

2006-02-15 Thread Jeremy Hylton
As I said in an earlier message, there's no need to have a separate
domain to restrict queries to just the doc/current part of python.org.
 Just type
"site:python.org/doc/current your query here"

If there isn't any other rationale, maybe we can redirect
docs.python.org back to www.python.org?

Jeremy

On 2/15/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 2/15/06, Tim Parkin <[EMAIL PROTECTED]> wrote:
> > Guido van Rossum wrote:
> >
> > > (Now that I work for Google I realize more than ever before the
> > > importance of keeping URLs stable; PageRank(tm) numbers don't get
> > > transferred as quickly as contents. I have this worry too in the
> > > context of the python.org redesign; 301 permanent redirect is *not*
> > > going to help PageRank of the new page.)
>
> > Could you expand on why 301 redirects won't help with the transfer of
> > page rank (if you're allowed)? We've done exactly this on many sites and
> > the pagerank (or more relevantly the search rankings on specific terms)
> > has transferred almost overnight. The bigger pagerank updates (both
> > algorithm changes and overhauls in approach) seem to only happen every
> > few months and these also seem to take notice of 301 redirects (they
> > generally clear up any supplemental results).
>
> OK, perhaps I stand corrected. I don't actually know that much about PageRank!
>
> I still don't like docs.python.org, and adding more like it seems a
> mistake; but it's possible that this is because of a poor execution of
> the idea (there's no "search docs" button near the search button on
> the old python.org).
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] http://www.python.org/dev/doc/devel still available

2006-02-15 Thread Georg Brandl
Jeremy Hylton wrote:
> As I said in an earlier message, there's no need to have a separate
> domain to restrict queries to just the doc/current part of python.org.
>  Just type
> "site:python.org/doc/current your query here"
> 
> If there isn't any other rationale, maybe we can redirect
> docs.python.org back to www.python.org?

If something like Fredrik's new doc system is adopted, it would be extremely
convenient to refer someone to just

docs.python.org/os.path.join

without looking up how the page is actually named.

Georg



Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Guido van Rossum
On 2/14/06, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Fred L. Drake, Jr. wrote:
>
> > The proper response in this case is often to re-start decoding
> > with the correct encoding, since some of the data extracted so far may have
> > been decoded incorrectly.
>
> If the protocol has been sensibly designed, that shouldn't
> happen, since everything up to the coding marker should
> be ascii (or some other protocol-defined initial coding).
>
> For protocols that are not sensibly designed (or if you're
> just trying to guess) what you suggest may be needed. But
> it would be good to have a nicer way of going about it
> for when the protocol is sensible.

I think that the implementation of encoding-guessing or
auto-encoding-upgrade techniques should be left out of the standard
library design for now. I know that XML does something like this, but
fortunately we employ dedicated C code to parse XML so that particular
case should be taken care of without complicating the rest of the
standard I/O library.

As far as searching bytes objects, that shouldn't be a problem as long
as the search 'string' is also specified as a bytes object.
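
A small illustration of that last point, as a sketch assuming Python 3 (the sample data is invented): searching works when the needle is itself a bytes object, and a text needle is refused rather than silently encoded.

```python
# Assumes Python 3. The byte stream below is made up for illustration.
data = bytes([0x89]) + b"PNG\r\n\x1a\n" + b"rest of the stream"
needle = b"PNG"

print(data.find(needle))   # offset of the first match
print(needle in data)

# A str needle is a type error, not an implicit encode.
try:
    data.find("PNG")
except TypeError:
    text_needle_rejected = True
else:
    text_needle_rejected = False
print(text_needle_rejected)
```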

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] http://www.python.org/dev/doc/devel still available

2006-02-15 Thread Tim Parkin
Jeremy Hylton wrote:
> As I said in an earlier message, there's no need to have a separate
> domain to restrict queries to just the doc/current part of python.org.
>  Just type
> "site:python.org/doc/current your query here"
> 
> If there isn't any other rationale, maybe we can redirect
> docs.python.org back to www.python.org?

One possible reason, I'd like to be able to serve the docs up integrated
with the new design (with a full hierarchical navigation). I had planned
on leaving the docs.python.org as the raw tex2html conversion. If we got
rid of the docs.python.org would we still want the www.python.org in the
current style? Personally I was hoping that nearly all of the site could
be in the new html structure and design for consistency and usability
reasons.

Tim Parkin


Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Bengt Richter
On Tue, 14 Feb 2006 19:41:07 -0500, "Raymond Hettinger" <[EMAIL PROTECTED]> wrote:

>[Guido van Rossum]
>> Somewhat controversial:
>>
>> - bytes("abc") == bytes(map(ord, "abc"))
>
>At first glance, this seems obvious and necessary, so if it's somewhat 
>controversial, then I'm missing something.  What's the issue?
>
ord("x") gets the source encoding's ord value of "x", but if that is not unicode
or latin-1, it will break when PY 3000 makes "x" unicode.

This means until Py 3000 plain str string literals have to use ascii and
escapes in order to preserve the meaning when "x" == u"x".

But the good news is bytes(map(ord, u"x")) works fine for any source encoding
now or after PY 3000. You just have to type characters into your editor
between the quotes that look on the screen like any of the first 256 unicode
characters (or use ascii escapes for unshowables). The u"x" translates x into
unicode according to the *character* of x, whatever the source encoding, so all
you have to do is choose characters of the first 256 unicodes. This happens to
be latin-1, but you can ignore that unless you are interested in the actual
byte values. If they have byte meaning, escapes are clearer anyway, and they
work in a unicode string (where "x".decode(source_encoding) might fail on an
illegal character).

The solution is to use u"x" for now or use ascii-only with escapes, and just
map ord on either kind of string. This should work when u"x"
becomes equivalent to "x". The unicode that comes from a current u"x" string
defines a *character* sequence. If you use legal latin-1 *characters* in
whatever source encoding your editor and coding cookie say, you will get
the *characters* you see inside the quotes in the u"..." literal translated
to unicode, and the first 256 characters of unicode happen to be the latin-1
set, so map ord just works. With a unicode string you don't have to think
about encoding, just use ord/unichr in range(0,256). Hex escapes within
unicode strings work as expected, so IMO it's pretty clean.
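
The claim that map ord coincides with latin-1 for the first 256 code points can be checked on a current interpreter. A sketch assuming Python 3, where every string literal is already unicode and chr plays the role of unichr:

```python
# Assumes Python 3: "..." literals are unicode, so no u prefix is needed.
s = "abc\xf6"

via_ord = bytes(map(ord, s))
print(list(via_ord))
print(via_ord == s.encode("latin-1"))   # same bytes as an explicit latin-1 encode

# Bytes in some other encoding must be decoded to characters first;
# the ord values then follow the characters, not the original bytes.
raw = b"abc\xf6"
print(list(map(ord, raw.decode("cp437"))))

# Characters beyond the first 256 code points cannot be mapped this way.
try:
    bytes(map(ord, "\u1234"))
except ValueError:
    out_of_range = True
else:
    out_of_range = False
print(out_of_range)
```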

I think I have shown this in a couple of other posts in the original thread
(where I created source code in several encodings including utf-8,
compiled it with coding cookies, and exec'd the result).

I could always have overlooked something, but I am hopeful.

Regards,
Bengt Richter



Re: [Python-Dev] http://www.python.org/dev/doc/devel still available

2006-02-15 Thread Fredrik Lundh
Georg Brandl wrote:

> If something like Fredrik's new doc system is adopted, it would be extremely
> convenient to refer someone to just
>
> docs.python.org/os.path.join
>
> without looking up how the page is actually named.

you could of course reserve a toplevel directory for that purpose; e.g.

http://python.org/lib/os.path.join

or perhaps

http://python.org/tag/os.path.join
http://python.org/tag/print

etc.







[Python-Dev] bytes type needs a new champion

2006-02-15 Thread Guido van Rossum
Skip has mentioned in private email that he's not available to update
PEP 332. I've therefore rejected that PEP; the current ideas are
rather different so we might as well start a new PEP. Anyway, we need
a new PEP author who can take the current discussion and turn it into
a coherent PEP. I've tried to keep up with the current thread but it
takes too much time to organize it all and I need to start focusing on
the 2.5 release schedule.

Any volunteers?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] 2.5 PEP

2006-02-15 Thread Fredrik Lundh
Martin v. Löwis wrote:

> >   - is (c)ElementTree still planned for inclusion ?
>
> It is included already.

in the xml.etree package, in case someone's looking for it in the
usual place.

that is,

import xml.etree.ElementTree as ET
import xml.etree.cElementTree as ET

will work in any 2.5 that has a working pyexpat.
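
A short usage sketch, assuming a Python 3 interpreter: on recent Python 3 releases only the first import is available, since the separate cElementTree alias was later removed. The XML document is made up for illustration.

```python
# Assumes Python 3; only xml.etree.ElementTree is imported here.
import xml.etree.ElementTree as ET

# Sample document, invented for this example.
root = ET.fromstring(
    "<doc><item id='1'>spam</item><item id='2'>eggs</item></doc>"
)

texts = [item.text for item in root.findall("item")]
first_id = root.find("item").get("id")

print(texts)
print(first_id)
```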

(is the xmlplus/xmlcore issue still an issue, btw?)







Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread M.-A. Lemburg
Jason Orendorff wrote:
> Instead of byte literals, how about a classmethod bytes.from_hex(), which
> works like this:
> 
>   # two equivalent things
>   expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')
>   expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83, 227,
> 131, 79, 229, 201, 46, 106])
> 
> It's just a nicety; the former fits my brain a little better.  This would
> work fine both in 2.5 and in 3.0.
> 
> I thought about unicode.encode('hex'), but obviously it will continue to
> return a str in 2.x, not bytes.  Also the pseudo-encodings ('hex', 'rot13',
> 'zip', 'uu', etc.) generally scare me. 

Those are not pseudo-encodings, they are regular codecs.

It's a common misunderstanding that codecs are only seen as serving
the purpose of converting between Unicode and strings.

The codec system is deliberately designed to be general enough
to also work with many other types, e.g. it is easily possible to
write a codec that converts between the hex literal sequence you
have above to a list of ordinals:

""" Hex string codec

    Converts between a list of ordinals and a two byte hex literal
    string.

    Usage:
    >>> codecs.encode([1,2,3], 'hexstring')
    '010203'
    >>> codecs.decode(_, 'hexstring')
    [1, 2, 3]

    (c) 2006, Marc-Andre Lemburg.

"""
import codecs

class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):

        """ Convert hex ordinal list to hex literal string.
        """
        if not isinstance(input, list):
            raise TypeError('expected list of integers')
        return (
            ''.join(['%02x' % x for x in input]),
            len(input))

    def decode(self,input,errors='strict'):

        """ Convert hex literal string to hex ordinal list.
        """
        if not isinstance(input, str):
            raise TypeError('expected string of hex literals')
        size = len(input)
        if not size % 2 == 0:
            raise TypeError('input string has uneven length')
        return (
            [int(input[(i<<1):(i<<1)+2], 16)
             for i in range(size >> 1)],
            size)

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

def getregentry():
    return (Codec().encode,Codec().decode,StreamReader,StreamWriter)
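
The module above is written for the 2.x codec registry, where getregentry() is picked up from a module in the encodings package. For trying the same idea in a single file, here is a sketch assuming Python 3 that registers an equivalent codec through codecs.register(); the names hex_encode, hex_decode and search are hypothetical, and this is not the code from the message above.

```python
import codecs

# Hypothetical stateless encoder/decoder pair, modelled on the codec above.
def hex_encode(input, errors='strict'):
    """Convert a list of ordinals to a hex literal string."""
    if not isinstance(input, list):
        raise TypeError('expected list of integers')
    return ''.join('%02x' % x for x in input), len(input)

def hex_decode(input, errors='strict'):
    """Convert a hex literal string to a list of ordinals."""
    if not isinstance(input, str):
        raise TypeError('expected string of hex literals')
    if len(input) % 2:
        raise ValueError('input string has odd length')
    ordinals = [int(input[i:i + 2], 16) for i in range(0, len(input), 2)]
    return ordinals, len(input)

def search(name):
    """Codec search function: answer only for the name 'hexstring'."""
    if name == 'hexstring':
        return codecs.CodecInfo(hex_encode, hex_decode, name='hexstring')
    return None

codecs.register(search)

encoded = codecs.encode([1, 2, 3], 'hexstring')
decoded = codecs.decode(encoded, 'hexstring')
print(encoded)
print(decoded)
```

The point carried over from the original is that codecs.encode() and codecs.decode() place no restriction on the input and output types.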

> And now that bytes and text are
> going to be two very different types, they're even weirder than before.
> Consider:
> 
>   text.encode('utf-8') ==> bytes
>   text.encode('rot13') ==> text
>   bytes.encode('zip') ==> bytes
>   bytes.encode('uu') ==> text (?)
> 
> This state of affairs seems kind of crazy to me.

Really ?

It all depends on what you use the codecs for. The above
usages through the .encode() and .decode() methods are
not the only way you can make use of them.

To get full access to the codecs, you'll have to use
the codecs module.
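
A sketch of what that looks like, assuming Python 3, where the 'rot13', 'hex' and 'zlib' codecs are reachable through the codecs module but not through str.encode():

```python
import codecs

# text -> text
rot = codecs.encode("Hello", "rot13")
print(rot)

# bytes -> bytes
hexed = codecs.encode(b"abc", "hex")
print(hexed)
print(codecs.decode(hexed, "hex"))

payload = b"spam" * 100
packed = codecs.encode(payload, "zlib")
print(codecs.decode(packed, "zlib") == payload)

# The method interface refuses codecs that are not text encodings.
try:
    "Hello".encode("rot13")
except LookupError:
    method_refused = True
else:
    method_refused = False
print(method_refused)
```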

> Actually users trying to figure out Unicode would probably be better served
> if bytes.encode() and text.decode() did not exist.

You're missing the point: the .encode() and .decode() methods
are merely interfaces to the registered codecs. Whether they
make sense for a certain codec depends on the codec, not the
methods that interface to it, and again, codecs do not
only exist to convert between Unicode and strings.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 15 2006)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-15 Thread Thomas Wouters
On Wed, Feb 15, 2006 at 01:38:41PM -0500, Jim Jewett wrote:
> On 2/14/06, Neil Schemenauer wrote:
> > People could spell it bytes(s.encode('latin-1'))
> 
> Guido wrote:
> > At the cost of an extra copying step.
> 
> I asked:
> > ... why not just add some smarts to the bytes constructor?
> 
> Guido wrote:
> 
> > ... the VM usually keeps an extra reference
> > on the stack so the refcount is never 1. But
> > you can't rely on that
> 
> I did miss this, but _PyString_Resize seems to
> work around it, and I'm not sure that the bytes
> object can't be just as intimate.

No, _PyString_Resize doesn't work around it. _PyString_Resize only works if
the refcount is exactly one: only the caller has a reference. And by
'caller', I mean 'the calling C function'. Besides that, the caller takes
care to only use _PyString_Resize on strings it created itself.
Theoretically it could 'steal' a reference from someplace else, but I
haven't seen _PyString_Resize-using code do that, and it would be a recipe
for disaster.

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Bill Janssen
Well, I probably am, but that's not the reason.  Reading has nothing
to do with it.

The default mode (text) corrupts data on write on a certain platform
(Windows) by inserting extra bytes in the data stream.  This bug
particularly exhibits itself when programs developed on Linux or Mac
OS X are then run on a Windows platform.  I think it's a bug to
default to a mode which modifies the data stream.  The default mode
should be 'binary'; people interested in exploiting the obsolete
Windows distinction between "text" and "binary" should have to use a
mode switch (I suggest "t") to put a file stream in 'text' mode.

Bill

> On 2/15/06, Bill Janssen <[EMAIL PROTECTED]> wrote:
> > The default behavior of the current open() in opening files as text is
> > particularly grating.
> 
> Why? Are you perhaps one of those rare folks who read more binary data
> than text?
> 
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)




Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Guido van Rossum
On 2/15/06, Bill Janssen <[EMAIL PROTECTED]> wrote:
> Well, I probably am, but that's not the reason.  Reading has nothing
> to do with it.

Actually if you read binary data in text mode on Windows you also get
corrupt (and often truncated) data, unless you're lucky enough that
the binary data contains neither ^Z (EOF) nor CRLF.

> The default mode (text) corrupts data on write on a certain platform
> (Windows) by inserting extra bytes in the data stream.  This bug
> particularly exhibits itself when programs developed on Linux or Mac
> OS X are then run on a Windows platform.  I think it's a bug to
> default to a mode which modifies the data stream.  The default mode
> should be 'binary'; people interested in exploiting the obsolete
> Windows distinction between "text" and "binary" should have to use a
> mode switch (I suggest "t") to put a file stream in 'text' mode.

This might have been a possibility in Python 2.x where binary reads
return strings. In Python 3000 binary files will return bytes objects
while text files will return strings (which are unicode, decoded from
the file's bytes using an encoding that's determined when the file is
opened, taking
into account system and user settings as well as possible overrides
passed to open()). I expect that the APIs for reading and writing
binary data will be sufficiently different from that for
reading/writing text that even staunch Unix programmers won't make the
mistake of using the text API for creating binary files.

I realize that's not the answer you're looking for, but for backwards
compatibility we can't change the default on Windows in Python 2.x, so
the point is moot until 3.0 or until a new binary file API is added to
2.x.
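
A small sketch of the text/binary split described above, assuming Python 3's open(). Passing newline="\r\n" asks the text layer to do on any platform what Windows text mode does by default, so the extra bytes show up in the raw file everywhere. The helper names and the temporary file are illustrative only.

```python
import os
import tempfile


def write_text_windows_style(path, text):
    # The text layer translates every "\n" to "\r\n" on write.
    with open(path, "w", encoding="ascii", newline="\r\n") as f:
        f.write(text)


def read_raw(path):
    # Binary mode returns a bytes object, untouched.
    with open(path, "rb") as f:
        return f.read()


def read_text(path):
    # Text mode returns str; newline="" disables translation on read.
    with open(path, "r", encoding="ascii", newline="") as f:
        return f.read()


fd, demo_path = tempfile.mkstemp()
os.close(fd)
try:
    write_text_windows_style(demo_path, "a\nb\n")
    raw = read_raw(demo_path)
    text = read_text(demo_path)
finally:
    os.remove(demo_path)

print(type(raw).__name__, raw)
print(type(text).__name__, repr(text))
```

Four characters were written but six bytes landed in the file, which is the kind of silent modification complained about earlier in the thread.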

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] math.areclose ...?

2006-02-15 Thread Gustavo J. A. M. Carneiro
  Please, I don't much care about the fine points of the function's
semantics, but PLEASE rename that function to are_close.  Every time I
see this subject in my email client I have to think for a few seconds
what the hell 'areclose' means.  This time it's not just because of the
new PEP 8, 'areclose' is really really hard to read.

-- 
Gustavo J. A. M. Carneiro
<[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
The universe is always one step beyond logic



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Barry Warsaw
On Wed, 2006-02-15 at 14:01 -0500, Jason Orendorff wrote:
> Instead of byte literals, how about a classmethod bytes.from_hex(),
> which works like this:
> 
>   # two equivalent things
>   expected_md5_hash =
> bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')
>   expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83,
> 227, 131, 79, 229, 201, 46, 106])

Kind of like binascii.unhexlify() but returning a bytes object.
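
A short sketch of that equivalence, assuming Python 3, where the classmethod ended up spelled bytes.fromhex() rather than the from_hex proposed here. The digest is the one from the quoted example.

```python
import binascii

HEX_DIGEST = "5c535024cac5199153e3834fe5c92e6a"

from_hex = bytes.fromhex(HEX_DIGEST)
from_binascii = binascii.unhexlify(HEX_DIGEST)
from_ints = bytes([92, 83, 80, 36, 202, 197, 25, 145,
                   83, 227, 131, 79, 229, 201, 46, 106])

# All three spellings build the same 16-byte object.
print(from_hex == from_binascii == from_ints)
```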

-Barry





Re: [Python-Dev] Generalizing *args and **kwargs

2006-02-15 Thread Nick Coghlan
Thomas Wouters wrote:
> Although I've made it look like I have a working implementation, I haven't.
> I know exactly how to do it, though, except for the AST part ;) Once I
> figure out how to properly work with the AST code I'll probably write this
> patch whether it's a definite 'no' or not, just to see if I can. I wouldn't
> mind if people gave their opinion, though.

A phase 1 for Python 2.5 that allowed keyword args to go between "*args" and
"**kwds" at the call site would be nice (Guido even approved the concept
already, it's just that it hasn't irritated anyone enough to actually tweak
the grammar...)
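
For reference, a sketch of the call-site form being asked for, written for Python 3, where this ordering is accepted. The function and argument names are made up for the example.

```python
def describe(*args, **kwds):
    # Return what the callee actually received.
    return args, kwds


positional = (1, 2)
extra = {"colour": "red"}

# An explicit keyword argument sits between *args and **kwds.
result = describe(*positional, size=3, **extra)
print(result)
```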

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


[Python-Dev] how bugfixes are handled?

2006-02-15 Thread Arkadiusz Miskiewicz
Hi,

How are bugfixes handled?

I posted a bug and a patch + test case for a quite common issue (see
Google; the problem has been mentioned on this ml) a long time ago, and
nothing has happened with it:
http://sourceforge.net/tracker/index.php?func=detail&aid=1380952&group_id=5470&atid=305470

Is anyone reviewing fixes on a regular basis? Or are just some bugfixes
reviewed + committed, depending on the interest of committers?

Thanks,
-- 
Arkadiusz Miśkiewicz    PLD/Linux Team
http://www.t17.ds.pwr.wroc.pl/~misiek/  http://ftp.pld-linux.org/



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Guido van Rossum
On 2/15/06, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Jason Orendorff wrote:
> > Also the pseudo-encodings ('hex', 'rot13',
> > 'zip', 'uu', etc.) generally scare me.
>
> Those are not pseudo-encodings, they are regular codecs.
>
> It's a common misunderstanding that codecs are only seen as serving
> the purpose of converting between Unicode and strings.
>
> The codec system is deliberately designed to be general enough
> to also work with many other types, e.g. it is easily possible to
> write a codec that converts between the hex literal sequence you
> have above and a list of ordinals:

It's fine that the codec system supports this. However it's
questionable that these encodings are invoked using the standard
encode() and decode() APIs; and it will be more questionable once
encode() returns a bytes object. Methods that return different types
depending on the value of an argument are generally a bad idea. (Hence
the movement to have separate opentext and openbinary or openbytes
functions.)
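
A sketch of how this concern plays out, assuming Python 3.4 or later: str.encode() always returns bytes and refuses codecs that are not text encodings, while the type-changing codecs remain reachable through the codecs module.

```python
import codecs

# str.encode() with a text encoding always yields bytes.
encoded = "abc".encode("utf-8")

# A str -> str codec is rejected by the method...
try:
    "abc".encode("rot_13")
except LookupError as exc:
    rot13_error = exc
else:
    rot13_error = None

# ...but is still available through the codecs module.
rot13_text = codecs.encode("abc", "rot_13")

print(type(encoded).__name__, rot13_error is not None, rot13_text)
```

The return type of the method then depends only on the receiver, never on the value of the encoding argument.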

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] 2.5 PEP

2006-02-15 Thread Nick Coghlan
Neal Norwitz wrote:
> Attached is the 2.5 release PEP 356.  It's also available from: 
> http://www.python.org/peps/pep-0356.html
> 
> Does anyone have any comments?  Is this good or bad?  Feel free to
> send to me comments.
> 
> We need to ensure that PEPs 308, 328, and 343 are implemented.  We
> have possible volunteers for 308 and 343, but not 328.  Brett is doing
> 352 and Martin is doing 353.

PEP 338 is pretty much ready to go, too - just waiting on Guido's review and 
pronouncement on the specific API used in the latest update (his last PEP 
parade said he was OK with the general concept, but I only posted the PEP 302 
compliant version after that).

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


[Python-Dev] A codecs nit (was Re: bytes.from_hex())

2006-02-15 Thread Barry Warsaw
On Wed, 2006-02-15 at 22:07 +0100, M.-A. Lemburg wrote:

> Those are not pseudo-encodings, they are regular codecs.
> 
> It's a common misunderstanding that codecs are only seen as serving
> the purpose of converting between Unicode and strings.
> 
> The codec system is deliberately designed to be general enough
> to also work with many other types, e.g. it is easily possible to
> write a codec that converts between the hex literal sequence you
> have above and a list of ordinals:

Slightly off-topic, but one thing that's always bothered me about the
current codecs implementation is that str.encode() (and friends)
implicitly treats its argument as a module, and imports it, even if the
module doesn't live in the encodings package.  That seems like a mistake
to me (and a potential security problem if the import has side-effects).
I don't know whether at the very least restricting the imports to the
encodings package would make sense or would break things.

>>> import sys
>>> sys.modules['smtplib']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'smtplib'
>>> ''.encode('smtplib')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
LookupError: unknown encoding: smtplib
>>> sys.modules['smtplib']
<module 'smtplib' from '...'>
I can't see any reason for allowing any randomly importable module to
act like an encoding.
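
One way to check whether a given interpreter still behaves like this is sketched below, assuming Python 3. The probe module name codec_import_probe is hypothetical and is created in a temporary directory; the function only reports what the running interpreter does and makes no claim about any particular version.

```python
import os
import sys
import tempfile


def encode_imports_module(name="codec_import_probe"):
    """Return True if ''.encode(name) imported a top-level module `name`."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical probe: an importable module that is not a codec.
        with open(os.path.join(tmp, name + ".py"), "w") as f:
            f.write("# probe module, defines no codec\n")
        sys.path.insert(0, tmp)
        try:
            try:
                "".encode(name)
            except LookupError:
                pass  # expected: the probe is not a registered encoding
            return name in sys.modules
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop(name, None)


print(encode_imports_module())
```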
-Barry





Re: [Python-Dev] how bugfixes are handled?

2006-02-15 Thread Guido van Rossum
We're all volunteers here, and we get a large volume of bugs.
Unfortunately, bugfixes are reviewed on a voluntary basis.

Are you aware of the standing offer that if you review 5 bugs/patches
some of the developers will pay attention to your bug/patch?

On 2/15/06, Arkadiusz Miskiewicz <[EMAIL PROTECTED]> wrote:
> Hi,
>
> How are bugfixes handled?
>
> I posted a bug and a patch + test case for a quite common issue (see
> Google; the problem has been mentioned on this ml) a long time ago, and
> nothing has happened with it:
> http://sourceforge.net/tracker/index.php?func=detail&aid=1380952&group_id=5470&atid=305470
>
> Is anyone reviewing fixes on a regular basis? Or are just some bugfixes
> reviewed + committed, depending on the interest of committers?
>
> Thanks,
> --
> Arkadiusz Miśkiewicz    PLD/Linux Team
> http://www.t17.ds.pwr.wroc.pl/~misiek/  http://ftp.pld-linux.org/
>


--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] http://www.python.org/dev/doc/devel still available

2006-02-15 Thread Fredrik Lundh
Georg Brandl wrote:

> If something like Fredrik's new doc system is adopted

don't hold your breath, by the way.  it's clear that the current PSF-sponsored
site overhaul won't lead to anything remotely close to a best-of-breed python-
powered site, and I'm beginning to think that I should spend my time on other
stuff.

I find it a bit sad that we'll end up with a butt-ugly static and boring
python.org site when we have so much talent in the python universe, but I
guess that's inevitable at this stage in Python's evolution.







[Python-Dev] ssize_t branch merged

2006-02-15 Thread Martin v. Löwis
Just in case you haven't noticed, I just merged
the ssize_t branch (PEP 353).

If you have any corrections to the code to make which
you would consider bug fixes, just go ahead.

If you are uncertain how specific problems should be resolved,
feel free to ask.

If you think certain API changes should be made, please
discuss them here - they would need to be reflected in the
PEP as well.

Regards,
Martin


Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Fredrik Lundh
Guido van Rossum wrote:

> - it's probably too big to attempt to rush this into 2.5

After reading some of the discussion, and seeing some of the arguments,
I'm beginning to feel that we need working code to get this right.

It would be nice if we could get a bytes() type into the first alpha, so
the design can get some real-world exposure in real-world apps/libs
before 2.5 final.







Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Thomas Wouters
On Wed, Feb 15, 2006 at 11:28:59PM +0100, Fredrik Lundh wrote:

> After reading some of the discussion, and seen some of the arguments,
> I'm beginning to feel that we need working code to get this right.
> 
> It would be nice if we could get a bytes() type into the first alpha, so
> the design can get some real-world exposure in real-world apps/libs
> before 2.5 final.

I agree that working code would be nice, but I don't see why it should be in
an alpha release. IMHO it shouldn't be in an alpha release until it at least
looks good enough for the developers, and good enough to put in a PEP.

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Fredrik Lundh
Thomas Wouters wrote:

> > After reading some of the discussion, and seen some of the arguments,
> > I'm beginning to feel that we need working code to get this right.
> >
> > It would be nice if we could get a bytes() type into the first alpha, so
> > the design can get some real-world exposure in real-world apps/libs
> > before 2.5 final.
>
> I agree that working code would be nice, but I don't see why it should be in
> an alpha release. IMHO it shouldn't be in an alpha release until it at least
> looks good enough for the developers, and good enough to put in a PEP.

I'm not convinced that the PEP will be good enough without experience
from using a bytes type in *real-world* (i.e. *existing*) byte-crunching
applications.

if we put it in an early alpha, we can use it with real code, fix any issues
that arise, and even remove it if necessary, before 2.5 final.  if it goes in
late, we'll be stuck with whatever the PEP says.







Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Michael Foord
Guido van Rossum wrote:
> On 2/15/06, Fuzzyman <[EMAIL PROTECTED]> wrote:
>   
>>  Forcing the programmer to be aware of encodings, also pushes the same
>> requirement onto the user (who is often the source of the text in question).
>> 
>
> The programmer shouldn't have to be aware of encodings most of the
> time -- it's the job of the I/O library to determine the end user's
> (as opposed to the language's) default encoding dynamically and act
> accordingly. Users who use non-ASCII characters without informing the
> OS of their encoding are in a world of pain, *unless* they use the OS
> default encoding (which may vary per locale). If the OS can figure out
> the default encoding, so can the Python I/O library. Many apps won't
> have to go beyond this at all.
>
> Note that I don't want to use this OS/user default encoding as the
> default encoding between bytes and strings; once you are reading bytes
> you are writing "grown-up" code and you will have to be explicit. It's
> only the I/O library that should automatically encode on write and
> decode on read.
>
>   
>>  Currently you can read a text file and process it - making sure that any
>> changes/requirements only use ascii characters. It therefore doesn't matter
>> what 8 bit ascii-superset encoding is used in the original. If you force the
>> programmer to specify the encoding in order to read the file, they would
>> have to pass that requirement onto their user. Their user is even less
>> likely to be encoding aware than the programmer.
>> 
>
> I disagree -- the user most likely has set or received a default
> encoding when they first got the computer, and that's all they are
> using. If other tools (notepad, wordpad, emacs, vi etc.) can figure
> out the encoding, so can Python's I/O library.
>
>   
I'm intrigued  by the encoding guessing techniques you envisage. I 
currently use a modified version of something contained within docutils.

I read the file in binary and first check for UTF8 or UTF16 BOM.

Then I try to decode the text using the following encodings (in this 
order) :

ascii
UTF8
locale.nl_langinfo(locale.CODESET)
locale.getlocale()[1]
locale.getdefaultlocale()[1]
ISO8859-1
cp1252

(The encodings returned by the locale calls are only used on platforms 
for which they exist.)

The first decode that doesn't blow up, I assume is correct. The problem 
I have is that I usually (for the application I have in mind anyway) 
then want to re-encode into a consistent encoding rather than back into 
the original encoding. If the encoding of the original (usually 
unspecified) is any arbitrary 8-bit ascii superset (as it usually is), 
then it will probably not blow up if decoded with any other arbitrary 8 
bit encoding. This means I sometimes get junk.
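
The procedure described above can be sketched roughly as follows, assuming Python 3. The function name guess_decode is hypothetical, and locale.getpreferredencoding() stands in for the three separate locale calls in the list. Because ISO8859-1 accepts every byte sequence, cp1252 is never reached, which is the "sometimes get junk" problem in miniature.

```python
import codecs
import locale

# BOMs checked first, as in the description above.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_decode(data):
    """Return (text, encoding_name) for a bytes object, by trial decoding."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name), name

    candidates = ["ascii", "utf-8"]
    preferred = locale.getpreferredencoding(False)
    if preferred and preferred.lower() not in candidates:
        candidates.append(preferred)
    candidates += ["iso8859-1", "cp1252"]

    for name in candidates:
        try:
            # The first decode that doesn't blow up wins.
            return data.decode(name), name
        except (UnicodeDecodeError, LookupError):
            continue
    raise ValueError("no candidate encoding could decode the data")


print(guess_decode(b"plain ascii"))
print(guess_decode("caf\u00e9".encode("utf-8")))
```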

I'm curious if there are any extra things I could do? This is possibly
beyond the scope of this discussion (in which case I apologise), but we
are discussing the techniques the I/O layer would use to 'guess' the
encoding of a file opened in text mode - so maybe it's not so off topic.

There is also the following cookbook recipe that uses a heuristic to
guess encoding:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/163743

XML, HTML, or other text streams may also contain additional information
about their encoding - which may be unreliable. :-)

All the best,

Michael Foord




Re: [Python-Dev] http://www.python.org/dev/doc/devel still available

2006-02-15 Thread Tim Parkin
Fredrik Lundh wrote:
> Georg Brandl wrote:
>>If something like Fredrik's new doc system is adopted
> 
> don't hold your breath, by the way.  it's clear that the current PSF-sponsored
> site overhaul won't lead to anything remotely close to a best-of-breed python-
> powered site, and I'm beginning to think that I should spend my time on other
> stuff.
> 
> I find it a bit sad that we'll end up with a butt-ugly static and boring
> python.org site when we have so much talent in the python universe, but I
> guess that's inevitable at this stage in Python's evolution.
> 
> 
Some very large sites - and some may say some very interesting, very
large sites - are delivered as static html (for some time the two
biggest sites in the uk were both delivered as static html, one of which
was bbc.co.uk and the other was sportinglife.com for which I used to be
the main web developer. As far as I know the bbc and sporting life still
both use static html for a large portion of their content).

Regarding the python site, it was a conscious decision to deliver the
pages as static html. This was for many reasons, of which a prominent
one (but by no means the only major one) was mirroring.

One of the advantages of a semantically structured website that uses css
for layout and style is that, as far as design goes, you are welcome to
re-style the html using css; we can also offer it as an alternate
stylesheet (just as I've added a 'large font' style and a 'default font
settings' style). However, design is a subjective thing - I've spent
quite a bit of time reacting to the majority of constructive feedback
(probably far too much time when I should have been getting content
migrated) but obviously it won't please everyone :-)

As for cutting edge, it's using twisted, restructured text, nevow, clean
urls, xhtml, semantic markup, css2, interfaces, adaption, eggs, the path
module, moinmoin, yaml (to avoid xml), etc  - just because it's
generating all of the html up front rather than at runtime doesn't mean
that it's not best-of-breed (although I'm not sure what best-of-breed
is; I'm presuming it's some sort of accolade for excellence in python
programming; something I don't think I would be qualified to judge,
never mind receive).

However, back to Georg's comment, we could use mod_rewrite to map:

/lib/sets

to:

/doc/lib/module-sets.html

with

rewriteRule ^/lib/(.*)$ /doc/lib/module-$1.html [L,R=301]

(not tested)

Whether that is a good idea or not is another matter.
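
As a quick way to see what that pattern does to a path, here is a sketch using Python's re module with the same regular expression. It mirrors only the substitution, not Apache's redirect handling, and the rewrite() helper is hypothetical.

```python
import re

# Same pattern as the rule above; \1 plays the role of $1.
LIB_URL = re.compile(r"^/lib/(.*)$")


def rewrite(path):
    # Paths that don't match are returned unchanged.
    return LIB_URL.sub(r"/doc/lib/module-\1.html", path)


print(rewrite("/lib/sets"))
print(rewrite("/doc/lib/module-sets.html"))
```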


Tim Parkin


Re: [Python-Dev] http://www.python.org/dev/doc/devel still available

2006-02-15 Thread Fredrik Lundh
Tim Parkin wrote:

> As for cutting edge, it's using twisted, restructured text, nevow, clean
> urls, xhtml, semantic markup, css2, interfaces, adaption, eggs, the path
> module, moinmoin, yaml (to avoid xml),

that's not cutting edge, that's buzzword bingo.

> something I don't think I would be qualified to judge, never mind receive).

no, you're not qualified.  yet, someone gave you total control over the
future of python.org, and there's no way to make you give it up, despite
the fact that you're over a year late and the stuff you've delivered this
far is massively underwhelming.  that's the problem.







Re: [Python-Dev] bytes type needs a new champion

2006-02-15 Thread Neil Schemenauer
Guido van Rossum <[EMAIL PROTECTED]> wrote:
> Anyway, we need a new PEP author who can take the current
> discussion and turn it into a coherent PEP.

I'm not sure that I have time to be the official champion.  Right
now I'm spending some time to collect all the ideas presented in the
email messages and put them into a draft PEP.  Hopefully that will
be useful.

  Neil



Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread Guido van Rossum
On 2/15/06, Michael Foord <[EMAIL PROTECTED]> wrote:
> I'm intrigued  by the encoding guessing techniques you envisage.

Don't hold your breath. *I* am not very interested in guessing
encodings -- I was just commenting on posts by others that mentioned
difficulties caused by this approach. My position is that the standard
library (with the exception of XML processing code perhaps) shouldn't
be *guessing* encodings but simply using the encoding specified by the
user (or the OS default) in the environment or some such place. (It is
OS dependent how to retrieve this information but my hypothesis is
that every OS with any kind of text support has a way to get this info
-- even if it's as rudimentary as "it's always ASCII" (v7 Unix :-) or
"it's always UTF-8" (I am hoping this will eventually be the answer in
the distant future).)
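
A sketch of asking the environment instead of guessing, assuming Python 3's locale and sys modules. The values differ from machine to machine, so nothing about the output is promised here.

```python
import locale
import sys

# Encoding the user's locale settings ask for; False means "don't
# change the process locale just to find out".
preferred = locale.getpreferredencoding(False)

# Encoding used for file names, which may differ from the above.
filesystem = sys.getfilesystemencoding()

print("text I/O default:", preferred)
print("file names:", filesystem)
```

A text-mode open() can then be given the result explicitly, for example open(path, encoding=preferred), with no inspection of the file contents at all.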

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Guido van Rossum
I'm actually assuming we'll put this off until 2.6 anyway.

On 2/15/06, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> Thomas Wouters wrote:
>
> > > After reading some of the discussion, and seen some of the arguments,
> > > I'm beginning to feel that we need working code to get this right.
> > >
> > > It would be nice if we could get a bytes() type into the first alpha, so
> > > the design can get some real-world exposure in real-world apps/libs
> > > before 2.5 final.
> >
> > I agree that working code would be nice, but I don't see why it should be in
> > an alpha release. IMHO it shouldn't be in an alpha release until it at least
> > looks good enough for the developers, and good enough to put in a PEP.
>
> I'm not convinced that the PEP will be good enough without experience
> from using a bytes type in *real-world* (i.e. *existing*) byte-crunching
> applications.
>
> if we put it in an early alpha, we can use it with real code, fix any issues
> that arise, and even remove it if necessary, before 2.5 final.  if it goes in
> late, we'll be stuck with whatever the PEP says.
>
> 
>
>
>


--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] ssize_t branch merged

2006-02-15 Thread Guido van Rossum
Great! I'll mark the PEP as accepted. (Which doesn't mean you can't
update it if changes are found necessary.)

--Guido

On 2/15/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Just in case you haven't noticed, I just merged
> the ssize_t branch (PEP 353).
>
> If you have any corrections to the code to make which
> you would consider bug fixes, just go ahead.
>
> If you are uncertain how specific problems should be resolved,
> feel free to ask.
>
> If you think certain API changes should be made, please
> discuss them here - they would need to be reflected in the
> PEP as well.
>
> Regards,
> Martin


--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-15 Thread Greg Ewing
Ron Adam wrote:

> I was presuming it would be done in C code and it will just need a 
> pointer to the first byte, memchr(), and then read n bytes directly into 
> a new memory range via  memcpy().

If the object supports the buffer interface, it can be
done that way. But if not, it would seem to make sense to
fall back on the iterator protocol.

> However, if it's done with a Python iterator and then each item is 
> translated to bytes in a sequence, (much slower), an encoding will need 
> to be known for it to work correctly.

No, it won't. When using the bytes(x) form, encoding has
nothing to do with it. It's purely a conversion from one
representation of an array of 0..255 to another.

When you *do* want to perform encoding, you use
bytes(u, encoding) and say what encoding you want
to use.
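
For comparison, a sketch of how the two forms look in Python 3's bytes constructor, which is an assumption about where the design ended up rather than part of this proposal.

```python
# bytes(x): a plain conversion from an iterable of ints in 0..255.
from_ints = bytes([104, 105])

# bytes(u, encoding): an explicit encoding step from text.
from_text = bytes("hi", "utf-8")

# Text with no encoding given is refused rather than guessed at.
try:
    bytes("hi")
except TypeError as exc:
    missing_encoding_error = exc
else:
    missing_encoding_error = None

print(from_ints == from_text, missing_encoding_error is not None)
```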

> Unfortunately Unicode strings 
> don't set an attribute to indicate its own encoding.

I think you don't understand what an encoding is. Unicode
strings don't *have* an encoding, because they're not encoded!
Encoding is what happens when you go from a unicode string
to something else.
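
A small illustration of that point, assuming Python 3 str objects: the same one-character string yields different byte sequences depending on which encoding is applied on the way out.

```python
text = "\u00e9"  # one code point: LATIN SMALL LETTER E WITH ACUTE

as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")
as_utf16 = text.encode("utf-16-le")

# One character, three different encoded forms.
print(len(text), as_utf8, as_latin1, as_utf16)
```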

> Since some longs will be of different length, yes a bytes(0L) could give 
> differing results on different platforms,

It's not just a matter of length. I'm not sure of the
details, but I believe longs are currently stored as an
array of 16-bit chunks, of which only 15 bits are used.
I'm having trouble imagining a use for low-level access
to that format, other than just treating it as an opaque
lump of data for turning back into a long later -- in
which case why not just leave it as a long in the first
place.
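
A sketch of the distinction, assuming Python 3, where sys.int_info reports the internal digit size and int.to_bytes() gives a representation that does not depend on it.

```python
import sys

# Internal storage detail: bits used per "digit" of a big integer.
bits_per_digit = sys.int_info.bits_per_digit

# Portable byte form, independent of the internal layout.
n = 2**70 + 5
length = (n.bit_length() + 7) // 8
packed = n.to_bytes(length, "big")
restored = int.from_bytes(packed, "big")

print(bits_per_digit, length, restored == n)
```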

Greg



Re: [Python-Dev] bytes type discussion

2006-02-15 Thread Fredrik Lundh
Guido wrote:

> I'm actually assuming we'll put this off until 2.6 anyway.

makes sense.

(but will there be a 2.6?  isn't it time to start hacking on 3.0?)






