Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Greg Ewing

Guido van Rossum wrote:

> I've now looked at asciistr. (Thanks Glenn and Ethan for the link.)
>
> Now that I (hopefully) understand it, I'm worried that a text
> processing algorithm that uses asciistr might, under hard-to-predict
> circumstances (such as when the arguments contain nothing of interest
> to the algorithm), return an asciistr instance instead of a str
> or bytes instance,


It seems to me that any algorithm with that property
has a genuine ambiguity as to what it should return
in that case. Arguably, returning an asciistr would
be the *right* thing to do, because that would allow
it to be used as a component of a larger algorithm
that was polymorphic with respect to text/bytes.

--
Greg
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Greg Ewing

Glenn Linderman wrote:
> A mechanism could be defined where "format string" would only contain
> format specifications, and any other text would be considered an error.


Someone already did -- it's called struct.pack(). :-)

--
Greg


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Terry Reedy

On 1/14/2014 12:03 AM, Guido van Rossum wrote:

> On Mon, Jan 13, 2014 at 6:25 PM, Terry Reedy  wrote:
>> byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc',))
>>
>> b'\x00\x01\x02abcdef'
>>
>> re.split produces [b'\x00', b'', b'\x02', b'', b'def']. The only ascii bias
>> is the one already present in the representation of bytes, and the fact that
>> Python code must have an ascii-compatible encoding.
>
> I don't think it's that easy. Just searching for '{' is enough to
> break in surprising ways


I see your point. The punning problem (between a byte being both itself 
and a special indicator character) is worse with bytes formats than the 
similar pun with text, and the potential for mysterious bugs greater. 
(This is related to why we split 'text' and 'bytes' to begin with.)


With text, we break the pun by doubling the character to escape the 
special meaning. This works because, 1) % and { are relatively rare in 
text, 2) %% and {{ are grammatically incorrect, 3) %, {, and especially 
%% and {{ stand out visually.


With bytes, 1) there is no reason why 37 (%) and 123 ({) should be rare, 
2) there is no grammatical rule against the sequences 37, 37 or 123, 
123, and 3) hex escapes \x25 and \x7b, which might appear in a bytes 
format, do not stand out as needing doubling.


My example above breaks if b'\x00' is replaced with b'\x7b'. Even if a 
doubling and undoubling rule were added, re.split could not be used to 
split the format bytes.
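The pun can be made concrete with a small sketch. The byteformat() below is only a guess at the kind of helper under discussion (the original implementation is not shown in this thread), and str.format's brace rules are borrowed as a stand-in for whatever doubling rule a bytes format might adopt.

```python
import re

def byteformat(fmt, args):
    """Hypothetical helper: put the next bytes argument in place of each b'{}'."""
    pieces = re.split(rb'\{\}', fmt)
    if len(pieces) != len(args) + 1:
        raise ValueError('field count does not match argument count')
    out = [pieces[0]]
    for arg, piece in zip(args, pieces[1:]):
        out.append(arg)
        out.append(piece)
    return b''.join(out)

result = byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc'))
print(result)

# The pun: the data byte \x7b and the field marker '{' are the same byte.
punned = b'\x7b{}\x02{}def'
print(punned == b'{{}\x02{}def')

# Under str.format's doubling rule, '{{' is an escaped brace, so the
# intended field disappears and a stray '}' is left behind.
error = None
try:
    punned.decode('latin-1').format('\x01', 'abc')
except ValueError as exc:
    error = exc
print('rejected by str.format rules:', error)
```

Nothing in the hex-escaped spelling of the format hints that \x7b needs doubling, which is point 3 above.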


--
Terry Jan Reedy



Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Stephen J. Turnbull
Nick Coghlan writes:

 > "Give up" makes it sound like I got tired of arguing without being
 > convinced rather than admitting I was just plain wrong.

I thought it was something in between (you explicitly said "lenient
PEP 460" doesn't hurt you, but my understanding was you still believe
that there's a safer way, and it's the latter you aren't going to try
to convince folks of).

 > While I'll still work on the asciistr proposal,

Thank you for that.  I really wish I had time to, myself, but not for
several weeks... :-(

 > that's unrelated to PEP 460 - it's about making hybrid APIs less

"It" refers to asciistr or to PEP 460?

 > painful to write in Python 3 when you're willing to place the
 > burden of ensuring ASCII compatibility of binary data on the
 > calling code.

Versus what?



Re: [Python-Dev] cpython (merge 2.7 -> 3.1): complain when nbytes > buflen to fix possible buffer overflow (closes #20246)

2014-01-14 Thread Ned Deily
In article <[email protected]>,
 benjamin.peterson  wrote:
> http://hg.python.org/cpython/rev/715fd3d8ac93
> changeset:   88454:715fd3d8ac93
> branch:  3.1
> parent:  86777:b1ddcb220a7f
> parent:  88453:87673659d8f7
> user:Benjamin Peterson 
> date:Mon Jan 13 23:06:14 2014 -0500
> summary:
>   complain when nbytes > buflen to fix possible buffer overflow (closes 
>   #20246)

Benjamin, I think you may have mistakenly merged from 2.7 to 3.1 here 
and then left the 3.1 branch open (i.e. unmerged to 3.2).

-- 
 Ned Deily,
 [email protected]



Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Nick Coghlan
On 14 Jan 2014 19:11, "Stephen J. Turnbull"  wrote:
>
> Nick Coghlan writes:
>
>  > "Give up" makes it sound like I got tired of arguing without being
>  > convinced rather than admitting I was just plain wrong.
>
> I thought it was something in between (you explicitly said "lenient
> PEP 460" doesn't hurt you, but my understanding was you still believe
> that there's a safer way, and it's the latter you aren't going to try
> to convince folks of).

I did say that at one point (when Guido first objected to the formatb
idea), but I switched to complete agreement after he pointed out the ASCII
assumption embedded in the formatting syntax itself.

>
>  > While I'll still work on the asciistr proposal,
>
> Thank you for that.  I really wish I had time to, myself, but not for
> several weeks... :-(

Heh, depending on how many quirky edge cases we find, we may still be
working on it by then, especially since there are still a few docs updates
and other fixes I want to get into Python 3.4.

>  > that's unrelated to PEP 460 - it's about making hybrid APIs less
>
> "It" refers to asciistr or to PEP 460?

asciistr

>  > painful to write in Python 3 when you're willing to place the
>  > burden of ensuring ASCII compatibility of binary data on the
>  > calling code.
>
> Versus what?

Versus doing explicit decoding the way urllib.parse does - it only accepts
strict 7-bit ASCII as binary input by default, so you have to decode to
text externally in order to handle arbitrary input that may contain other
bytes.
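A short sketch of that behaviour, using urllib.parse.urlsplit with made-up example URLs: bytes input is accepted only when it is pure ASCII, and anything else has to be decoded by the caller first (UTF-8 is assumed here purely for illustration).

```python
from urllib.parse import urlsplit

# ASCII-only bytes in, bytes out.
parts = urlsplit(b'http://example.com/status?code=200')
print(parts.path, parts.query)

# Bytes outside 7-bit ASCII are rejected rather than guessed at.
raw = b'http://example.com/caf\xc3\xa9'
rejected = False
try:
    urlsplit(raw)
except UnicodeDecodeError:
    rejected = True
print('non-ASCII bytes rejected:', rejected)

# The caller decodes explicitly, choosing the encoding, and works in text.
text_parts = urlsplit(raw.decode('utf-8'))
print(text_parts.path)
```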

Cheers,
Nick.

>


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Stephen J. Turnbull
Guido van Rossum writes:

 > Of course, nobody in their right mind would use a format string
 > containing UTF-16 or EBCDIC.

How about Shift JIS and Big 5 (traditionally "mandated by Microsoft"
in their respective regions, with Shift JIS still overwhelmingly
popular) and GB* ("GB18030 is not just a good idea, It's The Law")?
Are the Japanese and Chinese crazy by definition?  This is where I get
the willies -- not that you think anybody is crazy by definition, but
because I personally have to live with people who use crazy encodings
for interoperability reasons, in fact about half the text I process
daily for work is in those encodings.

Anyway, the thought makes me shiver.  GB2312 text may be encoded as
EUC-CN, in which case it is ASCII-compatible, so no problem.  I'm not
sure if that's the encoding typically denoted by "GB2312" in email,
though, and in any case it's irrelevant as most emails claiming
"charset=GB2312" I receive nowadays include characters from the
extension repertoires of GBK or GB18030.  Shift JIS, Big 5, and GBK
manage to avoid non-ASCII-compatible use of all characters significant
in Python %-formatting (yay!), but .format is right out because {} are
used.  GB18030 in principle uses far more of the code space, including
all of the syntactically significant punctuation, but in practice I
don't know how many of those characters are actually assigned, let
alone used.
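The claim about Shift JIS can be checked with a rough scan. The sketch below looks only at kana (an arbitrary sample, not the whole repertoire) and reports which multi-byte encodings contain a byte that collides with %, { or }; the gb2312 codec is used as the EUC-CN contrast case.

```python
SYNTAX = b'%{}'  # bytes that are significant to %-formatting and .format

def punned_chars(chars, encoding):
    """Map each character to the syntax bytes hidden in its multi-byte encoding."""
    hits = {}
    for ch in chars:
        try:
            raw = ch.encode(encoding)
        except UnicodeEncodeError:
            continue  # not representable in this encoding
        if len(raw) > 1:
            found = bytes(b for b in raw if b in SYNTAX)
            if found:
                hits[ch] = found
    return hits

kana = [chr(cp) for cp in range(0x3040, 0x3100)]
sjis_hits = punned_chars(kana, 'shift_jis')
gb_hits = punned_chars(kana, 'gb2312')

print('shift_jis:', sjis_hits)
print('gb2312:', gb_hits)
```

In this sample the Shift JIS hits involve the braces but never %, while the EUC-CN encoding produces no hits at all, which matches the distinction drawn above.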

 > And that is precisely my point. When you're using a format string,
 > all of the format string (not just the part between { and }) had
 > better use ASCII or an ASCII superset. And this (rightly)
 > constrains the output to an ASCII superset as well.

Except that if you interpolate something like Shift JIS, much of the
ASCII really isn't ASCII.  That's a general issue, of course, if you
do something that requires iterated format strings, but it's far more
likely to appear to work most of the time with those encodings.

Of course you can say "if it hurts, don't do that", but 



Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Nick Coghlan
On 14 January 2014 19:54, Stephen J. Turnbull  wrote:
> Guido van Rossum writes:
>  > And that is precisely my point. When you're using a format string,
>  > all of the format string (not just the part between { and }) had
>  > better use ASCII or an ASCII superset. And this (rightly)
>  > constrains the output to an ASCII superset as well.
>
> Except that if you interpolate something like Shift JIS, much of the
> ASCII really isn't ASCII.  That's a general issue, of course, if you
> do something that requires iterated format strings, but it's far more
> likely to appear to work most of the time with those encodings.
>
> Of course you can say "if it hurts, don't do that", but 

Right, that's the danger I was worried about, but the problem is that
there's at least *some* minimum level of ASCII compatibility that
needs to be assumed in order to define an interpolation format at all
(this is the point I originally missed). For printf-style formatting,
it's % along with the various formatting characters and other syntax
(like digits, parentheses, variable names and "."), with the format
method it's braces, brackets, colons, variable names, etc. The
mini-language parser has to assume an encoding in order to interpret
the format string, and that's *all* done assuming an ASCII compatible
format string (which must make life interesting if you try to use an
ASCII incompatible coding cookie for your source code - I'm actually
not sure what the full implications of that *are* for bytes literals
in Python 3).
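To illustrate the assumption, here is a small sketch that encodes the same format text several ways and checks whether a parser looking for the ASCII bytes of "%d" would still find its field. The sample string and the choice of encodings are arbitrary.

```python
fmt = '%d items'

# Does the byte sequence b'%d' survive in each encoded form of the format?
visible = {}
for encoding in ('ascii', 'utf-8', 'shift_jis', 'utf-16-le', 'cp500'):
    encoded = fmt.encode(encoding)
    visible[encoding] = b'%d' in encoded
    print(encoding, encoded, visible[encoding])
```

The ASCII supersets keep the field marker intact; UTF-16 and EBCDIC (cp500) do not, so a byte-oriented parser would see no field at all.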

The one remaining way I could potentially see a formatb method working
is along the lines of what Glenn (I think) suggested: just like struct
definitions, the formatb specifier would have to consist *solely* of
substitution fields. However, that's getting awfully close to being
just an alternate spelling for the struct module or bytes.join at that
point, which hardly makes for a compelling case to add two new methods
to a builtin type.
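For comparison, a minimal sketch of the two existing spellings: a struct format made solely of field codes, and bytes.join over pre-converted pieces. The field layout (one unsigned byte, one big-endian 16-bit integer, three raw bytes) is invented for illustration.

```python
import struct

# A format consisting solely of substitution fields.
header = struct.pack('>BH3s', 1, 2, b'abc')

# The same result assembled with bytes.join.
joined = b''.join([bytes([1]), (2).to_bytes(2, 'big'), b'abc'])
print(header, header == joined)

# Literal text inside a struct format is an error, not passthrough.
literal_rejected = False
try:
    struct.pack('>B-H', 1, 2)
except struct.error:
    literal_rejected = True
print('literal text rejected:', literal_rejected)
```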

Given that one of the concepts with the Python 3 transition was to
take certain problematic constructs (like ASCII compatible
interpolation directly to binary without a separate encoding step)
away and decide whether or not we were happy to live without them, I
think this one has proven to have sufficient staying power to finally
bring it back in Python 3.5 (especially given the gain in lowering the
barrier to porting Python 2 code that makes heavy use of interpolation
to ASCII compatible binary formats).

It's certainly a decision that has its downsides, with the potential
impact on users of ASCII incompatible encodings (mostly in Asia) being
the main one, but I think the increased convenience in working with
ASCII compatible binary protocols and file formats is worth the cost.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] cpython (merge 2.7 -> 3.1): complain when nbytes > buflen to fix possible buffer overflow (closes #20246)

2014-01-14 Thread Benjamin Peterson

On Tue, Jan 14, 2014, at 01:17 AM, Ned Deily wrote:
> In article <[email protected]>,
>  benjamin.peterson  wrote:
> > http://hg.python.org/cpython/rev/715fd3d8ac93
> > changeset:   88454:715fd3d8ac93
> > branch:  3.1
> > parent:  86777:b1ddcb220a7f
> > parent:  88453:87673659d8f7
> > user:Benjamin Peterson 
> > date:Mon Jan 13 23:06:14 2014 -0500
> > summary:
> >   complain when nbytes > buflen to fix possible buffer overflow (closes 
> >   #20246)
> 
> Benjamin, I think you may have mistakenly merged from 2.7 to 3.1 here 
> and then left the 3.1 branch open (i.e. unmerged to 3.2).

The name of the game is graft-gone-horribly-wrong. I think we can just
ignore it, since 3.1 is on its last legs anyway.


Re: [Python-Dev] magic method __bytes__

2014-01-14 Thread R. David Murray
On Mon, 13 Jan 2014 17:38:38 -0800, Ethan Furman  wrote:
> Has anyone actually used __bytes__ yet?  What for?

bytes(email.message.Message()) returns the message object serialized to
"wire format".

--David

PS: I've always thought of "wire format" as *including* files...a file is
just a "wire" with an indefinite destination and transmission time
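A brief sketch of both uses: the email case mentioned above, plus a made-up Frame class (hypothetical, not from any library) showing how a user-defined type can opt in to bytes().

```python
from email.message import Message

class Frame:
    """Hypothetical record type that knows its own wire format."""

    def __init__(self, kind, payload):
        self.kind = kind
        self.payload = payload

    def __bytes__(self):
        # One byte of type, one byte of length, then the payload itself.
        return bytes([self.kind, len(self.payload)]) + self.payload

frame_bytes = bytes(Frame(1, b'abc'))
print(frame_bytes)

msg = Message()
msg['Subject'] = 'hello'
msg.set_payload('body text')
wire = bytes(msg)  # serialized headers, blank line, then the body
print(wire)
```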


Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Guido van Rossum
On Tue, Jan 14, 2014 at 12:20 AM, Greg Ewing
 wrote:
> Guido van Rossum wrote:
>>
>> I've now looked at asciistr. (Thanks Glenn and Ethan for the link.)
>>
>> Now that I (hopefully) understand it, I'm worried that a text
>> processing algorithm that uses asciistr might, under hard-to-predict
>> circumstances (such as when the arguments contain nothing of interest
>> to the algorithm), return an asciistr instance instead of a str
>> or bytes instance,
>
>
> It seems to me that any algorithm with that property
> has a genuine ambiguity as to what it should return
> in that case. Arguably, returning an asciistr would
> be the *right* thing to do, because that would allow
> it to be used as a component of a larger algorithm
> that was polymorphic with respect to text/bytes.

Here's an example of what I mean:

def spam(a):
    r = asciistr('(')
    if a: r += a.strip()
    r += asciistr(')')
    return r

The argument must be a string.

If I call spam(''), a's type is never concatenated with r, so the
return value is an asciistr. To fix this particular case, we could
drop the "if a:" part. But it could be more significant, e.g. it could
be something like "if a contains any digits". The general fix would be
to add

    else: r += a[:0]

but that's still an example of the awkwardness that asciistr() is
trying to avoid.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Brett Cannon
On Mon, Jan 13, 2014 at 5:14 PM, Guido van Rossum  wrote:

> On Mon, Jan 13, 2014 at 2:05 PM, Brett Cannon  wrote:
> > I have been going on the assumption that bytes.format() would change what
> > '{}' meant for itself and would only interpolate bytes. That convenient
> > between Python 2 and 3 since it represents what we want it to (str and
> > bytes under the hood, respectively), so it just falls through. We could
> > also add a 'b' conversion for bytes() explicitly so as to help people not
> > accidentally mix up things in bytes.format() and str.format(). But I was
> > not suggesting adding a specific format spec for bytes but instead making
> > bytes.format() just do the .encode('ascii') automatically to help with
> > compatibility when a format spec was present. If people want fancy
> > formatting for bytes they can always do it themselves before calling
> > bytes.format().
>
> This seems hastily written (e.g. verb missing :-), and I'm not clear
> on what you are (or were) actually proposing. When exactly would
> bytes.format() need .encode('ascii')?
>
> I would be happy to wait a few hours or days for you to write it up
> clearly, rather than responding in a hurry.


Sorry about that. Busy day at work + trying to stay on top of this entire
conversation was a bit tough. Let me try to lay out what I'm suggesting for
bytes.format() in terms of how it changes
http://docs.python.org/3/library/string.html#format-string-syntax for bytes.

1. New conversion operator of 'b' that operates as PEP 460 specifies (i.e.
tries to get a buffer, else calls __bytes__). The default conversion
changes from 's' to 'b'.
2. Use of the conversion field adds a step of calling
str.encode('ascii', 'strict') on the result returned from calling
__format__().

That's it. So point 1 means that the following would work in Python 3.5::

  b'Hello, {}, how are you?'.format(b'Guido')
  b'Hello, {!b}, how are you?'.format(b'Guido')

It would produce an error if you used a text argument for 'Guido' since str
doesn't define __bytes__ or a buffer. That gives the EIBTI group their
bytes.format() where nothing magical happens.

For point 2, let's say you have the following in Python 2::

  'I have {} bottles of beer on the wall'.format(10)

Under my proposal, how would you change it to get the same result in Python
2 and 3?::

  b'I have {:d} bottles of beer on the wall'.format(10)

In Python 2 you're just being more explicit about the format, otherwise
it's the same semantics as today. In Python 3, though, this would translate
into (under the hood)::

  b'I have {} bottles of beer on the wall'.format(
      format(10, 'd').encode('ascii', 'strict'))

This leads to the same bytes value in Python 2 (since it's just a string)
and in Python 3 (as everything accepted by bytes.format() is either bytes
already or converted to ASCII bytes by encoding). While Python 2 users
would need to make sure they used a format spec to get the same result in
both Python 2 and 3 for ASCII bytes, it's a minor change which also makes
the format more explicit so it's not an inherently bad thing. And for those
that don't want to utilize the automatic ASCII encoding they can just not
use a format spec in the format string and just pass in bytes directly
(i.e. call __format__() themselves and then call str.encode() on their
own). So PBP people get to have a simple way to use bytes.format() in
Python 2 and 3 when dealing with things that can be represented as ASCII
(just as the bytes methods allow for currently).
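Since bytes.format() does not exist, the two points can only be emulated. The sketch below is one possible reading of the proposal, not an implementation of it: bformat() and to_bytes_strict() are hypothetical names, only auto-numbered positional fields are handled, and the str parser is reused by decoding the format as Latin-1 so that every byte maps to one code point.

```python
import string

def to_bytes_strict(value):
    """Point 1: take a buffer if there is one, else call __bytes__, else fail."""
    try:
        return bytes(memoryview(value))
    except TypeError:
        pass
    if hasattr(type(value), '__bytes__'):
        return type(value).__bytes__(value)
    raise TypeError('%s has neither a buffer nor __bytes__'
                    % type(value).__name__)

def bformat(fmt, *args):
    """Hypothetical emulation of the proposed bytes.format()."""
    values = iter(args)
    out = []
    parsed = string.Formatter().parse(fmt.decode('latin-1'))
    for literal, field, spec, conversion in parsed:
        out.append(literal.encode('latin-1'))
        if field is None:
            continue  # trailing literal text, no field to fill
        value = next(values)
        if spec:
            # Point 2: a format spec opts in to ASCII encoding.
            out.append(format(value, spec).encode('ascii', 'strict'))
        else:
            out.append(to_bytes_strict(value))
    return b''.join(out)

print(bformat(b'Hello, {}, how are you?', b'Guido'))
print(bformat(b'I have {:d} bottles of beer on the wall', 10))
```

Passing the text 'Guido' to the first call raises TypeError under this reading, because str offers neither a buffer nor __bytes__.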

I think this covers your desire to have numbers and anything else that can
be represented as ASCII be supported for easy porting while covering my
desire that any automatic encoding is clearly explicit in the format string
and in no way special-cased for only some types (the introduction of a 'c'
converter from PEP 460 is also fine with me).

How you would want to translate this proposal with the % operator I'm not
sure since it has been quite a while since I last seriously used it and so
I don't think I'm in a good position to propose a shift for it.


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Yury Selivanov
Brett,


I like your proposal.  There is one idea I have that could,
perhaps, improve it:


1. "%s" and "{}" will continue to work for bytes and bytearray in
the following fashion:

 - check if __bytes__/Py_buffer supported.
 - if it is, check that the bytes are strictly in the printable
   ASCII-subset (a-z, A-Z, 0-9 + special symbols like ! etc).
   Throw an error if the check fails; if it passes, concatenate.
 - if it is not, try str(), and do ".encode('ascii', 'strict')" on the result.


This way *most* of the use cases of python2 will be covered without
touching the code. So:

 - b'Hello {}'.format('world')
   will be the same as b'Hello ' + str('world').encode('ascii', 'strict')

 - b'Hello {}'.format('\u0394') will throw UnicodeEncodeError

 - b'Status: {}'.format(200)
   will be the same as b'Status: ' + str(200).encode('ascii', 'strict')

 - b'Hello %s' % ('world',) - the same as the first example

 - b'Connection: {}'.format(b'keep-alive') - works

 - b'Hello %s' % (b'\xce\x94',) - will fail, not ASCII subset we accept

I think it’s OK to check the buffers for ASCII-subset only. Yes, it
will have some sort of sub-optimal performance, but then, it’s quite
rare when string formatting is used to concatenate huge buffers.

2. new operators {!b} and %b. These will just use '__bytes__' and
Py_buffer.
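The rule in point 1 can be sketched as a single conversion function. The name percent_s() is hypothetical, and "printable ASCII" is taken here to mean bytes 0x20 through 0x7E, which is an assumption; the proposal only gives examples of the subset.

```python
def percent_s(value):
    """Hypothetical sketch of the proposed "%s" / "{}" conversion for bytes."""
    try:
        raw = bytes(memoryview(value))          # Py_buffer supported
    except TypeError:
        if hasattr(type(value), '__bytes__'):
            raw = type(value).__bytes__(value)  # __bytes__ supported
        else:
            # Neither is supported: fall back to str() and strict ASCII.
            return str(value).encode('ascii', 'strict')
    if not all(0x20 <= b <= 0x7E for b in raw):
        raise ValueError('bytes outside the printable ASCII subset')
    return raw

print(b'Hello ' + percent_s('world'))
print(b'Status: ' + percent_s(200))
print(b'Connection: ' + percent_s(b'keep-alive'))
```

Under this sketch, percent_s('\u0394') raises UnicodeEncodeError and percent_s(b'\xce\x94') raises ValueError, mirroring the failing cases listed above.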

--  
Yury Selivanov


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Brett Cannon
On Tue, Jan 14, 2014 at 12:29 PM, Yury Selivanov wrote:

> Brett,
>
>
> I like your proposal.  There is one idea I have that could,
> perhaps, improve it:
>
>
> 1. "%s" and "{}" will continue to work for bytes and bytearray in
> the following fashion:
>
>  - check if __bytes__/Py_buffer supported.
>  - if it is, check that the bytes are strictly in the printable
>    ASCII-subset (a-z, A-Z, 0-9 + special symbols like ! etc).
>    Throw an error if the check fails; if it passes, concatenate.
>  - if it is not, try str(), and do ".encode('ascii', 'strict')" on the result.


>
> This way *most* of the use cases of python2 will be covered without
> touching the code. So:
>

See, I'm fine with having people update their format strings to specify a
format spec; it's minor and isn't totally useless as it expresses what they
mean more explicitly (e.g. "I want this to be an int, I want this to be a
float, and I want this to be an ASCII string" using d, f, and s,
respectively). I want people to have to make a conscious decision to fall
back on an ASCII encoding. What you are suggesting is for people to have to
make a conscious decision **not** to encode to ASCII implicitly, which is
what I'm trying to avoid with this proposal. My goal is to make it easy to
work with ASCII, but as an explicit choice, not by default.

-Brett


>  - b'Hello {}'.format('world')
>    will be the same as b'Hello ' + str('world').encode('ascii', 'strict')
>
>  - b'Hello {}'.format('\u0394') will throw UnicodeEncodeError
>
>  - b'Status: {}'.format(200)
>    will be the same as b'Status: ' + str(200).encode('ascii', 'strict')
>
>  - b'Hello %s' % ('world',) - the same as the first example
>
>  - b'Connection: {}'.format(b'keep-alive') - works
>
>  - b'Hello %s' % (b'\xce\x94',) - will fail, not ASCII subset we accept
>
> I think it’s OK to check the buffers for ASCII-subset only. Yes, it
> will have some sort of sub-optimal performance, but then, it’s quite
> rare when string formatting is used to concatenate huge buffers.


> 2. new operators {!b} and %b. These will just use '__bytes__' and
> Py_buffer.
>
> --
> Yury Selivanov
>

Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Yury Selivanov
On January 14, 2014 at 12:47:35 PM, Brett Cannon ([email protected]) wrote:
>  
> On Tue, Jan 14, 2014 at 12:29 PM, Yury Selivanov wrote:  
>  
> > Brett,
> >
> >
> > I like your proposal. There is one idea I have that could,
> > perhaps, improve it:
> >
> >
> > 1. "%s" and "{}" will continue to work for bytes and bytearray in
> > the following fashion:
> >
> > - check if __bytes__/Py_buffer supported.
> > - if it is, check that the bytes are strictly in the printable
> >   ASCII-subset (a-z, A-Z, 0-9 + special symbols like ! etc).
> >   Throw an error if the check fails; if it passes, concatenate.
> > - if it is not, try str(), and do ".encode('ascii', 'strict')" on the result.
> >
> > This way *most* of the use cases of python2 will be covered without
> > touching the code. So:
> >
>
> See, I'm fine with having people update their format strings to specify a
> format spec; it's minor and isn't totally useless as it expresses what they
> mean more explicitly (e.g. "I want this to be an int, I want this to be a
> float, and I want this to be an ASCII string" using d, f, and s,
> respectively). I want people to have to make a conscious decision to fall
> back on an ASCII encoding. What you are suggesting is for people to have to
> make a conscious decision **not** to encode to ASCII implicitly, which is
> what I'm trying to avoid with this proposal. My goal is to make it easy to
> work with ASCII, but as an explicit choice, not by default.


I understand.  But OTOH, this whole discussion started because it is
inconvenient to work with bytes in py3, plus it’s hard to maintain
the *same* codebase.  Updating the code to include the new
‘%b’ operators won’t help people in that position.

My proposal is based on the assumption that most of the string
formatting people usually do in python2 on ‘str’ (not ‘unicode’)
is for ASCII. That’s the implicit convenience of using
bytes that everybody is looking for in py3. It allows having a
single codebase, and provides the necessary safety.
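
The fallback order described in the proposal can be sketched roughly as follows. This is only an illustration of the idea, not an implementation from the thread: the helper name interpolate_s is hypothetical, and treating the space character as part of the accepted subset is an assumption.

```python
import string

# Accepted ASCII subset as described in the proposal: letters, digits and
# punctuation. Including the space character is an assumption.
_ALLOWED = frozenset(
    (string.ascii_letters + string.digits + string.punctuation + " ").encode("ascii")
)


def interpolate_s(value):
    """Hypothetical helper: the bytes that '%s' / '{}' would insert for value."""
    try:
        # Objects supporting the buffer interface (bytes, bytearray, ...).
        raw = memoryview(value).tobytes()
    except TypeError:
        if hasattr(type(value), "__bytes__"):
            raw = bytes(value)
        else:
            # Last resort: text conversion followed by a strict ASCII encode.
            return str(value).encode("ascii", "strict")
    if not all(byte in _ALLOWED for byte in raw):
        raise ValueError("bytes outside the accepted ASCII subset")
    return raw


print(b"Hello " + interpolate_s("world"))
print(b"Status: " + interpolate_s(200))
print(b"Connection: " + interpolate_s(b"keep-alive"))
```

Note that both failure modes here depend on the value passed in, not on its type.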

Anyways, my 2 cents.

Thank you,
Yury
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Guido van Rossum
On Tue, Jan 14, 2014 at 7:59 AM, Guido van Rossum  wrote:
> Here's an example of what I mean:

I sent that off without proofreading, and I also got one detail about
asciistr() wrong. Here are some corrections.

> def spam(a):
> r = asciistr('(')
> if a: r += a.strip()
> r += asciistr(')')
> return r
>
> The argument must be a string.

Or a bytes object. And the point is that the return type should be the
same as the argument type.

> If I call spam(''),

or spam(b'')

> a's type is never concatenated with r, so the
> return value is an asciistr.

Actually, Nick explained that asciistr() + asciistr() returns str, so
this would be accidentally correct if called with '', but wrong
(returning a str instead of a bytes) if called with b''.

> To fix this particular case, we could
> drop the "if a:" part. But it could be more significant, e.g. it could
> be something like "if a contains any digits". The general fix would be
> to add
>
> else: r += a[:0]
>
> but that's still an example of the awkwardness that asciistr() is
> trying to avoid.

This is still valid.
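
For comparison, here is a sketch of the same function written without asciistr, choosing the constants explicitly from the argument type. It is only meant to show the more verbose style that asciistr tries to avoid; it is not code from the asciicompat project.

```python
def spam(a):
    """Wrap the stripped argument in parentheses, preserving str or bytes."""
    if isinstance(a, str):
        left, right = "(", ")"
    elif isinstance(a, bytes):
        left, right = b"(", b")"
    else:
        raise TypeError("expected str or bytes, got %s" % type(a).__name__)
    r = left
    if a:
        r += a.strip()
    r += right
    return r


print(spam(""))       # str in, str out, even for the empty argument
print(spam(b""))      # bytes in, bytes out
print(spam(b" x "))
```

Because the constants are picked up front, the return type no longer depends on whether the argument ever gets concatenated.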

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Jim J. Jewett


Nick Coghlan wrote:
>> Arbitrary binary data and ASCII  compatible binary data are *different 
>> things* and the only argument in favour of modelling them with a single 
>> type is because Python 2 did it that way.

Greg Ewing replied:

> I would say that ASCII compatible binary data is a
> *subset* of arbitrary binary data. As such, a type
> designed for arbitrary binary data is a perfectly good
> way of representing ASCII compatible binary data.

But not when you care about the ASCII-compatible part;
then you should use a subclass.

Obviously, it is too late for separating bytes from
AsciiStructuredBytes.  PBP ("practicality beats purity") *may*
even mean that just using the "subclass" for everything (and
just ignoring the ASCII-specific methods when they aren't
appropriate) was always the right implementation choice.

But in terms of explaining the text model, that
separation is important enough that

(1)  We should be reluctant to strengthen the
 "it's really just ASCII" messages.
(2)  It *may* be worth creating a virtual
 split in the documentation.

I'm willing to work on (2) if there is general consensus
that it would be a good idea.  As a rough sketch, I
would change places like

http://docs.python.org/3/library/stdtypes.html#typebytes

from:

Bytes objects are immutable sequences of single bytes.
Since many major binary protocols are based on the ASCII
text encoding, bytes objects offer several methods that
are only valid when working with ASCII compatible data
and are closely related to string objects in a variety
of other ways.

to something more like:

Bytes objects are immutable sequences of single bytes.

A Bytes object could represent anything, and is
appropriate as the underlying storage for a sound sample
or image file.

Virtual subclass ASCIIStructuredBytes


One particularly common use of bytes is to represent
the contents of a file, or of a network message.  In
these cases, the bytes will often represent Text
*in a specific encoding* and that encoding will usually
be a superset of ASCII.  Rather than create and support
an ASCIIStructuredBytes subclass, Python simply added
support for these use cases straight to Bytes objects,
and assumes that this support simply won't be used
when it does not make sense. For example, bytes literals
*could* be used to construct a sound sample, but the
literals will be far easier to read when they are used
to represent (encoded) ASCII text, such as "OPEN". 
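
As a small illustration of the "ASCII-specific methods" point (not part of the proposed documentation text), the methods below exist on every bytes object, but only make sense when the data is ASCII-compatible. The sample values are made up.

```python
# Bytes that really are encoded ASCII text: the method does what you expect.
command = b"open"
print(command.upper())

# Arbitrary binary data (e.g. a few raw sample values): upper() still runs,
# and silently rewrites any byte that happens to fall in the a-z range.
sample = bytes([0x61, 0xFF, 0x7A])
print(sample.upper())
print(sample.upper() == sample)
```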

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Guido van Rossum
[Other readers: asciistr is at https://github.com/jeamland/asciicompat]

On Mon, Jan 13, 2014 at 11:44 PM, Nick Coghlan  wrote:
> Right, asciistr is designed for a specific kind of hybrid API where
> you want to accept binary input (and produce binary output) *and* you
> want to accept text input (and produce text output). Porting those
> from Python 2 to Python 3 is painful not because of any limitations of
> the str or bytes API but because it's the only use case I have found
> where I actually *missed* the implicit interoperability offered by the
> Python 2 str type.

Yes, the use case is clear.

> It's not an implementation style I would consider appropriate for the
> standard library - we need to code very defensively in order to aid
> debugging in arbitrary contexts, so I consider having an API like
> urllib.parse demand 7-bit ASCII in the binary version, and require
> text to handle impure input to be a better design choice.

This surprises me. I think asciistr should strive to be useful for the
stdlib as well.

> However, in an environment where you can place greater preconditions
> on your inputs (such as "ensure all input data is ASCII compatible")

That gives me the Python 2 willies. :-(

> and you're willing to tolerate the occasional obscure traceback for
> particular kinds of errors,

Really? Can you give an example where the traceback using asciistr()
would be more obscure than using the technique you used in
urllib.parse?

> then it should be a convenient way to use
> common constants (like separators or URL scheme names) in an algorithm
> that can manipulate either binary or text, but not a combination of
> the two (the latter is still a nice improvement in correctness over
> Python 2, which allowed them to be mixed freely rather than requiring
> consistency across the inputs).

Unfortunately I suspect there are still examples where asciistr's
"submissive" behavior can produce surprises. E.g. consider a function
of two arguments that must either be both bytes or both str. It's
easily conceivable that for certain combinations of incorrect
arguments (i.e. one bytes and one str) the function doesn't raise an
error but returns something of one or the other type. (And this is
exactly the Python 2 outcome we're trying to avoid.)

> It's still slightly different from Python 2, though. In Python 2, the
> interaction model was:
>
> str & str -> str
> str & unicode -> unicode
>
> (with the one exception being str.format: that consistently produces
> str rather than promoting to Unicode)

Or raises good old UnicodeError. :-(

> My goal for asciistr is that it should exhibit the following behaviour:
>
> str & asciistr -> str
> asciistr & asciistr -> str (making it asciistr would be a pain and
> I don't have a use case for that)

I almost had one in the example code I sent in response to Greg.

> bytes & asciistr -> bytes

I understand that '&' here stands for "any arbitrary combination", but
what about searches? Given that asciistr's base class is str, won't it
still blow up if you try to use it as an argument to e.g.
bytes.startswith()? Equality tests also sound problematic; is b'x' ==
asciistr('x') == 'x' ???
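
For reference, this is how plain str and bytes behave in Python 3 for the two operations in question. How asciistr itself behaves is exactly the open question here, so it is not exercised in this sketch.

```python
# Equality between bytes and str is simply False in Python 3
# (running with -b turns the comparison into a BytesWarning).
equal = (b"x" == "x")
print(equal)

# Search methods on bytes reject a str argument outright.
try:
    b"xyz".startswith("x")
except TypeError as exc:
    startswith_error = exc
    print("TypeError:", exc)
```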

> So in code like that in urllib.parse (but in a more constrained
> context), you could just switch all your constants to asciistr, change
> your indexing operations to length 1 slices and then in theory
> essentially the same code that worked in Python 2 should also work in
> Python 3.

The more I think about this, the less I believe it's that easy. I
suspect you had the right idea when you mentioned singledispatch. It
might be easier to write the bytes version in terms of the string
versions wrapped in decode/encode, or vice versa, rather than trying
to reason out all the different combinations of str, bytes, asciistr.
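
A minimal sketch of that approach, assuming functools.singledispatch (available since Python 3.4) and a made-up scheme_of function rather than anything from urllib.parse: the str version holds the logic, and the bytes version just wraps it in decode/encode.

```python
from functools import singledispatch


@singledispatch
def scheme_of(url):
    """Hypothetical example: return the URL scheme, or an empty value."""
    raise TypeError("expected str or bytes, got %s" % type(url).__name__)


@scheme_of.register(str)
def _scheme_of_str(url):
    # All of the real logic lives in the text version.
    head, sep, _rest = url.partition(":")
    return head if sep else ""


@scheme_of.register(bytes)
def _scheme_of_bytes(url):
    # The binary version demands 7-bit ASCII and reuses the text logic.
    return scheme_of(url.decode("ascii")).encode("ascii")


print(scheme_of("http://example.com/"))
print(scheme_of(b"http://example.com/"))
```

Non-ASCII bytes input fails at the decode step, so the failure is at least raised at one well-defined place.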

> However, Benno is finding that my warning about possible
> interoperability issues was accurate - we have various places where we
> do PyUnicode_Check() rather than PyUnicode_CheckExact(), which means
> we don't always notice a PEP 3118 buffer interface if it is provided
> by a str subclass.

Not sure I understand this, but I believe him when he says this won't be easy.

> We'll look at those as we find them, and either
> work around them (if we can), decide not to support that behaviour in
> asciistr, or else I'll create a patch to resolve the interoperability
> issue.
>
> It's not necessarily a type I'd recommend using in production code, as
> there *will* always be a more explicit alternative that doesn't rely
> on a tricksy C extension type that only works in CPython. However,
> it's a type I think is worth having implemented and available on PyPI,
> even if it's just to disprove the claim that you *can't* write that
> kind of code in Python 3.

Hm. It is beginning to sound more and more flawed. I also worry that
it will bring back the nightmare of data-dependent UnicodeError.
E.g. this (from tests/basic.py):

def test_asciistr_will_not_acce

Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Chris Barker
On Tue, Jan 14, 2014 at 9:29 AM, Yury Selivanov wrote:

>  - Try str(), and do ".encode('ascii', 'strict')" on the result.
>

please no -- that's the source of a lot of pain in py2 now.

having a failure as a result of the value, rather than the type, of an
object just makes for hard-to-test bugs. Everything will be hunky dory for
development and testing, then in deployment some idiot ( ;-) ) will pass in
some non-ascii compatible string and you get a failure. And the person who
gets the failure doesn't understand why, or they wouldn't have passed in
non-ascii values in the first place...

Ease of porting is nice, but let's not make it easy to port bug-prone code.

-Chris

>
> This way *most* of the use cases of python2 will be covered without
> touching the code. So:
>
>  - b'Hello {}'.format('world')
>    will be the same as b'Hello ' + str('world').encode('ascii', 'strict')
>
>  - b'Hello {}'.format('\u0394') will throw UnicodeEncodeError
>
>  - b'Status: {}'.format(200)
>    will be the same as b'Status: ' + str(200).encode('ascii', 'strict')
>
>  - b'Hello %s' % ('world',) - the same as the first example
>
>  - b'Connection: {}'.format(b'keep-alive') - works
>
>  - b'Hello %s' % (b'\xce\x94',) - will fail, not in the ASCII subset we accept
>
> I think it’s OK to check the buffers for ASCII-subset only. Yes, it
> will have some sort of sub-optimal performance, but then, it’s quite
> rare when string formatting is used to concatenate huge buffers.
>
> 2. new operators {!b} and %b. These will just use ‘__bytes__’ and
> Py_buffer.
>
> --
> Yury Selivanov
>
> On January 14, 2014 at 11:31:51 AM, Brett Cannon ([email protected]) wrote:
> >
> > On Mon, Jan 13, 2014 at 5:14 PM, Guido van Rossum
> > wrote:
> >
> > > On Mon, Jan 13, 2014 at 2:05 PM, Brett Cannon
> > wrote:
> > > > I have been going on the assumption that bytes.format() would
> > change what
> > > > '{}' meant for itself and would only interpolate bytes. That
> > convenient
> > > > between Python 2 and 3 since it represents what we want it to
> > (str and
> > > bytes
> > > > under the hood, respectively), so it just falls through. We
> > could also
> > > add a
> > > > 'b' conversion for bytes() explicitly so as to help people
> > not
> > > accidentally
> > > > mix up things in bytes.format() and str.format(). But I was
> > not
> > > suggesting
> > > > adding a specific format spec for bytes but instead making
> > bytes.format()
> > > > just do the .encode('ascii') automatically to help with compatibility
> > > when a
> > > > format spec was present. If people want fancy formatting for
> > bytes they
> > > can
> > > > always do it themselves before calling bytes.format().
> > >
> > > This seems hastily written (e.g. verb missing :-), and I'm not
> > clear
> > > on what you are (or were) actually proposing. When exactly would
> > > bytes.format() need .encode('ascii')?
> > >
> > > I would be happy to wait a few hours or days for you to write it up
> > > clearly, rather than responding in a hurry.
> >
> >
> > Sorry about that. Busy day at work + trying to stay on top of this
> > entire
> > conversation was a bit tough. Let me try to lay out what I'm suggesting
> > for
> > bytes.format() in terms of how it changes
> > http://docs.python.org/3/library/string.html#format-string-syntax
> > for bytes.
> >
> > 1. New conversion operator of 'b' that operates as PEP 460 specifies
> > (i.e.
> > tries to get a buffer, else calls __bytes__). The default conversion
> > changes from 's' to 'b'.
> > 2. Use of the conversion field adds an added step of calling
> > str.encode('ascii', 'strict') on the result returned from
> > calling
> > __format__().
> >
> > That's it. So point 1 means that the following would work in Python
> > 3.5::
> >
> > b'Hello, {}, how are you?'.format(b'Guido')
> > b'Hello, {!b}, how are you?'.format(b'Guido')
> >
> > It would produce an error if you used a text argument for 'Guido'
> > since str
> > doesn't define __bytes__ or a buffer. That gives the EIBTI group
> > their
> > bytes.format() where nothing magical happens.
> >
> > For point 2, let's say you have the following in Python 2::
> >
> > 'I have {} bottles of beer on the wall'.format(10)
> >
> > Under my proposal, how would you change it to get the same result
> > in Python
> > 2 and 3?::
> >
> > b'I have {:d} bottles of beer on the wall'.format(10)
> >
> > In Python 2 you're just being more explicit about the format,
> > otherwise
> > it's the same semantics as today. In Python 3, though, this would
> > translate
> > into (under the hood)::
> >
> > b'I have {} bottles of beer on the wall'.format(format(10,
> > 'd').encode('ascii', 'strict'))
> >
> > This leads to the same bytes value in Python 2 (since it's just
> > a string)
> > and in Python 3 (as everything accepted by bytes.format() is
> > either bytes
> > already or converted to from encoding to ASCII bytes). While
> > Python 2 users
> > would need to make sure they used a format spec to get the same result
> >
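
The "under the hood" step in point 2 of the quoted proposal can be tried out today with plain concatenation. This is only a sketch of the described behaviour; the helper name format_field_as_ascii is hypothetical and nothing here is an actual bytes.format() implementation.

```python
def format_field_as_ascii(value, spec):
    """Hypothetical helper: format as text, then encode strictly to ASCII."""
    return format(value, spec).encode("ascii", "strict")


# Equivalent of the proposed b'I have {:d} bottles ...'.format(10)
line = (
    b"I have "
    + format_field_as_ascii(10, "d")
    + b" bottles of beer on the wall"
)
print(line)
```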

Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Ethan Furman

On 01/14/2014 10:11 AM, Jim J. Jewett wrote:


But in terms of explaining the text model, that
separation is important enough that



 (2)  It *may* be worth creating a virtual
  split in the documentation.


I think (2) is a great idea.


I'm willing to work on (2) if there is general consensus
that it would be a good idea.  As a rough sketch, I
would change places like

 http://docs.python.org/3/library/stdtypes.html#typebytes

from:

 Bytes objects are immutable sequences of single bytes.
 Since many major binary protocols are based on the ASCII
 text encoding, bytes objects offer several methods that
 are only valid when working with ASCII compatible data
 and are closely related to string objects in a variety
 of other ways.

to something more like:

 Bytes objects are immutable sequences of single bytes.

 A Bytes object could represent anything, and is
 appropriate as the underlying storage for a sound sample
 or image file.

 Virtual subclass ASCIIStructuredBytes
 

 One particularly common use of bytes is to represent
 the contents of a file, or of a network message.  In
 these cases, the bytes will often represent Text
 *in a specific encoding* and that encoding will usually
 be a superset of ASCII.  Rather than create and support
 an ASCIIStructuredBytes subclass, Python simply added
 support for these use cases straight to Bytes objects,
 and assumes that this support simply won't be used
 when it does not make sense. For example, bytes literals
 *could* be used to construct a sound sample, but the
 literals will be far easier to read when they are used
 to represent (encoded) ASCII text, such as "OPEN".


I find the "Virtual subclass" in the title to be confusing, but otherwise it's great.  We should have that even if we do
add formatting to bytes, as that message is even more important then.


--
~Ethan~


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Guido van Rossum
On Tue, Jan 14, 2014 at 9:45 AM, Chris Barker  wrote:
> On Tue, Jan 14, 2014 at 9:29 AM, Yury Selivanov 
> wrote:
>>
>>  - Try str(), and do ".encode('ascii', 'strict')" on the result.
>
>
> please no -- that's the source of a lot of pain in py2 now.
>
> having a failure as a result of the value, rather than the type, of an
> object just makes for hard-to-test bugs. Everything will be hunky dory for
> development and testing, then in deployment some idiot ( ;-) ) will pass in
> some non-ascii compatible string and you get a failure. And the person who
> gets the failure doesn't understand why, or they wouldn't have passed in
> non-ascii values in the first place...
>
> Ease of porting is nice, but let's not make it easy to port bug-prone code.

Right. This is a big red flag to me as well.

I think there is some inherent conflict between the extensible design
of str.format() and the practical needs of people who are actually
going to use formatting operations (either % or .format()) with bytes.

The *practical* needs are mostly limited to supporting basic number
formatting (decimal, hex, padding) and interpolation of anything that
supports the buffer interface. It would also be nice if you didn't
have to specify the type at all in the format string, i.e. {} should
do the right thing for numbers and (all sorts of) bytes.

But the way to arrive at this behavior without duplicating a whole lot
of code seems to be to call the existing text-based __format__ API and
convert the result to bytes -- for numbers this should be safe (their
formatting produces just ASCII digits and a selected few other ASCII
characters) but leads to an undesirable outcome for other types -- not
just str but also e.g. lists or dicts containing str instances, since
those call __repr__ on the contained items, and repr() may produce
non-ASCII bytes.

This is why my earlier proposal used ascii(), which is a "nerfed"(*)
version of repr(). This does the right thing for numbers as well as
for many other types (e.g. None, bool) and does something unpleasant
for text strings that is perhaps better than the alternative.
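
A quick illustration of the difference, using a made-up list containing one non-ASCII string:

```python
items = ["\u0394"]          # a list holding one non-ASCII str

as_repr = repr(items)       # keeps the non-ASCII character as-is
as_ascii = ascii(items)     # escapes it, so the result is pure ASCII

try:
    as_repr.encode("ascii")
except UnicodeEncodeError as exc:
    print("repr() result cannot be encoded:", exc)

print(as_ascii.encode("ascii"))

# Numbers, None and bool come out unchanged; text strings gain quotes,
# which is the "unpleasant" part.
print(ascii(42), ascii(None), ascii(True), ascii("abc"))
```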

Which reminds me. Quite a few people have spoken out in favor of loud
failures rather than silent "wrong" output. But I think that in the
specific context of formatting output, there is a long and IMO good
tradition of producing (slightly) wrong output in favor of more strict
behavior. Consider for example what to do when a number doesn't fit in
the given width. Would you rather raise an exception, truncate the
value, or mess up the formatting? All languages newer than Fortran
that I've used have chosen the latter, and I still agree it's a good
idea. Similar with infinities, NaN, or None. (Yes, it's embarrassing
to have a website displaying 'null'. But isn't a 500 even *more*
embarrassing?)
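
The width case is easy to check with existing str formatting: the value is kept intact and the layout is what gives way.

```python
narrow = "[%3d]" % 7        # fits: padded to the requested width
wide = "[%3d]" % 12345      # does not fit: the field simply grows
nothing = "[%s]" % None     # None is rendered rather than rejected

print(narrow)
print(wide)
print(nothing)
```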

This doesn't mean I'm insensitive to the argument in favor of loud and
early failure. It's just that I can see both sides of the coin, and
I'm still deciding which argument is more important.

(*) Gamer slang for a weapon made less dangerous. :-)

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Terry Reedy

On 1/14/2014 1:11 PM, Jim J. Jewett wrote:


But in terms of explaining the text model, that
separation is important enough that

 (1)  We should be reluctant to strengthen the
  "it's really just ASCII" messages.
 (2)  It *may* be worth creating a virtual
  split in the documentation.

I'm willing to work on (2) if there is general consensus
that it would be a good idea.  As a rough sketch, I
would change places like

 http://docs.python.org/3/library/stdtypes.html#typebytes

from:

 Bytes objects are immutable sequences of single bytes.
 Since many major binary protocols are based on the ASCII
 text encoding, bytes objects offer several methods that
 are only valid when working with ASCII compatible data
 and are closely related to string objects in a variety
 of other ways.

to something more like:

 Bytes objects are immutable sequences of single bytes.

 A Bytes object could represent anything, and is
 appropriate as the underlying storage for a sound sample
 or image file.

 Virtual subclass ASCIIStructuredBytes
 

 One particularly common use of bytes is to represent
 the contents of a file, or of a network message.  In
 these cases, the bytes will often represent Text
 *in a specific encoding* and that encoding will usually
 be a superset of ASCII.  Rather than create and support
 an ASCIIStructuredBytes subclass, Python simply added
 support for these use cases straight to Bytes objects,
 and assumes that this support simply won't be used
 when it does not make sense. For example, bytes literals
 *could* be used to construct a sound sample, but the
 literals will be far easier to read when they are used
 to represent (encoded) ASCII text, such as "OPEN".


I rather like this. Consider opening a tracker issue.

--
Terry Jan Reedy



Re: [Python-Dev] [Python-checkins] peps: Fill in PEP number (461).

2014-01-14 Thread Brett Cannon
I think this was supposed to be 461, not 460 =)


On Tue, Jan 14, 2014 at 2:12 PM, guido.van.rossum <
[email protected]> wrote:

> http://hg.python.org/peps/rev/a25f48998ad3
> changeset:   5346:a25f48998ad3
> user:        Guido van Rossum
> date:        Tue Jan 14 11:12:09 2014 -0800
> summary:
>   Fill in PEP number (461).
>
> files:
>   pep-0461.txt |  2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
>
> diff --git a/pep-0461.txt b/pep-0461.txt
> --- a/pep-0461.txt
> +++ b/pep-0461.txt
> @@ -1,4 +1,4 @@
> -PEP: XXX
> +PEP: 460
>  Title: Adding % and {} formatting to bytes
>  Version: $Revision$
>  Last-Modified: $Date$
>
> --
> Repository URL: http://hg.python.org/peps
>
> ___
> Python-checkins mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/python-checkins
>
>


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Antoine Pitrou
On Tue, 14 Jan 2014 10:52:05 -0800
Guido van Rossum  wrote:
> Would you rather raise an exception, truncate the
> value, or mess up the formatting? All languages newer than Fortran
> that I've used have chosen the latter, and I still agree it's a good
> idea.

Well that's useful when printing out human-readable stuff on stdout,
much less when you're emitting binary data that's supposed to conform
to a well-defined protocol. I expect bytes formatting to be used for
the latter, not the former.

(which also means, actually, that I don't think the fancy formatting
features - alignment, etc. - are useful at all for bytes; but it's
probably ok having them for consistency)

> Similar with infinities, NaN, or None. (Yes, it's embarrassing
> to have a website displaying 'null'. But isn't a 500 even *more*
> embarrassing?)

When it comes to type mismatch, though, an error is raised:

>>> "%d" % object()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %d format: a number is required, not object

(instead of outputting e.g. repr(id(x)))

Regards

Antoine.




Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Daniel Holth
On Tue, Jan 14, 2014 at 1:52 PM, Guido van Rossum  wrote:
> On Tue, Jan 14, 2014 at 9:45 AM, Chris Barker  wrote:
>> On Tue, Jan 14, 2014 at 9:29 AM, Yury Selivanov 
>> wrote:
>>>
>>>  - Try str(), and do ".encode('ascii', 'strict')" on the result.
>>
>>
>> please no -- that's the source of a lot of pain in py2 now.
>>
>> having a failure as a result of the value, rather than the type, of an
>> object just makes for hard-to-test bugs. Everything will be hunky dory for
>> development and testing, then in deployment some idiot ( ;-) ) will pass in
>> some non-ascii compatible string and you get a failure. And the person who
>> gets the failure doesn't understand why, or they wouldn't have passed in
>> non-ascii values in the first place...
>>
>> Ease of porting is nice, but let's not make it easy to port bug-prone code.
>
> Right. This is a big red flag to me as well.
>
> I think there is some inherent conflict between the extensible design
> of str.format() and the practical needs of people who are actually
> going to use formatting operations (either % or .format()) with bytes.
>
> The *practical* needs are mostly limited to supporting basic number
> formatting (decimal, hex, padding) and interpolation of anything that
> supports the buffer interface. It would also be nice if you didn't
> have to specify the type at all in the format string, i.e. {} should
> do the right thing for numbers and (all sorts of) bytes.
>
> But the way to arrive at this behavior without duplicating a whole lot
> of code seems to be to call the existing text-based __format__ API and
> convert the result to bytes -- for numbers this should be safe (their
> formatting produces just ASCII digits and a selected few other ASCII
> characters) but leads to an undesirable outcome for other types -- not
> just str but also e.g. lists or dicts containing str instances, since
> those call __repr__ on the contained items, and repr() may produce
> non-ASCII bytes.
>
> This is why my earlier proposal used ascii(), which is a "nerfed"(*)
> version of repr(). This does the right thing for numbers as well as
> for many other types (e.g. None, bool) and does something unpleasant
> for text strings that is perhaps better than the alternative.
>
> Which reminds me. Quite a few people have spoken out in favor of loud
> failures rather than silent "wrong" output. But I think that in the
> specific context of formatting output, there is a long and IMO good
> tradition of producing (slightly) wrong output in favor of more strict
> behavior. Consider for example what to do when a number doesn't fit in
> the given width. Would you rather raise an exception, truncate the
> value, or mess up the formatting? All languages newer than Fortran
> that I've used have chosen the latter, and I still agree it's a good
> idea. Similar with infinities, NaN, or None. (Yes, it's embarrassing
> to have a website displaying 'null'. But isn't a 500 even *more*
> embarrassing?)
>
> This doesn't mean I'm insensitive to the argument in favor of loud and
> early failure. It's just that I can see both sides of the coin, and
> I'm still deciding which argument is more important.
>
> (*) Gamer slang for a weapon made less dangerous. :-)

I think loud and early failure is important for porting while you
might still be trying to pound out the previously blurry encode/decode
boundaries. In this code str and bytes will be wrong everywhere. Some
APIs might return either str or bytes based on the input. Let it fail,
find the boundaries, and fix it until it does something useful without
failing. And it kind of depends on the context whether it is worse to
display weird ephemeral output or write the same weird output to long
term storage.

I'm not sure what to think about content-dependent failures on
protocols that are supposed to be ASCII-only-without-repr-noise.
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Jim J. Jewett



Greg Ewing replied:

>> ... ASCII compatible binary data is a
>> *subset* of arbitrary binary data.

I wrote:

> But in terms of explaining the text model, that
> separation is important enough that

>(2)  It *may* be worth creating a virtual
> split in the documentation.

(rough sketch below)

Ethan likes the idea, but points out that the term
"Virtual" is confusing here.

Alas, I'm not sure what the correct term is.  In
addition to "Go for it!" / "Don't waste your time",
I'm looking for advice on:

(A)  What word should I use instead of "Virtual"?
Imaginary?  Pretend?

(B)  Would it be good/bad/at least make the docs
easier to create an actual class (or alias)?

(C)  Same question for a pair of classes provided
only in the documentation, like example code.

(D)  What about an abstract class, or several?

e.g., replacing the XXX TODO of collections.abc.ByteString
with separate abstract classes for ByteSequence, String,
ByteString, and ASCIIByteString?

(ByteString already includes any bytes or bytearray instance,
so backward compatibility means the String suffix isn't
sufficient for an opt-in-by-instances class.)
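
To make option (D) a little more concrete, here is one possible shape for such a class. It is purely a sketch: the class name comes from the list above, but reading "opt-in by instance" as a check on content rather than on type is an assumption, and the metaclass name is made up.

```python
class _ASCIIContentCheck(type):
    """Hypothetical metaclass: isinstance() looks at the content."""

    def __instancecheck__(cls, obj):
        # Only bytes-like objects whose every byte is 7-bit ASCII qualify.
        return isinstance(obj, (bytes, bytearray)) and all(b < 0x80 for b in obj)


class ASCIIByteString(metaclass=_ASCIIContentCheck):
    """Notional class for documentation purposes; never instantiated."""


print(isinstance(b"OPEN", ASCIIByteString))
print(isinstance(b"\xff\x00", ASCIIByteString))
print(isinstance("OPEN", ASCIIByteString))
```

A check like this is value-dependent, which is the very property other posts in this thread object to, so it may be better suited to documentation than to real code.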


> I'm willing to work on (2) if there is general consensus
> that it would be a good idea.  As a rough sketch, I
> would change places like
>
>  http://docs.python.org/3/library/stdtypes.html#typebytes
>
> from:
>
>  Bytes objects are immutable sequences of single bytes.
>  Since many major binary protocols are based on the ASCII
>  text encoding, bytes objects offer several methods that
>  are only valid when working with ASCII compatible data
>  and are closely related to string objects in a variety
>  of other ways.
>
> to something more like:
>
>  Bytes objects are immutable sequences of single bytes.
>
>  A Bytes object could represent anything, and is
>  appropriate as the underlying storage for a sound sample
>  or image file.
>
>  Virtual subclass ASCIIStructuredBytes
>  
>
>  One particularly common use of bytes is to represent
>  the contents of a file, or of a network message.  In
>  these cases, the bytes will often represent Text
>  *in a specific encoding* and that encoding will usually
>  be a superset of ASCII.  Rather than create and support
>  an ASCIIStructuredBytes subclass, Python simply added
>  support for these use cases straight to Bytes objects,
>  and assumes that this support simply won't be used
>  when it does not make sense. For example, bytes literals
>  *could* be used to construct a sound sample, but the
>  literals will be far easier to read when they are used
>  to represent (encoded) ASCII text, such as "OPEN".


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



Re: [Python-Dev] Byte/text documentation improvements (was: PEP 460 reboot)

2014-01-14 Thread R. David Murray
On Tue, 14 Jan 2014 11:43:16 -0800, "Jim J. Jewett"  
wrote:
> Greg Ewing replied:
> 
> >> ... ASCII compatible binary data is a
> >> *subset* of arbitrary binary data.
> 
> I wrote:
> 
> > But in terms of explaining the text model, that
> > separation is important enough that
> 
> >(2)  It *may* be worth creating a virtual
> > split in the documentation.
> 
> (rough sketch below)
> 
> Ethan likes the idea, but points out that the term
> "Virtual" is confusing here.
> 
> Alas, I'm not sure what the correct term is.  In
> addition to "Go for it!" / "Don't waste your time",
> I'm looking for advice on:
> 
> (A)  What word should I use instead of "Virtual"?
> Imaginary?  Pretend?

Notional.

> (B)  Would it be good/bad/at least make the docs
> easier to create an actual class (or alias)?

I don't have an opinion on this, but if you make it a real class then
"notional" would no longer work.  I guess you'd just call it an alias
in that case.

> (C)  Same question for a pair of classes provided
> only in the documentation, like example code.

Bad.  Refer to it via a glossary item ref or a section ref.

> (D)  What about an abstract class, or several?
> 
> e.g., replacing the XXX TODO of collections.abc.ByteString
> with separate abstract classes for ByteSequence, String,
> ByteString, and ASCIIByteString?
> 
> (ByteString already includes any bytes or bytearray instance,
> so backward compatibility means the String suffix isn't
> sufficient for an opt-in-by-instances class.)

What's the difference between ByteString and ByteSequence?  Or maybe
I'm asking the difference between ByteString and ASCIIByteString?

So the only concrete classes would be ASCIIByteStrings... that might
work.  It would give us something to call that argument type in, eg,
the binascii docs.  Not to mention a formal definition of what
methods a Python byte type needs to support.

--David


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Eric V. Smith
On 01/14/2014 01:52 PM, Guido van Rossum wrote:

> But the way to arrive at this behavior without duplicating a whole lot
> of code seems to be to call the existing text-based __format__ API and
> convert the result to bytes -- for numbers this should be safe (their
> formatting produces just ASCII digits and a selected few other ASCII
> characters) but leads to an undesirable outcome for other types -- not
> just str but also e.g. lists or dicts containing str instances, since
> those call __repr__ on the contained items, and repr() may produce
> non-ASCII bytes.

That's why I suggested restricting the types supported. If we could live
with just a subset of known types, then we could hard-code the
conversions to bytes. How many types with custom __format__'s are really
getting written to byte strings in 2.x? For that matter, are any lists,
sets, or dicts (or anything else using object.__format__'s conversion
using str()) really getting written to bytes? Do we need to support
these cases?
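
To make the concern concrete, here is a minimal sketch (not part of any
proposal) of what "call the text-based __format__ and convert the result
to bytes" does for a number versus a list containing a str.  The variable
names are illustrative only.

```python
# Numbers: __format__ produces pure ASCII text, so strict encoding is safe.
number_bytes = format(1234.5, "010.2f").encode("ascii")

# A list falls back to object.__format__, i.e. str(), which calls repr()
# on each item.  repr() of a str keeps printable non-ASCII characters.
items = ["caf\u00e9"]
text = format(items, "")

try:
    text.encode("ascii")
    list_failed = False
except UnicodeEncodeError:
    list_failed = True

print(number_bytes, repr(text), list_failed)
```

So the same code path that is harmless for numbers fails (or, with a
laxer error handler, produces surprising output) for containers of str.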

In my mind, this comes down to: are we trying to add this just to make
porting easier? In my mind, we wouldn't even be adding this feature at all
except for ease of porting 2.x code. So we should focus on what features
are used in the code we're trying to port. I don't think our focus is on
2.x code that's using u''.format(), it's 2.x code that's been reviewed
and is still using b''.format() because it's building up bytes for a
wire protocol. And that code is not likely to need to format objects
with arbitrary __format__ methods, or even str (in the 3.x sense). It's
only likely to use numbers and bytes (or str in the 2.x sense).
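
A rough sketch of what hard-coding the conversions for a restricted set
of types could look like.  The helper name to_wire_bytes and the exact
set of accepted types are assumptions made for illustration; nothing
here is taken from an actual implementation.

```python
def to_wire_bytes(value):
    """Convert one of a small, fixed set of types to bytes (hypothetical)."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    if isinstance(value, (int, float)):
        # str() of an int or float contains only ASCII characters.
        return str(value).encode("ascii")
    # Everything else, including str, is refused rather than guessed at.
    raise TypeError(
        "unsupported type for wire formatting: %s" % type(value).__name__
    )


line = b"".join([b"Content-Length: ", to_wire_bytes(42), b"\r\n"])
print(line)
```

Anything outside the known types fails immediately, so no arbitrary
__format__ method is ever consulted.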

Eric.





Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Guido van Rossum
Personally I wouldn't add any words suggesting or referring to the
option of creating another class for this purpose. You wouldn't
recommend subclassing dict for constraining the types of keys or
values, would you?

On Tue, Jan 14, 2014 at 11:43 AM, Jim J. Jewett  wrote:
>
>
>
> Greg Ewing replied:
>
>>> ... ASCII compatible binary data is a
>>> *subset* of arbitrary binary data.
>
> I wrote:
>
>> But in terms of explaining the text model, that
>> separation is important enough that
>
>>(2)  It *may* be worth creating a virtual
>> split in the documentation.
>
> (rough sketch below)
>
> Ethan likes the idea, but points out that the term
> "Virtual" is confusing here.
>
> Alas, I'm not sure what the correct term is.  In
> addition to "Go for it!" / "Don't waste your time",
> I'm looking for advice on:
>
> (A)  What word should I use instead of "Virtual"?
> Imaginary?  Pretend?
>
> (B)  Would it be good/bad/at least make the docs
> easier to create an actual class (or alias)?
>
> (C)  Same question for a pair of classes provided
> only in the documentation, like example code.
>
> (D)  What about an abstract class, or several?
>
> e.g., replacing the XXX TODO of collections.abc.ByteString
> with separate abstract classes for ByteSequence, String,
> ByteString, and ASCIIByteString?
>
> (ByteString already includes any bytes or bytearray instance,
> so backward compatibility means the String suffix isn't
> sufficient for an opt-in-by-instances class.)
>
>
>> I'm willing to work on (2) if there is general consensus
>> that it would be a good idea.  As a rough sketch, I
>> would change places like
>>
>>  http://docs.python.org/3/library/stdtypes.html#typebytes
>>
>> from:
>>
>>  Bytes objects are immutable sequences of single bytes.
>>  Since many major binary protocols are based on the ASCII
>>  text encoding, bytes objects offer several methods that
>>  are only valid when working with ASCII compatible data
>>  and are closely related to string objects in a variety
>>  of other ways.
>>
>> to something more like:
>>
>>  Bytes objects are immutable sequences of single bytes.
>>
>>  A Bytes object could represent anything, and is
>>  appropriate as the underlying storage for a sound sample
>>  or image file.
>>
>>  Virtual subclass ASCIIStructuredBytes
>>  
>>
>>  One particularly common use of bytes is to represent
>>  the contents of a file, or of a network message.  In
>>  these cases, the bytes will often represent Text
>>  *in a specific encoding* and that encoding will usually
>>  be a superset of ASCII.  Rather than create and support
>>  an ASCIIStructuredBytes subclass, Python simply added
>>  support for these use cases straight to Bytes objects,
>>  and assumes that this support simply won't be used when
>>  it does not make sense. For example, bytes literals
>>  *could* be used to construct a sound sample, but the
>>  literals will be far easier to read when they are used
>>  to represent (encoded) ASCII text, such as "OPEN".
>
>
> -jJ
>
> --
>
> If there are still threading problems with my replies, please
> email me with details, so that I can try to resolve them.  -jJ
>
> ___
> Python-Dev mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/guido%40python.org



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Guido van Rossum
On Tue, Jan 14, 2014 at 12:04 PM, Eric V. Smith  wrote:
> On 01/14/2014 01:52 PM, Guido van Rossum wrote:
>
>> But the way to arrive at this behavior without duplicating a whole lot
>> of code seems to be to call the existing text-based __format__ API and
>> convert the result to bytes -- for numbers this should be safe (their
>> formatting produces just ASCII digits and a selected few other ASCII
>> characters) but leads to an undesirable outcome for other types -- not
>> just str but also e.g. lists or dicts containing str instances, since
>> those call __repr__ on the contained items, and repr() may produce
>> non-ASCII bytes.
>
> That's why I suggested restricting the types supported. If we could live
> with just a subset of known types, then we could hard-code the
> conversions to bytes. How many types with custom __format__'s are really
> getting written to byte strings in 2.x? For that matter, are any lists,
> sets, or dicts (or anything else using object.__format__'s conversion
> using str()) really getting written to bytes? Do we need to support
> these cases?
>
> In my mind, this comes down to: are we trying to add this just to make
> porting easier? In my mind, we wouldn't even be adding this feature at all
> except for ease of porting 2.x code. So we should focus on what features
> are used in the code we're trying to port. I don't think our focus is on
> 2.x code that's using u''.format(), it's 2.x code that's been reviewed
> and is still using b''.format() because it's building up bytes for a
> wire protocol. And that code is not likely to need to format objects
> with arbitrary __format__ methods, or even str (in the 3.x sense). It's
> only likely to use numbers and bytes (or str in the 2.x sense).

Yes, these are exactly the right questions to ask.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Ethan Furman

Duh.  Here's the text, as well.  ;)


PEP: 461
Title: Adding % and {} formatting to bytes
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-13
Resolution:


Abstract
========


This PEP proposes adding the % and {} formatting operations from str to bytes.


Proposed semantics for bytes formatting
=======================================

%-interpolation
---------------

All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.)
will be supported, and will work as they do for str, including the
padding, justification and other related modifiers.

Example::

   >>> b'%4x' % 10
   b'   a'

%c will insert a single byte, either from an int in range(256), or from
a bytes argument of length 1.

Example::

   >>> b'%c' % 48
   b'0'

   >>> b'%c' % b'a'
   b'a'

%s, because it is the most general, has the most convoluted resolution:

  - input type is bytes?
    pass it straight through

  - input type is numeric?
    use its __xxx__ [1] [2] method and ascii-encode it (strictly)

  - input type is something else?
    use its __bytes__ method; if there isn't one, raise an exception [3]

Examples::

   >>> b'%s' % b'abc'
   b'abc'

   >>> b'%s' % 3.14
   b'3.14'

   >>> b'%s' % 'hello world!'
   Traceback (most recent call last):
   ...
   TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the str type does not have a __bytes__ method, attempts to
   directly use 'a string' as a bytes interpolation value will raise an
   exception.  To use 'string' values, they must be encoded or otherwise
   transformed into a bytes sequence::

  'a string'.encode('latin-1')
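
The %s resolution order described in this draft can be sketched in pure
Python as follows.  This is only an illustration of the draft text as
posted; the function name resolve_percent_s is hypothetical, the use of
numbers.Number and of str() for the numeric case are assumptions (see
footnote [1]), and whatever is eventually accepted may behave
differently.

```python
import numbers


def resolve_percent_s(value):
    """Hypothetical model of the draft's %s rules for bytes formatting."""
    # 1. bytes: pass it straight through.
    if isinstance(value, bytes):
        return value
    # 2. numeric: use its text form and ascii-encode it strictly.
    #    (Assumption: "numeric" means numbers.Number, text form is str().)
    if isinstance(value, numbers.Number):
        return str(value).encode("ascii")
    # 3. anything else: use __bytes__, or fail loudly.
    to_bytes = getattr(type(value), "__bytes__", None)
    if to_bytes is None:
        raise TypeError(
            "%r has no __bytes__ method, perhaps you need to encode it?"
            % (value,)
        )
    return to_bytes(value)


class Token:
    """Example user type that opts in via __bytes__."""

    def __bytes__(self):
        return b"TOKEN"


print(resolve_percent_s(b"abc"), resolve_percent_s(3.14),
      resolve_percent_s(Token()))
```

A str argument reaches step 3, has no __bytes__, and raises, which is
the always-fail behavior the note above describes.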


format
------

The format mini language will be used as-is, with the behaviors as listed
for %-interpolation.


Open Questions
==============

For %s there has been some discussion of trying to use the buffer protocol
(Py_buffer) before trying __bytes__.  This question should be answered before
the PEP is implemented.


Proposed variations
===================

It has been suggested to use %b for bytes instead of %s.

  - Rejected as %b does not exist in Python 2.x %-interpolation, which is
    why we are using %s.

It has been proposed to automatically use .encode('ascii','strict') for str
arguments to %s.

  - Rejected as this would lead to intermittent failures.  Better to have the
    operation always fail so the trouble-spot can be correctly fixed.

It has been proposed to have %s return the ascii-encoded repr when the value
is a str  (b'%s' % 'abc'  --> b"'abc'").

  - Rejected as this would lead to hard to debug failures far from the problem
    site.  Better to have the operation always fail so the trouble-spot can be
    easily fixed.


Footnotes
=========

.. [1] Not sure if this should be the numeric __str__ or the numeric __repr__,
   or if there's any difference
.. [2] Any proper numeric class would then have to provide an ascii
   representation of its value, either via __repr__ or __str__ (whichever
   we choose in [1]).
.. [3] TypeError, ValueError, or UnicodeEncodeError?


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:



Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Larry Hastings


On 01/11/2014 07:35 PM, Larry Hastings wrote:
> On 01/08/2014 07:08 AM, Barry Warsaw wrote:
>> How hard would it be to put together some sample branches that provide
>> concrete examples of the various options?
>>
>> My own opinion could easily be influenced by having some hands-on time with
>> actual code, and I suspect even Guido could be influenced if he could pull
>> some things up in his editor and take a look around.
>
> I've uploaded a prototype here:
>
> https://bitbucket.org/larry/python-clinic-buffer




I have now received exactly zero feedback about the prototype, which 
suggests people aren't using it.  In an attempt to jump-start this 
conversation, I've created a new repository containing the "concrete 
examples of the various options" that Barry proposed above.  You may 
find it here:


   https://bitbucket.org/larry/clinic-buffer-samples/src

In it I converted Modules/_pickle.c four different ways.  There's a 
README, please read it.


People who want to change how Clinic writes its output: this is your big 
chance.  Comment on these samples, or produce your own counterexamples, 
or something.  If you can get enough people on your side maybe Clinic will 
change.  If there is no further debate on this topic, nothing will 
happen and Clinic will not change.



//arry/


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Ethan Furman

On 01/14/2014 10:52 AM, Guido van Rossum wrote:

> Which reminds me. Quite a few people have spoken out in favor of loud
> failures rather than silent "wrong" output. But I think that in the
> specific context of formatting output, there is a long and IMO good
> tradition of producing (slightly) wrong output in favor of more strict
> behavior. Consider for example what to do when a number doesn't fit in
> the given width. Would you rather raise an exception, truncate the
> value, or mess up the formatting?


One more data point to consider:  When the binary format has strict rules on how much space a data-point is allowed, 
then failure is the only appropriate option.


In Py2, because '%15s' can actually take 17 characters, I have to use '%15s' % 
data_value[:15] everywhere.

I'm not suggesting we change how that portion works, as it would then be, I think, too different from both Py2 behavior 
as well as current str behavior, but likewise adding in single quotes would be of no help to me.  Loud failure so I can 
easily see where I forgot the .encode() would be much more helpful.
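
For the fixed-width case, a loud-failure helper might look like the
sketch below.  The name fixed_field is made up for this example, and
right-justifying with spaces is an assumption chosen to mirror what
'%15s' does when the value is short enough.

```python
def fixed_field(data, width):
    """Return data padded to exactly width bytes, or fail (hypothetical)."""
    if len(data) > width:
        # Refuse instead of silently overflowing the field.
        raise ValueError(
            "value is %d bytes, field allows %d" % (len(data), width)
        )
    # Pad on the left with spaces, like a minimum-width '%15s' would.
    return data.rjust(width)


padded = fixed_field(b"abc", 5)
print(padded)
```

Slicing with data_value[:15] truncates quietly; this variant trades that
for an exception at the point where the oversized value shows up.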


--
~Ethan~


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Zachary Ware
On Tue, Jan 14, 2014 at 2:22 PM, Larry Hastings  wrote:
> I have now received exactly zero feedback about the prototype, which
> suggests people aren't using it.

Oops, I had half a post written about this two days ago, but never got
it posted.

I did some experimenting on winreg.c (see
http://hg.python.org/sandbox/zware/file/prototype_clinic/PC/winreg.c),
and I have to say I really really like having most of the output
shunted down to the bottom of the file.  In that example I have only
the implementation outputting to the block, and everything else
(that's necessary) going into the buffer; to me it looks very nice and
clean.  One of my biggest annoyances with the current output is having
the docstring repeated nearly verbatim (with additives) within just a
few lines, and this takes care of that and more.  To me, those
converted functions read about as close to real Python as is ever
going to happen in a C file.

One thing that I could see being useful (though possibly not easy) is
the ability to dump a buffer "late"; for example, near the top of the
file:

/*[clinic input]
destination prototypes new buffer
output parser_prototype prototypes
dump prototypes later
[clinic start generated code]*/

Then process the file, filling the prototypes buffer as we go.  At the
end of the file, go back and dump the buffer in that output block.

I like the flexibility of the prototype, having more control over what
goes where is always nice :)

-- 
Zach


[Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Ethan Furman
This PEP goes a bit further than PEP 460 does, and hopefully spells things out in enough detail so there is no confusion 
as to what is meant.


--
~Ethan~


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Larry Hastings


On 01/14/2014 12:48 PM, Zachary Ware wrote:
> On Tue, Jan 14, 2014 at 2:22 PM, Larry Hastings  wrote:
>> I have now received exactly zero feedback about the prototype, which
>> suggests people aren't using it.
>
> Oops, I had half a post written about this two days ago, but never got
> it posted.
>
> I did some experimenting on winreg.c (see
> http://hg.python.org/sandbox/zware/file/prototype_clinic/PC/winreg.c),
> and I have to say I really really like having most of the output
> shunted down to the bottom of the file.


I will consider you a +1 on the "buffer" approach and NaN on the other 
approaches.




> One thing that I could see being useful (though possibly not easy) is
> the ability to dump a buffer "late"; for example, near the top of the
> file:
>
> /*[clinic input]
> destination prototypes new buffer
> output parser_prototype prototypes
> dump prototypes later
> [clinic start generated code]*/
>
> Then process the file, filling the prototypes buffer as we go.  At the
> end of the file, go back and dump the buffer in that output block.


That wouldn't be too hard.  But conceptually it would make Clinic much 
more complicated.  For example, I suggest that "later" is a confusing 
name, because the output will actually happen *earlier* in the file.  
"If it's hard to explain, it may be a bad idea." ;-)



//arry/


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Guido van Rossum
On Tue, Jan 14, 2014 at 12:13 PM, Ethan Furman  wrote:
> On 01/14/2014 10:52 AM, Guido van Rossum wrote:
>>
>> Which reminds me. Quite a few people have spoken out in favor of loud
>> failures rather than silent "wrong" output. But I think that in the
>> specific context of formatting output, there is a long and IMO good
>> tradition of producing (slightly) wrong output in favor of more strict
>> behavior. Consider for example what to do when a number doesn't fit in
>> the given width. Would you rather raise an exception, truncate the
>> value, or mess up the formatting?
>
> One more data point to consider:  When the binary format has strict rules on
> how much space a data-point is allowed, then failure is the only appropriate
> option.

Yes, that's how the struct module works.
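
For reference, a short demonstration of that struct behavior, with
values chosen only for illustration.  Note that the strictness applies
to the numeric codes; the 's' code pads or truncates without complaint.

```python
import struct

# A 16-bit unsigned field accepts values that fit ...
packed = struct.pack(">H", 65535)

# ... and raises struct.error for values that do not.
try:
    struct.pack(">H", 70000)
    overflow_raised = False
except struct.error:
    overflow_raised = True

# The 's' code is different: it silently truncates or NUL-pads.
truncated = struct.pack("5s", b"abcdefg")
padded = struct.pack("5s", b"ab")

print(packed, overflow_raised, truncated, padded)
```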

> In Py2, because '%15s' can actually take 17 characters, I have to use '%15s'
> % data_value[:15] everywhere.

Wow. I thought there would be some combination using %.15s but I can't
get that to work. :-(

> I'm not suggesting we change how that portion works, as it would then be, I
> think, too different from both Py2 behavior as well as current str behavior,
> but likewise adding in single quotes would be of no help to me.  Loud failure
> so I can easily see where I forgot the .encode() would be much more helpful.

If we go with a more restricted version this makes sense indeed. The
single quotes seemed unavoidable when I was trying (like several other
proposals) to have a format code that works for all types. I think
we're rightly giving up on that now.

(I should review PEP 461, but I don't have time yet.)

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Antoine Pitrou
On Tue, 14 Jan 2014 11:56:25 -0800
Ethan Furman  wrote:
> 
> %s, because it is the most general, has the most convoluted resolution:
> 
>- input type is bytes?
>  pass it straight through

It should try to get a Py_buffer instead.

>- input type is numeric?
>  use its __xxx__ [1] [2] method and ascii-encode it (strictly)

What is the definition of "numeric"?

Regards

Antoine.




Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Brett Cannon
On Tue, Jan 14, 2014 at 3:22 PM, Larry Hastings  wrote:

>
> On 01/11/2014 07:35 PM, Larry Hastings wrote:
>
>
> On 01/08/2014 07:08 AM, Barry Warsaw wrote:
>
> How hard would it be to put together some sample branches that provide
> concrete examples of the various options?
>
> My own opinion could easily be influenced by having some hands-on time with
> actual code, and I suspect even Guido could be influenced if he could pull
> some things up in his editor and take a look around.
>
>
> I've uploaded a prototype here:
>
> https://bitbucket.org/larry/python-clinic-buffer
>
>
>
> I have now received exactly zero feedback about the prototype, which
> suggests people aren't using it.  In an attempt to jump-start this
> conversation, I've created a new repository containing the "concrete
> examples of the various options" that Barry proposed above.  You may find
> it here:
>
> https://bitbucket.org/larry/clinic-buffer-samples/src
>
> In it I converted Modules/_pickle.c four different ways.  There's a
> README, please read it.
>
> People who want to change how Clinic writes its output: this is your big
> chance.  Comment on these samples, or produce your own counterexamples, or
> something.  If you can get enough people on your side maybe Clinic will
> change.  If there is no further debate on this topic, nothing will happen
> and Clinic will not change.
>

+0 _pickle.original.c
+1 _pickle.using-buffer.c
-0 _pickle.using-modified-buffer.c
+1  _pickle.using-multiple-buffers.c
-0 _pickle.using-sidefile.c


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Brett Cannon
On Tue, Jan 14, 2014 at 2:55 PM, Ethan Furman  wrote:

> This PEP goes a bit further than PEP 460 does, and hopefully spells things
> out in enough detail so there is no confusion as to what is meant.
>

Are we going down the PEP route with the various ideas? Guido, do you want
one from me as well or should I not bother?


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Ethan Furman

On 01/14/2014 01:05 PM, Brett Cannon wrote:
> On Tue, Jan 14, 2014 at 2:55 PM, Ethan Furman wrote:
>> This PEP goes a bit further than PEP 460 does, and hopefully spells
>> things out in enough detail so there is no confusion as to what is
>> meant.
>
> Are we going down the PEP route with the various ideas? Guido, do
> you want one from me as well or should I not bother?


While I can't answer for Guido, I will say I authored this PEP because Antoine didn't want 460 to be any more liberal 
than it already was.


If you collect your ideas together, I'll add them to 461 as questions or discussions or however is appropriate (assuming 
you're willing to go that route).


--
~Ethan~


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Ethan Furman

On 01/14/2014 12:22 PM, Larry Hastings wrote:
> I have now received exactly zero feedback about the prototype, which suggests
> people aren't using it.  In an attempt to jump-start this conversation, I've
> created a new repository containing the "concrete examples of the various
> options" that Barry proposed above.  You may find it here:
>
> https://bitbucket.org/larry/clinic-buffer-samples/src
>
> In it I converted Modules/_pickle.c four different ways.  There's a README,
> please read it.
>
> People who want to change how Clinic writes its output: this is your big
> chance.  Comment on these samples, or produce your own counterexamples, or
> something.  If you can get enough people on your side maybe Clinic will
> change.  If there is no further debate on this topic, nothing will happen
> and Clinic will not change.


I checked the README, the current file, and the buffered files.  My preferences 
from highest to lowest:

  - modified buffer approach
  - buffer approach
  - side file

Thanks for taking the time, Larry!

--
~Ethan~


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Antoine Pitrou
On Tue, 14 Jan 2014 12:22:12 -0800
Larry Hastings  wrote:
> 
> https://bitbucket.org/larry/clinic-buffer-samples/src
> 
> In it I converted Modules/_pickle.c four different ways.  There's a 
> README, please read it.

I'm +1 on the sidefile approach. +0 on the various buffer approaches.
-0.5 on the current "sprinkled everywhere" approach.

Regards

Antoine.




Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Eric V. Smith
On 1/14/2014 3:54 PM, Guido van Rossum wrote:
> On Tue, Jan 14, 2014 at 12:13 PM, Ethan Furman  wrote:
>> In Py2, because '%15s' can actually take 17 characters, I have to use '%15s'
>> % data_value[:15] everywhere.
> 
> Wow. I thought there would be some combination using %.15s but I can't
> get that to work. :-(

>>> '%.15s' % 'abcdefghij1234567'
'abcdefghij12345'
>>> '{:.15}'.format('abcdefghij1234567')
'abcdefghij12345'
>>>

Or, depending on what you're after:

>>> '%15.15s' % 'abcde'
'          abcde'
>>> '%15.15s' % 'abcdefghij1234567'
'abcdefghij12345'
>>>


> (I should review PEP 461, but I don't have time yet.)

Same here.




Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Mark Lawrence

On 14/01/2014 20:54, Guido van Rossum wrote:
> On Tue, Jan 14, 2014 at 12:13 PM, Ethan Furman  wrote:
>> In Py2, because '%15s' can actually take 17 characters, I have to use '%15s'
>> % data_value[:15] everywhere.
>
> Wow. I thought there would be some combination using %.15s but I can't
> get that to work. :-(



I believe you wanted this.

>>> a='01234567890123456'
>>> len(a)
17
>>> b = '%15.15s' % a
>>> b;len(b)
'012345678901234'
15

--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Greg Ewing

Nick Coghlan wrote:

> The
> mini-language parser has to assume an encoding in order to interpret
> the format string, and that's *all* done assuming an ASCII compatible
> format string (which must make life interesting if you try to use an
> ASCII incompatible coding cookie for your source code


I don't think it's all *that* interesting. As long as you're
able to type the relevant characters on your keyboard and
have them displayed in a recognisable way in your editor,
then what looks like b"Content-Length: %d" in your source
will end up encoded as ascii in the bytes object, whatever
the encoding of the source file.
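
As a concrete check of that claim, assuming an interpreter in which
bytes %-formatting is available (Python 3.5 and later, which postdates
this message), the literal's bytes are the same ASCII values no matter
how the source file itself is encoded:

```python
# The format string is a bytes literal, so its content is ASCII bytes.
template = b"Content-Length: %d"
header = template % 42

# Every byte of the result is in the ASCII range.
all_ascii = all(byte < 128 for byte in header)
print(header, all_ascii)
```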

If the source file uses an encoding that can't even represent
the formatting characters, then you're in trouble -- but
you'd have a hard time writing Python code at all in such
an environment!


> It's certainly a decision that has its downsides, with the potential
> impact on users of ASCII incompatible encodings (mostly in Asia) being
> the main one,


I don't think it will have much impact on them, other
than maybe they will find fewer use cases for it. But the
main intended use cases are for things like http headers
which have protocol-mandated ascii-ish bits, and those
bits are still just as ascii-ish in China as they are
anywhere else.

--
Greg



Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Barry Warsaw
On Jan 14, 2014, at 10:52 AM, Guido van Rossum wrote:

>Which reminds me. Quite a few people have spoken out in favor of loud
>failures rather than silent "wrong" output. But I think that in the
>specific context of formatting output, there is a long and IMO good
>tradition of producing (slightly) wrong output in favor of more strict
>behavior.

In the email package we now have a tradition of allowing either behavior.

http://docs.python.org/3.4/library/email.policy.html#email.policy.Policy.raise_on_defect

Perhaps not appropriate for the PEP 460 related cases, but I think the policy
mechanism works great for email parsing, where sometimes you definitely want
to fail early (e.g. you are composing new messages out of literal strings) and
other times where you are willing to put up with some best-effort
representation in exchange for no exceptions being raised (e.g. you are
parsing messages being fed to you from your mail server).
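
A minimal sketch of the two modes, using the documented raise_on_defect
policy attribute.  The sample message (a header block that runs into the
body without a blank line) is invented for the example.

```python
import email
from email import errors, policy

# Header block runs straight into the body with no blank separator line.
raw = "Subject: hello\nthis line is not a header\n"

lenient = policy.default                             # records defects
strict = policy.default.clone(raise_on_defect=True)  # raises them

# Best-effort parsing: the problem is recorded on the message object.
msg = email.message_from_string(raw, policy=lenient)
recorded = [type(defect).__name__ for defect in msg.defects]

# Fail-early parsing: the same problem becomes an exception.
try:
    email.message_from_string(raw, policy=strict)
    raised = None
except errors.MessageDefect as exc:
    raised = type(exc).__name__

print(recorded, raised)
```

The caller picks the behavior per use case instead of the library
choosing one for everybody.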

-Barry


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Ethan Furman

On 01/14/2014 01:15 PM, Eric V. Smith wrote:

On 1/14/2014 3:54 PM, Guido van Rossum wrote:

On Tue, Jan 14, 2014 at 12:13 PM, Ethan Furman  wrote:

In Py2, because '%15s' can actually take 17 characters, I have to use '%15s'
% data_value[:15] everywhere.


Wow. I thought there would be some combination using %.15s but I can't
get that to work. :-(



>>> '%.15s' % 'abcdefghij1234567'
'abcdefghij12345'
>>> '{:.15}'.format('abcdefghij1234567')
'abcdefghij12345'




Or, depending on what you're after:


>>> '%15.15s' % 'abcde'
'          abcde'
>>> '%15.15s' % 'abcdefghij1234567'
'abcdefghij12345'


Huh.  Wish I'd known about that way back when!  ;)
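
To spell out the difference between the field width and the precision discussed above, here is a short sketch (Python 3 str formatting; the sample values are taken from the examples in this thread):

```python
value = 'abcdefghij1234567'           # 17 characters

padded_only = '%15s' % value          # minimum width only: nothing is cut off
truncated = '%.15s' % value           # precision only: at most 15 characters
fixed_width = '%15.15s' % 'abcde'     # both: padded to 15 and capped at 15
new_style = '{:15.15}'.format(value)  # the str.format spelling of the same idea

print(len(padded_only), len(truncated), repr(fixed_width), repr(new_style))
```

The width is a minimum and the precision is a maximum, so a fixed-size field needs both.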

--
~Ethan~


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Ethan Furman

On 01/14/2014 12:57 PM, Antoine Pitrou wrote:

On Tue, 14 Jan 2014 11:56:25 -0800
Ethan Furman  wrote:


%s, because it is the most general, has the most convoluted resolution:

- input type is bytes?
  pass it straight through


It should try to get a Py_buffer instead.


Meaning any bytes or bytes-subtype will support the Py_buffer protocol, and 
this should be the first thing we try?

Sounds good.

For that matter, should the first test be "does this object support Py_buffer" and not worry about it being 
isinstance(obj, bytes)?




- input type is numeric?
  use its __xxx__ [1] [2] method and ascii-encode it (strictly)


What is the definition of "numeric"?


That is a key question.

Obviously we have int, float, and complex.  We also have Decimal.

But what about Fraction?  Or some user's numeric class that doesn't inherit from a core numeric type?  Wherever we draw 
the line, we need to make sure it's well-documented.


--
~Ethan~


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread MRAB

On 2014-01-14 20:54, Guido van Rossum wrote:

On Tue, Jan 14, 2014 at 12:13 PM, Ethan Furman  wrote:

On 01/14/2014 10:52 AM, Guido van Rossum wrote:


Which reminds me. Quite a few people have spoken out in favor of loud
failures rather than silent "wrong" output. But I think that in the
specific context of formatting output, there is a long and IMO good
tradition of producing (slightly) wrong output in favor of more strict
behavior. Consider for example what to do when a number doesn't fit in
the given width. Would you rather raise an exception, truncate the
value, or mess up the formatting?


One more data point to consider:  When the binary format has strict rules on
how much space a data-point is allowed, then failure is the only appropriate
option.


Yes, that's how the struct module works.
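
A brief sketch of that behaviour in the struct module. Note one caveat that is easy to miss: out-of-range integers fail loudly, but fixed-size 's' fields are truncated or padded without complaint.

```python
import struct

# An integer that does not fit the field is rejected rather than truncated.
try:
    struct.pack('<H', 70000)          # 'H' is an unsigned 16-bit field
    overflow_error = None
except struct.error as exc:
    overflow_error = exc

# Fixed-size byte fields behave differently: 's' truncates or NUL-pads.
truncated = struct.pack('5s', b'abcdefgh')
padded = struct.pack('5s', b'ab')
```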


In Py2, because '%15s' can actually take 17 characters, I have to use '%15s'
% data_value[:15] everywhere.


Wow. I thought there would be some combination using %.15s but I can't
get that to work. :-(


I'm not sure what you mean here:

Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import string
>>> '%.15s' % string.letters
'abcdefghijklmno'
>>> len(_)
15


I'm not suggesting we change how that portion works, as it would then be, I
think, too different from both Py2 behavior as well as current str behavior,
but likewise adding in single quotes would be of no help to me.  Loud failure
so I can easily see where I forgot the .encode() would be much more helpful.


If we go with a more restricted version this makes sense indeed. The
single quotes seemed unavoidable when I was trying (like several other
proposals) to have a format code that works for all types. I think
we're rightly giving up on that now.

(I should review PEP 461, but I don't have time yet.)





Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Nick Coghlan
On 15 Jan 2014 04:16, "Guido van Rossum"  wrote:
>
> [Other readers: asciistr is at https://github.com/jeamland/asciicompat]
>
> On Mon, Jan 13, 2014 at 11:44 PM, Nick Coghlan  wrote:
> > Right, asciistr is designed for a specific kind of hybrid API where
> > you want to accept binary input (and produce binary output) *and* you
> > want to accept text input (and produce text output). Porting those
> > from Python 2 to Python 3 is painful not because of any limitations of
> > the str or bytes API but because it's the only use case I have found
> > where I actually *missed* the implicit interoperability offered by the
> > Python 2 str type.
>
> Yes, the use case is clear.
>
> > It's not an implementation style I would consider appropriate for the
> > standard library - we need to code very defensively in order to aid
> > debugging in arbitrary contexts, so I consider having an API like
> > urllib.parse demand 7-bit ASCII in the binary version, and require
> > text to handle impure input to be a better design choice.
>
> This surprises me. I think asciistr should strive to be useful for the
> stdlib as well.

The concerns you raise are the reason I'm not sure that's possible - just
as in the Python 2 text model, I suspect actually *using* asciistr will
trade ease of development against robust detection of input errors.

I'm OK with that in a PyPI module, I'd be dubious about including it in the
standard library and making it a builtin is right out.

> > However, in an environment where you can place greater preconditions
> > on your inputs (such as "ensure all input data is ASCII compatible")
>
> That gives me the Python 2 willies. :-(

Yep - from a formal correctness point of view, asciistr is a terrible idea.
That's not the only consideration in coding though, or we'd all be using
statically typed languages :)

> > and you're willing to tolerate the occasional obscure traceback for
> > particular kinds of errors,
>
> Really? Can you give an example where the traceback using asciistr()
> would be more obscure than using the technique you used in
> urllib.parse?

In urllib.parse I do an up front check that everything is consistently
bytes or str. With asciistr it becomes tempting to skip that up front
check, so you instead get a TypeError about not being able to add str and
bytes.

Technically you could keep that up front check and only use asciistr as an
internal implementation detail, but at that point you may as well do things
properly and write the algorithm to operate solely on bytes or str and
convert the other inputs appropriately (which is the actual approach we use
in the standard library).
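
A short sketch of how that up-front check looks from the caller's side, using the public urllib.parse functions (the URLs are placeholders):

```python
from urllib.parse import urljoin, urlsplit

# All-text input gives text results; all-bytes input gives bytes results.
text_result = urlsplit('http://example.com/a')
bytes_result = urlsplit(b'http://example.com/a')

# Mixing the two is rejected up front with a TypeError.
try:
    urljoin('http://example.com/', b'a')
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

# Binary input must be 7-bit ASCII; anything else fails to decode.
try:
    urlsplit(b'http://example.com/\xe9')
    non_ascii_error = None
except UnicodeDecodeError as exc:
    non_ascii_error = exc
```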

> > then it should be a convenient way to use
> > common constants (like separators or URL scheme names) in an algorithm
> > that can manipulate either binary or text, but not a combination of
> > the two (the latter is still a nice improvement in correctness over
> > Python 2, which allowed them to be mixed freely rather than requiring
> > consistency across the inputs).
>
> Unfortunately I suspect there are still examples where asciistr's
> "submissive" behavior can produce surprises. E.g. consider a function
> of two arguments that must either be both bytes or both str. It's
> easily conceivable that for certain combinations of incorrect
> arguments (i.e. one bytes and one str) the function doesn't raise an
> error but returns something of one or the other type. (And this is
> exactly the Python 2 outcome we're trying to avoid.)

Yep - that's why I consider asciistr to be firmly in the "power tool"
category. If you know what you're doing, it should let you write hybrid API
code that is just as concise as Python 2, but it's also far more error
prone than the core Python 3 text model.

I admit that's a key part of my motivation in trying to help Benno to
create it - I want to show that it's not that you *can't* write code that
way in Python 3, it's that there are good reasons why you *shouldn't*.

And in cases where those reasons don't apply... well, the aim in that case
is "pip install asciicompat" and away you go :)

> > It's still slightly different from Python 2, though. In Python 2, the
> > interaction model was:
> >
> > str & str -> str
> > str & unicode -> unicode
> >
> > (with the one exception being str.format: that consistently produces
> > str rather than promoting to Unicode)
>
> Or raises good old UnicodeError. :-(

Unless Benno fixed it in the last couple of days (which seems unlikely
given the complexity of the problem), asciistr currently has the Python 3
behaviour of interpolating the bytes repr() into the string rather than
trying to decode it. That's a key reason why it likely *won't* be a
substitute for PEP 460.
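
For reference, the plain Python 3 str behaviour being described (this shows built-in str only, not asciistr itself):

```python
payload = b'abc'

# Interpolating bytes into text inserts the repr(), not the decoded content.
percent_result = '%s' % payload
format_result = '{}'.format(payload)
```

Both results are the five-character text b'abc' including the prefix and quotes, which is rarely what a wire protocol wants.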

> > My goal for asciistr is that it should exhibit the following behaviour:
> >
> > str & asciistr -> str
> > asciistr & asciistr -> str (making it asciistr would be a pain and
> > I don't have a use case for that)
>
> I almost had one in the example code I sent in re

Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Raymond Hettinger

On Jan 14, 2014, at 9:12 PM, Antoine Pitrou  wrote:

> I'm +1 on the sidefile approach. +0 on the various buffer approaches.
> -0.5 on the current "sprinkled everywhere" approach.

I concur with Antoine except that I'm a full -1 on commingling
generated code with hand-edited code.  Sprinkled everywhere
interferes with my ability to grok the code.  It interferes with 
code navigation.  And it creates a greater risk of accidentally
editing the generated code.

FWIW, I think everyone should place a lot of weight on
Serhiy's comments and suggestions.  His reasoning is
clear and compelling.  And the thoughts are all soundly
based on extensive experience with the clinic's effect on
the C source code.


Raymond


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Yury Selivanov
On January 14, 2014 at 4:36:00 PM, Ethan Furman ([email protected]) wrote:
> 
> On 01/14/2014 12:57 PM, Antoine Pitrou wrote:
> > On Tue, 14 Jan 2014 11:56:25 -0800
> > Ethan Furman wrote:
> >>
> >> %s, because it is the most general, has the most convoluted 
> resolution:
> >>
> >> - input type is bytes?
> >> pass it straight through
> >
> > It should try to get a Py_buffer instead.
> 
> Meaning any bytes or bytes-subtype will support the Py_buffer 
> protocol, and this should be the first thing we try?
> 
> Sounds good.
> 
> For that matter, should the first test be "does this object support 
> Py_buffer" and not worry about it being
> isinstance(obj, bytes)?
> 
> 
> >> - input type is numeric?
> >> use its __xxx__ [1] [2] method and ascii-encode it (strictly) 
> >
> > What is the definition of "numeric"?
> 
> That is a key question.

isinstance(o, numbers.Number) ?
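
A quick sketch of what that test would and would not accept. The Money class is a hypothetical user-defined type that neither inherits from nor registers with the numbers ABCs.

```python
import numbers
from decimal import Decimal
from fractions import Fraction

class Money:
    """Hypothetical user numeric type that registers nothing."""
    def __init__(self, cents):
        self.cents = cents

candidates = [1, 1.5, 1j, Decimal('1.5'), Fraction(3, 2), True, Money(150), '1']

# Map each type name to whether the ABC check accepts it.
is_number = {type(c).__name__: isinstance(c, numbers.Number) for c in candidates}
```

Decimal and Fraction pass, and so does bool; an unregistered user class does not unless its author calls numbers.Number.register() on it.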

Yury


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Antoine Pitrou
On Tue, 14 Jan 2014 13:07:57 -0800
Ethan Furman  wrote:
> 
> Meaning any bytes or bytes-subtype will support the Py_buffer protocol, and 
> this should be the first thing we try?
> 
> Sounds good.
> 
> For that matter, should the first test be "does this object support 
> Py_buffer" and not worry about it being 
> isinstance(obj, bytes)?

Yes, unless the implementation wants to micro-optimize stuff.
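
At the Python level, "does this object support Py_buffer" can be approximated by asking for a memoryview. This is only a sketch; a C implementation would presumably use the buffer API directly, and supports_buffer is a made-up helper name.

```python
import array

def supports_buffer(obj):
    """Return True if obj exports the buffer protocol."""
    try:
        # memoryview() requests a buffer; the with-block releases it again.
        with memoryview(obj):
            return True
    except TypeError:
        return False

results = {
    'bytes': supports_buffer(b'abc'),
    'bytearray': supports_buffer(bytearray(b'abc')),
    'array': supports_buffer(array.array('B', [1, 2, 3])),
    'str': supports_buffer('abc'),
    'int': supports_buffer(42),
}
```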

> >> - input type is numeric?
> >>   use its __xxx__ [1] [2] method and ascii-encode it (strictly)
> >
> > What is the definition of "numeric"?
> 
> That is a key question.
> 
> Obviously we have int, float, and complex.  We also have Decimal.

The question is also how do you test for them? Decimal is not a core
builtin type. Do we need some kind of __bformat__ protocol?

Regards

Antoine.




Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Zachary Ware
On Tue, Jan 14, 2014 at 2:54 PM, Larry Hastings  wrote:
> I will consider you a +1 on the "buffer" approach and NaN on the other
> approaches.

Oops, I'll give you some real numbers:

-1 _pickle.original.c
+1 _pickle.using-buffer.c
+0 _pickle.using-modified-buffer.c
+1 _pickle.using-multiple-buffers.c
+0 _pickle.using-sidefile.c

> That wouldn't be too hard.  But conceptually it would make Clinic much more
> complicated.  For example, I suggest that "later" is a confusing name,
> because the output will actually happen *earlier* in the file.  "If it's
> hard to explain, it may be a bad idea." ;-)

Fair enough :).  "later" makes sense to me as "there's nothing in the
buffer now, but there will be later; dump it here then".  The spark
for this idea is in _winapi.c, where OverlappedObject's methoddef is
actually before any of the methods are implemented which makes a
certain amount of sense as a list of what will be implemented; but as
far as I can tell, it isn't possible to replicate this with Clinic
right now.  Having read the readme in your examples, this could also
help with the chicken-and-egg problem you talked about using the
various buffers: dump docstrings at the top, followed by prototypes,
then methoddef defines near where they're needed (or even perhaps
output them directly into the PyMethodDef structure, no defines
needed).

-- 
Zach


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Ethan Furman

On 01/14/2014 01:17 PM, Mark Lawrence wrote:

On 14/01/2014 20:54, Guido van Rossum wrote:

On Tue, Jan 14, 2014 at 12:13 PM, Ethan Furman  wrote:


In Py2, because '%15s' can actually take 17 characters, I have to use '%15s'
% data_value[:15] everywhere.


Wow. I thought there would be some combination using %.15s but I can't
get that to work. :-(



I believe you wanted this.


>>> a='01234567890123456'
>>> len(a)
17
>>> b = '%15.15s' % a
>>> b;len(b)
'012345678901234'
15


Cool!

--
~Ethan~


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Terry Reedy

Let me answer you both since the issues are related.

On 1/14/2014 7:46 AM, Nick Coghlan wrote:


Guido van Rossum writes:
  > And that is precisely my point. When you're using a format string,


Bytes interpolation uses a bytes format, or a byte string if you will, 
but it should not be thought of as a character or text string. Certain 
bytes (123 and 125) delimit a replacement field. The bytes in between 
define, in my version, a format-spec after being ascii-decoded to text 
for input to 3.x format(). The decoding and subsequent encoding would 
not be needed if 2.7 format(ob, byte-spec) were available.



  > all of the format string (not just the part between { and }) had
  > better use ASCII or an ASCII superset.


I am not even sure what you mean here. The bytes outside of 123 and 125 
are simply copied to the output string. There is no encoding or 
interpretation involved.


It is true that the uninterpreted bytes best not contain a byte pattern 
mistakenly recognized as a replacement field. I plan to refine the 
relational expression byte pattern used in byteformat to sharply reduce 
the possibility of such errors. When such errors happen anyway, an 
exception should be raised, and I plan to expand the error message to 
give more diagnostic information.



And this (rightly) constrains the output to an ASCII superset as well.


What does this mean? I suspect I disagree. The bytes interpolated into 
the output bytes can be any bytes.



Except that if you interpolate something like Shift JIS,


Bytes interpolation interpolates bytes, not encodings. A 
self-identifying byte stream starts with bytes in a known encoding that 
specifies the encoding of the rest of the stream. Neither part need be 
encoded text. (Would that something like that were standard for encoded text 
streams, as well as for serialized images.)


>> [snip]


Right, that's the danger I was worried about, but the problem is that
there's at least *some* minimum level of ASCII compatibility that
needs to be assumed in order to define an interpolation format at all
(this is the point I originally missed).


I would put this slightly differently. To process bytes, we may define 
certain bytes as metabytes with a special meaning. We may choose the 
bytes that happen to be the ascii encoding of certain characters. But 
once the special numbers are chosen, they are numbers, not characters.


The problem of metabytes having both a normal and special meaning is 
similar to the problem of metacharacters having both a normal and 
special meaning.



For printf-style formatting,
it's % along with the various formatting characters and other syntax
(like digits, parentheses, variable names and "."), with the format
method it's braces, brackets, colons, variable names, etc.


It is the bytes corresponding to these characters. This is true also of 
the metabytes in an re module bytes pattern.



The mini-language parser has to assume an encoding

> in order to interpret the format string,

This is where I disagree with you and Guido. Bytes processing is done 
with numbers 0 <= n <= 255, not characters. The fact that ascii 
characters can, for convenience, be used in bytes literals to indicate 
the corresponding ascii codes does not change this. A bytes parser looks 
for certain special numbers. Other numbers need not be given any 
interpretation and need not represent encoded characters.


> and that's *all* done assuming an ASCII compatible format string

Since any bytes can be regarded as an ascii-compatible latin-1 
encoded string, that seems like a vacuous assumption. In any case, I do 
not see any particular assumption in the following, other than the 
choice of replacement field delimiters.


>>> list(byteformat(bytes([1,2,10, 123, 125, 200]),
   (bytes([50, 100, 150]),)))
[1, 2, 10, 50, 100, 150, 200]
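
The byteformat implementation itself is not shown in this thread, so the following is only a guess at the simplest version consistent with the examples: it handles empty replacement fields (bytes 123 125) and nothing else. The name byteformat_sketch and its error handling are assumptions.

```python
import re

# Only empty replacement fields are recognised in this sketch.
_FIELD = re.compile(rb'\{\}')

def byteformat_sketch(fmt, args):
    """Copy fmt to the output, replacing each b'{}' with the next argument."""
    literals = _FIELD.split(fmt)
    if len(literals) - 1 != len(args):
        raise ValueError('expected %d arguments, got %d'
                         % (len(literals) - 1, len(args)))
    out = bytearray(literals[0])
    for arg, literal in zip(args, literals[1:]):
        out += arg       # interpolated bytes are copied verbatim
        out += literal   # so are the bytes outside the fields
    return bytes(out)

result = list(byteformat_sketch(bytes([1, 2, 10, 123, 125, 200]),
                                (bytes([50, 100, 150]),)))
```

Nothing here decodes or encodes anything; the only special values are the two delimiter bytes.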

> (which must make life interesting if you try to use an

ASCII incompatible coding cookie for your source code - I'm actually
not sure what the full implications of that *are* for bytes literals
in Python 3).


An interesting and important question. The Python 2 manual says that the 
coding cookie applies only to comments and strings. To me, this 
suggests that any encoding can be used. I am not sure how and when the 
encoding is applied. It suggests that the sequence of bytes resulting 
from a string literal is not determined by the sequence of characters 
comprising the string literal, but also depends on the coding cookie.


The Python 3 manual says that the coding cookie applies to the whole 
source file. To me, this says that the subset of unicode chars included 
in the encoding *must* include the ascii characters. It also suggests to 
me that the encoding must also be ascii-compatible, in order to read the 
encoding in the ascii-text coding cookie (unless there is a fallback to 
the system encoding).


In any case, a 3.x source file is decoded to unicode. When the sequence 
of unicode chars comprising a bytes literal is interpreted, the 
re

Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Greg Ewing

Guido van Rossum wrote:

def spam(a):
    r = asciistr('(')
    if a: r += a.strip()
    r += asciistr(')')
    return r

 The general fix would be to add

    else: r += a[:0]


The awkwardness might be reducible if asciistr let
you write something like

   r = asciistr('(', a)

meaning "give me either a string or bytes containing
the value '(', depending on the type of a".
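
A rough sketch of that idea without any special type. The helper ascii_like is hypothetical and is not part of asciicompat; it simply picks str or bytes based on an example argument.

```python
def ascii_like(text, example):
    """Return ASCII text as str or bytes, matching the type of example."""
    if isinstance(example, str):
        return text
    return text.encode('ascii')

def spam(a):
    # Constants take their type from the argument, so an empty argument
    # still yields a definite str or bytes result.
    r = ascii_like('(', a)
    if a:
        r += a.strip()
    r += ascii_like(')', a)
    return r
```

With this spelling spam('') gives '()' and spam(b'') gives b'()', so the "nothing of interest in the arguments" case no longer leaks a third type.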

But taking a step back, how bad would it really be
if an asciistr were returned in this case? Is it
just that asciistr doesn't behave exactly like a str
in all situations, so it might break something?

If so, would it help if asciistr were a built-in
type, so that other things could be made aware of
it?

--
Greg


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Glenn Linderman

On 1/14/2014 4:46 AM, Nick Coghlan wrote:

The one remaining way I could potentially see a formatb method working
is along the lines of what Glenn (I think) suggested: just like struct
definitions, the formatb specifier would have to consist *solely* of
substitution fields. However, that's getting awfully close to being
just an alternate spelling for the struct module or bytes.join at that
point, which hardly makes for a compelling case to add two new methods
to a builtin type.


Yes, after someone drew the parallel between my "format specifier only" 
pedantry, and struct.pack (which I hadn't used), I agree that they are 
almost just different spellings for the same things.


The two differences I could see is that struct.pack doesn't support 
variable length items, and struct.pack doesn't support "interpolation", 
which is the whole beauty of the % type syntax... the ability to have a 
template, and interpolate values.


My pedantry DID allow for template work, but they had to be specified in 
HEX the way I specified it yesterday.


Let me repeat that syntax:

b"%{hex-codes}v"

That was mostly so the format string could be ASCII, yet represent any 
byte. That is somewhat clunky, when actually wanting to represent 
characters.  At the next level of abstraction, one could define a 
"format builder" that would take Unicode specifications, and "compile" 
them into the binary interpolation strings, but if doing that, you could 
just as well compile them into functions using struct.pack formats, with 
the parameters interspersed with the "template" data, except for 
struct.pack's inability to deal with variable length data.


So struct is attempting to emulate C structs, and variable length data 
is extremely awkward in C structs also, so I guess it provides a good 
emulation :)
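
To illustrate the variable-length limitation, here is one common workaround as a sketch: build the format per call and store an explicit length prefix. The record layout and function names are invented for the example.

```python
import struct

def pack_record(record_id, payload):
    """Pack a 16-bit id, a 16-bit length, then the payload itself."""
    # The 's' count is part of the format, so it has to be built per call.
    fmt = '>HH%ds' % len(payload)
    return struct.pack(fmt, record_id, len(payload), payload)

def unpack_record(data):
    """Read the fixed header first, then use its length to read the payload."""
    record_id, length = struct.unpack_from('>HH', data)
    (payload,) = struct.unpack_from('%ds' % length, data, 4)
    return record_id, payload

packed = pack_record(1, b'abc')
```

The format string cannot express "as many bytes as the argument has", which is the gap a bytes interpolation syntax would fill.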


So if I were to look for features to add to Python3 to support template 
interpolation for users of non-ASCII encodings, which could, of course, 
also be used by users of ASCII-based encodings, I guess I would recommend:


1) extend struct to handle variable length data items
2) provide a sample format compiler function that would translate a 
Unicode format description into a function that would use struct.pack, 
and pre-encode (according to the format specification) the template 
parts into parameters for struct.pack.


Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Nick Coghlan
On 15 Jan 2014 08:00, "Greg Ewing"  wrote:
>
> Guido van Rossum wrote:
>>
>> def spam(a):
>>     r = asciistr('(')
>>     if a: r += a.strip()
>>     r += asciistr(')')
>>     return r
>>
>>  The general fix would be to add
>>
>>     else: r += a[:0]
>
>
> The awkwardness might be reducible if asciistr let
> you write something like
>
>    r = asciistr('(', a)
>
> meaning "give me either a string or bytes containing
> the value '(', depending on the type of a".
>
> But taking a step back, how bad would it really be
> if an asciistr were returned in this case? Is it
> just that asciistr doesn't behave exactly like a str
> in all situations, so it might break something?
>
> If so, would it help if asciistr were a built-in
> type, so that other things could be made aware of
> it?

That way lies the Python 2 text model, and we're not going there. It's
probably best to think of asciistr as a way of demonstrating a rhetorical
point about the superiority of the Python 3 text model rather than
something that anyone should actually use in production Python 3 code
(although, depending on how rough the edges turn out to be, it *might*
eventually find a place in some single source 2/3 code bases, as well as in
prototype code and personal scripts).

Cheers,
Nick.

>
> --
> Greg
>


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Georg Brandl
Am 14.01.2014 21:22, schrieb Larry Hastings:
> 
> On 01/11/2014 07:35 PM, Larry Hastings wrote:
>>
>> On 01/08/2014 07:08 AM, Barry Warsaw wrote:
>>> How hard would it be to put together some sample branches that provide
>>> concrete examples of the various options?
>>>
>>> My own opinion could easily be influenced by having some hands-on time with
>>> actual code, and I suspect even Guido could be influenced if he could pull
>>> some things up in his editor and take a look around.
>>
>> I've uploaded a prototype here:
>>
>> https://bitbucket.org/larry/python-clinic-buffer
>>
> 
> 
> I have now received exactly zero feedback about the prototype, which suggests
> people aren't using it.  In an attempt to jump-start this conversation, I've
> created a new repository containing the "concrete examples of the various
> options" that Barry proposed above.  You may find it here:
> 
> https://bitbucket.org/larry/clinic-buffer-samples/src
> 
> In it I converted Modules/_pickle.c four different ways.  There's a README,
> please read it.
> 
> People who want to change how Clinic writes its output: this is your big
> chance.  Comment on these samples, or produce your own counterexamples, or
> something.  If you can get enough people on your side maybe Clinic will change.
> If there is no further debate on this topic, nothing will happen and Clinic will
> not change.

Having converted several modules to AC, I think I'm

-1 original
+0 sidefile
+1 multiple buffers
+0 buffer
-0 modified buffer

Georg



Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Greg Ewing

Guido van Rossum wrote:

Actually, Nick explained that asciistr() + asciistr() returns str,


That part seems wrong to me, because it means that
you can't write polymorphic byte/string functions
that are composable.

I would be -1 on that, and prefer that
asciistr + asciistr --> asciistr.

--
Greg




Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Nick Coghlan
On 15 Jan 2014 07:36, "Ethan Furman"  wrote:
>
> On 01/14/2014 12:57 PM, Antoine Pitrou wrote:
>>
>> On Tue, 14 Jan 2014 11:56:25 -0800
>> Ethan Furman  wrote:
>>>
>>>
>>> %s, because it is the most general, has the most convoluted resolution:
>>>
>>> - input type is bytes?
>>>   pass it straight through
>>
>>
>> It should try to get a Py_buffer instead.
>
>
> Meaning any bytes or bytes-subtype will support the Py_buffer protocol,
> and this should be the first thing we try?
>
> Sounds good.
>
> For that matter, should the first test be "does this object support
> Py_buffer" and not worry about it being isinstance(obj, bytes)?

Yep. I actually suggest adjusting the %s handling to:

- interpolate Py_buffer exporters directly
- interpolate __bytes__ if defined
- reject anything with an "encode" method
- otherwise interpolate str(obj).encode("ascii")
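
The four rules above can be sketched in pure Python as follows. This is an illustration of the proposed order only, not the eventual implementation; interpolate_s is a made-up name and the buffer check uses memoryview as a stand-in for the C-level Py_buffer request.

```python
def interpolate_s(obj):
    """Sketch of the proposed %s resolution order; returns bytes or raises."""
    # 1. Buffer exporters (bytes, bytearray, memoryview, ...) go in directly.
    try:
        view = memoryview(obj)
    except TypeError:
        pass
    else:
        with view:
            return view.tobytes()
    # 2. Objects that define __bytes__ are asked for their bytes form.
    if hasattr(type(obj), '__bytes__'):
        return bytes(obj)
    # 3. Anything with an encode method (notably str) is rejected.
    if hasattr(obj, 'encode'):
        raise TypeError('%s must be encoded explicitly'
                        % type(obj).__name__)
    # 4. Everything else: strict ASCII encoding of str(obj).
    return str(obj).encode('ascii')
```

Under these rules interpolate_s(42) gives b'42' through the last step, while interpolate_s('42') raises TypeError, which is the loud failure for a forgotten .encode().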

>>> - input type is numeric?
>>>   use its __xxx__ [1] [2] method and ascii-encode it (strictly)
>>
>>
>> What is the definition of "numeric"?
>
>
> That is a key question.

As suggested above, I would flip the question and explicitly *disallow*
implicit encoding of any object with its own "encode" method, while
allowing everything else.

Cheers,
Nick.

>
> Obviously we have int, float, and complex.  We also have Decimal.
>
> But what about Fraction?  Or some user's numeric class that doesn't
> inherit from a core numeric type?  Wherever we draw the line, we need to
> make sure it's well-documented.
>
> --
> ~Ethan~
>


Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Nick Coghlan
On 15 Jan 2014 08:14, "Greg Ewing"  wrote:
>
> Guido van Rossum wrote:
>>
>> Actually, Nick explained that asciistr() + asciistr() returns str,
>
>
> That part seems wrong to me, because it means that
> you can't write polymorphic byte/string functions
> that are composable.
>
> I would be -1 on that, and prefer that
> asciistr + asciistr --> asciistr.

You have to pretty much reimplement str to do that. I wouldn't say no to a
patch that implemented it, but we're unlikely to do that much work
ourselves for something which is primarily intended as a proof of concept.

Cheers,
Nick.

>
> --
> Greg
>
>
>


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Ethan Furman

On 01/14/2014 02:17 PM, Nick Coghlan wrote:


On 15 Jan 2014 07:36, "Ethan Furman" wrote:


On 01/14/2014 12:57 PM, Antoine Pitrou wrote:


On Tue, 14 Jan 2014 11:56:25 -0800
Ethan Furman wrote:



%s, because it is the most general, has the most convoluted resolution:

- input type is bytes?
  pass it straight through



It should try to get a Py_buffer instead.



Meaning any bytes or bytes-subtype will support the Py_buffer protocol, and 
this should be the first thing we try?

Sounds good.

For that matter, should the first test be "does this object support Py_buffer" 
and not worry about it being isinstance(obj, bytes)?


Yep. I actually suggest adjusting the %s handling to:

- interpolate Py_buffer exporters directly
- interpolate __bytes__ if defined
- reject anything with an "encode" method
- otherwise interpolate str(obj).encode("ascii")


- input type is numeric?
  use its __xxx__ [1] [2] method and ascii-encode it (strictly)



What is the definition of "numeric"?



That is a key question.


As suggested above, I would flip the question and explicitly *disallow* 
implicit encoding of any object with its own
"encode" method, while allowing everything else.


Um, int and floats (for example) don't have an .encode method, don't export Py_buffer, don't have a __bytes__ method... 
Ah! so it would hit the last case, I see.


The danger I see with that route is that any ol' object could then make it into the byte stream, and considering what 
byte streams are for I think we should make the barrier for entry higher than just relying on a __str__ or __repr__.
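
A minimal sketch of the four-step %s rule suggested above, assuming a
hypothetical helper named interpolate_s (it is not part of either PEP or of
CPython). It shows why ints and floats reach the last case, and why a
container or any other object would reach it too, which is the concern
raised here.

```python
def interpolate_s(obj):
    """Hypothetical helper: resolve one %s argument to bytes using the
    four-step order suggested in the thread (illustration only)."""
    # 1. Buffer exporters (bytes, bytearray, memoryview, ...) go in directly.
    try:
        return bytes(memoryview(obj))
    except TypeError:
        pass
    # 2. Objects that define __bytes__ are asked for their bytes form.
    if hasattr(type(obj), "__bytes__"):
        return bytes(obj)
    # 3. Anything with its own encode method (e.g. str) is rejected.
    if hasattr(obj, "encode"):
        raise TypeError("refusing to implicitly encode %s" % type(obj).__name__)
    # 4. Everything else falls back to its ASCII-encoded str() form.
    return str(obj).encode("ascii")


class Wire:
    """Made-up class that knows its own bytes form."""
    def __bytes__(self):
        return b"wire"


print(interpolate_s(b"abc"))   # buffer exporter
print(interpolate_s(Wire()))   # __bytes__
print(interpolate_s(42))       # last case
print(interpolate_s([1, 2]))   # last case too: a list slips into the stream
```

The last call is the worrying one: nothing in the rule distinguishes a
number from a list or from an object with only a default __repr__.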


--
~Ethan~


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Larry Hastings

On 01/14/2014 12:51 PM, Ethan Furman wrote:
I checked the README, the current file, and the buffered files.  My 
preferences from highest to lowest:


  - modified buffer approach
  - buffer approach
  - side file



Could you put that in the form of numbers from +1 to -1?  I'm literally 
making a spreadsheet to tally people's votes.



Thanks for taking the time, Larry!


Thanks for participating in this sham democracy!  ;-)


//arry/


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Larry Hastings

On 01/14/2014 01:38 PM, Raymond Hettinger wrote:


On Jan 14, 2014, at 9:12 PM, Antoine Pitrou wrote:



I'm +1 on the sidefile approach. +0 on the various buffer approaches.
-0.5 on the current "sprinkled everywhere" approach.


I concur with Antoine except that I'm a full -1 on commingling
generated code with hand edited code.   Sprinkled everywhere
interferes with my ability to grok the code.  It interferes with
code navigation.  And it creates a greater risk of accidentally
editing the generated code.

FWIW, I think everyone should place a lot of weight on
Serhiy's comments and suggestions.  His reasoning is
clear and compelling.  And the thoughts are all soundly
based on extensive experience with the clinic's effect on
the C source code.


For the record I don't much care which of these Clinic does.  My hope is 
just that the Python core dev community accepts Argument Clinic.  If it 
forms a consensus around changing Clinic's output I'd be happy to oblige.


But there's one important caveat to the above.  As I recall, Guido has 
stated that he hates storing generated code in separate files. He has 
yet to rescind or weaken that pronouncement.  Until such time as he 
does, the "side file" approach is off the table.  I implemented it in 
the prototype purely for the purpose of fostering debate, so the "side 
file" proponents can try to convince him that it's necessary or that 
it's not so bad.  But it's not going in without Guido's approval.  As 
you yourself say--"Python is Guido's language, he just lets us use it."


I'm not the person you have to convince,


//arry/


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Nick Coghlan
On 15 Jan 2014 08:23, "Ethan Furman"  wrote:
>
> On 01/14/2014 02:17 PM, Nick Coghlan wrote:
>>
>>
>> On 15 Jan 2014 07:36, "Ethan Furman" wrote:
>>>
>>> On 01/14/2014 12:57 PM, Antoine Pitrou wrote:
>>>>
>>>> On Tue, 14 Jan 2014 11:56:25 -0800
>>>> Ethan Furman wrote:
>>>>>
>>>>> %s, because it is the most general, has the most convoluted
>>>>> resolution:
>>>>>
>>>>> - input type is bytes?
>>>>>   pass it straight through
>>>>
>>>> It should try to get a Py_buffer instead.
>>>
>>> Meaning any bytes or bytes-subtype will support the Py_buffer protocol,
>>> and this should be the first thing we try?
>>>
>>> Sounds good.
>>>
>>> For that matter, should the first test be "does this object support
>>> Py_buffer" and not worry about it being isinstance(obj, bytes)?
>>
>> Yep. I actually suggest adjusting the %s handling to:
>>
>> - interpolate Py_buffer exporters directly
>> - interpolate __bytes__ if defined
>> - reject anything with an "encode" method
>> - otherwise interpolate str(obj).encode("ascii")
>>
>>>>> - input type is numeric?
>>>>>   use its __xxx__ [1] [2] method and ascii-encode it (strictly)
>>>>
>>>> What is the definition of "numeric"?
>>>
>>> That is a key question.
>>
>> As suggested above, I would flip the question and explicitly *disallow*
>> implicit encoding of any object with its own
>> "encode" method, while allowing everything else.
>
> Um, int and floats (for example) don't have an .encode method, don't
> export Py_buffer, don't have a __bytes__ method... Ah! so it would hit the
> last case, I see.
>
> The danger I see with that route is that any ol' object could then make
> it into the byte stream, and considering what byte streams are for I think
> we should make the barrier for entry higher than just relying on a __str__
> or __repr__.

Yeah, reading the other thread pointed out the issues with this idea
(containers in particular are a problem).

I think Brett has the right idea: we shouldn't try to accept numbers for %s
in binary interpolation. If we limit it to just buffer exporters and
objects with a __bytes__ method then the problem goes away.

The numeric codes all exist in Python 2, so the porting requirement to the
common 2/3 subset will be to update the cases of binary interpolation of a
number with %s to use an appropriate numeric formatting code instead.
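
As a rough illustration of the porting step described above: on interpreters
that implement %-formatting for bytes (CPython 3.5 and later, as far as I
know), %s only takes bytes-like objects or objects with __bytes__, so a
number has to go through a numeric code. The helper name format_length and
the header text are invented for this sketch.

```python
def format_length(n):
    # A numeric code instead of %s. The same expression is also valid
    # Python 2, where b"..." is simply a str.
    return b"Content-Length: %d\r\n" % n


print(format_length(42))

# %s still works for bytes-like arguments ...
print(b"payload=%s" % b"abc")

# ... but a number passed to %s is rejected rather than silently encoded.
try:
    b"Content-Length: %s\r\n" % 42
except TypeError as exc:
    print("rejected:", exc)
```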

Cheers,
Nick.

>
>
> --
> ~Ethan~


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Guido van Rossum
I think of PEP 460 as the strict version and PEP 461 as the lenient
version. I don't think it makes sense to have more variants. So please
collaborate with whichever you like best. :-)

On Tue, Jan 14, 2014 at 1:11 PM, Ethan Furman  wrote:
> On 01/14/2014 01:05 PM, Brett Cannon wrote:
>
>> On Tue, Jan 14, 2014 at 2:55 PM, Ethan Furman wrote:
>>
>>> This PEP goes a bit further than PEP 460 does, and hopefully spells
>>> things out in enough detail so there is no confusion as to what is
>>>  meant.
>>
>>
>> Are we going down the PEP route with the various ideas? Guido, do
>>  you want one from me as well or should I not bother?
>
>
> While I can't answer for Guido, I will say I authored this PEP because
> Antoine didn't want 460 to be any more liberal than it already was.
>
> If you collect your ideas together, I'll add them to 461 as questions or
> discussions or however is appropriate (assuming you're willing to go that
> route).
>
> --
> ~Ethan~



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Serhiy Storchaka

14.01.14 23:38, Raymond Hettinger wrote:

On Jan 14, 2014, at 9:12 PM, Antoine Pitrou wrote:


I'm +1 on the sidefile approach. +0 on the various buffer approaches.
-0.5 on the current "sprinkled everywhere" approach.


I concur with Antoine except that I'm a full -1 on commingling
generated code with hand edited code.   Sprinkled everywhere
interferes with my ability to grok the code.  It interferes with
code navigation.  And it creates a greater risk of accidentally
editing the generated code.


As expected, I agree with Raymond: +1 on the sidefile approach, -1 on the 
current "sprinkled everywhere" approach, and about 0 on the various 
buffer approaches.


One more nitpick. I prefer a sidefile with some unique suffix 
(e.g. .clinic) at the end of the file name rather than in the middle. 
_pickle.c.clinic is better than _pickle.clinic.c (even the .c in the middle 
is not needed; it can be _pickle.clinic).


My reasons:

1. I very often use global search in the sources. It's my way of 
navigating and my way of investigating. I don't want to get false 
results from generated files. And it is much easier to specify the mask 
'*.[ch]' or '*.c,*.h' (depending on the tool) than to specify a mask and a 
negative mask. The latter is not even always possible: I can write a 
cumbersome expression for the find command, but Midnight Commander doesn't 
support negative masks at all (and perhaps your favorite IDE doesn't 
support them either).


2. I don't use any IDE, but if you do, this can be important for you. If 
the IDE shows a source tree, you are unlikely to want to see generated 
*.clinic.c files in it. They would almost double the list of sources.


3. Pathname expansion works better with unique endings. You can open all 
Modules/_io/*.c files, but you are unlikely to be interested in the 
*.clinic.c files that are matched by that pattern.
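
The masking argument in the reasons above can be checked with a short
sketch using the standard fnmatch module. The file list is made up for
illustration and reuses only the two candidate names from this message.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing with both candidate sidefile names.
names = ["_pickle.c", "_pickle.h", "_pickle.clinic.c", "_pickle.c.clinic"]


def select(pattern):
    """Return the names matched by a shell-style mask."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(select("*.c"))     # picks up _pickle.clinic.c as well
print(select("*.[ch]"))  # same problem with the combined mask
```

With the suffix at the end (_pickle.c.clinic) neither mask matches the
generated file, so no negative mask is needed.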





Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Mark Lawrence

On 14/01/2014 19:55, Ethan Furman wrote:

This PEP goes a bit further than PEP 460 does, and hopefully spells
things out in enough detail so there is no confusion as to what is meant.

--
~Ethan~


Out of plain old curiosity do we have to consider PEP 292 string 
templates in any way, shape or form, or regarding this debate have they 
been safely booted into touch?


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Guido van Rossum
On Tue, Jan 14, 2014 at 1:37 PM, Nick Coghlan  wrote:
> Yep - that's why I consider asciistr to be firmly in the "power tool"
> category. If you know what you're doing, it should let you write hybrid API
> code that is just as concise as Python 2, but it's also far more error prone
> than the core Python 3 text model.

Hm. It sounds like the kind of power tool that only candidates for the
Darwin award would use.

The more I hear you defend it, the less I think it's a good idea for
*anything*. And limiting it to PyPy doesn't make it less dangerous.

-- 
--Guido van Rossum (python.org/~guido)


[Python-Dev] The asciistr problem

2014-01-14 Thread Greg Ewing

Guido van Rossum wrote:

I understand that '&' here stands for "any arbitrary combination", but
what about searches? Given that asciistr's base class is str, won't it
still blow up if you try to use it as an argument to e.g.
bytes.startswith()? Equality tests also sound problematic; is b'x' ==
asciistr('x') == 'x' ???


I'm wondering whether asciistr shouldn't be a *type*
at all, but just a function that constructs a string
with the same type as another string.

All of these problems then go away. Instead of

   foo.startswith(asciistr("prefix"))

you would write

   foo.startswith(asciistr("prefix", foo))

There's also no chance of an asciistr escaping into
the wild, because there's no such thing.
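
A minimal sketch of that idea, assuming the two-argument signature used
above and assuming that only exact str and bytes need to be handled. This
is not the asciistr package discussed elsewhere in the thread, just a
function with the same name.

```python
def asciistr(literal, like):
    """Return the ASCII text `literal` as the same string type as `like`.

    Illustration only: handles str and bytes, nothing else.
    """
    encoded = literal.encode("ascii")  # raises UnicodeEncodeError if not ASCII
    if isinstance(like, str):
        return literal
    if isinstance(like, bytes):
        return encoded
    raise TypeError("expected str or bytes, got %s" % type(like).__name__)


for foo in ("prefix and text", b"prefix and bytes"):
    # The result is a plain str or plain bytes, so startswith() just works.
    print(type(asciistr("prefix", foo)).__name__,
          foo.startswith(asciistr("prefix", foo)))
```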

We probably want a more compact way of writing it,
though. Ideally it would support currying. If we
have a number of string literals in our function,
we'd like to be able to write something like this
at the top:

   def myfunc(a):
       s = stringtype(a)
       ...

and then use s('foo') to construct all our string
literals inside the function.

We could go further. If the function has more than
one string argument, they're probably constrained
to be of the same type, so in the interests of
symmetry it would be nice if we could write

   def myfunc(a, b):
       s = stringtype(a, b)
       ...

and have it raise a TypeError if a and b are not
of the same string type.
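
One possible shape for stringtype, sketched under the assumption that it
only needs to handle exact str and bytes arguments. The body of myfunc is
invented to have something to run.

```python
def stringtype(*args):
    """Return a literal factory matching the common string type of args.

    Illustration only: raises TypeError unless every argument is a str,
    or every argument is a bytes object.
    """
    kinds = {type(arg) for arg in args}
    if len(kinds) != 1 or not kinds <= {str, bytes}:
        raise TypeError("arguments must all be str or all be bytes")
    kind = kinds.pop()

    def make(literal):
        encoded = literal.encode("ascii")  # reject non-ASCII in both modes
        return literal if kind is str else encoded

    return make


def myfunc(a, b):
    s = stringtype(a, b)
    return a + s(", ") + b


print(myfunc("spam", "eggs"))
print(myfunc(b"spam", b"eggs"))
```

Because s() returns an ordinary str or bytes object, nothing of a special
type can leak out of myfunc.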

--
Greg


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Serhiy Storchaka

15.01.14 00:40, Guido van Rossum wrote:

I think of PEP 460 as the strict version and PEP 461 as the lenient
version. I don't think it makes sense to have more variants. So please
collaborate with whichever you like best. :-)


Perhaps the consensus will be PEP 460.5? Or PEP 460.3, or maybe PEP 460.7?




Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Greg Ewing

Ethan Furman wrote:

On 01/14/2014 10:11 AM, Jim J. Jewett wrote:



But in terms of explaining the text model, that
separation is important enough that




 (2)  It *may* be worth creating a virtual
  split in the documentation.



I think (2) is a great idea.


I don't think it's such a great idea to belabour this
point.

The notion of an ASCIIStructuredBytes type seems to
assume that you have *either* ascii-encoded text *or*
some other kind of data. But many of the use cases
for all of this involve constructing a single object,
parts of which are one and parts of which are another.
It's hard to think of that in terms of virtual
classes unless you're willing to imagine that different
parts of the same object are of different types,
which, for a primitive object like bytes, doesn't
make sense in the context of the Python object
model.

By all means point out that the ascii features of
bytes are intended for use on data that happens to
be ascii, and shouldn't be used otherwise. But I
think that talking about "virtual classes" just
risks confusing people, particularly when we
have ABCs, which are also a kind of virtual class
represented by real class objects.

--
Greg


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Larry Hastings



On 01/14/2014 01:38 PM, Raymond Hettinger wrote:

FWIW, I think everyone should place a lot of weight on
Serhiy's comments and suggestions.  His reasoning is
clear and compelling.  And the thoughts are all soundly
based on extensive experience with the clinic's effect on
the C source code.


One more bit of anecdotal evidence.  I suggest that I have easily the 
most extensive experience with working with Clinic.  Serhiy filed his 
first patch converting to Clinic nine days ago; I've been working on 
Clinic for about eighteen months, on and off.  And I got used to living 
with the "sprinkling" approach a long long time ago. I no longer ever 
mistake generated code for handwritten code, and I don't ever modify the 
generated text.  It's basically fine.


This is not to dismiss Serhiy's observations.  Nor to say that my 
experiences will be universal.  Nor indeed to suggest that learning to 
live with Clinic's output as it exists today is a desirable skill. I merely 
suggest that if we didn't modify the output of Clinic it might be 
survivable.


Cheers,


//arry/

p.s. " You get used to it. I...I don't even see the code. All I see is 
blonde, brunette, red-head. Hey, you uh... want a drink?"


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Greg Ewing

Guido van Rossum wrote:

Quite a few people have spoken out in favor of loud
failures rather than silent "wrong" output. But I think that in the
specific context of formatting output, there is a long and IMO good
tradition of producing (slightly) wrong output in favor of more strict
behavior. Consider for example what to do when a number doesn't fit in
the given width. Would you rather raise an exception, truncate the
value, or mess up the formatting?


That depends on the context. If the output is simply a text
file whose lines can grow to accommodate the extra width,
messing up the formatting is probably okay.

If it's going into a printed report with a strictly limited
width for each column, and anything that doesn't fit is
going to get graphically clipped away, with no visual
indication that this has happened, it's NOT okay.

If it's going into a text file with defined columns for
each field, which will be read by something that assumes
certain things are in certain columns, it's NOT okay.

If it's going into a binary file as a field consisting
of a length byte followed by some chars, messing up the
formatting is DEFINITELY NOT okay.

This latter kind of situation is the one we're talking
about. If you do something like

   b"%c%s" % (len(data), data)

and data is a str, then the length byte will be correct,
but the data will be (at least) 3 bytes too long. Whatever
reads the file then gets out of step at that point, and
all hell breaks loose.

You do *not* get a nice, easy-to-debug symptom from this
kind of thing. You get "Something is wrong somewhere in
this 50 megabyte jpg file, good luck on finding out what
and why".
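
To make the failure mode concrete, here is a small sketch of a
length-prefixed record format. pack_field and read_fields are invented
names, the code assumes an interpreter with bytes % formatting (3.5 or
later), and the "lenient" output is simulated by writing the repr of the
str, which is only one guess at what a permissive %s might emit.

```python
def pack_field(data):
    # One length byte followed by the payload, as in the expression above.
    return b"%c%s" % (len(data), data)


def read_fields(blob):
    """Split a blob of length-prefixed fields back into payloads."""
    fields, i = [], 0
    while i < len(blob):
        n = blob[i]
        fields.append(blob[i + 1:i + 1 + n])
        i += 1 + n
    return fields


good = pack_field(b"abc") + pack_field(b"de")
print(read_fields(good))

# Simulated lenient formatting: the length byte says 3, but the payload
# written is the 5-byte repr of the str, so the reader loses step.
text = "abc"
bad = bytes([len(text)]) + repr(text).encode("ascii") + pack_field(b"de")
print(read_fields(bad))
```

The second read raises no error at all. It just returns the wrong fields,
which is the hard-to-debug outcome described above.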

--
Greg


Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Greg Ewing

Nick Coghlan wrote:


On 15 Jan 2014 08:00, "Greg Ewing" wrote:

>

 > If so, would it help if asciistr were a built-in
 > type, so that other things could be made aware of
 > it?

That way lies the Python 2 text model, and we're not going there. It's 
probably best to think of asciistr as a way of demonstrating a 
rhetorical point about the superiority of the Python 3 text model


Hmmm... something like "The Python 3 text model is so
superior that we have to use this weird hack to write
something that makes perfectly good semantic sense
but is very awkward to write otherwise" ?-)

Anyhow, I've now convinced myself that asciistr as
a type is completely unnecessary -- see earlier post.

--
Greg


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Yury Selivanov
Even though I’m not a core dev, I happen to work with cpython source
code quite a lot, whether it’s me working on a C extension, or just
digging it for some obscure details of how python works.

And what I want to say is that cpython sources are great. They are
easy to understand even for people who don’t know C. What’s even
more important, they are easy to navigate. Having clinic-produced
code here and there will surely complicate this. Of course,
if you work with cpython code 24/7, you will adapt, and won’t even
notice it, but for occasional users like me it will require more
focus.

For my use pattern, having clinic to produce a separate file (with
a distinct extension like “.c.clinic”) would be a huge win. Besides
just clean source files, it will also make it easier to:

- review patches;

- work with repository: logs, blames, diffs, etc;

- adjust workflow - in sublime text / eclipse / almost any IDE
it would be just a file mask to hide the clinic output completely
(and you don’t need to see it anyways).

And besides just cpython, as I understand, the clinic should be
used not just by cpython core devs for cpython sources, but also
by numerous authors of C extensions.

So my vote is:

+1 for side files

0 for the current state of things

-1 for buffers (as it makes no sense to me why you would want
to have generated code at almost random places)

Thanks,
Yury




Re: [Python-Dev] magic method __bytes__

2014-01-14 Thread Steven D'Aprano
On Tue, Jan 14, 2014 at 10:58:49AM -0500, R. David Murray wrote:
> On Mon, 13 Jan 2014 17:38:38 -0800, Ethan Furman  wrote:
> > Has anyone actually used __bytes__ yet?  What for?
> 
> bytes(email.message.Message()) returns the message object serialized to
> "wire format".
> 
> --David
> 
> PS: I've always thought of "wire format" as *including* files...a file is
> just a "wire" with an indefinite destination and transmission time

Nice analogy! I must steal it :-)
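
For readers who have not seen __bytes__ in use, the email example quoted
above can be tried with a few lines. The header and payload values are
made up, and the exact bytes printed may vary with the message policy.

```python
from email.message import Message

msg = Message()
msg["Subject"] = "hello"
msg.set_payload("body text")

# bytes() calls Message.__bytes__, which serializes the whole message.
wire = bytes(msg)
print(wire)
```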


-- 
Steven


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Glenn Linderman

On 1/14/2014 10:11 AM, Jim J. Jewett wrote:

Virtual subclass ASCIIStructuredBytes


You would first have to define what you meant by a virtual subclass, and 
that somewhere would have to be linked every place you use the term, 
because it is a new term.


Why not just call the sections of the documentation where 
ASCII-supporting features of bytes are discussed "Special ASCII 
support". Calling it that will make it clear that if you are not using 
ASCII, you need to be careful of using the feature... or contrariwise, 
that if you are using the feature, you need to be using ASCII.


While some ASCII supersets may also be usable with the features, I don't 
think that should be emphasized in any way, unless there is specific 
support for particular ASCII supersets. Using ASCII supersets should be 
"buyer beware".


The whole b"%s" interpolation feature would, appropriately, be described 
in such a section.


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Glenn Linderman

On 1/14/2014 2:38 PM, Nick Coghlan wrote:


I think Brett has the right idea: we shouldn't try to accept numbers 
for %s in binary interpolation. If we limit it to just buffer 
exporters and objects with a __bytes__ method then the problem goes away.


The numeric codes all exist in Python 2, so the porting requirement to 
the common 2/3 subset will be to update the cases of binary 
interpolation of a number with %s to use an appropriate numeric 
formatting code instead.



+1


Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Steven D'Aprano
On Tue, Jan 14, 2014 at 10:16:17AM -0800, Guido van Rossum wrote:

> Hm. It is beginning to sound more and more flawed. I also worry that
> it will bring back the nightmare of data-dependent UnicodeError.
> E.g. this (from tests/basic.py):
> 
> def test_asciistr_will_not_accept_codepoints_above_127(self):
>     self.assertRaises(ValueError, asciistr, 'Schrödinger')
> 
> looks reasonable enough when you assume asciistr() is always used with
> a literal as argument -- but I suspect that plenty of people would
> misunderstand its purpose and write asciistr(s) as a "clever" way to
> turn a string into something that's compatible with both bytes and
> strings... :-(

I am one of those people. I've been trying to keep on top of this 
enormous multiple-thread discussion, and although I haven't read every 
single post in its entirety, I thought I understood that the purpose of 
asciistr was exactly that, to produce something that was compatible with 
both bytes and strings.
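
The data-dependent failure Guido worries about is easy to reproduce with a
stand-in. ascii_only below is a hypothetical substitute that mimics only
the "reject code points above 127" behaviour from the quoted test; it is
not the real asciistr, and make_header is an invented caller.

```python
def ascii_only(text):
    """Hypothetical stand-in for asciistr(): accept only ASCII text."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        raise ValueError("non-ASCII text: %r" % text) from None
    return text


def make_header(name):
    # The "clever" misuse: the first argument is data, not a literal.
    return ascii_only(name) + ascii_only(": ")


for name in ["Newton", "Schrödinger"]:
    try:
        print(make_header(name))
    except ValueError as exc:
        print("failed:", exc)
```

The function works in every test that happens to use ASCII names and
fails only when the data changes.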


-- 
Steven


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Ryan Smith-Roberts
I favor a dual-mode approach. I think the existing behavior is best for the
conversion of existing modules, because it's easy to interactively verify
the generated code. Once that's done, long-term maintenance definitely
favors a more centralized format.

+1 _pickle.original.c /* used only during conversion of existing modules */
+0 _pickle.using-buffer.c
+1 _pickle.using-modified-buffer.c
-1  _pickle.using-multiple-buffers.c
NaN _pickle.using-sidefile.c /* not enough experience with it */

Pondering it this afternoon, I thought of a configuration that minimizes
both code churn and readability impact: two buffers. One at the top
containing forward declarations and defines (an inline header file if you
like), and the rest of the autogenerated code at the bottom. It's not
obvious that AC currently supports this configuration, or backtracking of
any kind. Nonetheless:

+1 _pickle.using-two-buffers.c


On Tue, Jan 14, 2014 at 12:22 PM, Larry Hastings  wrote:

>
> On 01/11/2014 07:35 PM, Larry Hastings wrote:
>
>
> On 01/08/2014 07:08 AM, Barry Warsaw wrote:
>
> How hard would it be to put together some sample branches that provide
> concrete examples of the various options?
>
> My own opinion could easily be influenced by having some hands-on time with
> actual code, and I suspect even Guido could be influenced if he could pull
> some things up in his editor and take a look around.
>
>
> I've uploaded a prototype here:
>
> https://bitbucket.org/larry/python-clinic-buffer
>
>
>
> I have now received exactly zero feedback about the prototype, which
> suggests people aren't using it.  In an attempt to jump-start this
> conversation, I've created a new repository containing the "concrete
> examples of the various options" that Barry proposed above.  You may find
> it here:
>
> https://bitbucket.org/larry/clinic-buffer-samples/src
>
> In it I converted Modules/_pickle.c four different ways.  There's a
> README, please read it.
>
> People who want to change how Clinic writes its output: this is your big
> chance.  Comment on these samples, or produce your own counterexamples, or
> something.  If you can get enough people on your side maybe Clinic will
> change.  If there is no further debate on this topic, nothing will happen
> and Clinic will not change.
>
>
> */arry*
>
>
>


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Ryan Smith-Roberts
On Tue, Jan 14, 2014 at 5:32 PM, Ryan Smith-Roberts  wrote:

> NaN _pickle.using-sidefile.c /* not enough experience with it */
>

I hate to weasel like that. Intellectually I think I favor the sidefile
over all other approaches for its cleanliness. But I'd have to actively use
it in a workflow a bit to know how practical it is.


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Ethan Furman

On 01/14/2014 02:28 PM, Larry Hastings wrote:

On 01/14/2014 12:51 PM, Ethan Furman wrote:

I checked the README, the current file, and the buffered files.  My
preferences from highest to lowest:

  +1   modified buffer approach
  +0.5 buffer approach
  +0   side file


NaN on the others is fine.  ;)

--
~Ethan~



Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Steven D'Aprano
On Wed, Jan 15, 2014 at 01:03:13PM +1300, Greg Ewing wrote:
> Nick Coghlan wrote:

> >That way lies the Python 2 text model, and we're not going there. It's 
> >probably best to think of asciistr as a way of demonstrating a 
> >rhetorical point about the superiority of the Python 3 text model
> 
> Hmmm... something like "The Python 3 text model is so
> superior that we have to use this weird hack to write
> something that makes perfectly good semantic sense
> but is very awkward to write otherwise" ?-)

I don't think mixing bytes and strings makes good semantic sense. If 
this discussion has taught me anything, it is that mixing the two is 
"Here Be Dragons" territory, fraught with danger.

It may be that there are applications where mixing them is 
*unavoidable*, but I think that it's never *sensible*.


-- 
Steven


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-14 Thread Ethan Furman

On 01/14/2014 05:02 PM, Glenn Linderman wrote:

On 1/14/2014 2:38 PM, Nick Coghlan wrote:


I think Brett has the right idea: we shouldn't try to accept numbers
for %s in binary interpolation. If we limit it to just buffer
exporters and objects with a __bytes__ method then the problem goes away.

The numeric codes all exist in Python 2, so the porting requirement to
the common 2/3 subset will be to update the cases of binary
interpolation of a number with %s to use an appropriate numeric
formatting code instead.


+1


Agreed, PEP updated.

--
~Ethan~



Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Larry Hastings


On 01/14/2014 05:32 PM, Ryan Smith-Roberts wrote:
Pondering it this afternoon, I thought of a configuration that 
minimizes both code churn and readability impact: two buffers. One at 
the top containing forward declarations and defines (an inline header 
file if you like), and the rest of the autogenerated code at the 
bottom. It's not obvious that AC currently supports this 
configuration, or backtracking of any kind.


Clinic is strictly one pass currently.  I could add this feature to the 
prototype if there was sufficient interest; for now, I'd accept a patch 
to the clinic-buffer-samples repo adding a sample of your proposal.  
Please start with "_pickle.original.c", and add simulated (but 
deliberately invalid!) Clinic instructions for an authentic flair.  I 
suggest the name "forward" for the destination, and 
"_pickle.using-forward.buffer.c" for the filename.


I take it "forward" would get the methoddef_define, the 
docstring_prototype, and the parser_prototype, "block" would get the 
impl_prototype, and "buffer" would get the docstring_definition and the 
parser_definition?


I'm happy to collect votes for this approach too.  I'll put you down as a +1


//arry/


[Python-Dev] Binding problem

2014-01-14 Thread Rob Ward
I apologise if I have come to the wrong place here, but 12hrs searching, plus 
experimenting,  on the WWW for advice on it has not yielded any successful 
advice to resolve the issue.

I am having trouble binding an Entry widget to 

Here is the snippet (butCol is a Frame widget to put buttons,labels and text 
entry down LHS)

KS1=StringVar()
KS1.set("Key is ??")
butCol.ks1 = Label(butCol,textvariable=KS1).grid(column=0,row=18,sticky=(N,W))

myKey = [0,2,4,5,7,9,11] #Begin in the key of C
KS2 = StringVar()
KS2.set("C")
butCol.ks2 = Entry(butCol,width=20,textvariable=KS2).grid(column=0,row=19,sticky=(N,W))


The above lines all render correctly, but will not trigger off "entry" of data 
at all on 

Adding the following line just crashes. 

butCol.ks2.bind("",chooseKey) 


I downloaded the Python 3 package from the recommended site last week to begin 
this project (it was previously working OK in Skulptor, but I was not prepared 
to develop it any further in that environment, though it was an excellent 
starting environment).  So I believe the Python language installed is up 
to date.  I have not previously installed Python so it should be "clean".

If you could give any direct advice I would be very grateful or if you can 
direct me to your best "forum" site maybe I could use that.

One overall observation that has frustrated me is how to search for 
information that relates to Python 3 and the latest tkinter modules.  I kept 
finding old Python or old Tkinter or combinations of both.  Working my way 
through this was very time consuming and rarely productive.  Is there any 
advice on how to get the "latest" information off the WWW?



Cheers, Rob Ward

PS In my state of eternal optimism I have attached the whole file :-)

PPS I have done some OOP in the past but not keen to jump in at the moment.
from tkinter import *
from tkinter import ttk

def chooseKey(event):
    #Handler for the key Entry: set myKey from the key name typed in KS2
    global myKey
    global key
    ans = KS2.get().lower()  #lower must be called; a bare .lower never equals a string
    keyS="??,E,A,D,G,C,F,Bb,Eb,Ab"
    #myKey holds the semitone offsets of the note letters
    #        C D E F G A B
    if ans == "e":      #Key of E, 4 Sharps F,C,G,D
        myKey = [1,3,4,6,8,9,11]
        keyS="Key: E "
    elif ans == "a":    #Key of A, 3 Sharps F,C,G
        myKey = [1,2,4,6,8,9,11]
        keyS="Key: A "
    elif ans == "d":    #Key of D, 2 Sharps F,C
        myKey = [1,2,4,6,7,9,11]
        keyS="Key: D "
    elif ans == "g":    #Key of G, 1 Sharp F (F raised to 6)
        myKey = [0,2,4,6,7,9,11]
        keyS="Key: G "
    elif ans == "c":    #Key of C, root scale
        myKey = [0,2,4,5,7,9,11]
        keyS="Key: C"
    elif ans == "f":    #Key of F, 1 Flat B
        myKey = [0,2,4,5,7,9,10]
        keyS="Key: F "
    elif ans == "bb":   #Key of Bb, 2 Flats B,E
        myKey = [0,2,3,5,7,9,10]
        keyS="Key: Bb "
    elif ans == "eb":   #Key of Eb, 3 Flats B,E,A
        myKey = [0,2,3,5,7,8,10]
        keyS="Key: Eb "
    elif ans == "ab":   #Key of Ab, 4 Flats B,E,A,D
        myKey = [0,1,3,5,7,8,10]
        keyS="Key: Ab "
    loadMidi()
    KS1.set(keyS)

#Durations of standard notes
def semiquaver():
    print("semiquaver")
    global dur
    dur = 6

def quaver():
    print("quaver")
    global dur
    dur = 12

def crochet():
    print("crochet")
    global dur
    dur = 24

def semibreve():
    print("semibreve")
    global dur
    dur = 48

def breve():
    print("breve")
    global dur
    dur = 96


def loadMidi():
    #Rebuild myMidi: seven note numbers per octave, shifted by the current key
    global myMidi
    global myKey
    del myMidi[:]
    for notes in range(24,106,12): #C1 to C8 in steps of 12
        myMidi.append(notes+myKey[0]) #C  skip C#,Db
        myMidi.append(notes+myKey[1]) #D  skip Eb,D#
        myMidi.append(notes+myKey[2]) #E  half tone
        myMidi.append(notes+myKey[3]) #F  skip F#,Gb
        myMidi.append(notes+myKey[4]) #G  skip G#,Ab
        myMidi.append(notes+myKey[5]) #A  skip Bb
        myMidi.append(notes+myKey[6]) #B  half tone
    #print('Init loop',myMidi)

#This is the handler for mouse click events. Note that it
#must take one parameter, the event object, whose x and y
#attributes give the position of the mouse click.
def canvas_click(event):
    #print ('Mouse x=', event.x," - ",event.y)
    global accidental
    global aNote
    global dotted
    global myMidi
    #Quantise position; [X,Y] of the current note
    aNote = [int((event.x+5)/10)*10,int((event.y+5)/10)*10]
    Y=int(((785-event.y)/10))
    #print ('Y-Coord ',Y)
    #print ('Midi = ',myMidi[Y])
    dummy = ('"note"'+":["+str(int((aNote[0]-200)/10))+","
             +str(myMidi[Y]+accidental)+","+str(int(dur*dotted))+"],")
    NL1.set(dummy)
    #print (dummy)
    tune.append(dummy) #Main data list of note information
    #Push the Note Graphic onto a list for redraw and also countback purposes
    #for erasing notes and bars
    noteHist.append(([aNote[0],aNote[1]+10],[aNote[0],aNote[1]-10],
                     [aNote[0]+10*dur*dotted,aNote[1]]))
    accidental=0 #reset accidentals
    sharpS="Sharpen"
    natS="Natural True"
    flatS="Flatten"
    dotted = 1.0 #reset the dotted effect
    dotS="Dotted False"
    draw_note
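
The tables in chooseKey can be cross-checked by deriving them from the key signatures. This is a minimal sketch, assuming each myKey entry is the semitone offset above C of the note letters C, D, E, F, G, A, B, as the comments in loadMidi suggest. NATURALS, key_offsets, KEYS and build_midi are names invented for the example and do not appear in the attached file.

```python
# Semitone offsets of the natural notes above C.
NATURALS = (("C", 0), ("D", 2), ("E", 4), ("F", 5), ("G", 7), ("A", 9), ("B", 11))


def key_offsets(sharps="", flats=""):
    """Return offsets for C D E F G A B with the listed letters raised or lowered."""
    offsets = []
    for letter, semitone in NATURALS:
        if letter in sharps:
            semitone += 1
        elif letter in flats:
            semitone -= 1
        offsets.append(semitone)
    return offsets


# One entry per key name, built from its key signature.
KEYS = {
    "e": key_offsets(sharps="FCGD"),
    "a": key_offsets(sharps="FCG"),
    "d": key_offsets(sharps="FC"),
    "g": key_offsets(sharps="F"),
    "c": key_offsets(),
    "f": key_offsets(flats="B"),
    "bb": key_offsets(flats="BE"),
    "eb": key_offsets(flats="BEA"),
    "ab": key_offsets(flats="BEAD"),
}


def build_midi(offsets, start=24, stop=106, step=12):
    """Return the note numbers for every octave base in range(start, stop, step)."""
    return [base + offset for base in range(start, stop, step) for offset in offsets]


print(KEYS["g"])
print(build_midi(KEYS["c"])[:7])
```

A dictionary lookup such as KEYS.get(ans) could then stand in for the chain of comparisons, and a mistyped table entry shows up as a mismatch against its signature.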

Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Cameron Simpson
On 14Jan2014 11:43, Jim Jewett  wrote:
> Greg Ewing replied:
> >> ... ASCII compatible binary data is a
> >> *subset* of arbitrary binary data.
> 
> I wrote: [...]
> >(2)  It *may* be worth creating a virtual
> > split in the documentation. [...]
> 
> Ethan likes the idea, but points out that the term
> "Virtual" is confusing here. [...]
> (A)  What word should I use instead of "Virtual"?
> Imaginary?  Pretend?

I'd title it in terms of a common use case, not a "virtual class".
You even phrase the opening sentence as a use case already.

> (B)  Would it be good/bad/at least make the docs
> easier to create an actual class (or alias)?
> (C)  Same question for a pair of classes provided
> only in the documentation, like example code.

I don't think so. People might use it:-(

[...]
> >  A Bytes object could represent anything, [...]

Tiny nit: shouldn't that be "bytes", not "Bytes"?

> >  appropriate as the underlying storage for a sound sample
> >  or image file.
> >
> >  Virtual subclass ASCIIStructuredBytes
> >  

Possible alternate title:

Common use case: bytes containing text sequences, especially ASCII

Cheers,
-- 
Cameron Simpson 

I think... Therefore I ride.  I ride... Therefore I am.
- Mark Pope 


Re: [Python-Dev] Binding problem

2014-01-14 Thread Terry Reedy

On 1/14/2014 7:53 PM, Rob Ward wrote:

I apologise if I have come to the wrong place here,


Yes, you have ;-).
python-dev is for development *of* future versions of Python. Try python-list 
for development *with* current versions.


--
Terry Jan Reedy



Re: [Python-Dev] Binding problem

2014-01-14 Thread MRAB

On 2014-01-15 00:53, Rob Ward wrote:

> I apologise if I have come to the wrong place here, but 12hrs searching,
> plus experimenting,  on the WWW for advice on it has not yielded any
> successful advice to resolve the issue.
>
> I am having trouble binding an Entry widget to 
>
> Here is the snippet (butCol is a Frame widget to put buttons,labels and
> text entry down LHS)
>
> KS1=StringVar()
> KS1.set("Key is ??")
> butCol.ks1 = Label(butCol,textvariable=KS1).grid(column=0,row=18,sticky=(N,W))
>
> myKey = [0,2,4,5,7,9,11] #Begin in the key of C
> KS2 = StringVar()
> KS2.set("C")
> butCol.ks2 = Entry(butCol,width=20,textvariable=KS2).grid(column=0,row=19,sticky=(N,W))
>
> The above lines all render correctly, but will not trigger off "entry"
> of data at all on 
>
> Adding the following line just crashes.
>
> butCol.ks2.bind("",chooseKey)
>
> I downloaded the Python 3 package from the recommended site last week to
> begin this project (it was previously working OK in Skulptor, but I was
> not prepared to develop it any further in that environment, though it
> was an excellent starting environment).  So I believe the Python
> language installed is up to date.  I have not previously installed Python
> so it should be "clean".
>
> If you could give any direct advice I would be very grateful or if you
> can direct me to your best "forum" site maybe I could use that.
>
> One overall observation that has frustrated me is how to search for
> information that relates to Python 3 and the latest tkinter modules.
> I kept finding old Python or old Tkinter or combinations of both.
> Working my way through this was very time consuming and rarely
> productive.  Is there any advice on how to get the "latest" information
> off the WWW?
>
> Cheers, Rob Ward
>
> PS In my state of eternal optimism I have attached the whole file :-)
>
> PPS I have done some OOP in the past but not keen to jump in at the moment.

I doubt it crashes. It's more likely that it raises an AttributeError 
complaining that None doesn't have a 'bind' attribute.


That's because the .grid method returns None. (So does the .pack method.)

Try this:

butCol.ks2 = Entry(butCol, width=20, textvariable=KS2)
butCol.ks2.grid(column=0, row=19, sticky=(N, W))

The same comment applies in a number of other places.
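
The pitfall can be reproduced without a display by using a stand-in class. This is only a sketch: FakeWidget is hypothetical and mimics just the one relevant behaviour of tkinter widgets (grid() returns None), choose_key is a placeholder handler, and "<Return>" is an assumed event name, since the one in the original post did not survive.

```python
class FakeWidget:
    """Stand-in for a tkinter widget; like Entry.grid(), grid() returns None."""

    def __init__(self):
        self.placed = False
        self.bindings = {}

    def grid(self, **options):
        self.placed = True  # no return statement, so the call yields None

    def bind(self, sequence, handler):
        self.bindings[sequence] = handler


def choose_key(event):
    """Placeholder event handler."""


# Chained form: the name ends up holding grid()'s return value, not the widget.
broken = FakeWidget().grid(column=0, row=19)
print(broken)

# Two-step form: keep the widget, then place it, then bind to it.
entry = FakeWidget()
entry.grid(column=0, row=19)
entry.bind("<Return>", choose_key)
print(sorted(entry.bindings))
```

With real tkinter widgets the same two-step pattern applies to every widget that is referred to again after it has been placed.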



Re: [Python-Dev] Binding problem

2014-01-14 Thread Guido van Rossum
Hey Rob,

The place to get help with Tkinter is [email protected]. I've
CC'ed that list for you.

--Guido

On Tue, Jan 14, 2014 at 4:53 PM, Rob Ward  wrote:
> I apologise if I have come to the wrong place here, but 12hrs searching,
> plus experimenting,  on the WWW for advice on it has not yielded any
> successful advice to resolve the issue.
>
> I am having trouble binding an Entry widget to 
>
> Here is the snippet (butCol is a Frame widget to put buttons,labels and text
> entry down LHS)
>
> KS1=StringVar()
> KS1.set("Key is ??")
> butCol.ks1
> =Label(butCol,textvariable=KS1).grid(column=0,row=18,sticky=(N,W))
>
> myKey = [0,2,4,5,7,9,11] #Begin in the key of C
> KS2   =StringVar()
> KS2.set("C")
> butCol.ks2
> =Entry(butCol,width=20,textvariable=KS2).grid(column=0,row=19,sticky=(N,W))
>
> The above lines all render correctly, but will not trigger off "entry" of
> data at all on 
>
> Adding the following line just crashes.
>
> butCol.ks2.bind("",chooseKey)
>
> I downloaded the Python 3 package from the recommended site last week to
> begin this project (it was previously working OK in Skulptor, but I was not
> prepared to develop it any further in that environment, though it was an
> excellent starting environment).  So I believe the Python language installed
> is up to date.  I have not previously installed Python so it should be
> "clean".
>
> If you could give any direct advice I would be very grateful or if you can
> direct me to your best "forum" site maybe I could use that.
>
> One overall observation that has frustrated me is how to search for
> information that relates to Python 3 and the latest tkinter modules.  I kept
> finding old Python or old Tkinter or combinations of both.  Working my way
> through this was very time consuming and rarely productive.  Is there any
> advice on how to get the "latest" information off the WWW?
>
>
>
> Cheers, Rob Ward
>
> PS In my state of eternal optimism I have attached the whole file :-)
>
> PPS I have done some OOP in the past but not keen to jump in at the moment.



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Jim Jewett
On Tue, Jan 14, 2014 at 3:06 PM, Guido van Rossum  wrote:
> Personally I wouldn't add any words suggesting or referring to the
> option of creating another class for this purpose. You wouldn't
> recommend subclassing dict for constraining the types of keys or
> values, would you?

Yes, and it is so clear that I suspect I'm missing some context for
your question.

Do I recommend that each individual application should create new
concrete classes instead of just using the builtins?  No.

When trying to understand (learn about) the text/binary distinction, I
do recommend pretending that they are represented by separate classes.
Limits on the values in a bytearray are NOT the primary reason for
this; the primary reason is that operations like the literal
representation or the capitalize method are arbitrary nonsense unless
the data happens to be representing ASCII.

sound_sample.capitalize()  -- syntactically valid, but semantic garbage
header.capitalize() -- OK, which implies that data is an instance
of something more specific than bytes.
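
A small sketch of the same contrast, using made-up values: header stands for ASCII-structured bytes and sound_sample for arbitrary binary data. Both calls run, but only the first result means anything.

```python
# ASCII-structured bytes: capitalize() does what a reader would expect.
header = b"content-type"
print(header.capitalize())

# Arbitrary binary data: only bytes that happen to fall in the ASCII
# letter range are changed; the 0xFF byte is left alone.
sound_sample = bytes([0x61, 0xFF, 0x41])
print(sound_sample.capitalize())
```

The second call silently rewrites two of the three sample bytes, which is why the operation only makes sense when the data is known to be ASCII-compatible.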

Would I recommend subclassing dict if I wanted to constrain the key
types?  Yes -- though MutableMapping (fewer gates to guard) or the
upcoming TransformDict would probably be better still.

The existing dict implementation itself effectively uses (hidden,
quasi-)subclasses to restrict types of keys strictly for efficiency.
(lookdict* variants)
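
The "fewer gates to guard" remark can be sketched as follows. StrKeyMapping and StrKeyDict are hypothetical names; the example relies on collections.abc.MutableMapping routing update() through __setitem__, whereas CPython's dict.update() does not call an overridden __setitem__.

```python
from collections.abc import MutableMapping


class StrKeyMapping(MutableMapping):
    """Mapping that accepts only str keys; one gate (__setitem__) guards all writes."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)  # inherited update() goes through __setitem__

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be str")
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class StrKeyDict(dict):
    """dict subclass guarding only __setitem__; other entry points stay open."""

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be str")
        super().__setitem__(key, value)


guarded = StrKeyMapping(a=1)
try:
    guarded.update({1: "x"})
except TypeError as exc:
    print("rejected:", exc)

leaky = StrKeyDict()
leaky.update({1: "x"})  # bypasses the overridden __setitem__ in CPython
print(list(leaky))
```

With the dict subclass, update(), setdefault() and the constructor would each need their own override to close the gap.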

-jJ


Re: [Python-Dev] magic method __bytes__

2014-01-14 Thread Stephen J. Turnbull
R. David Murray writes:

 > a file is a just a "wire" with an indefinite destination and
 > transmission time

+1 QOTW

Of course!  "Store and ... wait for it ... forward" architecture
4-ever!

Store and Forward, Inc.  Since 1969.


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Guido van Rossum
I am exhausted from all these discussions. I just recommend not
touching those docs.

On Tue, Jan 14, 2014 at 8:08 PM, Jim Jewett  wrote:
> On Tue, Jan 14, 2014 at 3:06 PM, Guido van Rossum  wrote:
>> Personally I wouldn't add any words suggesting or referring to the
>> option of creating another class for this purpose. You wouldn't
>> recommend subclassing dict for constraining the types of keys or
>> values, would you?
>
> Yes, and it is so clear that I suspect I'm missing some context for
> your question.
>
> Do I recommend that each individual application should create new
> concrete classes instead of just using the builtins?  No.
>
> When trying to understand (learn about) the text/binary distinction, I
> do recommend pretending that they are represented by separate classes.
>  Limits on the values in a bytearray are NOT the primary reason for
> this; the primary reason is that operations like the literal
> representation or the capitalize method are arbitrary nonsense unless
> the data happens to be representing ASCII.
>
> sound_sample.capitalize()  -- syntactically valid, but semantic garbage
> header.capitalize() -- OK, which implies that data is an instance
> of something more specific than bytes.
>
> Would I recommend subclassing dict if I wanted to constrain the key
> types?  Yes -- though MutableMapping (fewer gates to guard) or the
> upcoming TransformDict would probably be better still.
>
> The existing dict implementation itself effectively uses (hidden,
> quasi-)subclasses to restrict types of keys strictly for efficiency.
> (lookdict* variants)
>
> -jJ



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

2014-01-14 Thread Greg Ewing

Steven D'Aprano wrote:


I don't think mixing bytes and strings makes good semantic sense.


It's not about mixing bytes and text -- it's about
writing polymorphic code that will work on either
bytes *or* text. Not both at the same time.

If we had quantum computers, this would be easy
to solve: asciistr would be of type
str/sqrt(2) + bytes/sqrt(2), and everything would
work out fine. :-)
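
As a minimal sketch of what "polymorphic" means here: without something like asciistr, code that should work on either type has to pick its literals according to the argument. The strip_comment function is a made-up example, not code from the thread.

```python
def strip_comment(line):
    """Remove a trailing '#' comment from a str line or from a bytes line."""
    # The literal must match the argument type; str and bytes cannot be mixed.
    marker = "#" if isinstance(line, str) else b"#"
    return line.split(marker, 1)[0].rstrip()


print(strip_comment("key = C  # default key"))
print(strip_comment(b"key = C  # default key"))
```

Each call stays entirely within one type: text in gives text out, bytes in gives bytes out.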

--
Greg


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Larry Hastings

On 01/14/2014 12:22 PM, Larry Hastings wrote:


On 01/11/2014 07:35 PM, Larry Hastings wrote:

I've uploaded a prototype here:

https://bitbucket.org/larry/python-clinic-buffer



[...] I've created a new repository containing the "concrete examples 
of the various options" that Barry proposed above.  You may find it here:


https://bitbucket.org/larry/clinic-buffer-samples/src



I've added a fourth feature to the prototype:

set line_prefix

lets you set a string that is prepended to every line of code generated 
by Clinic.  Documentation is in the text file in the root.


I also updated the clinic-buffer-samples repository to match. There's 
now a "prefixes" subdirectory, with copies of all the samples adding a 
per-line prefix of "/*clinic*/ ".


Does that make Clinic any easier to swallow for anybody?

Cheers,


//arry/


Re: [Python-Dev] Changing Clinic's output

2014-01-14 Thread Meador Inge
On Tue, Jan 14, 2014 at 3:12 PM, Antoine Pitrou  wrote:

> On Tue, 14 Jan 2014 12:22:12 -0800
> Larry Hastings  wrote:
> >
> > https://bitbucket.org/larry/clinic-buffer-samples/src
> >
> > In it I converted Modules/_pickle.c four different ways.  There's a
> > README, please read it.
>
> I'm +1 on the sidefile approach. +0 on the various buffer approaches.
> -0.5 on the current "sprinkled everywhere" approach.
>

After converting a few modules, I feel about the same.  The sprinkling
does clutter the file.  Although, I do wonder if we can simplify things
a bit for the "sideline" file by using macros and headers.  You could
write the original definition like:

  /*[clinic input begin]
  _pickle.PicklerMemoProxy.copy

    self: PicklerMemoProxyObject

  Copy the memo to a new object.
  [clinic input end]*/
  static PyObject *
  _PICKLE_PICKLERMEMOPROXY_COPY(PyObject *self, PyObject *Py_UNUSED(ignored))
  {
      ...
  }

and then generate a header like:

  PyDoc_STRVAR(_pickle_PicklerMemoProxy_copy__doc__,
  "copy()\n"
  "Copy the memo to a new object.");

  #define _PICKLE_PICKLERMEMOPROXY_COPY_METHODDEF \
      {"copy", (PyCFunction)_pickle_PicklerMemoProxy_copy, METH_NOARGS, _pickle_PicklerMemoProxy_copy__doc__},

  static PyObject *
  _pickle_PicklerMemoProxy_copy_impl(PicklerMemoProxyObject *self);

  #define _PICKLE_PICKLERMEMOPROXY_COPY(a, b) \
  _pickle_PicklerMemoProxy_copy(PyObject *self, PyObject *Py_UNUSED(ignored)) \
  { \
      PyObject *return_value = NULL; \
      \
      return_value = _pickle_PicklerMemoProxy_copy_impl((PicklerMemoProxyObject *)self); \
      \
      return return_value; \
  } \
  \
  static PyObject * \
  _pickle_PicklerMemoProxy_copy_impl(PicklerMemoProxyObject *self)

This way the docstring, method def, and argument parsing code is out of
the way, but you still retain the helpful comments in the implementation
file.  I am pretty sure this gets around the "where do I inject the side
file part" too.  You also don't have to do much more editing than the
original scheme: write the clinic comment, #include a header, and then
apply the macro.

That being said, this is somewhat half baked and some folks don't like
macros.  I just wanted to throw it out there since it seems like a
reasonable compromise.

FWIW, I have worked on several large programs that generate C header and
implementation files on the side and it has never bothered me that much.
Well, unless something goes wrong :-)
-- 
# Meador

