date:20140117

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Eric Snow

On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith  wrote:
> For the first iteration of bytes.format(), I think we should just
> support the exact types of int, float, and bytes. It will call the
> type's __format__ (with the object as "self") and encode the result to
> ASCII. For the stated use case of 2.x compatibility, I suspect this will
> cover > 90% of the uses in real code. If we find there are cases where
> real code needs additional types supported, we can consider adding
> __format_ascii__ (or whatever name we cook up).

+1

-eric
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Eric Snow

On Thu, Jan 16, 2014 at 3:06 PM, Jan Kaliszewski  wrote:
> I'd treat the format()+.__format__()+str.format()-"ecosystem" as
> a nice text-data-oriented, *complete* Py3k feature, backported to
> Python 2 to share the benefits of the feature with it as well as
> to make the 2-to-3 transition a bit easier.
>
> IMHO, the PEP-3101's note cited above just describes a workaround
> over the flaws of the Py2's obsolete text model.  Moving such
> complications into Py3k would make the feature (and especially the
> ability to implement your own .__format__()) harder to understand
> and make use of -- for little profit.
>
> Such a move is not needed for compatibility.  And, IMHO, the
> format()/__format__()/str.format()-matter is all about nice and
> flexible *text* formatting, not about binary data interpolation.

[disclaimer: I personally don't have many use cases for any bytes formatting.]

Yet there is still a strong symmetry between str and bytes that makes
bytes easier to use.  I don't always use formatting, but when I do I
use .format(). :)

never-been-a-fan-of-mod-formatting-ly yours,

-eric

Re: [Python-Dev] AC Derby and accepting None for optional positional arguments

2014-01-17 Thread Steven D'Aprano

On Thu, Jan 16, 2014 at 01:08:47PM -0800, Ryan Smith-Roberts wrote:

> socket.getservbyname(servicename[, protocolname])
> 
> This is not an inspectable signature, since pure Python does not support
> bracketed arguments. To make it inspectable, we must give protocolname a
> (valid Python) default value:
> 
> socket.getservbyname(servicename, protocolname=None)
> 
> Unfortunately, while useful and inspectable, this signature is not correct.
> For a pure Python function, passing None for protocolname is the same as
> omitting it. However, if you pass None to getservbyname(), it raises a
> TypeError. So, we have these three options:
> 
> 1) Don't give getservbyname() an inspectable signature.
> 2) Lie to the user about the acceptability of None.
> 3) Alter the semantics of getservbyname() to treat None as equivalent to
> omitting protocolname.
> 
> Obviously #2 is out. My question: is #3 ever acceptable? It's a real
> change, as it breaks any code that relies on the TypeError exception.

The answer seems straightforward to me: it should be treated as any 
other change of behaviour, and judged on a case-by-case basis. I 
think the bug tracker is the right place to ask. Since it's not a 
bug fix, it may be able to be changed, but not lightly, and not in a 
bug-fix release. 
 
The fact that the motivation for the behaviour change is Argument Clinic 
should not change the decision, as far as I can see. Would a feature 
request "Allow None as default protocolname" be accepted?


-- 
Steven

Re: [Python-Dev] PEP 461 updates

2014-01-17 Thread Stephen J. Turnbull

Steven D'Aprano writes:
 > On Fri, Jan 17, 2014 at 11:19:44AM +0900, Stephen J. Turnbull wrote:

 > > "ASCII compatible" is a technical term in encodings, which means
 > > "bytes in the range 0-127 always have ASCII coded character semantics,
 > > do what you like with bytes in the range 128-255."[1]
 > 
 > Examples, and counter-examples, may help. Let me see if I have got this 
 > right: an ASCII-compatible encoding may be an ASCII-superset like 
 > Latin-1, or a variable-width encoding like UTF-8 where the ASCII chars 
 > are encoded to the same bytes as ASCII, and non-ASCII chars are not. A 
 > counter-example would be UTF-16, or some of the Asian encodings like 
 > Big5. Am I right so far?

All correct.

 > But Nick isn't talking about an encoding, he's talking about a data 
 > format. I think that an ASCII-compatible format means one where (in at 
 > least *some* parts of the data) bytes between 0 and 127 have the same 
 > meaning as in ASCII, e.g. byte 84 is to be interpreted as ASCII 
 > character "T". This doesn't mean that every byte 84 means "T", only that 
 > some of them do -- hopefully well-defined sections of the data. Below, 
 > you introduce the term "ASCII segments" for these.

Yes, except that I believe Nick, as well as the "file-and-wire guys",
strengthen "hopefully well-defined" to just "well-defined".

 > >  are designed for use *only* on bytes
 > > that are ASCII segments; use on other data is likely to cause
 > > hard-to-diagnose corruption.
 > 
 > An example: if you have the byte b'\x63', calling upper() on that will 
 > return b'\x43'. That is only meaningful if the byte is intended as the 
 > ASCII character "c".

Good example.
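
[Editor's sketch] The examples and counter-examples discussed above can be checked directly in Python 3. The variable names are illustrative only; the last case shows the kind of silent corruption that comes from applying an ASCII-oriented bytes method to data that is not an ASCII segment.

```python
# ASCII-compatible encodings keep ASCII characters on their ASCII byte values.
latin1 = "T".encode("latin-1")      # b'T' (byte 84)
utf8 = "T".encode("utf-8")          # b'T' (byte 84)

# UTF-16 is a counter-example: the same character needs a second, zero byte.
utf16 = "T".encode("utf-16-le")     # b'T\x00'

# In UTF-8, non-ASCII characters use only bytes in the range 128-255.
e_acute = "\u00e9".encode("utf-8")  # b'\xc3\xa9'

# bytes.upper() assumes ASCII semantics for the bytes it touches.
upper_c = b"\x63".upper()           # b'\x43', meaningful only if the byte meant "c"

# Applied to UTF-16 data, the same call changes the character entirely:
# U+0161 is stored as b'\x61\x01'; upper() turns that into b'\x41\x01' (U+0141).
mangled = "\u0161".encode("utf-16-le").upper().decode("utf-16-le")
```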

[Python-Dev] Migration from Python 2.7 and bytes formatting

2014-01-17 Thread Neil Schemenauer

As I see it, there are two separate goals in adding formatting
methods to bytes.  One is to make it easier to write new programs
that manipulate byte data.  Another is to make it easier to upgrade
Python 2.x programs to Python 3.x.  Here is an idea to better
address these separate goals.

Introduce %-interpolation for bytes.  Support the following format
codes to aid in writing new code:

%b: insert arbitrary bytes (via __bytes__ or Py_buffer)

%[dox]: insert an integer, encoded as ASCII

%[eEfFgG]: insert a float, encoded as ASCII

%a: call ascii(), insert result

Add a command-line option, disabled by default, that enables the
following format codes:

%s: if the object has __bytes__ or Py_buffer then insert it.
Otherwise, call str() and encode with the 'ascii' codec

%r: call repr(), encode with the 'ascii' codec

%[iuX]: as per Python 2.x, for backwards compatibility

Introducing these extra codes and the command-line option will
provide a more gradual upgrade path.  The next step in porting could
be to examine each %s inside bytes literals and decide if they
should either be converted to %b or if the literal should be
converted to a unicode literal.  Any %r codes could likely be safely
changed to %a.
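
[Editor's sketch] The first group of codes above can be tried on an interpreter that implements %-interpolation for bytes (CPython 3.5 and later, as far as I know). The values are made up for illustration, and the codes proposed to sit behind a command-line option are deliberately not exercised here.

```python
payload = b"\x00\xff"

framed = b"%b" % payload        # arbitrary bytes are inserted unchanged
number = b"%d" % 42             # integer, rendered as ASCII digits
hexed = b"%x" % 255             # integer in hexadecimal, as ASCII
ratio = b"%.2f" % 3.14159       # float, rendered as ASCII
label = b"%a" % "caf\u00e9"     # ascii() of the object, so non-ASCII is escaped
```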


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Nick Coghlan

On 17 Jan 2014 18:03, "Eric Snow"  wrote:
>
> On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith wrote:
> > For the first iteration of bytes.format(), I think we should just
> > support the exact types of int, float, and bytes. It will call the
> > type's __format__ (with the object as "self") and encode the result to
> > ASCII. For the stated use case of 2.x compatibility, I suspect this will
> > cover > 90% of the uses in real code. If we find there are cases where
> > real code needs additional types supported, we can consider adding
> > __format_ascii__ (or whatever name we cook up).
>
> +1

Please don't make me learn the limitations of a new mini language without a
really good reason.

For the sake of argument, assume we have a Python 3.5 with bytes.__mod__
restored roughly as described in PEP 461. *Given* that feature set, what is
the rationale for *adding* bytes.format? What new capabilities will it
provide that aren't already covered by printf-style interpolation directly
to bytes or text formatting followed by encoding the result?
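
[Editor's sketch] The two existing routes named in the question can be put side by side. This assumes an interpreter with bytes.__mod__ (CPython 3.5 or later); the header line is only an illustrative value.

```python
length = 1234

# Route 1: printf-style interpolation directly into bytes.
via_percent = b"Content-Length: %d\r\n" % length

# Route 2: text formatting, followed by explicitly encoding the result.
via_text = "Content-Length: {}\r\n".format(length).encode("ascii")
```

Both routes produce the same bytes here, which is the situation the question is about: bytes.format() would have to offer something beyond these two.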

Cheers,
Nick.

>
> -eric

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Eric V. Smith

On 1/17/2014 6:42 AM, Nick Coghlan wrote:
> 
> On 17 Jan 2014 18:03, "Eric Snow" wrote:
>>
>> On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith wrote:
>> > For the first iteration of bytes.format(), I think we should just
>> > support the exact types of int, float, and bytes. It will call the
>> > type's __format__ (with the object as "self") and encode the result to
>> > ASCII. For the stated use case of 2.x compatibility, I suspect this will
>> > cover > 90% of the uses in real code. If we find there are cases where
>> > real code needs additional types supported, we can consider adding
>> > __format_ascii__ (or whatever name we cook up).
>>
>> +1
> 
> Please don't make me learn the limitations of a new mini language
> without a really good reason.
> 
> For the sake of argument, assume we have a Python 3.5 with bytes.__mod__
> restored roughly as described in PEP 461. *Given* that feature set, what
> is the rationale for *adding* bytes.format? What new capabilities will
> it provide that aren't already covered by printf-style interpolation
> directly to bytes or text formatting followed by encoding the result?

The only reason to add any of this, in my mind, is to ease porting of
2.x code. If my proposal covers most of the cases of b''.format() that
exist in 2.x code that wants to move to 3.5, then I think it's worth
doing. Is there any such code that's blocked from porting by the lack of
b''.format() that supports bytes, int, and float? I don't know. I
concede that it's unlikely.

IF this were a feature that we were going to add to 3.5 on its own
merits, I think we add __format_ascii__ and make the whole thing
extensible. Is there any new code that's blocked from being written by
missing b"".format()? I don't know that, either.

Eric.


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Eric V. Smith

On 01/17/2014 07:34 AM, Eric V. Smith wrote:
> On 1/17/2014 6:42 AM, Nick Coghlan wrote:
>>
>> On 17 Jan 2014 18:03, "Eric Snow" wrote:
>>>
>>> On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith wrote:
>>>> For the first iteration of bytes.format(), I think we should just
>>>> support the exact types of int, float, and bytes. It will call the
>>>> type's __format__ (with the object as "self") and encode the result to
>>>> ASCII. For the stated use case of 2.x compatibility, I suspect this will
>>>> cover > 90% of the uses in real code. If we find there are cases where
>>>> real code needs additional types supported, we can consider adding
>>>> __format_ascii__ (or whatever name we cook up).
>>>
>>> +1
>>
>> Please don't make me learn the limitations of a new mini language
>> without a really good reason.
>>
>> For the sake of argument, assume we have a Python 3.5 with bytes.__mod__
>> restored roughly as described in PEP 461. *Given* that feature set, what
>> is the rationale for *adding* bytes.format? What new capabilities will
>> it provide that aren't already covered by printf-style interpolation
>> directly to bytes or text formatting followed by encoding the result?
> 
> The only reason to add any of this, in my mind, is to ease porting of
> 2.x code. If my proposal covers most of the cases of b''.format() that
> exist in 2.x code that wants to move to 3.5, then I think it's worth
> doing. Is there any such code that's blocked from porting by the lack of
> b''.format() that supports bytes, int, and float? I don't know. I
> concede that it's unlikely.
> 
> IF this were a feature that we were going to add to 3.5 on its own
> merits, I think we add __format_ascii__ and make the whole thing
> extensible. Is there any new code that's blocked from being written by
> missing b"".format()? I don't know that, either.

Following up, I think this leaves us with 3 choices:

1. Do not implement bytes.format(). We tell any 2.x code that's written
to use str.format() to switch to %-formatting for their common code base.

2. Add the simplistic version of bytes.format() that I describe above,
restricted to accepting bytes, int, and float (and no subclasses). Some
2.x code will work, some will need to change to %-formatting.

3. Add bytes.format() and the __format_ascii__ protocol. We might want
to also add a format_ascii() builtin, to match __format__ and format().
This would require the least change to 2.x code that uses str.format()
and wants to move to bytes.format(), but would require some work on the
3.x side.

I'd advocate 1 or 2.
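
[Editor's sketch] Option 2 can be approximated in pure Python to see what "exact types only" would mean in practice. `bytes_format` is a hypothetical helper, not a proposed API; it assumes positional fields only, no conversions such as !r, and no format specs for bytes arguments.

```python
import string


def bytes_format(template, *args):
    """Hypothetical restricted bytes.format(): exact int, float and bytes only."""
    out = bytearray()
    auto_index = 0
    # latin-1 maps every byte to one code point, so the template round-trips.
    parsed = string.Formatter().parse(template.decode("latin-1"))
    for literal, field, spec, conversion in parsed:
        out += literal.encode("latin-1")
        if field is None:          # trailing literal text, no replacement field
            continue
        if conversion is not None:
            raise ValueError("conversions are not supported")
        if field == "":
            index = auto_index
            auto_index += 1
        else:
            index = int(field)     # positional fields only in this sketch
        value = args[index]
        if type(value) is bytes:
            if spec:
                raise ValueError("format specs are not supported for bytes")
            out += value
        elif type(value) in (int, float):
            # Call the type's __format__ with the object as "self", then encode.
            out += type(value).__format__(value, spec).encode("ascii")
        else:
            raise TypeError("unsupported type: %s" % type(value).__name__)
    return bytes(out)
```

With this sketch, `bytes_format(b"{}:{:04x}", b"\xff", 255)` succeeds, while passing a str, a bool, or any subclass instance raises TypeError, which is the special-casing by type that option 2 implies.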

Eric.



Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Mark Lawrence


On 17/01/2014 14:50, Eric V. Smith wrote:
> On 01/17/2014 07:34 AM, Eric V. Smith wrote:
>> On 1/17/2014 6:42 AM, Nick Coghlan wrote:
>>>
>>> On 17 Jan 2014 18:03, "Eric Snow" wrote:
>>>>
>>>> On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith wrote:
>>>>> For the first iteration of bytes.format(), I think we should just
>>>>> support the exact types of int, float, and bytes. It will call the
>>>>> type's __format__ (with the object as "self") and encode the result to
>>>>> ASCII. For the stated use case of 2.x compatibility, I suspect this will
>>>>> cover > 90% of the uses in real code. If we find there are cases where
>>>>> real code needs additional types supported, we can consider adding
>>>>> __format_ascii__ (or whatever name we cook up).
>>>>
>>>> +1
>>>
>>> Please don't make me learn the limitations of a new mini language
>>> without a really good reason.
>>>
>>> For the sake of argument, assume we have a Python 3.5 with bytes.__mod__
>>> restored roughly as described in PEP 461. *Given* that feature set, what
>>> is the rationale for *adding* bytes.format? What new capabilities will
>>> it provide that aren't already covered by printf-style interpolation
>>> directly to bytes or text formatting followed by encoding the result?
>>
>> The only reason to add any of this, in my mind, is to ease porting of
>> 2.x code. If my proposal covers most of the cases of b''.format() that
>> exist in 2.x code that wants to move to 3.5, then I think it's worth
>> doing. Is there any such code that's blocked from porting by the lack of
>> b''.format() that supports bytes, int, and float? I don't know. I
>> concede that it's unlikely.
>>
>> IF this were a feature that we were going to add to 3.5 on its own
>> merits, I think we add __format_ascii__ and make the whole thing
>> extensible. Is there any new code that's blocked from being written by
>> missing b"".format()? I don't know that, either.
>
> Following up, I think this leaves us with 3 choices:
>
> 1. Do not implement bytes.format(). We tell any 2.x code that's written
> to use str.format() to switch to %-formatting for their common code base.
>
> 2. Add the simplistic version of bytes.format() that I describe above,
> restricted to accepting bytes, int, and float (and no subclasses). Some
> 2.x code will work, some will need to change to %-formatting.
>
> 3. Add bytes.format() and the __format_ascii__ protocol. We might want
> to also add a format_ascii() builtin, to match __format__ and format().
> This would require the least change to 2.x code that uses str.format()
> and wants to move to bytes.format(), but would require some work on the
> 3.x side.
>
> I'd advocate 1 or 2.
>
> Eric.



For both options 1 and 2 surely you cannot be suggesting that after 
people have written 2.x code to use format() as %f formatting is to be 
deprecated, they now have to change the code back to the way they may 
well have written it in the first place?


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Eric V. Smith

On 01/17/2014 10:15 AM, Mark Lawrence wrote:
> On 17/01/2014 14:50, Eric V. Smith wrote:
>> On 01/17/2014 07:34 AM, Eric V. Smith wrote:
>>> On 1/17/2014 6:42 AM, Nick Coghlan wrote:

>>>> On 17 Jan 2014 18:03, "Eric Snow" wrote:
>>>>>
>>>>> On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith wrote:
>>>>>> For the first iteration of bytes.format(), I think we should just
>>>>>> support the exact types of int, float, and bytes. It will call the
>>>>>> type's __format__ (with the object as "self") and encode the result to
>>>>>> ASCII. For the stated use case of 2.x compatibility, I suspect this will
>>>>>> cover > 90% of the uses in real code. If we find there are cases where
>>>>>> real code needs additional types supported, we can consider adding
>>>>>> __format_ascii__ (or whatever name we cook up).
>>>>>
>>>>> +1
>>>>
>>>> Please don't make me learn the limitations of a new mini language
>>>> without a really good reason.
>>>>
>>>> For the sake of argument, assume we have a Python 3.5 with bytes.__mod__
>>>> restored roughly as described in PEP 461. *Given* that feature set, what
>>>> is the rationale for *adding* bytes.format? What new capabilities will
>>>> it provide that aren't already covered by printf-style interpolation
>>>> directly to bytes or text formatting followed by encoding the result?
>>>
>>> The only reason to add any of this, in my mind, is to ease porting of
>>> 2.x code. If my proposal covers most of the cases of b''.format() that
>>> exist in 2.x code that wants to move to 3.5, then I think it's worth
>>> doing. Is there any such code that's blocked from porting by the lack of
>>> b''.format() that supports bytes, int, and float? I don't know. I
>>> concede that it's unlikely.
>>>
>>> IF this were a feature that we were going to add to 3.5 on its own
>>> merits, I think we add __format_ascii__ and make the whole thing
>>> extensible. Is there any new code that's blocked from being written by
>>> missing b"".format()? I don't know that, either.
>>
>> Following up, I think this leaves us with 3 choices:
>>
>> 1. Do not implement bytes.format(). We tell any 2.x code that's written
>> to use str.format() to switch to %-formatting for their common code base.
>>
>> 2. Add the simplistic version of bytes.format() that I describe above,
>> restricted to accepting bytes, int, and float (and no subclasses). Some
>> 2.x code will work, some will need to change to %-formatting.
>>
>> 3. Add bytes.format() and the __format_ascii__ protocol. We might want
>> to also add a format_ascii() builtin, to match __format__ and format().
>> This would require the least change to 2.x code that uses str.format()
>> and wants to move to bytes.format(), but would require some work on the
>> 3.x side.
>>
>> I'd advocate 1 or 2.
>>
>> Eric.
>>
> 
> For both options 1 and 2 surely you cannot be suggesting that after
> people have written 2.x code to use format() as %f formatting is to be
> deprecated, they now have to change the code back to the way they may
> well have written it in the first place?
> 

That would be part of it, yes. Otherwise you need #3.

This is all assuming we've ruled out an option 4, because of the
exceptions raised depending on what __format__ does:

4. Add bytes.format(), have it convert the format specifier to str
(unicode), call __format__ and encode the result back to ASCII. Accept
that there will be data-driven exceptions depending on the result of the
__format__ call.
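
[Editor's sketch] The "data-driven exceptions" of option 4 are easy to reproduce with a helper that does what the option describes. `format_via_text` is a hypothetical name used only for this illustration.

```python
def format_via_text(value, spec=b""):
    """Option 4 in miniature: decode the spec, call __format__, encode to ASCII."""
    return format(value, spec.decode("ascii")).encode("ascii")


padded_int = format_via_text(42, b"05d")      # always ASCII, always works
padded_ok = format_via_text("cafe", b">6")    # works for this particular value

try:
    format_via_text("caf\u00e9", b">6")       # same type and spec, different data
    failed = False
except UnicodeEncodeError:
    failed = True
```

Whether the call succeeds depends on the value being formatted, not on its type, so the failure may only surface with particular inputs.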

I'm open to other ideas.

Eric.


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Ethan Furman


On 01/17/2014 07:15 AM, Mark Lawrence wrote:
> For both options 1 and 2 surely you cannot be suggesting that
> after people have written 2.x code to use format() as %f
> formatting is to be deprecated


%f formatting is not deprecated, and will not be in 3.x's lifetime.

--
~Ethan~

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Eric V. Smith

On 01/17/2014 10:24 AM, Eric V. Smith wrote:
> On 01/17/2014 10:15 AM, Mark Lawrence wrote:
>> On 17/01/2014 14:50, Eric V. Smith wrote:
>>> On 01/17/2014 07:34 AM, Eric V. Smith wrote:
>>>> On 1/17/2014 6:42 AM, Nick Coghlan wrote:
>>>>>
>>>>> On 17 Jan 2014 18:03, "Eric Snow" wrote:
>>>>>>
>>>>>> On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith wrote:
>>>>>>> For the first iteration of bytes.format(), I think we should just
>>>>>>> support the exact types of int, float, and bytes. It will call the
>>>>>>> type's __format__ (with the object as "self") and encode the result to
>>>>>>> ASCII. For the stated use case of 2.x compatibility, I suspect this will
>>>>>>> cover > 90% of the uses in real code. If we find there are cases where
>>>>>>> real code needs additional types supported, we can consider adding
>>>>>>> __format_ascii__ (or whatever name we cook up).
>>>>>>>
>>>>>>> +1
>>>>>
>>>>> Please don't make me learn the limitations of a new mini language
>>>>> without a really good reason.
>>>>>
>>>>> For the sake of argument, assume we have a Python 3.5 with bytes.__mod__
>>>>> restored roughly as described in PEP 461. *Given* that feature set, what
>>>>> is the rationale for *adding* bytes.format? What new capabilities will
>>>>> it provide that aren't already covered by printf-style interpolation
>>>>> directly to bytes or text formatting followed by encoding the result?
>>>>
>>>> The only reason to add any of this, in my mind, is to ease porting of
>>>> 2.x code. If my proposal covers most of the cases of b''.format() that
>>>> exist in 2.x code that wants to move to 3.5, then I think it's worth
>>>> doing. Is there any such code that's blocked from porting by the lack of
>>>> b''.format() that supports bytes, int, and float? I don't know. I
>>>> concede that it's unlikely.
>>>>
>>>> IF this were a feature that we were going to add to 3.5 on its own
>>>> merits, I think we add __format_ascii__ and make the whole thing
>>>> extensible. Is there any new code that's blocked from being written by
>>>> missing b"".format()? I don't know that, either.
>>>
>>> Following up, I think this leaves us with 3 choices:
>>>
>>> 1. Do not implement bytes.format(). We tell any 2.x code that's written
>>> to use str.format() to switch to %-formatting for their common code base.
>>>
>>> 2. Add the simplistic version of bytes.format() that I describe above,
>>> restricted to accepting bytes, int, and float (and no subclasses). Some
>>> 2.x code will work, some will need to change to %-formatting.
>>>
>>> 3. Add bytes.format() and the __format_ascii__ protocol. We might want
>>> to also add a format_ascii() builtin, to match __format__ and format().
>>> This would require the least change to 2.x code that uses str.format()
>>> and wants to move to bytes.format(), but would require some work on the
>>> 3.x side.

For #3, hopefully this "additional work" on the 3.x side would just be
to add, to each class where you already have a custom __format__ used
for b''.format(), code like:

def __format_ascii__(self, fmt):
    return self.__format__(fmt.decode()).encode('ascii')

That is, we're pushing the possibility of having to deal with an
encoding exception off to the type, instead of having it live in
bytes.format().
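
[Editor's sketch] To make the #3 protocol concrete, here is how a type might opt in and how a format_ascii() helper might dispatch. Everything here is hypothetical: neither __format_ascii__ nor format_ascii() exists in Python, and the `Celsius` and `Euro` classes are invented for the example.

```python
class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        return format(self.degrees, spec) + " C"

    def __format_ascii__(self, fmt):
        # The one-line adapter suggested above.
        return self.__format__(fmt.decode()).encode("ascii")


class Euro:
    def __init__(self, amount):
        self.amount = amount

    def __format__(self, spec):
        return "\u20ac" + format(self.amount, spec)   # euro sign is not ASCII

    def __format_ascii__(self, fmt):
        return self.__format__(fmt.decode()).encode("ascii")


def format_ascii(value, spec=b""):
    """Hypothetical builtin: look the special method up on the type."""
    method = getattr(type(value), "__format_ascii__", None)
    if method is None:
        raise TypeError("%s does not support __format_ascii__"
                        % type(value).__name__)
    return method(value, spec)
```

In this sketch `format_ascii(Celsius(21.5), b".1f")` succeeds, while `format_ascii(Euro(3), b"d")` raises UnicodeEncodeError from inside the type's own method, which is where the message above says the responsibility would end up.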

And to agree with Ethan: %-formatting isn't deprecated.

Eric.


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Mark Lawrence


On 17/01/2014 15:41, Ethan Furman wrote:
> On 01/17/2014 07:15 AM, Mark Lawrence wrote:
>> For both options 1 and 2 surely you cannot be suggesting that
>> after people have written 2.x code to use format() as %f
>> formatting is to be deprecated
>
> %f formatting is not deprecated, and will not be in 3.x's lifetime.
>
> --
> ~Ethan~


I'm sorry, I got the above wrong, I should have said "was to be 
deprecated" :(


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Brett Cannon

On Fri, Jan 17, 2014 at 9:50 AM, Eric V. Smith  wrote:

> On 01/17/2014 07:34 AM, Eric V. Smith wrote:
> > On 1/17/2014 6:42 AM, Nick Coghlan wrote:
> >>
> >> On 17 Jan 2014 18:03, "Eric Snow" wrote:
> >>>
> >>> On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith wrote:
> >>>> For the first iteration of bytes.format(), I think we should just
> >>>> support the exact types of int, float, and bytes. It will call the
> >>>> type's __format__ (with the object as "self") and encode the result to
> >>>> ASCII. For the stated use case of 2.x compatibility, I suspect this will
> >>>> cover > 90% of the uses in real code. If we find there are cases where
> >>>> real code needs additional types supported, we can consider adding
> >>>> __format_ascii__ (or whatever name we cook up).
> >>>
> >>> +1
> >>
> >> Please don't make me learn the limitations of a new mini language
> >> without a really good reason.
> >>
> >> For the sake of argument, assume we have a Python 3.5 with bytes.__mod__
> >> restored roughly as described in PEP 461. *Given* that feature set, what
> >> is the rationale for *adding* bytes.format? What new capabilities will
> >> it provide that aren't already covered by printf-style interpolation
> >> directly to bytes or text formatting followed by encoding the result?
> >
> > The only reason to add any of this, in my mind, is to ease porting of
> > 2.x code. If my proposal covers most of the cases of b''.format() that
> > exist in 2.x code that wants to move to 3.5, then I think it's worth
> > doing. Is there any such code that's blocked from porting by the lack of
> > b''.format() that supports bytes, int, and float? I don't know. I
> > concede that it's unlikely.
> >
> > IF this were a feature that we were going to add to 3.5 on its own
> > merits, I think we add __format_ascii__ and make the whole thing
> > extensible. Is there any new code that's blocked from being written by
> > missing b"".format()? I don't know that, either.
>
> Following up, I think this leaves us with 3 choices:
>
> 1. Do not implement bytes.format(). We tell any 2.x code that's written
> to use str.format() to switch to %-formatting for their common code base.
>

+1

I would rephrase it to "switch to %-formatting for bytes usage for their
common code base". If they are working with actual text then using
str.format() still works (and is actually nicer to use IMO). It actually
might make the str/bytes relationship even clearer, especially if we start
to promote that str.format() is for text and %-formatting is for bytes.


>
> 2. Add the simplistic version of bytes.format() that I describe above,
> restricted to accepting bytes, int, and float (and no subclasses). Some
> 2.x code will work, some will need to change to %-formatting.
>

-1

I am still not comfortable with the special-casing by type for
bytes.format().


>
> 3. Add bytes.format() and the __format_ascii__ protocol. We might want
> to also add a format_ascii() builtin, to match __format__ and format().
> This would require the least change to 2.x code that uses str.format()
> and wants to move to bytes.format(), but would require some work on the
> 3.x side.
>

+0

Would allow for easy porting and it's general enough, but I don't know if
working with bytes really requires this much beyond supporting the porting
story.

I'm still +1 on PEP 460 for bytes.format() as a nice way to simplify basic
bytes usage in Python 3, but if that's not accepted then I say just drop
bytes.format() entirely and let %-formatting be the way people do Python
2/3 bytes work (if they are not willing to build it up from scratch like
they already can do).

-Brett

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Paul Moore

On 17 January 2014 15:50, Eric V. Smith  wrote:
> For #3, hopefully this "additional work" on the 3.x side would just be
> to add, to each class where you already have a custom __format__ used
> for b''.format(), code like:
>
> def __format_ascii__(self, fmt):
>     return self.__format__(fmt.decode()).encode('ascii')

For me, the big cost would seem to be in the necessary documentation,
explaining the new special method in the language reference,
explaining the 2 different forms of format() in the built in types
docs. And the conceptual overhead of another special method for people
to be aware of. If I implement my own number subclass, do I need to
implement __format_ascii__?

My gut feeling is that we simply don't implement format() for bytes. I
don't see sufficient benefit, if %-formatting is available.

Paul.

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Barry Warsaw

On Jan 17, 2014, at 11:00 AM, Brett Cannon wrote:

>I would rephrase it to "switch to %-formatting for bytes usage for their
>common code base".

-1.  %-formatting is so neanderthal. :)

-Barry

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Nick Coghlan

On 18 Jan 2014 02:08, "Paul Moore"  wrote:
>
> On 17 January 2014 15:50, Eric V. Smith  wrote:
> > For #3, hopefully this "additional work" on the 3.x side would just be
> > to add, to each class where you already have a custom __format__ used
> > for b''.format(), code like:
> >
> > def __format_ascii__(self, fmt):
> >     return self.__format__(fmt.decode()).encode('ascii')
>
> For me, the big cost would seem to be in the necessary documentation,
> explaining the new special method in the language reference,
> explaining the 2 different forms of format() in the built in types
> docs. And the conceptual overhead of another special method for people
> to be aware of. If I implement my own number subclass, do I need to
> implement __format_ascii__?
>
> My gut feeling is that we simply don't implement format() for bytes. I
> don't see sufficient benefit, if %-formatting is available.

Exactly, it's the documentation problem to explain "when would I recommend
using this over the alternatives?"  that turns me off the idea of general
purpose bytes formatting. printf style covers the use cases we have
identified, and the code bases of immediate interest support 2.5 or earlier
and thus *must* be using printf-style formatting.

Add to that the fact that to maintain the Python 3 text model, we either
have to gut it to the point where it has very few of the benefits the text
version offers over printf-style formatting, or else we introduce a whole new
protocol for a feature that we consider so borderline that it took us six
Python 3 releases to add it back to the language.

By contrast, the following model is relatively easy to document:

* printf-style is low level and relatively inflexible, but available for
both text and for ASCII compatible segments in binary data. The %s
formatting code accepts arbitrary objects (using str) in text mode, but
only buffer exporters and objects with a __bytes__ method in binary mode.

* format() is high level and very flexible, but available only for text -
the result must be explicitly encoded to binary if that is needed.

Cheers,
Nick.
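
The two-tier model described above can be sketched as follows. This is only an illustration, assuming an interpreter that implements bytes %-interpolation (the PEP targets Python 3.5); the header name and the value 47 are made up for the example.

```python
# printf-style: low level, applied directly to bytes.
length = 47
wire_printf = b'Content-Length: %d\r\n' % length

# format(): text only, so the result is encoded explicitly afterwards.
wire_format = 'Content-Length: {:d}\r\n'.format(length).encode('ascii')

# Both routes end up with the same bytes on the wire.
print(wire_printf == wire_format)
```

The difference is where the text/binary boundary sits: the first form never leaves the bytes domain, the second crosses it once, explicitly, at the end.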


[Python-Dev] PEP 461 Final?

2014-01-17 Thread Ethan Furman


Here's the text for your reading pleasure.  I'll commit the PEP after I add 
some markup.

Major change:

  - dropped `format` support, just using %-interpolation

Coming soon:

  - Rationale section  ;)


PEP: 461
Title: Adding % formatting to bytes
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15, 2014-01-17
Resolution:


Abstract
========

This PEP proposes adding % formatting operations similar to Python 2's str type
to bytes [1]_ [2]_.


Overriding Principles
=====================

In order to avoid the problems of auto-conversion and Unicode exceptions that
could plague Py2 code, all object checking will be done by duck-typing, not by
values contained in a Unicode representation [3]_.


Proposed semantics for bytes formatting
=======================================

%-interpolation
---------------

All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.)
will be supported, and will work as they do for str, including the
padding, justification and other related modifiers.

Example::

   >>> b'%4x' % 10
   b'   a'

   >>> '%#4x' % 10
   ' 0xa'

   >>> '%04X' % 10
   '000A'

%c will insert a single byte, either from an int in range(256), or from
a bytes argument of length 1, not from a str.

Example::

   >>> b'%c' % 48
   b'0'

   >>> b'%c' % b'a'
   b'a'

%s is restricted in what it will accept::

  - input type supports Py_buffer?
    use it to collect the necessary bytes

  - input type is something else?
    use its __bytes__ method; if there isn't one, raise a TypeError

Examples::

   >>> b'%s' % b'abc'
   b'abc'

   >>> b'%s' % 3.14
   Traceback (most recent call last):
   ...
   TypeError: 3.14 has no __bytes__ method

   >>> b'%s' % 'hello world!'
   Traceback (most recent call last):
   ...
   TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the str type does not have a __bytes__ method, attempts to
   directly use 'a string' as a bytes interpolation value will raise an
   exception.  To use 'string' values, they must be encoded or otherwise
   transformed into a bytes sequence::

      'a string'.encode('latin-1')
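
The %s rules above can be exercised with a small sketch. It assumes an interpreter that implements this proposal (Python 3.5 or later); the ``Packet`` class is hypothetical and exists only to show the ``__bytes__`` path. The exact wording of the TypeError may differ from the draft text, so the sketch only checks the exception type.

```python
class Packet:
    """Hypothetical type that knows its own wire representation."""

    def __init__(self, payload):
        self.payload = payload

    def __bytes__(self):
        return b'<' + self.payload + b'>'


# Buffer exporters (bytes, bytearray, memoryview) are used directly.
from_buffer = b'%s' % bytearray(b'abc')

# Other objects are converted through their __bytes__ method.
from_dunder = b'%s' % Packet(b'xyz')

# str has neither a buffer nor __bytes__, so interpolation is refused.
try:
    b'%s' % 'hello world!'
    str_rejected = False
except TypeError:
    str_rejected = True

# Encoding first makes the intent (and the codec) explicit.
encoded_ok = b'%s' % 'hello world!'.encode('latin-1')
```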


Numeric Format Codes
--------------------

To properly handle int and float subclasses, int(), index(), and float()
will be called on the objects intended for (d, i, u), (b, o, x, X), and
(e, E, f, F, g, G), respectively.
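
A short sketch of the numeric codes applied to bytes, assuming an interpreter that implements this proposal (Python 3.5 or later). The values are arbitrary; the last case only checks that a float is refused by an integer-only code, not the exact error message.

```python
# Width, zero padding and case modifiers behave as they do for str.
padded_hex = b'%04X' % 10

# Float codes accept precision and width.
fixed_point = b'%8.3f' % 3.14159

# bool is an int subclass, so %d formats its integer value.
from_bool = b'%d' % True

# %x needs something usable as an index; a float is not.
try:
    b'%x' % 3.14
    float_as_hex_rejected = False
except TypeError:
    float_as_hex_rejected = True
```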


Unsupported codes
-----------------

%r (which calls __repr__), and %a (which calls ascii() on __repr__) are not
supported.


Proposed variations
===================

It was suggested to let %s accept numbers, but since numbers have their own
format codes this idea was discarded.

It has been suggested to use %b for bytes instead of %s.

  - Rejected as %b does not exist in Python 2.x %-interpolation, which is
    why we are using %s.

It has been proposed to automatically use .encode('ascii','strict') for str
arguments to %s.

  - Rejected as this would lead to intermittent failures.  Better to have the
    operation always fail so the trouble-spot can be correctly fixed.

It has been proposed to have %s return the ascii-encoded repr when the value
is a str  (b'%s' % 'abc'  --> b"'abc'").

  - Rejected as this would lead to hard to debug failures far from the problem
    site.  Better to have the operation always fail so the trouble-spot can be
    easily fixed.

Originally this PEP also proposed adding format style formatting, but it was
decided that format and its related machinery were all strictly text (aka str)
based, and it was dropped.

Various new special methods were proposed, such as __ascii__, __format_bytes__,
etc.; such methods are not needed at this time, but can be visited again later
if real-world use shows deficiencies with this solution.


Footnotes
=========

.. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
.. [2] neither string.Template, format, nor str.format are under consideration.
.. [3] %c is not an exception as neither of its possible arguments are unicode.


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:


Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Ethan Furman


On 01/16/2014 11:47 PM, Steven D'Aprano wrote:
> On Thu, Jan 16, 2014 at 08:23:13AM -0800, Ethan Furman wrote:
>> As I understand it, str.format will call the object's __format__.  So, for
>> example, if I say:
>>
>>    u'the value is: %d' % myNum(17)
>>
>> then it will be myNum.__format__ that gets called, not int.__format__;
>
> I seem to have missed something, because I am completely confused... Why
> are you talking about str.format and then show an example using % instead?


Sorry, PEP 46x fatigue.  :/

It should have been

u'the value is {:d}'.format(myNum(17))

and yes I meant the str type.



> %d calls __str__, not __format__. This is in Python 3.3:
>
> py> class MyNum(int):
> ...     def __str__(self):
> ...         print("Calling MyNum.__str__")
> ...         return super().__str__()
> ...     def __format__(self, format_spec):
> ...         print("Calling MyNum.__format__")
> ...         return super().__format__(format_spec)
> ...
> py> n = MyNum(17)
> py> u"%d" % n
> Calling MyNum.__str__
> '17'


And that's a bug we fixed in 3.4:

Python 3.4.0b1 (default:172a6bfdd91b+, Jan  5 2014, 06:39:32)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.

--> class myNum(int):
...     def __int__(self):
...         return 7
...     def __index__(self):
...         return 11
...     def __float__(self):
...         return 13.81727
...     def __str__(self):
...         print('__str__')
...         return '1'
...     def __repr__(self):
...         print('__repr__')
...         return '2'
...
--> '%d' % myNum()
'0'
--> '%f' % myNum()
'13.817270'


After all, consider:

--> '%d' % True
'1'
--> '%s' % True
'True'

So, in fact, on subclasses __str__ should *not* be called to get the integer representation.  First we do a conversion 
to make sure we have an int (or float, or ...), and then we call __str__ on our tried and trusted genuine core type.
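
The distinction can be checked with a small sketch, assuming Python 3.4 or later (where the behaviour described above applies). The ``MyNum`` class here is a stripped-down stand-in for the ones shown earlier in the thread.

```python
class MyNum(int):
    def __str__(self):
        return 'custom'


n = MyNum(17)

# Numeric code: the underlying int value is formatted; __str__ is not consulted.
as_number = '%d' % n

# String code: this is where MyNum.__str__ is used.
as_string = '%s' % n

print(as_number, as_string)
```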




> The *worst* solution would be to completely ignore MyNum.__str__.
> That's a nasty violation of the Principle Of Least Surprise, and will
> lead to confusion ("why isn't my class' __str__ method being called?")


Because you asked for a numeric representation, not a string representation [1].

--
~Ethan~


[1] for all the gory details, see:
http://bugs.python.org/issue18780
http://bugs.python.org/issue18738

Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Brett Cannon

On Fri, Jan 17, 2014 at 11:49 AM, Ethan Furman  wrote:

> Here's the text for your reading pleasure.  I'll commit the PEP after I
> add some markup.
>
> Major change:
>
>   - dropped `format` support, just using %-interpolation
>
> Coming soon:
>
>   - Rationale section  ;)
>
> 
> 
> PEP: 461
> Title: Adding % formatting to bytes
> Version: $Revision$
> Last-Modified: $Date$
> Author: Ethan Furman 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2014-01-13
> Python-Version: 3.5
> Post-History: 2014-01-14, 2014-01-15, 2014-01-17
> Resolution:
>
>
> Abstract
> 
>
> This PEP proposes adding % formatting operations similar to Python 2's str
> type
> to bytes [1]_ [2]_.
>
>
> Overriding Principles
> =
>
> In order to avoid the problems of auto-conversion and Unicode exceptions
> that
> could plague Py2 code, all object checking will be done by duck-typing,
> not by
>

Don't abbreviate; spell out "Python 2".


> values contained in a Unicode representation [3]_.
>
>
> Proposed semantics for bytes formatting
> ===
>
> %-interpolation
> ---
>
> All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.)
> will be supported, and will work as they do for str, including the
> padding, justification and other related modifiers.
>
> Example::
>
>    >>> b'%4x' % 10
>    b'   a'
>
>    >>> '%#4x' % 10
>    ' 0xa'
>
>    >>> '%04X' % 10
>    '000A'
>
> %c will insert a single byte, either from an int in range(256), or from
> a bytes argument of length 1, not from a str.
>
> Example:
>
> >>> b'%c' % 48
> b'0'
>
> >>> b'%c' % b'a'
> b'a'
>
> %s is restricted in what it will accept::
>
>   - input type supports Py_buffer?
> use it to collect the necessary bytes
>
>   - input type is something else?
> use its __bytes__ method; if there isn't one, raise a TypeError
>
> Examples:
>
> >>> b'%s' % b'abc'
> b'abc'
>
> >>> b'%s' % 3.14
> Traceback (most recent call last):
> ...
> TypeError: 3.14 has no __bytes__ method
>
> >>> b'%s' % 'hello world!'
> Traceback (most recent call last):
> ...
> TypeError: 'hello world' has no __bytes__ method, perhaps you need to
> encode it?
>
> .. note::
>
>    Because the str type does not have a __bytes__ method, attempts to
>    directly use 'a string' as a bytes interpolation value will raise an
>    exception.  To use 'string' values, they must be encoded or otherwise
>    transformed into a bytes sequence::
>
>       'a string'.encode('latin-1')
>
>
> Numeric Format Codes
> 
>
> To properly handle int and float subclasses, int(), index(), and float()
> will be called on the objects intended for (d, i, u), (b, o, x, X), and
> (e, E, f, F, g, G).
>
>
> Unsupported codes
> -
>
> %r (which calls __repr__), and %a (which calls ascii() on __repr__) are not
> supported.
>
>
> Proposed variations
> ===
>
> It was suggested to let %s accept numbers, but since numbers have their own
> format codes this idea was discarded.
>
> It has been suggested to use %b for bytes instead of %s.
>
>   - Rejected as %b does not exist in Python 2.x %-interpolation, which is
> why we are using %s.
>
> It has been proposed to automatically use .encode('ascii','strict') for str
> arguments to %s.
>
>   - Rejected as this would lead to intermittent failures.  Better to have
> the
> operation always fail so the trouble-spot can be correctly fixed.
>
> It has been proposed to have %s return the ascii-encoded repr when the
> value
> is a str  (b'%s' % 'abc'  --> b"'abc'").
>
>   - Rejected as this would lead to hard to debug failures far from the
> problem
> site.  Better to have the operation always fail so the trouble-spot
> can be
> easily fixed.
>
> Originally this PEP also proposed adding format style formatting, but it
> was
>

"format-style"


> decided that format and its related machinery were all strictly text (aka
> str)
> based, and it was dropped.
>

"that the method and"


>
> Various new special methods were proposed, such as __ascii__,
> __format_bytes___,
> etc.; such methods are not needed at this time, but can be visited again
> later
> if real-world use shows deficiencies with this solution.
>
>
> Footnotes
> =
>
> .. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
> .. [2] neither string.Template, format, nor str.format are under
> consideration.
> .. [3] %c is not an exception as neither of its possible arguments are
> unicode.
>

+1 from me

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Brett Cannon

On Fri, Jan 17, 2014 at 11:16 AM, Barry Warsaw  wrote:

> On Jan 17, 2014, at 11:00 AM, Brett Cannon wrote:
>
> >I would rephrase it to "switch to %-formatting for bytes usage for their
> >common code base".
>
> -1.  %-formatting is so neanderthal. :)
>

Very much so, which is why I'm willing to let it be bastardized in Python
3.5 for the sake of porting but not bytes.format(). =) I'm keeping format()
clean for my nieces and nephew to use; they can just turn their nose up at
%-formatting when they are old enough to program.

[Python-Dev] Summary of Python tracker Issues

2014-01-17 Thread Python tracker


ACTIVITY SUMMARY (2014-01-10 - 2014-01-17)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    4437 (+28)
  closed 27624 (+44)
  total  32061 (+72)

Open issues with patches: 2012 


Issues opened (47)
==

#14455: plistlib unable to read json and binary plist files
http://bugs.python.org/issue14455  reopened by ronaldoussoren

#20218: Add `pathlib.Path.write` and `pathlib.Path.read`
http://bugs.python.org/issue20218  opened by cool-RR

#20219: ElementTree: allow passing XMLPullParser instance into iterpar
http://bugs.python.org/issue20219  opened by scoder

#20220: TarFile.list() outputs wrong time
http://bugs.python.org/issue20220  opened by serhiy.storchaka

#20221: #define hypot _hypot conflicts with existing definition
http://bugs.python.org/issue20221  opened by tabrezm

#20222: unittest.mock-examples doc uses builtin file which is removed 
http://bugs.python.org/issue20222  opened by naoki

#20223: inspect.signature does not support new functools.partialmethod
http://bugs.python.org/issue20223  opened by yselivanov

#20224: C API docs need a clear "defining custom extension types" sect
http://bugs.python.org/issue20224  opened by ncoghlan

#20227: Argument Clinic: rename arguments in generated C?
http://bugs.python.org/issue20227  opened by georg.brandl

#20230: structseq types should expose _fields
http://bugs.python.org/issue20230  opened by abarnert

#20231: Argument Clinic accepts no-default args after default args
http://bugs.python.org/issue20231  opened by rmsr

#20233: Re-enable buffer API slots for heap types
http://bugs.python.org/issue20233  opened by Benno.Rice

#20237: Ambiguous sentence in document of xml package.
http://bugs.python.org/issue20237  opened by naoki

#20238: Incomplete gzip output with tarfile.open(fileobj=..., mode="w:
http://bugs.python.org/issue20238  opened by vadmium

#20239: Allow repeated deletion of unittest.mock.Mock attributes
http://bugs.python.org/issue20239  opened by michael.foord

#20241: Bad reference to RFC in document of ipaddress?
http://bugs.python.org/issue20241  opened by naoki

#20243: ReadError when open a tarfile for writing
http://bugs.python.org/issue20243  opened by serhiy.storchaka

#20244: Possible resources leak in tarfile.open()
http://bugs.python.org/issue20244  opened by serhiy.storchaka

#20245: Check empty mode in TarFile.*open()
http://bugs.python.org/issue20245  opened by serhiy.storchaka

#20247: Condition._is_owned is wrong
http://bugs.python.org/issue20247  opened by Antony.Lee

#20249: test_posix.test_initgroups fails when running with no suppleme
http://bugs.python.org/issue20249  opened by Rosuav

#20252: Argument Clinic howto: small typo in y# translation
http://bugs.python.org/issue20252  opened by rmsr

#20254: Duplicate bytearray test on test_socket.py
http://bugs.python.org/issue20254  opened by vajrasky

#20256: Argument Clinic: compare signed and unsigned ints
http://bugs.python.org/issue20256  opened by serhiy.storchaka

#20257: test_socket fails if using tipc module and SELinux enabled
http://bugs.python.org/issue20257  opened by vajrasky

#20260: Argument Clinic: add unsigned integers converters
http://bugs.python.org/issue20260  opened by serhiy.storchaka

#20261: Cannot pickle some objects that have a __getattr__()
http://bugs.python.org/issue20261  opened by barry

#20262: Convert some debugging prints in zipfile to warnings
http://bugs.python.org/issue20262  opened by serhiy.storchaka

#20264: Update patchcheck to looks for files with clinic comments
http://bugs.python.org/issue20264  opened by meador.inge

#20265: Bring Doc/using/windows up to date
http://bugs.python.org/issue20265  opened by zach.ware

#20266: Bring Doc/faq/windows up to date
http://bugs.python.org/issue20266  opened by zach.ware

#20267: TemporaryDirectory does not resolve path when created using a 
http://bugs.python.org/issue20267  opened by Antony.Lee

#20269: Inconsistent behavior in pdb when pressing Ctrl-C
http://bugs.python.org/issue20269  opened by xdegaye

#20270: urllib.parse doesn't work with empty port
http://bugs.python.org/issue20270  opened by serhiy.storchaka

#20271: urllib.parse.urlparse() accepts wrong URLs
http://bugs.python.org/issue20271  opened by serhiy.storchaka

#20274: sqlite module has bad argument parsing code, including undefin
http://bugs.python.org/issue20274  opened by larry

#20275: asyncio: remove debug code from BaseEventLoop
http://bugs.python.org/issue20275  opened by yselivanov

#20276: ctypes._dlopen should not force RTLD_NOW
http://bugs.python.org/issue20276  opened by Albert.Zeyer

#20280: add "predicate" to the glossary
http://bugs.python.org/issue20280  opened by flox

#20281: time.strftime %z format specifier is the same as %Z
http://bugs.python.org/issue20281  opened by Mike.Owens

#20282: Argument Clinic: int with boolean default
http://bugs.python.or

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Eric V. Smith

On 01/17/2014 11:58 AM, Brett Cannon wrote:
> On Fri, Jan 17, 2014 at 11:16 AM, Barry Warsaw wrote:
>
>> On Jan 17, 2014, at 11:00 AM, Brett Cannon wrote:
>>
>>> I would rephrase it to "switch to %-formatting for bytes usage for
>>> their common code base".
>>
>> -1.  %-formatting is so neanderthal. :)
>
> Very much so, which is why I'm willing to let it be bastardized in
> Python 3.5 for the sake of porting but not bytes.format(). =) I'm
> keeping format() clean for my nieces and nephew to use; they can just
> turn their nose up at %-formatting when they are old enough to program.

Given the problems with implementing it, I'm more than willing to drop
bytes.format() from PEP 461 (not that it's my PEP). But if we think that
%-formatting is neanderthal and will get dropped in the Python 4000
timeframe (that is, someday in the far future), then I think we should
have some advice to give to people who are writing new 3.x code for the
non-porting use-cases addressed by the PEP. I'm specifically thinking of
new code that wants to format some bytes for an on-the-wire ascii-like
protocol.

Is it:
  b'Content-Length: ' + str(47).encode('ascii')
or
  b'Content-Length: {}'.format(str(47).encode('ascii'))
or something better?

I think it will look like the above, or involve something like
bytes.format() and __format_ascii__. Or, maybe a library that just
supports a few types (say, bytes, int, and float!).

Eric.

Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Neil Schemenauer

Ethan Furman  wrote:
> Overriding Principles
>=
>
> In order to avoid the problems of auto-conversion and Unicode exceptions that
> could plague Py2 code, all object checking will be done by duck-typing, not by
> values contained in a Unicode representation [3]_.

I think a longer "Rational" section is justified given the amount of
discussion this feature generated.  Here is a revised version of
what I already suggested:

Rational

A disruptive but useful change introduced in Python 3.0 was the
clean separation of byte strings (i.e. the "bytes" object) from
character strings (i.e. the "str" object).  The benefit is that
character encodings must be explicitly specified and the risk of
corrupting character data is reduced.

Unfortunately, this separation has made writing certain types of
programs more complicated and verbose.  For example, programs
that deal with network protocols often manipulate ASCII encoded
strings or assemble byte strings from fragments.  Since the
"bytes" type does not support string formatting, extra encoding
and decoding between the "str" type is often required.

For simplicity and convenience it is desirable to introduce
formatting methods to "bytes" that allow formatting of
ASCII-encoded character data.  This change would blur the clean
separation of byte strings and character strings.  However, it
is felt that the practical benefits outweigh the purity costs.
The implicit assumption of ASCII-encoding would be limited to
formatting methods.

One source of many problems with the Python 2 Unicode
implementation is the implicit coercion of Unicode character
strings into byte strings using the "ascii" codec.  If the
character strings contain only ASCII characters, all is well.
However, if the string contains a non-ASCII character then
coercion causes an exception.

The combination of implicit coercion and value dependent
failures has proven to be a recipe for hard to debug errors.  A
program may seem to work correctly when tested (e.g. string
input that happened to be ASCII only) but later would fail,
often with a traceback far from the source of the real error.
The formatting methods for bytes() should avoid this problem by
not implicitly encoding data that might fail based on the
content of the data.
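
The value-dependent failure described above can be imitated in Python 3 with a small sketch. The ``implicit_ascii`` helper is hypothetical; it merely stands in for the coercion that Python 2 performed behind the scenes.

```python
def implicit_ascii(text):
    # Stand-in for Python 2's implicit unicode -> bytes coercion.
    return text.encode('ascii')


# ASCII-only input: the coercion goes unnoticed and testing passes.
ok = implicit_ascii('Content-Type')

# The same code path fails as soon as the data contains a non-ASCII character.
try:
    implicit_ascii('caf\u00e9')
    failed = False
except UnicodeEncodeError:
    failed = True

print(ok, failed)
```

Whether the call succeeds depends on the content of the string, not on its type, which is exactly the property the bytes formatting methods are meant to avoid.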

I think we can back off on the duck-typing idea.  It's a good Python
principle but I now realize existing %-interpolation doesn't do it.
The numeric format codes coerce to long or float.  

> Unsupported codes
> -
>
> %r (which calls __repr__), and %a (which calls ascii() on __repr__) are not
> supported.

I think %a should be supported.  I imagine it would be quite useful
when dumping debugging output to a bytes stream.  It's easy to
implement and I think the danger for abuse or surprises is small.
It would also help when translating Python 2 code, change %r to %a.

> Proposed variations
>===
>
> It was suggested to let %s accept numbers, but since numbers have their own
> format codes this idea was discarded.
>
> It has been suggested to use %b for bytes instead of %s.
>
>- Rejected as %b does not exist in Python 2.x %-interpolation, which is
>  why we are using %s.

I think we should use %b instead of %s.  In that case, I'm fine with
%b not accepting numbers.  Using %b clearly indicates we are
inserting arbitrary bytes.  It also proves a useful code review step
when translating from Python 2.x.

To ease porting from Python 2.x code, I propose adding a
command-line option that enables %s and %r format codes for bytes
%-interpolation.  I'm going to write a draft PEP (it would depend on
PEP 461 being implemented).

> Originally this PEP also proposed adding format style formatting, but it was
> decided that format and its related machinery were all strictly text (aka str)
> based, and it was dropped.

I would also argue that we should limit the scope of this PEP.  It
has already generated a massive amount of discussion.  Nothing
precludes us from adding support for format() to bytes in the
future, if we decide we want it and how it should work.

> Various new special methods were proposed, such as __ascii__,
> __format_bytes___, etc.; such methods are not needed at this time,
> but can be visited again later if real-world use shows
> deficiencies with this solution.

I agree, new special methods are not needed at this time since
numeric codes do use duck-typing and __bytes__ already exists.

  Neil


Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Mark Lawrence


On 17/01/2014 17:46, Neil Schemenauer wrote:


> I think we should use %b instead of %s.  In that case, I'm fine with
> %b not accepting numbers.  Using %b clearly indicates we are
> inserting arbitrary bytes.  It also proves a useful code review step
> when translating from Python 2.x.



Using %b could cause problems in the future as b is used in new style 
formatting to mean output numbers in binary, so %B seems to me the 
obvious choice as it's also unused.


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence


Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Larry Hastings



On 01/17/2014 09:46 AM, Neil Schemenauer wrote:

 Rational
 


"Rationale".  "Rational" is an adjective, "Rationale" is a noun.

Pedantically yours,


//arry/

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Ethan Furman


On 01/17/2014 09:13 AM, Eric V. Smith wrote:
> On 01/17/2014 11:58 AM, Brett Cannon wrote:
>> On Fri, Jan 17, 2014 at 11:16 AM, Barry Warsaw wrote:
>>> On Jan 17, 2014, at 11:00 AM, Brett Cannon wrote:
>>>> I would rephrase it to "switch to %-formatting for bytes usage for
>>>> their common code base".
>>>
>>> -1.  %-formatting is so neanderthal. :)
>>
>> Very much so, which is why I'm willing to let it be bastardized in
>> Python 3.5 for the sake of porting but not bytes.format(). =) I'm
>> keeping format() clean for my nieces and nephew to use; they can just
>> turn their nose up at %-formatting when they are old enough to program.
>
> Given the problems with implementing it, I'm more than willing to drop
> bytes.format() from PEP 461 (not that it's my PEP). But if we think that
> %-formatting is neanderthal and will get dropped in the Python 4000
> timeframe

I hope not!

> (that is, someday in the far future), then I think we should
> have some advice to give to people who are writing new 3.x code for the
> non-porting use-cases addressed by the PEP. I'm specifically thinking of
> new code that wants to format some bytes for an on-the-wire ascii-like
> protocol.

%-interpolation handles this use case well, format does not.

> Is it:
>    b'Content-Length: ' + str(47).encode('ascii')
> or
>    b'Content-Length: {}'.format(str(47).encode('ascii'))
> or something better?


Ew.  Neither of those look better than

b'Content-Length: %d' % 47

--
~Ethan~

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Glenn Linderman


On 1/17/2014 6:50 AM, Eric V. Smith wrote:

> Following up, I think this leaves us with 3 choices:
>
> 1. Do not implement bytes.format(). We tell any 2.x code that's written
> to use str.format() to switch to %-formatting for their common code base.
>
> 2. Add the simplistic version of bytes.format() that I describe above,
> restricted to accepting bytes, int, and float (and no subclasses). Some
> 2.x code will work, some will need to change to %-formatting.
>
> 3. Add bytes.format() and the __format_ascii__ protocol. We might want
> to also add a format_ascii() builtin, to match __format__ and format().
> This would require the least change to 2.x code that uses str.format()
> and wants to move to bytes.format(), but would require some work on the
> 3.x side.
>
> I'd advocate 1 or 2.


Nice summary.

I'd advocate 1 or 3.

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Glenn Linderman


On 1/17/2014 7:15 AM, Mark Lawrence wrote:
> For both options 1 and 2 surely you cannot be suggesting that after
> people have written 2.x code to use format() as %f formatting is to be
> deprecated, they now have to change the code back to the way they may
> well have written it in the first place?


If they are committed to format(), another option is to operate in the 
Unicode domain, and encode at the end.

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Eric V. Smith

On 01/17/2014 02:04 PM, Glenn Linderman wrote:
> On 1/17/2014 7:15 AM, Mark Lawrence wrote:
>> For both options 1 and 2 surely you cannot be suggesting that after
>> people have written 2.x code to use format() as %f formatting is to be
>> deprecated, they now have to change the code back to the way they may
>> well have written it in the first place?
> 
> If they are committed to format(), another option is to operate in the
> Unicode domain, and encode at the end.

Maybe that's the best advice to give. It's better than my earlier
example of field-at-a-time encoding.

Eric.
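
The two approaches compared here can be sketched side by side. This is an illustration only: the status line, the header and the variable names are invented for the example, and nothing beyond str.format() and str.encode() is assumed.

```python
status, length = 200, 47

# Field-at-a-time: every value is converted and encoded separately.
piecewise = (
    b'HTTP/1.1 ' + str(status).encode('ascii') + b' OK\r\n'
    + b'Content-Length: ' + str(length).encode('ascii') + b'\r\n'
)

# Unicode domain: format everything as text, then encode once at the end.
template = 'HTTP/1.1 {} OK\r\nContent-Length: {}\r\n'
at_the_end = template.format(status, length).encode('ascii')

print(piecewise == at_the_end)
```

Encoding once keeps the codec decision in a single place, at the cost of building an intermediate str.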


Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Glenn Linderman

On 1/17/2014 8:49 AM, Ethan Furman wrote:

> %s is restricted in what it will accept::
>
>   - input type supports Py_buffer?
>     use it to collect the necessary bytes
>
>   - input type is something else?
>     use its __bytes__ method; if there isn't one, raise a TypeError
>
> Examples:
>
> >>> b'%s' % b'abc'
> b'abc'
>
> >>> b'%s' % 3.14
> Traceback (most recent call last):
> ...
> TypeError: 3.14 has no __bytes__ method
>
> >>> b'%s' % 'hello world!'
> Traceback (most recent call last):
> ...
> TypeError: 'hello world' has no __bytes__ method, perhaps you need
> to encode it?

If you produce a helpful error message for str (re: encoding), might it 
not be appropriate to produce a helpful error message for builtin number 
types (, perhaps you need a numeric format code?)?

Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Ethan Furman


On 01/17/2014 11:40 AM, Glenn Linderman wrote:
> On 1/17/2014 8:49 AM, Ethan Furman wrote:
>> >>> b'%s' % 3.14
>> Traceback (most recent call last):
>> ...
>> TypeError: 3.14 has no __bytes__ method
>
> If you produce a helpful error message for str (re: encoding), might it
> not be appropriate to produce a helpful error message for builtin number
> types (, perhaps you need a numeric format code?)?

Good point!  Done.

--
~Ethan~

Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Neil Schemenauer

Mark Lawrence  wrote:
> Using %b could cause problems in the future as b is used in new style 
> formatting to mean output numbers in binary, so %B seems to me the 
> obvious choice as it's also unused.

After updating my patch, I've decided that %s works better.  My
patch implements PEP 461 as proposed with the following additional
features:

- add %a format code, calls PyObject_ASCII on the argument.  I
  see no reason not to add it as a useful debugging feature.

- add -2 command-line option.  When enabled: %s will fall back
  to calling PyObject_Str() after first trying the buffer API
  and __bytes__.  The value will be encoded using strict ASCII
  encoding.  Also, %r is enabled as an alias for %a.

The patch is v4, bugs.python.org/issue20284, still needs more review
and testing.
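
For illustration, a sketch of what %a produces, assuming Python 3.5 or later. In the released implementation %r is accepted as a synonym for %a without any command-line option; the -2 switch described above is specific to this patch:

```python
text = "caf\u00e9"

# %a formats like ascii(): non-ASCII characters become escapes, so the
# result always fits in bytes.
as_a = b"%a" % text
# %r is accepted as a synonym for %a in bytes formatting.
as_r = b"%r" % text

print(as_a)
print(as_r == as_a)
print(b"%a" % 3.14)
```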

  Neil


Re: [Python-Dev] Migration from Python 2.7 and bytes formatting

2014-01-17 Thread Glenn Linderman


On 1/17/2014 2:49 AM, Neil Schemenauer wrote:

> As I see it, there are two separate goals in adding formatting
> methods to bytes.  One is to make it easier to write new programs
> that manipulate byte data.  Another is to make it easier to upgrade
> Python 2.x programs to Python 3.x.  Here is an idea to better
> address these separate goals.
>
> Introduce %-interpolation for bytes.  Support the following format
> codes to aid in writing new code:
>
>  %b: insert arbitrary bytes (via __bytes__ or Py_buffer)
>
>  %[dox]: insert an integer, encoded as ASCII
>
>  %[eEfFgG]: insert a float, encoded as ASCII
>
>  %a: call ascii(), insert result
>
> Add a command-line option, disabled by default, that enables the
> following format codes:
>
>  %s: if the object has __bytes__ or Py_buffer then insert it.
>  Otherwise, call str() and encode with the 'ascii' codec
>
>  %r: call repr(), encode with the 'ascii' codec
>
>  %[iuX]: as per Python 2.x, for backwards compatibility
>
> Introducing these extra codes and the command-line option will
> provide a more gradual upgrade path.  The next step in porting could
> be to examine each %s inside bytes literals and decide if they
> should either be converted to %b or if the literal should be
> converted to a unicode literal.  Any %r codes could likely be safely
> changed to %a.


-1 overall.

Not worth the extra complexity in documentation and command line parameters.

%s, since it cannot be used for strings of characters (str) anyway, 
might as well be used for strings of bytes, and of necessity for 
single-code-base porting, must be usable in that manner.


I would give +.5 to the idea of supporting %a in Python 3.
I would give +.2 for %r as a synonym for %a in Python 3.

%r and %a don't produce fixed-width fields, so are likely used in places 
where the exact length in bytes is flexible, and in ASCII segments of 
the byte stream... supporting them both with the semantics of  %a  might 
be useful.

Re: [Python-Dev] Migration from Python 2.7 and bytes formatting

2014-01-17 Thread Neil Schemenauer

I've refined this idea a little in my latest PEP 461 patch (issue
20284).  Continuing to use %s instead of introducing %b seems
better.  I've called the command-line option -2; it could be used
to enable other similar porting aids.

I'd like to try porting code making use of the -2 feature to see how
helpful it is.  The behavior is partway between Python 2.x laziness
and Python 3.x strictness in terms of specifying encodings.

Python 2.x:

- coerce byte strings to unicode strings to avoid making a
  decision about encoding

- when writing a unicode string to a bytes stream without
  a specified encoding, encode with ASCII.  Blow up with an
  exception if a non-ASCII character is encountered, often far
  from where the real bug is.

Python 3.x:

- refuse to accept unicode strings where bytes are expected,
  require explicit encoding to be performed

Python 3.x with -2 command-line option:

- when objects are formatted into bytes, immediately
  encode them using strict ASCII encoding.

No code would be considered fully ported to Python 3 unless it can
run without the -2 command line option.
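
The -2 behavior for %s can be approximated in pure Python. This is only a sketch of the lookup order described in this thread (buffer API, then __bytes__, then str() with strict ASCII); the function name format_s_compat is hypothetical and the real patch does this in C:

```python
def format_s_compat(obj):
    """Hypothetical emulation of %s under the proposed -2 option."""
    try:
        # 1. Buffer API: bytes, bytearray, memoryview, ...
        return bytes(memoryview(obj))
    except TypeError:
        pass
    # 2. __bytes__, looked up on the type like other special methods.
    to_bytes = getattr(type(obj), "__bytes__", None)
    if to_bytes is not None:
        return to_bytes(obj)
    # 3. Fallback: str(), then strict ASCII encoding.
    return str(obj).encode("ascii", "strict")


print(format_s_compat(b"abc"))
print(format_s_compat(3.14))
try:
    format_s_compat("caf\u00e9")
except UnicodeEncodeError as exc:
    print("fails at the point of formatting:", exc.reason)
```

The last call shows the intended difference from Python 2.x: the failure happens where the object is formatted, not later when the data is written out.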

  Neil


Re: [Python-Dev] Migration from Python 2.7 and bytes formatting

2014-01-17 Thread Ryan Gonzalez

A command line parameter??

The annoying part would be telling every single user to call Python with a
certain argument and hope they read the README.

If it's a library, out of the question.

If it's a program, well, I hope your users read READMEs.


On Fri, Jan 17, 2014 at 4:49 AM, Neil Schemenauer  wrote:

> As I see it, there are two separate goals in adding formatting
> methods to bytes.  One is to make it easier to write new programs
> that manipulate byte data.  Another is to make it easier to upgrade
> Python 2.x programs to Python 3.x.  Here is an idea to better
> address these separate goals.
>
> Introduce %-interpolation for bytes.  Support the following format
> codes to aid in writing new code:
>
> %b: insert arbitrary bytes (via __bytes__ or Py_buffer)
>
> %[dox]: insert an integer, encoded as ASCII
>
> %[eEfFgG]: insert a float, encoded as ASCII
>
> %a: call ascii(), insert result
>
> Add a command-line option, disabled by default, that enables the
> following format codes:
>
> %s: if the object has __bytes__ or Py_buffer then insert it.
> Otherwise, call str() and encode with the 'ascii' codec
>
> %r: call repr(), encode with the 'ascii' codec
>
> %[iuX]: as per Python 2.x, for backwards compatibility
>
> Introducing these extra codes and the command-line option will
> provide a more gradual upgrade path.  The next step in porting could
> be to examine each %s inside bytes literals and decide if they
> should either be converted to %b or if the literal should be
> converted to a unicode literal.  Any %r codes could likely be safely
> changed to %a.
>



-- 
Ryan
When your hammer is C++, everything begins to look like a thumb.

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Terry Reedy


On 1/17/2014 10:15 AM, Mark Lawrence wrote:

> For both options 1 and 2 surely you cannot be suggesting that after
> people have written 2.x code to use format() as %f formatting is to be
> deprecated,

It will not be for at least a decade.

> they now have to change the code back to the way they may
> well have written it in the first place?


I would suggest that people simply .encode the result if bytes are 
needed in 3.x as well as 2.x. Polyglot code will likely have a 'py3' 
boolean already to make the encoding conditional.
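
A sketch of that suggestion for a single code base. PY3 stands in for the 'py3' boolean; the helper name format_to_bytes is made up:

```python
import sys

PY3 = sys.version_info[0] >= 3  # the 'py3' boolean mentioned above


def format_to_bytes(template, *args):
    # template is a native str on both major versions.
    result = template.format(*args)
    # On Python 2 the native str is already a byte string; on Python 3
    # it is text and has to be encoded explicitly.
    return result.encode("ascii") if PY3 else result


print(format_to_bytes("{0:08.3f}", 3.14159))
```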


--
Terry Jan Reedy


Re: [Python-Dev] Migration from Python 2.7 and bytes formatting

2014-01-17 Thread Neil Schemenauer

On 2014-01-17, Ryan Gonzalez wrote:
> A command line parameter??

I believe it has to be a global flag.  A __future__ statement will not
work.  Probably we should allow the flag to be set with an
environment variable as well.

> The annoying part would be telling every single user to call Python with a
> certain argument and hope they read the README.
> 
> If it's a library, out of the question.
> 
> If it's a program, well, I hope your users read READMEs.

The purpose of the command line parameter is not for end users.  It
is intended to help developers port millions of lines of existing
Python 2.x code.  I'm very sad if Python core developers don't
realize the enormity of the task and don't continue to make efforts
to make it easier.

Regards,

  Neil

Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Ethan Furman


On 01/17/2014 08:53 AM, Brett Cannon wrote:

> Don't abbreviate; spell out "Python 2".

Fixed.

>> Originally this PEP also proposed adding format style formatting, but it was
>
> "format-style"

Fixed.

>> decided that format and its related machinery were all strictly text (aka str)
>> based, and it was dropped.
>
> "that the method and"

Fixed.

Thanks.

--
~Ethan~

Re: [Python-Dev] Migration from Python 2.7 and bytes formatting

2014-01-17 Thread Neil Schemenauer

Glenn Linderman  wrote:
> -1 overall.
>
> Not worth the extra complexity in documentation and command line
> parameters.

Really?  It's less than 20 lines of code to implement, probably
similar to document.  With millions maybe billions of lines of
existing Python 2.x code to port, I'm dismayed to hear this
objection.

Time to take a break from python-dev, I've got paying work to do,
programming in Python 2.x.

  Neil


Re: [Python-Dev] Migration from Python 2.7 and bytes formatting

2014-01-17 Thread Ryan Gonzalez

Regardless, I still feel the introduction of a switch and all that stuff is
too complicated. I understand your position, since all my applications are
written in Python 2 (except one). However, I don't think this is the best
solution.


On Fri, Jan 17, 2014 at 2:19 PM, Neil Schemenauer  wrote:

> On 2014-01-17, Ryan Gonzalez wrote:
> > A command line parameter??
>
> I believe it has to be a global flag.  A __future__ statement will not
> work.  Probably we should allow the flag to be set with an
> environment variable as well.
>
> > The annoying part would be telling every single user to call Python with a
> > certain argument and hope they read the README.
> >
> > If it's a library, out of the question.
> >
> > If it's a program, well, I hope your users read READMEs.
>
> The purpose of the command line parameter is not for end users.  It
> is intended to help developers port millions of lines of existing
> Python 2.x code.  I'm very sad if Python core developers don't
> realize the enormity of the task and don't continue to make efforts
> to make it easier.
>
> Regards,
>
>   Neil
>



-- 
Ryan
When your hammer is C++, everything begins to look like a thumb.

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Terry Reedy

Responding to two posts at once, as I consider them

On 1/17/2014 11:00 AM, Brett Cannon wrote:

> I would rephrase it to "switch to %-formatting for bytes usage for their
> common code base". If they are working with actual text then using
> str.format() still works (and is actually nicer to use IMO). It actually
> might make the str/bytes relationship even clearer, especially if we
> start to promote that str.format() is for text and %-formatting is for
> bytes.

Good idea, I think: printf % formatting was invented for formatting
ascii text in bytestrings as it was being output (although sprintf
allowed not-output). In retrospect, I think we should have introduced
unicode.format when unicode was introduced in 2.0 and perhaps never have
had unicode % formatting. Or we should have dropped str % instead of
bytes % in 3.0.

On 1/17/2014 12:13 PM, Eric V. Smith wrote:
> But if we think that %-formatting is neanderthal and will get dropped 
> in the Python 4000 timeframe (that is, someday in the far future),

Some people, such as Martin Loewis, have a different opinion of 
%-formatting and will fight deprecating it *ever*. (I suspect that 
%-format opinions are influenced by one's current relation to C.)

> then I think we should have some advice to give to people who are
> writing new 3.x code for the non-porting use-cases addressed by the
> PEP. I'm specifically thinking of new code that wants to format some 
> bytes for an on-the-wire ascii-like protocol.

If we add %-formatting back in 3.5 for its original purpose, formatting 
ascii in bytes for output, I think we should drop the idea of later 
deprecating it (a few releases later) for that purpose. I think the PEP 
should even say so, that bytes % will remain indefinitely even if str % 
were to be dropped in favor of str.format.
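
As an illustration of that purpose, a minimal sketch of formatting ASCII fields into a wire message, assuming Python 3.5 or later. The function name build_response and the HTTP-like layout are only examples:

```python
def build_response(status, reason, body):
    # The header is an ASCII segment: %d renders the integers as ASCII
    # digits and %s inserts bytes unchanged.
    head = b"HTTP/1.1 %d %s\r\nContent-Length: %d\r\n\r\n" % (
        status, reason, len(body))
    # The body is opaque binary data and is simply appended.
    return head + body


print(build_response(200, b"OK", b"hello"))
```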

I would consider dropping unicode(now string).__mod__ in favor of 
.format to still be an eventual option, especially if someone were to 
write a converter.

--
Terry Jan Reedy


Re: [Python-Dev] PEP 461 updates

2014-01-17 Thread Chris Barker

For the record, we've got a pretty good thread (not this good, though!)
over on the numpy list about how to untangle the mess that has resulted
from porting text-file-parsing code to py3 (and the underlying issue with
the 'S' data type in numpy...)

One note from the github issue:
"""
 The use of asbytes originates only from the fact that b'%d' % (20,) does
not work.
"""

So yeah PEP 461! (even if too late for numpy...)

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[email protected]

Re: [Python-Dev] PEP 461 updates

2014-01-17 Thread Chris Barker

I hope you didn't mean to take this off-list:
On Fri, Jan 17, 2014 at 2:06 PM, Neil Schemenauer  wrote:

> In gmane.comp.python.devel, you wrote:
> > For the record, we've got a pretty good thread (not this good, though!)
> > over on the numpy list about how to untangle the mess that has resulted
>

> Not sure about your definition of good. ;-)

well, in the sense of "big" anyway...

> Could you summarize the main points on python-dev?  I'm not feeling up to
> wading through another massive thread but I'm quite interested to hear the
> challenges that numpy deals with.

Well, not much new to it, really. But here's a re-cap:

numpy has had an 'S' dtype for a while, which corresponded to the py2
string type (except for being fixed length). So it could auto-convert
to-from python strings... all was good and happy.

Enter py3: what to do? there is no py2 string type anymore. So it was
decided to have the 'S' dtype correspond to the py3 bytes
type. Apparently there was thought of renaming it, but the 'B' and 'b'
type identifiers were already taken, so 'S' was kept.

However, as we all know in this thread, the py3 bytes type is not the same
thing as a py2 string (or py2 bytes, natch), and folks like to use the 'S'
type for text data -- so that is kind of broken in py3.

However, other folks use the 'S' type for binary data, so like (and rely
on) it being mapped to the py3 bytes type. So we are stuck with that.

Given the nature of numpy, and scientific data, there is talk of having a
one-byte-per-char text type in numpy (there is already a unicode type, but
it uses 4-bytes-per-char, as it's key to the numpy data model that all
objects of a given type are the same size.) This would be analogous to the
current multiple precision options for numbers. It would take up less
memory, and would not be able to hold all values. It's not clear what the
level of support is for this right now -- after all, you can do everything
you need to do with the appropriate calls to encode() and decode(), if a
bit awkward.

Meanwhile, back at the ranch -- related, but separate issues
have arisen with the functions that parse text files: numpy.loadtxt and
numpy.genfromtxt. These functions were adapted for py3 just enough to get
things to mostly work, but have some serious limitations when doing
anything with unicode -- and in fact do some weird things with plain ascii
text files if you ask it to create unicode objects, and that is a natural
thing to do (and the "right" thing to do in the Py3 text model) if you do
something like:

arr = loadtxt('a_file_name', dtype=str)

on py3, a str is a py3 unicode string, so you get the numpy 'U' datatype,
but loadtxt wasn't designed to deal with that, so you can get stuff like:

["b'C:UsersDocumentsProjectmytextfile1.txt'"
 "b'C:UsersDocumentsProjectmytextfile2.txt'"
 "b'C:UsersDocumentsProjectmytextfile3.txt'"]

This was (Presumably, I haven't debugged the code) due to conversion from
bytes to unicode...(I'm still confused about the extra slashes)
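
One plausible mechanism for both the b'...' wrapper and the doubled slashes is str() being applied to a bytes object, which in Python 3 returns its repr instead of decoding it. This is an assumption; loadtxt itself has not been checked here, and the path below is only illustrative:

```python
raw = b"C:\\Users\\mytextfile1.txt"   # single backslashes in the data

as_repr = str(raw)              # repr of the bytes object, not a decode
as_text = raw.decode("ascii")   # an explicit decode gives the real text

print(as_repr)
print(as_text)
```

The repr keeps the b'' wrapper and escapes each backslash, which would account for the extra slashes.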

And this is ascii text -- it gets worse if there is non-ascii text in there.

Anyway, the truth is, this stuff is hard, but it will get at least a touch
easier with PEP 461.

[though to be truthful, I'm not sure why someone put a comment in the issue
tracker about b'%d'%some_num being an issue ... I'm not sure how it applies
when we're going from text to numbers, not the other way around...]

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[email protected]

Re: [Python-Dev] PEP 461 updates

2014-01-17 Thread Eric V. Smith

On 1/17/2014 4:37 PM, Chris Barker wrote:
> For the record, we've got a pretty good thread (not this good, though!)
> over on the numpy list about how to untangle the mess that has resulted
> from porting text-file-parsing code to py3 (and the underlying issue
> with the 'S' data type in numpy...)
> 
> One note from the github issue:
> """
>  The use of asbytes originates only from the fact that b'%d' % (20,)
> does not work.
> """
> 
> So yeah PEP 461! (even if too late for numpy...)

Would they use "(u'%d' % (20,)).encode('ascii')" for that? Just curious
as to what they're planning on doing.
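
For comparison, a sketch of that workaround next to the spelling PEP 461 enables. The helper name int_to_ascii is hypothetical, and the second form assumes Python 3.5 or later:

```python
def int_to_ascii(n):
    # Format as text, then encode; this works on any Python 3.
    return ("%d" % (n,)).encode("ascii")


via_text = int_to_ascii(20)
via_bytes = b"%d" % (20,)   # direct bytes formatting

print(via_text, via_bytes)
```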

Eric.


Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Steven D'Aprano

On Fri, Jan 17, 2014 at 08:49:21AM -0800, Ethan Furman wrote:

> Overriding Principles
> =====================
> 
> In order to avoid the problems of auto-conversion and Unicode exceptions
> that could plague Py2 code, all object checking will be done by
> duck-typing, not by values contained in a Unicode representation [3]_.

I don't understand this paragraph. What does "values contained in a 
Unicode representation" mean?

[...]
> %s is restricted in what it will accept::
> 
>   - input type supports Py_buffer?
> use it to collect the necessary bytes

Can you give some examples of what types support Py_buffer? Presumably 
bytes. Anything else?

>   - input type is something else?
> use its __bytes__ method; if there isn't one, raise a TypeError

I think you should explicitly state that this is a new special method, 
and state which built-in types will grow a __bytes__ method (if any).

> Numeric Format Codes
> 
> 
> To properly handle int and float subclasses, int(), index(), and float()
> will be called on the objects intended for (d, i, u), (b, o, x, X), and
> (e, E, f, F, g, G).

-1 on this idea.

This is a rather large violation of the principle of least surprise, and 
radically different from the behaviour of Python 3 str. In Python 3, 
'%d' interpolation calls the __str__ method, so if you subclass, you can 
get the behaviour you want:

py> class HexInt(int):
... def __str__(self):
... return hex(self)
...
py> "%d" % HexInt(23)
'0x17'

which is exactly what we should expect from a subclass.

You're suggesting that bytes should ignore any custom display 
implemented by subclasses, and implicitly coerce them to the superclass 
int. What is the justification for this? You don't define or even 
describe what you consider "properly handle".

> Unsupported codes
> -----------------
> 
> %r (which calls __repr__), and %a (which calls ascii() on __repr__) are not
> supported.

+1 on not supporting b'%r' (i.e. I agree with the PEP).

Why not support b'%a'? That seems to be a strange thing to prohibit.

Everything else, well done and thank you.

-- 
Steven

Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Ethan Furman


On 01/17/2014 05:27 PM, Steven D'Aprano wrote:

> On Fri, Jan 17, 2014 at 08:49:21AM -0800, Ethan Furman wrote:
>
>> Overriding Principles
>> =====================
>>
>> In order to avoid the problems of auto-conversion and Unicode
>> exceptions that could plague Py2 code, all object checking will
>> be done by duck-typing, not by values contained in a Unicode
>> representation [3]_.
>
> I don't understand this paragraph. What does "values contained in a
> Unicode representation" mean?


Yeah, that is clunky.  I'm trying to convey the idea that we don't want
errors based on content, i.e. which characters happen to be in a str.




> [...]
>> %s is restricted in what it will accept::
>>
>>   - input type supports Py_buffer?
>>     use it to collect the necessary bytes
>
> Can you give some examples of what types support Py_buffer? Presumably
> bytes. Anything else?


Anybody?  Otherwise I'll go spelunking in the code.



>>   - input type is something else?
>>     use its __bytes__ method; if there isn't one, raise a TypeError
>
> I think you should explicitly state that this is a new special method,
> and state which built-in types will grow a __bytes__ method (if any).


It's not new.  I know bytes, str, and numbers /do not/ have __bytes__.
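
A small sketch of the existing __bytes__ protocol and how %s picks it up, assuming Python 3.5 or later for the formatting line. The Header class is invented for the example:

```python
class Header:
    """Hypothetical type that knows its own wire representation."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __bytes__(self):
        # Called by bytes(obj), and by b'%s' when there is no buffer support.
        return self.name + b": " + self.value


h = Header(b"Host", b"example.org")
print(bytes(h))
print(b"%s\r\n" % h)
print(hasattr(str, "__bytes__"))   # str does not define __bytes__
```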



>> Numeric Format Codes
>>
>> To properly handle int and float subclasses, int(), index(), and float()
>> will be called on the objects intended for (d, i, u), (b, o, x, X), and
>> (e, E, f, F, g, G).
>
> -1 on this idea.
>
> This is a rather large violation of the principle of least surprise, and
> radically different from the behaviour of Python 3 str. In Python 3,
> '%d' interpolation calls the __str__ method, so if you subclass, you can
> get the behaviour you want:


Did you read the bug reports I linked to?  This behavior (which is a bug)
has already been fixed for Python 3.4.

As a quick thought experiment, why does "%d" % True return "1"?
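
The thought experiment can be run directly. A sketch, assuming Python 3.4 or later for the str lines and 3.5 or later for the bytes line; HexInt is the subclass from the quoted message:

```python
class HexInt(int):
    def __str__(self):
        return hex(self)


bool_pair = (str(True), "%d" % True)
hexint_pair = (str(HexInt(23)), "%d" % HexInt(23))
hexint_bytes = b"%d" % HexInt(23)

# A numeric code formats the numeric value; a custom __str__ is not used.
print(bool_pair)
print(hexint_pair)
print(hexint_bytes)
```

bool is itself an int subclass with its own __str__, so it is the same situation as HexInt.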



>> Unsupported codes
>> -----------------
>>
>> %r (which calls __repr__), and %a (which calls ascii() on __repr__) are not
>> supported.
>
> +1 on not supporting b'%r' (i.e. I agree with the PEP).
>
> Why not support b'%a'? That seems to be a strange thing to prohibit.


I'll admit to being somewhat on the fence about %a.

It seems there are two possibilities with %a:

  1) have it be ascii(repr(obj))

  2) have it be str(obj).encode('ascii', 'strict')

(1) seems only useful for debugging, but even then not very -- if you switch
from %s to %a you'll no longer see the bytes output (although you would get
the name of the object, which could be handy);

(2) is (slightly) blurring the lines between text and encoded-ascii;  I would
rather see "%s" % text.encode('ascii', 'strict')


So we have two possibilities, both can be useful, I don't know which is most 
useful or even most logical.

So I guess I'm still open to arguments.  :)
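
To make the two options concrete, a sketch using plain str operations only, so it does not depend on how %a ends up being defined:

```python
text = "caf\u00e9"

# Option 1: ascii(obj) escapes non-ASCII characters, so it never fails.
option1 = ascii(text).encode("ascii")

# Option 2: str(obj) encoded strictly; fails on non-ASCII content.
try:
    option2 = str(text).encode("ascii", "strict")
except UnicodeEncodeError:
    option2 = None

print(option1)
print(option2)
```

Option 1 always succeeds but changes the data (quotes and escapes are added); option 2 preserves the data but depends on which characters happen to be in the str.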



> Everything else, well done and thank you.


You're welcome!  Thank you to everyone who participated.

--
~Ethan~

Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Chris Angelico

On Sat, Jan 18, 2014 at 12:51 PM, Ethan Furman  wrote:
> It seems there are two possibilities with %a:
>
>   1) have it be ascii(repr(obj))

Wouldn't that be redundant? ascii() is already repr()-like.

ChrisA

Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Ethan Furman


On 01/17/2014 06:03 PM, Chris Angelico wrote:
> On Sat, Jan 18, 2014 at 12:51 PM, Ethan Furman  wrote:
>> It seems there are two possibilities with %a:
>>
>>   1) have it be ascii(repr(obj))
>
> Wouldn't that be redundant? ascii() is already repr()-like.

Good point.

--
~Ethan~

Re: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes

2014-01-17 Thread Nick Coghlan

On 18 Jan 2014 06:19, "Terry Reedy"  wrote:
>
> On 1/17/2014 10:15 AM, Mark Lawrence wrote:
>
>> For both options 1 and 2 surely you cannot be suggesting that after
>> people have written 2.x code to use format() as %f formatting is to be
>> deprecated,
>
>
> It will not be for at least a decade.

It will not be deprecated, period. Originally, we thought that the
introduction of the new flexible text formatting system made printf-style
formatting redundant.

After running both in parallel for a while, we learned we were wrong:

- it's far more difficult than we originally anticipated to migrate away
from it to the new text formatting system
- in particular, the lazy interpolation support in the logging module (and
similar systems) has no reasonable migration path
- two different core interpolation systems make it much easier to
interpolate into format strings
- it's a better fit for code which needs to semantically align with C
- it's a useful micro-optimisation
- as the current discussion shows, it's much better suited to the
interpolation of ASCII compatible segments in binary data formats
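
The lazy interpolation point can be seen in a few lines. A sketch using only the standard logging module; the logger name and the Expensive class are made up:

```python
import io
import logging


class Expensive:
    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive"


stream = io.StringIO()
logger = logging.getLogger("pep461.sketch")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.WARNING)
logger.propagate = False

# Filtered out by level: the %s is never interpolated, __str__ never runs.
logger.debug("value: %s", Expensive())
calls_after_debug = Expensive.calls

# Emitted: interpolation happens only now, inside the handler.
logger.warning("value: %s", Expensive())

print(calls_after_debug, Expensive.calls, repr(stream.getvalue()))
```

Because the message template and its arguments are passed separately, the logging API is tied to %-style templates, which is why there is no easy migration to str.format() here.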

Do many of the core devs strongly prefer the new formatting system? Yes.
Were we originally planning to deprecate and remove the printf-style
formatting system? Yes. Are there still any plans to do so? No. That's why
we rewrote the relevant docs to always describe it as "mod formatting" or
"printf-style formatting", rather than "legacy" or "old-style". If there
are any instances (or even implications) of the latter left in the official
docs, that's a bug to be fixed.

Perhaps this needs to be a new Q in my Python 3 Q&A, since a lot of people
still seem to have the wrong idea...

Regards,
Nick.

>
>
>> they now have to change the code back to the way they may
>> well have written it in the first place?
>
>
> I would suggest that people simply .encode the result if bytes are needed
> in 3.x as well as 2.x. Polyglot code will likely have a 'py3' boolean
> already to make the encoding conditional.
>
> --
> Terry Jan Reedy
>
>

Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Nick Coghlan

+1 on the technical spec from me. The rationale needs work, but you already
know that :)

For API consistency, I suggest explicitly noting that bytearray will also
support the operation, generating a bytearray result.
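
A quick sketch of that consistency point, assuming Python 3.5 or later:

```python
template = bytearray(b"%s=%d")

# Formatting a bytearray template yields a bytearray ...
result = template % (b"count", 3)
# ... while a bytes template yields bytes.
plain = b"%s=%d" % (b"count", 3)

print(type(result).__name__, result)
print(type(plain).__name__, plain)
```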

I also suggest introducing the phrase "ASCII compatible segments in binary
formats" somewhere, as the intended use case for *all* the ASCII assuming
methods on the bytes and bytearray types, including this new one.

Cheers,
Nick.

Re: [Python-Dev] Migration from Python 2.7 and bytes formatting

2014-01-17 Thread Stephen J. Turnbull

Neil Schemenauer writes:

 > I'd like to try porting code making use of the -2 feature to see how
 > helpful it is.  The behavior is partway between Python 2.x laziness
 > and Python 3.x strictness in terms of specifying encodings.
 > 
 > Python 2.x:
 > [...]
 > Python 3.x:
 > [...]

The above are descriptions of current behavior (ie, unchanged by PEPs
460, 461), and this:

 > Python 3.x with -2 command-line option:
 > 
 > - when objects are formatted into bytes, immediately
 >   encode them using strict ASCII encoding.

is the content of this proposal, is that right?


Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Stephen J. Turnbull

Nick Coghlan writes:

 > I also suggest introducing the phrase "ASCII compatible
 > segments in binary formats" somewhere,

What is the use case for "ASCII *compatible* segments"?  Can't you
just say "ASCII segments"?

I'm not sure exactly what PEP 461 says at this point, but most of the
discussion prescribes .encode('ascii', errors='strict') for implicit
interpolation of str.  "ASCII compatible" is a term that people
consistently interpret to include the bytes representation of their
data.  Although the actual rule isn't terribly complex (bytes 0-127
must always have ASCII coded character semantics[1]), AFAIK there are
no use cases for that other than encoded text, ie, interpolating str,
and nobody wants that done leniently in Python 3.

Footnotes: 
[1]  Otherwise you need to analyze the content of data to determine
whether "ASCII-compatible" operations are safe to perform.  Of course
that's possible but it was repeatedly rejected in favor of duck-typing.
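
The rule that bytes 0-127 must always carry ASCII semantics can be illustrated with two encodings of the same text. A sketch; the sample string is arbitrary:

```python
sample = "caf\u00e9 a/b"
utf8 = sample.encode("utf-8")
utf16 = sample.encode("utf-16-le")

# In UTF-8 every byte below 128 is the ASCII character it looks like,
# so splitting on an ASCII delimiter is safe.
utf8_parts = utf8.split(b"/")
# In UTF-16 the same split cuts a two-byte code unit in half.
utf16_parts = utf16.split(b"/")

print([p.decode("utf-8") for p in utf8_parts])
print([len(p) for p in utf16_parts])
```

An odd length in the UTF-16 pieces means a code unit was split, so those pieces no longer decode cleanly.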


Re: [Python-Dev] PEP 461 Final?

2014-01-17 Thread Stefan Behnel

Steven D'Aprano, 18.01.2014 02:27:
> On Fri, Jan 17, 2014 at 08:49:21AM -0800, Ethan Furman wrote:
>> %s is restricted in what it will accept::
>>
>>   - input type supports Py_buffer?
>> use it to collect the necessary bytes
> 
> Can you give some examples of what types support Py_buffer? Presumably 
> bytes. Anything else?

Lots of things: bytes, bytearray, memoryview, array.array, NumPy arrays,
just to name a few.

Basically anything that wants itself to be representable as a chunk of
memory with metadata. It's a very common thing in the Big Data department
(although many people wouldn't know that they're actually heavy users of
this protocol because they just use NumPy and/or Cython and don't look
under the hood).
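
From Python code, a rough test for buffer support is whether memoryview() accepts the object. A sketch restricted to the standard library (NumPy arrays are left out so it runs without extra packages); the last line assumes Python 3.5 or later, and the helper name supports_buffer is made up:

```python
import array

candidates = [b"hi", bytearray(b"hi"), memoryview(b"hi"),
              array.array("B", [104, 105]), "hi", 3.14]


def supports_buffer(obj):
    # memoryview() raises TypeError for objects without the buffer protocol.
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True


for obj in candidates:
    print(type(obj).__name__, supports_buffer(obj))

# %s copies the raw buffer contents into the result.
print(b"%s" % array.array("B", [104, 105]))
```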

Stefan

