[Python-Dev] Format strings, Unicode, and Py2.7: need clarification

2017-05-17 Thread Craig Rodrigues
Hi,

While cleaning up some code during a Python 2 -> Python 3 port,
I switched some of it to use str.format() and found this behavior:

Python 2.7
==========
a = "%s" % "hi"
b = "%s" % u"hi"
c = u"%s" % "hi"
d = "{}".format("hi")
e = "{}".format(u"hi")
f = u"{}".format("hi")

type(a) == str
type(b) == unicode
type(c) == unicode
type(d) == str
type(e) == str
type(f) == unicode

My intuition would lead me to believe that type(b)
and type(e) would be the same (unicode), but they are not.
The confusion for me is why is type(e) of type str, and not unicode?

Can someone clarify this for me?

I understand that in Python 3, all these cases are str, so it is not
as big a problem there, but I am trying to keep things working on
Python 2.7.
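
For reference, the six cases above can be checked with a short script that runs unchanged on 2.7 and 3.x. This is only a sketch: the names `text_type`, `cases`, and `type_names` are made up for illustration, and `type(u"")` stands in for `unicode` so that the script does not break on Python 3.

```python
# text_type is an illustrative name: unicode on 2.7, str on 3.x.
text_type = type(u"")

cases = {
    "a": "%s" % "hi",
    "b": "%s" % u"hi",
    "c": u"%s" % "hi",
    "d": "{}".format("hi"),
    "e": "{}".format(u"hi"),
    "f": u"{}".format("hi"),
}

# Map each case to the name of its result type.
type_names = dict((name, type(value).__name__) for name, value in cases.items())

for name in sorted(type_names):
    print("%s: %s" % (name, type_names[name]))
```

On 2.7 this should reproduce the list above; on 3.x every line reports str.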

Thanks.
--
Craig
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Format strings, Unicode, and Py2.7: need clarification

2017-05-17 Thread Steven D'Aprano
On Wed, May 17, 2017 at 02:41:29PM -0700, Craig Rodrigues wrote:

> e = "{}".format(u"hi")
[...]
> type(e) == str

> The confusion for me is why is type(e) of type str, and not unicode?

I think that's one of the reasons why the Python 2.7 string model is (1) 
convenient to those using purely ASCII, but (2) ultimately broken.

You can see why it's broken if you do this:

py> "{}".format(u"hiµ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 2: ordinal not in range(128)


So it tries to encode the Unicode string to ASCII, and if that succeeds, 
format returns a byte str. I'm not sure if that was a deliberate design 
choice for format, or just a side-effect of it calling str() on its 
arguments by default.
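
The encoding step can be reproduced directly, without format(). A rough sketch, assuming the implicit conversion behaves like an explicit `.encode("ascii")` (that is what the traceback above suggests, not something confirmed in this thread); `try_ascii` is a made-up helper name.

```python
def try_ascii(text):
    """Return (encoded_bytes, None) on success or (None, error) on failure."""
    try:
        return text.encode("ascii"), None
    except UnicodeEncodeError as exc:
        return None, exc


# Pure-ASCII text encodes cleanly.
ok, err = try_ascii(u"hi")
print(repr(ok), err)

# \xb5 is the micro sign; it has no ASCII encoding.
bad, err = try_ascii(u"hi\xb5")
print(bad, err.reason, err.start)
```

The `start` attribute of the exception is the index of the offending character, which matches the "position 2" in the traceback.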

I'm not sure if I've answered your question or not. Are you looking for 
justification of this misfeature, or an explanation of the historical 
reasons why it exists, or something else?


(If you're looking for the same behaviour in Python 3 and 2.7, probably 
the best thing you can do is just religiously use unicode strings u'' in 
both. You might try:

from __future__ import unicode_literals

in 2.7, but I'm not sure that's enough.)
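
One way to follow the "unicode everywhere" advice is to make every format string a u'' literal, so that the template decides the result type. A small sketch, assuming Python 2.7 or 3.3+ (where the u'' prefix is accepted); `text_type` and `greet` are names invented here.

```python
text_type = type(u"")  # unicode on 2.7, str on 3.x


def greet(name):
    # The template is a unicode literal, so the result is text on both versions.
    return u"hello, {}".format(name)


plain = greet("world")       # native str argument
accented = greet(u"hi\xb5")  # non-ASCII text, no ASCII encode involved

print(isinstance(plain, text_type), isinstance(accented, text_type))
```

Note that on 2.7 a unicode template still decodes byte str arguments as ASCII, so non-ASCII bytes need an explicit .decode() before they are passed in.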


-- 
Steve


Re: [Python-Dev] Format strings, Unicode, and Py2.7: need clarification

2017-05-17 Thread Eric V. Smith

> On May 17, 2017, at 2:41 PM, Craig Rodrigues wrote:
> 
> e = "{}".format(u"hi")
> [...]
> type(e) == str
> 
> The confusion for me is why is type(e) of type str, and not unicode?
> 
> Can someone clarify this for me?

I think it's because I wanted to return str if possible, and didn't want to 
find out that one of the calls to __format__ returned unicode, and then go back 
and convert all of the previous results to unicode from str.

And, I guess we didn't consider it important enough at the time.
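
To illustrate the trade-off described above, here is a toy formatter. It is not CPython's actual code. It assumes a single-pass design in which each field result is coerced to the template's type as soon as it is produced, so nothing already emitted ever has to be revisited. `toy_format` and its arguments are hypothetical.

```python
def toy_format(template_parts, values):
    """Interleave literal parts with formatted values in one pass.

    template_parts: list of literal strings, one more than len(values).
    The result type is fixed up front from the first literal part.
    """
    result_type = type(template_parts[0])
    pieces = []
    for literal, value in zip(template_parts, values):
        pieces.append(literal)
        field = format(value, "")
        if not isinstance(field, result_type):
            # Coerce immediately instead of promoting earlier pieces later.
            # On 2.7, str(u"...") is an implicit ASCII encode.
            field = result_type(field)
        pieces.append(field)
    pieces.append(template_parts[-1])
    return result_type().join(pieces)


print(toy_format(["a=", ", b=", "."], [1, u"x"]))
```

The alternative design would have to track whether any field came back as unicode and, if so, convert every earlier piece, which is the extra work mentioned above.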

Eric. 




Re: [Python-Dev] Format strings, Unicode, and Py2.7: need clarification

2017-05-17 Thread Hobson Lane
Because `.format()` is a method on an instantiated `str` object in e, it
must return the same type so that additional str methods can be chained
after it, like `.format(u'hi').decode()`. The % string interpolation, by
contrast, is a binary operation, so, as with addition, the more general
type can be used for the return value, analogous to `1 + 2.0` returning
a float.
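
The numeric analogy and the two string cases can be put side by side. A quick sketch; `text_type` is an illustrative name, and the comments describe what the thread reports for 2.7.

```python
text_type = type(u"")  # unicode on 2.7, str on 3.x

# Binary operator on mixed numeric types: the result takes the wider type.
total = 1 + 2.0

# Binary % with a unicode argument: the thread reports unicode on 2.7.
interpolated = "%s" % u"hi"

# Method call on a native str template: the result stays a native str.
formatted = "{}".format(u"hi")

print("%s %s %s" % (type(total).__name__,
                    type(interpolated).__name__,
                    type(formatted).__name__))
```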

--Hobson

