On Jun 20, 1:21 am, Steven D'Aprano <steve
+comp.lang.pyt...@pearwood.info> wrote:
> On Mon, 18 Jun 2012 07:00:01 -0700, jmfauth wrote:
> > On 18 juin, 12:11, Steven D'Aprano <steve
> > +comp.lang.pyt...@pearwood.info> wrote:
> >> On Mon, 18 Jun 2012 02:30:50 -0700, jmfauth wrote:
> >> > On 18 juin, 10:28, Benjamin Kaplan <benjamin.kap...@case.edu> wrote:
> >> >> The u prefix is only there to
> >> >> make it easier to port a codebase from Python 2 to Python 3. It
> >> >> doesn't actually do anything.
> >
> >> > It does. I shew it!
> >
> >> Incorrect. You are assuming that Python 3 input eval's the input like
> >> Python 2 does. That is wrong. All you show is that the one-character
> >> string "a" is not equal to the four-character string "u'a'", which is
> >> hardly a surprise. You wouldn't expect the string "3" to equal the
> >> string "int('3')" would you?
> >
> >> --
> >> Steven
> >
> > A string is a string, a "piece of text", period.
> >
> > I do not see why a unicode literal and an (well, I do not know how the
> > call it) a "normal class <str>" should behave differently in code source
> > or as an answer to an input().
>
> They do not. As you showed earlier, in Python 3.3 the literal strings
> u'a' and 'a' have the same meaning: both create a one-character string
> containing the Unicode letter LOWERCASE-A.
>
> Note carefully that the quotation marks are not part of the string. They
> are delimiters. Python 3.3 allows you to create a string by using
> delimiters:
>
>     ' '
>     " "
>     u' '
>     u" "
>
> plus triple-quoted versions of the same. The delimiter is not part of the
> string. They are only there to mark the start and end of the string in
> source code so that Python can tell the difference between the string "a"
> and the variable named "a".
>
> Note carefully that quotation marks can exist inside strings:
>
>     my_string = "This string has 'quotation marks'."
>
> The " at the start and end of the string literal are delimiters, not part
> of the string, but the internal ' characters *are* part of the string.
>
> When you read data from a file, or from the keyboard using input(),
> Python takes the data and returns a string. You don't need to enter
> delimiters, because there is no confusion between a string (all data you
> read) and other programming tokens.
>
> For example:
>
> py> s = input("Enter a string: ")
> Enter a string: 42
> py> print(s, type(s))
> 42 <class 'str'>
>
> Because what I type is automatically a string, I don't need to enclose it
> in quotation marks to distinguish it from the integer 42.
>
> py> s = input("Enter a string: ")
> Enter a string: This string has 'quotation marks'.
> py> print(s, type(s))
> This string has 'quotation marks'. <class 'str'>
>
> What you type is exactly what you get, no more, no less.
>
> If you type 42, you get the two character string "42" and not the int 42.
>
> If you type [1, 2, 3], then you get the nine character string "[1, 2, 3]"
> and not a list containing integers 1, 2 and 3.
>
> If you type 3**0.5 then you get the six character string "3**0.5" and not
> the float 1.7320508075688772.
>
> If you type u'a' then you get the four character string "u'a'" and not
> the single character 'a'.
>
> There is nothing new going on here. The behaviour of input() in Python 3,
> and raw_input() in Python 2, has not changed.
>
> > Should a user write two derived functions?
>
> > input_for_entering_text()
> > and
> > input_if_you_are_entering_a_text_as_litteral()
>
> If you, the programmer, want to force the user to write input in Python
> syntax, then yes, you have to write a function to do so. input() is very
> simple: it just reads strings exactly as typed. It is up to you to
> process those strings however you wish.
>
> --
> Steven

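For what it is worth, the "write a function to do so" suggestion above
could look roughly like the sketch below. The names parse_literal() and
input_literal() are made up for illustration and are not part of any
library. The sketch assumes that ast.literal_eval() is an acceptable
way to interpret Python literal syntax, and that text which is not a
valid literal should simply be returned unchanged.

```python
import ast


def parse_literal(text):
    """Interpret text as a Python literal if possible.

    Hypothetical helper: falls back to returning the raw string when
    the text is not a valid literal (for example a bare word).
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def input_literal(prompt=""):
    """Hypothetical wrapper: read a line, then parse it as a literal."""
    return parse_literal(input(prompt))


# What input() would hand back for each line typed, and what the
# helper makes of it. No keyboard input is needed for this demo.
for typed in ["42", "[1, 2, 3]", "u'a'", "3**0.5", "éléphant"]:
    value = parse_literal(typed)
    print(repr(typed), len(typed), "->", repr(value), type(value).__name__)
```

Plain input() still returns the text exactly as typed; only the wrapper
decides to treat quotation marks and the u prefix as delimiters. An
expression such as 3**0.5 is not a literal, so this sketch leaves it as
a string rather than evaluating it.
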
Python 3.3.0a4 (v3.3.0a4:7c51388a3aa7+, May 31 2012, 20:15:21) [MSC v. 1600 32 bit (Intel)] on win32
>>> --- running smidzero.py...
...smidzero has been executed
>>> --- input(':')
:éléphant
'éléphant'
>>> --- input(':')
:u'éléphant'
'éléphant'
>>> --- input(':')
:u'\u00e9l\xe9phant'
'éléphant'
>>> --- input(':')
:u'\U000000e9léphant'
'éléphant'
>>> --- input(':')
:\U000000e9léphant
'éléphant'
>>> ---
>>> --- # this is expected
>>> --- input(':')
:b'éléphant'
"b'éléphant'"
>>> --- len(input(':'))
:b'éléphant'
11

---

Good news on the ru''/ur'' front:
http://bugs.python.org/issue15096

---

Finally I'm just wondering if this unicode_literal
reintroduction is not a bad idea.

b'these_are_bytes'
u'this_is_a_unicode_string'

I wrote all my Py2 code in a "unicode mode" since ... Py2.3 (?).

jmf
--
http://mail.python.org/mailman/listinfo/python-list