Hi Peter Otten
re:
There is no assignment
soup_atag = whatever
but there is one to atag. The whole session should work when you omit the
offending line
> atag = soup_atag.a
or insert
soup_atag = soup
before it.
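
A minimal sketch of that fix, assuming a stand-in object rather than a real
BeautifulSoup parse (SimpleNamespace and the attribute value are hypothetical
placeholders, not taken from the original session). It only shows that reading
a name before anything has been assigned to it raises NameError, and that
binding soup_atag first makes the attribute lookup succeed:

```python
from types import SimpleNamespace

# Hypothetical stand-in for the parsed document; a real session would
# hold a BeautifulSoup object here instead.
soup = SimpleNamespace(a="first-link")

try:
    atag = soup_atag.a  # soup_atag has not been assigned yet
except NameError as exc:
    error = str(exc)  # the message names the missing variable

# Either drop the offending line, or bind the name before using it:
soup_atag = soup
atag = soup_atag.a
```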
On Tue, Nov 25, 2014 at 10:56 PM, Steven D'Aprano
wrote:
> I think this conversation is going nowhere, so it's probably best to end it.
\0
ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Marko Rauhamaa wrote:
> Steven D'Aprano :
>
>> Marko Rauhamaa wrote:
>>
>>>> Py3's byte strings are still strings, though.
>>>
>>> Hm. I don't think so. In a plain English sense, maybe, but that kind of
>>> usage can lead to confusion.
>>
>> Only if you are determined to confuse yourself.
>>
>>
Steven D'Aprano :
> Marko Rauhamaa wrote:
>
>>> Py3's byte strings are still strings, though.
>>
>> Hm. I don't think so. In a plain English sense, maybe, but that kind of
>> usage can lead to confusion.
>
> Only if you are determined to confuse yourself.
>
> [...]
>
> In Python usage, "string" a
On Tue, Nov 25, 2014 at 9:56 AM, Steven D'Aprano
wrote:
> In all cases apart from an explicit "byte string", the word "string" is
> always used for the native array-of-characters type delimited by plain
> quotation marks, as used for error messages, user prompts, etc., regardless
> whether the imp
Marko Rauhamaa wrote:
>> Py3's byte strings are still strings, though.
>
> Hm. I don't think so. In a plain English sense, maybe, but that kind of
> usage can lead to confusion.
Only if you are determined to confuse yourself.
People are quite capable of interpreting correctly sentences like:
"
Chris Angelico :
> Py3's byte strings are still strings, though.
Hm. I don't think so. In a plain English sense, maybe, but that kind of
usage can lead to confusion.
For example,
A subscription selects an item of a sequence (string, tuple or list)
or mapping (dictionary) object:
subsc
On Mon, Nov 24, 2014 at 5:57 PM, Marko Rauhamaa wrote:
> Yes, people call strings "Unicode strings" because Python2 *did have*
> unicode strings separate from regular strings:
>
>     Python2          Python3
>     ------------------------
>     string           bytes (byte strin
Gregory Ewing :
> Marko Rauhamaa wrote:
>> Unicode strings is not wrong but the technical emphasis on Unicode is as
>> strange as a "tire car" or "rectangular door" when "car" and "door" are
>> what you usually mean.
>
> The reason Unicode gets emphasised so much is that until relatively
> recently
On Sun, Nov 23, 2014, at 15:31, Dave Angel wrote:
> I didn't realize Windows shell (DOS box) had that bug. Course I don't
> use Windows much the last few years.
>
> it's one thing to not display it properly. It's quite another to supply
> faulty data to the clipboard. Especially since the Win
On Mon, Nov 24, 2014 at 9:51 AM, Gregory Ewing
wrote:
> Marko Rauhamaa wrote:
>>
>> Unicode strings is not wrong but the technical emphasis on Unicode is as
>> strange as a "tire car" or "rectangular door" when "car" and "door" are
>> what you usually mean.
>
>
> The reason Unicode gets emphasised
Marko Rauhamaa wrote:
> Unicode strings is not wrong but the technical emphasis on Unicode is as
> strange as a "tire car" or "rectangular door" when "car" and "door" are
> what you usually mean.
The reason Unicode gets emphasised so much is that
until relatively recently, it *wasn't* what "string"
u
On Mon, Nov 24, 2014 at 7:31 AM, Dave Angel wrote:
> On 11/23/2014 01:13 PM, random...@fastmail.us wrote:
>>
>> On Sun, Nov 23, 2014, at 11:33, Dennis Lee Bieber wrote:
>>>
>>> Why would that be possible? Many truetype fonts only supply
>>> glyphs for
>>> single-byte encodings (ISO-Latin-1
On 11/23/2014 01:13 PM, random...@fastmail.us wrote:
> On Sun, Nov 23, 2014, at 11:33, Dennis Lee Bieber wrote:
>> Why would that be possible? Many truetype fonts only supply glyphs for
>> single-byte encodings (ISO-Latin-1, for example -- pop up the Windows
>> character map utility and see what so
On Sun, Nov 23, 2014, at 11:33, Dennis Lee Bieber wrote:
> Why would that be possible? Many truetype fonts only supply glyphs for
> single-byte encodings (ISO-Latin-1, for example -- pop up the Windows
> character map utility and see what some of the font files contain.
With a bitmap font se
On Mon, Nov 24, 2014 at 3:33 AM, Dennis Lee Bieber
wrote:
> On Sat, 22 Nov 2014 20:52:37 -0500, random...@fastmail.us declaimed the
> following:
>
>>On Sat, Nov 22, 2014, at 18:38, Mark Lawrence wrote:
>>> ...
>>> That is a standard Windows build. He is again conflating problems with
>>> using the
On Sun, Nov 23, 2014 at 5:17 PM, Steven D'Aprano
wrote:
> If Python treated the character set as an implementation detail, the
> programmer would have no way of knowing whether
>
> s = u"ö"
>
> is legal or not, since you cannot know whether or not ö is a supported
> character in the running Python
random...@fastmail.us wrote:
> On Fri, Nov 21, 2014, at 23:38, Steven D'Aprano wrote:
>> I really don't understand what bothers you about this. In Python, we have
>> Unicode strings and byte strings. In computing in general, strings can
>> consist of Unicode characters, ASCII characters, Tron char
On Sat, Nov 22, 2014, at 21:11, Chris Angelico wrote:
> Is that true? Does WriteConsoleW support every Unicode character? It's
> not obvious from the docs whether it uses UCS-2 or UTF-16 (or maybe
> something else).
I was defining "every unicode character" loosely. There are certainly
display prob
On Sun, Nov 23, 2014 at 12:52 PM, wrote:
> On Sat, Nov 22, 2014, at 18:38, Mark Lawrence wrote:
>> ...
>> That is a standard Windows build. He is again conflating problems with
>> using the Windows command line for a given code page with the FSR.
>
> The thing is, with a truetype font selected, a
On Sat, Nov 22, 2014, at 18:38, Mark Lawrence wrote:
> ...
> That is a standard Windows build. He is again conflating problems with
> using the Windows command line for a given code page with the FSR.
The thing is, with a truetype font selected, a correctly written win32
console problem should be
On Fri, Nov 21, 2014, at 23:38, Steven D'Aprano wrote:
> I really don't understand what bothers you about this. In Python, we have
> Unicode strings and byte strings. In computing in general, strings can
> consist of Unicode characters, ASCII characters, Tron characters, EBCDIC
> characters, ISO-88
On 22/11/2014 22:31, Chris Angelico wrote:
> On Sun, Nov 23, 2014 at 9:04 AM, Mark Lawrence wrote:
>> My favourite "find thousand and one ways to make Python crashing or
>> failing." but I don't recall a single bug report in the last two years from
>> anybody regarding problems with the FSR, or have I mis
On Sun, Nov 23, 2014 at 9:04 AM, Mark Lawrence wrote:
> My favourite "find thousand and one ways to make Python crashing or
> failing." but I don't recall a single bug report in the last two years from
> anybody regarding problems with the FSR, or have I missed something?
What you've missed is th
On 22/11/2014 20:17, Chris Angelico wrote:
> On Sun, Nov 23, 2014 at 5:17 AM, Mark Lawrence wrote:
>> Please don't feed him. Your average troll is bad enough but he really takes
>> the biscuit.
> ... someone was feeding him biscuits?
> ChrisA
Surely it's better than feeding him unicode?
As I needed
On Sun, Nov 23, 2014 at 5:17 AM, Mark Lawrence wrote:
> Please don't feed him. Your average troll is bad enough but he really takes
> the biscuit.
... someone was feeding him biscuits?
ChrisA
On 22/11/2014 17:49, Marko Rauhamaa wrote:
> wxjmfa...@gmail.com:
>> - By chance, I found on the web a German py dev who was commenting and
>> he had not an updated "DUDEN" (a German dictionary).
> That... leaves me utterly speechless!
> Marko
Please don't feed him. Your average troll is bad enoug
wxjmfa...@gmail.com:
> - By chance, I found on the web a German py dev who was commenting and
> he had not an updated "DUDEN" (a German dictionary).
That... leaves me utterly speechless!
Marko
On Saturday, November 22, 2014 8:14:15 PM UTC+5:30, Roy Smith wrote:
> Marko Rauhamaa wrote:
>
> > Steven D'Aprano:
> >
> > > You haven't given any good reason for objecting to calling Unicode
> > > strings by what they are. Maybe you think that it is an implementation
> > > detail, and that som
Roy Smith :
> For that matter, we will eventually get to the point where when people
> say, "just plain text", they will mean Unicode, in the same way that
> "just plain text" today really means ASCII (and the text/plain MIME
> type will become a historical curiosity).
MIME has:
Content-Type:
In article <87y4r348uf@elektro.pacujo.net>,
Marko Rauhamaa wrote:
> Steven D'Aprano :
>
> > You haven't given any good reason for objecting to calling Unicode
> > strings by what they are. Maybe you think that it is an implementation
> > detail, and that some version of Python might suddenl
Steven D'Aprano :
> You haven't given any good reason for objecting to calling Unicode
> strings by what they are. Maybe you think that it is an implementation
> detail, and that some version of Python might suddenly and without
> warning change to only supporting KOI8-R strings or GB2312 strings?
On Sun, Nov 23, 2014 at 12:50 AM, Steven D'Aprano
wrote:
> "Tire car" makes no sense. "Rectangular door" makes perfect sense, and in a
> world where there are dozens of legacy non-rectangular doors, it would be
> very sensible to specify the kind of door. Just as we specify sliding door,
> glass d
Marko Rauhamaa wrote:
> Steven D'Aprano :
>
>> In Python, we have Unicode strings and byte strings.
>
> No, you don't. You have strings and bytes:
Python has strings of Unicode code points, a.k.a. "Unicode strings",
or "text strings", and strings of bytes, a.k.a. "byte strings". These are
the p
Steven D'Aprano :
> In Python, we have Unicode strings and byte strings.
No, you don't. You have strings and bytes:
Textual data in Python is handled with str objects, or strings.
Strings are immutable sequences of Unicode code points. String
literals are written in a variety of ways: [...
Marko Rauhamaa wrote:
> Rustom Mody :
>
>> Likewise in 2014, and given the arguments, inconsistencies, etc
>> remembering the nuts-n-bolts below the strings-represented-as-unicode
>> abstraction may be in order.
>
> No need to hide Unicode, but talking about a
>
>Unicode string
>
> is like
On Sat, Nov 22, 2014 at 3:36 AM, Marko Rauhamaa wrote:
> No need to hide Unicode, but talking about a
>
>Unicode string
>
> is like talking about an
>
>electronic computer
>
>visible spectrum display
>
>mouse user interface
>
>ethernet socket
>
>magnetic file
>
>electri
Rustom Mody :
> Likewise in 2014, and given the arguments, inconsistencies, etc
> remembering the nuts-n-bolts below the strings-represented-as-unicode
> abstraction may be in order.
No need to hide Unicode, but talking about a
Unicode string
is like talking about an
electronic computer
On Sat, Nov 22, 2014 at 3:11 AM, Francis Moreau wrote:
> Yes I finally used str() since only setlocale() reported to have some
> issues with unicode_literals active in my application.
>
> Thanks Chris for your useful insight.
My pleasure. Unicode is a bit of a hobby-horse of mine, so I'm always
ha
On 11/20/2014 04:15 PM, Chris Angelico wrote:
> On Fri, Nov 21, 2014 at 1:14 AM, Francis Moreau
> wrote:
>> Hi,
>>
>> Thanks for the "from __future__ import unicode_literals" trick, it makes
>> that switch much less intrusive.
>>
>> However it seems that I will suddenly be trapped by all modules
On Friday, November 21, 2014 12:06:54 PM UTC+5:30, Marko Rauhamaa wrote:
> Chris Angelico :
>
> > On Fri, Nov 21, 2014 at 5:56 AM, Marko Rauhamaa wrote:
> >> I don't really like it how Unicode is equated with text, or even
> >> character strings.
> > [...]
> > Do you have actual text that you're
On 2014-11-22 02:23, Steven D'Aprano wrote:
> LATIN SMALL LETTER E
> COMBINING CIRCUMFLEX ACCENT
>
> then my application should treat that as a single "character" and
> display it as:
>
> LATIN SMALL LETTER E WITH CIRCUMFLEX
>
> which looks like this: ê
>
> rather than two distinct "characters"
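
For what it's worth, the stdlib unicodedata module can do this composition.
A small sketch, assuming Python 3 and NFC normalization as the way to obtain
the single precomposed code point (the variable names are illustrative only):

```python
import unicodedata

# LATIN SMALL LETTER E followed by COMBINING CIRCUMFLEX ACCENT
decomposed = "e\u0302"

# NFC composes the pair into one precomposed code point where one exists.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # two code points before, one after
print(unicodedata.name(composed))
```

Both forms should render the same way; only the number of code points in the
string differs.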
On Sat, Nov 22, 2014 at 2:23 AM, Steven D'Aprano
wrote:
> Chris Angelico wrote:
>
>> On Fri, Nov 21, 2014 at 11:32 AM, Steven D'Aprano
>> wrote:
>>> (E.g. there are millions of existing files across the world containing
>>> text which use legacy encodings that are not compatible with Unicode.)
>>
Chris Angelico wrote:
> On Fri, Nov 21, 2014 at 11:32 AM, Steven D'Aprano
> wrote:
>> (E.g. there are millions of existing files across the world containing
>> text which use legacy encodings that are not compatible with Unicode.)
>
> Not compatible with Unicode? There aren't many character sets
On Fri, Nov 21, 2014 at 7:16 PM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> Then you need to read more about Unicode. The *codepoint* for the
>> letter 'A' is 65. That is not Unicode, that is one part of the Unicode
>> spec.
>
> I don't think Python users need to know anything more about Unicod
Chris Angelico :
> Then you need to read more about Unicode. The *codepoint* for the
> letter 'A' is 65. That is not Unicode, that is one part of the Unicode
> spec.
I don't think Python users need to know anything more about Unicode than
they need to know about IEEE-754.
How many bits are reser
On Fri, Nov 21, 2014 at 6:14 PM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> On Fri, Nov 21, 2014 at 5:36 PM, Marko Rauhamaa wrote:
>>> I'm saying equating an abstract data type (string) with its
>>> representation (Unicode vector) is bad taste.
>>
>> What about "sequence of Unicode code points
Chris Angelico :
> On Fri, Nov 21, 2014 at 5:36 PM, Marko Rauhamaa wrote:
>> I'm saying equating an abstract data type (string) with its
>> representation (Unicode vector) is bad taste.
>
> What about "sequence of Unicode code points" is "representation"? What
> is your abstraction over that?
Th
On Fri, Nov 21, 2014 at 5:36 PM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> On Fri, Nov 21, 2014 at 5:56 AM, Marko Rauhamaa wrote:
>>> I don't really like it how Unicode is equated with text, or even
>>> character strings.
>> [...]
>> Do you have actual text that you're unable to represent in
Chris Angelico :
> On Fri, Nov 21, 2014 at 5:56 AM, Marko Rauhamaa wrote:
>> I don't really like it how Unicode is equated with text, or even
>> character strings.
> [...]
> Do you have actual text that you're unable to represent in Unicode?
Not my point at all.
I'm saying equating an abstract
On Fri, Nov 21, 2014 at 12:31 PM, wrote:
> On Thu, Nov 20, 2014, at 20:10, Chris Angelico wrote:
>> 2) Languages which use a different alphabet (eg Cyrillic - Russian,
>> Bulgarian). You could possibly cram them into an eight-bit encoding
>> without tipping ASCII out, but I'm not sure. In Unicode
On Thu, Nov 20, 2014, at 20:10, Chris Angelico wrote:
> 2) Languages which use a different alphabet (eg Cyrillic - Russian,
> Bulgarian). You could possibly cram them into an eight-bit encoding
> without tipping ASCII out, but I'm not sure. In Unicode, these
> languages are all easily supported by
On Fri, Nov 21, 2014 at 11:32 AM, Steven D'Aprano
wrote:
> (E.g. there are millions of existing files across the world containing text
> which use legacy encodings that are not compatible with Unicode.)
Not compatible with Unicode? There aren't many character sets out
there that include character
Marko Rauhamaa wrote:
> Michael Torrie :
>
>> Unicode can only be encoded to bytes.
>> Bytes can only be decoded to unicode.
>
> I don't really like it how Unicode is equated with text, or even
> character strings.
That surely depends on the context. To be technically correct, Unicode is a
char
On Fri, Nov 21, 2014 at 5:56 AM, Marko Rauhamaa wrote:
> Michael Torrie :
>
>> Unicode can only be encoded to bytes.
>> Bytes can only be decoded to unicode.
>
> I don't really like it how Unicode is equated with text, or even
> character strings.
>
> There's barely any difference between the trut
On Fri, Nov 21, 2014 at 4:42 AM, wrote:
> On Thu, Nov 20, 2014, at 09:59, Chris Angelico wrote:
>>
>> Why should it encode to bytes?
>
> Because a bytes format string suggests a bytes result. Why does unicode
> always "win", rather than the type of the format string always winning?
For the same
On Thu, Nov 20, 2014, at 16:29, Ethan Furman wrote:
> If your unicode string happens to contain a base64 encoded .png, then you
> could decode that into bytes. ;)
Bytes of the PNG, or of the raw pixels?
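
On the question of which bytes: base64 only round-trips whatever bytes were
fed into it, so for an encoded .png file you get the file's own bytes back
(signature, chunks and all); turning those into pixels would need an image
decoder on top. A rough Python 3 sketch, with a made-up payload standing in
for a real PNG file:

```python
import base64

# Every PNG file starts with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Hypothetical payload: a real file would carry IHDR/IDAT/IEND chunks here.
fake_png_file = PNG_SIGNATURE + b"placeholder, not real chunks"

# The base64 form is plain ASCII, so it fits in a text (unicode) string.
text = base64.b64encode(fake_png_file).decode("ascii")

# Decoding the text yields the original file bytes, not pixel data.
file_bytes = base64.b64decode(text)
```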
Ethan Furman :
> If your unicode string happens to contain a base64 encoded .png, then
> you could decode that into bytes. ;)
You could embed your PNG file in XML in binary form as CDATA. Then, your
"characters" would represent 8- or 16-bit integers. You just need to
replace all accidental occurr
On 11/20/2014 07:53 AM, Chris Angelico wrote:
> On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten <__pete...@web.de> wrote:
>> I think that you may get a Unicode/Encode/Error when you try to /decode/ a
>> unicode string is more confusing...
>
> Hang on a minute, what does it even mean to decode a Unico
On 20/11/2014 18:06, Ian Kelly wrote:
> On Thu, Nov 20, 2014 at 10:42 AM, wrote:
>> and it means you can't safely
>> blindly use %s with an unknown object.
> You can't safely do this anyway. Whether it's %s with a str and a
> unicode, or %s with a unicode and a str, *something* is going to have
> to be im
Michael Torrie :
> Unicode can only be encoded to bytes.
> Bytes can only be decoded to unicode.
I don't really like it how Unicode is equated with text, or even
character strings.
There's barely any difference between the truth value of these
statements:
Python strings are ASCII.
Python
random...@fastmail.us wrote:
> On Thu, Nov 20, 2014, at 09:59, Chris Angelico wrote:
>> On Fri, Nov 21, 2014 at 12:59 AM, wrote:
>> > On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote:
>> >> >>> "%s nötig %s" % (u"üblich", u"ähnlich")
>> >> Traceback (most recent call last):
>> >> File "", lin
On Thu, Nov 20, 2014 at 11:06 AM, Ian Kelly wrote:
> On Thu, Nov 20, 2014 at 10:42 AM, wrote:
>> and it means you can't safely
>> blindly use %s with an unknown object.
>
> You can't safely do this anyway. Whether it's %s with a str and a
> unicode, or %s with a unicode and a str, *something* is
On Thu, Nov 20, 2014 at 10:42 AM, wrote:
> and it means you can't safely
> blindly use %s with an unknown object.
You can't safely do this anyway. Whether it's %s with a str and a
unicode, or %s with a unicode and a str, *something* is going to have
to be implicitly encoded or decoded, and if as
On Thu, Nov 20, 2014, at 09:59, Chris Angelico wrote:
> On Fri, Nov 21, 2014 at 12:59 AM, wrote:
> > On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote:
> >> >>> "%s nötig %s" % (u"üblich", u"ähnlich")
> >> Traceback (most recent call last):
> >> File "", line 1, in
> >> UnicodeDecodeError: 'as
Chris Angelico wrote:
> On Fri, Nov 21, 2014 at 3:32 AM, Peter Otten <__pete...@web.de> wrote:
>> Chris Angelico wrote:
>>
>>> On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten <__pete...@web.de> wrote:
>>>> I think that you may get a Unicode/Encode/Error when you try to
>>>> /decode/ a unicode string
On 11/20/2014 09:32 AM, Peter Otten wrote:
> Chris Angelico wrote:
>
>> On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten <__pete...@web.de> wrote:
>>> I think that you may get a Unicode/Encode/Error when you try to /decode/
>>> a unicode string is more confusing...
>>
>> Hang on a minute, what does it
On Fri, Nov 21, 2014 at 3:32 AM, Peter Otten <__pete...@web.de> wrote:
> Chris Angelico wrote:
>
>> On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten <__pete...@web.de> wrote:
>>> I think that you may get a Unicode/Encode/Error when you try to /decode/
>>> a unicode string is more confusing...
>>
>> Han
Chris Angelico wrote:
> On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten <__pete...@web.de> wrote:
>> I think that you may get a Unicode/Encode/Error when you try to /decode/
>> a unicode string is more confusing...
>
> Hang on a minute, what does it even mean to decode a Unicode string?
Let's not g
On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten <__pete...@web.de> wrote:
> I think that you may get a Unicode/Encode/Error when you try to /decode/ a
> unicode string is more confusing...
Hang on a minute, what does it even mean to decode a Unicode string?
That's where the problem is. Fortunately th
random...@fastmail.us wrote:
> On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote:
>> >>> "%s nötig %s" % (u"üblich", u"ähnlich")
>> Traceback (most recent call last):
>> File "", line 1, in
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4:
>> ordinal not in range(128)
>
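
The implicit step behind that traceback can be imitated in Python 3. This is
only a sketch, assuming the Python 2 session held the format string as UTF-8
bytes (which the 0xc3 byte suggests) and that mixing it with unicode arguments
triggered an ASCII decode of those bytes:

```python
# The byte-string form of the format string, as a UTF-8 terminal would supply it.
fmt_bytes = "%s nötig %s".encode("utf-8")

try:
    # Python 2 attempted this decode implicitly before formatting.
    fmt_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    position = exc.start            # index of the first undecodable byte
    bad_byte = fmt_bytes[position]  # first byte of the UTF-8 encoding of "ö"
```

Making the format string itself unicode (u"%s nötig %s" in Python 2) avoids
the implicit decode altogether.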
On Fri, Nov 21, 2014 at 1:14 AM, Francis Moreau wrote:
> Hi,
>
> Thanks for the "from __future__ import unicode_literals" trick, it makes
> that switch much less intrusive.
>
> However it seems that I will suddenly be trapped by all modules which
> are not prepared to handle unicode. For example:
On Fri, Nov 21, 2014 at 12:59 AM, wrote:
> On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote:
>> >>> "%s nötig %s" % (u"üblich", u"ähnlich")
>> Traceback (most recent call last):
>> File "", line 1, in
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4:
>> ordinal not in
Hi,
On 11/20/2014 11:47 AM, Chris Angelico wrote:
> On Thu, Nov 20, 2014 at 8:40 PM, Francis Moreau
> wrote:
>> My question is: how should this be fixed properly ?
>>
>> A simple solution would be to force all strings passed to the
>> logger to be unicode:
>>
>> log.debug(u"%s: %s" % ...)
>>
>
On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote:
> >>> "%s nötig %s" % (u"üblich", u"ähnlich")
> Traceback (most recent call last):
> File "", line 1, in
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4:
> ordinal not in range(128)
This is surprising to me - why is it
On Thu, Nov 20, 2014 at 11:35 PM, Peter Otten <__pete...@web.de> wrote:
> You don't need to change an all-ascii bytestring to unicode.
> Lo and behold:
>
> >>> "%s %s" % (u"üblich", u"ähnlich")
> u'\xfcblich \xe4hnlich'
> >>> u"%s %s" % (u"üblich", u"ähnlich")
> u'\xfcblich \xe4hnlich'
>
> Only non-a
Francis Moreau wrote:
> Hello,
>
> My application is using the gettext module to do the translation
> stuff. Translated messages are unicode on both python 2 and
> 3 (with python2.7 I had to explicitly ask for unicode).
>
> A problem arises when formatting those messages before logging
> them. Fo
On Thu, Nov 20, 2014 at 8:40 PM, Francis Moreau wrote:
> My question is: how should this be fixed properly ?
>
> A simple solution would be to force all strings passed to the
> logger to be unicode:
>
> log.debug(u"%s: %s" % ...)
>
> and more generally force all strings in my code to be unicode b
Hello,
My application is using the gettext module to do the translation
stuff. Translated messages are unicode on both python 2 and
3 (with python2.7 I had to explicitly ask for unicode).
A problem arises when formatting those messages before logging
them. For example:
log.debug("%s: %s" % (hea