On 04Nov2024 13:02, Loris Bennett wrote:
OK, so I can do:
##
if args.verbose:
    for k in mail.keys():
        print(f"{k}: {mail.get(k)}")
    print('')
    print(mail.get_content())
##
On 2024-11-04 13:02:21 +0100, Loris Bennett via Python-list wrote:
> "Loris Bennett" writes:
> > "Loris Bennett" writes:
> >> Cameron Simpson writes:
> >>> On 01Nov2024 10:10, Loris Bennett wrote:
> >>>>as expected. The n
"Loris Bennett" writes:
> "Loris Bennett" writes:
>
>> Cameron Simpson writes:
>>
>>> On 01Nov2024 10:10, Loris Bennett wrote:
>>>>as expected. The non-UTF-8 text occurs when I do
>>>>
>>>> mail = EmailMes
"Loris Bennett" writes:
> Cameron Simpson writes:
>
>> On 01Nov2024 10:10, Loris Bennett wrote:
>>>as expected. The non-UTF-8 text occurs when I do
>>>
>>> mail = EmailMessage()
>>> mail.set_content(body, cte="quoted-pri
ill create a process that displays a graphical
>> > console. The console uses an encoding scheme to represent the text
>> > output. I believe that the default on MS Windows is to use some
>> > single-byte encoding. This answer from SE family site tells you how to
>>
Cameron Simpson writes:
> On 01Nov2024 10:10, Loris Bennett wrote:
>>as expected. The non-UTF-8 text occurs when I do
>>
>> mail = EmailMessage()
>> mail.set_content(body, cte="quoted-printable")
>> ...
>>
>> if args.verbose:
&g
console. The console uses an encoding scheme to represent the text
> > output. I believe that the default on MS Windows is to use some
> > single-byte encoding. This answer from SE family site tells you how to
> > set the console encoding to UTF-8 permanently:
> >
> http
In comp.lang.python, Gilmeh Serda wrote:
> Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> help('modules')
>
> Please wait a moment while I gather a list of all available modules...
>
> Ass
On 2024-11-01, Eli the Bearded <*@eli.users.panix.com> wrote:
> In comp.lang.python, Gilmeh Serda wrote:
>> Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> help('modules')
>>
>> Please wai
> On 1 Nov 2024, at 22:57, Left Right wrote:
>
> Does this Windows Terminal support the use
> of programs like tmux?
I have not tried it, but it should work.
It is best to install the terminal app from the MS app store.
Most of my use is to ssh into Linux systems and run things like editors.
Colour output a
> Windows does now. They implemented this feature over the last few years.
> Indeed they took inspiration from how linux does this.
>
> You might find https://devblogs.microsoft.com/commandline/ has interesting
> articles about this.
I don't have MS Windows. My wife does, but I don't want to both
nux does this.
You might find https://devblogs.microsoft.com/commandline/ has interesting
articles about this.
They also have implemented utf-8 as code page 65001.
Barry
On 31Oct2024 21:53, alan.ga...@yahoo.co.uk wrote:
On 31/10/2024 20:50, Cameron Simpson via Python-list wrote:
If you're just dealing with this directly, use the `quopri` stdlib
module: https://docs.python.org/3/library/quopri.html
One of the things I love about this list are these little feat
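For anyone wanting to undo that encoding by hand, a minimal sketch with the
quopri module (the sample body is an assumption, not taken from the original mail):
import quopri
raw = b"Dies ist eine =C3=9Cbung."              # quoted-printable, as it travels on the wire
print(quopri.decodestring(raw).decode("utf-8")) # -> Dies ist eine Übung.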
On 01Nov2024 10:10, Loris Bennett wrote:
as expected. The non-UTF-8 text occurs when I do
mail = EmailMessage()
mail.set_content(body, cte="quoted-printable")
...
if args.verbose:
print(mail)
which is presumably also correct.
The question is: What conversion is necessar
pproach to me.
And you are right that encoding for the actual mail which is received is
automatically sorted out. If I display the raw email in my client I get
the following:
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
...
Su
Loris Bennett wrote at 2024-11-1 10:10 +0100:
> ...
> mail.set_content(body, cte="quoted-printable")
In the line above, you request the content to use
the "cte" (= "Content-Transfer-Encoding") "quoted-printable"
and consequently, the content is encoded with `quoted-printable`.
Maybe, you do not n
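A minimal sketch of the difference being discussed, assuming a subject and body
like those in the original post: get_content() returns the decoded text, while
printing the message shows the serialized, quoted-printable wire form.
from email.message import EmailMessage
msg = EmailMessage()
msg["Subject"] = "Übung"
msg.set_content("Dies ist eine Übung.", cte="quoted-printable")
print(msg)                # wire form: body appears as "Dies ist eine =C3=9Cbung."
print(msg.get_content())  # decoded form: "Dies ist eine Übung."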
=9Cbung.
>>
>>What do I need to do to prevent the body from getting mangled?
>
> That looks to me like quoted-printable. This is an encoding for binary
> transport of text to make it robust against not 8-bit clean
> transports. So your Unicode text is encoded as UTF-8, and
gle-byte encoding. This answer from SE family site tells you how to
> set the console encoding to UTF-8 permanently:
> https://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8
> , which, I believe, will solve your problem with how the text is
> di
bungsbetreff
>>>
>>> Sehr geehrter Herr Dr. Bennett,
>>>
>>> Dies ist eine =C3=9Cbung.
>>>
>>>What do I need to do to prevent the body from getting mangled?
>>
>> That looks to me like quoted-printable. This is an encoding for bi
On 31/10/2024 20:50, Cameron Simpson via Python-list wrote:
> That looks to me like quoted-printable. This is an encoding for binary
> transport of text to make it robust against not 8-bit clean
...
> If you're just dealing with this directly, use the `quopri` stdlib
> module: https://docs.py
make it robust against not 8-bit clean transports.
So your Unicode text is encoded as UTF-8, and then that is encoded in
quoted-printable for transport through the email system.
Your terminal probably accepts UTF-8 - I imagine other German text
renders correctly?
You need to get the text
run, eg. cmd.exe, it will create a process that displays a graphical
console. The console uses an encoding scheme to represent the text
output. I believe that the default on MS Windows is to use some
single-byte encoding. This answer from SE family site tells you how to
set the console e
Hi,
I have a command-line program which creates an email containing German
umlauts. On receiving the mail, my mail client displays the subject and
body correctly:
Subject: Übung
Sehr geehrter Herr Dr. Bennett,
Dies ist eine Übung.
So far, so good. However, when I use the --verbose opti
On 08May2023 12:19, jak wrote:
In reality you should also take into account the fact that if the header
contains a 'b' instead of a 'q' as the penultimate character, then the
rest of it is encoded in base64:
"=?utf-8?Q?" --> "=?utf-
Chris Green wrote at 2023-5-6 15:58 +0100:
>Chris Green wrote:
>> I'm having a real hard time trying to do anything to a string (?)
>> returned by mailbox.MaildirMessage.get().
>>
>What a twit I am :-)
>
>Strings are immutable, I have to do:-
>
>newstring = oldstring.replace("_", " ")
The sol
ring.replace("_", " ")
>
> Job done!
Not necessarily.
The subject in the original article was:
=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?=
That's some kind of MIME encoding. Just replacing underscores by spaces
won't necessarily give yo
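Rather than string surgery on the underscores, the stdlib can decode the
encoded-word directly; a minimal sketch using the subject from the thread:
from email.header import decode_header, make_header
subject = "=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?="
print(str(make_header(decode_header(subject))))
# aka Marne à la Saône (Waterways Continental Europe)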
Peter Pearson wrote:
On Sat, 6 May 2023 14:50:40 +0100, Chris Green wrote:
[snip]
So, what do those =?utf-8? and ?= sequences mean? Are they part of
the string or are they wrapped around the string on output as a way to
show that it's utf-8 encoded?
Yes, "=?utf-8?" signa
On Sat, 6 May 2023 14:50:40 +0100, Chris Green wrote:
[snip]
> So, what do those =?utf-8? and ?= sequences mean? Are they part of
> the string or are they wrapped around the string on output as a way to
> show that it's utf-8 encoded?
Yes, "=?utf-8?" signals "MIME
place("_", " ")
Job done!
Not necessarily.
The subject in the original article was:
=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?=
That's some kind of MIME encoding. Just replacing underscores by spaces
won't necessarily give you anyth
non-ASCII characters in it) is:-
=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?=
Whatever I try I am unable to change the underscore characters in the
above string back to spaces.
So, what do those =?utf-8? and ?= sequences mean? Are they part of
the string o
Strings are immutable, I have to do:-
> >
> > newstring = oldstring.replace("_", " ")
> >
> > Job done!
>
> Not necessarily.
>
> The subject in the original article was:
> =?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)
Chris Green wrote:
> I'm having a real hard time trying to do anything to a string (?)
> returned by mailbox.MaildirMessage.get().
>
What a twit I am :-)
Strings are immutable, I have to do:-
newstring = oldstring.replace("_", " ")
Job done!
--
Chris Green
> >> path = pathlib.Path( name )
> >> for encoding in( "utf_8", "cp1252", "latin_1" ):
> >> try:
> >> with path.open( encoding=encoding, errors="strict" )as file:
> >
> > I also read a book which claimed that the tkinter.T
ot;, "latin_1" ):
>> try:
>> with path.open( encoding=encoding, errors="strict" )as file:
>
> I also read a book which claimed that the tkinter.Text
> widget would accept bytes and guess whether these are
encoded in UTF-8 or "ISO 8859-1"
On Thu, 18 Aug 2022 11:33:59 -0700, Tobiah declaimed the
following:
>
>So how does this break down? When a person enters
>Montréal, Quebéc into a form field, what are they
>doing on the keyboard to make that happen? As the
>string sits there in the text box, is it latin
Thanks!
From: Stefan Ram<mailto:r...@zedat.fu-berlin.de>
Sent: 19 August 2022, 6:23
To: python-list@python.org<mailto:python-list@python.org>
Subject: Re: UTF-8 and latin1
Tobiah writes:
> When a person enters
>Montréal, Quebéc into a form field, what are
ntréal, Quebéc into a form field, what are they
> doing on the keyboard to make that happen? As the
> string sits there in the text box, is it latin1, or utf-8
> or something else? How does the browser know what
> sort of data it has in that text box?
>
As it sits there in the text box
'e'. If they're using a French ("azerty") keyboard then I think they
can enter it by holding 'shift' and typing '2'.
> As the string sits there in the text box, is it latin1, or utf-8
> or something else?
That depends on which browser you're
there in the text box, is it latin1, or utf-8
or something else? How does the browser know what
sort of data it has in that text box?
On 2022-08-18, Tobiah wrote:
>> Generally speaking browser submissions were/are supposed to be sent
>> using the same encoding as the page, so if you're sending the page
>> as "latin1" then you'll see that a fair amount I should think. If you
>> send i
>>> some_string.decode('latin1')
>>> to get unicode that I can use with xlsxwriter,
>>> or put in the header of a web page to display
>>> European characters correctly. But normally UTF-8 is recommended as
>>> the encoding to use today. latin1
Generally speaking browser submissions were/are supposed to be sent
using the same encoding as the page, so if you're sending the page
as "latin1" then you'll see that a fair amount I should think. If you
send it as "utf-8" then you'll get 100% utf-8 back
On 2022-08-17, Tobiah wrote:
>> That has already been decided, as much as it ever can be. UTF-8 is
>> essentially always the correct encoding to use on output, and almost
>> always the correct encoding to assume on input absent any explicit
>> indication of another
hod. ("bytes" objects do.)
>
>> to get unicode that I can use with xlsxwriter,
>> or put in the header of a web page to display
>> European characters correctly.
>
> |You should always use the UTF-8 character encoding. (Remember
> |that this means you also
;)
>> to get unicode that I can use with xlsxwriter,
>> or put in the header of a web page to display
>> European characters correctly. But normally UTF-8 is recommended as
>> the encoding to use today. latin1 works correctly more often when I
>> am using data from
That has already been decided, as much as it ever can be. UTF-8 is
essentially always the correct encoding to use on output, and almost
always the correct encoding to assume on input absent any explicit
indication of another encoding. (e.g. the HTML "standard" says that
all HTML files m
On 8/17/22 08:33, Stefan Ram wrote:
Tobiah writes:
I get data from various sources; client emails, spreadsheets, and
data from web applications. I find that I can do some_string.decode('latin1')
Strings have no "decode" method. ("bytes" objects do.)
I'm using 2.7. Maybe that's why.
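In Python 3 only bytes have .decode(); a common pattern for data of unknown
origin (an assumption for illustration, not something from the thread) is to
try UTF-8 first and fall back to latin1, which accepts every byte value:
def to_text(raw: bytes) -> str:
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin1")
print(to_text(b"Montr\xc3\xa9al"))  # UTF-8 input  -> Montréal
print(to_text(b"Montr\xe9al"))      # latin1 input -> Montréal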
I get data from various sources; client emails, spreadsheets, and
data from web applications. I find that I can do some_string.decode('latin1')
to get unicode that I can use with xlsxwriter,
or put in the header of a web page to display
European characters correctly. But normall
> European characters correctly. But normally UTF-8 is recommended as
> the encoding to use today. latin1 works correctly more often when I
> am using data from the wild. It's frustrating that I have to play
> a guessing game to figure out how to use incoming text. I'
Dennis Lee Bieber writes:
> On Fri, 1 Apr 2022 03:59:32 +1100, Chris Angelico
> declaimed the following:
>
>
>>That's jmf. Ignore him. He knows nothing about Unicode and is
>>determined to make everyone aware of that fact.
>>
>>He got blocked from the mailing list ages ago, and I don't think
>>a
On Fri, 1 Apr 2022 at 11:16, Dennis Lee Bieber wrote:
>
> On Fri, 1 Apr 2022 03:59:32 +1100, Chris Angelico
> declaimed the following:
>
>
> >That's jmf. Ignore him. He knows nothing about Unicode and is
> >determined to make everyone aware of that fact.
> >
> >He got blocked from the mailing lis
On Fri, 1 Apr 2022 03:59:32 +1100, Chris Angelico
declaimed the following:
>That's jmf. Ignore him. He knows nothing about Unicode and is
>determined to make everyone aware of that fact.
>
>He got blocked from the mailing list ages ago, and I don't think
>anyone's regretted it.
>
Ah yes.
On Fri, 1 Apr 2022 at 03:45, Dennis Lee Bieber wrote:
>
> On Thu, 31 Mar 2022 00:36:10 -0700 (PDT), moi
> declaimed the following:
>
> >>>> 'äÄöÖüÜ'.encode('utf-8')
> >b'\xc3\xa4\xc3\x84\xc3\xb6\xc3\x96\xc3\xbc\xc3\x9c'
> >&
On Thu, 31 Mar 2022 00:36:10 -0700 (PDT), moi
declaimed the following:
>>>> 'äÄöÖüÜ'.encode('utf-8')
>b'\xc3\xa4\xc3\x84\xc3\xb6\xc3\x96\xc3\xbc\xc3\x9c'
>>>> len('äÄöÖüÜ'.encode('utf-8'))
>12
>>>
* MRAB:
> On 2019-03-19 20:32, Florian Weimer wrote:
>> I've seen occasional proposals like this one coming up:
>>
>> | I therefore suggested 1999-11-02 on the unic...@unicode.org mailing
>> | list the following approach. Instead of using U+FFFD, simply encode
&
On 2019-03-19 20:32, Florian Weimer wrote:
I've seen occasional proposals like this one coming up:
| I therefore suggested 1999-11-02 on the unic...@unicode.org mailing
| list the following approach. Instead of using U+FFFD, simply encode
| malformed UTF-8 sequences as malformed U
I've seen occasional proposals like this one coming up:
| I therefore suggested 1999-11-02 on the unic...@unicode.org mailing
| list the following approach. Instead of using U+FFFD, simply encode
| malformed UTF-8 sequences as malformed UTF-16 sequences. Malformed
| UTF-8 sequences co
is non-existent. Unicode input text won't show up.
It probably needs to be rewritten with get_wch(), as was suggested in the
following SO question before get_wch() was implemented, together with proper
key code parsing (in do_command()) and probably more, so as to prevent breakage
[
https://stackove
I can display UTF-8 when I use wxPython:
--
import wx
app = wx.App()
s = 'testing\xf0\x9f\x98\x80'
frame = wx.Frame(None, wx.ID_ANY)
font = wx.Font("Arial")
textbox = wx.TextCtrl(frame, id=wx.ID_ANY)
textbox.SetFont(font)
textbox.WriteText(s)
frame.Show()
app.MainLoop()
-
On Tue, Jan 16, 2018 at 8:29 AM, Peng Yu wrote:
>> Just to be clear, TAB *only* appears in utf-8 as the encoding for the actual
>> TAB character, not as a part of any other character's encoding. The only
>> bytes that can appear in the utf-8 encoding of non-ascii char
> Just to be clear, TAB *only* appears in utf-8 as the encoding for the actual
> TAB character, not as a part of any other character's encoding. The only
> bytes that can appear in the utf-8 encoding of non-ascii characters are
> starting with 0xC2 through 0xF4, followed by on
On Mon, Jan 15, 2018, at 09:35, Peter Otten wrote:
> Peng Yu wrote:
>
> > Can utf-8 encoded character contain a byte of TAB?
>
> Yes; ascii is a subset of utf8.
>
> If you want to allow fields containing TABs in a file where TAB is also the
> field separator you need
Peng Yu wrote:
> Can utf-8 encoded character contain a byte of TAB?
Yes; ascii is a subset of utf8.
Python 2.7.6 (default, Nov 23 2017, 15:49:48)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>&g
Hi,
I use the following code to process TSV input.
$ printf '%s\t%s\n' {1..10} | ./main.py
['1', '2']
['3', '4']
['5', '6']
['7', '8']
['9', '10']
$ cat main.py
#!/usr/bin/env p
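The script itself is cut off in the archive; a minimal reader that would produce
the output shown might look like this (a guess, not the original main.py):
#!/usr/bin/env python3
import csv
import sys
# Read tab-separated fields from stdin and echo each row as a list.
for row in csv.reader(sys.stdin, delimiter="\t"):
    print(row)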
>>> I'm afraid Python's choice may lead to exploitable security holes in
>>> Python programs.
>>
>> Feel free to back up that with an actual demonstration of an exploit,
>> rather than just FUD.
>
> It might come as a surprise to programmers that pathnames can
Steve D'Aprano :
> On Sun, 22 Jan 2017 07:34 pm, Marko Rauhamaa wrote:
>
>> Steve D'Aprano :
>>
>>> On Sun, 22 Jan 2017 06:52 am, Marko Rauhamaa wrote:
Also, [surrogates] don't exist as Unicode code points. Python
shouldn't allow surrogate characters in strings.
>>>
>>> Not quite. This
On Sun, 22 Jan 2017 07:34 pm, Marko Rauhamaa wrote:
> Steve D'Aprano :
>
>> On Sun, 22 Jan 2017 06:52 am, Marko Rauhamaa wrote:
>>> Also, [surrogates] don't exist as Unicode code points. Python
>>> shouldn't allow surrogate characters in strings.
>>
>> Not quite. This is where it gets a bit messy
Steve D'Aprano :
> On Sun, 22 Jan 2017 06:52 am, Marko Rauhamaa wrote:
>> Also, [surrogates] don't exist as Unicode code points. Python
>> shouldn't allow surrogate characters in strings.
>
> Not quite. This is where it gets a bit messy and confusing. The bottom
> line is: surrogates *are* code po
eryk sun :
> On Sat, Jan 21, 2017 at 8:21 PM, Pete Forman wrote:
>> Marko Rauhamaa writes:
>>
py> low = '\uDC37'
>>>
>>> That should raise a SyntaxError exception.
>>
>> Quite. [...]
>
> CPython allows surrogate codes for use with the "surrogateescape" and
> "surrogatepass" error handlers,
ng only makes sense (for every
> use-case I've been able to come up with) in the context of known
> offsets like you describe with tell().
I'm sorry, I find it hard to believe that you've never needed to add or
subtract 1 from a given offset returned by find() or equiv
On Sun, 22 Jan 2017 07:21 am, Pete Forman wrote:
> Marko Rauhamaa writes:
>
>>> py> low = '\uDC37'
>>
>> That should raise a SyntaxError exception.
>
> Quite. My point was that with older Python on a narrow build (Windows
> and Mac) you need to understand that you are using UTF-16 rather than
>
On Sun, 22 Jan 2017 06:52 am, Marko Rauhamaa wrote:
> Pete Forman :
>
>> Surrogates only exist in UTF-16. They are expressly forbidden in UTF-8
>> and UTF-32.
>
> Also, they don't exist as Unicode code points. Python shouldn't allow
> surrogate characters
Right, so here, you've done a (likely linear, but however you get
here) search, after which it makes sense to use this opaque "offset"
token for slicing purposes:
> py> stuff = text[offset:]
> py> assert stuff == "фxx"
> That works fine whether indexing refers t
On 2017-01-21 10:50, Pete Forman wrote:
> Thanks for a very thorough reply, most useful. I'm going to pick you up
> on the above, though.
>
> Surrogates only exist in UTF-16. They are expressly forbidden in UTF-8
> and UTF-32. The rules for UTF-8 were tightened up in Unicode 4
'ascii', 'surrogateescape')
b'\x81'
This error handler is required by CPython on POSIX to handle arbitrary
bytes in file-system paths. For example, when running with LANG=C:
>>> sys.getfilesystemencoding()
'ascii'
>>> os.listdir(b'.')
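A round-trip sketch of that error handler with an arbitrary non-UTF-8 byte:
raw = b"caf\xe9"                                        # latin1 byte, not valid UTF-8
text = raw.decode("utf-8", "surrogateescape")
print(repr(text))                                       # 'caf\udce9' -- lone surrogate
print(text.encode("utf-8", "surrogateescape") == raw)   # True: original bytes restored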
Marko Rauhamaa writes:
>> py> low = '\uDC37'
>
> That should raise a SyntaxError exception.
Quite. My point was that with older Python on a narrow build (Windows
and Mac) you need to understand that you are using UTF-16 rather than
Unicode. On a wide build or Python 3.3+ then all is rosy. (At th
Pete Forman :
> Surrogates only exist in UTF-16. They are expressly forbidden in UTF-8
> and UTF-32.
Also, they don't exist as Unicode code points. Python shouldn't allow
surrogate characters in strings.
Thus the range of code points that are available for use as
charac
Chris Angelico writes:
> On Sun, Jan 22, 2017 at 2:56 AM, Jussi Piitulainen wrote:
>> Steve D'Aprano writes:
>>
>> [snip]
>>
>>> You could avoid that error by increasing the offset by the right
>>> amount:
>>>
>>> stuff = text[
On Sun, Jan 22, 2017 at 2:56 AM, Jussi Piitulainen
wrote:
> Steve D'Aprano writes:
>
> [snip]
>
>> You could avoid that error by increasing the offset by the right
>> amount:
>>
>> stuff = text[offset + len("ф".encode('utf-8')):]
>&g
Steve D'Aprano writes:
[snip]
> You could avoid that error by increasing the offset by the right
> amount:
>
> stuff = text[offset + len("ф".encode('utf-8')):]
>
> which is awful. I believe that's what Go and Julia expect you to do.
Julia provides
Steve D'Aprano writes:
> [...]
> Another factor which I didn't see discussed anywhere is that Python
> strings treat surrogates as normal code points. I believe that would
> be troublesome for a UTF-8 implementation:
>
> py> '\uDC37'.encode('utf-8
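What that truncated example most likely shows (the output is cut off in the
archive): a lone surrogate cannot be encoded to UTF-8 under the default error
handler and needs 'surrogatepass':
try:
    '\uDC37'.encode('utf-8')
except UnicodeEncodeError as e:
    print(e)                                      # ... surrogates not allowed
print('\uDC37'.encode('utf-8', 'surrogatepass'))  # b'\xed\xb0\xb7'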
de points or bytes.
py> "αβγдлфxx".find("ф")
5
py> "αβγдлфxx".encode('utf-8').find("ф".encode('utf-8'))
10
Either way, you get the expected result. However:
py> stuff = text[offset + 1:]
py> assert stuff == "xx"
Tha
On Sat, 21 Jan 2017 09:35 am, Pete Forman wrote:
> Can anyone point me at a rationale for PEP 393 being incorporated in
> Python 3.3 over using UTF-8 as an internal string representation?
I've read over the PEP, and the email discussion, and there is very little
mention of UTF-8, and
On 2017-01-21 11:58, Chris Angelico wrote:
> So, how could you implement this function? The current
> implementation maintains an index - an integer position through the
> string. It repeatedly requests the next character as string[idx],
> and can also slice the string (to check for keywords like "
Chris Angelico writes:
> You can't do a look-ahead with a vanilla string iterator. That's
> necessary for a lot of parsers.
For JSON? For other parsers you usually have a tokenizer that reads
characters with maybe 1 char of lookahead.
> Yes, which gives a two-level indexing (first find the stra
Chris Angelico writes:
> On Sat, Jan 21, 2017 at 11:30 AM, Pete Forman wrote:
>> I was asserting that most useful operations on strings start from
>> index 0. The r* operations would not be slowed down that much as
>> UTF-8 has the useful property that attempting to interpre
On Sat, Jan 21, 2017 at 5:01 PM, Paul Rubin wrote:
> Chris Angelico writes:
>> decoding JSON... the scanner, which steps through the string and
>> does the actual parsing. ...
>> The only way for it to be fast enough would be to have some sort of
>> retainable string iterator, which means exposin
I'm missing something. Of
course a json parser should use it, though who uses the non-C json
parser anyway these days?
[Chris Kaynor writes:]
> rfind/rsplit/rindex/rstrip and the other related reverse
> functions would require walking the string from start to end, rather
> than short-circu
I'm not getting paid for the work,
it's purely voluntary.
PEP 393 / Python 3.3 required extension writers to revisit their access
to strings. My explicit question was about why PEP 393 was adopted to
replace the deficient old implementations rather than another approach.
The implicit questio
of astral characters (plus *maybe* a faster encode-to-UTF-8; you
wouldn't get a faster decode-from-UTF-8, because you still need to
check that the byte sequence is valid). Can you show a use-case that
would be materially improved by UTF-8?
ChrisA
On Sat, Jan 21, 2017 at 11:30 AM, Pete Forman wrote:
> I was asserting that most useful operations on strings start from index
> 0. The r* operations would not be slowed down that much as UTF-8 has the
> useful property that attempting to interpret from a byte that is not at
> th
ace the deficient old implementations rather than another approach.
The implicit question is whether a UTF-8 internal representation should
replace that of PEP 393.
--
Pete Forman
Chris Kaynor writes:
> On Fri, Jan 20, 2017 at 2:35 PM, Pete Forman wrote:
>> Can anyone point me at a rationale for PEP 393 being incorporated in
>> Python 3.3 over using UTF-8 as an internal string representation?
>> I've found good articles by Nick Coghlan, Armin
On 2017-01-20 23:06, Chris Kaynor wrote:
On Fri, Jan 20, 2017 at 2:35 PM, Pete Forman wrote:
Can anyone point me at a rationale for PEP 393 being incorporated in
Python 3.3 over using UTF-8 as an internal string representation? I've
found good articles by Nick Coghlan, Armin Ronache
On Fri, Jan 20, 2017 at 3:15 PM, Thomas Nyberg wrote:
> On 01/20/2017 03:06 PM, Chris Kaynor wrote:
>>
>>
>> [...snip...]
>>
>> --
>> Chris Kaynor
>>
>
> I was able to delete my response which was a wholly contained subset of this
> one. :)
>
>
> But I have one extra question. Is string
ally
> change if it weren't for all the reasons you mentioned.) I found this which
> at details (if not explicitly "guarantees") the complexity properties of
> other datatypes:
>
No, it isn't; this question came up in the context of MicroPython,
which chose
On 01/20/2017 03:06 PM, Chris Kaynor wrote:
[...snip...]
--
Chris Kaynor
I was able to delete my response which was a wholly contained subset of
this one. :)
But I have one extra question. Is string indexing guaranteed to be
constant-time for python? I thought so, but I couldn't
On Fri, Jan 20, 2017 at 2:35 PM, Pete Forman wrote:
> Can anyone point me at a rationale for PEP 393 being incorporated in
> Python 3.3 over using UTF-8 as an internal string representation? I've
> found good articles by Nick Coghlan, Armin Ronacher and others on the
> matter
Can anyone point me at a rationale for PEP 393 being incorporated in
Python 3.3 over using UTF-8 as an internal string representation? I've
found good articles by Nick Coghlan, Armin Ronacher and others on the
matter. What I have not found is discussion of pros and cons of
alternatives to th
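A sketch of what PEP 393 does in practice: each str is stored in the narrowest
of latin-1, UCS-2 or UCS-4 that fits its widest code point, so the per-character
cost depends on content (exact sizes are CPython implementation details and vary
by version):
import sys
for s in ["abcd", "αβγδ", "😀😀😀😀"]:   # 1-, 2- and 4-byte-per-character strings
    print(s, sys.getsizeof(s))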
CII, you are
> now running a broken system with subtle bugs, including in data structures
> as fundamental as dicts.
>
> The standard behaviour:
>
> py> d = {u'café': 1}
> py> for key in d:
> ... print key == 'caf\xc3\xa9'
> ...
> False
&g