Re: UTF-8 and latin1

2022-10-25 Thread Chris Angelico
On Wed, 26 Oct 2022 at 05:09, Barry Scott wrote: > > > > > On 25 Oct 2022, at 11:16, Stefan Ram wrote: > > > > r...@zedat.fu-berlin.de (Stefan Ram) writes: > >> You can let Python guess the encoding of a file. > >> def encoding_of( name ): > >> path = pathlib.Path( name ) > >> for encoding in( "u

Re: UTF-8 and latin1

2022-10-25 Thread Barry Scott
> On 25 Oct 2022, at 11:16, Stefan Ram wrote: > > r...@zedat.fu-berlin.de (Stefan Ram) writes: >> You can let Python guess the encoding of a file. >> def encoding_of( name ): >> path = pathlib.Path( name ) >> for encoding in( "utf_8", "cp1252", "latin_1" ): >> try: >> with path.open( encoding=e
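The quoted snippet is cut off, but the idea it starts is to try candidate encodings in order and keep the first one that decodes cleanly. A minimal sketch of that idea follows; it is not the poster's full function, and the names `first_decodable` and `guess_file_encoding` are hypothetical. Note that `latin_1` maps every byte to a character, so it never fails and only makes sense as the last candidate.

```python
import pathlib

CANDIDATES = ("utf_8", "cp1252", "latin_1")  # same order as in the quoted snippet


def first_decodable(data, candidates=CANDIDATES):
    """Return the first encoding in `candidates` that decodes `data` without error."""
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next one
        return encoding
    return None  # unreachable while latin_1 is in the list, since it accepts any byte


def guess_file_encoding(name, candidates=CANDIDATES):
    """Hypothetical wrapper: read the whole file as bytes, then test the candidates."""
    return first_decodable(pathlib.Path(name).read_bytes(), candidates)
```

A clean decode only shows that the bytes are valid in that encoding, not that the result is the text the author intended, so this remains a guess.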

Re: UTF-8 and latin1

2022-08-19 Thread Dennis Lee Bieber
On Thu, 18 Aug 2022 11:33:59 -0700, Tobiah declaimed the following: > >So how does this break down? When a person enters >Montréal, Quebéc into a form field, what are they >doing on the keyboard to make that happen? As the >string sits there in the text box, is it latin1, or utf-8 >or something

Re: UTF-8 and latin1

2022-08-19 Thread Daniel Lee
Thanks! From: Stefan Ram<mailto:r...@zedat.fu-berlin.de> Sent: 2022-08-19 6:23 To: python-list@python.org<mailto:python-list@python.org> Subject: Re: UTF-8 and latin1 Tobiah writes: > When a person enters >Montréal, Quebéc into a form field, what are

Re: UTF-8 and latin1

2022-08-18 Thread Chris Angelico
On Fri, 19 Aug 2022 at 08:15, Tobiah wrote: > > > You configure the web server to send: > > > > Content-Type: text/html; charset=... > > > > in the HTTP header when it serves HTML files. > > So how does this break down? When a person enters > Montréal, Quebéc into a form field, what are they

Re: UTF-8 and latin1

2022-08-18 Thread Jon Ribbens via Python-list
On 2022-08-18, Tobiah wrote: >> You configure the web server to send: >> >> Content-Type: text/html; charset=... >> >> in the HTTP header when it serves HTML files. > > So how does this break down? When a person enters > Montréal, Quebéc into a form field, what are they > doing on the keyb

Re: UTF-8 and latin1

2022-08-18 Thread Tobiah
You configure the web server to send: Content-Type: text/html; charset=... in the HTTP header when it serves HTML files. So how does this break down? When a person enters Montréal, Quebéc into a form field, what are they doing on the keyboard to make that happen? As the string sits ther
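On the receiving side, the declared charset can be read back out of a `Content-Type` value. The sketch below uses the standard library's `email.message.Message` for the parsing. The helper name `charset_of` and the header values in the usage line are illustrative assumptions, not taken from the thread, which leaves the charset elided.

```python
from email.message import Message


def charset_of(content_type, default=None):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns `default` if absent
    return msg.get_content_charset(default)


print(charset_of("text/html; charset=UTF-8"))
```

When no charset parameter is present, the function returns the supplied default, which leaves the choice of fallback encoding to the caller.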

Re: UTF-8 and latin1

2022-08-18 Thread Jon Ribbens via Python-list
On 2022-08-18, Tobiah wrote: >> Generally speaking browser submissions were/are supposed to be sent >> using the same encoding as the page, so if you're sending the page >> as "latin1" then you'll see that a fair amount I should think. If you >> send it as "utf-8" then you'll get 100% utf-8 back.

Re: UTF-8 and latin1

2022-08-18 Thread Jon Ribbens via Python-list
On 2022-08-17, Barry wrote: >> On 17 Aug 2022, at 18:30, Jon Ribbens via Python-list >> wrote: >> On 2022-08-17, Tobiah wrote: >>> I get data from various sources; client emails, spreadsheets, and >>> data from web applications. I find that I can do >>> some_string.decode('latin1') >>> to get

Re: UTF-8 and latin1

2022-08-18 Thread Tobiah
Generally speaking browser submissions were/are supposed to be sent using the same encoding as the page, so if you're sending the page as "latin1" then you'll see that a fair amount I should think. If you send it as "utf-8" then you'll get 100% utf-8 back. The only trick I know is to use . Woul

Re: UTF-8 and latin1

2022-08-18 Thread Jon Ribbens via Python-list
On 2022-08-17, Tobiah wrote: >> That has already been decided, as much as it ever can be. UTF-8 is >> essentially always the correct encoding to use on output, and almost >> always the correct encoding to assume on input absent any explicit >> indication of another encoding. (e.g. the HTML "standa

Re: UTF-8 and latin1

2022-08-17 Thread dn
On 18/08/2022 03.33, Stefan Ram wrote: > Tobiah writes: >> I get data from various sources; client emails, spreadsheets, and >> data from web applications. I find that I can do >> some_string.decode('latin1') > > Strings have no "decode" method. ("bytes" objects do.) > >> to get unicode that

Re: UTF-8 and latin1

2022-08-17 Thread Barry
> On 17 Aug 2022, at 18:30, Jon Ribbens via Python-list > wrote: > > On 2022-08-17, Tobiah wrote: >> I get data from various sources; client emails, spreadsheets, and >> data from web applications. I find that I can do >> some_string.decode('latin1') >> to get unicode that I can use with x

Re: UTF-8 and latin1

2022-08-17 Thread Tobiah
That has already been decided, as much as it ever can be. UTF-8 is essentially always the correct encoding to use on output, and almost always the correct encoding to assume on input absent any explicit indication of another encoding. (e.g. the HTML "standard" says that all HTML files must be UTF-

Re: UTF-8 and latin1

2022-08-17 Thread Tobiah
On 8/17/22 08:33, Stefan Ram wrote: Tobiah writes: I get data from various sources; client emails, spreadsheets, and data from web applications. I find that I can do some_string.decode('latin1') Strings have no "decode" method. ("bytes" objects do.) I'm using 2.7. Maybe that's why.
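For readers on Python 3, the difference discussed here can be checked directly: `bytes` objects have `decode`, `str` objects have `encode`, and `str` has no `decode`. A small sketch, with the sample bytes chosen for illustration only:

```python
raw = b"Montr\xe9al"          # latin-1 encoded bytes, e.g. as read from a file or socket

text = raw.decode("latin-1")  # bytes -> str
round_trip = text.encode("latin-1")  # str -> bytes

# In Python 3 only bytes can be decoded; str has no decode method.
has_bytes_decode = hasattr(bytes, "decode")
has_str_decode = hasattr(str, "decode")
```

In Python 2.7 the plain `str` type held bytes, which is why `some_string.decode('latin1')` works there.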

UTF-8 and latin1

2022-08-17 Thread Tobiah
I get data from various sources; client emails, spreadsheets, and data from web applications. I find that I can do some_string.decode('latin1') to get unicode that I can use with xlsxwriter, or put in the header of a web page to display European characters correctly. But normally UTF-8 is recom
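One reason decoding as latin1 always appears to work is that every byte value is valid latin1, so the call never raises even when the data is really UTF-8. A short Python 3 sketch of the effect, using a sample string chosen for illustration:

```python
utf8_bytes = "Montréal".encode("utf-8")   # é becomes the two bytes C3 A9

as_utf8 = utf8_bytes.decode("utf-8")      # correct decoding
as_latin1 = utf8_bytes.decode("latin-1")  # no error, but each byte becomes one character

# The mis-decoding is reversible as long as nothing else has altered the text.
repaired = as_latin1.encode("latin-1").decode("utf-8")

print(as_utf8, as_latin1, repaired)
```

The absence of an exception therefore says nothing about whether latin1 was the right choice, whereas a successful UTF-8 decode of non-ASCII data is fairly strong evidence that the data is UTF-8.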

Re: UTF-8 and latin1

2022-08-17 Thread Jon Ribbens via Python-list
On 2022-08-17, Tobiah wrote: > I get data from various sources; client emails, spreadsheets, and > data from web applications. I find that I can do some_string.decode('latin1') > to get unicode that I can use with xlsxwriter, > or put in the header of a web page to display > European characters