Re: UTF-8 and latin1

2022-10-25 Thread Chris Angelico
On Wed, 26 Oct 2022 at 05:09, Barry Scott wrote: > > > > > On 25 Oct 2022, at 11:16, Stefan Ram wrote: > > > > r...@zedat.fu-berlin.de (Stefan Ram) writes: > >> You can let Python guess the encoding of a file. > >> def encoding_of( name ): > >> path = pathlib.Path( name ) > >> for encoding in( "u

Re: UTF-8 and latin1

2022-10-25 Thread Barry Scott
> On 25 Oct 2022, at 11:16, Stefan Ram wrote: > > r...@zedat.fu-berlin.de (Stefan Ram) writes: >> You can let Python guess the encoding of a file. >> def encoding_of( name ): >> path = pathlib.Path( name ) >> for encoding in( "utf_8", "cp1252", "latin_1" ): >> try: >> with path.open( encoding=e
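The quoted function is cut off in this listing, so the following is only a sketch of the general idea (try candidate encodings in order and keep the first one that decodes cleanly), not Stefan Ram's actual code. The name `guess_encoding` and the choice to work on a `bytes` value rather than a file path are assumptions made for illustration.

```python
def guess_encoding(data, candidates=("utf_8", "cp1252", "latin_1")):
    """Return the first candidate encoding that decodes `data` without error.

    Hypothetical helper: the order matters, because latin_1 accepts every
    byte sequence and therefore has to come last as the fallback.
    """
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError("none of the candidate encodings fit")


utf8_sample = "Montréal".encode("utf_8")
cp1252_sample = "Montréal".encode("cp1252")
undefined_in_cp1252 = b"\x81"  # byte with no character assigned in cp1252

print(guess_encoding(utf8_sample))
print(guess_encoding(cp1252_sample))
print(guess_encoding(undefined_in_cp1252))
```

A clean decode only shows that the bytes are valid in that encoding, not that the text was written in it, so the result is a guess.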

Re: UTF-8 and latin1

2022-08-19 Thread Dennis Lee Bieber
On Thu, 18 Aug 2022 11:33:59 -0700, Tobiah declaimed the following: > >So how does this break down? When a person enters >Montréal, Quebéc into a form field, what are they >doing on the keyboard to make that happen? As the >string sits there in the text box, is it latin1, or utf-8 >or something

Re: UTF-8 and latin1

2022-08-18 Thread Chris Angelico
On Fri, 19 Aug 2022 at 08:15, Tobiah wrote: > > > You configure the web server to send: > > > > Content-Type: text/html; charset=... > > > > in the HTTP header when it serves HTML files. > > So how does this break down? When a person enters > Montréal, Quebéc into a form field, what are they

Re: UTF-8 and latin1

2022-08-18 Thread Jon Ribbens via Python-list
On 2022-08-18, Tobiah wrote: >> You configure the web server to send: >> >> Content-Type: text/html; charset=... >> >> in the HTTP header when it serves HTML files. > > So how does this break down? When a person enters > Montréal, Quebéc into a form field, what are they > doing on the keyb

Re: UTF-8 and latin1

2022-08-18 Thread Tobiah
You configure the web server to send: Content-Type: text/html; charset=... in the HTTP header when it serves HTML files. So how does this break down? When a person enters Montréal, Quebéc into a form field, what are they doing on the keyboard to make that happen? As the string sits ther

Re: UTF-8 and latin1

2022-08-18 Thread Jon Ribbens via Python-list
On 2022-08-18, Tobiah wrote: >> Generally speaking browser submissions were/are supposed to be sent >> using the same encoding as the page, so if you're sending the page >> as "latin1" then you'll see that a fair amount I should think. If you >> send it as "utf-8" then you'll get 100% utf-8 back.

Re: UTF-8 and latin1

2022-08-18 Thread Jon Ribbens via Python-list
On 2022-08-17, Barry wrote: >> On 17 Aug 2022, at 18:30, Jon Ribbens via Python-list >> wrote: >> On 2022-08-17, Tobiah wrote: >>> I get data from various sources; client emails, spreadsheets, and >>> data from web applications. I find that I can do >>> some_string.decode('latin1') >>> to get

Re: UTF-8 and latin1

2022-08-18 Thread Tobiah
Generally speaking browser submissions were/are supposed to be sent using the same encoding as the page, so if you're sending the page as "latin1" then you'll see that a fair amount I should think. If you send it as "utf-8" then you'll get 100% utf-8 back. The only trick I know is to use . Woul

Re: UTF-8 and latin1

2022-08-18 Thread Jon Ribbens via Python-list
On 2022-08-17, Tobiah wrote: >> That has already been decided, as much as it ever can be. UTF-8 is >> essentially always the correct encoding to use on output, and almost >> always the correct encoding to assume on input absent any explicit >> indication of another encoding. (e.g. the HTML "standa

Re: UTF-8 and latin1

2022-08-17 Thread dn
On 18/08/2022 03.33, Stefan Ram wrote: > Tobiah writes: >> I get data from various sources; client emails, spreadsheets, and >> data from web applications. I find that I can do >> some_string.decode('latin1') > > Strings have no "decode" method. ("bytes" objects do.) > >> to get unicode that
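The point about `decode` can be illustrated in Python 3, where text (`str`) and raw data (`bytes`) are separate types. This is a sketch with made-up sample data; it also shows why decoding everything as latin1 "works" even when it is wrong, since latin1 assigns a character to every byte value.

```python
# In Python 3 only bytes objects have decode(); str objects have encode().
print(hasattr("text", "decode"), hasattr(b"data", "decode"))

latin1_bytes = b"Montr\xe9al"             # "Montréal" encoded as latin1
utf8_bytes = "Montréal".encode("utf-8")   # same text encoded as UTF-8

right = latin1_bytes.decode("latin1")
# Decoding UTF-8 data as latin1 raises no error, but the result is mojibake.
wrong = utf8_bytes.decode("latin1")

print(right)
print(wrong)
```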

Re: UTF-8 and latin1

2022-08-17 Thread Barry
> On 17 Aug 2022, at 18:30, Jon Ribbens via Python-list > wrote: > > On 2022-08-17, Tobiah wrote: >> I get data from various sources; client emails, spreadsheets, and >> data from web applications. I find that I can do >> some_string.decode('latin1') >> to get unicode that I can use with x

Re: UTF-8 and latin1

2022-08-17 Thread Tobiah
That has already been decided, as much as it ever can be. UTF-8 is essentially always the correct encoding to use on output, and almost always the correct encoding to assume on input absent any explicit indication of another encoding. (e.g. the HTML "standard" says that all HTML files must be UTF-

Re: UTF-8 and latin1

2022-08-17 Thread Tobiah
On 8/17/22 08:33, Stefan Ram wrote: Tobiah writes: I get data from various sources; client emails, spreadsheets, and data from web applications. I find that I can do some_string.decode('latin1') Strings have no "decode" method. ("bytes" objects do.) I'm using 2.7. Maybe that's why.

Re: UTF-8 and latin1

2022-08-17 Thread Jon Ribbens via Python-list
On 2022-08-17, Tobiah wrote: > I get data from various sources; client emails, spreadsheets, and > data from web applications. I find that I can do some_string.decode('latin1') > to get unicode that I can use with xlsxwriter, > or put in the header of a web page to display > European characters

Re: UTF-8 Encoding Error

2016-12-29 Thread subhabangalore
On Friday, December 30, 2016 at 7:16:25 AM UTC+5:30, Steve D'Aprano wrote: > On Sun, 25 Dec 2016 04:50 pm, Grady Martin wrote: > > > On 2016年12月22日 22時38分, wrote: > >>I am getting the error: > >>UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: > >>invalid start byte > > > >

Re: UTF-8 Encoding Error

2016-12-29 Thread Steve D'Aprano
On Sun, 25 Dec 2016 04:50 pm, Grady Martin wrote: > On 2016年12月22日 22時38分, subhabangal...@gmail.com wrote: >>I am getting the error: >>UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: >>invalid start byte > > The following is a reflex of mine, whenever I encounter Python 2

Re: UTF-8 Encoding Error

2016-12-29 Thread subhabangalore
On Friday, December 30, 2016 at 3:35:56 AM UTC+5:30, subhaba...@gmail.com wrote: > On Monday, December 26, 2016 at 3:37:37 AM UTC+5:30, Gonzalo V wrote: > > Try utf-8-sig > > El 25 dic. 2016 2:57 AM, "Grady Martin" <> escribió: > > > > > On 2016年12月22日 22時38分, wrote: > > > > > >> I am getting the

Re: UTF-8 Encoding Error

2016-12-29 Thread subhabangalore
On Monday, December 26, 2016 at 3:37:37 AM UTC+5:30, Gonzalo V wrote: > Try utf-8-sig > El 25 dic. 2016 2:57 AM, "Grady Martin" <> escribió: > > > On 2016年12月22日 22時38分, wrote: > > > >> I am getting the error: > >> UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: > >> inval

Re: UTF-8 Encoding Error

2016-12-25 Thread Gonzalo V
Try utf-8-sig El 25 dic. 2016 2:57 AM, "Grady Martin" escribió: > On 2016年12月22日 22時38分, subhabangal...@gmail.com wrote: > >> I am getting the error: >> UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: >> invalid start byte >> > > The following is a reflex of mine, whenever

Re: UTF-8 Encoding Error

2016-12-24 Thread Grady Martin
On 2016年12月22日 22時38分, subhabangal...@gmail.com wrote: I am getting the error: UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: invalid start byte The following is a reflex of mine, whenever I encounter Python 2 Unicode errors: import sys reload(sys) sys.setdefaultencod

Re: UTF-8 Encoding Error

2016-12-22 Thread Cameron Simpson
On 22Dec2016 22:38, Subhabrata Banerjee wrote: I am getting the error: UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: invalid start byte as I try to read some files through TaggedCorpusReader. TaggedCorpusReader is a module of NLTK. My files are saved in ANSI format i
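The error in this thread can be reproduced without NLTK. The sketch below assumes that "ANSI" means Windows code page 1252, which is common on Western-locale Windows but is not confirmed by the snippet; in that code page the byte 0x96 is an en dash, while in UTF-8 it can only appear as a continuation byte. The sample bytes are made up.

```python
raw = b"pages 10\x9620"  # hypothetical data saved as cp1252 ("ANSI")

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the encoding the file was actually saved in succeeds.
text = raw.decode("cp1252")
print(text)
```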

Re: UTF-8 question from Dive into Python 3

2011-01-20 Thread jmfauth
On Jan 19, 11:33 pm, Terry Reedy wrote: > On 1/19/2011 1:02 PM, Tim Harig wrote: > > > Right, but I only have to do that once.  After that, I can directly address > > any piece of the stream that I choose.  If I leave the information as a > > simple UTF-8 stream, I would have to walk the stream ag

Re: UTF-8 question from Dive into Python 3

2011-01-19 Thread Terry Reedy
On 1/19/2011 1:02 PM, Tim Harig wrote: Right, but I only have to do that once. After that, I can directly address any piece of the stream that I choose. If I leave the information as a simple UTF-8 stream, I would have to walk the stream again, I would have to walk through the first byte o

Re: UTF-8 question from Dive into Python 3

2011-01-19 Thread Antoine Pitrou
On Wed, 19 Jan 2011 19:18:49 + (UTC) Tim Harig wrote: > On 2011-01-19, Antoine Pitrou wrote: > > On Wed, 19 Jan 2011 18:02:22 + (UTC) > > Tim Harig wrote: > >> Converting to a fixed byte > >> representation (UTF-32/UCS-4) or separating all of the bytes for each > >> UTF-8 into 6 byte con

Re: UTF-8 question from Dive into Python 3

2011-01-19 Thread Tim Harig
On 2011-01-19, Antoine Pitrou wrote: > On Wed, 19 Jan 2011 18:02:22 + (UTC) > Tim Harig wrote: >> Converting to a fixed byte >> representation (UTF-32/UCS-4) or separating all of the bytes for each >> UTF-8 into 6 byte containers both make it possible to simply index the >> letters by a const
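The trade-off being argued here (variable-width UTF-8 versus a fixed-width form such as UTF-32) can be made concrete with a small sketch. The sample string and the helper `char_at` are illustrative assumptions. Note also that UTF-8 as currently specified (RFC 3629) uses at most four bytes per code point; the six-byte figure in the thread refers to the older, wider definition.

```python
text = "aé€😀"

# UTF-8 uses between one and four bytes per code point.
utf8_widths = [len(ch.encode("utf-8")) for ch in text]
print(utf8_widths)

# UTF-32 uses exactly four bytes per code point, so indexing is arithmetic.
utf32 = text.encode("utf-32-le")


def char_at(buf, index):
    """Return the code point at `index` in a UTF-32-LE buffer (hypothetical helper)."""
    return buf[4 * index:4 * index + 4].decode("utf-32-le")


print(char_at(utf32, 2))
```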

Re: UTF-8 question from Dive into Python 3

2011-01-19 Thread Antoine Pitrou
On Wed, 19 Jan 2011 18:02:22 + (UTC) Tim Harig wrote: > On 2011-01-19, Antoine Pitrou wrote: > > On Wed, 19 Jan 2011 16:03:11 + (UTC) > > Tim Harig wrote: > >> > >> For many operations, it is just much faster and simpler to use a single > >> character based container opposed to having t

Re: UTF-8 question from Dive into Python 3

2011-01-19 Thread Tim Harig
On 2011-01-19, Antoine Pitrou wrote: > On Wed, 19 Jan 2011 16:03:11 + (UTC) > Tim Harig wrote: >> >> For many operations, it is just much faster and simpler to use a single >> character based container opposed to having to process an entire byte >> stream to determine individual letters from

Re: UTF-8 question from Dive into Python 3

2011-01-19 Thread Antoine Pitrou
On Wed, 19 Jan 2011 16:03:11 + (UTC) Tim Harig wrote: > > For many operations, it is just much faster and simpler to use a single > character based container opposed to having to process an entire byte > stream to determine individual letters from the bytes or to having > adaptive size contai

Re: UTF-8 question from Dive into Python 3

2011-01-19 Thread Tim Harig
On 2011-01-19, Antoine Pitrou wrote: > On Wed, 19 Jan 2011 14:00:13 + (UTC) > Tim Harig wrote: >> UTF-8 has no apparent endianness if you only store it as a byte stream. >> It does however have a byte order. If you store it using multibytes >> (six bytes for all UTF-8 possibilities), which is

Re: UTF-8 question from Dive into Python 3

2011-01-19 Thread Tim Harig
On 2011-01-19, Adam Skutt wrote: > On Jan 19, 9:00 am, Tim Harig wrote: >> That is why I say that byte streams are essentially big endian. It is >> all a matter of how you look at it. > > It is nothing of the sort. Some byte streams are in fact, little > endian: when the bytes are combined into

Re: UTF-8 question from Dive into Python 3

2011-01-19 Thread Adam Skutt
On Jan 19, 9:00 am, Tim Harig wrote: > > So, you can always assume a big-endian and things will work out correctly > while you cannot always make the same assumption as little endian > without potential issues.  The same holds true for any byte stream data. You need to spend some serious time pro

Re: UTF-8 question from Dive into Python 3

2011-01-19 Thread Antoine Pitrou
On Wed, 19 Jan 2011 14:00:13 + (UTC) Tim Harig wrote: > > - Q: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)? If > - yes, then can I still assume the remaining UTF-8 bytes are in big-endian > ^^ > - or

Re: UTF-8 question from Dive into Python 3

2011-01-19 Thread Tim Harig
Considering your post contained no information or evidence for your negations, I shouldn't even bother responding. I will bite once. Hopefully next time your arguments will contain some pith. On 2011-01-19, Antoine Pitrou wrote: > On Wed, 19 Jan 2011 11:34:53 + (UTC) > Tim Harig wrote: >> Th

Re: UTF-8 question from Dive into Python 3

2011-01-19 Thread Antoine Pitrou
On Wed, 19 Jan 2011 11:34:53 + (UTC) Tim Harig wrote: > That is why the FAQ I linked to > says yes to the fact that you can consider UTF-8 to always be in big-endian > order. It certainly doesn't. Read better. > Essentially all byte based data is big-endian. This is pure nonsense.

Re: UTF-8 question from Dive into Python 3

2011-01-19 Thread Tim Harig
On 2011-01-19, Tim Roberts wrote: > Tim Harig wrote: >>On 2011-01-17, carlo wrote: >> >>> 2- If that were true, can you point me to some documentation about the >>> math that, as Mark says, demonstrates this? >> >>It is true because UTF-8 is essentially an 8 bit encoding that resorts >>to the ne

Re: UTF-8 question from Dive into Python 3

2011-01-18 Thread Tim Roberts
Tim Harig wrote: >On 2011-01-17, carlo wrote: > >> 2- If that were true, can you point me to some documentation about the >> math that, as Mark says, demonstrates this? > >It is true because UTF-8 is essentially an 8 bit encoding that resorts >to the next bit once it exhausts the addressable spac

Re: UTF-8 question from Dive into Python 3

2011-01-18 Thread Raymond Hettinger
On Jan 17, 2:19 pm, carlo wrote: > Hi, > recently I had to study *seriously* Unicode and encodings for one > project in Python but I left with a couple of doubts arised after > reading the unicode chapter of Dive into Python 3 book by Mark > Pilgrim. > > 1- Mark says: > "Also (and you’ll have to t

Re: UTF-8 question from Dive into Python 3

2011-01-17 Thread carlo
On 17 Gen, 23:34, Antoine Pitrou wrote: > On Mon, 17 Jan 2011 14:19:13 -0800 (PST) > > carlo wrote: > > Is it true UTF-8 does not have any "big-endian/little-endian" issue > > because of its encoding method? > > Yes. > > > And if it is true, why Mark (and > > everyone does) writes about UTF-8 wit

Re: UTF-8 question from Dive into Python 3

2011-01-17 Thread Antoine Pitrou
On Mon, 17 Jan 2011 14:19:13 -0800 (PST) carlo wrote: > Is it true UTF-8 does not have any "big-endian/little-endian" issue > because of its encoding method? Yes. > And if it is true, why Mark (and > everyone does) writes about UTF-8 with and without BOM some chapters > later? What would be the
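The "Yes" can be checked directly. The sketch below contrasts UTF-16, where the two byte orders give different byte sequences, with UTF-8, where there is only one serialization; in UTF-8 the BOM acts purely as a signature. The sample character is arbitrary; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the standard library.

```python
import codecs

ch = "é"

# UTF-16 code units are two bytes wide, so the byte order is visible.
little = ch.encode("utf-16-le")
big = ch.encode("utf-16-be")
print(little, big)

# UTF-8 is defined as a byte sequence; there is nothing to swap.
utf8 = ch.encode("utf-8")

# "utf-8-sig" prepends the BOM as a signature when encoding and strips it
# when decoding; plain "utf-8" leaves it in the text as U+FEFF.
signed = ch.encode("utf-8-sig")
print(signed)
```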

Re: UTF-8 question from Dive into Python 3

2011-01-17 Thread Tim Harig
On 2011-01-17, carlo wrote: > Is it true UTF-8 does not have any "big-endian/little-endian" issue > because of its encoding method? And if it is true, why Mark (and > everyone does) writes about UTF-8 with and without BOM some chapters > later? What would be the BOM purpose then? Yes, it is true.

Re: UTF-8 question from Dive into Python 3

2011-01-17 Thread Alexander Kapps
On 17.01.2011 23:19, carlo wrote: Is it true UTF-8 does not have any "big-endian/little-endian" issue because of its encoding method? And if it is true, why Mark (and everyone does) writes about UTF-8 with and without BOM some chapters later? What would be the BOM purpose then? Can't answer yo

Re: Re: Re: UTF-8 problem encoding and decoding in Python3

2010-10-14 Thread hidura
Finally did it, thank you all for your help, the code i will upload because can be used by Python 3 for handle the wsgi issue of the Bytes! Almar, sorry for the mails gmails sometimes sucks!! On Oct 14, 2010 1:00pm, hid...@gmail.com wrote: Finally did it, thank you all for your help, the code i

Re: UTF-8 problem encoding and decoding in Python3

2010-10-12 Thread Almar Klein
>So if you can, you could make sure to send the file as just bytes, >>or if it must be a string, base64 encoded. If this is not possible >>you can try the code below to obtain the bytes, not a very fast >>solution, but it should work (Python 3): >> >> >>MAP = {} >>for i in r

Re: UTF-8 problem encoding and decoding in Python3

2010-10-12 Thread MRAB
On 12/10/2010 15:45, Hidura wrote: Don't work this is the error what give me TypeError: sequence item 0: expected bytes, str found, i continue trying to figure out how resolve it if you have another idea please tellme, but thanks anyway!!! On Mon, Oct 11, 2010 at 4:27 AM, Almar Klein mailto:alma

Re: UTF-8 problem encoding and decoding in Python3

2010-10-12 Thread Hidura
Don't work this is the error what give me TypeError: sequence item 0: expected bytes, str found, i continue trying to figure out how resolve it if you have another idea please tellme, but thanks anyway!!! On Mon, Oct 11, 2010 at 4:27 AM, Almar Klein wrote: > > On 10 October 2010 23:01, Hidura w

Re: UTF-8 problem encoding and decoding in Python3

2010-10-11 Thread Almar Klein
On 10 October 2010 23:01, Hidura wrote: > I try to encode a binary file what was upload to a server and is > extract from the wsgi.input of the environ and comes as an unicode > string. > Firstly, UTF-8 is not meant to encode arbitrary binary data. But I guess you could have a Unicode string in
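Almar's point that UTF-8 is not a container for arbitrary binary data, together with the base64 suggestion from later in the thread, can be sketched as follows. The payload is a made-up four-byte sample (it happens to look like the start of a JPEG), and none of this is the code from the thread.

```python
import base64

payload = bytes([0xFF, 0xD8, 0xFF, 0xE0])  # hypothetical binary upload

# Arbitrary bytes are generally not valid UTF-8; 0xFF never occurs in it.
try:
    payload.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# base64 turns any bytes into ASCII text and back without loss.
as_text = base64.b64encode(payload).decode("ascii")
restored = base64.b64decode(as_text)

# latin-1 maps every byte to one code point, so it also round-trips.
via_latin1 = payload.decode("latin-1").encode("latin-1")

print(valid_utf8, as_text)
```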

Re: UTF-8 problem encoding and decoding in Python3

2010-10-10 Thread Hidura
I try to encode a binary file what was upload to a server and is extract from the wsgi.input of the environ and comes as an unicode string. 2010/10/10, Almar Klein : > Hi, > > please tell us what you are trying to do. Encoding (with UTF-8) is a method > to convert a Unicode string to a sequence of

Re: UTF-8 problem encoding and decoding in Python3

2010-10-10 Thread Almar Klein
Hi, please tell us what you are trying to do. Encoding (with UTF-8) is a method to convert a Unicode string to a sequence of bytes. Decoding does the reverse. When i open > directly and try to decode the file the error is this: `UnicodeDecodeError: > 'utf8' codec can't decode byte 0xff in positi

Re: UTF-8 problem encoding and decoding in Python3

2010-10-10 Thread Chris Rebert
On Sun, Oct 10, 2010 at 10:25 AM, wrote: > Hello everybody i am trying to encode a file string of an upload file and i > am facing some problems with the first part of the file. When i open > directly and try to decode the file the error is this: `UnicodeDecodeError: > 'utf8' codec can't decode b

Re: utf-8 and ctypes

2010-09-30 Thread Diez B. Roggisch
Brendan Miller writes: > 2010/9/29 Lawrence D'Oliveiro : >> In message , Brendan >> Miller wrote: >> >>> It seems that characters not in the ascii subset of UTF-8 are >>> discarded by c_char_p during the conversion ... >> >> Not a chance. >> >>> ... or at least they don't print out when I go to p

Re: utf-8 and ctypes

2010-09-29 Thread Mark Tolonen
"Brendan Miller" wrote in message news:aanlkti=2f3l++398st-16mpes8wzfblbu+qa8ztpa...@mail.gmail.com... 2010/9/29 Lawrence D'Oliveiro : In message , Brendan Miller wrote: It seems that characters not in the ascii subset of UTF-8 are discarded by c_char_p during the conversion ... Not a ch

Re: utf-8 and ctypes

2010-09-29 Thread MRAB
On 29/09/2010 19:33, Brendan Miller wrote: > 2010/9/29 Lawrence D'Oliveiro: >> In message, Brendan >> Miller wrote: >> >>> It seems that characters not in the ascii subset of UTF-8 are >>> discarded by c_char_p during the conversion ... >> >> Not a chance. >> >>> ... or at least they don't print ou

Re: utf-8 and ctypes

2010-09-29 Thread Brendan Miller
2010/9/29 Lawrence D'Oliveiro : > In message , Brendan > Miller wrote: > >> It seems that characters not in the ascii subset of UTF-8 are >> discarded by c_char_p during the conversion ... > > Not a chance. > >> ... or at least they don't print out when I go to print the string. > > So it seems the

Re: utf-8 and ctypes

2010-09-29 Thread Lawrence D'Oliveiro
In message , Brendan Miller wrote: > It seems that characters not in the ascii subset of UTF-8 are > discarded by c_char_p during the conversion ... Not a chance. > ... or at least they don't print out when I go to print the string. So it seems there’s a problem on the printing side. What happ

Re: utf-8 and ctypes

2010-09-28 Thread MRAB
On 28/09/2010 23:54, Brendan Miller wrote: I'm using python 2.5. Currently I have some python bindings written in ctypes. On the C side, my strings are in utf-8. On the python side I use ctypes.c_char_p to convert my strings to python strings. However, this seems to break for non-ascii character

Re: utf-8 coding sometimes it works, most of the time it don't work.

2010-09-22 Thread Stef Mientki
hello Uli, thanks, I think you hit the nail on its head, PyScripter indeed changes default encoding but .. On Wed, Sep 22, 2010 at 9:16 AM, Ulrich Eckhardt wrote: > Stef Mientki wrote: > > When running this python application from the command line ( or launched > > from another Python program),

Re: utf-8 coding sometimes it works, most of the time it don't work.

2010-09-22 Thread Ulrich Eckhardt
Stef Mientki wrote: > When running this python application from the command line ( or launched > from another Python program), the wrong character encoding (probably > windows-1252) is used. Rule #1: If you know the correct encoding, set it yourself. This particularly applies to files you open you
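Rule #1 translates into passing `encoding=` explicitly whenever a text file is opened, instead of relying on whatever default the launching environment provides. The following is a Python 3 sketch (the thread itself predates it); the file name and sample text are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file

    # Writing and reading with an explicit encoding gives the same result
    # no matter which IDE, console, or locale launched the program.
    with open(path, "w", encoding="utf-8") as f:
        f.write("život je lep")
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # The bytes on disk are UTF-8 regardless of the platform default.
    with open(path, "rb") as f:
        raw = f.read()

print(text)
print(raw)
```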

Re: utf-8 read/write file

2008-10-09 Thread gigs
Kent Johnson wrote: On Oct 8, 5:55 pm, gigs <[EMAIL PROTECTED]> wrote: Benjamin wrote: On Oct 8, 12:49 pm, Bruno <[EMAIL PROTECTED]> wrote: Hi! I have big .txt file which i want to read, process and write to another .txt file. I have done script for that, but im having problem with croatian c

Re: utf-8 read/write file

2008-10-08 Thread Kent Johnson
On Oct 8, 5:55 pm, gigs <[EMAIL PROTECTED]> wrote: > Benjamin wrote: > > On Oct 8, 12:49 pm, Bruno <[EMAIL PROTECTED]> wrote: > >> Hi! > > >> I have big .txt file which i want to read, process and write to another > >> .txt file. > >> I have done script for that, but im having problem with croatia

Re: utf-8 read/write file

2008-10-08 Thread Aleksandar Radulovic
Hi, What is the encoding of the file1 you're reading from? I just ran tests on my machine (OS X) with both python2.5 and 2.6 and was able to read from a file containing: "život je lep" The file is UTF-8 encoded. >>> data = open("test.txt").read() >>> data '\xc5\xbeivot je lep.' >>> f = open("tes

Re: utf-8 read/write file

2008-10-08 Thread gigs
Benjamin wrote: On Oct 8, 12:49 pm, Bruno <[EMAIL PROTECTED]> wrote: Hi! I have big .txt file which i want to read, process and write to another .txt file. I have done script for that, but im having problem with croatian characters (Š,Đ,Ž,Č,Ć). Can you show us what you have so far? How can

Re: utf-8 read/write file

2008-10-08 Thread Benjamin
On Oct 8, 12:49 pm, Bruno <[EMAIL PROTECTED]> wrote: > Hi! > > I have big .txt file which i want to read, process and write to another .txt > file. > I have done script for that, but im having problem with croatian characters > (Š,Đ,Ž,Č,Ć). Can you show us what you have so far? > How can I read/

Re: UTF-8 and stdin/stdout?

2008-05-28 Thread Martin v. Löwis
> $ cat utf8_from_stdin.py > import sys > data = sys.stdin.read() > print "length of data =", len(data) sys.stdin is a byte stream in Python 2, not a character stream. To make it a character stream, do sys.stdin = codecs.getreader("utf-8")(sys.stdin) HTH, Martin
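The effect of `codecs.getreader` can be seen without touching the real stdin. In this sketch an `io.BytesIO` object stands in for the byte stream (an assumption made so the example is self-contained) and holds the same two bytes, c3 a9, as the utf8_input file from the thread.

```python
import codecs
import io

byte_stream = io.BytesIO(b"\xc3\xa9")  # stand-in for a byte-oriented stdin

raw = byte_stream.read()
print("length of raw data =", len(raw))  # counts bytes

byte_stream.seek(0)
reader = codecs.getreader("utf-8")(byte_stream)  # wrap as a character stream
text = reader.read()
print("length of decoded data =", len(text))  # counts characters
```

In Python 3 `sys.stdin` is already a text stream, so this wrapping is mainly relevant to Python 2 code or to raw binary streams such as `sys.stdin.buffer`.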

Re: UTF-8 and stdin/stdout?

2008-05-28 Thread Ulrich Eckhardt
Chris wrote: > On May 28, 11:08 am, [EMAIL PROTECTED] wrote: >> Say I have a file, utf8_input, that contains a single character, é, >> coded as UTF-8: >> >> $ hexdump -C utf8_input >>  c3 a9 >> 0002 [...] > weird thing is 'c3 a9' is é on my side... and copy/pasting the é > gives me 'e

Re: UTF-8 and stdin/stdout?

2008-05-28 Thread dave_140390
> Shouldn't you do data = data.decode('utf8') ? Yes, that's it! Thanks. -- dave

Re: UTF-8 and stdin/stdout?

2008-05-28 Thread Chris
On May 28, 11:08 am, [EMAIL PROTECTED] wrote: > Hi, > > I have problems getting my Python code to work with UTF-8 encoding > when reading from stdin / writing to stdout. > > Say I have a file, utf8_input, that contains a single character, é, > coded as UTF-8: > >         $ hexdump -C utf8_input >  

Re: UTF-8 and stdin/stdout?

2008-05-28 Thread Arnaud Delobelle
[EMAIL PROTECTED] writes: > Hi, > > I have problems getting my Python code to work with UTF-8 encoding > when reading from stdin / writing to stdout. > > Say I have a file, utf8_input, that contains a single character, é, > coded as UTF-8: > > $ hexdump -C utf8_input > c3 a9

Re: UTF-8 in basic CGI mode

2008-01-17 Thread coldpizza
Thanks, Sion, that makes sense! Would it be correct to assume that the encoding of strings retrieved by FieldStorage() would be the same as the encoding of the submitted web form (in my case utf-8)? Funny but I have the same form implemented in PSP (Python Server Pages), running under Apache with

Re: UTF-8 in basic CGI mode

2008-01-16 Thread Sion Arrowsmith
coldpizza <[EMAIL PROTECTED]> wrote: >I am using this 'word' variable like this: > >print u'' % (word) > >and apparently this causes exceptions with non-ASCII strings. > >I've also tried this: >print u'' % >(word.encode('utf8')) >but I still get the same UnicodeDecodeError.. Your 'word' i

Re: UTF-8 characters in doctest

2007-09-22 Thread John J. Lee
Peter Otten <[EMAIL PROTECTED]> writes: [...] >> Forgive me if this is a stupid question, but: What purpose does >> function f serve? > > Like the OP's get_inventary_number() it takes a unicode string and > returns a tuple of unicode strings. It's pointless otherwise. I hoped I > had stripped down

Re: UTF-8 characters in doctest

2007-09-20 Thread Peter Otten
John J. Lee wrote: > Peter Otten <[EMAIL PROTECTED]> writes: > [...] >> # -*- coding: utf8 -*- >> r""" > f("äöü".decode("utf8")) >> (u'\xe4\xf6\xfc',) >> """ >> def f(s): >> return (s,) > > Forgive me if this is a stupid question, but: What purpose does > function f serve? Like the OP's

Re: UTF-8 characters in doctest

2007-09-20 Thread J. Cliff Dyer
J. Cliff Dyer wrote: > John J. Lee wrote: > >> Peter Otten <[EMAIL PROTECTED]> writes: >> [...] >> >> >>> def f(s): >>> return (s,) >>> >>> >> Forgive me if this is a stupid question, but: What purpose does >> function f serve? >> >> >> John >> >> > > Well, it has

Re: UTF-8 characters in doctest

2007-09-20 Thread J. Cliff Dyer
John J. Lee wrote: > Peter Otten <[EMAIL PROTECTED]> writes: > [...] > >> def f(s): >> return (s,) >> > > Forgive me if this is a stupid question, but: What purpose does > function f serve? > > > John > Well, it has nothing to do with the unicode bit that came before it. It just ta

Re: UTF-8 characters in doctest

2007-09-20 Thread John J. Lee
Peter Otten <[EMAIL PROTECTED]> writes: [...] > # -*- coding: utf8 -*- > r""" f("äöü".decode("utf8")) > (u'\xe4\xf6\xfc',) > """ > def f(s): > return (s,) Forgive me if this is a stupid question, but: What purpose does function f serve? John

Re: UTF-8 characters in doctest

2007-09-19 Thread Peter Otten
Bzyczek wrote: > So my question is: Is it possible to run doctests with UTF-8 > characters? And if your answer will be YES, tell me please how... Use raw strings in combination with explicit decoding and a little trial and error. E. g. this little gem passes ;) # -*- coding: utf8 -*- r""" >>> f("ä

Re: UTF-8 Support of Curses in Python 2.5

2007-07-21 Thread Andrey
Yes, it does solve the problem. Compile python with ncursesw library. Btw Ubuntu 7 has it "out of the box". > Hi All, > > Recently I ran into a problem with UTF-8 support when using curses > library in python 2.5 in Fedora 7. I found out that the program using > curses cannot print out unicode

Re: UTF-8

2007-03-12 Thread Eric Brunel
On Sat, 10 Mar 2007 15:00:04 +0100, Olivier Verdier <[EMAIL PROTECTED]> wrote: [snip] > The default encoding i wish to set is UTF-8 since it encodes unicode and > is nowadays the standard encoding. I can't agree with that: there are still many tools completely ignoring the encoding problem,

Re: UTF-8

2007-03-10 Thread John Machin
On Mar 11, 1:00 am, Olivier Verdier <[EMAIL PROTECTED]> wrote: > First off: i thoroughly enjoy python. I use it for scientific > computing with scipy, numpy and matplotlib and it's an amazingly > efficient and elegant language. > > About this mailing list: it is very hard to search. I can't find an

Re: UTF-8

2007-03-10 Thread Leif K-Brooks
Laurent Pointal wrote: > You should prefer to put > # -*- coding: utf-8 -*- > at the beginning of your source files. With that you are ok with all Python > installations, whatever be the default encoding. > Hope this will become mandatory in a future Python version. The default encoding

Re: UTF-8 output problems

2007-03-10 Thread Laurent Pointal
Michael B. Trausch wrote: > I am having a slight problem with UTF-8 output with Python. I have the > following program: > > x = 0 > > while x < 0x4000: > print u"This is Unicode code point %d (0x%x): %s" % (x, x, > unichr(x)) > x += 1 > > This program works perfectly when run directly:

Re: UTF-8

2007-03-10 Thread Laurent Pointal
Olivier Verdier wrote: > My question is the following: how to set a default encoding in > python? I read an old thread about that and it didn't seem possible > by then. You *can* put a sys.setdefaultencoding("utf-8") in your sitecustomize.py (see Python libs/site-packages/). Note that this functi

Re: UTF-8 output problems

2007-03-10 Thread Marc 'BlackJack' Rintsch
In <[EMAIL PROTECTED]>, Michael B. Trausch wrote: > However, when I attempt to redirect the output to a file: > > [EMAIL PROTECTED]:~/tmp$ python test.py >f > Traceback (most recent call last): > File "test.py", line 6, in > print u"This is Unicode code point %d (0x%x): %s" % (x, x, > unic

Re: UTF-8 to unicode or latin-1 (and yes, I read the FAQ)

2006-10-19 Thread Neil Cerutti
On 2006-10-19, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote: > In <[EMAIL PROTECTED]>, Neil Cerutti wrote: >>> Note that 'K\xc3\xb6ni'.decode('utf-8') returns a Unicode >>> object. With print this is implicitly converted to string. The >>> char set used depends on your console >> >> No, the

Re: UTF-8 to unicode or latin-1 (and yes, I read the FAQ)

2006-10-19 Thread Neil Cerutti
On 2006-10-19, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote: > In <[EMAIL PROTECTED]>, Neil Cerutti wrote: > >>> Note that 'K\xc3\xb6ni'.decode('utf-8') returns a Unicode >>> object. With print this is implicitly converted to string. The >>> char set used depends on your console >> >> No, th

Re: UTF-8 to unicode or latin-1 (and yes, I read the FAQ)

2006-10-19 Thread Marc 'BlackJack' Rintsch
In <[EMAIL PROTECTED]>, Neil Cerutti wrote: >> Note that 'K\xc3\xb6ni'.decode('utf-8') returns a Unicode >> object. With print this is implicitly converted to string. The >> char set used depends on your console > > No, the setting of the console encoding (sys.stdout.encoding) is > ignored. Nope

Re: UTF-8 to unicode or latin-1 (and yes, I read the FAQ)

2006-10-19 Thread Neil Cerutti
On 2006-10-19, Michael Ströder <[EMAIL PROTECTED]> wrote: > [EMAIL PROTECTED] wrote: >> >> print 'K\xc3\xb6ni'.decode('utf-8') >> >> and this line raised a UnicodeDecode exception. > > Works for me. > > Note that 'K\xc3\xb6ni'.decode('utf-8') returns a Unicode > object. With print this is implici

Re: UTF-8 to unicode or latin-1 (and yes, I read the FAQ)

2006-10-19 Thread NoelByron
Michael Ströder wrote: > [EMAIL PROTECTED] wrote: > > > > print 'K\xc3\xb6ni'.decode('utf-8') > > > > and this line raised a UnicodeDecode exception. > > Works for me. > > Note that 'K\xc3\xb6ni'.decode('utf-8') returns a Unicode object. With > print this is implicitly converted to string. The char

Re: UTF-8 to unicode or latin-1 (and yes, I read the FAQ)

2006-10-19 Thread Michael Ströder
[EMAIL PROTECTED] wrote: > > print 'K\xc3\xb6ni'.decode('utf-8') > > and this line raised a UnicodeDecode exception. Works for me. Note that 'K\xc3\xb6ni'.decode('utf-8') returns a Unicode object. With print this is implicitly converted to string. The char set used depends on your console Chec

Re: UTF-8 to unicode or latin-1 (and yes, I read the FAQ)

2006-10-19 Thread NoelByron
Duncan Booth wrote: > [EMAIL PROTECTED] wrote: > > > 'K\xc3\xb6ni'.decode('utf-8') # 'K\xc3\xb6ni' should be 'König', > > contains a german 'umlaut' > > > > but failed since python assumes every string to decode to be ASCII? > > No, Python would assume the string to be utf-8 encoded in this cas

Re: UTF-8 to unicode or latin-1 (and yes, I read the FAQ)

2006-10-19 Thread NoelByron
> > > > 'K\xc3\xb6ni'.decode('utf-8') # 'K\xc3\xb6ni' should be 'König', > > "Köni", to be precise. Äh, yes. ;o) > > contains a german 'umlaut' > > > > but failed since python assumes every string to decode to be ASCII? > > that should work, and it sure works for me: > > >>> s = 'K\xc3\xb6ni

Re: UTF-8 to unicode or latin-1 (and yes, I read the FAQ)

2006-10-19 Thread Duncan Booth
[EMAIL PROTECTED] wrote: > 'K\xc3\xb6ni'.decode('utf-8') # 'K\xc3\xb6ni' should be 'König', > contains a german 'umlaut' > > but failed since python assumes every string to decode to be ASCII? No, Python would assume the string to be utf-8 encoded in this case: >>> 'K\xc3\xb6ni'.decode('utf

Re: UTF-8 to unicode or latin-1 (and yes, I read the FAQ)

2006-10-19 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote: > I'm struggling with the conversion of a UTF-8 string to latin-1. As far > as I know the way to go is to decode the UTF-8 string to unicode and > then encode it back again to latin-1? > > So I tried: > > 'K\xc3\xb6ni'.decode('utf-8') # 'K\xc3\xb6ni' should be 'Kön
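
The round trip asked about in this message (UTF-8 bytes to Unicode, then Unicode to Latin-1) might look roughly like the sketch below. It is written in Python 3 syntax, whereas the thread uses Python 2, where the input would be a plain `str`. The variable names are illustrative and do not come from the thread.

```python
# UTF-8 encoded bytes for "Köni", the example string used in the thread.
utf8_bytes = b'K\xc3\xb6ni'

# Step 1: decode the UTF-8 bytes into a Unicode string.
text = utf8_bytes.decode('utf-8')

# Step 2: encode the Unicode string as Latin-1 (one byte per character here).
latin1_bytes = text.encode('latin-1')

print(repr(text))      # the decoded Unicode string
print(latin1_bytes)    # the same text as Latin-1 bytes
```

Encoding to Latin-1 only succeeds when every character has a Latin-1 code point; for other text, `encode` raises `UnicodeEncodeError` unless an `errors=` policy such as `'replace'` is passed.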

Re: RE + UTF-8

2005-09-24 Thread Michael Ströder
[EMAIL PROTECTED] wrote: > > I have tried to test RE and UTF-8 in Python generally and the results > are even more confusing (done with locale cs_CZ.UTF-8 in konsole): > >>>locale.getpreferredencoding() > > 'UTF-8' > print re.sub("(\w*)","X","[Chelcický]",re.L) You first have to turn the r
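
The advice in this reply, decoding to Unicode before applying the regular expression, can be sketched as follows. This is a Python 3 illustration, not the poster's code: it uses `\w+` rather than the original `(\w*)` and omits `re.L`, and the variable names are assumptions made for the example.

```python
import re

# UTF-8 bytes as they might arrive from a file or a wiki page.
raw = "[Chelcický]".encode("utf-8")

# Decode first, so that \w is matched against whole Unicode characters.
text = raw.decode("utf-8")
replaced = re.sub(r"\w+", "X", text)

# For comparison: a bytes pattern treats \w as ASCII-only, so the two
# bytes that encode "ý" end the match early and are left untouched.
replaced_bytes = re.sub(rb"\w+", b"X", raw)

print(replaced)
print(replaced_bytes)
```

The bytes case shows the same kind of early break that the original question describes for the wiki URL: the word stops at the first byte of the non-ASCII character.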

RE + UTF-8

2005-09-24 Thread [EMAIL PROTECTED]
Working on extension of genericwiki.py plugin for PyBlosxom and I have problems with UTF-8 and RE. When I have this wiki line, it does break URL too early: [http://en.wikipedia.org/wiki/Petr_Chelcický Petr Chelcický's] work(s) into English. and creates [http://en.wikipedia.org/wiki/Petr_Chel";>h

Re: UTF-8 / German, Scandinavian letters - is it really this difficult?? Linux & Windows XP

2005-02-23 Thread Paul Boddie
"Serge Orlov" <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>... > Paul Boddie wrote: > > Anyone who has needed to expose filesystems > > created by Linux distributions before the UTF-8 "big push" to later > > distributions can attest to the fact that the "see no evil" brass > > monke

Re: UTF-8 / German, Scandinavian letters - is it really this difficult?? Linux & Windows XP

2005-02-22 Thread "Martin v. Löwis"
Fuzzyman wrote: ust = 'æøå'.decode('utf-8') Which is now deprecated isn't it? (including encoded string literals in source without declaring an encoding). Not having an encoding declaration while having non-ASCII characters in source code is deprecated. Having non-ASCII characters in string liter

Re: UTF-8 / German, Scandinavian letters - is it really this difficult?? Linux & Windows XP

2005-02-22 Thread Serge Orlov
Paul Boddie wrote: > One side-effect of the "big push" to UTF-8 amongst the Linux > distribution vendors/maintainers is the evasion of issues such as > filesystem encodings and "real" Unicode at the system level. In > Python, when you have a Unicode object, you are dealing with > idealised > sequen

Re: UTF-8 / German, Scandinavian letters - is it really this difficult?? Linux & Windows XP

2005-02-22 Thread Paul Boddie
Mike Dee <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>... > A very very basic UTF-8 question that's driving me nuts: > > If I have this in the beginning of my Python script in Linux: > > #!/usr/bin/env python > # -*- coding: UTF-8 -*- > > should I - or should I not - be able to u

Re: UTF-8 / German, Scandinavian letters - is it really this difficult?? Linux & Windows XP

2005-02-22 Thread Fuzzyman
Max M wrote: > Fuzzyman wrote: > > Mike Dee wrote: > > >>#!/usr/bin/env python > >># -*- coding: UTF-8 -*- > > > This will mean string literals in your source code will be encoded as > > UTF8 - if you handle them with normal string operations you might get > > funny results. > > It means that you
