On Wed, 26 Oct 2022 at 05:09, Barry Scott wrote:
>
>
>
> > On 25 Oct 2022, at 11:16, Stefan Ram wrote:
> >
> > r...@zedat.fu-berlin.de (Stefan Ram) writes:
> >> You can let Python guess the encoding of a file.
> >> def encoding_of( name ):
> >> path = pathlib.Path( name )
> >> for encoding in( "u
> On 25 Oct 2022, at 11:16, Stefan Ram wrote:
>
> r...@zedat.fu-berlin.de (Stefan Ram) writes:
>> You can let Python guess the encoding of a file.
>> def encoding_of( name ):
>> path = pathlib.Path( name )
>> for encoding in( "utf_8", "cp1252", "latin_1" ):
>> try:
>> with path.open( encoding=e
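The quoted function is cut off by the archive; a complete sketch of the same idea (the candidate list comes from the quote, the read-and-return logic is an assumption) might look like:

```python
import pathlib

def encoding_of(name):
    """Guess a file's encoding by trying candidate codecs in order.

    A sketch of the truncated function quoted above; returns the first
    encoding that decodes the whole file without error, else None.
    """
    path = pathlib.Path(name)
    for encoding in ("utf_8", "cp1252", "latin_1"):
        try:
            with path.open(encoding=encoding) as file:
                file.read()  # decoding the whole file validates the guess
            return encoding
        except UnicodeDecodeError:
            continue
    return None
```

Note that latin_1 maps every possible byte, so it can never fail: it acts as a last-resort fallback rather than real detection.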
On Thu, 18 Aug 2022 11:33:59 -0700, Tobiah declaimed the
following:
>
>So how does this break down? When a person enters
>Montréal, Quebéc into a form field, what are they
>doing on the keyboard to make that happen? As the
>string sits there in the text box, is it latin1, or utf-8
>or something
On Fri, 19 Aug 2022 at 08:15, Tobiah wrote:
>
> > You configure the web server to send:
> >
> > Content-Type: text/html; charset=...
> >
> > in the HTTP header when it serves HTML files.
>
> So how does this break down? When a person enters
> Montréal, Quebéc into a form field, what are they
On 2022-08-18, Tobiah wrote:
>> You configure the web server to send:
>>
>> Content-Type: text/html; charset=...
>>
>> in the HTTP header when it serves HTML files.
>
> So how does this break down? When a person enters
> Montréal, Quebéc into a form field, what are they
> doing on the keyb
You configure the web server to send:
Content-Type: text/html; charset=...
in the HTTP header when it serves HTML files.
So how does this break down? When a person enters
Montréal, Quebéc into a form field, what are they
doing on the keyboard to make that happen? As the
string sits ther
On 2022-08-18, Tobiah wrote:
>> Generally speaking browser submissions were/are supposed to be sent
>> using the same encoding as the page, so if you're sending the page
>> as "latin1" then you'll see that a fair amount I should think. If you
>> send it as "utf-8" then you'll get 100% utf-8 back.
On 2022-08-17, Barry wrote:
>> On 17 Aug 2022, at 18:30, Jon Ribbens via Python-list
>> wrote:
>> On 2022-08-17, Tobiah wrote:
>>> I get data from various sources; client emails, spreadsheets, and
>>> data from web applications. I find that I can do
>>> some_string.decode('latin1')
>>> to get
Generally speaking browser submissions were/are supposed to be sent
using the same encoding as the page, so if you're sending the page
as "latin1" then you'll see that a fair amount I should think. If you
send it as "utf-8" then you'll get 100% utf-8 back.
The only trick I know is to use . Woul
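The Content-Type header discussed above can also be emitted by the application itself. A minimal WSGI sketch (the app name and the form are illustrative, not from the thread):

```python
def app(environ, start_response):
    # Declaring charset=utf-8 in the Content-Type header is what tells
    # the browser to send form submissions back encoded as UTF-8.
    body = "<form method='post'><input name='city'></form>".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```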
On 2022-08-17, Tobiah wrote:
>> That has already been decided, as much as it ever can be. UTF-8 is
>> essentially always the correct encoding to use on output, and almost
>> always the correct encoding to assume on input absent any explicit
>> indication of another encoding. (e.g. the HTML "standa
On 18/08/2022 03.33, Stefan Ram wrote:
> Tobiah writes:
>> I get data from various sources; client emails, spreadsheets, and
>> data from web applications. I find that I can do
>> some_string.decode('latin1')
>
> Strings have no "decode" method. ("bytes" objects do.)
>
>> to get unicode that
> On 17 Aug 2022, at 18:30, Jon Ribbens via Python-list
> wrote:
>
> On 2022-08-17, Tobiah wrote:
>> I get data from various sources; client emails, spreadsheets, and
>> data from web applications. I find that I can do
>> some_string.decode('latin1')
>> to get unicode that I can use with x
That has already been decided, as much as it ever can be. UTF-8 is
essentially always the correct encoding to use on output, and almost
always the correct encoding to assume on input absent any explicit
indication of another encoding. (e.g. the HTML "standard" says that
all HTML files must be UTF-
On 8/17/22 08:33, Stefan Ram wrote:
Tobiah writes:
I get data from various sources; client emails, spreadsheets, and
data from web applications. I find that I can do some_string.decode('latin1')
Strings have no "decode" method. ("bytes" objects do.)
I'm using 2.7. Maybe that's why.
On 2022-08-17, Tobiah wrote:
> I get data from various sources; client emails, spreadsheets, and
> data from web applications. I find that I can do some_string.decode('latin1')
> to get unicode that I can use with xlsxwriter,
> or put in the header of a web page to display
> European characters
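In Python 3 the same conversion is spelled on bytes rather than str, which is why Stefan's reply below notes that strings have no decode method. A small sketch (the sample bytes are illustrative):

```python
raw = b"Montr\xe9al"            # "Montréal" as Latin-1 bytes

# In Python 3 only bytes have .decode(); str has .encode().  The
# thread's some_string.decode('latin1') works because a Python 2
# str is a byte string.
text = raw.decode("latin1")
assert text == "Montréal"

# Re-encoding the same text as UTF-8 yields a different byte sequence.
assert text.encode("utf-8") == b"Montr\xc3\xa9al"
```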
On Friday, December 30, 2016 at 7:16:25 AM UTC+5:30, Steve D'Aprano wrote:
> On Sun, 25 Dec 2016 04:50 pm, Grady Martin wrote:
>
> > On 2016年12月22日 22時38分, wrote:
> >>I am getting the error:
> >>UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15:
> >>invalid start byte
> >
> >
On Sun, 25 Dec 2016 04:50 pm, Grady Martin wrote:
> On 2016年12月22日 22時38分, subhabangal...@gmail.com wrote:
>>I am getting the error:
>>UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15:
>>invalid start byte
>
> The following is a reflex of mine, whenever I encounter Python 2
On Friday, December 30, 2016 at 3:35:56 AM UTC+5:30, subhaba...@gmail.com wrote:
> On Monday, December 26, 2016 at 3:37:37 AM UTC+5:30, Gonzalo V wrote:
> > Try utf-8-sig
> > El 25 dic. 2016 2:57 AM, "Grady Martin" <> escribió:
> >
> > > On 2016年12月22日 22時38分, wrote:
> > >
> > >> I am getting the
On Monday, December 26, 2016 at 3:37:37 AM UTC+5:30, Gonzalo V wrote:
> Try utf-8-sig
> El 25 dic. 2016 2:57 AM, "Grady Martin" <> escribió:
>
> > On 2016年12月22日 22時38分, wrote:
> >
> >> I am getting the error:
> >> UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15:
> >> inval
Try utf-8-sig
El 25 dic. 2016 2:57 AM, "Grady Martin" escribió:
> On 2016年12月22日 22時38分, subhabangal...@gmail.com wrote:
>
>> I am getting the error:
>> UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15:
>> invalid start byte
>>
>
> The following is a reflex of mine, whenever
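The 'utf-8-sig' codec suggested above differs from plain 'utf-8' only in how it treats a leading byte-order mark. A quick demonstration (the sample text is illustrative):

```python
data = b"\xef\xbb\xbfMontr\xc3\xa9al"   # UTF-8 text preceded by a BOM

# Plain utf-8 keeps the BOM as U+FEFF at the start of the string...
assert data.decode("utf-8") == "\ufeffMontréal"

# ...while utf-8-sig strips it, which is usually what you want for
# files saved by Windows editors that prepend a BOM.
assert data.decode("utf-8-sig") == "Montréal"
```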
On 2016年12月22日 22時38分, subhabangal...@gmail.com wrote:
I am getting the error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: invalid
start byte
The following is a reflex of mine, whenever I encounter Python 2 Unicode errors:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
On 22Dec2016 22:38, Subhabrata Banerjee wrote:
I am getting the error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: invalid
start byte
as I try to read some files through TaggedCorpusReader. TaggedCorpusReader is a module of NLTK.
My files are saved in ANSI format i
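The byte 0x96 in the traceback is the clue: it can never start a UTF-8 sequence, but in the Windows "ANSI" code page (cp1252) it is an en dash. A sketch showing both behaviours (the sample line is invented for illustration):

```python
data = b"New Delhi \x96 India"   # hypothetical line from an "ANSI" file

# 0x96 is a continuation byte in UTF-8 and can never begin a
# sequence, hence the "invalid start byte" error quoted above.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    assert "invalid start byte" in str(exc)

# In cp1252 ("ANSI"), 0x96 is an en dash (U+2013).
assert data.decode("cp1252") == "New Delhi \u2013 India"
```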
On Jan 19, 11:33 pm, Terry Reedy wrote:
> On 1/19/2011 1:02 PM, Tim Harig wrote:
>
> > Right, but I only have to do that once. After that, I can directly address
> > any piece of the stream that I choose. If I leave the information as a
> > simple UTF-8 stream, I would have to walk the stream ag
On 1/19/2011 1:02 PM, Tim Harig wrote:
Right, but I only have to do that once. After that, I can directly address
any piece of the stream that I choose. If I leave the information as a
simple UTF-8 stream, I would have to walk the stream again, I would have to
walk through the first byte o
On Wed, 19 Jan 2011 19:18:49 + (UTC)
Tim Harig wrote:
> On 2011-01-19, Antoine Pitrou wrote:
> > On Wed, 19 Jan 2011 18:02:22 + (UTC)
> > Tim Harig wrote:
> >> Converting to a fixed byte
> >> representation (UTF-32/UCS-4) or separating all of the bytes for each
> >> UTF-8 into 6 byte con
On 2011-01-19, Antoine Pitrou wrote:
> On Wed, 19 Jan 2011 18:02:22 + (UTC)
> Tim Harig wrote:
>> Converting to a fixed byte
>> representation (UTF-32/UCS-4) or separating all of the bytes for each
>> UTF-8 into 6 byte containers both make it possible to simply index the
>> letters by a const
On Wed, 19 Jan 2011 18:02:22 + (UTC)
Tim Harig wrote:
> On 2011-01-19, Antoine Pitrou wrote:
> > On Wed, 19 Jan 2011 16:03:11 + (UTC)
> > Tim Harig wrote:
> >>
> >> For many operations, it is just much faster and simpler to use a single
> >> character based container opposed to having t
On 2011-01-19, Antoine Pitrou wrote:
> On Wed, 19 Jan 2011 16:03:11 + (UTC)
> Tim Harig wrote:
>>
>> For many operations, it is just much faster and simpler to use a single
>> character based container opposed to having to process an entire byte
>> stream to determine individual letters from
On Wed, 19 Jan 2011 16:03:11 + (UTC)
Tim Harig wrote:
>
> For many operations, it is just much faster and simpler to use a single
> character based container opposed to having to process an entire byte
> stream to determine individual letters from the bytes or to having
> adaptive size contai
On 2011-01-19, Antoine Pitrou wrote:
> On Wed, 19 Jan 2011 14:00:13 + (UTC)
> Tim Harig wrote:
>> UTF-8 has no apparent endianess if you only store it as a byte stream.
>> It does however have a byte order. If you store it using multibytes
>> (six bytes for all UTF-8 possibilites) , which is
On 2011-01-19, Adam Skutt wrote:
> On Jan 19, 9:00 am, Tim Harig wrote:
>> That is why I say that byte streams are essentially big endian. It is
>> all a matter of how you look at it.
>
> It is nothing of the sort. Some byte streams are in fact, little
> endian: when the bytes are combined into
On Jan 19, 9:00 am, Tim Harig wrote:
>
> So, you can always assume a big-endian and things will work out correctly
> while you cannot always make the same assumption as little endian
> without potential issues. The same holds true for any byte stream data.
You need to spend some serious time pro
On Wed, 19 Jan 2011 14:00:13 + (UTC)
Tim Harig wrote:
>
> - Q: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)? If
> - yes, then can I still assume the remaining UTF-8 bytes are in big-endian
> ^^
> - or
Considering you post contained no information or evidence for your
negations, I shouldn't even bother responding. I will bite once.
Hopefully next time your arguments will contain some pith.
On 2011-01-19, Antoine Pitrou wrote:
> On Wed, 19 Jan 2011 11:34:53 + (UTC)
> Tim Harig wrote:
>> Th
On Wed, 19 Jan 2011 11:34:53 + (UTC)
Tim Harig wrote:
> That is why the FAQ I linked to
> says yes to the fact that you can consider UTF-8 to always be in big-endian
> order.
It certainly doesn't. Read better.
> Essentially all byte based data is big-endian.
This is pure nonsense.
On 2011-01-19, Tim Roberts wrote:
> Tim Harig wrote:
>>On 2011-01-17, carlo wrote:
>>
>>> 2- If that were true, can you point me to some documentation about the
>>> math that, as Mark says, demonstrates this?
>>
>>It is true because UTF-8 is essentially an 8 bit encoding that resorts
>>to the ne
Tim Harig wrote:
>On 2011-01-17, carlo wrote:
>
>> 2- If that were true, can you point me to some documentation about the
>> math that, as Mark says, demonstrates this?
>
>It is true because UTF-8 is essentially an 8 bit encoding that resorts
>to the next bit once it exhausts the addressible spac
On Jan 17, 2:19 pm, carlo wrote:
> Hi,
> recently I had to study *seriously* Unicode and encodings for one
> project in Python but I left with a couple of doubts arised after
> reading the unicode chapter of Dive into Python 3 book by Mark
> Pilgrim.
>
> 1- Mark says:
> "Also (and you’ll have to t
On 17 Gen, 23:34, Antoine Pitrou wrote:
> On Mon, 17 Jan 2011 14:19:13 -0800 (PST)
>
> carlo wrote:
> > Is it true UTF-8 does not have any "big-endian/little-endian" issue
> > because of its encoding method?
>
> Yes.
>
> > And if it is true, why Mark (and
> > everyone does) writes about UTF-8 wit
On Mon, 17 Jan 2011 14:19:13 -0800 (PST)
carlo wrote:
> Is it true UTF-8 does not have any "big-endian/little-endian" issue
> because of its encoding method?
Yes.
> And if it is true, why Mark (and
> everyone does) writes about UTF-8 with and without BOM some chapters
> later? What would be the
On 2011-01-17, carlo wrote:
> Is it true UTF-8 does not have any "big-endian/little-endian" issue
> because of its encoding method? And if it is true, why Mark (and
> everyone does) writes about UTF-8 with and without BOM some chapters
> later? What would be the BOM purpose then?
Yes, it is true.
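A short demonstration of the point: UTF-8 defines the byte sequence itself, so it is identical on any machine, whereas the plain UTF-16 codec must record the byte order with a BOM.

```python
# UTF-8: one canonical byte sequence, no endianness involved.
assert "é".encode("utf-8") == b"\xc3\xa9"

# UTF-16: the byte order matters, so the two explicit variants differ
# and the generic codec prepends a BOM to say which one it used.
le = "é".encode("utf-16-le")   # little-endian code unit
be = "é".encode("utf-16-be")   # big-endian code unit
assert le == b"\xe9\x00" and be == b"\x00\xe9"
assert "é".encode("utf-16") in (b"\xff\xfe" + le, b"\xfe\xff" + be)
```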
On 17.01.2011 23:19, carlo wrote:
Is it true UTF-8 does not have any "big-endian/little-endian" issue
because of its encoding method? And if it is true, why Mark (and
everyone does) writes about UTF-8 with and without BOM some chapters
later? What would be the BOM purpose then?
Can't answer yo
Finally did it. Thank you all for your help; I will upload the code, because
it can be used by Python 3 to handle the WSGI bytes issue!
Almar, sorry for the mails, Gmail sometimes sucks!!
On Oct 14, 2010 1:00pm, hid...@gmail.com wrote:
Finally did it, thank you all for your help, the code i
>So if you can, you could make sure to send the file as just bytes,
>>or if it must be a string, base64 encoded. If this is not possible
>>you can try the code below to obtain the bytes, not a very fast
>>solution, but it should work (Python 3):
>>
>>
>>MAP = {}
>>for i in r
On 12/10/2010 15:45, Hidura wrote:
It doesn't work; the error it gives me is TypeError: sequence item 0:
expected bytes, str found. I'll keep trying to figure out how to resolve
it; if you have another idea please tell me, but thanks anyway!!!
On Mon, Oct 11, 2010 at 4:27 AM, Almar Klein mailto:alma
It doesn't work; the error it gives me is TypeError: sequence item 0:
expected bytes, str found. I'll keep trying to figure out how to resolve it; if
you have another idea please tell me, but thanks anyway!!!
On Mon, Oct 11, 2010 at 4:27 AM, Almar Klein wrote:
>
> On 10 October 2010 23:01, Hidura w
On 10 October 2010 23:01, Hidura wrote:
> I'm trying to encode a binary file that was uploaded to a server; it is
> extracted from the wsgi.input of the environ and comes in as a Unicode
> string.
>
Firstly, UTF-8 is not meant to encode arbitrary binary data. But I guess you
could have a Unicode string in
I'm trying to encode a binary file that was uploaded to a server; it is
extracted from the wsgi.input of the environ and comes in as a Unicode
string.
2010/10/10, Almar Klein :
> Hi,
>
> please tell us what you are trying to do. Encoding (with UTF-8) is a method
> to convert a Unicode string to a sequence of
Hi,
please tell us what you are trying to do. Encoding (with UTF-8) is a method
to convert a Unicode string to a sequence of bytes. Decoding does the
reverse.
> When i open
> directly and try to decode the file the error is this: `UnicodeDecodeError:
> 'utf8' codec can't decode byte 0xff in positi
On Sun, Oct 10, 2010 at 10:25 AM, wrote:
> Hello everybody i am trying to encode a file string of an upload file and i
> am facing some problems with the first part of the file. When i open
> directly and try to decode the file the error is this: `UnicodeDecodeError:
> 'utf8' codec can't decode b
Brendan Miller writes:
> 2010/9/29 Lawrence D'Oliveiro :
>> In message , Brendan
>> Miller wrote:
>>
>>> It seems that characters not in the ascii subset of UTF-8 are
>>> discarded by c_char_p during the conversion ...
>>
>> Not a chance.
>>
>>> ... or at least they don't print out when I go to p
"Brendan Miller" wrote in message
news:aanlkti=2f3l++398st-16mpes8wzfblbu+qa8ztpa...@mail.gmail.com...
2010/9/29 Lawrence D'Oliveiro :
In message ,
Brendan
Miller wrote:
It seems that characters not in the ascii subset of UTF-8 are
discarded by c_char_p during the conversion ...
Not a ch
On 29/09/2010 19:33, Brendan Miller wrote:
> 2010/9/29 Lawrence D'Oliveiro:
>> In message, Brendan
>> Miller wrote:
>>
>>> It seems that characters not in the ascii subset of UTF-8 are
>>> discarded by c_char_p during the conversion ...
>>
>> Not a chance.
>>
>>> ... or at least they don't print ou
2010/9/29 Lawrence D'Oliveiro :
> In message , Brendan
> Miller wrote:
>
>> It seems that characters not in the ascii subset of UTF-8 are
>> discarded by c_char_p during the conversion ...
>
> Not a chance.
>
>> ... or at least they don't print out when I go to print the string.
>
> So it seems the
In message , Brendan
Miller wrote:
> It seems that characters not in the ascii subset of UTF-8 are
> discarded by c_char_p during the conversion ...
Not a chance.
> ... or at least they don't print out when I go to print the string.
So it seems there’s a problem on the printing side. What happ
On 28/09/2010 23:54, Brendan Miller wrote:
I'm using python 2.5.
Currently I have some python bindings written in ctypes. On the C
side, my strings are in utf-8. On the python side I use
ctypes.c_char_p to convert my strings to python strings. However, this
seems to break for non-ascii character
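The reply below suggests the bytes survive the conversion and the loss happens at print time. That can be checked directly; a sketch in Python 3 terms, where c_char_p carries bytes:

```python
import ctypes

# c_char_p holds raw bytes; non-ASCII UTF-8 bytes survive the round
# trip intact -- any apparent loss would happen when displaying, not
# during the ctypes conversion itself.
payload = "naïve".encode("utf-8")
p = ctypes.c_char_p(payload)
assert p.value == payload
assert p.value.decode("utf-8") == "naïve"
```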
hello Uli,
thanks, I think you hit the nail on it's head,
PyScripter indeed changes default encoding
but ..
On Wed, Sep 22, 2010 at 9:16 AM, Ulrich Eckhardt wrote:
> Stef Mientki wrote:
> > When running this python application from the command line ( or launched
> > from another Python program),
Stef Mientki wrote:
> When running this python application from the command line ( or launched
> from another Python program), the wrong character encoding (probably
> windows-1252) is used.
Rule #1: If you know the correct encoding, set it yourself. This
particularly applies to files you open you
Kent Johnson wrote:
On Oct 8, 5:55 pm, gigs <[EMAIL PROTECTED]> wrote:
Benjamin wrote:
On Oct 8, 12:49 pm, Bruno <[EMAIL PROTECTED]> wrote:
Hi!
I have big .txt file which i want to read, process and write to another .txt
file.
I have done script for that, but im having problem with croatian c
On Oct 8, 5:55 pm, gigs <[EMAIL PROTECTED]> wrote:
> Benjamin wrote:
> > On Oct 8, 12:49 pm, Bruno <[EMAIL PROTECTED]> wrote:
> >> Hi!
>
> >> I have big .txt file which i want to read, process and write to another
> >> .txt file.
> >> I have done script for that, but im having problem with croatia
Hi,
What is the encoding of the file1 you're reading from? I just ran
tests on my machine (OS X)
with both python2.5 and 2.6 and was able to read from a file containing:
"život je lep"
The file is UTF-8 encoded.
>>> data = open("test.txt").read()
>>> data
'\xc5\xbeivot je lep.'
>>> f = open("tes
Benjamin wrote:
On Oct 8, 12:49 pm, Bruno <[EMAIL PROTECTED]> wrote:
Hi!
I have big .txt file which i want to read, process and write to another .txt
file.
I have done script for that, but im having problem with croatian characters
(Š,Đ,Ž,Č,Ć).
Can you show us what you have so far?
How can
On Oct 8, 12:49 pm, Bruno <[EMAIL PROTECTED]> wrote:
> Hi!
>
> I have big .txt file which i want to read, process and write to another .txt
> file.
> I have done script for that, but im having problem with croatian characters
> (Š,Đ,Ž,Č,Ć).
Can you show us what you have so far?
> How can I read/
> $ cat utf8_from_stdin.py
> import sys
> data = sys.stdin.read()
> print "length of data =", len(data)
sys.stdin is a byte stream in Python 2, not a character stream.
To make it a character stream, do
sys.stdin = codecs.getreader("utf-8")(sys.stdin)
HTH,
Martin
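The codecs.getreader() recipe above is the Python 2 way to turn a byte stream into a character stream. In Python 3 the equivalent is wrapping the underlying binary buffer (sys.stdin.buffer); a sketch using BytesIO to stand in for the binary stream:

```python
import io

# A BytesIO plays the role of sys.stdin.buffer here; wrapping it in a
# TextIOWrapper with an explicit encoding gives a character stream.
raw = io.BytesIO("Petr Chelcický\n".encode("utf-8"))
stream = io.TextIOWrapper(raw, encoding="utf-8")
line = stream.readline()
assert line == "Petr Chelcický\n"
assert len(line) == 15   # length in characters, not bytes
```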
Chris wrote:
> On May 28, 11:08 am, [EMAIL PROTECTED] wrote:
>> Say I have a file, utf8_input, that contains a single character, é,
>> coded as UTF-8:
>>
>> $ hexdump -C utf8_input
> 00000000  c3 a9
> 00000002
[...]
> weird thing is 'c3 a9' is é on my side... and copy/pasting the é
> gives me 'e
> Shouldn't you do data = data.decode('utf8') ?
Yes, that's it! Thanks.
-- dave
--
http://mail.python.org/mailman/listinfo/python-list
On May 28, 11:08 am, [EMAIL PROTECTED] wrote:
> Hi,
>
> I have problems getting my Python code to work with UTF-8 encoding
> when reading from stdin / writing to stdout.
>
> Say I have a file, utf8_input, that contains a single character, é,
> coded as UTF-8:
>
> $ hexdump -C utf8_input
>
[EMAIL PROTECTED] writes:
> Hi,
>
> I have problems getting my Python code to work with UTF-8 encoding
> when reading from stdin / writing to stdout.
>
> Say I have a file, utf8_input, that contains a single character, é,
> coded as UTF-8:
>
> $ hexdump -C utf8_input
> c3 a9
Thanks, Sion, that makes sense!
Would it be correct to assume that the encoding of strings retrieved
by FieldStorage() would be the same as the encoding of the submitted
web form (in my case utf-8)?
Funny but I have the same form implemented in PSP (Python Server
Pages), running under Apache with
coldpizza <[EMAIL PROTECTED]> wrote:
>I am using this 'word' variable like this:
>
>print u'' % (word)
>
>and apparently this causes exceptions with non-ASCII strings.
>
>I've also tried this:
>print u'' %
>(word.encode('utf8'))
>but I still get the same UnicodeDecodeError..
Your 'word' i
Peter Otten <[EMAIL PROTECTED]> writes:
[...]
>> Forgive me if this is a stupid question, but: What purpose does
>> function f serve?
>
> Like the OP's get_inventary_number() it takes a unicode string and
> returns a tuple of unicode strings. I'ts pointless otherwise. I hoped I
> had stripped down
John J. Lee wrote:
> Peter Otten <[EMAIL PROTECTED]> writes:
> [...]
>> # -*- coding: utf8 -*-
>> r"""
> f("äöü".decode("utf8"))
>> (u'\xe4\xf6\xfc',)
>> """
>> def f(s):
>> return (s,)
>
> Forgive me if this is a stupid question, but: What purpose does
> function f serve?
Like the OP's
J. Cliff Dyer wrote:
> John J. Lee wrote:
>
>> Peter Otten <[EMAIL PROTECTED]> writes:
>> [...]
>>
>>
>>> def f(s):
>>> return (s,)
>>>
>>>
>> Forgive me if this is a stupid question, but: What purpose does
>> function f serve?
>>
>>
>> John
>>
>>
>
> Well, it has
John J. Lee wrote:
> Peter Otten <[EMAIL PROTECTED]> writes:
> [...]
>
>> def f(s):
>> return (s,)
>>
>
> Forgive me if this is a stupid question, but: What purpose does
> function f serve?
>
>
> John
>
Well, it has nothing to do with the unicode bit that came before it. It
just ta
Peter Otten <[EMAIL PROTECTED]> writes:
[...]
> # -*- coding: utf8 -*-
> r"""
f("äöü".decode("utf8"))
> (u'\xe4\xf6\xfc',)
> """
> def f(s):
> return (s,)
Forgive me if this is a stupid question, but: What purpose does
function f serve?
John
Bzyczek wrote:
> So my question is: Is it possible to run doctests with UTF-8
> characters? And if your answer will be YES, tell me please how...
Use raw strings in combination with explicit decoding and a little
try-and-error. E. g. this little gem passes ;)
# -*- coding: utf8 -*-
r"""
>>> f("ä
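In Python 3 the raw-string-plus-decode dance becomes unnecessary, since source files and doctests are Unicode by default. A self-contained sketch (the throwaway module is an artifact of running doctest outside a normal module, not part of the thread's recipe):

```python
import doctest
import types

def f(s):
    """Return its argument in a one-tuple.

    >>> f("äöü")
    ('äöü',)
    """
    return (s,)

# Register the function in a throwaway module so doctest can find it.
demo = types.ModuleType("demo")
demo.f = f
demo.__test__ = {"f": f}

results = doctest.testmod(demo)
```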
Yes, it does solve the problem.
Compile python with ncursesw library.
Btw Ubuntu 7 has it "out of the box".
> Hi All,
>
> Recently I ran into a problem with UTF-8 surrport when using curses
> library in python 2.5 in Fedora 7. I found out that the program using
> curses cannot print out unicode
On Sat, 10 Mar 2007 15:00:04 +0100, Olivier Verdier <[EMAIL PROTECTED]>
wrote:
[snip]
> The default encoding i wish to set is UTF-8 since it encodes unicode and
> is nowadays the standard encoding.
I can't agree with that: there are still many tools completely ignoring
the encoding problem,
On Mar 11, 1:00 am, Olivier Verdier <[EMAIL PROTECTED]> wrote:
> First off: i thoroughly enjoy python. I use it for scientific
> computing with scipy, numpy and matplotlib and it's an amazingly
> efficient and elegant language.
>
> About this mailing list: it is very hard to search. I can't find an
Laurent Pointal wrote:
> You should prefer to put
> # -*- coding: utf-8 -*-
> at the begining of your sources files. With that you are ok with all Python
> installations, whatever be the defautl encoding.
> Hope this will become mandatory in a future Python version.
The default encoding
Michael B. Trausch wrote:
> I am having a slight problem with UTF-8 output with Python. I have the
> following program:
>
> x = 0
>
> while x < 0x4000:
> print u"This is Unicode code point %d (0x%x): %s" % (x, x,
> unichr(x))
> x += 1
>
> This program works perfectly when run directly:
Olivier Verdier wrote:
> My question is the following: how to set a default encoding in
> python? I read an old thread about that and it didn't seem possible
> by then.
You *can* put a sys.setdefaultencoding("utf-8") in your sitecustomize.py
(see Python libs/site-packages/). Note that this functi
In <[EMAIL PROTECTED]>, Michael B.
Trausch wrote:
> However, when I attempt to redirect the output to a file:
>
> [EMAIL PROTECTED]:~/tmp$ python test.py >f
> Traceback (most recent call last):
> File "test.py", line 6, in
> print u"This is Unicode code point %d (0x%x): %s" % (x, x,
> unic
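The traceback arises because, with output redirected, Python 2 cannot guess a terminal encoding and falls back to ASCII. The usual fixes are setting PYTHONIOENCODING=utf-8 or wrapping the output stream with an explicit encoding; a Python 3-style sketch of the latter, with a BytesIO standing in for the redirected file:

```python
import io

# Simulate a redirected stdout with a binary buffer and wrap it with
# an explicit encoding, as one would wrap sys.stdout.buffer.
buffer = io.BytesIO()
out = io.TextIOWrapper(buffer, encoding="utf-8")
print("code point 0xe9:", chr(0xE9), file=out)
out.flush()
assert buffer.getvalue() == b"code point 0xe9: \xc3\xa9\n"
```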
On 2006-10-19, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> In <[EMAIL PROTECTED]>, Neil Cerutti wrote:
>>> Note that 'K\xc3\xb6ni'.decode('utf-8') returns a Unicode
>>> object. With print this is implicitly converted to string. The
>>> char set used depends on your console
>>
>> No, the
On 2006-10-19, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> In <[EMAIL PROTECTED]>, Neil Cerutti wrote:
>
>>> Note that 'K\xc3\xb6ni'.decode('utf-8') returns a Unicode
>>> object. With print this is implicitly converted to string. The
>>> char set used depends on your console
>>
>> No, th
In <[EMAIL PROTECTED]>, Neil Cerutti wrote:
>> Note that 'K\xc3\xb6ni'.decode('utf-8') returns a Unicode
>> object. With print this is implicitly converted to string. The
>> char set used depends on your console
>
> No, the setting of the console encoding (sys.stdout.encoding) is
> ignored.
Nope
On 2006-10-19, Michael Ströder <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
>>
>> print 'K\xc3\xb6ni'.decode('utf-8')
>>
>> and this line raised a UnicodeDecode exception.
>
> Works for me.
>
> Note that 'K\xc3\xb6ni'.decode('utf-8') returns a Unicode
> object. With print this is implici
Michael Ströder wrote:
> [EMAIL PROTECTED] wrote:
> >
> > print 'K\xc3\xb6ni'.decode('utf-8')
> >
> > and this line raised a UnicodeDecode exception.
>
> Works for me.
>
> Note that 'K\xc3\xb6ni'.decode('utf-8') returns a Unicode object. With
> print this is implicitly converted to string. The char
[EMAIL PROTECTED] wrote:
>
> print 'K\xc3\xb6ni'.decode('utf-8')
>
> and this line raised a UnicodeDecode exception.
Works for me.
Note that 'K\xc3\xb6ni'.decode('utf-8') returns a Unicode object. With
print this is implicitly converted to string. The char set used depends
on your console
Chec
Duncan Booth wrote:
> [EMAIL PROTECTED] wrote:
>
> > 'K\xc3\xb6ni'.decode('utf-8') # 'K\xc3\xb6ni' should be 'König',
> > contains a german 'umlaut'
> >
> > but failed since python assumes every string to decode to be ASCII?
>
> No, Python would assume the string to be utf-8 encoded in this cas
> >
> > 'K\xc3\xb6ni'.decode('utf-8') # 'K\xc3\xb6ni' should be 'König',
>
> "Köni", to be precise.
Äh, yes.
;o)
> > contains a german 'umlaut'
> >
> > but failed since python assumes every string to decode to be ASCII?
>
> that should work, and it sure works for me:
>
> >>> s = 'K\xc3\xb6ni
[EMAIL PROTECTED] wrote:
> 'K\xc3\xb6ni'.decode('utf-8') # 'K\xc3\xb6ni' should be 'König',
> contains a german 'umlaut'
>
> but failed since python assumes every string to decode to be ASCII?
No, Python would assume the string to be utf-8 encoded in this case:
>>> 'K\xc3\xb6ni'.decode('utf
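The two-step conversion the thread describes, in Python 3 spelling (using the full word "König" in place of the thread's truncated 'Köni'):

```python
raw = b"K\xc3\xb6nig"            # "König" encoded as UTF-8

# Step 1: decode the UTF-8 bytes to a (unicode) string.
text = raw.decode("utf-8")
assert text == "König"

# Step 2: encode that string as Latin-1, where ö is the single
# byte 0xF6.
latin = text.encode("latin-1")
assert latin == b"K\xf6nig"
```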
[EMAIL PROTECTED] wrote:
> I'm struggling with the conversion of a UTF-8 string to latin-1. As far
> as I know the way to go is to decode the UTF-8 string to unicode and
> then encode it back again to latin-1?
>
> So I tried:
>
> 'K\xc3\xb6ni'.decode('utf-8') # 'K\xc3\xb6ni' should be 'Kön
[EMAIL PROTECTED] wrote:
>
> I have tried to test RE and UTF-8 in Python generally and the results
> are even more confusing (done with locale cs_CZ.UTF-8 in konsole):
>
>>>locale.getpreferredencoding()
>
> 'UTF-8'
>
print re.sub("(\w*)","X","[Chelcický]",re.L)
You first have to turn the r
Working on extension of genericwiki.py plugin for PyBlosxom and I have
problems with UTF-8 and RE. When I have this wiki line, it does break
URL too early:
[http://en.wikipedia.org/wiki/Petr_Chelcický Petr Chelcický's]
work(s) into English.
and creates
[http://en.wikipedia.org/wiki/Petr_Chel";>h
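In Python 3 this problem largely disappears: \w is Unicode-aware on str patterns by default. Note also that the call quoted above passes re.L positionally, where re.sub expects the *count* argument; flags must be given as flags= (or compiled into the pattern). A sketch:

```python
import re

word = "Chelcický"

# On Python 3 str patterns, \w matches Unicode word characters by
# default, so the accented ý does not break the match.
assert re.findall(r"\w+", word) == ["Chelcický"]

# Flags go into the flags= keyword; passing them positionally (as in
# the quoted call) silently sets re.sub's count parameter instead.
assert re.sub(r"\w+", "X", word, flags=re.UNICODE) == "X"
```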
"Serge Orlov" <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> Paul Boddie wrote:
> > Anyone who has needed to expose filesystems
> > created by Linux distributions before the UTF-8 "big push" to later
> > distributions can attest to the fact that the "see no evil" brass
> > monke
Fuzzyman wrote:
ust = 'æøå'.decode('utf-8')
Which is now deprecated isn't it ? (including encoded string literals
in source without declaring an encoiding).
Not having an encoding declaration while having non-ASCII characters
in source code is deprecated.
Having non-ASCII characters in string liter
Paul Boddie wrote:
> One side-effect of the "big push" to UTF-8 amongst the Linux
> distribution vendors/maintainers is the evasion of issues such as
> filesystem encodings and "real" Unicode at the system level. In
> Python, when you have a Unicode object, you are dealing with
> idealised
> sequen
Mike Dee <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> A very very basic UTF-8 question that's driving me nuts:
>
> If I have this in the beginning of my Python script in Linux:
>
> #!/usr/bin/env python
> # -*- coding: UTF-8 -*-
>
> should I - or should I not - be able to u
Max M wrote:
> Fuzzyman wrote:
> > Mike Dee wrote:
>
> >>#!/usr/bin/env python
> >># -*- coding: UTF-8 -*-
>
> > This will mean string literals in your source code will be encoded as
> > UTF8 - if you handle them with normal string operations you might get
> > funny results.
>
> It means that you