Re: Treating a unicode string as latin-1

2008-01-03 Thread Fredrik Lundh
Diez B. Roggisch wrote: >> I would think it more likely that he wants to end up with u'Bob\u2019s >> Breakfast' rather than u'Bob\x92s Breakfast' although u'Dog\u2019s dinner' >> seems a probable consequence. > > If that's the case, he should read the file as string, de- and encode it > (proba

Re: Treating a unicode string as latin-1

2008-01-03 Thread Diez B. Roggisch
Duncan Booth wrote: > Fredrik Lundh <[EMAIL PROTECTED]> wrote: > >> ET has already decoded the CP1252 data for you. If you want UTF-8, all >> you need to do is to encode it: >> >> >>> u'Bob\x92s Breakfast'.encode('utf8') >> 'Bob\xc2\x92s Breakfast' >> > I think he is claiming that the encoding

Re: Treating a unicode string as latin-1

2008-01-03 Thread Duncan Booth
Fredrik Lundh <[EMAIL PROTECTED]> wrote: > ET has already decoded the CP1252 data for you. If you want UTF-8, all > you need to do is to encode it: > > >>> u'Bob\x92s Breakfast'.encode('utf8') > 'Bob\xc2\x92s Breakfast' > I think he is claiming that the encoding information in the file is inc
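The difference being argued over here can be checked directly. The following is only a sketch in Python 3 syntax (the thread itself uses Python 2.5, where the `u''` prefix and the byte-string reprs look different); the variable names are illustrative and not from the thread. It compares the UTF-8 bytes of U+0092, which is what ElementTree handed back, with the UTF-8 bytes of U+2019, which is presumably what the poster wanted.

```python
# Python 3 sketch; the thread's own examples are Python 2.5.
mis_decoded = 'Bob\x92s Breakfast'   # contains U+0092, a C1 control character
intended = 'Bob\u2019s Breakfast'    # contains U+2019, right single quotation mark

# Encoding to UTF-8 is faithful to whatever code point is in the string,
# so it cannot turn U+0092 into U+2019 on its own.
utf8_of_mis_decoded = mis_decoded.encode('utf-8')
utf8_of_intended = intended.encode('utf-8')

print(utf8_of_mis_decoded)  # two-byte sequence C2 92 for U+0092
print(utf8_of_intended)     # three-byte sequence E2 80 99 for U+2019
```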

Re: Treating a unicode string as latin-1

2008-01-03 Thread Fredrik Lundh
Simon Willison wrote: > But ElementTree gives me back a unicode string, so I get the following > error: > >>> print u'Bob\x92s Breakfast'.decode('cp1252').encode('utf8') > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/Library/Frameworks/Python.framework/Versions/2.5/lib/

Re: Treating a unicode string as latin-1

2008-01-03 Thread Jeroen Ruigrok van der Werven
-On [20080103 14:36], Simon Willison ([EMAIL PROTECTED]) wrote: >How can I tell Python "I know this says it's a unicode string, but I >need you to treat it like a bytestring"? Although it does not address the exact question, it does raise the issue of how you are using ElementTree. When I use the foll

Re: Treating a unicode string as latin-1

2008-01-03 Thread Diez B. Roggisch
Simon Willison wrote: > Hello, > > I'm using ElementTree to parse an XML file which includes some data > encoded as cp1252, for example: > > Bob\x92s Breakfast > > If this was a regular bytestring, I would convert it to utf8 using the > following: > print 'Bob\x92s Breakfast'.decode('cp12

Re: Treating a unicode string as latin-1

2008-01-03 Thread Duncan Booth
Simon Willison <[EMAIL PROTECTED]> wrote: > How can I tell Python "I know this says it's a unicode string, but I > need you to treat it like a bytestring"? Can you not just fix your xml file so that it uses the same encoding as it claims to use? If the xml says it contains utf8 encoded data then
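To illustrate why the declared encoding matters, here is a small sketch in Python 3 syntax. The two documents are hypothetical stand-ins, not the poster's actual file, and the `iso-8859-1` declaration is an assumption based on the thread title. The same byte 0x92 comes out as a different character depending on what the XML declaration claims.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: identical element bytes, different declared encodings.
body = b"<name>Bob\x92s Breakfast</name>"
declared_latin1 = b"<?xml version='1.0' encoding='iso-8859-1'?>" + body
declared_cp1252 = b"<?xml version='1.0' encoding='windows-1252'?>" + body

# The parser decodes the bytes according to the declaration it is given.
text_latin1 = ET.fromstring(declared_latin1).text   # byte 0x92 becomes U+0092
text_cp1252 = ET.fromstring(declared_cp1252).text   # byte 0x92 becomes U+2019

print(repr(text_latin1))
print(repr(text_cp1252))
```

If the declaration matches the bytes actually in the file, no repair step is needed after parsing.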

Re: Treating a unicode string as latin-1

2008-01-03 Thread Paul Hankin
On Jan 3, 1:31 pm, Simon Willison <[EMAIL PROTECTED]> wrote: > How can I tell Python "I know this says it's a unicode string, but I > need you to treat it like a bytestring"? u'Bob\x92s Breakfast'.encode('latin-1') -- Paul Hankin
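A sketch of how this suggestion plays out end to end, written in Python 3 syntax rather than the Python 2.5 of the thread; the variable names are illustrative. It relies on latin-1 mapping code points U+0000 to U+00FF one-to-one onto bytes 0x00 to 0xFF, so encoding with it recovers the original bytes, which can then be decoded with the encoding the data was really in.

```python
# Python 3 sketch of the latin-1 round trip.
mis_decoded = 'Bob\x92s Breakfast'        # string as returned by the parser

# Step 1: latin-1 turns each code point below 256 back into the same byte value.
raw_bytes = mis_decoded.encode('latin-1')

# Step 2: decode those bytes with the encoding they were actually written in.
repaired = raw_bytes.decode('cp1252')

# Step 3: encode to UTF-8 for output.
utf8_bytes = repaired.encode('utf-8')

print(repr(repaired))
print(utf8_bytes)
```

This only works when every character in the string is below U+0100; anything higher makes the latin-1 encode step raise `UnicodeEncodeError`.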

Treating a unicode string as latin-1

2008-01-03 Thread Simon Willison
Hello, I'm using ElementTree to parse an XML file which includes some data encoded as cp1252, for example: Bob\x92s Breakfast If this was a regular bytestring, I would convert it to utf8 using the following: >>> print 'Bob\x92s Breakfast'.decode('cp1252').encode('utf8') Bob’s Breakfast But Ele
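For reference, the bytestring conversion described in this post looks as follows in Python 3 syntax (a sketch only; the post itself is Python 2.5 and the variable names are illustrative). In Python 3 the distinction the question hinges on is explicit: `bytes` has `decode`, `str` has `encode`, and a text string has no `decode` method at all.

```python
# Python 3 sketch of the bytestring case from the post.
raw = b'Bob\x92s Breakfast'          # cp1252-encoded bytes

text = raw.decode('cp1252')          # bytes -> text; 0x92 becomes U+2019
utf8 = text.encode('utf8')           # text -> UTF-8 bytes

# Text strings cannot be decoded again in Python 3.
text_has_decode = hasattr(text, 'decode')

print(repr(text))
print(utf8)
print(text_has_decode)
```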