Re: Replacement in unicodestrings?

Martin v. Löwis Sat, 04 Oct 2008 22:50:33 -0700

>         s_str=repr(s.encode('UTF-8'))

It would be easier to encode this in cp1252 here, as this is apparently
the encoding that you want to use in the RTF file, too. You could then
loop over the encoded string, replacing each byte >= 128 with
"\\'%.2x" % ord(byte).
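
The cp1252 loop could look roughly like the sketch below. It assumes
Python 3, where str is Unicode and iterating over bytes yields integers
(in Python 2 you would call ord() on each character instead). The name
rtf_escape is made up for illustration, and the sketch does not escape
the RTF control characters \, { and }.

```python
def rtf_escape(text):
    """Return text as an ASCII str with non-ASCII characters as RTF \\'xx escapes."""
    out = []
    # Encoding raises UnicodeEncodeError for characters cp1252 cannot represent.
    for byte in text.encode("cp1252"):
        if byte >= 128:
            # Two lowercase hex digits, e.g. 0xeb -> \'eb
            out.append("\\'%.2x" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)


print(rtf_escape("Arj\u00ebn"))  # prints Arj\'ebn
```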


As yet another alternative, you could create a Unicode error handler
(call it 'rtf'), and then do

          return s.encode('ascii', errors='rtf')
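
A minimal sketch of such a handler, assuming Python 3 and
codecs.register_error. The function name rtf_errors is hypothetical;
only the handler name 'rtf' comes from the suggestion above. The handler
is called for each run of characters that ASCII cannot encode and
returns the replacement text plus the position at which to resume.

```python
import codecs


def rtf_errors(exc):
    """Error handler: replace unencodable characters with RTF \\'xx escapes."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Assumption: the RTF file uses cp1252, so the escapes are cp1252 byte values.
    # Characters outside cp1252 raise UnicodeEncodeError here.
    replacement = "".join("\\'%.2x" % b for b in bad.encode("cp1252"))
    return replacement, exc.end


codecs.register_error("rtf", rtf_errors)

encoded = "Arj\u00ebn".encode("ascii", errors="rtf")
print(encoded)  # prints b"Arj\\'ebn"
```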

>         replDic={'\xc3\xa0':"\\'e0",'\xc3\xa4':"\\'e4",'\xc3\xa1':"\\'e1",
>                 '\xc3\xa8':"\\'e8",'\xc3\xab':"\\'eb",'\xc3\xa9':"\\'e9",
>                 '\xc3\xb2':"\\'f2",'\xc3\xb6':"\\'f6",'\xc3\xb3':"\\'f3",
>                 '\xe2\x82\xac':"\\'80"}
>         for k in replDic.keys():
>             if repr(k) in s_str:
>                 s_str=s_str.replace(repr(k),replDic[k])
>         return s_str
> 
> However interactive:
> 
> >>> '\xc3\xab' in 'Arj\xc3\xabn'
> True
> 
> I just don't get it, what's the difference?

It's the repr():

py> '\xc3\xab' in 'Arj\xc3\xabn'
True
py> repr('\xc3\xab') in repr('Arj\xc3\xabn')
False
py> repr('\xc3\xab')
"'\\xc3\\xab'"
py> repr('Arj\xc3\xabn')
"'Arj\\xc3\\xabn'"

repr('\xc3\xab') starts and ends with an apostrophe, but in
repr('Arj\xc3\xabn') the \\xc3 is preceded by "j" and the \\xab is
followed by "n", so the substring test fails.
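
The same effect can be reproduced in Python 3 if bytes objects stand in
for the Python 2 byte strings of this thread (an assumption of this
sketch; there the repr also carries a leading b). Dropping repr() and
replacing on the data itself avoids the problem:

```python
data = b"Arj\xc3\xabn"  # UTF-8 bytes for "Arjën"
key = b"\xc3\xab"       # UTF-8 bytes for "ë"

# The raw bytes contain the key ...
assert key in data
# ... but repr(key) carries its own quotes, which never occur
# in the middle of repr(data).
assert repr(key) not in repr(data)

# Replace on the bytes directly instead of on their repr().
fixed = data.replace(key, b"\\'eb")
print(fixed)  # prints b"Arj\\'ebn"
```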

HTH,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
