Re: Python and encodings drives me crazy

2005-06-20 Thread John Machin
Oliver Andrich wrote: > 2005/6/21, Konstantin Veretennicov <[EMAIL PROTECTED]>: > >>It does, as long as headline and caption *can* actually be encoded as >>macroman. After you decode headline from utf-8 it will be unicode and >>not all unicode characters can be mapped to macroman: >> >> >u'\u0

Re: Python and encodings drives me crazy

2005-06-20 Thread Oliver Andrich
2005/6/21, Konstantin Veretennicov <[EMAIL PROTECTED]>: > It does, as long as headline and caption *can* actually be encoded as > macroman. After you decode headline from utf-8 it will be unicode and > not all unicode characters can be mapped to macroman: > > >>> u'\u0160'.encode('utf8') > '\xc5\x

Re: Python and encodings drives me crazy

2005-06-20 Thread Konstantin Veretennicov
On 6/20/05, Oliver Andrich <[EMAIL PROTECTED]> wrote:
> Does the following code write headline and caption in
> MacRoman encoding to the disk?
>
> f = codecs.open(outfilename, "w", "macroman")
> f.write(headline)

It does, as long as headline and caption *can* actually be encoded as macrom
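
To make the point about unmappable characters concrete, here is a minimal sketch in Python 3 syntax (the thread itself predates it and uses Python 2 `u'...'` literals). The helper name `can_encode` is made up for illustration. It only checks whether a string survives encoding to a target codec, using U+0160 from the example above as a character that MacRoman does not contain.

```python
def can_encode(text, encoding="mac_roman"):
    """Return True if every character in text exists in the target encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# UTF-8 bytes for U+0160 (LATIN CAPITAL LETTER S WITH CARON), decoded to text.
s_caron = b"\xc5\xa0".decode("utf-8")

ok_text = "Gr\u00fc\u00dfe"             # u-umlaut and sharp s both exist in MacRoman
print(can_encode(ok_text))              # mappable
print(can_encode(s_caron))              # not mappable, strict encoding would raise

# With errors="replace" the unmappable character is written as "?" instead.
fallback = s_caron.encode("mac_roman", errors="replace")
```

Whether to fail loudly (the default `strict`) or to substitute with `replace` is a judgment call that depends on how much data loss the output file can tolerate.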

Re: Python and encodings drives me crazy

2005-06-20 Thread Diez B. Roggisch
Oliver Andrich wrote: > Well, I narrowed my problem down to writing a macroman or cp850 file > using the codecs module. The rest was basically a misunderstanding > about the codecs module and the wrong assumption that my input data is > iso-latin-1 encoded. It is UTF-8 encoded. So, currently I am at the

Re: Python and encodings drives me crazy

2005-06-20 Thread Oliver Andrich
Well, I narrowed my problem down to writing a macroman or cp850 file using the codecs module. The rest was basically a misunderstanding about the codecs module and the wrong assumption that my input data is iso-latin-1 encoded. It is UTF-8 encoded. So, currently I am at the point where I have my data re
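
The situation described here (UTF-8 input, MacRoman or cp850 output) could be sketched roughly as follows. This is an illustration in Python 3, not the poster's script: the function name `transcode`, its parameters, and the temporary file names are all assumptions. It uses the built-in `open` with an `encoding` argument, which in Python 3 plays the role that `codecs.open` plays in the thread.

```python
import os
import tempfile


def transcode(src_path, dst_path, src_encoding="utf-8",
              dst_encoding="mac_roman", errors="strict"):
    """Read text in one encoding and write it back out in another."""
    # newline="" keeps line endings exactly as they are in the input.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding,
              errors=errors, newline="") as dst:
        dst.write(text)


# Small demonstration with throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "in.txt")
    dst_path = os.path.join(tmp, "out.txt")
    with open(src_path, "wb") as f:
        f.write("Gr\u00fc\u00dfe".encode("utf-8"))   # 7 bytes in UTF-8
    transcode(src_path, dst_path)
    with open(dst_path, "rb") as f:
        raw = f.read()                               # single-byte MacRoman
```

Passing `dst_encoding="cp850"` would cover the other target mentioned; with `errors="strict"` the write raises `UnicodeEncodeError` as soon as the text contains a character the target encoding lacks.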

Re: Python and encodings drives me crazy

2005-06-20 Thread Oliver Andrich
> I know this isn't your question, but why write: > > > data = apply(string.replace, [data, html, char]) > > when you could write > > data = data.replace(html, char) > > ?? Because, I guess, I am already blind. Thanks. Oliver -- Oliver Andrich <[EMAIL PROTECTED]> --- http://fith

Re: Python and encodings drives me crazy

2005-06-20 Thread Steven Bethard
Oliver Andrich wrote:
> def remove_html_entities(data):
>     for html, char in html2text:
>         data = apply(string.replace, [data, html, char])
>     return data

I know this isn't your question, but why write:

> data = apply(string.replace, [data, html, char])

when you could write

data
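
For reference, the quoted function with the suggested `str.replace` method call might look like the sketch below. The contents of the replacement table are not shown in the thread, so the three entries in `HTML2TEXT` are placeholders chosen for illustration only.

```python
# Hypothetical replacement table; the original html2text list is not shown.
# "&amp;" comes last so that text such as "&amp;auml;" is not decoded twice.
HTML2TEXT = [
    ("&auml;", "\u00e4"),
    ("&ouml;", "\u00f6"),
    ("&amp;", "&"),
]


def remove_html_entities(data):
    """Replace each known HTML entity in data with its character."""
    for html, char in HTML2TEXT:
        data = data.replace(html, char)
    return data


cleaned = remove_html_entities("K&auml;se &amp; Br&ouml;tchen")
```

The method call does the same work as `apply(string.replace, ...)` and keeps working on Python 3, where `apply` no longer exists.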

Python and encodings drives me crazy

2005-06-20 Thread Oliver Andrich
Hi everybody, I have to write a little script that reads some nasty XML-formatted files. "Nasty XML-formatted" means we have an XML-like syntax, no DTD, HTML entities used without declaration, and so on. A task as I like it. My task looks like this... 1. read the data from the file. 2. get rid of th