Oliver Andrich wrote:
> 2005/6/21, Konstantin Veretennicov <[EMAIL PROTECTED]>:
>
>>It does, as long as headline and caption *can* actually be encoded as
>>macroman. After you decode headline from utf-8 it will be unicode and
>>not all unicode characters can be mapped to macroman:
>>
>>
>u'\u0
2005/6/21, Konstantin Veretennicov <[EMAIL PROTECTED]>:
> It does, as long as headline and caption *can* actually be encoded as
> macroman. After you decode headline from utf-8 it will be unicode and
> not all unicode characters can be mapped to macroman:
>
> >>> u'\u0160'.encode('utf8')
> '\xc5\x
On 6/20/05, Oliver Andrich <[EMAIL PROTECTED]> wrote:
> Does the following code write headline and caption in
> MacRoman encoding to the disk?
>
> f = codecs.open(outfilename, "w", "macroman")
> f.write(headline)
It does, as long as headline and caption *can* actually be encoded as
macrom
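To make the caveat above concrete, here is a minimal sketch. It is written for Python 3, whereas the code in this thread uses Python 2 syntax. The helper name `encodable` and the sample strings are made up for illustration. It checks whether already-decoded text can be encoded as MacRoman, and shows one possible fallback through the `errors` argument of `str.encode`.

```python
def encodable(text, encoding="mac_roman"):
    """Return True if every character in text can be encoded."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# UTF-8 bytes decoded to text first, as described in the thread.
headline = b"Caf\xc3\xa9".decode("utf-8")

print(encodable(headline))   # e-acute has a MacRoman code point
print(encodable("\u0160"))   # S with caron has none

# One possible fallback (an assumption, not the poster's choice):
# substitute '?' for every character that cannot be mapped.
fallback = "\u0160koda".encode("mac_roman", errors="replace")
```

Whether silently replacing characters is acceptable depends on the data; leaving the default `errors="strict"` in place makes the problem visible instead.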
Oliver Andrich wrote:
> Well, I narrowed my problem down to writing a macroman or cp850 file
> using the codecs module. The rest was basically a misunderstanding
> about the codecs module and the wrong assumption that my input data is
> iso-latin-1 encoded. It is UTF-8 encoded. So, currently I am at the
Well, I narrowed my problem down to writing a macroman or cp850 file
using the codecs module. The rest was basically a misunderstanding
about the codecs module and the wrong assumption that my input data is
iso-latin-1 encoded. It is UTF-8 encoded. So, currently I am at the
point where I have my data re
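A rough sketch of the transcoding step described here, reading UTF-8 input and writing it back out as cp850 through the codecs module. It targets Python 3; the function name `transcode`, the file names, and the sample text are assumptions made for the demo, not taken from the original script.

```python
import codecs
import os
import tempfile

def transcode(src_path, dst_path, src_enc="utf-8", dst_enc="cp850"):
    """Copy a text file, re-encoding it from src_enc to dst_enc."""
    with codecs.open(src_path, "r", src_enc) as src:
        text = src.read()    # decoded text, no longer raw bytes
    with codecs.open(dst_path, "w", dst_enc) as dst:
        dst.write(text)      # raises UnicodeEncodeError if unmappable

# Demo in a throwaway directory (paths are placeholders).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "in.txt")
dst_path = os.path.join(workdir, "out.txt")

with open(src_path, "wb") as f:
    f.write(b"Gr\xc3\xbc\xc3\x9fe")   # sample German text as UTF-8 bytes

transcode(src_path, dst_path)

with open(dst_path, "rb") as f:
    result = f.read()                 # the same text, now cp850 bytes
```

In Python 3 the built-in `open()` with an `encoding` argument is the more usual choice; `codecs.open` is kept here only to stay close to the thread.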
> I know this isn't your question, but why write:
>
> > data = apply(string.replace, [data, html, char])
>
> when you could write
>
> data = data.replace(html, char)
>
> ??
Because I guess I am already blind. Thanks.
Oliver
--
Oliver Andrich <[EMAIL PROTECTED]> --- http://fith
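For reference, a small sketch of the function with the suggested `data.replace(html, char)` call in place of `apply(string.replace, ...)`. The contents of `html2text` are not shown in the thread, so the three pairs below are a hypothetical stand-in; the code is written for Python 3.

```python
# Hypothetical entity table; the real html2text list is not shown.
html2text = [
    ("&auml;", "\u00e4"),
    ("&ouml;", "\u00f6"),
    ("&amp;", "&"),   # kept last so "&amp;auml;" is not decoded twice
]

def remove_html_entities(data):
    """Replace each known HTML entity in data with its character."""
    for html, char in html2text:
        data = data.replace(html, char)
    return data

cleaned = remove_html_entities("K&auml;se &amp; Br&ouml;tchen")
```

The order of the table matters with this approach: replacing `&amp;` first would turn `&amp;auml;` into `&auml;` and then into a character, which is probably not what the input meant.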
Oliver Andrich wrote:
> def remove_html_entities(data):
>     for html, char in html2text:
>         data = apply(string.replace, [data, html, char])
>     return data
I know this isn't your question, but why write:
> data = apply(string.replace, [data, html, char])
when you could write
data
Hi everybody,
I have to write a little script that reads some nasty XML-formatted
files. "Nasty XML-formatted" means we have an XML-like syntax, no DTD,
HTML entities used without declaration, and so on. A task as I like it.
My task looks like this:
1. read the data from the file.
2. get rid of th