Re: write xml to txt encoding

Stefan Behnel Thu, 29 Jul 2010 06:06:20 -0700

William Johnston, 29.07.2010 14:12:

I have a Python app that parses XML files and then writes to text files.


XML or HTML?

However, the output text file is "sometimes" encoded in some Asian language.

Here is my code:


encoding = "iso-8859-1"

clean_sent = nltk.clean_html(sent.text)

clean_sent = clean_sent.encode(encoding, "ignore");


I also tried "UTF-8" encoding, but received the same results.


What result?

Maybe the NLTK cannot determine the encoding of the HTML file (because thefile is broken and/or doesn't correctly specify its own encoding) and thusfails to decode it?


Stefan

--
http://mail.python.org/mailman/listinfo/python-list

Re: write xml to txt encoding

Reply via email to