On May 3, 2005, at 4:35 AM, Bartosch Warzecha wrote:
Hello,
I'm building a search engine for HTML documents, and I've got an HTML-parsing
problem.
These documents are in German. They contain various special characters,
and different ways of writing these special characters, like "ö",
"&ouml;" and "&#246;". Does anybody know a parsing engine that has no problems
with all these different ways of writing the special characters?
What HTML parser are you using? Those entity references should not be seen by your code once resolved by a parser. Try NekoHTML.
Erik
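
To illustrate the point about entity references being resolved by the parser, here is a minimal sketch. It uses Python's standard-library html.parser rather than NekoHTML (a Java library), purely so the example is self-contained. The names TextExtractor and extract_text are hypothetical and do not come from the thread. The idea is the same with any real HTML parser: the indexing code only sees the decoded character, whichever way it was written in the source.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects the text content of an HTML document."""

    def __init__(self):
        # convert_charrefs=True (the default) makes the parser decode named
        # and numeric character references before handle_data is called.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # 'data' is already decoded text; no entity references remain here.
        self.chunks.append(data)


def extract_text(html):
    """Return the plain text of an HTML fragment with entities resolved."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# The same word written three ways: literal character, named entity,
# and numeric character reference.
sample = "<p>sch\u00f6n, sch&ouml;n, sch&#246;n</p>"
print(extract_text(sample))
```

All three spellings come out as the same string, so a search index built on the extracted text would treat them identically.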