Hi, I am parsing HTML documents with the HTML parser from libxml2. When the encoding is declared in the document it works perfectly, but when it is not, it does not seem to work well (probably because I am doing something wrong).
According to http://xmlsoft.org/encoding.html the parser should detect the encoding. To test this I put a UTF-8 word in a file, and the encoding is not detected (the parser generates a wrong string). Example: reducción --> reducción.

I only use the parser as a SAX parser because I do not need a tree, so I create the context with htmlCreatePushParser() and parse the file with htmlParseChunk().

Is it possible that encoding detection does not work with htmlParseChunk()? If so, what method should I use?

Thanks,
Cesar
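
P.S. To make the symptom concrete, here is a small standard-library sketch of what I suspect is happening. It does not call libxml2 at all. It only assumes that the file contains UTF-8 bytes and that, with no declared charset, they end up being read as a single-byte encoding such as ISO-8859-1. That fallback is my assumption, not something I have confirmed in the parser.

```python
# Assumption: the file holds UTF-8 bytes, and without a declared charset
# they are decoded as ISO-8859-1 (Latin-1). This is a guess at the cause,
# not confirmed libxml2 behaviour.
word = "reducci\u00f3n"             # the test word, "reducción"
raw = word.encode("utf-8")          # the bytes as stored in the file

# What a Latin-1 reader makes of those bytes: each byte of the two-byte
# UTF-8 sequence for the accented letter becomes its own character.
misread = raw.decode("iso-8859-1")
print(repr(misread))

# Decoding the same bytes as UTF-8 gives the original word back.
assert raw.decode("utf-8") == word
```

If this is the cause, the wrong string should be one character longer than the original word, because the two-byte sequence is split in two.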