> def characters(self, chars):
>
>     newchars=[]
>     newchars.append(chars.encode('ISO-8859-1'))
The SAX parser may call characters() multiple times for the same
text block. For example, in the input <foo>123</foo>, characters()
could be called once:

    handler.characters("123")

or twice:

    handler.characters("12")
    handler.characters("3")

or:

    handler.characters("1")
    handler.characters("23")

or three times:

    handler.characters("1")
    handler.characters("2")
    handler.characters("3")

Your code creates a new list on every call, so it only ever sees the
current chunk. If you want the whole text block, then you need to do
something like this:

in __init__:

    self.newchars = []

in startElement:

    self.newchars = []

in characters:

    self.newchars.append(chars)

in endElement:

    if len(self.newchars) > 0:
        combined = "".join(self.newchars).encode('ISO-8859-1')
        print "Stream read is '%s'" % combined

I recommend using ElementTree instead.

- Brian
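For reference, the pieces above could be put together roughly as follows. This is only a sketch, written for Python 3 (so the encode step and the print statement are left out); the class name TextCollector and the blocks list are made up for illustration and are not from the original code. It only handles elements that contain plain text, not text mixed with child elements. The last two lines show what the same lookup might look like with ElementTree.

```python
import xml.sax
import xml.etree.ElementTree as ET


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: joins the chunks of each text block."""

    def __init__(self):
        super().__init__()
        self.newchars = []   # chunks seen since the last start tag
        self.blocks = []     # (element name, combined text) pairs

    def startElement(self, name, attrs):
        # A new element starts, so begin collecting from scratch.
        self.newchars = []

    def characters(self, chars):
        # May be called one or more times per text block.
        self.newchars.append(chars)

    def endElement(self, name):
        if len(self.newchars) > 0:
            combined = "".join(self.newchars)
            self.blocks.append((name, combined))
        # Reset so the same text is not reported again by a parent element.
        self.newchars = []


handler = TextCollector()
xml.sax.parseString(b"<foo>123</foo>", handler)
print(handler.blocks)

# ElementTree hands back the whole text block in one piece.
root = ET.fromstring("<foo>123</foo>")
print(root.text)
```

However the parser happens to split "123", blocks ends up holding a single ('foo', '123') entry, because the join only happens in endElement.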