I am using lxml's iterparse and running into a very obscure error. When I run iterparse on a file, it occasionally returns an element with element.text == None even though the element clearly has text in it.
I copied and pasted the problem XML into a Python string, used StringIO to create a file-like object from it, and ran a test using iterparse. It produced the expected output and ran perfectly fine, so the problem only happens when I run iterparse on the actual file.

So then I tried opening the file, reading the data, turning that data into a file-like object using StringIO, and running iterparse on it, and the same problem (element.text == None) occurred.

I even tried this:

    import codecs
    import StringIO
    from lxml.etree import iterparse

    f = codecs.open(abbyy_filename, 'r', encoding='utf-8')
    file_data = f.read()
    file_like_object = StringIO.StringIO(file_data)
    for event, element in iterparse(file_like_object, events=("start", "end")):

And I got this traceback:

    Traceback (most recent call last):
      File "abbyyParser/parseAbbyy.py", line 391, in <module>
        extension=options.extension,
      File "abbyyParser/parseAbbyy.py", line 103, in __init__
        self.generate_output_files()
      File "abbyyParser/parseAbbyy.py", line 164, in generate_output_files
        AbbyyDocParse(abby_filename, self.extension, self.output_types)
      File "abbyyParser/parseAbbyy.py", line 239, in __init__
        self.parse_doc(abbyy_filename)
      File "abbyyParser/parseAbbyy.py", line 281, in parse_doc
        for event, element in iterparse(file_like_object, events=("start", "end")):
      File "iterparse.pxi", line 484, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:86333)
    TypeError: reading file objects must return plain strings

If I do this instead:

    file_data = f.read().encode("utf-8")

iterparse will run on it, but I still get element.text with a value of None when I should not.

My XML file does have diacritics in it, but I've put the proper encoding declaration at the head of the XML file (<?xml version="1.0" encoding="UTF-8"?>).

I've also tried using ElementTree's iterparse, and I get even more of the same problem with the same files.

Any idea what the problem might be?
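For reference, here is a minimal sketch of the kind of isolated test described above, written for Python 3 with the standard library's xml.etree.ElementTree (lxml.etree.iterparse accepts the same events argument). SAMPLE and collect_text are hypothetical names made up for illustration, and the sample XML is not the real file. The sketch feeds iterparse a byte stream rather than decoded text, and reads element.text only on "end" events, because the ElementTree documentation says the text attribute is not guaranteed to be populated when a "start" event is emitted. Whether that is what causes the None values in the real file is an assumption, not something confirmed here.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real file: UTF-8 bytes with diacritics.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<doc><line>caf\u00e9</line><line>na\u00efve</line></doc>'
).encode("utf-8")


def collect_text(byte_stream, tag):
    """Return the text of every `tag` element, read at its "end" event."""
    texts = []
    for event, element in ET.iterparse(byte_stream, events=("start", "end")):
        # At "start" only the tag and attributes are guaranteed to be set;
        # the element's text is complete once its "end" event arrives.
        if event == "end" and element.tag == tag:
            texts.append(element.text)
    return texts


if __name__ == "__main__":
    print(collect_text(io.BytesIO(SAMPLE), "line"))
```

With a real file, the same function could be called on a handle opened in binary mode, for example open(abbyy_filename, "rb"), so that the parser does the decoding itself based on the XML declaration.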