Hi,

I was looking at using the ElementTree parser in Python to quickly pull out a more or less plain-text version of a module for search indexing. Incidentally, it is also faster than calling striptext: on the ESV and KJV, it took about 80% of the time striptext takes.
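
For reference, here is a minimal sketch of the kind of extraction I mean. It assumes the raw entry for a verse is already available as a string (how it is fetched from the module is not shown), and `raw_entry_to_text` is just a hypothetical helper name for illustration. Wrapping the fragment in a dummy `<verse>` root is also an assumption on my part, so that a fragment with several top-level nodes can be parsed at all.

```python
import xml.etree.ElementTree as ET


def raw_entry_to_text(raw_entry):
    """Return the text of a markup fragment with all tags removed.

    Hypothetical helper: the fragment is wrapped in a dummy <verse> root
    so ElementTree accepts text mixed with several top-level elements.
    Returns None if the wrapped fragment is not well-formed XML.
    """
    try:
        root = ET.fromstring("<verse>" + raw_entry + "</verse>")
    except ET.ParseError:
        # Unmatched or stray tags end up here.
        return None
    # itertext() yields every text and tail string in document order;
    # split/join collapses runs of whitespace into single spaces.
    return " ".join("".join(root.itertext()).split())


fragment = ('So Joseph died at the age of 110.'
            '<note osisRef="Gen.50.26" n="33"></note> After they embalmed him')
print(raw_entry_to_text(fragment))
```

Note that this keeps the text inside every element, including any note content, and that a fragment containing a closing tag with no matching opening tag makes `ET.fromstring` raise `ParseError`, so the helper returns None for it.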

I ran into problems trying it on the NETfree module, however: there seem to be trailing OSIS tags at the end of books. For example, from Genesis 50:26:

'So Joseph died at the age of 110.<note osisRef="Gen.50.26" n="33"></note> After they embalmed him, his body<note osisRef="Gen.50.26" n="34"></note> was placed in a coffin in Egypt.<milestone type="line" /><milestone type="line" /> </div> *<chapter eID="Gen.50"/></div>*'

The last two tags (in bold) shouldn't be there. They are not matched by anything else in the module, and removing them allows parsing to work.

The third-to-last tag, a closing div, matches a tag in the heading of the chapter. Is the raw entry of a verse meant to be valid XML when taken by itself? If so, this is also invalid.

God Bless,
Ben

-------------------------------------------------------------------------------------------
The Lord is not slow to fulfill his promise as some count slowness, but is patient toward you, not wishing that any should perish, but that all should reach repentance.
2 Peter 3:9 (ESV)