Guys,

I was just running usfm2osis.pl across some files that my aunt and uncle gave me to convert for the language they're working with through Wycliffe. It ran great, and I saw no problems with it. When I tried to run title_cleanup.pl across the output, though, it revealed a minor issue: the language appears to use the "French style" of quotation mark, but in the SFM text these are marked up as "<<" and ">>", that is, pairs of ASCII angle brackets. This causes title_cleanup.pl, which is expecting well-formed XML, to puke on parsing the file. Of course, it would also cause osis2mod to puke when I get to that stage.
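To make the failure concrete, here is a minimal Python sketch. It is not a patch for usfm2osis.pl (which is Perl), only an illustration. It assumes a made-up `<verse>` wrapper rather than real OSIS markup, and the helper names `wrap_verse` and `is_well_formed` are hypothetical. It shows that text containing raw "<<" fails to parse as XML, while the same text passed through the standard library's `xml.sax.saxutils.escape` parses fine:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


def wrap_verse(text, escape_text):
    """Wrap text in a hypothetical <verse> element (not real OSIS markup).

    When escape_text is True, '&', '<' and '>' in the text are replaced
    with '&amp;', '&lt;' and '&gt;' before wrapping.
    """
    body = escape(text) if escape_text else text
    return "<verse>" + body + "</verse>"


# Sample line using ASCII angle brackets as French-style quotation marks.
sample = "<<Come and see,>> he said."

raw = wrap_verse(sample, escape_text=False)
safe = wrap_verse(sample, escape_text=True)

print(is_well_formed(raw))   # raw "<<" is read as the start of a tag
print(is_well_formed(safe))  # escaped text is ordinary character data
print(safe)
```

A converter would need to apply the escaping to the text content only, never to the markup it generates itself, otherwise its own tags would be escaped as well.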
Obviously this is an encoding issue in the source file, but I thought I should mention that it is also a bug/shortcoming in usfm2osis.pl. If it is supposed to output well-formed XML, then it should escape such characters in the plain text with their proper XML entity representations (&lt; for "<", &gt; for ">", and &amp; for "&"). Is there anyone who wants to look into that, or do I need to roll up my Perl sleeves and get dirty?

--Greg
